
Jan 29, 2026

The Technical Blueprint: Implementing Select AI with RAG in Oracle 26ai

In my previous post (Mastering Select AI with Retrieval-Augmented Generation), we explored why Retrieval-Augmented Generation (RAG) is the future of the conversational enterprise.
Today, we’re opening the hood. As an enterprise architect, I know that "vision" only matters if you can implement it securely and at scale.

Implementing Select AI with RAG in Oracle Autonomous Database 26ai is a masterclass in automation. It moves the heavy lifting of document processing, vectorization, and prompt augmentation directly into the database kernel.
Here is your step-by-step technical guide to turning static PDFs into a dynamic knowledge base.


Phase 1: Establishing the Secure Bridge

Before the database can "read," it needs permission to "talk" to your cloud storage. We begin by creating a secure path to your OCI Object Storage bucket where the source documents live.

 1. Secure Credentialing

Use the DBMS_CLOUD package to store your OCI/LLM provider credentials. This ensures your private keys and fingerprints are managed within the database's internal security vault rather than being hardcoded.

-- 1. Create OCI Storage Credential
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OCI_RAG_CRED',
    user_ocid       => 'ocid1.user...',
    tenancy_ocid    => 'ocid1.tenancy...',
    private_key     => '...your_private_key...',
    fingerprint     => '...your_fingerprint...'
  );
END;
/
 
Validate your credentials by executing the query below:

SELECT * FROM DBMS_CLOUD.LIST_OBJECTS('OCI_RAG_CRED', 'https://objectstorage.[region].oraclecloud.com/n/[ns]/b/[bucket]/o/');

Why we need an OCI Object Storage credential:

In Oracle ADW 26ai, even if your bucket is public, standard RAG procedures like DBMS_CLOUD_AI.CREATE_VECTOR_INDEX still strictly require a credential_name parameter in their attributes.

While a public bucket allows anyone with the URL to download files, the Select AI internal automation (which uses a background process called a "Pipeline") is designed to expect a credential object to verify the connection.

 2. LLM Provider Credential

Next, store the credential for your LLM provider (OpenAI in this example).

-- 2. Create OpenAI Credential
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'AI_RAG_CRED',
    username        => 'OPENAI', -- Or 'OCI'
    password        => '<paste your OpenAI API key>'
  );
END;
/

 3. Network ACL Configuration

Oracle is secure by default. You must explicitly allow the database to communicate with the Object Storage endpoint using DBMS_NETWORK_ACL_ADMIN. If you use an external LLM provider such as OpenAI, repeat this for its API endpoint (for example, api.openai.com).
BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'objectstorage.us-phoenix-1.oraclecloud.com', -- Update for your region
    ace  => xs$ace_type(privilege_list => xs$name_list('connect', 'resolve'),
                        principal_name => 'ADMIN',
                        principal_type => xs_acl.ptype_db));
END;
/
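
To confirm the rule took effect, you can query the ACL data dictionary. A minimal check, assuming you have privileges to query the DBA_HOST_ACES view (the ADMIN user does):

-- Verify the host ACE for the Object Storage endpoint
SELECT host, principal, privilege
FROM   dba_host_aces
WHERE  host LIKE '%objectstorage%';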


Phase 2: Creating the AI Profile (The Brain-Memory Link)

The AI Profile is the configuration that tells Select AI which LLM to use and which Vector Index contains the "grounded" enterprise knowledge. This links the prompt generator (LLM) with the content retriever (Vector Store).

-- provider can be "openai" or "oci"; credential_name is the LLM credential from Phase 1
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
      profile_name => 'PDF_ASSISTANT',
      attributes   => '{"provider": "openai",
                        "credential_name": "AI_RAG_CRED",
                        "vector_index_name": "PDF_KNOWLEDGE_BASE",
                        "model": "gpt-4o"}'
  );
END;
/

Query to Check Profile Attributes

SELECT attribute_name, attribute_value
FROM   user_cloud_ai_profile_attributes
WHERE  profile_id = (SELECT profile_id
                     FROM   user_cloud_ai_profiles
                     WHERE  profile_name = '<YOUR_PROFILE_NAME>');

For my example, the profile name is PDF_ASSISTANT.


Phase 3: Building the Vector Store (Automated)

In legacy RAG setups, you had to manually extract text, chunk it, call an embedding API, and manage the vector database yourself. In 26ai, the DBMS_CLOUD_AI.CREATE_VECTOR_INDEX procedure automates this entire pipeline.

It converts documents (PDF, DOC, JSON, etc.) to text, chunks them, generates embeddings via a specified model, and populates the vector store automatically.

-- PDF_ASSISTANT is the profile created in Phase 2
BEGIN
  DBMS_CLOUD_AI.CREATE_VECTOR_INDEX(
    index_name => 'PDF_KNOWLEDGE_BASE',
    attributes => '{
      "vector_db_provider": "oracle",
      "location": "https://objectstorage.[region].oraclecloud.com/n/[ns]/b/[bucket]/o/",
      "object_storage_credential_name": "OCI_RAG_CRED",
      "profile_name": "PDF_ASSISTANT",
      "vector_table_name": "support_site_vector",
      "vector_distance_metric": "cosine"
    }'
  );
END;
/
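
As a quick sanity check once the pipeline has had time to run, you can peek at the vector table named in the attributes above (support_site_vector in my example). The exact column layout of this auto-created table may vary by release, so I simply select everything:

-- Inspect a few rows of the auto-populated vector table
SELECT * FROM support_site_vector FETCH FIRST 5 ROWS ONLY;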


Viewing Configured Attributes

Once you have created an index and want to see what attributes were used, you can query the data dictionary: 

SELECT attribute_name, attribute_value FROM user_cloud_vector_index_attributes WHERE index_name = '<YOUR_INDEX_NAME>';

To check the status of your vector index creation, you can query the USER_CLOUD_PIPELINES view. This view tracks the progress and health of the automated background jobs that handle the document processing and embedding. 

-- Query to check the status of the pipeline

SELECT pipeline_id, pipeline_name, status, last_execution, status_table FROM user_cloud_pipelines;

The STATUS_TABLE column in the USER_CLOUD_PIPELINES view identifies an internal tracking table created by the database to manage a specific pipeline's operations.

When you create a Vector Index using DBMS_CLOUD_AI.CREATE_VECTOR_INDEX, Oracle automatically launches a background pipeline to handle the heavy lifting (downloading files, chunking text, and generating embeddings). The STATUS_TABLE is where the database records the granular progress of every single file in your Object Storage location.

Why is the STATUS_TABLE useful?

While the USER_CLOUD_PIPELINES view tells you if the entire process succeeded or failed, the table named in the STATUS_TABLE column allows you to see:

  • Individual File Status: Which specific PDFs were processed successfully and which ones failed.
  • Error Details: The exact error (e.g., "File too large" or "Unsupported format") for a specific document.
  • Timestamps: When each file began and finished processing.

Find the table name:

SELECT status_table FROM user_cloud_pipelines WHERE pipeline_name LIKE '<VECTOR_INDEX_NAME>%';

Query the resulting table (replace PIPELINE_STATUS_XYZ with the value from the STATUS_TABLE column in the query above):

SELECT object_name, status, error_message FROM PIPELINE_STATUS_XYZ;

The STATUS column in PIPELINE_STATUS_XYZ shows whether each file is still running, has completed, or has errored out.

Status      Comments
FAILED      If you see "FAILED," check the ERROR_MESSAGE column. This will often give you the specific reason, such as a credential issue or an unreachable Object Storage bucket.
RUNNING     The index is still processing. Since it has to download, chunk, and embed documents, it can take several minutes depending on the number of files.
SUCCEEDED   The pipeline finished. If the index still isn't working, the issue might be with the profile or the query syntax.
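
For a quick health summary across all your documents, a simple aggregate over the status table works well (again, substitute your own STATUS_TABLE name for PIPELINE_STATUS_XYZ):

-- Count files per status to gauge overall pipeline health
SELECT status, COUNT(*) AS file_count
FROM   PIPELINE_STATUS_XYZ
GROUP  BY status;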

 Relationship in the RAG Architecture

In the context of Select AI and RAG (Retrieval-Augmented Generation), this table is your primary "debug log" for your knowledge base. If your AI isn't answering questions about a specific document you uploaded, checking this status table will tell you if that document was ever successfully indexed.
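
For example, if a file named warranty_manual.pdf (an illustrative name) never shows up in answers, a targeted lookup settles it quickly:

-- Did a specific document get indexed? (file name is illustrative)
SELECT object_name, status, error_message
FROM   PIPELINE_STATUS_XYZ
WHERE  object_name LIKE '%warranty_manual%';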


Phase 4: Conversational Execution

With the infrastructure in place, querying is simple. We use the SET_PROFILE procedure to activate the context and then issue a natural language query using the narrate action.

-- Activate the profile context
EXEC DBMS_CLOUD_AI.SET_PROFILE('PDF_ASSISTANT');

-- Ask your question based on the document knowledge base
SELECT AI narrate based on the uploaded manual, what is the warranty policy?

When this query runs, the database generates an embedding for your question, performs a similarity search in the vector store to find the most relevant chunks, and sends those chunks to the LLM to generate a grounded response.
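
If you prefer a programmatic form, for application code or when you don't want to rely on the session profile, the same RAG flow can be invoked through the DBMS_CLOUD_AI.GENERATE function. A minimal sketch using the profile from Phase 2:

-- Equivalent call without SET_PROFILE; the profile is passed explicitly
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'what is the warranty policy?',
         profile_name => 'PDF_ASSISTANT',
         action       => 'narrate')
FROM dual;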

 

By moving the RAG process into the database, we haven't just simplified the code; we've made the enterprise data store fundamentally more intelligent.