In my previous post (Mastering Select AI with Retrieval Augmented Generation), we explored why Retrieval-Augmented
Generation (RAG) is the future of the conversational enterprise. Today,
we’re opening the hood. As an enterprise architect, I know that
"vision" only matters if you can implement it securely and at scale.
Implementing Select AI with RAG in Oracle Autonomous Database
26ai is a masterclass in automation. It moves the heavy lifting of document
processing, vectorization, and prompt augmentation directly into the database
kernel.
Here is your step-by-step technical guide to turning static PDFs
into a dynamic knowledge base.
Phase 1: Establishing the Secure Bridge
Before the database can "read," it needs permission to
"talk" to your cloud storage. We begin by creating a secure path to
your OCI Object Storage bucket where the source documents live.
1. Secure Credentialing
Use the DBMS_CLOUD package to store your OCI/LLM provider
credentials. This ensures your private keys and fingerprints are managed within
the database's internal security vault rather than being hardcoded.
-- 1. Create the OCI Object Storage credential
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OCI_RAG_CRED',
    user_ocid       => 'ocid1.user...',
    tenancy_ocid    => 'ocid1.tenancy...',
    private_key     => '...your_private_key...',
    fingerprint     => '...your_fingerprint...'
  );
END;
/
Validate your credentials by executing the following query:

SELECT * FROM DBMS_CLOUD.LIST_OBJECTS('OCI_RAG_CRED',
       'https://objectstorage.[region].oraclecloud.com/n/[ns]/b/[bucket]/o/');
Why we need an OCI Object Storage credential:

In Oracle ADW 26ai, even if your bucket is public, the standard RAG procedures such as DBMS_CLOUD_AI.CREATE_VECTOR_INDEX still strictly require a credential_name parameter in their attributes. While a public bucket allows anyone with the URL to download files, the Select AI internal automation (which uses a background process called a "Pipeline") is designed to expect a credential object to verify the connection.
-- 2. Create the OpenAI (LLM provider) credential
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'AI_RAG_CRED',
    username        => 'OPENAI',                      -- Or 'OCI'
    password        => '<paste your OpenAI API key>'
  );
END;
/
3. Network ACL Configuration
Oracle is secure by default. You must explicitly allow the
database to communicate with the Object Storage endpoint using DBMS_NETWORK_ACL_ADMIN.
BEGIN
  DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
    host => 'objectstorage.us-phoenix-1.oraclecloud.com',  -- Update for your region
    ace  => xs$ace_type(
              privilege_list => xs$name_list('connect', 'resolve'),
              principal_name => 'ADMIN',
              principal_type => xs_acl.ptype_db));
END;
/
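To confirm the ACE was recorded, you can query the data dictionary. A quick sanity check, assuming you are connected as a privileged (DBA) user:

```sql
-- Verify the host ACE exists for the Object Storage endpoint
SELECT host, principal, privilege
FROM   dba_host_aces
WHERE  host LIKE 'objectstorage%';
```

If your region's endpoint appears with the 'connect' and 'resolve' privileges for your principal, the database is cleared to reach Object Storage.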
Phase 2: Creating the AI Profile (The Brain-Memory Link)
The AI Profile is the configuration that tells Select AI
which LLM to use and which Vector Index contains the "grounded"
enterprise knowledge. This links the prompt generator (LLM) with the content
retriever (Vector Store).
-- Use "provider": "oci" with an OCI credential if you are on OCI Generative AI
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'PDF_ASSISTANT',
    attributes   => '{"provider": "openai",
                      "credential_name": "AI_RAG_CRED",
                      "vector_index_name": "PDF_KNOWLEDGE_BASE",
                      "model": "gpt-4o"}'
  );
END;
/
Query to check profile attributes:

SELECT attribute_name, attribute_value
FROM   user_cloud_ai_profile_attributes
WHERE  profile_id = (SELECT profile_id
                     FROM   user_cloud_ai_profiles
                     WHERE  profile_name = 'ENTER_YOUR_PROFILE_NAME');

For my example, the profile name is PDF_ASSISTANT.
Phase 3: Building the Vector Store (Automated)
In legacy RAG setups, you had to manually extract text, chunk
it, call an embedding API, and manage the vector database yourself. In 26ai,
the DBMS_CLOUD_AI.CREATE_VECTOR_INDEX procedure automates this entire pipeline.
It converts documents (PDF, DOC, JSON, etc.) to text, chunks
them, generates embeddings via a specified model, and populates the vector
store automatically.
-- profile_name references the profile created in Phase 2
BEGIN
  DBMS_CLOUD_AI.CREATE_VECTOR_INDEX(
    index_name => 'PDF_KNOWLEDGE_BASE',
    attributes => '{
      "vector_db_provider": "oracle",
      "location": "https://objectstorage.[region].oraclecloud.com/n/[ns]/b/[bucket]/o/",
      "object_storage_credential_name": "OCI_RAG_CRED",
      "profile_name": "PDF_ASSISTANT",
      "vector_table_name": "support_site_vector",
      "vector_distance_metric": "cosine"
    }'
  );
END;
/
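If the default chunking does not suit your documents, the index attributes also accept tuning parameters. A hedged sketch, assuming the documented chunk_size and chunk_overlap attributes; the index name and values below are illustrative, not defaults:

```sql
-- Illustrative only: tune how document text is split before embedding.
-- "chunk_size" / "chunk_overlap" are example settings; adjust per your content.
BEGIN
  DBMS_CLOUD_AI.CREATE_VECTOR_INDEX(
    index_name => 'PDF_KNOWLEDGE_BASE_TUNED',   -- hypothetical index name
    attributes => '{
      "vector_db_provider": "oracle",
      "location": "https://objectstorage.[region].oraclecloud.com/n/[ns]/b/[bucket]/o/",
      "object_storage_credential_name": "OCI_RAG_CRED",
      "profile_name": "PDF_ASSISTANT",
      "chunk_size": 2048,
      "chunk_overlap": 256
    }'
  );
END;
/
```

Larger chunks preserve more context per retrieval hit; more overlap reduces the chance of an answer being split across chunk boundaries.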
Viewing Configured Attributes
Once you have created an index and want to see what attributes were used, you can query the data dictionary:

SELECT attribute_name, attribute_value
FROM   user_cloud_vector_index_attributes
WHERE  index_name = '<YOUR_INDEX_NAME>';
To check the status of your vector index creation, you can query the USER_CLOUD_PIPELINES view. This view tracks the progress and health of the automated background jobs that handle the document processing and embedding.

-- Query to check the status of the pipeline
SELECT pipeline_id, pipeline_name, status, last_execution, status_table
FROM   user_cloud_pipelines;
The STATUS_TABLE column in the USER_CLOUD_PIPELINES view identifies an internal tracking table created by the database to manage a specific pipeline's operations.

When you create a vector index using DBMS_CLOUD_AI.CREATE_VECTOR_INDEX, Oracle automatically launches a background pipeline to handle the heavy lifting (downloading files, chunking text, and generating embeddings). The STATUS_TABLE is where the database records the granular progress of every single file in your Object Storage location.
Why is the STATUS_TABLE useful?
While the USER_CLOUD_PIPELINES view tells you if the entire process succeeded or failed, the table named in the STATUS_TABLE column allows you to see:

- Individual File Status: which specific PDFs were processed successfully and which ones failed.
- Error Details: the exact error (e.g., "File too large" or "Unsupported format") for a specific document.
- Timestamps: when each file began and finished processing.
Find the table name:
SELECT status_table FROM user_cloud_pipelines WHERE pipeline_name LIKE '<YOUR_VECTOR_INDEX_NAME>%';
Query the resulting table (replace PIPELINE_STATUS_XYZ with the value in the status_table column from the query above):

SELECT object_name, status, error_message FROM PIPELINE_STATUS_XYZ;

The STATUS column in PIPELINE_STATUS_XYZ shows whether each file is running, completed, or errored out.
Status      Comments
FAILED      Check the ERROR_MESSAGE column. This will often give you the specific reason, such as a credential issue or an unreachable Object Storage bucket.
RUNNING     The index is still processing. Since it has to download, chunk, and embed documents, it can take several minutes depending on the number of files.
SUCCEEDED   The pipeline finished. If the index still isn't working, the issue might be with the profile or the query syntax.
Relationship in the RAG Architecture
In the context of Select AI and RAG (Retrieval-Augmented Generation), this table is your primary "debug log" for your knowledge base. If your AI isn't answering questions about a specific document you uploaded, checking this status table will tell you if that document was ever successfully indexed.
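For example, to investigate a single document, filter the status table by file name. A minimal sketch; 'warranty_manual.pdf' is a hypothetical file name, and PIPELINE_STATUS_XYZ stands in for your actual STATUS_TABLE value:

```sql
-- Check whether one specific document made it through the pipeline
SELECT object_name, status, error_message
FROM   PIPELINE_STATUS_XYZ
WHERE  object_name LIKE '%warranty_manual.pdf%';
```

A FAILED row with its error_message usually explains immediately why the AI cannot answer questions about that file.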
Phase 4: Conversational Execution
With the infrastructure in place, querying is simple. We use the SET_PROFILE procedure to activate the context and then issue a natural language query using the narrate action.
-- Activate the profile context
EXEC DBMS_CLOUD_AI.SET_PROFILE('PDF_ASSISTANT');

-- Ask your question based on the document knowledge base
SELECT AI narrate based on the uploaded manual, what is the warranty policy?
When this query runs,
the database generates an embedding for your question, performs a similarity
search in the vector store to find the most relevant chunks, and sends those
chunks to the LLM to generate a grounded response.
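The same grounded retrieval can also be invoked programmatically through the DBMS_CLOUD_AI.GENERATE function, which is convenient inside PL/SQL code or when you cannot use the SELECT AI syntax. A minimal sketch, assuming the PDF_ASSISTANT profile from Phase 2:

```sql
-- Invoke the RAG flow via the PL/SQL API instead of SELECT AI
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'What is the warranty policy?',
         profile_name => 'PDF_ASSISTANT',
         action       => 'narrate') AS answer
FROM dual;
```

Because the profile is passed explicitly, this form also works without a prior SET_PROFILE call in the session.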
By moving the RAG process into the database, we haven't just
simplified the code; we've made the enterprise data store fundamentally more
intelligent.