This blog is about the inner workings of the Oracle Select AI execution pipeline: the machinery that translates human intent into high-performance, secure SQL.
Oracle Select AI turns natural language into runnable SQL queries through a process called Prompt Augmentation, which bridges the gap between general-purpose Large Language Models (LLMs) and your specific database schema. It acts as a developer assistant, letting both SQL beginners and experts interact with data in plain language.
The Step-by-Step Pipeline
The process follows a specific workflow to ensure the generated SQL is accurate and secure:
1. Natural Language Input: The process begins when a user submits a prompt, such as "Who are the top three baby boomer Big Spenders?".
2. Prompt Augmentation: This is the "key enabler" of the pipeline. Select AI automatically combines the user's prompt with database schema metadata. This metadata includes:
◦ Table and column names.
◦ Data types.
◦ Optional comments that explain the purpose of generically named columns.
3. LLM Processing: This "augmented prompt" is sent to an external AI provider (such as OCI Generative AI, OpenAI, or Azure). The LLM uses its general knowledge (e.g., knowing the age range for a "Baby Boomer") alongside the provided metadata to understand the database structure.
4. SQL Generation: The LLM generates a syntactically correct SQL query specifically tailored to your schema.
5. Execution and Security: Depending on the chosen action, the query is executed in the database. It is important to note that Oracle security policies and data access rules are enforced just as they would be for any other SQL query.
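The flow above can be sketched in a SQL session. This is illustrative only: `GENAI_PROFILE` is a placeholder name for an AI profile created beforehand with the `dbms_cloud_ai` package, and the prompt is the example from step 1.

```sql
-- Sketch of the end-to-end flow; GENAI_PROFILE is a placeholder name.

-- Activate an AI profile for the current session. Steps 2-5 happen
-- automatically once a profile is set.
BEGIN
  DBMS_CLOUD_AI.SET_PROFILE(profile_name => 'GENAI_PROFILE');
END;
/

-- Submit the natural language prompt. Select AI augments it with schema
-- metadata, sends it to the configured LLM, and executes the generated SQL.
SELECT AI who are the top three baby boomer big spenders;
```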
Key Components Enabling the Pipeline
To manage this pipeline effectively, Select AI utilizes several specific configurations:
AI Profiles: These are predefined configurations created with the dbms_cloud_ai package. They specify which AI provider to use, the specific model (e.g., GPT-4o), and the Object List (a subset of tables the AI is allowed to consider).
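As a sketch, creating such a profile might look like the following; the profile name, credential name, and object list are illustrative placeholders, not values from this post:

```sql
BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'GENAI_PROFILE',          -- placeholder name
    attributes   => '{"provider"        : "openai",
                      "credential_name" : "OPENAI_CRED",
                      "model"           : "gpt-4o",
                      "object_list"     : [{"owner": "SH", "name": "CUSTOMERS"},
                                           {"owner": "SH", "name": "SALES"}]}'
  );
END;
/
```

The `object_list` is what limits which tables' metadata the pipeline may include in the augmented prompt.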
Metadata Tuning: The quality of the pipeline's output depends on the quality of the metadata. Adding comments to your columns helps the LLM avoid confusion when dealing with generic table names like "Table 1" or "Table 2".
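For example, a comment can turn an opaque column into something the LLM can reason about (the table and column names below are hypothetical):

```sql
-- Hypothetical schema: a generically named column gets a descriptive comment.
COMMENT ON COLUMN sales_data.col_7 IS
  'Total amount spent by the customer, in USD, per calendar year';
```

Whether such comments are actually sent to the LLM typically depends on the profile configuration (a "comments" attribute in current releases), so check your profile settings if comments seem to be ignored.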
Conversation History: If enabled, the pipeline remembers up to the last 10 prompts. This allows the user to ask follow-up questions to refine results without respecifying previous details.
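Assuming conversation history is enabled on the profile, a follow-up exchange might look like this (prompts are illustrative):

```sql
SELECT AI who are the top three baby boomer big spenders;
-- The previous prompt remains in context, so a pronoun like "they" resolves:
SELECT AI narrate in which cities do they live;
```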
Comparison of Generation Actions
| Action | Result of the Pipeline | Data Privacy |
| --- | --- | --- |
| show_sql | Returns only the generated SQL code for review. | No actual data sent to the LLM. |
| run_sql | Executes the query and returns a rowset (default action). | No actual data sent to the LLM. |
| explain_sql | Returns the SQL plus a step-by-step natural language breakdown. | No actual data sent to the LLM. |
| narrate | Runs the query and then summarizes the results in a conversational tone. | Sends actual table data to the LLM. |
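In the SELECT AI command syntax, these actions correspond to keywords placed before the prompt. The sketch below assumes an AI profile is already set for the session, and the prompt text is illustrative:

```sql
SELECT AI showsql    who are the top three baby boomer big spenders;  -- SQL only
SELECT AI explainsql who are the top three baby boomer big spenders;  -- SQL + walkthrough
SELECT AI narrate    who are the top three baby boomer big spenders;  -- run + summarize
SELECT AI            who are the top three baby boomer big spenders;  -- default: run_sql
```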
Preventing Hallucinations
A standard LLM (like ChatGPT) might "hallucinate" and invent table names that don't exist in your database when asked a question. By using Prompt Augmentation to feed the LLM your actual schema metadata, Select AI ensures the generated query is runnable on your specific instance.
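One practical way to verify this is to inspect the generated SQL before running anything, for example via `DBMS_CLOUD_AI.GENERATE` with the show-SQL action (the profile name is a placeholder):

```sql
SELECT DBMS_CLOUD_AI.GENERATE(
         prompt       => 'who are the top three baby boomer big spenders',
         profile_name => 'GENAI_PROFILE',   -- placeholder
         action       => 'showsql')
FROM dual;
```

If the returned query references a table you don't recognize, the metadata fed to the pipeline (the object list and column comments) is the first place to look.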