System Workflow
The rtrapy package implements a Retrieval-Augmented Generation (RAG) workflow. It bridges the gap between real-time web data and Large Language Models (LLMs) by fetching live search results and using them as context for text generation.
The following sections detail the data flow from the initial user query to the final synthesized response.
1. Initialization
To begin the workflow, initialize the RTRAConnector with your Google Custom Search credentials: an API key and a Programmable Search Engine ID. This setup establishes the connection parameters required for the retrieval phase.
```python
from rtrapy.rtra_connector import RTRAConnector

# Initialize with Google Custom Search API key and Search Engine ID
connector = RTRAConnector(api_key="YOUR_GOOGLE_API_KEY", engine_id="YOUR_ENGINE_ID")
```
2. Retrieval Phase (Google Search)
The workflow starts by querying the Google Custom Search JSON API. The system retrieves the most relevant web pages based on the input string.
- Method: `search(query)`
- Input: `query` (string) — The search term or question.
- Output: `list` — A collection of dictionary objects containing search metadata (titles, snippets, and links).
```python
# Fetch raw search results
results = connector.search("Latest developments in quantum computing")
```
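The list returned by `search` follows the shape of the Google Custom Search JSON API's `items` array. A minimal sketch of inspecting those results, using sample data in place of a live API call (the field names `title`, `link`, and `snippet` come from that API's schema):

```python
# Sample results shaped like Google Custom Search "items" entries.
results = [
    {"title": "Quantum leap", "link": "https://example.com/a", "snippet": "A new qubit record..."},
    {"title": "Error correction", "link": "https://example.com/b", "snippet": "Surface codes..."},
]

# Print each result's title and source link.
for item in results:
    print(f"{item['title']} ({item['link']})")
```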
3. Context Augmentation
Before passing data to the LLM, the system performs an internal transformation to distill the search results into a usable prompt context.
- Internal Process: The `combine_search_results` method iterates through the top 5 search results and extracts the `snippet` field from each.
- Formatting: These snippets are concatenated into a single string, separated by newlines, creating a concentrated knowledge base for the model.
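The augmentation step described above can be sketched as follows. `combine_snippets` is a hypothetical stand-in for the package's internal `combine_search_results` method, written from the behavior stated here (top 5 snippets, newline-separated):

```python
def combine_snippets(results, limit=5):
    """Concatenate the top `limit` snippets into one newline-separated context string."""
    snippets = [item.get("snippet", "") for item in results[:limit]]
    return "\n".join(s for s in snippets if s)

# Eight mock results; only the first five snippets survive the cut.
results = [{"snippet": f"Fact {i}"} for i in range(8)]
context = combine_snippets(results)
print(context)
```

Capping the input at five snippets keeps the prompt short enough to leave the model room to generate its answer within the context window.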
4. Inference Phase (LLM Integration)
In the final stage, the synthesized context is dispatched to the Mistral-7B-Instruct model, accessed through the Hugging Face Inference API.
The model does not rely solely on its pre-trained weights; instead, it uses the provided snippets as the primary source of truth to generate a response.
- Method: `generate_detailed_response(query)`
- Input: `query` (string) — The user's original query.
- Output: `string` — The LLM-generated response based on the retrieved web context.
```python
# Execute the full workflow: Search -> Augment -> Generate
response = connector.generate_detailed_response("What are the latest developments in quantum computing?")
print(response)
```
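Making the retrieved snippets the "primary source of truth" typically means prepending them to the question in the prompt. A hypothetical sketch of that assembly step; the actual prompt template used by rtrapy is not shown in this document:

```python
def build_prompt(query, context):
    # Instruct the model to ground its answer in the retrieved snippets
    # rather than its pre-trained knowledge alone.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What is new in quantum computing?", "Snippet A\nSnippet B")
```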
Data Flow Summary
| Stage | Input | Component | Output |
| :--- | :--- | :--- | :--- |
| Search | User Query | Google Custom Search API | List of Search Items |
| Synthesis | Search Items | combine_search_results | Concatenated Text Snippets |
| Inference | Text Snippets | Mistral-7B (Hugging Face) | Synthesized Answer |
Technical Notes
- Model Information: The system currently utilizes `Mistral-7B-Instruct-v0.2` for inference.
- Data Processing: Only the top 5 snippets are processed to keep the payload within the model's token context limit and API constraints.
- Error Handling: If the Google API fails or the LLM provider is unreachable, the methods return empty strings or lists respectively, allowing for graceful degradation in your application.
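Because failures surface as empty return values rather than exceptions, callers should check for emptiness before relying on the output. A minimal sketch, using a stub connector to simulate an outage (`StubConnector` and `answer_or_fallback` are illustrative names, not part of the package):

```python
class StubConnector:
    """Stand-in for RTRAConnector that simulates a search outage."""
    def search(self, query):
        return []  # an empty list is what a failed Google API call yields
    def generate_detailed_response(self, query):
        return ""  # an empty string is what an unreachable LLM provider yields

def answer_or_fallback(connector, query):
    # Degrade gracefully: detect the empty-value failure modes and
    # substitute a user-facing message instead of a blank response.
    results = connector.search(query)
    if not results:
        return "Search is temporarily unavailable; please try again later."
    return connector.generate_detailed_response(query) or "The language model is currently unreachable."

print(answer_or_fallback(StubConnector(), "quantum computing"))
```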