System Workflow
The rtrapy package implements a Retrieval-Augmented Generation (RAG) workflow. It bridges the gap between real-time web data and Large Language Models (LLMs) by fetching live search results and using them as context for text generation.
The following sections detail the data flow from the initial user query to the final synthesized response.
1. Initialization
To begin the workflow, initialize the RTRAConnector with your Google Custom Search credentials: an API key and a Programmable Search Engine ID. This setup establishes the connection parameters required for the retrieval phase.
```python
from rtrapy.rtra_connector import RTRAConnector

# Initialize with Google Custom Search API key and Search Engine ID
connector = RTRAConnector(api_key="YOUR_GOOGLE_API_KEY", engine_id="YOUR_ENGINE_ID")
```
2. Retrieval Phase (Google Search)
The workflow starts by querying the Google Custom Search JSON API. The system retrieves the most relevant web pages based on the input string.
- Method: `search(query)`
- Input: `query` (string) — The search term or question.
- Output: `list` — A collection of dictionary objects containing search metadata (titles, snippets, and links).
```python
# Fetch raw search results
results = connector.search("Latest developments in quantum computing")
```
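The list returned by `search` follows the shape of the Google Custom Search JSON API's `items` array. A minimal sketch of inspecting those results, using sample data in place of a live API call (the field names `title`, `link`, and `snippet` come from that API's schema):

```python
# Sample results shaped like Google Custom Search "items" entries.
results = [
    {"title": "Quantum leap", "link": "https://example.com/a", "snippet": "A new qubit record..."},
    {"title": "Error correction", "link": "https://example.com/b", "snippet": "Surface codes..."},
]

# Print each result's title and source link.
for item in results:
    print(f"{item['title']} ({item['link']})")
```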
3. Context Augmentation
Before passing data to the LLM, the system performs an internal transformation to distill the search results into a usable prompt context.
- Internal Process: The `combine_search_results` method iterates through the top 5 search results and extracts the `snippet` field from each.
- Formatting: These snippets are concatenated into a single string, separated by newlines, creating a concentrated knowledge base for the model.
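The augmentation step described above can be sketched as follows. `combine_snippets` is a hypothetical stand-in for the package's internal `combine_search_results` method, written from the behavior stated here (top 5 snippets, newline-separated):

```python
def combine_snippets(results, limit=5):
    """Concatenate the top `limit` snippets into one newline-separated context string."""
    snippets = [item.get("snippet", "") for item in results[:limit]]
    return "\n".join(s for s in snippets if s)

# Eight mock results; only the first five snippets survive the cut.
results = [{"snippet": f"Fact {i}"} for i in range(8)]
context = combine_snippets(results)
print(context)
```

Capping the input at five snippets keeps the prompt short enough to leave the model room to generate its answer within the context window.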
4. Inference Phase (LLM Integration)
In the final stage, the synthesized context is dispatched to the Mistral-7B-Instruct model, accessed through the Hugging Face Inference API.
The model does not rely solely on its pre-trained weights; instead, it uses the provided snippets as the primary source of truth to generate a response.
- Method: `generate_detailed_response(query)`
- Input: `query` (string) — The user's original query.
- Output: `string` — The LLM-generated response based on the retrieved web context.
```python
# Execute the full workflow: Search -> Augment -> Generate
response = connector.generate_detailed_response("What are the latest developments in quantum computing?")
print(response)
```
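Making the retrieved snippets the "primary source of truth" typically means prepending them to the question in the prompt. A hypothetical sketch of that assembly step; the actual prompt template used by rtrapy is not shown in this document:

```python
def build_prompt(query, context):
    # Instruct the model to ground its answer in the retrieved snippets
    # rather than its pre-trained knowledge alone.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt("What is new in quantum computing?", "Snippet A\nSnippet B")
```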
Data Flow Summary
| Stage | Input | Component | Output |
| :--- | :--- | :--- | :--- |
| Search | User Query | Google Custom Search API | List of Search Items |
| Synthesis | Search Items | combine_search_results | Concatenated Text Snippets |
| Inference | Text Snippets | Mistral-7B (Hugging Face) | Synthesized Answer |
Technical Notes
- Model Information: The system currently utilizes `Mistral-7B-Instruct-v0.2` for inference.
- Data Processing: Only the top 5 snippets are processed to keep the payload within the model's token context limit and API constraints.
- Error Handling: If the Google API fails or the LLM provider is unreachable, the methods return empty strings or lists respectively, allowing for graceful degradation in your application.
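Because failures surface as empty return values rather than exceptions, callers should check for emptiness before relying on the output. A minimal sketch, using a stub connector to simulate an outage (`StubConnector` and `answer_or_fallback` are illustrative names, not part of the package):

```python
class StubConnector:
    """Stand-in for RTRAConnector that simulates a search outage."""
    def search(self, query):
        return []  # an empty list is what a failed Google API call yields
    def generate_detailed_response(self, query):
        return ""  # an empty string is what an unreachable LLM provider yields

def answer_or_fallback(connector, query):
    # Degrade gracefully: detect the empty-value failure modes and
    # substitute a user-facing message instead of a blank response.
    results = connector.search(query)
    if not results:
        return "Search is temporarily unavailable; please try again later."
    return connector.generate_detailed_response(query) or "The language model is currently unreachable."

print(answer_or_fallback(StubConnector(), "quantum computing"))
```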