Retrieval Augmented Generation (RAG) is the most practical way to give LLMs access to current, factual information. Instead of relying solely on training data, you retrieve relevant documents at query time and include them in the LLM's context.
This tutorial shows you how to build a complete RAG pipeline using Keiro's search and content extraction API. We'll go from zero to a working system in under 10 minutes.
## Architecture Overview
Our RAG pipeline has four steps:
- Query — User asks a question
- Search — Keiro searches the web and returns relevant content chunks
- Augment — We inject the retrieved content into the LLM prompt
- Generate — The LLM generates an answer grounded in real sources
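Before wiring in real APIs, the four steps can be sketched as one function with injected stages. This is a minimal sketch of the data flow only; `fake_search` and `fake_generate` are stand-ins for the real calls built in the steps below, not part of any actual API:

```python
def rag_pipeline(query: str, search, generate) -> str:
    """Minimal shape of the pipeline: the search and generate callables
    are injected so each stage can be swapped out independently."""
    sources = search(query)                              # Step 2: retrieve chunks
    context = "\n".join(s["content"] for s in sources)   # Step 3: augment
    prompt = f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)                              # Step 4: generate

# Stub stages, just to show the flow end to end
fake_search = lambda q: [{"content": "Paris is the capital of France."}]
fake_generate = lambda p: "Paris [1]"
print(rag_pipeline("What is the capital of France?", fake_search, fake_generate))
# → Paris [1]
```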
## Step 1: Install Dependencies

```bash
pip install requests openai
```
## Step 2: Search + Extract Content with Keiro
Keiro's /search/content endpoint in medium mode does something unique: it searches the web AND returns pre-chunked content from the top results. This skips the entire "fetch pages → parse HTML → chunk text" pipeline that most RAG systems require.
```python
import requests

KEIRO_KEY = "your_keiro_api_key"

def search_and_chunk(query: str, max_results: int = 3) -> list[dict]:
    """Search the web and get RAG-ready chunks in one API call."""
    response = requests.post(
        "https://kierolabs.space/api/v2/search/content",
        headers={"Authorization": f"Bearer {KEIRO_KEY}"},
        json={"query": query, "maxResults": max_results, "mode": "medium"},
        timeout=30,  # always set a timeout; requests waits forever by default
    )
    response.raise_for_status()
    return response.json()["results"]
```
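A lone `requests.post` will surface transient network errors straight to your caller. One common pattern is a small exponential-backoff wrapper; the sketch below is generic (`with_retries` is our own helper, not part of Keiro's API):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff.
    Re-raises the last exception if all attempts fail."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage: sources = with_retries(lambda: search_and_chunk("rag best practices"))
```

In production you would likely narrow the `except` to connection and 5xx errors so that a 401 (bad API key) fails fast instead of retrying.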
## Step 3: Build the RAG Prompt
```python
def build_rag_prompt(query: str, sources: list[dict]) -> str:
    """Build a prompt with retrieved context."""
    context_parts = []
    for i, source in enumerate(sources, 1):
        context_parts.append(
            f"Source {i}: {source['title']}\n"
            f"URL: {source['url']}\n"
            f"Content: {source['content'][:2000]}\n"
        )
    context = "\n---\n".join(context_parts)
    return f"""Answer the following question using ONLY the provided sources.
Cite sources by number [1], [2], etc. If the sources don't contain
enough information, say so.

Sources:
{context}

Question: {query}

Answer:"""
```
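The `[:2000]` slice above caps each source independently, so total context still grows linearly with the number of sources. A sketch of a budget-aware variant, using characters as a rough proxy for tokens (`fit_to_budget` is an illustrative helper, not part of any library):

```python
def fit_to_budget(sources: list[dict], budget_chars: int = 6000) -> list[dict]:
    """Trim sources so combined context stays under a character budget,
    splitting the budget evenly across sources."""
    if not sources:
        return []
    per_source = budget_chars // len(sources)
    return [{**s, "content": s["content"][:per_source]} for s in sources]

docs = [{"title": "A", "url": "u", "content": "x" * 5000},
        {"title": "B", "url": "u", "content": "y" * 5000}]
trimmed = fit_to_budget(docs, budget_chars=4000)
print([len(d["content"]) for d in trimmed])  # → [2000, 2000]
```

You would call it as `build_rag_prompt(query, fit_to_budget(sources))`; a real system might weight the split by relevance score instead of dividing evenly.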
## Step 4: Generate with OpenAI
```python
from openai import OpenAI

client = OpenAI(api_key="your_openai_key")

def rag_answer(query: str) -> str:
    """Full RAG pipeline: search → augment → generate."""
    # 1. Search + extract chunks
    sources = search_and_chunk(query)
    # 2. Build augmented prompt
    prompt = build_rag_prompt(query, sources)
    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return response.choices[0].message.content

# Try it
answer = rag_answer("What are the latest best practices for RAG pipelines in 2026?")
print(answer)
```
## Step 5: Add Source Attribution
For production, you'll want to return sources alongside the answer:
```python
def rag_with_sources(query: str) -> dict:
    sources = search_and_chunk(query)
    prompt = build_rag_prompt(query, sources)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": [{"title": s["title"], "url": s["url"]} for s in sources],
        "credits_used": 1.5,  # search/content costs 1.5 credits
    }
```
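Since the prompt asks the model to cite sources as `[1]`, `[2]`, you can also check which sources the answer actually used and return only those. A minimal sketch (`extract_citations` is our own helper, and it assumes the model followed the bracket convention):

```python
import re

def extract_citations(answer: str, sources: list[dict]) -> list[dict]:
    """Return only the sources the answer cites via [n]-style markers,
    ignoring out-of-range numbers."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [sources[i - 1] for i in sorted(cited) if 1 <= i <= len(sources)]

sources = [{"title": "A", "url": "a"}, {"title": "B", "url": "b"}]
print(extract_citations("Per [2], widgets exist [2].", sources))
# → [{'title': 'B', 'url': 'b'}]
```

An empty result here is a useful signal: either the sources were irrelevant or the model ignored the citation instruction.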
## Why Keiro for RAG?
Most RAG systems require a multi-step pipeline: search → fetch pages → parse HTML → chunk → embed. Keiro's medium mode collapses the first four steps into a single API call:
| Step | Traditional RAG | Keiro RAG |
|---|---|---|
| Web search | Tavily/Exa ($3–4/1K) | One call to /search/content ($1.50/1K) |
| Fetch pages | requests + rate limiting | included in the same call |
| Parse HTML | BeautifulSoup/Trafilatura | included in the same call |
| Chunk text | LangChain splitters | included in the same call |
| Embed | OpenAI embeddings | OpenAI embeddings |
The result: fewer dependencies, less code, lower latency, and 60–80% cost savings compared to Tavily or Exa-based RAG pipelines.
## Next Steps
- Use /search/content deep mode for full markdown extraction
- Add Keiro's /search/batch endpoint for offline dataset generation
- Use the /search/flash endpoint for real-time agent loops where latency matters
- Implement caching to save credits on repeated queries
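The caching bullet can be as simple as an in-memory map keyed by query, with a TTL so stale results expire. A minimal sketch; production systems would typically use Redis or similar, and `cached_search` is our own helper:

```python
import time

_cache: dict[str, tuple[float, list[dict]]] = {}

def cached_search(query: str, search_fn, ttl: float = 3600.0) -> list[dict]:
    """Return cached results for repeated queries within the TTL window;
    every cache hit saves one /search/content call (1.5 credits)."""
    now = time.time()
    hit = _cache.get(query)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]
    results = search_fn(query)
    _cache[query] = (now, results)
    return results

# Usage: sources = cached_search("rag best practices", search_and_chunk)
```

Normalizing the query (lowercasing, stripping whitespace) before using it as a key raises the hit rate for near-duplicate questions.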
Keiro plans start at $15/month for 5,000 credits. The /search/content endpoint costs 1.5 credits per call, so the Essential plan gives you ~3,300 RAG queries per month. Start with 300 free credits to test your pipeline — no credit card needed.