Retrieval Augmented Generation (RAG) is the most practical way to give LLMs access to current, factual information. Instead of relying solely on training data, you retrieve relevant documents at query time and include them in the LLM's context.
This tutorial shows you how to build a complete RAG pipeline using Keiro's search and content extraction API. We'll go from zero to a working system in under 10 minutes.
## Architecture Overview
Our RAG pipeline has four steps:
- Query — User asks a question
- Search — Keiro searches the web and returns relevant content chunks
- Augment — We inject the retrieved content into the LLM prompt
- Generate — The LLM generates an answer grounded in real sources
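Before wiring in real APIs, the four steps can be sketched as one function with injected stages. This is a minimal sketch of the data flow only; `fake_search` and `fake_generate` are stand-ins for the real calls built in the steps below, not part of any actual API:

```python
def rag_pipeline(query: str, search, generate) -> str:
    """Minimal shape of the pipeline: the search and generate callables
    are injected so each stage can be swapped out independently."""
    sources = search(query)                              # Step 2: retrieve chunks
    context = "\n".join(s["content"] for s in sources)   # Step 3: augment
    prompt = f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)                              # Step 4: generate

# Stub stages, just to show the flow end to end
fake_search = lambda q: [{"content": "Paris is the capital of France."}]
fake_generate = lambda p: "Paris [1]"
print(rag_pipeline("What is the capital of France?", fake_search, fake_generate))
# → Paris [1]
```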
## Step 1: Install Dependencies

```bash
pip install requests openai
```
## Step 2: Search + Extract Content with Keiro
Keiro's /search/content endpoint in medium mode does something unique: it searches the web AND returns pre-chunked content from the top results. This skips the entire "fetch pages → parse HTML → chunk text" pipeline that most RAG systems require.
```python
import requests

KEIRO_KEY = "your_keiro_api_key"

def search_and_chunk(query: str, max_results: int = 3) -> list[dict]:
    """Search the web and get RAG-ready chunks in one API call."""
    response = requests.post(
        "https://kierolabs.space/api/v2/search/content",
        headers={"Authorization": f"Bearer {KEIRO_KEY}"},
        json={"query": query, "maxResults": max_results, "mode": "medium"},
        timeout=30,  # always set a timeout; requests waits forever by default
    )
    response.raise_for_status()
    return response.json()["results"]
```
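A lone `requests.post` will surface transient network errors straight to your caller. One common pattern is a small exponential-backoff wrapper; the sketch below is generic (`with_retries` is our own helper, not part of Keiro's API):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff.
    Re-raises the last exception if all attempts fail."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage: sources = with_retries(lambda: search_and_chunk("rag best practices"))
```

In production you would likely narrow the `except` to connection and 5xx errors so that a 401 (bad API key) fails fast instead of retrying.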
## Step 3: Build the RAG Prompt
```python
def build_rag_prompt(query: str, sources: list[dict]) -> str:
    """Build a prompt with retrieved context."""
    context_parts = []
    for i, source in enumerate(sources, 1):
        context_parts.append(
            f"Source {i}: {source['title']}\n"
            f"URL: {source['url']}\n"
            f"Content: {source['content'][:2000]}\n"
        )
    context = "\n---\n".join(context_parts)
    return f"""Answer the following question using ONLY the provided sources.
Cite sources by number [1], [2], etc. If the sources don't contain
enough information, say so.

Sources:
{context}

Question: {query}

Answer:"""
```
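The `[:2000]` slice above caps each source independently, so total context still grows linearly with the number of sources. A sketch of a budget-aware variant, using characters as a rough proxy for tokens (`fit_to_budget` is an illustrative helper, not part of any library):

```python
def fit_to_budget(sources: list[dict], budget_chars: int = 6000) -> list[dict]:
    """Trim sources so combined context stays under a character budget,
    splitting the budget evenly across sources."""
    if not sources:
        return []
    per_source = budget_chars // len(sources)
    return [{**s, "content": s["content"][:per_source]} for s in sources]

docs = [{"title": "A", "url": "u", "content": "x" * 5000},
        {"title": "B", "url": "u", "content": "y" * 5000}]
trimmed = fit_to_budget(docs, budget_chars=4000)
print([len(d["content"]) for d in trimmed])  # → [2000, 2000]
```

You would call it as `build_rag_prompt(query, fit_to_budget(sources))`; a real system might weight the split by relevance score instead of dividing evenly.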
## Step 4: Generate with OpenAI
```python
from openai import OpenAI

client = OpenAI(api_key="your_openai_key")

def rag_answer(query: str) -> str:
    """Full RAG pipeline: search → augment → generate."""
    # 1. Search + extract chunks
    sources = search_and_chunk(query)
    # 2. Build augmented prompt
    prompt = build_rag_prompt(query, sources)
    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return response.choices[0].message.content

# Try it
answer = rag_answer("What are the latest best practices for RAG pipelines in 2026?")
print(answer)
```
## Step 5: Add Source Attribution
For production, you'll want to return sources alongside the answer:
```python
def rag_with_sources(query: str) -> dict:
    sources = search_and_chunk(query)
    prompt = build_rag_prompt(query, sources)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": [{"title": s["title"], "url": s["url"]} for s in sources],
        "credits_used": 1.5,  # search/content costs 1.5 credits
    }
```
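Since the prompt asks the model to cite sources as `[1]`, `[2]`, you can also check which sources the answer actually used and return only those. A minimal sketch (`extract_citations` is our own helper, and it assumes the model followed the bracket convention):

```python
import re

def extract_citations(answer: str, sources: list[dict]) -> list[dict]:
    """Return only the sources the answer cites via [n]-style markers,
    ignoring out-of-range numbers."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [sources[i - 1] for i in sorted(cited) if 1 <= i <= len(sources)]

sources = [{"title": "A", "url": "a"}, {"title": "B", "url": "b"}]
print(extract_citations("Per [2], widgets exist [2].", sources))
# → [{'title': 'B', 'url': 'b'}]
```

An empty result here is a useful signal: either the sources were irrelevant or the model ignored the citation instruction.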
## Why Keiro for RAG?
Most RAG systems require a multi-step pipeline: search → fetch pages → parse HTML → chunk → embed. Keiro's medium mode collapses the first four steps into a single API call:
| Step | Traditional RAG | Keiro RAG |
|---|---|---|
| Web search | Tavily/Exa ($3–4/1K) | One call to /search/content ($1.50/1K) |
| Fetch pages | requests + rate limiting | included in the same call |
| Parse HTML | BeautifulSoup/Trafilatura | included in the same call |
| Chunk text | LangChain splitters | included in the same call |
| Embed | OpenAI embeddings | OpenAI embeddings |
The result: fewer dependencies, less code, lower latency, and 60–80% cost savings compared to Tavily or Exa-based RAG pipelines.
## Next Steps
- Use /search/content deep mode for full markdown extraction
- Add Keiro's /search/batch endpoint for offline dataset generation
- Use the /search/flash endpoint for real-time agent loops where latency matters
- Implement caching to save credits on repeated queries
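The caching bullet can be as simple as an in-memory map keyed by query, with a TTL so stale results expire. A minimal sketch; production systems would typically use Redis or similar, and `cached_search` is our own helper:

```python
import time

_cache: dict[str, tuple[float, list[dict]]] = {}

def cached_search(query: str, search_fn, ttl: float = 3600.0) -> list[dict]:
    """Return cached results for repeated queries within the TTL window;
    every cache hit saves one /search/content call (1.5 credits)."""
    now = time.time()
    hit = _cache.get(query)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]
    results = search_fn(query)
    _cache[query] = (now, results)
    return results

# Usage: sources = cached_search("rag best practices", search_and_chunk)
```

Normalizing the query (lowercasing, stripping whitespace) before using it as a key raises the hit rate for near-duplicate questions.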
Keiro plans start at $15/month for 5,000 credits. The /search/content endpoint costs 1.5 credits per call, so the Essential plan gives you ~3,300 RAG queries per month. Start with 300 free credits to test your pipeline — no credit card needed.