# Building Agentic RAG with Adaline

Canonical URL: https://www.adaline.ai/blog/building-agentic-rag
LLM text URL: https://www.adaline.ai/blog/building-agentic-rag/llms.txt
Published: 2025-12-08T00:00:00.000Z
Modified: 2025-12-08T21:16:39.952Z
Author: Nilesh Barla
Category: Tutorials
Visibility: public
Reading time: 20 min
Topics: Tutorials, Adaline, AI agent observability, agent evals, self-improving agents

## Summary

How to build intelligent AI systems that make decisions.

## Article

# Introduction

What happens when an AI system needs to answer questions? It needs context and tools, and it needs to decide what to do next.

Traditional systems retrieve information every time. They call tools every time. They use the same sequential process for every query the user gives.

Agentic RAG changes this. The system decides when to retrieve context. It chooses which tools to use, and it adapts to each question.

This guide explains how to build such systems using Adaline. Adaline provides the infrastructure for deploying prompts, managing tools, and tracking performance.

The following sections walk through each step. They explain how the pieces fit together and show how to build something that works.

# What is Agentic RAG?

Before defining agentic RAG, let's first answer "What makes a system agentic?" Essentially, it is a system that makes decisions. It chooses its own path.

Consider a simple question: "What is the weather today?" This question does not need context from documents. It needs current weather data.

Now consider: "What are the best practices for running in hot weather?" This question needs both **context** and **current data**. It needs training documents. It needs weather information.

Agentic RAG handles both cases. It routes simple questions directly to the language model. It routes complex questions through **retrieval** and **tools**, and then to the LLM.
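
To make the two example questions concrete, here is a minimal sketch of a per-query decision. It is not the routing logic used later in this guide: the `planRoute` function, the `RoutePlan` type, and both keyword lists are hypothetical and exist only for illustration.

```typescript
// Hypothetical sketch: the keyword lists below are illustrative assumptions,
// not part of Adaline or of the system built later in this guide.
type RoutePlan = { useRetrieval: boolean; useTools: boolean };

const LIVE_DATA_HINTS = ["weather", "forecast"]; // suggests a tool call
const KNOWLEDGE_HINTS = ["best practices", "guide"]; // suggests document retrieval

function planRoute(query: string): RoutePlan {
  const q = query.toLowerCase();
  return {
    useRetrieval: KNOWLEDGE_HINTS.some((hint) => q.includes(hint)),
    useTools: LIVE_DATA_HINTS.some((hint) => q.includes(hint)),
  };
}

// The weather question needs live data but no documents.
const weatherPlan = planRoute("What is the weather today?");
// The hot-weather running question needs both.
const runningPlan = planRoute("What are the best practices for running in hot weather?");
console.log(weatherPlan, runningPlan);
```

A query that matches neither list gets `false` for both flags, which corresponds to sending it straight to the language model.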
## The Agentic Approach

The key point is that Agentic RAG adds intelligence. The system examines each query and decides if retrieval is needed. It also decides if tools are needed. Essentially, it builds a custom path for each question.

The workflow looks like this:

```markdown
User Query
    ↓
Query Routing
    ↓
Conditional RAG (if needed)
    ↓
Agent Creation
    ↓
Tool Execution (if needed)
    ↓
Response Generation
```

Simple queries skip retrieval. They go straight to the language model.

Complex queries get the complete treatment. They retrieve context, call tools, and synthesize everything.

As a result, costs drop and latency improves. Users get faster answers. The system uses resources efficiently.

# Why Should Product Leaders Care about this Architecture?

Agentic RAG solves real problems. It saves money and improves the user experience.

## The Cost Problem

Traditional RAG systems retrieve context for every query. This costs money. Embedding generation costs money, and so does vector search. Language model calls with a large context cost money.

Most queries are simple. They do not need document retrieval. They do not need a complex context, and yet traditional systems retrieve anyway.

Agentic RAG changes this. Simple queries skip retrieval. They use the language model directly. Costs drop immediately.

Consider one thousand queries. Traditional RAG costs eighteen cents per query. Agentic RAG costs fourteen cents per query. That's a twenty-two percent reduction, or $180 versus $140 in total.

## The Speed Problem

Retrieval takes time, and so does embedding generation. Vector search takes time as well. Simple queries do not need these steps.

Agentic RAG routes simple queries directly. They respond two to three times faster, and users notice the difference.

## The Accuracy Problem

Sometimes retrieval adds noise. Irrelevant documents confuse the model and introduce context pollution. The response quality suffers.

Agentic RAG retrieves only when needed.
It retrieves only relevant documents. Response quality improves.

## The Scalability Problem

Traditional systems are hard to extend. Adding new capabilities requires changing core logic. Testing becomes difficult.

Agentic RAG, on the other hand, uses tools. Tools are independent modules. New tools can be added without changing the core. This way, the system grows naturally.

# Building Your First System

The **orchestrator** is the core component. It coordinates **query routing**, **retrieval**, **agent creation**, and **tool execution**.

Image: https://a-us.storyblok.com/f/1023026/1320x1542/b6ad60d698/agentic-rag-trace-and-span.webp

The image shows how the orchestrator decides which component to execute based on the user query.

Here is how it works.

## The Orchestrator Function

The main function receives a **user query** and **decides** the execution path:

```typescript
export async function orchestrateAgenticRAG(
  systemMessage: string,
  userMessage: string,
  model: string,
  deployedTools: any[],
  settings?: Record<string, any>
): Promise<{ finalResponse: string }> {
```

It initializes observability first. Every operation gets tracked:

```typescript
getOrCreateTrace();
```

## Query Routing

The system then examines the query to decide if retrieval is needed:

```typescript
const userRequestsRAG =
  /\bRAG\b/i.test(userMessage) ||
  /\bRAG_CONTEXT\b/i.test(userMessage) ||
  /\brag:\b/i.test(userMessage);
const intent = userRequestsRAG ? 'rag_enabled' : 'direct_query';
```

If the query contains "RAG", "RAG_CONTEXT", or a "rag:" prefix (matched case-insensitively), retrieval is enabled.

Image: https://a-us.storyblok.com/f/1023026/613x495/a81afa928c/the-rag-phrase.webp

If the user message contains the phrase "RAG," then RAG is enabled.

This is known as **intent classification**. Otherwise, the system skips retrieval and goes directly to the agent.

## Conditional Retrieval

Image: https://a-us.storyblok.com/f/1023026/1572x700/9dec8f25db/rag-workflow.png

A simple workflow for retrieving information from a vector database.
When retrieval is needed, the system generates an embedding and queries Pinecone:

```typescript
const matches = await retrieveTopK(5, getOrCreateTrace(), userMessage, ragPhaseRefId);
```

The `retrieveTopK` function creates an embedding using [Adaline Gateway](https://github.com/adaline/gateway):

```typescript
export async function createQueryEmbedding(text: string): Promise<number[]> {
  const model = openai.embeddingModel({ modelName: 'text-embedding-3-small', apiKey });
  const resp = await gateway.getEmbeddings({
    model,
    config: Config().parse({}),
    embeddingRequests: { modality: 'text', requests: [text] },
  });
  return resp.response.embeddings[0].embedding;
}
```

Then it queries Pinecone:

```typescript
const index = await getIndex();
const qemb = await createQueryEmbedding(userMessage);
const results = await index.query({ vector: qemb, topK: 5, includeMetadata: true });
```

The system retrieves the top five matches and assembles context from the original files:

```typescript
for (const m of matches) {
  const { fileName, chunkNum } = await parseMatchMetadata(m);
  const content = await readChunkContent(fileName, chunkNum);
  lines.push(`Source: ${fileName}#${chunkNum}\n${content}`);
}
const ragSummary = lines.join('\n\n');
```

Tools are converted from Adaline's format to the agent's format:

```typescript
function createAgentTool(deployedTool: any) {
  const toolName = deployedTool.definition?.schema?.name;
  const toolDescription = deployedTool.definition?.schema?.description;
  const toolParams = deployedTool.definition?.schema?.parameters;
  return tool({
    name: toolName,
    description: toolDescription,
    parameters: zodObject, // Zod schema built from toolParams (construction omitted here)
    execute: async (args: any) => {
      // Execute tool handler (dispatch omitted here)
      return result;
    },
  });
}
```

## Tool Execution

When the agent calls a tool, the `execute` function dispatches to the matching handler:

```typescript
execute: async (args: any) => {
  switch (toolName) {
    case 'weather_checker':
      result = await weather_checker(args);
      break;
    case 'nutrition_planner':
      result = await
        nutrition_planner(args);
      break;
  }
  return result;
}
```

Tool handlers are simple functions:

```typescript
export async function weather_checker(args: WeatherCheckerArgs): Promise<any> {
  const location = args.location || 'Unknown location';
  const temperature = 15; // In production, call weather API
  return {
    name: 'weather_checker',
    summary: `Weather for ${location}: ${temperature}°C`,
    weatherData: { temperature, humidity: 65, conditions: 'Clear' }
  };
}
```

## Adaline Integration

The system fetches the deployed prompt hosted in Adaline.

Image: https://a-us.storyblok.com/f/1023026/1558x752/7c86269d4f/hosted-prompt-adaline.png

```typescript
export async function getDeploymentInfo() {
  const response = await fetch(url, {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  const data = await response.json();
  return {
    model: data.prompt.config.model,
    tools: data.prompt.tools,
    settings: data.prompt.config.settings
  };
}
```

Variables are injected into the fetched prompt template:

```typescript
export function injectVariables(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{([^}]+)\}\}/g, (match, variableName) => {
    const key = variableName.trim();
    // Leave the {{placeholder}} untouched when no value is provided
    return variables[key] || match;
  });
}
```

## Observability

Every operation creates a span:

```typescript
addSpanToTrace({
  name: 'query_routing',
  status: 'success',
  startedAt: queryRoutingStart,
  endedAt: queryRoutingEnd,
  content: {
    type: 'Function',
    input: { userMessage },
    output: { intent, plan: executionPlan }
  }
});
```

The trace is submitted to Adaline at the end:

```typescript
await safeSubmitTrace(trace);
```

# Understanding the Components

Each component has a specific role. Here is how they work together.

## Query Routing

Routing examines the query and decides the execution path. The implementation uses pattern matching:

```typescript
const userRequestsRAG =
  /\bRAG\b/i.test(userMessage) ||
  /\bRAG_CONTEXT\b/i.test(userMessage);
const intent = userRequestsRAG ?
  'rag_enabled' : 'direct_query';
```

The system creates an execution plan:

```typescript
const executionPlan = {
  useRAG: userRequestsRAG,
  tools: deployedTools.map(t => t.definition?.schema?.name),
  phases: userRequestsRAG ? ['rag', 'agent', 'synthesis'] : ['agent', 'synthesis']
};
```

This plan determines which phases run. Simple queries skip the RAG phase entirely.

## Conditional Retrieval

Retrieval generates embeddings through Adaline Gateway:

```typescript
export async function createQueryEmbedding(text: string): Promise<number[]> {
  const model = openai.embeddingModel({
    modelName: 'text-embedding-3-small',
    apiKey: process.env.OAI_API_KEY
  });
  const resp = await gateway.getEmbeddings({
    model,
    config: Config().parse({}),
    embeddingRequests: { modality: 'text', requests: [text] }
  });
  const emb = resp.response.embeddings[0].embedding;
  return projectToDim(emb, PINECONE_DIMENSION);
}
```

The `projectToDim` function adjusts embedding dimensions to match the Pinecone index. Then `retrieveTopK` queries the index:

```typescript
export async function retrieveTopK(k = 5, trace?: Trace, query?: string) {
  const index = await getIndex();
  const qemb = await createQueryEmbedding(query || '');
  const results = await index.query({ vector: qemb, topK: k, includeMetadata: true });
  // Fall back to an empty list when Pinecone returns no matches
  return results.matches ??
    [];
}
```

Metadata parsing extracts file and chunk information:

```typescript
export async function parseMatchMetadata(match: any) {
  let fileName = match.metadata?.file || match.metadata?.source;
  let chunkNum = match.metadata?.chunk || match.metadata?.chunkIndex;
  // Fall back to parsing IDs of the form "<file>-chunk-<n>"
  if (!fileName && match.id) {
    const idMatch = String(match.id).match(/(.+)-chunk-(\d+)$/);
    if (idMatch) {
      fileName = idMatch[1];
      chunkNum = Number(idMatch[2]);
    }
  }
  return { fileName, chunkNum };
}
```

Context assembly combines retrieved chunks:

```typescript
const lines: string[] = [];
for (const m of matches) {
  const { fileName, chunkNum } = await parseMatchMetadata(m);
  const content = await readChunkContent(fileName, chunkNum);
  lines.push(`Source: ${fileName}#${chunkNum}\n${content}`);
}
const ragSummary = lines.join('\n\n');
```

## Agent Creation

Image: https://a-us.storyblok.com/f/1023026/1588x950/5d154a3424/creating-agent.png

The agent is created with the necessary tools attached.

The orchestrator creates agents with tools:

```typescript
const agentTools = deployedTools.map((tool) =>
  createAgentTool(tool, orchestratorRefId, toolExecutionPhaseRefId)
);
const agent = new Agent({
  name: 'Running Coach Agent',
  model,
  instructions: finalSystemMessage,
  tools: agentTools
});
```

Tool conversion handles schema differences:

```typescript
function createAgentTool(deployedTool: any) {
  const toolName = deployedTool.definition?.schema?.name;
  const toolDescription = deployedTool.definition?.schema?.description;
  const properties = deployedTool.definition?.schema?.parameters?.properties || {};
  const zodSchema: any = {};
  for (const [key, value] of Object.entries(properties)) {
    const prop = value as any;
    // Default to string; widen the type so other Zod schemas can be assigned
    let fieldSchema: z.ZodTypeAny = z.string();
    if (prop.type === 'number') fieldSchema = z.number();
    if (prop.type === 'boolean') fieldSchema = z.boolean();
    if (prop.type === 'array') fieldSchema = z.array(z.string());
    zodSchema[key] = fieldSchema;
  }
  return tool({
    name: toolName,
    description: toolDescription,
    parameters: z.object(zodSchema),
    execute: async (args: any) => { /* ...
    */ }
  });
}
```

The agent then runs with the user message and calls tool handlers such as `nutrition_planner` as needed:

```typescript
export async function nutrition_planner(args: NutritionPlannerArgs): Promise<any> {
  const run = (args.run_block || '').trim();
  const cover = (args.what_to_cover || '').trim();
  return {
    name: 'nutrition_planner',
    summary: `Hydration plan for: ${run}`,
    hydrationPlan: {
      preRun: 'Drink 200–300 ml water 20–30 min before start.',
      duringRun: 'Sip 100–200 ml every 15–20 min',
      electrolytes: 'Add 200–300 mg sodium per hour'
    }
  };
}
```

The orchestrator tracks each tool call:

```typescript
addSpanToTrace({
  name: `tool_call_${toolName}`,
  status: 'success',
  startedAt: toolStart,
  endedAt: toolStart,
  content: {
    type: 'Tool',
    input: { toolName, arguments: args },
    output: { called: true }
  }
});
```

## Observability

Traces capture the complete flow:

```typescript
export function createTrace(name: string, projectId: string, promptId?: string): Trace {
  return {
    name,
    status: 'success',
    startedAt: Date.now(),
    endedAt: 0,
    referenceId: uuidv4(),
    spans: [],
    projectId,
    promptId,
    sessionId: uuidv4()
  };
}
```

Spans are added for each operation:

```typescript
addSpan(trace, {
  name: 'pinecone_query',
  status: 'success',
  startedAt: startTime,
  endedAt: endTime,
  content: {
    type: 'Retrieval',
    input: { top_k: 5, query: userMessage },
    output: { matchesCount: matches.length }
  }
});
```

# Making It Work in Production

How do you take an Agentic RAG system from prototype to production? Focus on reliability. Focus on performance. Focus on monitoring.

## Reliability

Production systems must handle errors gracefully. Tool calls can fail. Retrieval can fail. Language model calls can fail.
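
Many of these failures are transient, so retrying with exponential backoff and a cap on attempts is a common first defense. The following is a minimal sketch under stated assumptions: `withRetry` and `RetryOptions` are hypothetical names, not part of Adaline or the code in this guide, and the `sleep` option exists only so the delay can be swapped out in tests.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the second attempt
  sleep?: (ms: number) => Promise<void>; // injectable for tests
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break; // attempts exhausted, give up
      // The wait doubles after each failure: base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

A retrieval call could then be wrapped as `withRetry(() => retrieveTopK(5, trace, userMessage), { maxAttempts: 3, baseDelayMs: 200 })`. The attempt count and delay here are illustrative values, not recommendations from the source.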
Implement error handling in tool execution:

```typescript
execute: async (args: any) => {
  let result: any;
  let status: 'success' | 'error' = 'success';
  try {
    switch (toolName) {
      case 'weather_checker':
        result = await weather_checker(args);
        break;
      default:
        throw new Error(`Unknown tool: ${toolName}`);
    }
  } catch (error) {
    status = 'error';
    // `error` is `unknown` in strict TypeScript, so narrow before reading `.message`
    const message = error instanceof Error ? error.message : String(error);
    addSpanToTrace({
      name: `tool_response_${toolName}`,
      status: 'error',
      content: { type: 'Tool', output: { error: message } }
    });
    throw error;
  }
  return result;
}
```

If retrieval fails, continue without context:

```typescript
try {
  const matches = await retrieveTopK(5, trace, userMessage);
  // ... assemble context
} catch (e) {
  ragStatus = 'error';
  const message = e instanceof Error ? e.message : String(e);
  ragSummary = `RAG retrieval error: ${message}`;
  // Continue without retrieved documents; the error note takes their place
}
finalSystemMessage = ragSummary
  ? `${systemMessage}\n\n[RAG_CONTEXT]\n${ragSummary}`
  : systemMessage;
```

## Performance

Production systems must be fast. Optimize retrieval with caching:

```typescript
const queryCache = new Map<string, any>();

async function retrieveTopKWithCache(query: string, topK: number) {
  const cacheKey = `${query}:${topK}`;
  if (queryCache.has(cacheKey)) {
    return queryCache.get(cacheKey);
  }
  const results = await retrieveTopK(topK, trace, query);
  queryCache.set(cacheKey, results);
  return results;
}
```

Execute tools in parallel when they are independent:

```typescript
const toolResults = await Promise.all([
  weather_checker(weatherArgs),
  nutrition_planner(nutritionArgs)
]);
```

Track latency in spans:

```typescript
const startTime = now();
// ... operation ...
const endTime = now();
addSpanToTrace({
  name: 'operation_name',
  startedAt: startTime,
  endedAt: endTime,
  // Latency = endTime - startTime
});
```

## Monitoring

Image: https://a-us.storyblok.com/f/1023026/2330x1684/88978fd626/dashboaard.png

_Adaline's dashboard allows you to monitor important metrics in real-time._

Production systems need monitoring. That means having a proper dashboard to track key metrics.
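
As a rough illustration of what "key metrics" can mean, the sketch below aggregates span records into a per-operation count, error rate, and average latency. The `SpanRecord` shape mirrors the span fields used in this guide (`name`, `status`, `startedAt`, `endedAt`), but the type itself and `summarizeSpans` are hypothetical; the dashboard computes its own metrics.

```typescript
// Hypothetical span shape, mirroring the fields used in this guide's spans.
type SpanRecord = {
  name: string;
  status: "success" | "error";
  startedAt: number; // epoch milliseconds
  endedAt: number; // epoch milliseconds
};

type SpanStats = { count: number; errorRate: number; avgLatencyMs: number };

function summarizeSpans(spans: SpanRecord[]): Record<string, SpanStats> {
  // First pass: accumulate raw totals per span name.
  const totals: Record<string, { count: number; errors: number; latency: number }> = {};
  for (const span of spans) {
    const t = totals[span.name] ?? { count: 0, errors: 0, latency: 0 };
    t.count += 1;
    if (span.status === "error") t.errors += 1;
    t.latency += span.endedAt - span.startedAt; // latency = endedAt - startedAt
    totals[span.name] = t;
  }
  // Second pass: turn totals into rates and averages.
  const stats: Record<string, SpanStats> = {};
  for (const name of Object.keys(totals)) {
    const t = totals[name];
    stats[name] = {
      count: t.count,
      errorRate: t.errors / t.count,
      avgLatencyMs: t.latency / t.count,
    };
  }
  return stats;
}
```

Grouping by span name keeps retrieval, routing, and tool calls separate, so a slow or failing component stands out instead of being averaged away.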
Track and monitor trends, identify peak times, and track costs.

## Gradual Rollout

Do not launch to everyone at once. Start with internal testing. Move to beta users. Gradually increase traffic.

Weeks one and two: Internal testing with ten percent of queries. Catch obvious bugs. Verify basic functionality.

Weeks three and four: Beta users with twenty-five percent of queries. Gather feedback. Monitor performance.

Weeks five and six: Gradual increase to one hundred percent. Watch metrics closely and be ready to roll back.

This approach reduces risk. It catches problems early and allows adjustments before full launch.

# Common Mistakes

What mistakes do teams make when building Agentic RAG systems? Learn from others. Avoid these pitfalls.

## Over-Engineering Routing

Some teams build complex routing systems. They use machine learning models. They add multiple classification layers.

Start simple instead. Use keyword detection. It works for most cases; upgrade only if needed.

When to upgrade? When routing accuracy drops below eighty-five percent, false positive rates are high, or query patterns become complex.

## Ignoring Costs

Some teams do not monitor costs. They build the system. They deploy it. Costs spiral out of control.

Set up cost monitoring from day one. Track costs per query. Set budget alerts and review costs weekly.

Implement cost optimizations, such as caching frequent queries. Use smaller models for routing. Optimize retrieval parameters.

## Poor Error Handling

Some systems fail when one component errors. A tool failure stops everything. A retrieval failure stops everything.

Implement graceful degradation. If retrieval fails, continue without context. If a tool fails, continue without that tool. Always return something useful.

Implement retry logic. Transient failures should be retried. Use exponential backoff and limit the number of attempts.

Implement user-friendly error messages. Explain what went wrong. Suggest alternatives.
Do not show technical details.

## Neglecting User Experience

Some teams focus on technical implementation. They ignore user experience. Users get confused.

Show loading states. Indicate when tools are running. Show progress. Keep users informed.

Provide source attribution. Show where the information came from. Build trust. Enable verification.

Explain tool usage. Tell users what tools are being used. Explain why. Build understanding.

Collect user feedback. Ask for ratings. Monitor comments. Act on suggestions.

## Insufficient Testing

Some teams deploy without thorough testing. They test happy paths only. Production reveals problems.

Test all components. Test routing logic. Test retrieval. Test tools. Test error handling.

Test integration. Test the full flow. Test edge cases. Test error scenarios.

Test under load. Simulate production traffic. Measure performance. Identify bottlenecks.

## Tool Overload

Some teams add too many tools. They think more tools mean more capabilities. The agent gets confused.

Start with two or three essential tools. Add tools incrementally. Monitor usage. Remove unused tools.

Each tool should have a clear purpose. Each tool should be well tested. Each tool should add value.

# Conclusion

Agentic RAG represents the next step in AI systems. **It combines retrieval with intelligence** and combines tools with autonomy. It adapts to each query.

Building such systems requires an understanding of routing, retrieval, agents, and tools.

Adaline provides the infrastructure. It handles prompt deployment. It manages tool integration. It tracks performance. It simplifies development.

The path forward is clear. Start simple. Add complexity gradually. Monitor everything. Iterate based on results.

The benefits are real. Costs drop. Latency improves. Quality increases. Users are happier.