# Building Agentic RAG with Adaline

Canonical URL: https://www.adaline.ai/blog/building-agentic-rag
LLM text URL: https://www.adaline.ai/blog/building-agentic-rag/llms.txt
Published: 2025-12-08T00:00:00.000Z
Modified: 2025-12-08T21:16:39.952Z
Author: Nilesh Barla
Category: Tutorials
Visibility: public
Reading time: 20 min
Topics: Tutorials, Adaline, AI agent observability, agent evals, self-improving agents

## Summary

How to build intelligent AI systems that make decisions.

## Article

# Introduction

What happens when an AI system needs to answer questions? It needs context and tools, and it needs to decide what to do next. Traditional RAG systems retrieve information every time. They call tools every time. They run the same sequential process for every query the user sends. Agentic RAG changes this. The system decides when to retrieve context, chooses which tools to use, and adapts to each question.

This guide explains how to build such systems using Adaline. Adaline provides the infrastructure for deploying prompts, managing tools, and tracking performance. The following sections walk through each step, explain how the pieces fit together, and show how to build something that works.

# What is Agentic RAG?

Before defining agentic RAG, let's first answer a simpler question: what makes a system agentic? Essentially, it is a system that makes decisions and chooses its own path. Consider a simple question: "What is the weather today?" This question does not need context from documents; it needs current weather data. Now consider: "What are the best practices for running in hot weather?" This question needs both **context** and **current data**: training documents and weather information. Agentic RAG handles both cases. It routes simple questions directly to the language model, and it routes complex questions through **retrieval** and **tools** before they reach the LLM.

## The Agentic Approach

Agentic RAG adds a layer of decision-making. The system examines each query, decides whether retrieval is needed, and decides whether tools are needed. In effect, it builds a custom path for each question. The workflow looks like this:

```text
User Query
    ↓
Query Routing
    ↓
Conditional RAG (if needed)
    ↓
Agent Creation
    ↓
Tool Execution (if needed)
    ↓
Response Generation
```

Simple queries skip retrieval and go straight to the language model. Complex queries get the full treatment: they retrieve context, call tools, and synthesize everything. As a result, costs drop and latency improves. Users get faster answers, and the system uses resources efficiently.

# Why Should Product Leaders Care about this Architecture?

Agentic RAG solves real problems. It saves money and improves the user experience.

## The Cost Problem

Traditional RAG systems retrieve context for every query, and every step costs money: embedding generation, vector search, and language model calls with a large context.

Many queries are simple. They do not need document retrieval or a complex context, yet traditional systems retrieve anyway.

Agentic RAG changes this. Simple queries skip retrieval and use the language model directly, so costs drop immediately. Consider one thousand queries: if traditional RAG costs eighteen cents per query and Agentic RAG costs fourteen cents per query, that is a twenty-two percent reduction.
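The arithmetic behind these figures, as a quick sketch using the illustrative per-query costs above (the helper names are hypothetical). Working in integer cents keeps the totals free of floating-point rounding.

```typescript
// Illustrative per-query costs from the example above, in integer cents
const TRADITIONAL_CENTS_PER_QUERY = 18;
const AGENTIC_CENTS_PER_QUERY = 14;

// Total cost in cents for a batch of queries
function totalCents(queries: number, centsPerQuery: number): number {
  return queries * centsPerQuery;
}

// Fractional reduction of `after` relative to `before`
function reduction(before: number, after: number): number {
  return (before - after) / before;
}

const queries = 1000;
const traditional = totalCents(queries, TRADITIONAL_CENTS_PER_QUERY); // 18000 cents = $180
const agentic = totalCents(queries, AGENTIC_CENTS_PER_QUERY); // 14000 cents = $140
const saved = reduction(traditional, agentic); // 4000 / 18000

console.log(`$${traditional / 100} vs $${agentic / 100}: ${(saved * 100).toFixed(1)}% less`);
```

On one thousand queries that is $180 versus $140, a saving of $40.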

## The Speed Problem

Retrieval takes time. Embedding generation and vector search both add latency, and simple queries do not need these steps. Agentic RAG routes simple queries directly, so they respond two to three times faster, and users notice the difference.

## The Accuracy Problem

Sometimes retrieval adds noise. Irrelevant documents confuse the model and introduce context pollution, and response quality suffers. Agentic RAG retrieves only when needed, which keeps unrelated documents out of the prompt for queries that never required them. Response quality improves.

## The Scalability Problem

Traditional systems are hard to extend. Adding new capabilities requires changing core logic, and testing becomes difficult. Agentic RAG, on the other hand, uses tools. Tools are independent modules, so new tools can be added without changing the core, and the system grows naturally.

# Building Your First System

The **orchestrator** is the core component. It coordinates **query routing**, **retrieval**, **agent creation**, and **tool execution**.

Image: https://a-us.storyblok.com/f/1023026/1320x1542/b6ad60d698/agentic-rag-trace-and-span.webp

The image shows how the orchestrator decides which component to execute based on the user query.

Here is how it works.

## The Orchestrator Function

The main function receives a **user query** and **decides** the execution path:

```typescript
export async function orchestrateAgenticRAG(
  systemMessage: string,
  userMessage: string,
  model: string,
  deployedTools: any[],
  settings?: Record<string, any>
): Promise<{ finalResponse: string }> {
```

It initializes observability first. Every operation gets tracked:

```typescript
getOrCreateTrace();

```

## Query Routing

The system then examines the query to decide whether retrieval is needed:

```typescript
const userRequestsRAG = /\bRAG\b/i.test(userMessage) || 
                        /\bRAG_CONTEXT\b/i.test(userMessage) || 
                        /\brag:\b/i.test(userMessage);
const intent = userRequestsRAG ? 'rag_enabled' : 'direct_query';
```

If the message contains the word "RAG" (case-insensitive), the token "RAG_CONTEXT", or the prefix "rag:", retrieval is enabled.

Image: https://a-us.storyblok.com/f/1023026/613x495/a81afa928c/the-rag-phrase.webp

As the image shows, including "RAG" in the user message enables the RAG phase. This keyword check is a simple form of **intent classification**.

Otherwise, the system skips retrieval and goes directly to the agent.
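A self-contained version of the keyword router makes its behavior easy to check. This sketch reuses the three regular expressions above; the function name `classifyIntent` is hypothetical. Two details are worth noticing: `_` is a word character, so `\bRAG\b` does not match inside `RAG_CONTEXT` (hence the second pattern), and any text matching `\brag:\b` already matches `\bRAG\b`, so the third pattern is redundant.

```typescript
type Intent = 'rag_enabled' | 'direct_query';

// Same keyword checks as the orchestrator's routing step
function classifyIntent(userMessage: string): Intent {
  const userRequestsRAG =
    /\bRAG\b/i.test(userMessage) ||
    /\bRAG_CONTEXT\b/i.test(userMessage) ||
    /\brag:\b/i.test(userMessage);
  return userRequestsRAG ? 'rag_enabled' : 'direct_query';
}

// Example messages and the route each one takes
const examples = [
  'What is the weather today?', // direct_query
  'Use RAG: best practices for running in hot weather', // rag_enabled
  'Answer with RAG_CONTEXT about hydration', // rag_enabled (second pattern)
  'Where can I buy a drying rag?', // rag_enabled: a false positive
];
for (const msg of examples) {
  console.log(`${classifyIntent(msg)}  <- ${msg}`);
}
```

The last example shows the cost of keyword routing: an everyday use of the word "rag" triggers retrieval. That is usually acceptable for a first version, as the "Over-Engineering Routing" guidance later in this guide suggests.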

## Conditional Retrieval

Image: https://a-us.storyblok.com/f/1023026/1572x700/9dec8f25db/rag-workflow.png

A simple workflow for retrieving information from a vector database.

When retrieval is needed, the system generates an embedding and queries Pinecone:

```typescript
const matches = await retrieveTopK(5, getOrCreateTrace(), userMessage, ragPhaseRefId);
```

The `retrieveTopK` function first creates a query embedding using [Adaline Gateway](https://github.com/adaline/gateway):

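```typescript
// Assumes `gateway` (an Adaline Gateway instance), `openai` (the OpenAI provider),
// `Config`, and `apiKey` are set up elsewhere in the module.
export async function createQueryEmbedding(text: string): Promise<number[]> {
  // text-embedding-3-small returns 1536-dimensional vectors by default
  const model = openai.embeddingModel({ modelName: 'text-embedding-3-small', apiKey });
  const resp = await gateway.getEmbeddings({
    model,
    config: Config().parse({}), // default embedding config
    embeddingRequests: { modality: 'text', requests: [text] },
  });
  // One input text in, one embedding out
  return resp.response.embeddings[0].embedding;
}
```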

Then it queries Pinecone:

```typescript
const index = await getIndex();
const qemb = await createQueryEmbedding(userMessage);
const results = await index.query({ 
  vector: qemb, 
  topK: 5, 
  includeMetadata: true 
});
```

The system retrieves the top five matches and assembles context from the original files:

```typescript
const lines: string[] = [];
for (const m of matches) {
  // Map each match back to its source file and chunk, then read the original text
  const { fileName, chunkNum } = await parseMatchMetadata(m);
  const content = await readChunkContent(fileName, chunkNum);
  lines.push(`Source: ${fileName}#${chunkNum}\n${content}`);
}
const ragSummary = lines.join('\n\n');
```

Tools are converted from Adaline's format to the agent's format:

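```typescript
// `tool` and `z` come from the agent SDK and zod; this tool() signature matches the OpenAI Agents SDK
function createAgentTool(deployedTool: any) {
  const toolName = deployedTool.definition?.schema?.name;
  const toolDescription = deployedTool.definition?.schema?.description;
  const toolParams = deployedTool.definition?.schema?.parameters; // JSON Schema for the arguments

  // Placeholder: toolParams is converted to a Zod object as shown under "Agent Creation"
  const zodObject = z.object({});

  return tool({
    name: toolName,
    description: toolDescription,
    parameters: zodObject,
    execute: async (args: any) => {
      // Placeholder: dispatch to the matching tool handler (see "Tool Execution")
      const result = { toolName, args };
      return result;
    },
  });
}
```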

## Tool Execution

When the agent calls a tool, the `execute` function dispatches to the matching handler:

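```typescript
execute: async (args: any) => {
  let result: ToolResult;
  switch (toolName) {
    case 'weather_checker':
      result = await weather_checker(args);
      break;
    case 'nutrition_planner':
      result = await nutrition_planner(args);
      break;
    default:
      // Fail loudly instead of returning undefined for an unregistered tool
      throw new Error(`Unknown tool: ${toolName}`);
  }
  return result;
}
```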

Tool handlers are simple functions:

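```typescript
// Argument and result shapes (assumed; the article does not show them)
interface WeatherCheckerArgs {
  location?: string;
}
interface ToolResult {
  name: string;
  summary: string;
  [key: string]: unknown;
}

export async function weather_checker(args: WeatherCheckerArgs): Promise<ToolResult> {
  const location = args.location || 'Unknown location';
  const temperature = 15; // Stub value: in production, call a weather API
  return {
    name: 'weather_checker',
    summary: `Weather for ${location}: ${temperature}°C`,
    weatherData: { temperature, humidity: 65, conditions: 'Clear' } // stubbed data
  };
}
```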
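The `switch` statement works, but every new tool means editing the dispatcher. A registry keyed by tool name keeps the dispatcher fixed, which is the kind of extensibility the Scalability section describes. This is a hypothetical sketch, not the article's code: `registerTool`, `dispatchTool`, and the simplified `ToolResult` shape are assumptions, and the handler mirrors the stubbed `weather_checker` above.

```typescript
// Simplified result shape shared by all tool handlers (assumed)
interface ToolResult {
  name: string;
  summary: string;
  [key: string]: unknown;
}

type ToolHandler = (args: any) => Promise<ToolResult>;

// Registry mapping deployed tool names to their handlers
const toolRegistry = new Map<string, ToolHandler>();

function registerTool(name: string, handler: ToolHandler): void {
  toolRegistry.set(name, handler);
}

// The dispatcher does not change when tools are added or removed
async function dispatchTool(name: string, args: unknown): Promise<ToolResult> {
  const handler = toolRegistry.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}

// Registering a tool is a single call; this stub returns fixed data
registerTool('weather_checker', async (args: { location?: string }) => ({
  name: 'weather_checker',
  summary: `Weather for ${args.location ?? 'Unknown location'}: 15°C`,
}));
```

Inside `execute`, the `switch` then collapses to `return dispatchTool(toolName, args);`.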

## Adaline Integration

The system fetches the deployed prompt hosted in Adaline.

Image: https://a-us.storyblok.com/f/1023026/1558x752/7c86269d4f/hosted-prompt-adaline.png

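```typescript
// `url` is your Adaline deployment endpoint and `apiKey` your Adaline API key (configured elsewhere)
export async function getDeploymentInfo() {
  const response = await fetch(url, {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  if (!response.ok) {
    throw new Error(`Failed to fetch deployment: ${response.status} ${response.statusText}`);
  }
  const data = await response.json();
  // Model, tool definitions, and settings come from the deployed prompt
  return {
    model: data.prompt.config.model,
    tools: data.prompt.tools,
    settings: data.prompt.config.settings
  };
}
```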

Variables are injected into the fetched prompt template.

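```typescript
// Replaces {{name}} placeholders; names missing from `variables` are left untouched
export function injectVariables(template: string, variables: Record<string, any>): string {
  return template.replace(/\{\{([^}]+)\}\}/g, (match, variableName) => {
    const key = variableName.trim();
    // An own-property check (not `||`) so falsy values such as 0 or '' are still injected
    return Object.prototype.hasOwnProperty.call(variables, key) ? String(variables[key]) : match;
  });
}
```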

## Observability

Every operation creates a span.

```typescript
addSpanToTrace({
  name: 'query_routing',
  status: 'success',
  startedAt: queryRoutingStart,
  endedAt: queryRoutingEnd,
  content: {
    type: 'Function',
    input: { userMessage },
    output: { intent, plan: executionPlan }
  }
});
```

The trace is submitted to Adaline at the end:

```typescript
await safeSubmitTrace(trace);
```
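The article does not show `safeSubmitTrace`. Judging by its name, its job is to submit the trace without letting a tracing failure break the user's response. A minimal sketch under that assumption: unlike the one-argument call above, the submit function is passed in, because the actual Adaline submission call is not shown here, and the `Trace` shape is reduced to the fields this sketch touches.

```typescript
// Minimal trace shape for this sketch (the full shape appears under Observability)
interface Trace {
  name: string;
  startedAt: number;
  endedAt: number;
  spans: unknown[];
}

// Hypothetical signature for whatever actually sends the trace to Adaline
type SubmitFn = (trace: Trace) => Promise<void>;

// Finalizes the trace and submits it; returns false instead of throwing on failure
async function safeSubmitTrace(trace: Trace, submit: SubmitFn): Promise<boolean> {
  if (!trace.endedAt) {
    trace.endedAt = Date.now(); // close the trace if the caller has not
  }
  try {
    await submit(trace);
    return true;
  } catch (err) {
    // Observability must never take down the request path
    console.warn('Trace submission failed:', err instanceof Error ? err.message : err);
    return false;
  }
}
```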

# Understanding the Components

Each component has a specific role. Here is how they work together.

## Query Routing

Routing examines the query and decides the execution path. The implementation uses pattern matching:

```typescript
const userRequestsRAG = /\bRAG\b/i.test(userMessage) || 
                        /\bRAG_CONTEXT\b/i.test(userMessage);
const intent = userRequestsRAG ? 'rag_enabled' : 'direct_query';
```

The system creates an execution plan:

```typescript
const executionPlan = {
  useRAG: userRequestsRAG,
  tools: deployedTools.map(t => t.definition?.schema?.name),
  phases: userRequestsRAG ? ['rag', 'agent', 'synthesis'] : ['agent', 'synthesis']
};
```

This plan determines which phases run. Simple queries skip the RAG phase entirely.
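How the plan drives execution can be sketched as a loop over `phases`, where each phase name maps to a handler that reads and extends a shared context. The `PhaseContext` fields, `runPlan`, and the stub handlers are hypothetical; in the orchestrator these steps are the retrieval, agent, and synthesis code shown elsewhere in this guide.

```typescript
type Phase = 'rag' | 'agent' | 'synthesis';

// Shared state passed from phase to phase (fields are illustrative)
interface PhaseContext {
  userMessage: string;
  ragSummary?: string;
  agentOutput?: string;
  finalResponse?: string;
}

type PhaseHandler = (ctx: PhaseContext) => Promise<PhaseContext>;

// Runs only the phases listed in the plan, in order
async function runPlan(
  phases: Phase[],
  handlers: Record<Phase, PhaseHandler>,
  initial: PhaseContext
): Promise<PhaseContext> {
  let ctx = initial;
  for (const phase of phases) {
    ctx = await handlers[phase](ctx);
  }
  return ctx;
}

// Stub handlers that record which phases ran
const ran: Phase[] = [];
const handlers: Record<Phase, PhaseHandler> = {
  rag: async (ctx) => {
    ran.push('rag');
    return { ...ctx, ragSummary: 'Source: hot-weather.md#0' };
  },
  agent: async (ctx) => {
    ran.push('agent');
    return { ...ctx, agentOutput: `answer (context: ${ctx.ragSummary ?? 'none'})` };
  },
  synthesis: async (ctx) => {
    ran.push('synthesis');
    return { ...ctx, finalResponse: ctx.agentOutput };
  },
};
```

A direct query runs `runPlan(['agent', 'synthesis'], handlers, { userMessage })`, so the `rag` handler is never invoked.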

## Conditional Retrieval

Retrieval generates embeddings through Adaline Gateway:

```typescript
export async function createQueryEmbedding(text: string): Promise<number[]> {
  const model = openai.embeddingModel({ 
    modelName: 'text-embedding-3-small', 
    apiKey: process.env.OAI_API_KEY 
  });
  const resp = await gateway.getEmbeddings({
    model,
    config: Config().parse({}),
    embeddingRequests: { modality: 'text', requests: [text] }
  });
  const emb = resp.response.embeddings[0].embedding;
  return projectToDim(emb, PINECONE_DIMENSION);
}
```

`projectToDim` (its implementation is not shown) adjusts the embedding to the Pinecone index's dimension, `PINECONE_DIMENSION`. `retrieveTopK` then queries the index:

```typescript
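// `trace` and `parentRefId` (the RAG phase's span reference passed by the orchestrator;
// the parameter name is illustrative) are used for span logging, omitted in this excerpt
export async function retrieveTopK(k = 5, trace?: Trace, query?: string, parentRefId?: string) {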
  const index = await getIndex();
  const qemb = await createQueryEmbedding(query || '');
  const results = await index.query({ 
    vector: qemb, 
    topK: k, 
    includeMetadata: true 
  });
  return results.matches ?? [];
}
```
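The article does not show `projectToDim`. One common way to fit an embedding to a smaller index dimension is to keep the first `dim` components and re-normalize to unit length; OpenAI documents this shortening approach for its `text-embedding-3` models, and its embeddings API also accepts a `dimensions` parameter for them. The sketch below assumes the truncate-and-normalize approach and zero-pads when the index is larger than the embedding; it is not necessarily what the article's helper does.

```typescript
// Fits an embedding to `dim` components: truncate + L2-renormalize, or zero-pad
function projectToDim(embedding: number[], dim: number): number[] {
  if (embedding.length === dim) {
    return embedding;
  }
  if (embedding.length < dim) {
    // Zero-padding leaves dot products between padded vectors unchanged
    return [...embedding, ...new Array<number>(dim - embedding.length).fill(0)];
  }
  const truncated = embedding.slice(0, dim);
  const norm = Math.sqrt(truncated.reduce((sum, x) => sum + x * x, 0));
  // Guard against an all-zero prefix
  return norm === 0 ? truncated : truncated.map((x) => x / norm);
}
```

Whatever projection is used at query time must match the one used when the documents were indexed; otherwise the similarity scores are meaningless.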

Metadata parsing extracts file and chunk information:

```typescript
export async function parseMatchMetadata(match: any) {
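  // `??` (not `||`) keeps a valid chunk index of 0
  let fileName = match.metadata?.file ?? match.metadata?.source;
  let chunkNum = match.metadata?.chunk ?? match.metadata?.chunkIndex;

  // Fall back to vector IDs of the form "<file>-chunk-<n>"
  if (!fileName && match.id) {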
    const idMatch = String(match.id).match(/(.+)-chunk-(\d+)$/);
    if (idMatch) {
      fileName = idMatch[1];
      chunkNum = Number(idMatch[2]);
    }
  }
  return { fileName, chunkNum };
}
```

Context assembly combines retrieved chunks:

```typescript
const lines: string[] = [];
for (const m of matches) {
  const { fileName, chunkNum } = await parseMatchMetadata(m);
  const content = await readChunkContent(fileName, chunkNum);
  lines.push(`Source: ${fileName}#${chunkNum}\n${content}`);
}
const ragSummary = lines.join('\n\n');
```

## Agent Creation

Image: https://a-us.storyblok.com/f/1023026/1588x950/5d154a3424/creating-agent.png

Creating the agent and attaching the deployed tools.

The orchestrator creates the agent with the converted tools:

```typescript
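// Convert each deployed tool (named `deployedTool` to avoid shadowing the SDK's `tool` function);
// the reference IDs link tool spans to their parent phases in the trace
const agentTools = deployedTools.map((deployedTool) =>
  createAgentTool(deployedTool, orchestratorRefId, toolExecutionPhaseRefId)
);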

const agent = new Agent({
  name: 'Running Coach Agent',
  model,
  instructions: finalSystemMessage,
  tools: agentTools
});
```

Tool conversion handles schema differences:

```typescript
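function createAgentTool(deployedTool: any) {
  const schema = deployedTool.definition?.schema;
  const toolName = schema?.name;
  const toolDescription = schema?.description;
  const properties = schema?.parameters?.properties || {};
  // Annotated as ZodTypeAny so a field can switch from z.string() to other types
  const zodSchema: Record<string, z.ZodTypeAny> = {};

  // Map each JSON Schema property type to a Zod type (strings by default);
  // every field is treated as required
  for (const [key, value] of Object.entries(properties)) {
    const prop = value as any;
    let fieldSchema: z.ZodTypeAny = z.string();

    if (prop.type === 'number') fieldSchema = z.number();
    if (prop.type === 'integer') fieldSchema = z.number().int();
    if (prop.type === 'boolean') fieldSchema = z.boolean();
    if (prop.type === 'array') fieldSchema = z.array(z.string()); // assumes string items

    zodSchema[key] = fieldSchema;
  }

  return tool({
    name: toolName,
    description: toolDescription,
    parameters: z.object(zodSchema),
    execute: async (args: any) => { /* ... */ }
  });
}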
```

The second tool handler, `nutrition_planner`, follows the same pattern as `weather_checker`:

```typescript
export async function nutrition_planner(args: NutritionPlannerArgs): Promise<ToolResult> {
  const run = (args.run_block || '').trim();
  const cover = (args.what_to_cover || '').trim(); // unused by this stub; could tailor the plan
  
  return {
    name: 'nutrition_planner',
    summary: `Hydration plan for: ${run}`,
    hydrationPlan: {
      preRun: 'Drink 200–300 ml water 20–30 min before start.',
      duringRun: 'Sip 100–200 ml every 15–20 min',
      electrolytes: 'Add 200–300 mg sodium per hour'
    }
  };
}
```

The orchestrator tracks each tool call:

```typescript
addSpanToTrace({
  name: `tool_call_${toolName}`,
  status: 'success',
  startedAt: toolStart,
  endedAt: toolStart, // zero-length marker span; the tool_response span records completion
  content: {
    type: 'Tool',
    input: { toolName, arguments: args },
    output: { called: true }
  }
});
```

## Observability

Traces capture the complete flow:

```typescript
import { v4 as uuidv4 } from 'uuid';

export function createTrace(name: string, projectId: string, promptId?: string): Trace {
  return {
    name,
    status: 'success',
    startedAt: Date.now(),
    endedAt: 0, // placeholder until the trace is finalized
    referenceId: uuidv4(),
    spans: [],
    projectId,
    promptId,
    sessionId: uuidv4()
  };
}
```

Spans are added for each operation:

```typescript
addSpan(trace, {
  name: 'pinecone_query',
  status: 'success',
  startedAt: startTime,
  endedAt: endTime,
  content: {
    type: 'Retrieval',
    input: { top_k: 5, query: userMessage },
    output: { matchesCount: matches.length }
  }
});
```

# Making It Work in Production

How do you take an Agentic RAG system from prototype to production? Focus on reliability. Focus on performance. Focus on monitoring.

## Reliability

Production systems must handle errors gracefully. Tool calls can fail. Retrieval can fail. Language model calls can fail. Implement error handling in tool execution:

```typescript
execute: async (args: any) => {
  let result: any;
  const toolStart = Date.now();

  try {
    switch (toolName) {
      case 'weather_checker':
        result = await weather_checker(args);
        break;
      default:
        throw new Error(`Unknown tool: ${toolName}`);
    }
  } catch (error) {
    // Catch variables are `unknown` in strict TypeScript
    const message = error instanceof Error ? error.message : String(error);
    addSpanToTrace({
      name: `tool_response_${toolName}`,
      status: 'error',
      startedAt: toolStart,
      endedAt: Date.now(),
      content: {
        type: 'Tool',
        input: { toolName, arguments: args },
        output: { error: message }
      }
    });
    // Rethrow so the failure propagates to the caller
    throw error;
  }
  return result;
}
```

If retrieval fails, continue without context:

```typescript
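let ragSummary = '';
let ragStatus: 'success' | 'error' = 'success';
let ragError: string | undefined;
try {
  const matches = await retrieveTopK(5, trace, userMessage);
  // ... assemble context into ragSummary
} catch (e) {
  ragStatus = 'error';
  // Keep the error for the RAG span, but out of the prompt
  ragError = e instanceof Error ? e.message : String(e);
  ragSummary = ''; // continue without RAG context
}
// Append context only when retrieval actually produced some
const finalSystemMessage = ragSummary
  ? `${systemMessage}\n\n[RAG_CONTEXT]\n${ragSummary}`
  : systemMessage;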
```

## Performance

Production systems must be fast. Optimize retrieval with caching:

```typescript
// Note: this Map is unbounded and its entries never expire
const queryCache = new Map<string, any>();

async function retrieveTopKWithCache(query: string, topK: number, trace?: Trace) {
  const cacheKey = `${query}:${topK}`;
  if (queryCache.has(cacheKey)) {
    return queryCache.get(cacheKey);
  }
  const results = await retrieveTopK(topK, trace, query);
  queryCache.set(cacheKey, results);
  return results;
}
```
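The `Map` above grows without limit and never expires entries, so a long-running server slowly accumulates memory and can serve stale results after the index is updated. A bounded cache with a time-to-live addresses both. This is a hypothetical sketch: `TtlCache`, `cached`, the default limits, and the injected clock are assumptions. Eviction is least-recently-used, relying on `Map` preserving insertion order.

```typescript
// Bounded LRU cache with per-entry expiry (hypothetical helper)
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries = 500,
    private ttlMs = 5 * 60_000,
    private now: () => number = Date.now
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // stale: drop it
      return undefined;
    }
    // Re-insert so the key becomes the most recently used
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used key (first in iteration order)
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

// Wraps any async loader, such as a retrieveTopK call, with the cache
async function cached<V>(cache: TtlCache<V>, key: string, load: () => Promise<V>): Promise<V> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await load();
  cache.set(key, value);
  return value;
}
```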

Execute tools in parallel when they are independent:

```typescript
const toolResults = await Promise.all([
  weather_checker(weatherArgs),
  nutrition_planner(nutritionArgs)
]);
```
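`Promise.all` rejects as soon as any tool fails, which discards the results of the tools that succeeded. When partial results are still useful (the graceful degradation described under Common Mistakes), `Promise.allSettled` keeps them. A sketch with stubbed, hypothetical tool functions:

```typescript
// Stub tools: one succeeds, one fails (stand-ins for real handlers)
async function weatherStub(): Promise<string> {
  return 'Weather: 15°C, clear';
}
async function nutritionStub(): Promise<string> {
  throw new Error('nutrition service unavailable');
}

// Runs independent async tools in parallel and keeps whatever succeeded
async function runToolsInParallel(
  calls: Array<() => Promise<string>>
): Promise<{ results: string[]; errors: string[] }> {
  const settled = await Promise.allSettled(calls.map((call) => call()));
  const results: string[] = [];
  const errors: string[] = [];
  for (const outcome of settled) {
    if (outcome.status === 'fulfilled') {
      results.push(outcome.value);
    } else {
      errors.push(outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason));
    }
  }
  return { results, errors };
}
```

The agent can then answer from `results` and mention which tools were unavailable, instead of failing the whole request.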

Track latency in spans:

```typescript
const startTime = Date.now();
// ... operation ...
const endTime = Date.now();
addSpanToTrace({
  name: 'operation_name',
  status: 'success',
  startedAt: startTime,
  endedAt: endTime,
  // Latency = endTime - startTime (milliseconds)
  // ... content as in the other spans
});
```
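Once spans carry `startedAt` and `endedAt`, per-operation latency can be summarized from collected traces. A sketch that computes the median and 95th percentile per span name with the nearest-rank method; the `SpanTiming` shape is a simplified assumption based on the span fields shown above.

```typescript
// Only the span fields needed for latency (simplified)
interface SpanTiming {
  name: string;
  startedAt: number; // ms since epoch
  endedAt: number;
}

interface LatencyStats {
  p50: number;
  p95: number;
  count: number;
}

// Nearest-rank percentile on an ascending-sorted, non-empty array
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Groups spans by name and reports p50/p95 latency in ms
function latencySummary(spans: SpanTiming[]): Map<string, LatencyStats> {
  const byName = new Map<string, number[]>();
  for (const span of spans) {
    const durations = byName.get(span.name) ?? [];
    durations.push(span.endedAt - span.startedAt);
    byName.set(span.name, durations);
  }
  const summary = new Map<string, LatencyStats>();
  for (const [name, durations] of byName) {
    const sorted = [...durations].sort((a, b) => a - b);
    summary.set(name, { p50: percentile(sorted, 50), p95: percentile(sorted, 95), count: sorted.length });
  }
  return summary;
}
```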

## Monitoring

Image: https://a-us.storyblok.com/f/1023026/2330x1684/88978fd626/dashboaard.png

_Adaline's dashboard allows you to monitor important metrics in real-time._

Production systems need monitoring. Set up a dashboard that tracks key metrics such as latency and cost per query. Watch trends, identify peak times, and keep an eye on costs.

## Gradual Rollout

Do not launch to everyone at once. Start with internal testing, move to beta users, and then gradually increase traffic:

- **Weeks one and two:** internal testing with ten percent of queries. Catch obvious bugs and verify basic functionality.
- **Weeks three and four:** beta users with twenty-five percent of queries. Gather feedback and monitor performance.
- **Weeks five and six:** gradual increase to one hundred percent. Watch metrics closely and be ready to roll back.

This approach reduces risk. It catches problems early and allows adjustments before full launch.
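Percentage-based rollout needs a stable way to decide which users get the new path, so the same user does not flip between systems across requests. One common approach, sketched here as an assumption rather than anything Adaline provides, hashes a user ID into one of 100 buckets with 32-bit FNV-1a and compares the bucket with the current rollout percentage.

```typescript
// 32-bit FNV-1a over UTF-16 code units (identical to byte-wise FNV-1a for ASCII IDs)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // as unsigned
}

// Deterministically assigns a user to bucket 0-99
function rolloutBucket(userId: string): number {
  return fnv1a(userId) % 100;
}

// True if this user should be routed to the agentic RAG path
function inRollout(userId: string, percent: number): boolean {
  return rolloutBucket(userId) < percent;
}
```

Raising `percent` from 10 to 25 to 100 only adds users; nobody admitted at ten percent drops out at twenty-five.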

# Common Mistakes

What mistakes do teams make when building Agentic RAG systems? Learn from others. Avoid these pitfalls.

## Over-Engineering Routing

Some teams build complex routing systems, with machine learning classifiers and multiple classification layers. Start simple instead. Keyword detection works for many cases; upgrade only when needed, for example when routing accuracy drops below eighty-five percent, when false-positive rates are high, or when query patterns become too complex for keywords.
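Deciding when to upgrade requires measuring the router. A sketch that scores any routing function against a small hand-labeled set and reports accuracy and false-positive rate (direct queries wrongly sent to retrieval). The labeled examples and the `evaluateRouter` name are hypothetical; a real evaluation set should come from production queries. The keyword router omits the `rag:` pattern because `\bRAG\b` already matches any text it matches.

```typescript
type Route = 'rag_enabled' | 'direct_query';

interface LabeledQuery {
  text: string;
  expected: Route;
}

// Scores a router: overall accuracy and false-positive rate for retrieval
function evaluateRouter(route: (text: string) => Route, labeled: LabeledQuery[]) {
  let correct = 0;
  let falsePositives = 0;
  let actualDirect = 0;
  for (const { text, expected } of labeled) {
    const predicted = route(text);
    if (predicted === expected) correct++;
    if (expected === 'direct_query') {
      actualDirect++;
      if (predicted === 'rag_enabled') falsePositives++;
    }
  }
  return {
    accuracy: labeled.length ? correct / labeled.length : 0,
    falsePositiveRate: actualDirect ? falsePositives / actualDirect : 0,
  };
}

// Keyword router in the style of the Query Routing section
const keywordRouter = (text: string): Route =>
  /\bRAG\b/i.test(text) || /\bRAG_CONTEXT\b/i.test(text) ? 'rag_enabled' : 'direct_query';

const labeled: LabeledQuery[] = [
  { text: 'What is the weather today?', expected: 'direct_query' },
  { text: 'RAG: how should I taper before a marathon?', expected: 'rag_enabled' },
  { text: 'Best practices for running in hot weather', expected: 'rag_enabled' },
  { text: 'I need a rag to clean my shoes', expected: 'direct_query' },
];

const report = evaluateRouter(keywordRouter, labeled);
const needsUpgrade = report.accuracy < 0.85; // eighty-five percent threshold from the text
```

On this tiny set the keyword router scores 50% accuracy: it misses a question that needs context but does not say "RAG", and it sends "rag" in its everyday sense to retrieval.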

## Ignoring Costs

Some teams do not monitor costs. They build the system, deploy it, and watch costs spiral out of control. Set up cost monitoring from day one: track cost per query, set budget alerts, and review costs weekly. Then apply cost optimizations: cache frequent queries, use smaller models for routing, and tune retrieval parameters.
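Tracking cost per query mostly means multiplying token usage by your provider's prices and summing per request. A sketch with a budget alert; the prices in the example are placeholders, not real rates, and `callCost` and `CostTracker` are hypothetical helpers. Embedding and vector-search costs can be passed to the same `record` call.

```typescript
// Prices per one million tokens, in US dollars (fill in your provider's current rates)
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost of one model call in dollars
function callCost(usage: Usage, pricing: ModelPricing): number {
  return (
    (usage.inputTokens / 1_000_000) * pricing.inputPerMillion +
    (usage.outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Accumulates spend and fires a callback once when the budget is crossed
class CostTracker {
  private spent = 0;
  private alerted = false;

  constructor(private budget: number, private onAlert: (spent: number) => void) {}

  record(costOfQuery: number): void {
    this.spent += costOfQuery;
    if (!this.alerted && this.spent >= this.budget) {
      this.alerted = true;
      this.onAlert(this.spent);
    }
  }

  // Call at the start of each budget period (for example, daily)
  reset(): void {
    this.spent = 0;
    this.alerted = false;
  }

  get total(): number {
    return this.spent;
  }
}
```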

## Poor Error Handling

Some systems fail entirely when one component errors: a tool failure or a retrieval failure stops everything. Implement graceful degradation instead. If retrieval fails, continue without context. If a tool fails, continue without that tool. Always return something useful. Add retry logic for transient failures, using exponential backoff and a cap on the number of attempts. Finally, write user-friendly error messages that explain what went wrong and suggest alternatives without exposing technical details.
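A minimal retry helper with exponential backoff and a capped number of attempts. It is a hypothetical sketch: the default delays, the added random jitter, and the `isRetryable` check are assumptions to tune for each dependency (for example, retrying timeouts and rate-limit errors but not invalid requests). The sleep function is injectable so tests do not have to wait.

```typescript
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number; // delay before the first retry
  maxDelayMs?: number; // cap on the exponential part of any delay
  isRetryable?: (err: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>;
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const {
    maxAttempts = 3,
    baseDelayMs = 200,
    maxDelayMs = 5_000,
    isRetryable = () => true,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = opts;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      // Exponential backoff: base, 2x base, 4x base, ... (capped), plus random jitter
      const backoff = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(backoff + Math.random() * baseDelayMs);
    }
  }
}
```

Wrapping a call such as `withRetry(() => retrieveTopK(5, trace, userMessage))` retries transient failures, and the graceful-degradation fallback still applies if every attempt fails.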

## Neglecting User Experience

Some teams focus on technical implementation and ignore user experience, which leaves users confused. A few practices help:

- **Show loading states.** Indicate when tools are running, show progress, and keep users informed.
- **Provide source attribution.** Show where the information came from to build trust and enable verification.
- **Explain tool usage.** Tell users which tools are being used and why.
- **Collect user feedback.** Ask for ratings, monitor comments, and act on suggestions.

## Insufficient Testing

Some teams deploy without thorough testing. They test happy paths only, and production reveals the problems. Test every component: routing logic, retrieval, tools, and error handling. Test the integrated flow end to end, including edge cases and error scenarios. Test under load by simulating production traffic, measuring performance, and identifying bottlenecks.

## Tool Overload

Some teams add too many tools, assuming more tools mean more capabilities. Instead, the agent gets confused about which one to call. Start with two or three essential tools, add tools incrementally, monitor usage, and remove unused tools. Each tool should have a clear purpose, be well tested, and add value.

# Conclusion

Agentic RAG represents the next step in AI systems. **It combines retrieval with intelligence** and tools with autonomy, adapting to each query. Building such systems requires understanding routing, retrieval, agents, and tools.

Adaline provides the infrastructure: it handles prompt deployment, manages tool integration, and tracks performance, which simplifies development. The path forward is clear. Start simple, add complexity gradually, monitor everything, and iterate based on results. The benefits are real: costs drop, latency improves, quality increases, and users are happier.