Prompt changes can regress quality in ways that are invisible without evaluation. A tweak that improves one flow can break another, or push cost per request over budget. CI/CD integration lets you run Adaline evaluations inside your existing pipeline — so every prompt deployment is validated against your dataset and evaluators before it reaches production.

How it works

Adaline’s deployment webhooks and evaluation API connect directly to your CI/CD platform. The flow:
  1. A prompt is deployed to a staging environment in Adaline
  2. Adaline fires a webhook to your CI platform (e.g. GitHub Actions, GitLab CI, Jenkins)
  3. Your pipeline calls the Adaline API to run evaluations against a dataset
  4. Evaluation scores are checked against thresholds you define
  5. If scores pass, the pipeline promotes the deployment or updates your app config; if they fail, the pipeline blocks the release with a clear error
This gives you the same quality gate you have in the Adaline Dashboard, but fully automated and embedded in your development workflow.

Triggering evaluations from CI

Webhook trigger

When you deploy a prompt, Adaline fires a create-deployment webhook to every configured endpoint. Use this to trigger your pipeline automatically:
name: Prompt deployment gate

on:
  repository_dispatch:
    types: [adaline-prompt-deployed]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'staging'
If webhooks are not configured, you can trigger the same workflow manually via workflow_dispatch or on a schedule.

Fetching the deployment

Pull the deployed prompt from the Adaline API using your prompt ID and environment:
curl -H "Authorization: Bearer $ADALINE_API_KEY" \
  "https://api.adaline.ai/v2/deployments?promptId=${PROMPT_ID}&deploymentId=latest&deploymentEnvironmentId=${ENV_ID}"
See Get deployment for the full API reference.
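The fields the later steps need can be pulled out of the deployment JSON with jq. A minimal sketch, using a hard-coded sample payload in place of the live API response; the field names (`.id`, `.projectId`) match those used later in this guide but should be verified against the Get deployment reference:

```shell
# Sample payload standing in for the Get deployment response.
RESP='{"id": "dep_123", "projectId": "proj_456"}'

# Extract the IDs used by the promotion steps.
DEPLOYMENT_ID=$(echo "$RESP" | jq -r '.id')
PROJECT_ID=$(echo "$RESP" | jq -r '.projectId')

echo "deployment=$DEPLOYMENT_ID project=$PROJECT_ID"
```

In CI, `RESP` would be the output of the curl call above rather than a literal.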

Running evaluations

Trigger an evaluation run against a dataset connected to your prompt:
curl -X POST \
  -H "Authorization: Bearer $ADALINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"datasetId\": \"$DATASET_ID\"}" \
  "https://api.adaline.ai/v2/prompts/${PROMPT_ID}/evaluations"
Poll the evaluation status until complete, then fetch results:
curl -H "Authorization: Bearer $ADALINE_API_KEY" \
  "https://api.adaline.ai/v2/prompts/${PROMPT_ID}/evaluations/${EVAL_ID}/results"
See Create evaluation and Get evaluation results for full API references.
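The poll-then-fetch step can be sketched as a small shell loop. To keep it testable without network access, the loop takes the name of a command that prints the current status; in CI that command would wrap the curl shown in the comment. The `status` field and its `completed`/`failed` values are assumptions; verify them against the Get evaluation reference.

```shell
# Poll a status command until the evaluation completes, fails, or we
# time out. In CI, pass a function that wraps:
#   curl -sS -H "Authorization: Bearer $ADALINE_API_KEY" \
#     "https://api.adaline.ai/v2/prompts/${PROMPT_ID}/evaluations/${EVAL_ID}" \
#     | jq -r '.status'
# ("status" and its values are assumptions; check the API reference.)
poll_until_complete() {
  fetch_status=$1
  max_attempts=${2:-30}
  delay=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$fetch_status")
    case "$status" in
      completed) echo "completed"; return 0 ;;
      failed)    echo "failed";    return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 1
}
```

A sensible default is to treat both `failed` and a timeout as a gate failure, so a stuck evaluation never silently promotes a deployment.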

Gating deployments on evaluation results

The core of the CI/CD gate is a threshold check. Parse the evaluation results and compare each evaluator’s score against the minimum you require:
- name: Check thresholds
  run: |
    HELPFULNESS="${{ steps.eval.outputs.EVAL_HELPFULNESS }}"
    TONE="${{ steps.eval.outputs.EVAL_TONE }}"

    if (( $(echo "$HELPFULNESS < 0.7" | bc -l) )) || \
       (( $(echo "$TONE < 0.8" | bc -l) )); then
      echo "::error::Eval failed. Helpfulness=$HELPFULNESS (min 0.7), Tone=$TONE (min 0.8)"
      exit 1
    fi
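Extracting the per-evaluator scores from the results payload is a jq one-liner per evaluator. The results shape below, an `evaluations` array of name/score pairs, is an assumption for illustration; check the Get evaluation results reference for the real field names:

```shell
# Sample results payload; in CI this would come from the results curl.
RESULTS='{"evaluations": [{"name": "helpfulness", "score": 0.85}, {"name": "tone", "score": 0.9}]}'

# Pull one evaluator's score by name.
score_for() {
  echo "$RESULTS" | jq -r --arg name "$1" \
    '.evaluations[] | select(.name == $name) | .score'
}

HELPFULNESS=$(score_for helpfulness)
TONE=$(score_for tone)
echo "helpfulness=$HELPFULNESS tone=$TONE"
```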
When the check passes, the pipeline continues — updating your app’s prompt config, committing the change, and triggering the release:
- name: Update prompt config
  if: success()
  run: |
    jq --arg id "$DEPLOYMENT_ID" \
      '.response_generator.deployment_id = $id' config/prompts.json > tmp && mv tmp config/prompts.json

- name: Commit and push
  if: success()
  run: |
    git config user.name "github-actions"
    git config user.email "actions@github.com"
    git add config/prompts.json
    git commit -m "chore: promote prompt deployment (evals passed)"
    git push
When the check fails, the job exits with a non-zero code and the release steps are skipped. The error message surfaces in your CI dashboard so the team knows exactly which evaluator failed and by how much.

What you can gate on

Adaline evaluations support multiple evaluator types, all of which can serve as CI/CD gates:
  - LLM-as-a-Judge: Block if helpfulness or factuality score drops below 0.7
  - JavaScript: Block if output format validation fails
  - Text Matcher: Block if required phrases are missing or banned patterns appear
  - Cost: Block if average cost per request exceeds $0.005
  - Latency: Block if p95 latency exceeds 3 seconds
  - Response Length: Block if responses consistently exceed or fall short of length targets
Combine multiple evaluators to create a comprehensive quality gate. A prompt only ships if every evaluator meets its threshold.

CI platform examples

GitHub Actions

GitHub Actions is the most common integration pattern. Use repository_dispatch to receive Adaline webhooks and run evaluations in a workflow job.

Setup

Store the following as repository secrets and variables in your GitHub repo:
  - ADALINE_API_KEY (secret): your Adaline API key
  - ADALINE_PROMPT_ID (variable): the prompt ID to evaluate
  - ADALINE_DEPLOYMENT_ENV_ID (variable): the deployment environment ID (e.g. staging)
  - ADALINE_DATASET_ID (variable): the dataset ID to run evaluations against

Complete workflow

Below is a complete workflow you can copy into .github/workflows/prompt-gate.yml. It receives the webhook (or runs manually), pulls the deployed prompt, runs evaluations, checks scores against thresholds, and only updates your prompt config if everything passes.
name: Prompt deployment gate

on:
  repository_dispatch:
    types: [adaline-prompt-deployed]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment (e.g. staging or production)'
        required: true
        default: 'staging'

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - name: Pull deployed prompt from Adaline API
        id: deployment
        env:
          ADALINE_API_KEY: ${{ secrets.ADALINE_API_KEY }}
          ADALINE_PROMPT_ID: ${{ vars.ADALINE_PROMPT_ID }}
          ADALINE_DEPLOYMENT_ENV_ID: ${{ vars.ADALINE_DEPLOYMENT_ENV_ID }}
        run: |
          RESP=$(curl -sS -H "Authorization: Bearer $ADALINE_API_KEY" \
            "https://api.adaline.ai/v2/deployments?promptId=${ADALINE_PROMPT_ID}&deploymentId=latest&deploymentEnvironmentId=${ADALINE_DEPLOYMENT_ENV_ID}")
          echo "deployment_id=$(echo "$RESP" | jq -r '.id')" >> $GITHUB_OUTPUT
          echo "project_id=$(echo "$RESP" | jq -r '.projectId')" >> $GITHUB_OUTPUT

      - name: Run evaluation via Adaline API
        id: eval
        env:
          ADALINE_API_KEY: ${{ secrets.ADALINE_API_KEY }}
          ADALINE_PROMPT_ID: ${{ vars.ADALINE_PROMPT_ID }}
          ADALINE_DATASET_ID: ${{ vars.ADALINE_DATASET_ID }}
        run: |
          EVAL_RESP=$(curl -sS -X POST \
            -H "Authorization: Bearer $ADALINE_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{\"datasetId\": \"$ADALINE_DATASET_ID\"}" \
            "https://api.adaline.ai/v2/prompts/${ADALINE_PROMPT_ID}/evaluations")
          EVAL_ID=$(echo "$EVAL_RESP" | jq -r '.id')
          echo "EVAL_ID=$EVAL_ID" >> $GITHUB_ENV
          # Poll GET .../evaluations/$EVAL_ID until status=completed, then fetch results:
          # curl -sS -H "Authorization: Bearer $ADALINE_API_KEY" \
          #   "https://api.adaline.ai/v2/prompts/${ADALINE_PROMPT_ID}/evaluations/${EVAL_ID}/results"
          # Parse per-evaluator scores and set outputs accordingly
          echo "EVAL_HELPFULNESS=0.85" >> $GITHUB_OUTPUT
          echo "EVAL_TONE=0.9" >> $GITHUB_OUTPUT
          echo "EVAL_COST_PASS=1" >> $GITHUB_OUTPUT

      - name: Check thresholds
        run: |
          HELPFULNESS="${{ steps.eval.outputs.EVAL_HELPFULNESS }}"
          TONE="${{ steps.eval.outputs.EVAL_TONE }}"
          COST_PASS="${{ steps.eval.outputs.EVAL_COST_PASS }}"
          THRESHOLD_HELP=0.7
          THRESHOLD_TONE=0.8
          if (( $(echo "$HELPFULNESS < $THRESHOLD_HELP" | bc -l) )) || \
             (( $(echo "$TONE < $THRESHOLD_TONE" | bc -l) )) || \
             [ "$COST_PASS" != "1" ]; then
            echo "::error::Eval failed. Helpfulness=$HELPFULNESS (min $THRESHOLD_HELP), Tone=$TONE (min $THRESHOLD_TONE), Cost pass=$COST_PASS"
            exit 1
          fi

      - name: Checkout repo
        if: success()
        uses: actions/checkout@v4

      - name: Update prompt config JSON
        if: success()
        env:
          DEPLOYMENT_ID: ${{ steps.deployment.outputs.deployment_id }}
        run: |
          jq --arg id "$DEPLOYMENT_ID" '.response_generator.deployment_id = $id' config/prompts.json > tmp && mv tmp config/prompts.json

      - name: Commit and push
        if: success()
        run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add config/prompts.json
          git commit -m "chore: promote prompt deployment (evals passed)"
          git push

      - name: Trigger release pipeline
        if: success()
        run: |
          echo "Trigger release (e.g. workflow_dispatch or API call)"
The evaluation step above includes placeholder score values. In a production workflow, you would poll GET .../evaluations/$EVAL_ID until status is completed, then parse the evaluation results to extract real per-evaluator scores.

When evaluation fails

When the threshold check fails, the job exits with exit 1 and the Checkout, Update prompt config, Commit and push, and Trigger release steps are all skipped. The ::error:: message surfaces in your GitHub Actions dashboard so the team knows exactly which evaluator failed and by how much. Fix the prompt or thresholds in Adaline and redeploy to try again.

GitLab CI

Use a webhook-triggered pipeline or a scheduled job that calls the Adaline API. The same API calls and threshold logic apply — fetch the deployment, run evaluations, check scores, and gate the release.
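A minimal sketch of what that looks like in .gitlab-ci.yml. The job name, stage, image, and trigger rule are illustrative choices, and the API calls mirror the curl commands shown earlier; the create/poll/parse steps are elided to a comment:

```yaml
# Illustrative gate job; ADALINE_* values come from GitLab CI/CD variables.
prompt_gate:
  stage: test
  image: alpine:latest
  before_script:
    - apk add --no-cache curl jq bc
  script:
    - |
      RESP=$(curl -sS -H "Authorization: Bearer $ADALINE_API_KEY" \
        "https://api.adaline.ai/v2/deployments?promptId=${ADALINE_PROMPT_ID}&deploymentId=latest&deploymentEnvironmentId=${ADALINE_DEPLOYMENT_ENV_ID}")
      DEPLOYMENT_ID=$(echo "$RESP" | jq -r '.id')
      # Create the evaluation, poll until complete, and fetch results
      # (same curl calls as above); HELPFULNESS would be parsed from
      # the results payload. Then gate the job:
      if [ "$(echo "$HELPFULNESS < 0.7" | bc -l)" = "1" ]; then
        echo "Eval failed: helpfulness=$HELPFULNESS (min 0.7)"
        exit 1
      fi
  rules:
    - if: '$CI_PIPELINE_SOURCE == "trigger"'
```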

Jenkins

Trigger a Jenkins pipeline via a webhook endpoint or a polling job. Use curl in shell steps to call the Adaline API, then parse evaluation results and set the build status based on threshold checks.

Other platforms

Any CI/CD platform that supports webhooks or scheduled triggers and can make HTTP requests works with Adaline’s evaluation API. The integration pattern is always the same: receive trigger, call API, check scores, gate release.

Next steps

Configure webhooks

Set up real-time deployment notifications for your CI pipeline.

Create evaluation API

API reference for triggering evaluations programmatically.

Set up evaluators

Configure the evaluators that power your CI/CD gate.