Python SDK

The Python SDK is published as the redraven package and supports Python 3.10+.

Install

Use uv:

uv add redraven

Or with pip:

pip install redraven

Configure

Set your Redraven credentials:

export REDRAVEN_API_KEY="rr_..."
export REDRAVEN_ORGANIZATION_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export REDRAVEN_BASE_URL="https://app.redraven.fireraven.ai"

Pass only the host URL. The SDK automatically adds the /api/v1 prefix.
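As a rough illustration of that rule, the sketch below joins a host-only base URL with the /api/v1 prefix and a route path. The helper name api_url is hypothetical and not part of the redraven package. It only mirrors the behaviour described above, and is mainly useful when you call the HTTP routes by hand.

```python
API_PREFIX = "/api/v1"  # prefix the SDK is documented to add for you


def api_url(base_url: str, path: str) -> str:
    """Join a host-only base URL with the API prefix and a route path.

    Hypothetical helper for illustration; not part of the redraven package.
    """
    host = base_url.rstrip("/")
    if host.endswith(API_PREFIX):
        # Guard against including the prefix in the base URL by mistake.
        raise ValueError("base_url should be the host only, without /api/v1")
    return f"{host}{API_PREFIX}/{path.lstrip('/')}"


print(api_url("https://app.redraven.fireraven.ai/", "/tests/abc/results"))
# https://app.redraven.fireraven.ai/api/v1/tests/abc/results
```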

The organization ID is the UUID of your Redraven organization (the same scope as the API key). The public API requires X-Organization-Id together with X-API-Key so keys are verified per organization without scanning all keys in the database.

Get the API key and organization ID from your Redraven organization settings.
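Before constructing a client, it can help to check that all three variables are actually set. The following is a minimal sketch using only the standard library. The function name missing_credentials is an assumption made for this example, and the SDK's own validation (which raises RedravenConfigError) may behave differently.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = (
    "REDRAVEN_API_KEY",
    "REDRAVEN_ORGANIZATION_ID",
    "REDRAVEN_BASE_URL",
)


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_credentials()
if missing:
    print("Missing Redraven settings:", ", ".join(missing))
```

Running this check at startup gives a clearer message than a failed request later in the run.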

If you pass credentials directly, include organization_id:

client = redraven.Client(
    api_key="rr_...",
    organization_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    base_url="https://app.redraven.fireraven.ai",
)

Core methods

Typical sequence

Redraven runs in three phases. Do not call wait_for_evaluation_ready, or expect a finished eval, until call_agent has returned. That return means every case response has been submitted; only then can the backend start, and eventually finish, scoring.

1. Dataset ready: create or generate the test and wait until dataset artifacts exist, using generate_test(..., wait_for_dataset=...) and/or wait_for_dataset_ready(test_id).
2. Agent phase: run your LLM on each case and submit responses with call_agent(test_id, llm, ...) -> RunAgainstClient. When this await completes, the client-response phase is done and evaluation can proceed on the server.
3. Evaluation phase: scoring runs asynchronously. Either wait then read, or block in one call:
- Wait: wait_for_evaluation_ready(test_id). Call it after step 2; it blocks until eval artifacts reach a terminal state.
- Read: get_eval_summary(test_id, ...). Call it after waiting if you used wait_for_evaluation_ready, or pass wait_for_completion=True for a single call that waits and then returns the summary.

By default, get_eval_summary without waiting performs a single GET (like generate_test without wait_for_dataset). If eval is not ready yet, you get state="pending" (or another non-terminal state) and summary=None.
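To make the non-blocking behaviour concrete, here is a sketch of a polling loop built on a single-read call. It is not the SDK's implementation. The name poll_until_terminal and the fetch_state callback are hypothetical, the terminal state names are assumed to match the eval manifest states (completed / failed), and the built-in TimeoutError stands in for RedravenTimeoutError.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: summary states use the same terminal names as the eval manifest.
TERMINAL_STATES = {"completed", "failed"}


async def poll_until_terminal(
    fetch_state: Callable[[], Awaitable[str]],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> str:
    """Call fetch_state() until it reports a terminal state or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        state = await fetch_state()
        if state in TERMINAL_STATES:
            return state
        if loop.time() + interval > deadline:
            # The next sleep would overshoot the deadline, so give up now.
            raise TimeoutError(f"eval still {state!r} after {timeout}s")
        await asyncio.sleep(interval)
```

With a real client, fetch_state could be a small coroutine that returns (await client.get_eval_summary(test_id)).state. Since wait_for_completion=True already covers the common case, a loop like this is mostly useful when you want custom backoff or logging between reads.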

All-at-once convenience: generate_and_run_test(generate_kwargs, llm, ...) chains dataset generation, call_agent, waiting for eval, and returning a terminal EvalSummary.

Method reference

Method	Role
`generate_test(generate_kwargs, wait_for_dataset=False, ...) -> str`	Returns `test_id`; optionally waits for dataset materialization.
`wait_for_dataset_ready(test_id, ...) -> None`	Blocks until dataset manifest is ready.
`call_agent(test_id, llm, *, resume=True, ...) -> RunAgainstClient`	Submits all case responses; must complete before eval can finish. With `resume=True` (default) cases that already have an `ok` or `failed` response on the server are skipped; pass `resume=False` to re-run every case. In image-mode tests, cases can include `messages` payloads; declare `messages` in your LLM signature to receive multimodal inputs.
`wait_for_evaluation_ready(test_id, ...) -> None`	Waits for server-side scoring to finish (eval manifest `completed` / `failed`), not for `call_agent`. “Agent done” = `await call_agent` has returned; this is the next phase. On entry it calls `ensure_evaluation_from_client_responses` once so a stuck run can schedule repair after `resume` skipped all POSTs.
`ensure_evaluation_from_client_responses(test_id, *, force=False)`	Asks the backend to insert or repair the pipeline job that builds eval artifacts from submitted client responses. Rarely needed directly; used internally before waiting for eval.
`get_eval_summary(test_id, wait_for_completion=False, ...) -> EvalSummary`	Reads eval summary; non-blocking by default, or pass `wait_for_completion=True` to wait inside this call.
`generate_and_run_test(generate_kwargs, llm, ...) -> EvalSummary`	End-to-end: generate → dataset → agent → wait for eval → summary.

Reading results via HTTP (without SDK helpers)

After evaluation, you can read metrics, recommendations, and the PDF report with plain HTTP (curl or httpx), using the same credentials as above:

HTTP route	Role
`GET /tests/{id}/results`	Dashboard-style pass rates by certification and policy (`pass_rate` as 0.0–1.0)
`GET /tests/{id}/recommendations`	Stored policy recommendations; `limit=0` exports all (`pass_rate` typically 0–100)
`GET /tests/{id}/report/download`	PDF report bytes (after generate + status ready)

Do not confuse these with the SDK eval pipeline:

get_eval_summary and GET /tests/{id}/results/summary?kind=eval — pipeline manifest / worker materialization
GET /tests/{id}/results — stable metrics overview for integrations and exports
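Because the two read routes report pass_rate on different scales (a fraction for /results, typically 0 to 100 for /recommendations), it is easy to mix them up when exporting. The sketch below converts both to a percentage. The function name and the route labels are made up for this example, and the recommendations scale is taken from the "typically" wording above rather than from a guarantee.

```python
def pass_rate_percent(value: float, route: str) -> float:
    """Convert a pass_rate to a 0-100 percentage based on the route it came from."""
    if route == "results":
        # GET /tests/{id}/results reports a fraction between 0.0 and 1.0.
        return value * 100.0
    if route == "recommendations":
        # GET /tests/{id}/recommendations typically reports 0-100 already.
        return float(value)
    raise ValueError(f"unknown route: {route!r}")


print(pass_rate_percent(0.5, "results"))         # 50.0
print(pass_rate_percent(50, "recommendations"))  # 50.0
```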

The Python SDK does not yet expose typed methods for the three read routes above. Example with httpx:

import os
import httpx

base = os.environ["REDRAVEN_BASE_URL"].rstrip("/")
headers = {
    "X-API-Key": os.environ["REDRAVEN_API_KEY"],
    "X-Organization-Id": os.environ["REDRAVEN_ORGANIZATION_ID"],
}
test_id = "..."

overview = httpx.get(f"{base}/api/v1/tests/{test_id}/results", headers=headers, timeout=30)
overview.raise_for_status()

Pipeline worker and local development

Materializing eval results (GET /tests/{id}/results/summary?kind=eval) is done by the redraven-pipeline-worker process against Postgres (pipeline_jobs). The API and worker must use the same DATABASE_URL; otherwise jobs enqueue in one database while the worker polls another and nothing appears to run.

If every client response is already in object storage and the client manifest is completed, but evaluation never starts, run the worker and retry: wait_for_evaluation_ready triggers POST /tests/{id}/evaluate/ensure-from-client-responses first so the evaluate job can be created or moved back to queued when appropriate. Until the worker processes that job, GET …/summary?kind=eval returns HTTP 404 — that is normal and does not mean the ensure call failed.
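When polling the summary route by hand, the 404 described above is best treated as "not ready yet" rather than as a failure. A minimal sketch of that interpretation follows. The helper name summary_ready is hypothetical, and treating any 2xx status as "ready" is an assumption.

```python
def summary_ready(status_code: int) -> bool:
    """Interpret the status of GET .../results/summary?kind=eval while polling.

    Returns True when the summary exists, False when the worker has not
    produced it yet, and raises for any other status.
    """
    if 200 <= status_code < 300:
        return True
    if status_code == 404:
        # Documented as normal until the worker has processed the evaluate job.
        return False
    raise RuntimeError(f"unexpected status {status_code} from summary route")


print(summary_ready(404))  # False
print(summary_ready(200))  # True
```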

Run the worker from the same virtualenv / env as the API, for example: uv run redraven-pipeline-worker.

Image mode and multimodal LLMs

When a test is created with image mode (metadata.modes.image=true), dataset cases can include OpenAI-style messages content (text and/or image blocks). The SDK supports both text-only and multimodal callables:

Text-only callable (works for all tests):
- def llm(prompt: str) -> str
Multimodal callable (recommended for image understanding):
- def llm(prompt: str, messages: list[dict] | None = None) -> str

If your callable does not accept messages, the SDK forwards only prompt and logs a warning when multimodal payloads are present. For image-only rows with no text, the SDK now uses a safe placeholder prompt so runs can still complete and evaluation can be enqueued.

Example multimodal signature:

def my_llm(prompt: str, messages: list[dict] | None = None) -> str:
    payload = messages or [{"role": "user", "content": prompt}]
    # call your provider with payload
    return "..."
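The forwarding rule described in this section (pass messages only to callables that declare it, otherwise warn and send the prompt alone) can be approximated with inspect.signature. This is a sketch of the idea, not the SDK's actual code. The names accepts_messages and call_llm are hypothetical, and the sketch handles sync callables only.

```python
import inspect
import logging
from typing import Any, Callable, List, Optional

log = logging.getLogger("multimodal-dispatch-sketch")


def accepts_messages(llm: Callable[..., Any]) -> bool:
    """True if the callable declares a parameter named `messages`."""
    return "messages" in inspect.signature(llm).parameters


def call_llm(llm: Callable[..., Any], prompt: str, messages: Optional[List[dict]]) -> Any:
    """Forward `messages` only when the callable can receive it."""
    if accepts_messages(llm):
        return llm(prompt, messages=messages)
    if messages:
        log.warning("multimodal payload present, but callable has no `messages` parameter")
    return llm(prompt)
```

A text-only def llm(prompt) therefore keeps working on image-mode tests, at the cost of never seeing the image blocks.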

Quickstart

import asyncio
import redraven

async def my_llm(prompt: str, messages: list[dict] | None = None) -> str:
    # Call your own LLM provider here.
    _ = messages
    return f"echo: {prompt}"

async def main():
    async with redraven.Client() as client:
        handshake = await client.call_agent(
            test_id="<your-existing-test-id>",
            llm=my_llm,
            concurrency=4,
            retries=2,
        )
        await client.wait_for_evaluation_ready(test_id="<your-existing-test-id>")
        result = await client.get_eval_summary(
            test_id="<your-existing-test-id>",
            expected_cases=handshake.expected_cases,
            allow_partial=True,
        )
        print(f"state={result.state} received={result.received} failed={result.failed}")

asyncio.run(main())

Prepare a test first

Create a test in the Redraven app (or via the SDK), then use its test_id with call_agent(...) and get_eval_summary(...).

Generate tests with the SDK

test_id = await client.generate_test(
    generate_kwargs={
        "project_id": "11111111-1111-1111-1111-111111111111",
        "test_name": "SDK generated test",
        "business_context": "Healthcare SaaS for clinicians.",
        "use_case": "Symptom triage assistant.",
        "certifications": ["HIPAA"],
        "max_policies": 5,
        "max_prompts_per_policy": 2,
    },
    wait_for_dataset=True,
)

generate_test(...) returns the created test_id (string).

You can also run an explicit wait step:

await client.wait_for_dataset_ready(test_id)

Run an existing test

handshake = await client.call_agent(
    test_id=test_id,
    llm=my_llm,
    concurrency=8,
)
await client.wait_for_evaluation_ready(test_id=test_id)
result = await client.get_eval_summary(
    test_id=test_id,
    expected_cases=handshake.expected_cases,
)

Alternatively, without a separate wait call:

result = await client.get_eval_summary(
    test_id=test_id,
    expected_cases=handshake.expected_cases,
    wait_for_completion=True,
)

Resuming an interrupted agent run

call_agent is resumable by default: cases that already have a terminal (ok or failed) response on the server are skipped and the user LLM is not invoked for them. Just call it again with the same test_id:

handshake = await client.call_agent(test_id=test_id, llm=my_llm)

Force a full re-run with resume=False:

handshake = await client.call_agent(test_id=test_id, llm=my_llm, resume=False)
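The skip rule behind resume can be pictured as a simple filter over case IDs. The sketch below is illustrative only: cases_to_run and the server_status mapping are hypothetical names, and the SDK may obtain and represent per-case status differently.

```python
from typing import Dict, List

TERMINAL_RESPONSE_STATUSES = {"ok", "failed"}


def cases_to_run(
    case_ids: List[str],
    server_status: Dict[str, str],
    resume: bool = True,
) -> List[str]:
    """Return the cases that still need an LLM call.

    With resume=True, cases whose server-side response is already `ok` or
    `failed` are skipped; with resume=False every case is re-run.
    """
    if not resume:
        return list(case_ids)
    return [c for c in case_ids if server_status.get(c) not in TERMINAL_RESPONSE_STATUSES]


status = {"c1": "ok", "c2": "failed"}
print(cases_to_run(["c1", "c2", "c3"], status))                # ['c3']
print(cases_to_run(["c1", "c2", "c3"], status, resume=False))  # ['c1', 'c2', 'c3']
```

Note that failed counts as terminal here, so a resumed run does not retry failed cases. Use resume=False if you want them re-run.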

One-call flow (generate + run)

result = await client.generate_and_run_test(
    generate_kwargs={
        "project_id": "11111111-1111-1111-1111-111111111111",
        "test_name": "SDK generated test",
        "business_context": "Healthcare SaaS for clinicians.",
        "use_case": "Symptom triage assistant.",
        "certifications": ["HIPAA"],
        "max_policies": 5,
        "max_prompts_per_policy": 2,
    },
    llm=my_llm,
    concurrency=8,
)

Result fields

After the eval has reached a terminal state, get_eval_summary(...) returns an EvalSummary with:

state, expected_cases, received, failed, failed_case_ids
summary (aggregated evaluation output)
manifest (evaluation trace metadata)

If you call get_eval_summary without waiting and the eval is not ready yet, summary may be None and state may be non-terminal (for example pending).

Common errors

RedravenConfigError: missing API key, organization ID, or base URL
RedravenHTTPError: backend returned non-2xx response
RedravenTimeoutError: raised when wait_for_evaluation_ready(...) or get_eval_summary(..., wait_for_completion=True) does not see a terminal eval state within the timeout
RedravenPartialRunError: raised when allow_partial=False and the eval summary reports failed cases (only applies once the eval has reached a terminal state)
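The condition for the partial-run error can be written as a small predicate. This sketch uses a local stand-in exception instead of importing RedravenPartialRunError, and it assumes the terminal state names completed and failed. Both are assumptions made for illustration.

```python
class PartialRunError(Exception):
    """Local stand-in for RedravenPartialRunError, for illustration only."""


def check_partial(state: str, failed: int, allow_partial: bool) -> None:
    """Raise if a terminal eval reports failed cases and partial runs are not allowed."""
    terminal = state in {"completed", "failed"}  # assumed terminal state names
    if terminal and failed > 0 and not allow_partial:
        raise PartialRunError(f"{failed} case(s) failed")


check_partial("completed", failed=2, allow_partial=True)  # no error
check_partial("pending", failed=2, allow_partial=False)   # no error: not terminal yet
```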

Resumability

If a run is interrupted, call call_agent(...) again with the same test_id. Previously submitted cases are safely reused by the backend.

Notes

The LLM call runs in your process, so your model API key stays local.
my_llm can be sync or async.
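One common way to support both kinds of callable is to call it and await the result only if it is awaitable. The sketch below shows that pattern. The helper name call_sync_or_async is hypothetical, and the SDK may handle this differently (for example by running sync callables in a thread).

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_sync_or_async(llm: Callable[..., Any], prompt: str) -> Any:
    """Invoke llm(prompt) and await the result if it is awaitable."""
    result = llm(prompt)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_llm(prompt: str) -> str:
    return f"sync: {prompt}"


async def async_llm(prompt: str) -> str:
    return f"async: {prompt}"


print(asyncio.run(call_sync_or_async(sync_llm, "hi")))   # sync: hi
print(asyncio.run(call_sync_or_async(async_llm, "hi")))  # async: hi
```

In this sketch a slow sync callable blocks the event loop while it runs, so an async callable is the safer choice when you rely on concurrency.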
