Response-level LLM guardrail

Verify the answer · gate the action

Catch the hallucination before it ships.

Director-AI scores every answer against retrieved evidence through five tiers, from rules and heuristics to embeddings and NLI, and can halt unsupported output at the token level (opt-in). The commercial edition extends the same gate to an agent's real-world actions.

14.6 ms verify latency, NLI tier
75.8% balanced accuracy, LLM-AggreFact
<0.5 ms heuristic tier
Free Lite tier on PyPI

Try the gate

Verify a claim, or gate a risky action. Watch it pass or halt.

An illustrative, client-side version of the gate logic — production uses real NLI scoring and policy rules. Nothing you type leaves your browser.

Try editing the claim — drop “audited by KPMG” and it passes.

Three editions

Start free. Grow into output, then action control.

One guardrail, three commitments — drop in a three-line guard, run the full open core, or gate an autonomous agent's real-world actions.

01 / entry

Director-Lite

A three-line guard facade with a model-free heuristic default and a facts/RAG handoff. Optional NLI upgrade when you need it.

Free · pip install director-ai-lite

What you get →

02 / output

Director-AI

The open-core runtime: 5-tier scoring, REST/gRPC server, SDK and framework integrations, evidence packets, and the opt-in token-level halt.

Free core · commercial licence to ship closed-source

Compare licences →

03 / action

Director-Class-AI

Runtime action control. Reviews high-impact shell, SQL, infrastructure, API, and MCP actions before dispatch; routes high-risk ones to human approval — then pairs with HushLine to contain the execution: secrets redacted from output, directories gated.

Commercial engagement · custom

Talk to us →

Quickstart

Three lines between your model and a hallucination.

pip install director-ai

from director_ai import guard

verdict = guard(answer, evidence)
if not verdict.supported:
    answer = verdict.safe_fallback   # halt the unsupported claim

guard() returns a verdict with the score, the tier that decided it, and the evidence behind it — so every decision is auditable.
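As a rough illustration of what an auditable verdict could carry, the sketch below models the three things named above (score, deciding tier, evidence) as a plain dataclass. The class name `SketchVerdict`, the `audit_record` helper, the 0.5 threshold and the fallback wording are all assumptions made for this example. Only `supported` and `safe_fallback` appear in the quickstart, and this is not Director-AI's actual type.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SketchVerdict:
    """Hypothetical stand-in for a guard verdict; not the library's real class."""

    score: float                      # support score, assumed to lie in [0, 1]
    tier: str                         # which tier decided, e.g. "heuristic" or "nli"
    evidence: list[str] = field(default_factory=list)  # sources behind the score
    threshold: float = 0.5            # assumed cut-off, chosen for illustration
    safe_fallback: str = "I can't verify that from the available sources."

    @property
    def supported(self) -> bool:
        # A claim counts as supported once the score reaches the threshold.
        return self.score >= self.threshold

    def audit_record(self) -> str:
        # Serialise every field plus the decision so it can be logged as-is.
        record = asdict(self)
        record["supported"] = self.supported
        return json.dumps(record, sort_keys=True)


verdict = SketchVerdict(score=0.2, tier="nli", evidence=["Annual report, p. 4"])
print(verdict.audit_record())
```

Keeping the decision, the score and the sources in one record is what makes a verdict replayable later: a reviewer can see which tier decided and on what basis.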

Point it at Remanentia recall to check answers against your own indexed sources, not just the prompt.

All integration surfaces
Works with any model: OpenAI, Anthropic, or a local model. The guard sits beside your model; it doesn't replace it.
Runs where your agent runs: SDK, middleware, or sidecar. Self-hosted and local-first by default, no mandatory SaaS.
Your data stays local: evidence and prompts never have to leave your network to be checked.
Open source on PyPI · 8,253 tests · 12 Rust accelerators · scored on LLM-AggreFact · v3.15.3

How the gate works

Five tiers, cheapest first. Stop as soon as you can.

Each answer climbs only as far as it needs to. Rules and heuristics settle the easy cases for free; embeddings and NLI handle the rest. The opt-in streaming halt cuts an unsupported claim mid-stream, at the token level, instead of after it ships.

1. Rules & format checks reject malformed or policy-violating output for free.
2. Heuristics score claims against compiled facts without a model call.
3. Embeddings measure semantic support from the retrieved evidence.
4. NLI decides entailment vs contradiction on the claims that are still uncertain.
5. The streaming halt stops an unsupported answer at the token level, before it leaves.
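The cheapest-first idea above can be sketched as a cascade with early exit: each tier either returns a score or declines, and the first tier that decides ends the run. Everything here is a toy under stated assumptions. The tier functions, their names, the 0.9/0.1 cut-offs and the neutral 0.5 placeholder are invented for illustration and stand in for the real rule, heuristic and model tiers.

```python
from typing import Callable, Optional

# A tier returns a score in [0, 1], or None when it cannot decide.
Tier = Callable[[str, str], Optional[float]]


def rules_tier(answer: str, evidence: str) -> Optional[float]:
    # Toy rule: an empty answer is rejected outright, at no model cost.
    return 0.0 if not answer.strip() else None


def overlap_tier(answer: str, evidence: str) -> Optional[float]:
    # Toy heuristic: share of answer words that also appear in the evidence.
    words = answer.lower().split()
    if not words:
        return None
    known = set(evidence.lower().split())
    share = sum(word in known for word in words) / len(words)
    # Decide only when the signal is clear; otherwise defer to a later tier.
    return share if share >= 0.9 or share <= 0.1 else None


def model_tier(answer: str, evidence: str) -> Optional[float]:
    # Placeholder for the expensive embedding/NLI tiers: always decides,
    # here with a neutral score because no model is loaded in this sketch.
    return 0.5


TIERS: list[tuple[str, Tier]] = [
    ("rules", rules_tier),
    ("heuristic", overlap_tier),
    ("model", model_tier),
]


def run_cascade(answer: str, evidence: str) -> tuple[str, float]:
    # Walk the tiers in cost order and stop at the first one that decides.
    for name, tier in TIERS:
        score = tier(answer, evidence)
        if score is not None:
            return name, score
    raise ValueError("no tier produced a decision")


print(run_cascade("the sky is blue", "the sky is blue today"))
print(run_cascade("revenue rose sharply", "revenue fell last year"))
```

The first call is settled by the heuristic and never reaches the model tier; the second is ambiguous on word overlap alone, so it falls through to the last tier.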

No marketing math

Most guardrails sell a “99% reliability” number. We publish the command.

Director-AI's headline is a named public benchmark — balanced accuracy on LLM-AggreFact — not a round figure with no method behind it. Every number ships with the script that produced it, and the streaming halt is labelled opt-in, not sold as a silent guarantee.

A named public benchmark (LLM-AggreFact) — not a private “reliability” score.
A reproducibility command beside every number.
The opt-in halt is marked opt-in, not a guarantee.
Latency you can verify: 14.6 ms NLI, <0.5 ms heuristic.
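For readers unfamiliar with the headline metric, balanced accuracy on a binary task is the mean of the recall on each class, so a checker cannot score well by labelling everything "supported". The function below is a minimal sketch of that standard definition; it is not the project's evaluation script, and the sample labels are made up for the example.

```python
def balanced_accuracy(y_true: list[bool], y_pred: list[bool]) -> float:
    """Mean of true-positive rate and true-negative rate for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    true_pos = sum(t and p for t, p in zip(y_true, y_pred))
    true_neg = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    return (true_pos / positives + true_neg / negatives) / 2


# Made-up labels: four supported claims, one unsupported.
truth = [True, True, True, True, False]
always_pass = [True, True, True, True, True]

plain = sum(t == p for t, p in zip(truth, always_pass)) / len(truth)
print(plain)                                  # plain accuracy: 0.8
print(balanced_accuracy(truth, always_pass))  # balanced accuracy: 0.5
```

A checker that passes everything gets 80% plain accuracy on this skewed sample but only 50% balanced accuracy, which is why the balanced figure is the more honest one to publish.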

How it compares

Where Director-AI is different.

Generalised from the public positioning of hallucination-guardrail vendors as of June 2026. Specifics vary by plan.

 | Director-AI | Typical vendor
Verify latency | 14.6 ms NLI · <0.5 ms heuristic | 50–200 ms inline
Headline metric | Named public benchmark + repro command | Private “reliability %”
Pricing | Public, one-click checkout | “Contact sales”
Deployment | Local-first, self-hosted | SaaS-first
Source | Open core on PyPI | Closed
Entry | Free (Lite) or CHF 290/yr | Demo-gated

What ships in the core

A guardrail you can run anywhere your agent runs.

guard() SDK

One call wraps any generation path and returns a verdict with the supporting evidence.

REST & gRPC server

Run the guard as a sidecar service; FastAPI middleware drops it into existing apps.

Evidence packets

Every verdict carries the sources and scores behind it, so a decision is auditable.

Injection detection

Catches prompt-injection attempts in tool output and retrieved content.

Agent & MCP preflight

Checks an agent's planned tool call before it runs, not after.
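A preflight check of this kind can be pictured as a function that inspects a planned call and returns a routing decision before anything is dispatched. The sketch below is a deliberately small illustration: `PlannedCall`, `preflight`, the two decision strings and the regex patterns are hypothetical names and rules chosen for this example, not the product's policy engine.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedCall:
    kind: str      # e.g. "shell" or "sql"
    command: str   # the exact text the agent intends to run


# Illustrative patterns only; a real policy would be far more complete.
HIGH_RISK = {
    "shell": re.compile(r"\brm\s+-rf\b|\bmkfs\b"),
    "sql": re.compile(
        r"\b(drop|truncate)\s+table\b|\bdelete\s+from\b(?!.*\bwhere\b)",
        re.IGNORECASE,
    ),
}


def preflight(call: PlannedCall) -> str:
    """Return "allow" or "needs_approval" for a planned call."""
    pattern = HIGH_RISK.get(call.kind)
    if pattern is None:
        # Fail closed: action kinds without a policy go to a human.
        return "needs_approval"
    if pattern.search(call.command):
        return "needs_approval"
    return "allow"


print(preflight(PlannedCall("sql", "SELECT * FROM users")))
print(preflight(PlannedCall("sql", "DROP TABLE users")))
```

The design choice worth noting is the default: an action kind the policy does not recognise is routed to approval rather than allowed, so gaps in the rules err on the side of a human review.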

Rust accelerators

12 Rust-accelerated compute paths keep the hot scoring loops fast.

Ready to ship

Try it on PyPI. Buy the licence when you go closed-source.