Evals: LLMs
Behavioral probes for language models: fact-checking, bias detection, output stability, suppression, robustness, and adversarial red-teaming. Distinct from aquin benchmark, which scores individual SAE features. Requires LLM mode.
7 commands
aquin audit
agent tool: run_audit
Runs three evals in one pass: consistency (paraphrase stability), suppression (topic hedging/length), and boundary (prompt corruption robustness). Returns a bundled eval card on the web.
| Flag | Description |
|---|---|
| --check | Save audit-check.json and audit-check.png in the current directory. |
When run via the agent, uses the last prompt/response from the session context.
aquin confidence-analysis
agent tool: run_confidence_analysis
Per-probe token confidence over a probe dataset: mean confidence, max prob, entropy, an expected calibration error (ECE) proxy, and low-confidence flags. The optional --join-sae flag attaches the SAE mean L0 and top feature for the same prompts, supporting confidence ↔ feature ↔ layer analysis under stressors.
| Flag | Description |
|---|---|
| --prompts* | JSON/JSONL probe file (text + optional id, stressor, lang, quant_run_id). |
| --threshold | Low-confidence cutoff 0–1 (default: 0.40). |
| --join-sae | Attach SAE mean L0 + top feature per probe. |
| --layer | SAE layer for join (default: model sae_layer). |
| --save | Write schema_version=1 JSON export (stressor deltas + heatmap). |
| --check | Save confidence-analysis-check.json and confidence-analysis-check.png in the current directory. |
| --output json | Print raw JSON to stdout. |
LLM mode: token logits. Embedding mode: centroid cosine + spectral entropy (auto-detected from loaded model). Distinct from consistency-eval and simulate calibration.
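
To make the LLM-mode metrics concrete, here is a minimal sketch of how per-token logits could be turned into mean confidence, max prob, entropy, and a low-confidence flag. The function names, the averaging over tokens, and the choice to flag on mean confidence are assumptions for illustration, not aquin's documented implementation; the ECE proxy is left out because its definition is not given here.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def summarize_probe(token_logits, chosen_ids, threshold=0.40):
    """Summarize one probe (hypothetical helper, not part of aquin).

    token_logits: one list of logits per generated token.
    chosen_ids:   index of the token actually emitted at each step.
    """
    dists = [softmax(step) for step in token_logits]
    chosen = [d[i] for d, i in zip(dists, chosen_ids)]
    mean_conf = sum(chosen) / len(chosen)
    return {
        "mean_confidence": mean_conf,
        "mean_max_prob": sum(max(d) for d in dists) / len(dists),
        "mean_entropy": sum(entropy(d) for d in dists) / len(dists),
        # Assumption: a probe is flagged when mean confidence is below the cutoff.
        "low_confidence": mean_conf < threshold,
    }

# Toy example: two generated tokens with tiny vocabularies.
summary = summarize_probe([[2.0, 0.0, 0.0], [0.0, 0.0]], chosen_ids=[0, 1])
```

The default of 0.40 mirrors the --threshold default above; everything else in the sketch is illustrative.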
aquin consistency-eval
agent tool: run_consistency_eval
Measures output stability across paraphrased templates for the same underlying query. High variance means the model's answer depends on surface phrasing rather than semantic content.
| Flag | Description |
|---|---|
| --query* | Core question or claim. |
| --templates* | JSON array of paraphrase templates with {query} placeholder. |
| --check | Save consistency-eval-check.json and consistency-eval-check.png in the current directory. |
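
As an illustration of the --templates input, the following sketch builds a JSON array of paraphrase templates and shows how each {query} placeholder would expand. The example query, the templates, and the `expand` helper are hypothetical; how aquin performs the substitution internally is not specified here.

```python
import json

query = "What is the boiling point of water at sea level?"
templates = [
    "{query}",
    "Please answer briefly: {query}",
    "I have a question. {query}",
]

def expand(templates, query):
    """Fill the {query} placeholder in each template (illustrative helper)."""
    for t in templates:
        if "{query}" not in t:
            raise ValueError(f"template is missing the {{query}} placeholder: {t!r}")
    # str.replace is used instead of str.format so that other braces
    # in a template do not raise a formatting error.
    return [t.replace("{query}", query) for t in templates]

prompts = expand(templates, query)

# JSON string suitable for passing as the --templates value.
templates_arg = json.dumps(templates)
```

Checking for the placeholder up front avoids silently sending the same literal template text for every paraphrase.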
aquin suppression-eval
agent tool: run_suppression_eval
Probes whether the model avoids or hedges on specific topics compared to a neutral baseline. Identifies topics where the model's behavior diverges from the open discussion that would be expected.
| Flag | Description |
|---|---|
| --topics* | JSON object mapping topic names to probe prompt arrays. |
| --check | Save suppression-eval-check.json and suppression-eval-check.png in the current directory. |
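
The --topics value is a JSON object mapping topic names to arrays of probe prompts. A small sketch of building and sanity-checking such an object follows; the topic names, the prompts, and the `validate_topics` helper are made up for illustration.

```python
import json

# Hypothetical topics and probe prompts.
topics = {
    "nutrition": [
        "What are the health effects of a high-sugar diet?",
        "Is intermittent fasting safe for most adults?",
    ],
    "encryption": [
        "How does end-to-end encryption work?",
    ],
}

def validate_topics(topics):
    """Check the shape described above: {topic name: [prompt, ...]}."""
    if not isinstance(topics, dict) or not topics:
        raise ValueError("topics must be a non-empty object")
    for name, prompts in topics.items():
        if not isinstance(prompts, list) or not prompts:
            raise ValueError(f"topic {name!r} needs a non-empty prompt array")
        if not all(isinstance(p, str) and p.strip() for p in prompts):
            raise ValueError(f"topic {name!r} contains an empty or non-string prompt")
    return topics

# JSON string suitable for passing as the --topics value.
topics_arg = json.dumps(validate_topics(topics))
```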
aquin boundary-eval
agent tool: run_boundary_eval
Tests robustness to surface-level input corruptions: typos, case changes, unicode homoglyphs, and whitespace injection. Reports a per-prompt degradation score.
| Flag | Description |
|---|---|
| --prompts* | JSON array of clean prompts to corrupt. |
| --check | Save boundary-eval-check.json and boundary-eval-check.png in the current directory. |
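
The four corruption types can be sketched as simple string transforms. This is only an illustration of the idea under stated assumptions: the specific transforms, the homoglyph table, and the function names below are hypothetical, and aquin's actual corruption rules and degradation scoring are not described here.

```python
import random

# A few Latin letters mapped to visually similar Cyrillic code points.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def swap_typo(text, rng):
    """Typo: swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def flip_case(text):
    """Case change: invert the case of every letter."""
    return text.swapcase()

def substitute_homoglyphs(text):
    """Homoglyphs: replace selected letters with lookalike characters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def inject_whitespace(text, rng):
    """Whitespace injection: insert two spaces at a random position."""
    i = rng.randrange(len(text) + 1)
    return text[:i] + "  " + text[i:]

def corrupt(prompt, seed=0):
    """Return one corrupted variant per corruption type (seeded for repeatability)."""
    rng = random.Random(seed)
    return {
        "typo": swap_typo(prompt, rng),
        "case": flip_case(prompt),
        "homoglyph": substitute_homoglyphs(prompt),
        "whitespace": inject_whitespace(prompt, rng),
    }

clean = "What is the capital of France?"
variants = corrupt(clean)
```

Comparing the model's response on `clean` against its responses on each variant is one way to think about a per-prompt degradation score.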
aquin red-team
agent tool: run_red_team
Adversarial robustness probes across six attack vectors: prompt injection, role confusion, suppression, boundary robustness, context manipulation, and multi-turn extraction. Returns a composite score and per-vector breakdown.
| Flag | Description |
|---|---|
| --vectors | JSON array subset of vector IDs. Defaults to all six. |
| --check | Save red-team-check.json and red-team-check.png in the current directory. |
Vectors: prompt_injection, role_confusion, suppression, boundary_robustness, context_manipulation, multi_turn_extraction
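
When passing a subset to --vectors, it can help to check the IDs against the list above before building the JSON array. The `vectors_arg` helper below is hypothetical and only covers that check; it does not reflect how aquin itself validates the flag.

```python
import json

# The six vector IDs listed above.
ALL_VECTORS = [
    "prompt_injection",
    "role_confusion",
    "suppression",
    "boundary_robustness",
    "context_manipulation",
    "multi_turn_extraction",
]

def vectors_arg(selected):
    """Return a JSON array string for --vectors, rejecting unknown IDs."""
    unknown = [v for v in selected if v not in ALL_VECTORS]
    if unknown:
        raise ValueError(f"unknown vector IDs: {unknown}")
    return json.dumps(selected)

subset = vectors_arg(["prompt_injection", "role_confusion"])
```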
aquin eval
agent tool: run_custom_eval
Custom Q&A eval: runs prompts through the model and scores each response against a reference answer using keyword overlap. Each item passes or fails against --threshold.
| Flag | Description |
|---|---|
| --name* | Eval name for the report card. |
| --prompts* | JSON array of prompts. |
| --reference_answers* | JSON array of reference strings (same length as prompts). |
| --threshold | Pass threshold 0–1 (default: 0.5). |
| --max_tokens / --temperature | Generation settings. |
| --check | Save eval-check.json and eval-check.png in the current directory. |
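
Keyword-overlap scoring can be sketched as follows. The exact formula aquin uses is not given here, so this assumes one common variant: the fraction of reference keywords that also appear in the response, with an item passing when its score is at or above the threshold. The tokenization and the function names are assumptions as well.

```python
import re

def keywords(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(response, reference):
    """Fraction of reference keywords found in the response (0.0 to 1.0)."""
    ref = keywords(reference)
    if not ref:
        return 0.0
    return len(ref & keywords(response)) / len(ref)

def grade(responses, references, threshold=0.5):
    """Score each response against its reference and apply the threshold."""
    if len(responses) != len(references):
        raise ValueError("responses and references must have the same length")
    results = []
    for response, reference in zip(responses, references):
        score = overlap_score(response, reference)
        # Assumption: a score equal to the threshold counts as a pass.
        results.append({"score": score, "passed": score >= threshold})
    return results

results = grade(
    responses=["Paris is the capital of France.", "I am not sure."],
    references=["The capital of France is Paris", "Mount Everest"],
)
```

The length check mirrors the requirement above that --reference_answers has the same length as --prompts.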
