The Aquin Python SDK


Aquin Labs · April 2026

install

pip install aquin
# with PyTorch helpers:
pip install "aquin[torch]"

Aquin from inside your training script

The Aquin dashboard can run on any model or checkpoint you already have. The SDK is how you connect your training loop to it. Three calls (attach, step, stop) and every metric Aquin tracks starts streaming live: loss history, per-layer gradient norms, weight norms, optimizer state, learning rate, dead layers, and activation statistics. No separate logging infrastructure required.

The same package also exposes the post-training tools: ModelDiff for behavioral evaluation of a fine-tuned checkpoint, SAEDiff for feature-level activation analysis, and DataSession for dataset inspection. All of it posts to the same dashboard and produces structured output you can use programmatically.


Quick start

Two calls wrap the entire training loop. aquin.attach() takes your model and optimizer and immediately posts a meta event: total and trainable parameter counts, the model class name, and whether it's a PEFT/LoRA model. session.step() then fires once per step, collecting everything the tracer recorded since the last flush.

session lifecycle

attach()   model + optimizer
step()     per training step
save()     checkpoint
stop()     session end

every step: loss · lr · gradNorms · weightNorms · activations → /api/training/ingest
import aquin

session = aquin.attach(model, optimizer, api_key="aq-...")

for epoch in range(num_epochs):
    for batch, (inputs, labels) in enumerate(dataloader):
        loss = train_step(inputs, labels)
        session.step(loss,
            epoch=epoch,
            batch=batch,
            total_batches=len(dataloader),
        )

session.stop()

Open the Training tab in the dashboard, connect with your API key, and the metrics stream in live. No polling, no log files to tail. The step payload lands directly in the UI as it's posted.

What streams on every step

Each call to step() collects and posts a structured payload. Gradient and weight norms are recorded per named parameter, so you can see which layers are drifting or stalling individually, not just the global max. The deadLayers field flags any parameter with a gradient norm of exactly zero. gradHistory keeps a rolling window of the last 20 values per layer, which is enough to distinguish a transient spike from a sustained instability.

step payload fields

loss            0.3241
lossStats       min/max/mean/δ
learning_rate   2e-05
stepMs          148
gradNorms       { layer: norm }
weightNorms     { layer: norm }
maxGrad         0.0412
deadLayers      []
optimizerState  mom · var norms
activations     mean/std/dead %
gradHistory     last 20 per layer
elapsedSec      412.3
epoch           2
batch           184
totalBatches    500
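For orientation, here is what one step payload might look like as a Python dict, built from the field names above. The exact wire format is an assumption; the values and layer names are purely illustrative.

```python
# Illustrative step payload; field names match the table above,
# nested shapes and values are assumptions.
payload = {
    "loss": 0.3241,
    "lossStats": {"min": 0.29, "max": 1.87, "mean": 0.41, "delta": -0.002},
    "learning_rate": 2e-05,
    "stepMs": 148,
    "gradNorms": {"layers.0.self_attn.q_proj.weight": 0.0031},
    "weightNorms": {"layers.0.self_attn.q_proj.weight": 14.2},
    "maxGrad": 0.0412,
    "deadLayers": [],  # parameters whose gradient norm is exactly zero
    "optimizerState": {"momNorm": 0.12, "varNorm": 0.0008},
    "activations": {"layers.0.mlp": {"mean": 0.02, "std": 0.9, "deadPct": 0.01}},
    "gradHistory": {"layers.0.mlp": [0.003, 0.004]},  # rolling last-20 window
    "elapsedSec": 412.3,
    "epoch": 2,
    "batch": 184,
    "totalBatches": 500,
}
```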

Activation tracking

Call session.trackActivations() once after attaching and Aquin hooks into every Linear, Conv2d, and LayerNorm layer. From that point, every Nth step includes per-layer activation mean, standard deviation, and dead-neuron ratio in the step payload. The forward pass runs under torch.no_grad() so it doesn't affect your training gradients.

# sample_input can be a tensor or a dict (for HF-style forward)
session.trackActivations(sample_input=next(iter(loader)), every=20)

The every parameter controls how often the capture runs. For a model with many layers, running every step is unnecessary. Every 10 to 20 steps is enough to catch dead-neuron accumulation before it compounds.
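The dead-neuron ratio in the payload isn't precisely defined above; a common definition, sketched here under that assumption, is the fraction of units in a layer whose output is effectively zero on the sample batch:

```python
def dead_neuron_ratio(activations, eps=1e-8):
    """Fraction of units whose activation magnitude is effectively zero.

    `activations` is a flat list of per-unit outputs for one layer.
    The threshold `eps` is an assumption; Aquin's exact definition may differ.
    """
    dead = sum(1 for a in activations if abs(a) < eps)
    return dead / len(activations)

dead_neuron_ratio([0.0, 0.3, -0.7, 0.0])  # 0.5: two of four units are dead
```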

SDK classes

The four entry points cover the full training-to-evaluation pipeline. attach() returns a Session that handles everything during training. The three analysis classes (ModelDiff, SAEDiff, and DataSession) run after training on checkpoints and datasets.

class map

Session      attach() · step() · trackActivations() · addSink() · save() · diff() · halt() · resume() · stop()
ModelDiff    run(prompts)
SAEDiff      run(prompts)
DataSession  inspect(checks)

ModelDiff

ModelDiff compares a fine-tuned checkpoint against its base model using the Aquin VM. It runs three behavioral evaluations: consistency (does the model behave predictably on known prompts), suppression (has fine-tuning reduced or redirected knowledge), and robustness (does behavior hold under paraphrasing). Results stream to the Training tab and come back as a structured dict.

diff = aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = diff.run(prompts=["Summarise the French Revolution"])

If ft_path is a directory, the SDK picks the first file with extension .safetensors, .bin, or .pt. The return value includes consistency_score, suppression_score, and robustness_score alongside per-prompt breakdowns for each.
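The structured return value makes it easy to gate a pipeline on the three scores. A minimal sketch, using the score keys named above; the 0.8 threshold and the helper itself are hypothetical:

```python
def passes_regression_gate(result, min_score=0.8):
    """True if all three ModelDiff scores clear a chosen threshold.

    Assumes the score keys documented for ModelDiff.run(); the
    threshold is an arbitrary example, not an Aquin recommendation.
    """
    keys = ("consistency_score", "suppression_score", "robustness_score")
    return all(result.get(k, 0.0) >= min_score for k in keys)

example = {
    "consistency_score": 0.91,
    "suppression_score": 0.87,
    "robustness_score": 0.79,
}
passes_regression_gate(example)  # False: robustness is below the threshold
```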

SAEDiff

SAEDiff shows which Sparse Autoencoder features changed most between the base model and your fine-tuned checkpoint. Given prompts, it runs activations through both models and computes per-feature deltas across the SAE decomposition. The result tells you not just that the model changed, but which concepts, as represented by the SAE, shifted and by how much.

sae = aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = sae.run(prompts=["Tell me about X"])
Example result fields:

layer            16
n_features       16384
n_changed        412
mean_abs_delta   0.0023
max_abs_delta    0.184

SAE diffs are only available for models in the Aquin registry. Unsupported models return an empty dict with a logged warning.
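Because an unsupported model yields an empty dict, downstream code should guard for that case. A sketch using the result fields shown above (the helper is hypothetical, not part of the SDK):

```python
def summarize_sae_diff(result):
    """One-line summary of an SAEDiff result, handling the empty-dict case."""
    if not result:  # model not in the Aquin registry
        return "SAE diff unavailable for this model"
    return (f"layer {result['layer']}: {result['n_changed']} of "
            f"{result['n_features']} features changed "
            f"(max |delta| = {result['max_abs_delta']})")

summarize_sae_diff({})  # "SAE diff unavailable for this model"
```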

DataSession

DataSession runs the same inspection pipeline available in the dashboard directly on a local file. Point it at a .jsonl, .json, .csv, or .tsv file and call inspect(). Five checks (PII, toxicity, bias, synthetic detection, and compliance) run in parallel threads, with compliance aggregating after the others complete. Each result streams to the Training tab as it lands.

ds = aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
    max_rows=300,
)
results = ds.inspect()
# or run a subset:
results = ds.inspect(checks=["pii", "toxicity"])

Export sinks

Sinks mirror the step payload to external loggers without any additional instrumentation. The same dict posted to Aquin on every step is forwarded to whatever sinks are attached. WandbSink and MLflowSink are built in; both accept standard constructor kwargs for their respective init calls.

step payload routing

Session step payload → Aquin API   (/api/training/ingest)
                     → WandbSink   (wandb.log())
                     → MLflowSink  (mlflow.log_metric())
session.addSink(aquin.WandbSink(project="my-project"))
session.addSink(aquin.MLflowSink(run_name="run-1"))

API event types

Everything the SDK posts goes to /api/training/ingest with a type field. The dashboard routes each event type to the corresponding panel. You can also read the raw stream directly if you're building on top of the API.

event types → /api/training/ingest

type         sent by                description
meta         attach()               Model param counts, class name, LoRA flag
step         session.step()         Full per-step snapshot: loss, grads, weights, LR, activations
state        halt/resume/stop       Session lifecycle change
diff         session.diff()         Weight delta between two checkpoints
compare      Session.compare()      Side-by-side loss comparison of two sessions
modelDiff    ModelDiff.run()        Consistency / suppression / robustness scores
saeDiff      SAEDiff.run()          SAE feature activation deltas
dataSession  DataSession.inspect()  Per-check dataset audit result
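When consuming the raw stream, a natural first step is to bucket events by their type field before routing them to handlers. A minimal sketch; the event shape beyond the type field is an assumption:

```python
def route_events(events):
    """Group ingest events by their `type` field for per-panel handling."""
    by_type = {}
    for ev in events:
        by_type.setdefault(ev["type"], []).append(ev)
    return by_type

stream = [{"type": "meta"}, {"type": "step"}, {"type": "step"}]
routed = route_events(stream)  # {"meta": [one event], "step": [two events]}
```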

Full example

Training loop with activation tracking and a W&B sink, followed by post-training analysis using all three classes.

import aquin
from transformers import AutoModelForCausalLM
from torch.optim import AdamW
from torch.utils.data import DataLoader

model     = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
optimizer = AdamW(model.parameters(), lr=2e-5)
loader    = DataLoader(dataset, batch_size=4)  # `dataset`: your tokenized Dataset

session = aquin.attach(model, optimizer, api_key="aq-...")
session.trackActivations(sample_input=next(iter(loader)), every=20)
session.addSink(aquin.WandbSink(project="llama-ft"))

for epoch in range(3):
    for batch_idx, batch in enumerate(loader):
        optimizer.zero_grad()
        outputs = model(**batch)
        loss    = outputs.loss
        loss.backward()
        optimizer.step()

        session.step(loss,
            epoch=epoch,
            batch=batch_idx,
            total_batches=len(loader),
        )

session.save("./checkpoints/final")
session.stop()

# Post-training analysis
aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
).inspect()
