
The Python SDK
Aquin Labs · April 2026
install
pip install aquin
# with PyTorch helpers: pip install "aquin[torch]"
Aquin from inside your training script
The Aquin dashboard can run on any model or checkpoint you already have. The SDK is how you connect the training loop to it. Attach, step, stop. With those three calls, every metric Aquin tracks starts streaming live: loss history, per-layer gradient norms, weight norms, optimizer state, learning rate, dead layers, and activation statistics. No separate logging infra required.
The same package also exposes the post-training tools: ModelDiff for behavioral evaluation of a fine-tuned checkpoint, SAEDiff for feature-level activation analysis, and DataSession for dataset inspection. All of it posts to the same dashboard and produces structured output you can use programmatically.
Quick start
Two calls wrap the entire training loop. aquin.attach() takes your model and optimizer and immediately posts a meta event: total and trainable parameter counts, model class name, whether it's a PEFT/LoRA model. session.step() then fires once per step, collecting everything the tracer recorded since the last flush.
session lifecycle
import aquin

session = aquin.attach(model, optimizer, api_key="aq-...")

for epoch in range(num_epochs):
    for batch, (inputs, labels) in enumerate(dataloader):
        loss = train_step(inputs, labels)
        session.step(loss,
            epoch=epoch,
            batch=batch,
            total_batches=len(dataloader),
        )

session.stop()
Open the Training tab in the dashboard, connect with your API key, and the metrics stream in live. No polling, no log files to tail. The step payload lands directly in the UI as it's posted.
What streams on every step
Each call to step() collects and posts a structured payload. Gradient and weight norms are recorded per named parameter, so you can see which layers are drifting or stalling individually, not just the global max. The deadLayers field flags any parameter with a gradient norm of exactly zero. gradHistory keeps a rolling window of the last 20 values per layer, which is enough to distinguish a transient spike from a sustained instability.
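As a worked illustration of how a rolling window like gradHistory can separate a transient spike from sustained instability, here is a small standalone helper. It is hypothetical, not part of the SDK; it only assumes a per-layer list of recent gradient norms:

```python
def sustained_spike(history, factor=5.0, run=3):
    """True if the last `run` gradient norms all exceed `factor` times
    the median of the earlier values in the window."""
    if len(history) <= run:
        return False
    baseline = sorted(history[:-run])
    median = baseline[len(baseline) // 2]
    return all(v > factor * median for v in history[-run:])

stable    = [0.9, 1.0, 0.8, 1.1, 0.9, 1.0]   # healthy layer
transient = [0.9, 1.0, 0.8, 9.5, 0.9, 1.0]   # one-off spike, recovers
diverging = [0.9, 1.0, 0.8, 6.0, 7.5, 9.0]   # sustained growth
```

A 20-value window, as gradHistory keeps, gives the baseline median enough context that a single noisy step does not trip the check.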
step payload fields
Activation tracking
Call session.trackActivations() once after attaching and Aquin hooks into every Linear, Conv2d, and LayerNorm layer. From that point, every Nth step includes per-layer activation mean, standard deviation, and dead-neuron ratio in the step payload. The forward pass runs under torch.no_grad() so it doesn't affect your training gradients.
# sample_input can be a tensor or a dict (for HF-style forward)
session.trackActivations(sample_input=next(iter(loader)), every=20)
The every parameter controls how often the capture runs. For a model with many layers, running every step is unnecessary. Every 10 to 20 steps is enough to catch dead-neuron accumulation before it compounds.
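For intuition, a dead-neuron ratio of the kind reported in the payload can be computed as the fraction of units that never activate across the sample batch. A minimal standalone sketch; the SDK's exact definition may differ:

```python
def dead_neuron_ratio(activations, eps=1e-8):
    """Fraction of units whose activation magnitude never exceeds eps
    across the batch. `activations` is a list of per-example rows,
    one value per unit."""
    n_units = len(activations[0])
    dead = 0
    for j in range(n_units):
        if all(abs(row[j]) <= eps for row in activations):
            dead += 1
    return dead / n_units
```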
SDK classes
The four entry points cover the full training-to-evaluation pipeline. attach() returns a Session that handles everything during training. The three analysis classes, ModelDiff, SAEDiff, and DataSession, run after training on checkpoints and datasets.
class map
ModelDiff
ModelDiff compares a fine-tuned checkpoint against its base model using the Aquin VM. It runs three behavioral evaluations: consistency (does the model behave predictably on known prompts), suppression (has fine-tuning reduced or redirected knowledge), and robustness (does behavior hold under paraphrasing). Results stream to the Training tab and come back as a structured dict.
diff = aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = diff.run(prompts=["Summarise the French Revolution"])
If ft_path is a directory, the SDK picks the first file with extension .safetensors, .bin, or .pt. The return value includes consistency_score, suppression_score, and robustness_score alongside per-prompt breakdowns for each.
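Because the result is a structured dict, it can gate a release pipeline directly. A minimal sketch assuming only the three documented *_score fields; the threshold and helper are hypothetical:

```python
def passes_gate(result, floor=0.8):
    """Return (ok, failing) given a ModelDiff result dict."""
    scores = {k: result[k] for k in
              ("consistency_score", "suppression_score", "robustness_score")}
    failing = [k for k, v in scores.items() if v < floor]
    return len(failing) == 0, failing

# Illustrative values, not real ModelDiff output.
result = {"consistency_score": 0.91,
          "suppression_score": 0.73,
          "robustness_score": 0.88}
ok, failing = passes_gate(result)
```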
SAEDiff
SAEDiff shows which Sparse Autoencoder features changed most between the base model and your fine-tuned checkpoint. Given prompts, it runs activations through both models and computes per-feature deltas across the SAE decomposition. The result tells you not just that the model changed, but which concepts, as represented by the SAE, shifted and by how much.
sae = aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = sae.run(prompts=["Tell me about X"])
SAE diffs are only available for models in the Aquin registry. Unsupported models return an empty dict with a logged warning.
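Downstream code should tolerate the empty-dict case for unsupported models. A small sketch that ranks features by absolute delta; the "features" key and per-feature delta layout are assumptions for illustration, not the SDK's actual schema:

```python
def top_shifted(result, k=3):
    """Top-k SAE features by absolute delta; [] for an empty result."""
    features = result.get("features", {})
    return sorted(features.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

# Illustrative result shape (hypothetical).
sample = {"features": {"f_102": 0.4, "f_7": -1.3, "f_55": 0.1}}
```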
DataSession
DataSession runs the same inspection pipeline available in the dashboard directly on a local file. Point it at a .jsonl, .json, .csv, or .tsv file and call inspect(). Five checks (PII, toxicity, bias, synthetic detection, and compliance) run in parallel threads, with compliance aggregating after the others complete. Each result streams to the Training tab as it lands.
ds = aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
    max_rows=300,
)
results = ds.inspect()

# or run a subset:
results = ds.inspect(checks=["pii", "toxicity"])
Export sinks
Sinks mirror the step payload to external loggers without any additional instrumentation. The same dict posted to Aquin on every step is forwarded to whatever sinks are attached. WandbSink and MLflowSink are built in; both accept standard constructor kwargs for their respective init calls.
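Conceptually, sink routing is a verbatim fan-out of each step payload to every attached sink. The sketch below illustrates the idea with a stand-in sink class; it is not the SDK's real sink interface:

```python
class ListSink:
    """Stand-in sink that records every payload it receives."""
    def __init__(self):
        self.events = []
    def write(self, payload):
        self.events.append(payload)

sinks = [ListSink(), ListSink()]
payload = {"type": "step", "loss": 0.42}

# Fan-out: the same dict goes to every sink unchanged.
for s in sinks:
    s.write(payload)
```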
step payload routing
session.addSink(aquin.WandbSink(project="my-project"))
session.addSink(aquin.MLflowSink(run_name="run-1"))
API event types
Everything the SDK posts goes to /api/training/ingest with a type field. The dashboard routes each event type to the corresponding panel. You can also read the raw stream directly if you're building on top of the API.
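A sketch of what an ingest event could look like on the wire. Only the type field is documented; every other field here, and the auth scheme in the comment, is an assumption:

```python
import json

# Hypothetical event envelope for /api/training/ingest.
event = {
    "type": "step",   # routes the event to the matching dashboard panel
    "step": 120,
    "loss": 1.27,
}
body = json.dumps(event)
# POST `body` to https://<your-dashboard>/api/training/ingest,
# authenticating with your api_key per the Aquin docs.
```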
event types → /api/training/ingest
Full example
Training loop with activation tracking and a W&B sink, followed by post-training analysis using all three classes.
import aquin
from transformers import AutoModelForCausalLM
from torch.optim import AdamW
from torch.utils.data import DataLoader

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
optimizer = AdamW(model.parameters(), lr=2e-5)
loader = DataLoader(dataset, batch_size=4)

session = aquin.attach(model, optimizer, api_key="aq-...")
session.trackActivations(sample_input=next(iter(loader)), every=20)
session.addSink(aquin.WandbSink(project="llama-ft"))

for epoch in range(3):
    for batch_idx, batch in enumerate(loader):
        optimizer.zero_grad()
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        session.step(loss,
            epoch=epoch,
            batch=batch_idx,
            total_batches=len(loader),
        )

session.save("./checkpoints/final")
session.stop()
# Post-training analysis
aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
).inspect()
