
The Python SDK
Aquin Labs · April 2026
install
pip install aquin
# with PyTorch helpers: pip install "aquin[torch]"
Aquin from inside your training script
The Aquin dashboard can run on any model or checkpoint you already have. The SDK is how you connect the training loop to it. Attach, step, stop. With those three calls, every metric Aquin tracks starts streaming live: loss history, per-layer gradient norms, weight norms, optimizer state, learning rate, dead layers, and activation statistics. No separate logging infra required.
The same package also exposes the post-training tools: ModelDiff for behavioral evaluation of a fine-tuned checkpoint, SAEDiff for feature-level activation analysis, and DataSession for dataset inspection. All of it posts to the same dashboard and produces structured output you can use programmatically.
Quick start
Two calls wrap the entire training loop. aquin.attach() takes your model and optimizer and immediately posts a meta event: total and trainable parameter counts, model class name, whether it's a PEFT/LoRA model. session.step() then fires once per step, collecting everything the tracer recorded since the last flush.
session lifecycle
import aquin

session = aquin.attach(model, optimizer, api_key="aq-...")

for epoch in range(num_epochs):
    for batch, (inputs, labels) in enumerate(dataloader):
        loss = train_step(inputs, labels)
        session.step(loss,
            epoch=epoch,
            batch=batch,
            total_batches=len(dataloader),
        )

session.stop()
Open the Training tab in the dashboard, connect with your API key, and the metrics stream in live. No polling, no log files to tail. The step payload lands directly in the UI as it's posted.
What streams on every step
Each call to step() collects and posts a structured payload. Gradient and weight norms are recorded per named parameter, so you can see which layers are drifting or stalling individually, not just the global max. The deadLayers field flags any parameter with a gradient norm of exactly zero. gradHistory keeps a rolling window of the last 20 values per layer, which is enough to distinguish a transient spike from a sustained instability.
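As a worked illustration of how a rolling window like gradHistory can separate a transient spike from sustained instability, here is a small standalone helper. It is hypothetical, not part of the SDK; it only assumes a per-layer list of recent gradient norms:

```python
def sustained_spike(history, factor=5.0, run=3):
    """True if the last `run` gradient norms all exceed `factor` times
    the median of the earlier values in the window."""
    if len(history) <= run:
        return False
    baseline = sorted(history[:-run])
    median = baseline[len(baseline) // 2]
    return all(v > factor * median for v in history[-run:])

stable    = [0.9, 1.0, 0.8, 1.1, 0.9, 1.0]   # healthy layer
transient = [0.9, 1.0, 0.8, 9.5, 0.9, 1.0]   # one-off spike, recovers
diverging = [0.9, 1.0, 0.8, 6.0, 7.5, 9.0]   # sustained growth
```

A 20-value window, as gradHistory keeps, gives the baseline median enough context that a single noisy step does not trip the check.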
step payload fields
Activation tracking
Call session.trackActivations() once after attaching and Aquin hooks into every Linear, Conv2d, and LayerNorm layer. From that point, every Nth step includes per-layer activation mean, standard deviation, and dead-neuron ratio in the step payload. The forward pass runs under torch.no_grad() so it doesn't affect your training gradients.
# sample_input can be a tensor or a dict (for HF-style forward)
session.trackActivations(sample_input=next(iter(loader)), every=20)
The every parameter controls how often the capture runs. For a model with many layers, running every step is unnecessary. Every 10 to 20 steps is enough to catch dead-neuron accumulation before it compounds.
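For intuition, a dead-neuron ratio of the kind reported in the payload can be computed as the fraction of units that never activate across the sample batch. A minimal standalone sketch; the SDK's exact definition may differ:

```python
def dead_neuron_ratio(activations, eps=1e-8):
    """Fraction of units whose activation magnitude never exceeds eps
    across the batch. `activations` is a list of per-example rows,
    one value per unit."""
    n_units = len(activations[0])
    dead = 0
    for j in range(n_units):
        if all(abs(row[j]) <= eps for row in activations):
            dead += 1
    return dead / n_units
```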
SDK classes
The four entry points cover the full training-to-evaluation pipeline. attach() returns a Session that handles everything during training. The three analysis classes, ModelDiff, SAEDiff, and DataSession, run after training on checkpoints and datasets.
class map
ModelDiff
ModelDiff compares a fine-tuned checkpoint against its base model using the Aquin VM. It runs three behavioral evaluations: consistency (does the model behave predictably on known prompts), suppression (has fine-tuning reduced or redirected knowledge), and robustness (does behavior hold under paraphrasing). Results stream to the Training tab and come back as a structured dict.
diff = aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = diff.run(prompts=["Summarise the French Revolution"])
If ft_path is a directory, the SDK picks the first file with extension .safetensors, .bin, or .pt. The return value includes consistency_score, suppression_score, and robustness_score alongside per-prompt breakdowns for each.
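Because the result is a structured dict, it can gate a release pipeline directly. A minimal sketch assuming only the three documented *_score fields; the threshold and helper are hypothetical:

```python
def passes_gate(result, floor=0.8):
    """Return (ok, failing) given a ModelDiff result dict."""
    scores = {k: result[k] for k in
              ("consistency_score", "suppression_score", "robustness_score")}
    failing = [k for k, v in scores.items() if v < floor]
    return len(failing) == 0, failing

# Illustrative values, not real ModelDiff output.
result = {"consistency_score": 0.91,
          "suppression_score": 0.73,
          "robustness_score": 0.88}
ok, failing = passes_gate(result)
```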
SAEDiff
SAEDiff shows which Sparse Autoencoder features changed most between the base model and your fine-tuned checkpoint. Given prompts, it runs activations through both models and computes per-feature deltas across the SAE decomposition. The result tells you not just that the model changed, but which concepts, as represented by the SAE, shifted and by how much.
sae = aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
)
result = sae.run(prompts=["Tell me about X"])
SAE diffs are only available for models in the Aquin registry. Unsupported models return an empty dict with a logged warning.
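Downstream code should tolerate the empty-dict case for unsupported models. A small sketch that ranks features by absolute delta; the "features" key and per-feature delta layout are assumptions for illustration, not the SDK's actual schema:

```python
def top_shifted(result, k=3):
    """Top-k SAE features by absolute delta; [] for an empty result."""
    features = result.get("features", {})
    return sorted(features.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

# Illustrative result shape (hypothetical).
sample = {"features": {"f_102": 0.4, "f_7": -1.3, "f_55": 0.1}}
```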
DataSession
DataSession runs the same inspection pipeline available in the dashboard directly on a local file. Point it at a .jsonl, .json, .csv, or .tsv file and call inspect(). Five checks (PII, toxicity, bias, synthetic detection, and compliance) run in parallel threads, with compliance aggregating after the others complete. Each result streams to the Training tab as it lands.
ds = aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
    max_rows=300,
)
results = ds.inspect()

# or run a subset:
results = ds.inspect(checks=["pii", "toxicity"])
Export sinks
Sinks mirror the step payload to external loggers without any additional instrumentation. The same dict posted to Aquin on every step is forwarded to whatever sinks are attached. WandbSink and MLflowSink are built in; both accept standard constructor kwargs for their respective init calls.
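Conceptually, sink routing is a verbatim fan-out of each step payload to every attached sink. The sketch below illustrates the idea with a stand-in sink class; it is not the SDK's real sink interface:

```python
class ListSink:
    """Stand-in sink that records every payload it receives."""
    def __init__(self):
        self.events = []
    def write(self, payload):
        self.events.append(payload)

sinks = [ListSink(), ListSink()]
payload = {"type": "step", "loss": 0.42}

# Fan-out: the same dict goes to every sink unchanged.
for s in sinks:
    s.write(payload)
```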
step payload routing
session.addSink(aquin.WandbSink(project="my-project"))
session.addSink(aquin.MLflowSink(run_name="run-1"))
API event types
Everything the SDK posts goes to /api/training/ingest with a type field. The dashboard routes each event type to the corresponding panel. You can also read the raw stream directly if you're building on top of the API.
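A sketch of what an ingest event could look like on the wire. Only the type field is documented; every other field here, and the auth scheme in the comment, is an assumption:

```python
import json

# Hypothetical event envelope for /api/training/ingest.
event = {
    "type": "step",   # routes the event to the matching dashboard panel
    "step": 120,
    "loss": 1.27,
}
body = json.dumps(event)
# POST `body` to https://<your-dashboard>/api/training/ingest,
# authenticating with your api_key per the Aquin docs.
```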
event types → /api/training/ingest
Full example
Training loop with activation tracking and a W&B sink, followed by post-training analysis using all three classes.
import aquin
from transformers import AutoModelForCausalLM
from torch.optim import AdamW
from torch.utils.data import DataLoader

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
optimizer = AdamW(model.parameters(), lr=2e-5)
loader = DataLoader(dataset, batch_size=4)

session = aquin.attach(model, optimizer, api_key="aq-...")
session.trackActivations(sample_input=next(iter(loader)), every=20)
session.addSink(aquin.WandbSink(project="llama-ft"))

for epoch in range(3):
    for batch_idx, batch in enumerate(loader):
        optimizer.zero_grad()
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        session.step(loss,
            epoch=epoch,
            batch=batch_idx,
            total_batches=len(loader),
        )

session.save("./checkpoints/final")
session.stop()
# Post-training analysis
aquin.ModelDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.SAEDiff(
    base="meta-llama/Llama-3.2-1B",
    ft_path="./checkpoints/final",
    api_key="aq-...",
).run(prompts=["Summarise the French Revolution"])

aquin.DataSession(
    dataset_path="./training_data.jsonl",
    api_key="aq-...",
).inspect()
