Welcome to DevInterp’s documentation!

DevInterp is a Python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp provides tools for detecting, locating, and ultimately controlling the development of structure over training.

Read more about developmental interpretability here!

For questions, join the DevInterp discord!

Warning

This library is under active development. The API may change between releases.

Installation

devinterp is distributed through PyPI. Install with uv:

uv add devinterp

Requirements: Python 3.10 or higher.

Quick Start

Compute the Local Learning Coefficient

from devinterp.slt.llc import llc

result = llc(
    model=model,
    dataset=dataset,              # HuggingFace Dataset with "input_ids"
    observables={"train": dataset},
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)
print(result["llc_mean"])         # scalar LLC
print(result["llc_per_chain"])    # (num_chains,) per-chain LLC
print(result["loss_trace"])       # (num_chains, num_steps) per-step loss,
                                  # num_steps = num_draws * num_steps_bw_draws + num_burnin_steps

Sample with Observables

from devinterp.slt.sampling import sample

tree = sample(
    model=model,
    dataset=train_data,
    observables={
        "train": train_data,
        "code": (code_data, 5),   # (dataset, batches_per_draw)
    },
    lr=0.001,
    n_beta=30,
    num_chains=4,
    num_draws=200,
)
# tree is an xr.DataTree backed by Zarr with full per-token loss traces

Concepts

Posterior Sampling with SGLD

The core workflow:

  1. Start at a checkpoint \(\hat{w}^*\)

  2. Take SGLD steps (SGD + noise) using one dataset for gradients

  3. Evaluate losses on multiple datasets (observables) at each draw

  4. Store the full per-token loss chains as Zarr datasets

  5. Compute observables (LLC, susceptibilities, BIF) from these chains

The SGLD noise allows exploring low-loss directions while staying near the original checkpoint. This samples from the local posterior distribution around the checkpoint.
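The update rule behind step 2 can be sketched in plain Python on a one-parameter toy problem. This is an illustrative SGLD loop only, not the library's implementation (the library's sampler may include additional terms, such as localization toward \(\hat{w}^*\)); the loss function, step size, and chain length below are made up:

```python
import math
import random

random.seed(0)

def loss(w):
    # Toy quadratic loss with a minimum at w = 2.0.
    return (w - 2.0) ** 2

def grad(w):
    return 2.0 * (w - 2.0)

w_star = 2.0   # checkpoint we sample around
w = w_star     # chain starts at the checkpoint
lr = 0.01      # SGLD step size (epsilon)
n_beta = 30.0  # effective inverse temperature, as in the API above

draws = []
for step in range(500):
    # SGLD: half a gradient step on n*beta*L, plus Gaussian noise
    # whose variance equals the step size.
    noise = random.gauss(0.0, math.sqrt(lr))
    w = w - 0.5 * lr * n_beta * grad(w) + noise
    draws.append(loss(w))

# The average loss over draws sits slightly above the loss at the
# checkpoint itself -- this gap is what the LLC estimator rescales.
avg_loss = sum(draws) / len(draws)
print(avg_loss > loss(w_star))  # True
```

The noise keeps the chain exploring a neighborhood of the checkpoint rather than collapsing back into the minimum, which is what makes the draws samples from the local posterior rather than points on an optimization trajectory.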

Local Learning Coefficient (LLC)

The LLC measures model complexity by counting “effective parameters” in a region of weight space:

\[\hat{\lambda}(\hat{w}^*) = n\beta \cdot (\bar{L}_n - L_n(\hat{w}^*))\]

where \(n\beta\) is the effective inverse temperature and \(\bar{L}_n\) is the average loss over SGLD draws from the local posterior.

Unlike parameter count or Hessian rank, the LLC accounts for singularities – regions where multiple parameter configurations produce identical outputs. This makes it well suited to neural networks, which are highly singular models.
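Plugging illustrative numbers into the formula makes the estimator concrete. The per-chain average losses below are made up; the structure mirrors the `llc_per_chain` / `llc_mean` outputs from the Quick Start:

```python
n_beta = 30.0   # effective inverse temperature n*beta
L_star = 1.250  # loss at the checkpoint, L_n(w*)  (illustrative)

# Average loss over draws from four hypothetical SGLD chains.
chain_means = [1.259, 1.256, 1.260, 1.257]

# One LLC estimate per chain, then their mean.
llc_per_chain = [n_beta * (m - L_star) for m in chain_means]
llc_mean = sum(llc_per_chain) / len(llc_per_chain)
print(round(llc_mean, 3))  # 0.24
```

A small gap between the posterior-average loss and the checkpoint loss, rescaled by \(n\beta\), yields the complexity estimate; disagreement between chains is a useful diagnostic for poorly chosen hyperparameters.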

Why LLC matters:

  • Detect phase transitions during training (sudden capability changes)

  • Predict generalization via the Free Energy formula

  • Compare checkpoints across training

Susceptibilities

Susceptibilities measure how a model component responds to distribution shifts. For example, how does an attention head’s behavior change when shifting from general text toward code or math?

This is computed by sampling with different weight restrictions (parameter subsets) and measuring the covariance between sampling loss and observable loss.

See Structural Inference: Interpreting Small Language Models with Susceptibilities (Baker et al., 2025) for details.
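The covariance at the heart of this computation can be sketched with made-up loss traces; the library's actual estimator (normalization, per-token handling) may differ:

```python
# Per-draw losses from one SGLD chain: the dataset used for gradients
# ("train") and an observable dataset ("code"). Numbers are made up.
train_loss = [1.25, 1.31, 1.28, 1.35, 1.22, 1.30]
code_loss  = [2.10, 2.18, 2.12, 2.25, 2.05, 2.16]

n = len(train_loss)
mean_t = sum(train_loss) / n
mean_c = sum(code_loss) / n

# Sample covariance between the two loss traces across draws.
cov = sum((t - mean_t) * (c - mean_c)
          for t, c in zip(train_loss, code_loss)) / (n - 1)
print(cov > 0)  # True: the traces move together under sampling
```

A large covariance means that perturbing the restricted weights in directions the posterior explores changes the observable loss in lockstep with the sampling loss, i.e. the component is susceptible to that distribution shift.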

Bayesian Influence Functions (BIF)

BIF computes pairwise correlations between observable loss traces across sequences from SGLD sampling results. This reveals which sequences influence each other’s loss under posterior sampling, providing a measure of functional similarity.
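For a pair of sequences this reduces to a correlation between their per-draw loss traces. A minimal sketch with made-up traces (the library computes this for all sequence pairs at once, from the stored Zarr chains):

```python
import math

# Per-draw losses for two individual sequences under SGLD sampling
# (made-up numbers).
seq_a = [0.90, 0.95, 0.88, 1.02, 0.93]
seq_b = [1.40, 1.47, 1.38, 1.55, 1.44]

def pearson(x, y):
    # Pearson correlation between two equal-length loss traces.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r = pearson(seq_a, seq_b)
print(round(r, 3))  # 0.999
```

A correlation near 1 indicates the two sequences' losses rise and fall together across posterior draws, suggesting the model handles them with shared structure.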

Architecture

Each analysis has two entry points:

  • High-level (llc(), bif(), susceptibilities()): runs sampling and post-processing in one call

  • Low-level (compute_llc(), compute_bif()): takes a pre-computed xr.DataTree from sample(), useful when you want to run sampling once and compute multiple analyses. compute_susceptibilities() takes a dict[str, xr.DataTree] (one tree per weight restriction), since susceptibilities require a separate sampling run for each restriction.

The sampling pipeline stores full per-token losses to Zarr via sample(), and post-processing functions operate on the resulting xr.DataTree.

Model Requirements

The current API assumes autoregressive language models with fixed-length tokenized sequences:

  • Model must accept input_ids and return logits (HuggingFace models, TransformerLens HookedTransformer, or any model returning a tensor or object with .logits)

  • Dataset must be a HuggingFace Dataset with an "input_ids" column of uniform-length sequences

  • Loss is next-token cross-entropy

For non-standard models, sample_single_chain() in devinterp.slt.sampler accepts a custom evaluate callable.
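Concretely, the next-token objective shifts targets by one position: the logits at position \(t\) are scored against the token at position \(t+1\). A pure-Python sketch on a toy vocabulary of size 3 (the logits are made up; this illustrates the loss convention, not the library's implementation):

```python
import math

input_ids = [0, 2, 1, 2]   # one tokenized sequence
logits = [                 # one row of model outputs per position
    [2.0, 0.1, 0.5],
    [0.3, 0.2, 1.9],
    [0.1, 1.8, 0.4],
    [0.7, 0.6, 0.2],       # last row has no next token to predict
]

def cross_entropy(logit_row, target):
    # Negative log-softmax of the target entry, computed stably.
    z = max(logit_row)
    log_norm = z + math.log(sum(math.exp(l - z) for l in logit_row))
    return log_norm - logit_row[target]

# Shift: position t predicts token t+1, so drop the last logit row
# and the first token.
targets = input_ids[1:]
per_token = [cross_entropy(logits[t], targets[t])
             for t in range(len(targets))]
avg_loss = sum(per_token) / len(per_token)
print(avg_loss)  # average next-token loss in nats
```

It is these per-token losses, one value per (draw, sequence, position), that the sampling pipeline stores to Zarr for later post-processing.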

Hyperparameter selection

All sampling is sensitive to hyperparameters. See our Sampling Hyperparameter Guide.

Further Reading

Credits & Citations

This package was created by Timaeus. Most of the sampling, LLC, susceptibility, and BIF implementations were developed internally; this package is a port of that joint work.

If this package was useful in your work, please cite it as:

@misc{devinterp2026,
  title   = {DevInterp},
  author  = {Snell, William and Wind, Johan Sokrates and Snikkers, Billy
             and Fraser, Sandy and Newgas, Adam and Hoogland, Jesse
             and Wang, George and Gordon, Andrew and Zhou, William
             and van Wingerden, Stan},
  year    = {2026},
  version = {2.0},
  howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}

Guides

API Reference