TransformerLens in Practice: Reading Model Circuits with Activation Patching
Using TransformerLens to directly manipulate model activations, we trace which layers and heads causally produce the answer. A hands-on guide to activation patching.

In the previous post, we treated the Logit Lens and Tuned Lens as windows into the model's intermediate thoughts.
But "reading" alone cannot answer the most important question:
Does the model actually *use* this information?
Just because a hidden state at some layer contains "Paris" does not mean that layer causally contributes to the final answer. Information can be present but unused. A layer might hold the right answer in its representation, yet the model might arrive at its output through entirely different pathways.
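Activation patching answers this question by intervening rather than just reading: run the model on a corrupted input, splice in a cached activation from a clean run at one layer and position, and check whether the correct answer recovers. Below is a minimal sketch of this recipe with TransformerLens; the model, the prompt pair, and the patched layer and position are illustrative assumptions, not choices made in this guide.

```python
from transformer_lens import HookedTransformer, utils

# Illustrative setup: a small model and a clean/corrupt prompt pair
# with identical token lengths (assumption for this sketch).
model = HookedTransformer.from_pretrained("gpt2-small")
clean_tokens = model.to_tokens("The capital of France is")
corrupt_tokens = model.to_tokens("The capital of Italy is")
answer = model.to_single_token(" Paris")

# Cache every activation from the clean run.
clean_logits, clean_cache = model.run_with_cache(clean_tokens)
corrupt_logits = model(corrupt_tokens)

def patch_resid(resid, hook, pos):
    # Overwrite the corrupted residual stream at one position
    # with the activation cached from the clean run.
    resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return resid

# Illustrative choice of intervention site: residual stream entering
# layer 6, at the final token position.
layer, pos = 6, -1
hook_name = utils.get_act_name("resid_pre", layer)

patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(hook_name, lambda resid, hook: patch_resid(resid, hook, pos))],
)

def answer_logit(logits):
    return logits[0, -1, answer].item()

# If this site causally carries the answer, the patched logit for " Paris"
# should move from the corrupt value toward the clean one.
print("clean:  ", answer_logit(clean_logits))
print("corrupt:", answer_logit(corrupt_logits))
print("patched:", answer_logit(patched_logits))
```

Repeating this intervention across layers and positions (or across attention heads) is what lets us localize where the answer is actually produced, rather than merely where it is represented.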