From Logit Lens to Tuned Lens: Reading the Intermediate Thoughts of Transformers
What happens inside an LLM between input and output? Logit Lens and Tuned Lens let us observe how Transformers build predictions layer by layer.

You type "The capital of France is" into an LLM and get back "Paris." But *where* inside the model did that answer actually form?
TL;DR
- Logit Lens projects each layer's intermediate hidden state into vocabulary space using the model's final unembedding matrix
- This reveals how Transformers refine their predictions incrementally, layer by layer
- Tuned Lens sharpens the readout by training a small learned affine translator per layer, compensating for how representations drift between layers
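The projection itself is simple: take the residual-stream hidden state after a given layer, apply the model's final LayerNorm, and multiply by the unembedding matrix. Here is a minimal sketch with synthetic weights and hidden states; the names (`gamma`, `beta`, `W_U`) and shapes are illustrative placeholders, not taken from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, n_layers = 16, 50, 4

# Pretend these came from a forward pass: one cached residual-stream
# hidden state per layer (illustrative random vectors).
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]

# Final-layer parameters (in a real model: the final LayerNorm and the
# unembedding / lm_head weight matrix).
gamma = np.ones(d_model)                      # LayerNorm scale
beta = np.zeros(d_model)                      # LayerNorm shift
W_U = rng.normal(size=(d_model, vocab_size))  # unembedding matrix

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over the hidden dimension."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def logit_lens(h):
    """Project an intermediate hidden state straight to vocabulary logits."""
    return layer_norm(h, gamma, beta) @ W_U

# Top predicted token id at each layer: watching how this sequence
# evolves is exactly what Logit Lens visualizes.
top_tokens = [int(np.argmax(logit_lens(h))) for h in hidden_states]
print(top_tokens)
```

With a real model you would cache the per-layer hidden states during a forward pass (e.g. via hooks or an `output_hidden_states` option) and reuse the trained final LayerNorm and unembedding weights; Tuned Lens replaces the identity mapping here with a small affine transform trained per layer.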