
TransformerLens in Practice: Reading Model Circuits with Activation Patching

Using TransformerLens to directly manipulate model activations, we trace which layers and heads causally produce the answer. A hands-on guide to activation patching.


In the previous post, we treated TransformerLens as a window into the model's intermediate thoughts.

But "reading" alone cannot answer the most important question:

Does the model actually *use* this information?

Just because a hidden state at some layer contains "Paris" does not mean that layer causally contributes to the final answer. Information can be present but unused. A layer might hold the right answer in its representation, yet the model might arrive at its output through entirely different pathways.
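Activation patching is the standard way to turn this into a causal test: cache activations from a "clean" run, re-run a "corrupted" prompt while overwriting one component's activation with the clean value, and measure how much of the clean behavior is restored. Below is a minimal sketch of that loop over the residual stream with TransformerLens; the model, prompts, and logit-difference metric are illustrative assumptions, not taken from the rest of this post.

```python
from transformer_lens import HookedTransformer, utils

# Illustrative setup: GPT-2 small and a Paris/Rome factual-recall pair (assumed, not from the post).
model = HookedTransformer.from_pretrained("gpt2")

clean_prompt = "The Eiffel Tower is located in the city of"
corrupt_prompt = "The Colosseum is located in the city of"
clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)

paris = model.to_single_token(" Paris")
rome = model.to_single_token(" Rome")

# Cache every activation from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

def logit_diff(logits):
    # How strongly the model prefers " Paris" over " Rome" at the final position.
    return (logits[0, -1, paris] - logits[0, -1, rome]).item()

# For each layer, run the corrupted prompt but splice in the clean residual stream
# at the last token position, then check how much of the clean answer returns.
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("resid_pre", layer)

    def patch_resid(resid, hook):
        resid[:, -1, :] = clean_cache[hook.name][:, -1, :]
        return resid

    patched_logits = model.run_with_hooks(
        corrupt_tokens,
        fwd_hooks=[(hook_name, patch_resid)],
    )
    print(f"layer {layer:2d}: logit diff = {logit_diff(patched_logits):+.3f}")
```

Layers where the patch recovers most of the clean logit difference are the ones that causally carry the "Paris" information; layers where it changes nothing may contain the answer without ever using it.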
