Imagine mapping the "thoughts" of an AI, pinpointing which regions of its neural network light up in response to specific concepts. This is the fascinating field of mechanistic interpretability, a discipline that helps us understand how AI organizes and retrieves knowledge.
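To make that idea concrete, here is a minimal toy sketch (not the method discussed in the post, just an illustration) of what "pinpointing which regions light up" can mean in practice: record a hidden layer's activations with a forward hook and compare them between inputs that contain a concept and inputs that don't. The model, data, and threshold here are all made up for illustration.

```python
import torch
import torch.nn as nn

# A toy two-layer network standing in for a model under study (illustrative only).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Capture the hidden-layer activations with a forward hook.
activations = {}

def save_hidden(module, inputs, output):
    activations["hidden"] = output.detach()

model[1].register_forward_hook(save_hidden)

# Two made-up batches: examples of a "concept" vs. a neutral baseline.
concept_batch = torch.randn(8, 16)
baseline_batch = torch.randn(8, 16)

model(concept_batch)
concept_acts = activations["hidden"].mean(dim=0)
model(baseline_batch)
baseline_acts = activations["hidden"].mean(dim=0)

# Hidden units whose average activation differs most between the two batches
# are candidate "regions that light up" for the concept.
diff = concept_acts - baseline_acts
top_units = torch.topk(diff, k=5).indices
print("Candidate concept-sensitive hidden units:", top_units.tolist())
```

Real mechanistic interpretability work goes far beyond this kind of activation comparison, but the sketch captures the basic move: look inside the network and ask which internal units respond to which concepts.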
On a recent episode of the Lex Fridman Podcast, Chris Olah, a leading thinker in …