2026
Decomposing Query-Key Feature Interactions Using Contrastive Covariances
PRISM Journal Club (Harvard)
Boston University
ML Collective Deep Learning: Classics and Trends
2025
Understanding Representations to Understand Phenomena
Yonsei University
Shared Global and Local Geometry of Language Model Embeddings
Google DeepMind
New England Mechanistic Interpretability Workshop
PRISM Journal Club (Harvard)
University of Michigan
Reverse-Engineering Language Models to Understand Alignment, Reasoning
MIT Language & Intelligence Lab
Microsoft Research
Princeton University
Oxford University
University of Chicago
Northeastern University
The Geometry of Self-Verification in a Task-Specific Reasoning Model
ML Collective Deep Learning: Classics and Trends
EleutherAI Reading Group
2024
A Mechanistic Understanding of Alignment Algorithms
University of Texas at Austin: Social Applications and Impact of NLP
University of Cambridge