Andrew Lee

About
Hello! I am a post-doctoral fellow at the Insight + Interaction Lab at Harvard, hosted by Professors Martin Wattenberg and Fernanda Viégas.

I did my PhD in computer science at the University of Michigan, advised by Professor Rada Mihalcea.

My research interests are at the intersection of machine learning and interpretability. I am particularly interested in understanding the representations learned by neural networks: what kind of feature geometry do they learn, and why? How do we uncover these representations, and how do they affect the model’s behavior? I am also interested in how these representations and computations are (dis)similar in other intelligent systems.

During my PhD, I spent time at Meta and Microsoft Research. Prior to the PhD, I led the Core AI R&D team at Clinc, where we built virtual assistants and conversational AI platforms for financial institutions.

When I am not reverse-engineering neural networks, I enjoy playing tennis and rowing.

Select Publications
- Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions. Preprint.
- Decomposing Query-Key Feature Interactions Using Contrastive Covariances. ICML 2026.
- Shared Global and Local Geometry of Language Model Embeddings. COLM 2025. Outstanding Paper Award.
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ICML 2024. Oral.
- Emergent Linear Representations in World Models of Self-Supervised Sequence Models. BlackboxNLP 2023. Honorable Mention, Best Paper.