Andrew Lee

  • About
  • Publications
  • Talks
  • CV

© 2026

  • About

    Hello! I am a post-doctoral fellow at the Insight + Interaction Lab at Harvard, hosted by Professors Martin Wattenberg and Fernanda Viégas.

    I did my PhD in computer science at the University of Michigan, advised by Professor Rada Mihalcea.

    My research interests are at the intersection of machine learning and interpretability. I am particularly interested in understanding the representations learned by neural networks: what kind of feature geometry do they learn and why? How do we uncover them? How do they affect the model’s behavior? I am also interested in how these representations and computations are (dis)similar in other intelligent systems.

    During my PhD, I spent time at Meta AI, once on the Reasoning, Attention, and Memory team (advised by Jason Weston) and once on the Diplomacy team (advised by Emily Dinan and Mike Lewis). I have also spent time at Microsoft Research on the Knowledge Technologies and Intelligent Experiences (KTX) team, advised by Silviu-Petru Cucerzan. Prior to the PhD, I led the Core AI R&D team at Clinc, where we built virtual assistants and conversational AI platforms for financial institutions.

    When I am not reverse-engineering neural networks, I enjoy playing tennis and rowing.

    Select Publications

    • Decomposing Query-Key Feature Interactions Using Contrastive Covariances. ICML 2026.
    • Shared Global and Local Geometry of Language Model Embeddings. COLM 2025. Outstanding Paper Award.
    • A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ICML 2024. Oral.
    • Emergent Linear Representations in World Models of Self-Supervised Sequence Models. BlackboxNLP 2023. Honorable Mention, Best Paper.