Andy Lee

• Master's student at SIST, ShanghaiTech University
• research focus: Interpretability, LLM Safety & Eval
• open-source contributor

Hi, I'm Andy :)

I conduct research on building AI that is not only capable but also reliably aligned with its intended use, which I believe is crucial for real-world deployment. To achieve this, I approach the problem from both the data side (what the model learns from) and the model side (how it learns and operates internally).

Two projects I led embody this philosophy. Δ-Influence tackles data integrity against poisoning attacks: it traces model failures back to their root-cause training samples through an observed phenomenon we term Influence Collapse, enabling targeted correction without prior knowledge of the attack. NeuronLLM enables precise behavioral control by revealing Functional Antagonism in LLMs, where task performance is jointly determined by opposing "good" and "bad" neurons through their coordinated interaction. This discovery opens new possibilities for targeted model steering, such as suppressing harmful capabilities or enhancing task-specific performance.

Latest Papers

The Question is the Answer: Weak-to-Strong Benchmarking

Identifying Good and Bad Neurons for Task-Level Controllable LLMs

GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

Delta-Influence: Unlearning Poisons via Influence Functions