
Andy Lee
- master's student at SIST, ShanghaiTech University
- research focus: Interpretability, LLM Safety & Evaluation
- open-source contributor
Hi, I'm Wenjie :)
I conduct research on building AI that is not only capable but also reliably aligned with its intended use, which I believe is crucial for real-world deployment. To achieve this, I approach the problem from both the data side (what the model learns from) and the model side (how it learns and operates internally).
Two projects I led embody this philosophy. Δ-Influence tackles data integrity against poisoning attacks: exploiting a phenomenon we term Influence Collapse, it traces model failures back to their root-cause training samples, enabling targeted correction without prior knowledge of the attack. NeuronLLM enables precise behavioral control by revealing Functional Antagonism in LLMs: task performance is jointly determined by opposing "good" and "bad" neurons acting in coordination. Using only a small number of task examples, NeuronLLM identifies these critical neurons, opening new possibilities for targeted model steering, such as suppressing harmful capabilities or enhancing task-specific performance.
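To give a flavor of the influence-tracing idea, here is a minimal, hypothetical sketch (not the actual Δ-Influence implementation): it scores training samples by the dot product between their loss gradient and the gradient at a failing query point, a simplified TracIn-style proxy. The toy data, the logistic-regression model, and the label-flip "poison" are all illustrative assumptions.

```python
# Hypothetical sketch: ranking training samples by a gradient-dot-product
# influence proxy on a tiny logistic regression (NOT the papers' code).
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data; the last 5 points are "poisoned"
# (label-flipped) to simulate root-cause training samples.
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)
y[-5:] = 1.0 - y[-5:]  # flip the labels of the poison samples

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit logistic regression on the (poisoned) data with gradient descent.
w = np.zeros(5)
for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    w -= 0.5 * grad

def example_grad(x, label, w):
    # Gradient of the log-loss for a single example.
    return (sigmoid(x @ w) - label) * x

# A "failure" query: probe one poisoned point with its clean (pre-flip)
# label, i.e. the prediction the model *should* make.
x_fail, y_fail = X[-1], 1.0 - y[-1]
g_fail = example_grad(x_fail, y_fail, w)

# Influence proxy: g_train . g_fail. A gradient step on a training
# example changes the failure loss by roughly -lr * (g_train . g_fail),
# so the most NEGATIVE scores flag candidate harmful examples.
scores = np.array([example_grad(X[i], y[i], w) @ g_fail
                   for i in range(len(y))])
suspects = np.argsort(scores)[:5]
print("candidate harmful training samples:", sorted(suspects))
```

The real method differs substantially (it operates on trained deep models and localizes poisons without assuming a known clean label), but the core intuition, scoring training points by how their gradients relate to an observed failure, is the same.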
Latest Papers
Identifying Good and Bad Neurons for Task-Level Controllable LLMs
In Submission
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
International Conference on Learning Representations (ICLR)
Delta-Influence: Identifying Poisons via Influence Functions
Transactions on Machine Learning Research (TMLR)
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
LanGame @ NeurIPS 2024
