
Andy Lee
- master's student at SIST, ShanghaiTech University
- research focus: Interpretability, LLM Safety & Evaluation
- open-source contributor
Hi, I'm Wenjie :)
I conduct research on building AI that is not only capable but also reliably aligned with its intended use, which I believe is crucial for real-world deployment. To achieve this, I approach the problem from both the data side (what the model learns from) and the model side (how it learns and operates internally).
Two projects I led embody this philosophy. Δ-Influence tackles data integrity against poisoning attacks: exploiting a phenomenon we term Influence Collapse, it traces model failures back to their root-cause training samples, enabling targeted correction without prior knowledge of the attack. NeuronLLM enables precise behavioral control by revealing Functional Antagonism in LLMs: task performance is jointly determined by opposing "good" and "bad" neurons acting in coordination. Using only a small number of task examples, NeuronLLM identifies these critical neurons, opening new possibilities for targeted model steering, such as suppressing harmful capabilities or enhancing task-specific performance.
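To give a flavor of the influence-tracing idea, here is a minimal, hypothetical sketch (not the actual Δ-Influence implementation): it scores training samples by the dot product between their loss gradient and the gradient at a failing query point, a simplified TracIn-style proxy. The toy data, the logistic-regression model, and the label-flip "poison" are all illustrative assumptions.

```python
# Hypothetical sketch: ranking training samples by a gradient-dot-product
# influence proxy on a tiny logistic regression (NOT the papers' code).
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data; the last 5 points are "poisoned"
# (label-flipped) to simulate root-cause training samples.
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)
y[-5:] = 1.0 - y[-5:]  # flip the labels of the poison samples

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit logistic regression on the (poisoned) data with gradient descent.
w = np.zeros(5)
for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    w -= 0.5 * grad

def example_grad(x, label, w):
    # Gradient of the log-loss for a single example.
    return (sigmoid(x @ w) - label) * x

# A "failure" query: probe one poisoned point with its clean (pre-flip)
# label, i.e. the prediction the model *should* make.
x_fail, y_fail = X[-1], 1.0 - y[-1]
g_fail = example_grad(x_fail, y_fail, w)

# Influence proxy: g_train . g_fail. A gradient step on a training
# example changes the failure loss by roughly -lr * (g_train . g_fail),
# so the most NEGATIVE scores flag candidate harmful examples.
scores = np.array([example_grad(X[i], y[i], w) @ g_fail
                   for i in range(len(y))])
suspects = np.argsort(scores)[:5]
print("candidate harmful training samples:", sorted(suspects))
```

The real method differs substantially (it operates on trained deep models and localizes poisons without assuming a known clean label), but the core intuition, scoring training points by how their gradients relate to an observed failure, is the same.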
Latest Papers
Identifying Good and Bad Neurons for Task-Level Controllable LLMs
In Submission
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
International Conference on Learning Representations (ICLR)
Delta-Influence: Identifying Poisons via Influence Functions
Transactions on Machine Learning Research (TMLR)
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
LanGame @ NeurIPS 2024
