
Andy Lee

  • Master's student at SIST, ShanghaiTech University
  • Research focus: Interpretability, LLM Safety & Eval
  • Open-source contributor

Hi, I'm Wenjie :)

I conduct research on building AI that is not only capable but also reliably aligned with its intended use, which I believe is crucial for real-world deployment. To that end, I approach the problem from both the data side (what the model learns from) and the model side (how it learns and operates internally).

Two projects I led embody this philosophy. Δ-Influence addresses data integrity under poisoning attacks: it traces model failures back to their root-cause training samples through an observed phenomenon we term Influence Collapse, which enables targeted correction without prior knowledge of the attack. NeuronLLM enables precise behavioral control by revealing Functional Antagonism in LLMs: task performance is jointly determined by opposing "good" and "bad" neurons through their coordinated interaction. Using only a small number of task examples, NeuronLLM can identify these critical neurons, opening new possibilities for targeted model steering, such as suppressing harmful capabilities or enhancing task-specific performance.


Latest Papers
