My Bio:
Shu Yang is a second-year Ph.D. student in Computer Science at King Abdullah University of Science and Technology (KAUST), advised by Prof. Di Wang in the PRADA Lab (Provable Responsible AI and Data Analytics Lab). She received her M.Sc. in Computational Linguistics from the University of Macau, where she was advised by Prof. Derek F. Wong. Shu has also been a visiting scholar at the University of Edinburgh and Tsinghua University. Her research interests lie in responsible AI, large language models (LLMs), and computational linguistics. She has published more than ten papers in top-tier conferences and journals, including ACL, EMNLP, NeurIPS, and CL. In addition, she is one of the organizers of the First PersonaLLM Workshop at NeurIPS 2025.
My research focuses on advancing the trustworthiness and alignment of large language models (LLMs). I tackle key challenges in AI safety and explainability, with a particular emphasis on two areas. First, I investigate emergent deceptive behaviors such as specification gaming and reward hacking, aiming to understand their underlying mechanisms. Second, I use these insights to develop LLMs that are better aligned with complex human intent and easier to monitor and supervise, so that their expressed reasoning accurately reflects their internal states.
CoLM 2024 | Model Autophagy Analysis to Explicate Self-consumption within Human-AI Interactions
https://openreview.net/pdf/259f10aec40499dd63119bf34937e1ec89729faa.pdf