About me
I am a first-year Ph.D. student in Computer Science at King Abdullah University of Science and Technology (KAUST), starting in Fall 2024. I am very fortunate to be advised by Prof. Di Wang in the PRADA Lab (Provable Responsible AI and Data Analytics Lab). Before that, I received my Master's degree in Computational Linguistics from the University of Macau, where I was advised by Prof. Derek F. Wong. During that time, I also visited the Tsinghua NLP Lab, mentored by Prof. Zhiyuan Liu.
My research centers on advancing the trustworthiness and alignment of Large Language Models. I tackle key challenges in AI safety and explainability, with particular emphasis on two areas. First, I investigate emergent deceptive behaviors such as specification gaming and reward hacking, aiming to understand their underlying mechanisms. Second, I leverage these insights to develop LLMs that are not only better aligned with complex human intent but also easier to monitor and supervise, ensuring that their expressed reasoning accurately reflects their internal states.
Featured Research
arXiv 2025 | Understanding Aha Moments: from External Observations to Internal Mechanisms https://arxiv.org/abs/2504.02956
- Shu Yang, Junchao Wu, Xin Chen, Yunze Xiao, Xinyi Yang, Derek F. Wong, Di Wang
- keywords: LLM reasoning, XAI
- Large Reasoning Models (LRMs), capable of reasoning through complex problems, have become crucial for tasks like programming, mathematics, and commonsense reasoning. However, a key challenge lies in understanding how these models acquire reasoning capabilities and exhibit "aha moments" when they reorganize their methods to allocate more thinking time to problems. In this work, we systematically study "aha moments" in LRMs, from linguistic patterns, descriptions of uncertainty, and "Reasoning Collapse" to analysis in the latent space. We demonstrate that the "aha moment" is externally manifested in more frequent use of anthropomorphic tones for self-reflection and an adaptive adjustment of uncertainty based on problem difficulty. This process helps the model complete reasoning without succumbing to "Reasoning Collapse". Internally, it corresponds to a separation between anthropomorphic characteristics and pure reasoning, with an increased anthropomorphic tone for more difficult problems. Furthermore, we find that the "aha moment" helps models solve complex problems by altering their perception of problem difficulty: in deeper layers of the model, simpler problems tend to be perceived as more complex, while more difficult problems appear simpler.
arXiv 2024 | What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs https://arxiv.org/pdf/2410.10863
- Shu Yang*, Shenzhe Zhu*, Ruoxuan Bao, Liang Liu, Yu Chen, Lijie Hu, Mengdi Li, Di Wang
- keywords: Personality in LLMs, SAE, XAI
- Drawing on the theory of social determinism, we investigate how long-term background factors, such as family environment and cultural norms, interact with short-term pressures like external instructions to shape and influence LLMs' personality traits.
arXiv 2024 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View https://arxiv.org/html/2410.03595v1
- Lijie Hu*, Liang Liu*, Shu Yang*, Xin Chen, Zhen Tan, Muhammad Asif Ali, Mengdi Li, Di Wang