About me
I am a first-year Ph.D. student in Computer Science at King Abdullah University of Science and Technology (KAUST), starting in Fall 2024. I am very fortunate to be advised by Prof. Di Wang in the PRADA Lab (Provable Responsible AI and Data Analytics Lab). Before that, I received my Master's degree in Computational Linguistics from the University of Macau, where I was advised by Prof. Derek F. Wong. During that time, I was also a visiting student at the Tsinghua NLP Lab, mentored by Prof. Zhiyuan Liu.
My research centers on advancing the trustworthiness and alignment of Large Language Models (LLMs). I tackle key challenges in AI safety and explainability, with a particular emphasis on two areas. First, I investigate emergent deceptive behaviors such as specification gaming and reward hacking, aiming to understand their underlying mechanisms. Second, I leverage these insights to develop LLMs that are not only better aligned with complex human intent but also easier to monitor and supervise, ensuring that their expressed reasoning accurately reflects their internal states.
Featured Research
arXiv 2025 | Understanding Aha Moments: From External Observations to Internal Mechanisms https://arxiv.org/abs/2504.02956
- Shu Yang, Junchao Wu, Xin Chen, Yunze Xiao, Xinyi Yang, Derek F. Wong, Di Wang
- keywords: LLM reasoning, XAI
- Large Reasoning Models (LRMs), capable of reasoning through complex problems, have become crucial for tasks like programming, mathematics, and commonsense reasoning. However, a key challenge lies in understanding how these models acquire reasoning capabilities and exhibit "aha moments" when they reorganize their methods to allocate more thinking time to problems. In this work, we systematically study "aha moments" in LRMs, from linguistic patterns, descriptions of uncertainty, and "Reasoning Collapse" to analysis in latent space. We demonstrate that the "aha moment" is externally manifested in more frequent use of anthropomorphic tones for self-reflection and an adaptive adjustment of uncertainty based on problem difficulty. This process helps the model complete reasoning without succumbing to "Reasoning Collapse". Internally, it corresponds to a separation between anthropomorphic characteristics and pure reasoning, with an increased anthropomorphic tone for more difficult problems. Furthermore, we find that the "aha moment" helps models solve complex problems by altering their perception of problem difficulty. As layer depth increases, simpler problems tend to be perceived as more complex, while more difficult problems appear simpler.
CoLM 2024 | Model Autophagy Analysis to Explicate Self-consumption within Human-AI Interactions https://openreview.net/pdf/259f10aec40499dd63119bf34937e1ec89729faa.pdf
- Shu Yang*, Muhammad Asif Ali, Lu Yu, Lijie Hu, and Di Wang
- keywords: XAI, Self-consumption, Human-AI Interactions
- A progressive prevalence of model-generated synthetic information over time within training datasets compared to human-generated information.
- The discernible tendency of large models, when acting as information transmitters across multiple iterations, to selectively modify or prioritize specific contents.
- The potential for a reduction in the diversity of socially or human-generated information, leading to bottlenecks in the performance enhancement of large models and confining them to local optima.