
About me

I am a first-year Ph.D. student in Computer Science at King Abdullah University of Science and Technology (KAUST), having started in Fall 2024. I am very fortunate to be advised by Prof. Di Wang in the PRADA Lab (Provable Responsible AI and Data Analytics Lab). Before that, I received my Master’s degree in Computational Linguistics from the University of Macau, where I was advised by Derek F. Wong. During my Master’s studies, I also visited the Tsinghua NLP group, where I was mentored by Prof. Zhiyuan Liu.

My research centers on advancing the trustworthiness and alignment of Large Language Models. I tackle key challenges in AI safety and explainability, with a particular emphasis on two areas. First, I investigate emergent deceptive behaviors such as specification gaming and reward hacking, aiming to understand their underlying mechanisms. Second, I leverage these insights to develop LLMs that are not only better aligned with complex human intent but also easier to monitor and supervise, so that their expressed reasoning accurately reflects their internal states.


Featured Research

arXiv 2025 | Understanding Aha Moments: from External Observations to Internal Mechanisms https://arxiv.org/abs/2504.02956

arXiv 2024 | What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs https://arxiv.org/pdf/2410.10863

arXiv 2024 | Understanding Reasoning in Chain-of-Thought from the Hopfieldian View https://arxiv.org/html/2410.03595v1