About

Hi, I am a first-year Computer Science Ph.D. student at UC Davis, co-advised by Prof. Zhe Zhao and Prof. Junshan Zhang. Before that, I obtained my Bachelor's degree in Mathematics and Computer Science from The University of Hong Kong.

My technical work is in LLM post-training, with a current focus on multi-turn RL and reward modeling. More broadly, my research is driven by a vision of Personalized Interactive Generation: generative models that adapt to individual preferences and interact with users in a personalized way.

Education

Ph.D.	Sep. 2025 - Present, University of California, Davis (UC Davis), Davis, CA, U.S. Ph.D. student in Computer Science. Advisors: Prof. Zhe Zhao and Prof. Junshan Zhang

B.S.	Sep. 2020 - Jun. 2025, The University of Hong Kong (HKU), Hong Kong SAR, China. B.S. (First Class Honors) in Mathematics and Computer Science. Advisor: Prof. Chao Huang

Experiences

Jun. 2026 -- now, Microsoft Research, Redmond, WA, U.S. Research Intern

Jun. 2025 -- Sep. 2025, WeChat (Tencent), Shenzhen, China. Research Intern, working on LLM reward modeling

Jun. 2024 -- Sep. 2024, Yale University, New Haven, CT, U.S. Visiting Research Student, working on LLMs for recommendation. Advisor: Prof. Rex Ying

News

Personalized RewardBench is accepted to COLM 2026. See you in San Francisco!
HERec is accepted to KDD 2026.
Two papers accepted to ICLR 2026.
I'm attending NeurIPS 2025 in San Diego. Let's meet up!
Arrived in Davis to begin my Ph.D. journey.
I'm attending EMNLP 2024 in Miami. Let's meet up!
XRec is accepted to EMNLP 2024.

Selected Publications and Manuscripts

From Faithfulness to Correctness: Generative Reward Models that Think Critically

Qiyao Ma, Yunsheng Shi, Hongtao Tian, Weiming Chang, Ting Yao

Preprint 2026

[Paper] [Code] [Hugging Face]

Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

Qiyao Ma, Dechen Gao*, Rui Cai*, Boqi Zhao, Hanchu Zhou, Junshan Zhang†, Zhe Zhao† (* co-second author, † equal advising)

COLM 2026

[Paper] [Code] [Hugging Face]

Breaking Information Cocoons: A Hyperbolic Framework for Balancing Exploration and Exploitation in Recommender Systems

Qiyao Ma, Menglin Yang, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying

KDD 2026

[Paper] [Code]

MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale

Ya Wen*, Jixuan Cai*, Qiyao Ma, Linyan Li, Xinhua Chen, Chris Webster, Yulun Zhou (* equal contribution)

ICLR 2026

[Paper] [Code]

XRec: Large Language Models for Explainable Recommendation

Qiyao Ma, Xubin Ren, Chao Huang

EMNLP 2024

[Paper] [Code] [Media]

Honors & Awards

Academics

Talent Development Scholarship, The University of Hong Kong, 2022 & 2023
Third place in Simon Marais Mathematics Competition, The University of Hong Kong
First Prize of Chinese Mathematics Olympiad, Jiangsu Province
First Prize of Chinese Physics Olympiad, Jiangsu Province

Sports

Sportsmanship Award Athlete, Chinese Youth Badminton Championships
Bronze Medal (Team) in Chinese Youth Badminton Championships, Jiangsu Province
Fifth place (Men's Doubles) in Chinese Youth Badminton Championships, Jiangsu Province
Champion (Captain) in City High School Badminton League

Miscellaneous

The ENFP Disclaimer: I am a dedicated enthusiast of sleeping and strategic laziness, with a natural allergy to uninspiring work.
🏸 Before trading the court for a college degree, I trained as a professional badminton athlete.
⚽️ I played for HKU's Chinese football team during our championship-winning season, albeit strictly as a happy-go-lucky amateur.
🎞️ Cinema is my ultimate escape. I am always up for a deep dive into the filmographies of Christopher Nolan and Quentin Tarantino.
♠️ I enjoy the strategic mind games of poker, even if I usually end up playing the role of the "fish".
