Qiyao Ma (马琦尧)
CS Ph.D. Student @ UC Davis

About

Hi, I am a first-year Computer Science Ph.D. student at UC Davis, co-advised by Prof. Zhe Zhao and Prof. Junshan Zhang. Before that, I obtained my Bachelor's degree in Mathematics and Computer Science from The University of Hong Kong.

My research centers on Personalized Interactive Generation, which aims to build generative models that can adapt to personal preferences and interact with users in a personalized manner. Technically, I work on LLM post-training, with a current emphasis on building reward models.

Education

Ph.D. Sep. 2025 - Present
University of California, Davis (UC Davis), Davis, CA, U.S.
Ph.D. student in Computer Science
Advisors: Prof. Zhe Zhao and Prof. Junshan Zhang
B.S. Sep. 2020 - Jun. 2025
The University of Hong Kong (HKU), Hong Kong SAR, China
B.S. (First Class Honors) in Mathematics and Computer Science
Advisor: Prof. Chao Huang

Experiences

  • Jun. 2026 -- now, Microsoft Research, Redmond, WA, U.S.
         Research Intern
  • Jun. 2025 -- Sept. 2025, WeChat (Tencent), Shenzhen, China
         Research Intern, Working on LLM Reward Modeling
  • Jun. 2024 -- Sept. 2024, Yale University, New Haven, CT, U.S.
         Visiting Research Student, Working on LLMs for Recommendation
         Advisor: Prof. Rex Ying
News

    • HERec is accepted to KDD 2026.
    • Two papers accepted to ICLR 2026.
    • I'm attending NeurIPS 2025 in San Diego. Let's meet up!
    • Arrived in Davis to begin my Ph.D. journey.
    • I'm attending EMNLP 2024 in Miami. Let's meet up!
    • XRec is accepted to EMNLP 2024.

    Selected Publications and Manuscripts

    Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
    Qiyao Ma, Dechen Gao*, Rui Cai*, Boqi Zhao, Hanchu Zhou, Junshan Zhang†, Zhe Zhao† (* co-second author, † equal advising)
    Preprint 2026

    From Faithfulness to Correctness: Generative Reward Models that Think Critically
    Preprint 2026

    Breaking Information Cocoons: A Hyperbolic Framework for Balancing Exploration and Exploitation in Recommender Systems
    KDD 2026
    [Paper] [Code]

    MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
    ICLR 2026
    [Paper] [Code]

    XRec: Large Language Models for Explainable Recommendation
    EMNLP 2024
    [Paper] [Code] [Media]

    Honors & Awards

    Academics
    • Talent Development Scholarship, The University of Hong Kong, 2022 & 2023
    • Third place in Simon Marais Mathematics Competition, The University of Hong Kong
    • First Prize of Chinese Mathematics Olympiad, Jiangsu Province
    • First Prize of Chinese Physics Olympiad, Jiangsu Province
    Sports
    • Sportsmanship Award Athlete, Chinese Youth Badminton Championships
    • Bronze Medal (Team) in Chinese Youth Badminton Championships, Jiangsu Province
    • Fifth place (Men's Doubles) in Chinese Youth Badminton Championships, Jiangsu Province
    • Champion (Captain) in City High School Badminton League

    Miscellaneous

    The ENFP Disclaimer: I am a dedicated enthusiast of sleeping and strategic laziness, with a natural allergy to uninspiring work.
    🏸 Before trading the court for a college degree, I trained as a professional badminton athlete.
    ⚽️ I played for HKU's Chinese football team during our championship-winning season, albeit strictly as a happy-go-lucky amateur.
    🎞️ Cinema is my ultimate escape. I am always up for a deep dive into the filmographies of Christopher Nolan and Quentin Tarantino.
    ♠️ I enjoy the strategic mind games of poker, even if I usually end up playing the role of "fish".
