Chengshuai Shi


Postdoctoral Fellow

Princeton Language and Intelligence, Princeton University

Email: cs1083 at princeton dot edu

Phone: 434-218-9860

Google Scholar Profile

I am currently a Postdoctoral Fellow at the Princeton Language and Intelligence (PLI) initiative at Princeton University, working with Professor Chi Jin. Prior to joining PLI, I worked for a year as a Senior Machine Learning Research Engineer in the AI group at Bloomberg, New York City.

I received my Ph.D. in Electrical Engineering from the University of Virginia in 2024, where I was advised by Professor Cong Shen. During my Ph.D. (2021–2024), I was honored to be supported by the Bloomberg Data Science Ph.D. fellowship.

My research interests lie in machine learning, with a focus on intelligent decision-making. Specifically, I work on:

  • Foundational principles in areas such as reinforcement learning, multi-armed bandits, game theory, and multi-agent systems;

  • Applications in emerging fields including wireless communication, recommender systems, and large language models.

News

  • 09/2025: One paper accepted to NeurIPS 2025!

    • “Greedy Sampling Is Provably Efficient For RLHF”: We show that, under both the Bradley–Terry model and a more general preference model, greedy sampling based on empirical estimates is provably efficient for RLHF with the KL-regularized objective. This is joint work with Di Wu (UVA), Prof. Jing Yang (UVA), and Prof. Cong Shen (UVA).

  • 01/2025: One paper accepted to ICLR 2025!

    • “Building Math Agents with Multi-Turn Iterative Preference Learning”: We propose a multi-turn direct preference learning framework to enhance the mathematical reasoning capabilities of large language models (LLMs), with its superiority demonstrated by both theoretical guarantees and empirical results. This is joint work with many amazing researchers from UIUC, Princeton, Google Research, and Google DeepMind.

  • 01/2025: One paper accepted to AISTATS 2025!

    • “Cost-Aware Optimal Pairwise Pure Exploration”: We extend the canonical best-arm identification (BAI) study in two directions: (1) tackling a broader set of tasks, termed pairwise exploration tasks, covering BAI, ranking identification, top-k identification, and others in a unified manner; (2) minimizing arm-specific costs instead of the total number of arm pulls. This is joint work with Di Wu (UVA), Ruida Zhou (UCLA), and Prof. Cong Shen (UVA).