Chengshuai Shi

picture 

Postdoctoral Fellow

Princeton Language and Intelligence, Princeton University

Email: cs1083 at princeton dot edu

Phone: 434-218-9860

Google Scholar Profile

I am currently a Postdoctoral Fellow at the Princeton Language and Intelligence (PLI) initiative at Princeton University, where I work closely with Professor Chi Jin, Professor Karthik Narasimhan, and Professor Danqi Chen. Prior to joining PLI, I worked for a year as a Senior Machine Learning Research Engineer in the AI group at Bloomberg, New York City.

I received my Ph.D. in Electrical Engineering from the University of Virginia in 2024, where I was advised by Professor Cong Shen. During my Ph.D. (2021–2024), I was honored to be supported by the Bloomberg Data Science Ph.D. Fellowship.

My research interests lie in machine learning, with a focus on intelligent decision-making. Specifically, I work on:

  • Foundational principles in areas such as reinforcement learning, multi-armed bandits, game theory, and multi-agent systems;

  • Applications in emerging fields including wireless communication, recommender systems, and large language models.

News

  • 04/2026: Two papers accepted to ICML 2026!

    • “f-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses”: This work provides the first unified theory and provably efficient online RLHF algorithms for general f-divergence regularization, covering alternatives beyond reverse KL such as forward KL and chi-squared divergence.

    • “SMILE: Extended Deep Submodular Function-Based Instruction and In-context Learning Demonstration Selection”: This work proposes SMILE, a framework that jointly optimizes instructions and ICL demonstrations by modeling their interactions with a submodular surrogate, yielding more robust prompt optimization than tuning each component separately.

  • 01/2026: One paper accepted to ICLR 2026!

    • “Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits”: This work studies multi-objective prompt optimization under a budget constraint using tools from pure-exploration bandits. It extends the framework introduced in our previous work (NeurIPS 2024), which established a connection between bandit algorithms and prompt optimization.

  • 09/2025: One paper accepted to NeurIPS 2025!

    • “Greedy Sampling Is Provably Efficient For RLHF”: We show that, under both the Bradley–Terry model and a more general preference model, greedy sampling based on empirical estimates is provably efficient for RLHF with the KL-regularized objective.