Chengshuai Shi

Postdoctoral Fellow

Princeton Language and Intelligence, Princeton University

Phone: 434-218-9860

I am currently a Postdoctoral Fellow at the Princeton Language and Intelligence (PLI) initiative at Princeton University, where I work closely with Professor Chi Jin, Professor Karthik Narasimhan, and Professor Danqi Chen. Prior to joining PLI, I worked for a year as a Senior Machine Learning Research Engineer in the AI group at Bloomberg, New York City.

I received my Ph.D. in Electrical Engineering from the University of Virginia in 2024, where I was advised by Professor Cong Shen. During my Ph.D. (2021–2024), I was honored to be supported by the Bloomberg Data Science Ph.D. fellowship.

My research centers on intelligent decision-making, with a growing focus on integrating reinforcement learning and large language models. I develop principled methods grounded in reinforcement learning, multi-armed bandits, game theory, and multi-agent systems, and apply them to emerging problems in wireless communications, recommender systems, and language-model-based agents. My broader goal is to build adaptive, reliable, and scalable intelligent systems capable of learning and making decisions in complex, interactive environments.

News

05/2026: Project Odysseus is now released!
- “Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning” (a project website is also available)
- We study how to make reinforcement learning stable and effective for training VLM agents in long-horizon, visually grounded environments, using Super Mario Land as a testbed. Successful play in Super Mario Land often requires 100+ turns of closed-loop control, whereas existing VLM-RL work has mostly focused on shorter-horizon settings, typically around 20–30 turns. We propose Odysseus, a framework that combines lightweight SFT initialization with multi-task RL. The resulting trained model outperforms the base model by 5× and the strongest frontier model we evaluated by 3.6× in game performance. It also shows clear generalization to unseen levels and cross-game transfer to Super Mario Bros., while preserving the base model’s capabilities on general-purpose multimodal benchmarks.
- 07/2026: Odysseus is accepted to COLM 2026!

04/2026: Two papers accepted to ICML 2026!
- “f-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses”: This work provides the first unified theory and provably efficient online RLHF algorithms for general f-divergence regularization, covering alternatives beyond reverse KL such as forward KL and chi-squared divergence.
- “SMILE: Extended Deep Submodular Function-Based Instruction and In-context Learning Demonstration Selection”: This work proposes SMILE, a framework that jointly optimizes instructions and ICL demonstrations by modeling their interactions with a submodular surrogate, yielding more robust prompt optimization than tuning each component separately.

01/2026: One paper accepted to ICLR 2026!
- “Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits”: This work studies multi-objective prompt optimization under a budget constraint using tools from pure-exploration bandits. It extends the framework introduced in our previous work (NeurIPS 2024), which established a connection between bandit algorithms and prompt optimization.

10/2025: I will attend the 2025 EAS Trailblazers Symposium at Caltech as one of the seven trailblazers!

09/2025: One paper accepted to NeurIPS 2025!
- “Greedy Sampling Is Provably Efficient For RLHF”: We show that, under both the Bradley–Terry model and a more general preference model, greedy sampling based on empirical estimates is provably efficient for RLHF with the KL-regularized objective.

09/2025: I have joined Princeton University as a Postdoctoral Fellow at the Princeton Language and Intelligence (PLI) initiative! I'm excited about this new adventure!