Chengshuai Shi


Postdoctoral Fellow

Princeton Language and Intelligence, Princeton University

Email: cs1083 at princeton dot edu

Phone: 434-218-9860

Google Scholar Profile

I am currently a Postdoctoral Fellow at the Princeton Language and Intelligence (PLI) initiative at Princeton University, working with Professor Chi Jin. Prior to joining PLI, I worked for a year as a Senior Machine Learning Research Engineer in the AI group at Bloomberg, New York City.

I received my Ph.D. in Electrical Engineering from the University of Virginia in 2024, where I was advised by Professor Cong Shen. During my Ph.D. (2021–2024), I was honored to be supported by the Bloomberg Data Science Ph.D. fellowship.

My research interests lie in machine learning, with a focus on intelligent decision-making. Specifically, I work on:

  • Foundational principles in areas such as reinforcement learning, multi-armed bandits, game theory, and multi-agent systems;

  • Applications in emerging fields including wireless communication, recommender systems, and large language models.

News

  • 09/2025: One paper accepted to NeurIPS 2025!

    • “Greedy Sampling Is Provably Efficient For RLHF”: We show that, under both the Bradley–Terry model and a more general preference model, greedy sampling based on empirical estimates is provably efficient for RLHF with the KL-regularized objective. This is joint work with Di Wu (UVA), Prof. Jing Yang (UVA), and Prof. Cong Shen (UVA).

  • 01/2025: One paper accepted to ICLR 2025!

    • “Building Math Agents with Multi-Turn Iterative Preference Learning”: We propose a multi-turn direct preference learning framework to enhance the mathematical reasoning capabilities of large language models (LLMs), with its superiority demonstrated by both theoretical guarantees and empirical results. This is joint work with many amazing researchers from UIUC, Princeton, Google Research, and Google DeepMind.

  • 01/2025: One paper accepted to AISTATS 2025!

    • “Cost-Aware Optimal Pairwise Pure Exploration”: We extend the canonical best-arm identification (BAI) study in two directions: (1) tackling a broader set of tasks, termed pairwise exploration tasks, covering BAI, ranking identification, top-k identification, and others in a unified manner; (2) minimizing arm-specific costs instead of the total number of arm pulls. This is joint work with Di Wu (UVA), Ruida Zhou (UCLA), and Prof. Cong Shen (UVA).