Xinsong Feng (冯欣淞)
Currently, I am a Ph.D. student in Data Science at William & Mary (2025–), advised by Prof. Haipeng Chen at the Data-Driven Decision Intelligence (D3I) Lab. I received my M.S. in Electrical and Computer Engineering from UCLA (2023–2025), where I was a member of the Wireless Lab supervised by Prof. Ian P. Roberts. Before that, I obtained my B.Eng. in Communication Engineering from Chongqing University (2019–2023).
My research primarily focuses on diffusion-based language models, where I design practical decoding algorithms (e.g., adaptive re-masking, embedding/rounding) and use reinforcement learning-based post-training to refine these strategies. Beyond this, I am also interested in reinforcement learning theory and algorithms for complex optimization, as well as ODE-based generative models for efficient, high-performing generation. Previously, I worked on wireless communications, and I remain interested in opportunities to apply AI methods in this domain.
Feel free to contact me if you are interested in further discussion or potential collaboration.
Projects
Solving Constrained Optimization Problems as ODE-based Models Using Reinforcement Learning
Han Meng, Xinsong Feng, Yang Li, Chenan Wang, Kishansingh Rajput, Malachi Schram, Haipeng Chen
Submitted to AISTATS 2026
We propose CMFO (Constrained Markov Flow Optimizer), which unifies flow-matching generative models and reinforcement learning to solve constrained optimization problems with improved efficiency and feasibility.
Previous learning-to-optimize (L2O) methods for constrained optimization problems often treat neural networks as initializers that generate approximate solutions requiring substantial post-hoc refinement. This approach overlooks a key insight: solving complex optimization problems often requires iterative refinement of candidate solutions, a process naturally aligned with the Markov Decision Process (MDP) and reinforcement learning (RL) framework. We show that within the MDP framework, RL and Ordinary Differential Equation (ODE)-based generative models (e.g., diffusion, flow matching) are formally equivalent, unifying them as trainable optimizers. Building on this unified perspective, we propose to train a flow-matching model within an RL paradigm as a learnable refinement mechanism, thereby incorporating constraint satisfaction directly into the optimization process. To further enhance feasibility, we introduce a minimal correction step that adjusts solutions to ensure constraint compliance. Empirical results demonstrate that our approach achieves state-of-the-art performance across a range of constrained optimization tasks, yielding improvements in efficiency, solution quality, and feasibility over prior baselines.
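For intuition, here is a minimal sketch, in PyTorch, of the sampler-as-policy view described above: each Euler step of a learned velocity field is treated as one MDP transition, the final iterate is scored by the optimization objective, and a REINFORCE update refines the velocity network, with a clamp onto box constraints standing in for the minimal correction step. The network, toy quadratic objective, and training loop are illustrative assumptions, not the CMFO implementation.

```python
# Minimal sketch, assuming a toy quadratic objective and simple box constraints:
# one Euler step of the learned velocity field = one MDP transition, the final
# iterate is scored as a reward, and REINFORCE refines the velocity network.
# VelocityNet, rollout, and minimal_correction are illustrative stand-ins.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Velocity field v_theta(x, t); the sampler x <- x + v * dt doubles as
    a stochastic policy once Gaussian exploration noise is added."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )
        self.log_std = nn.Parameter(torch.zeros(dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def objective(x):
    # Toy objective to minimize; the real tasks are constrained programs.
    return (x ** 2).sum(dim=-1)

def minimal_correction(x, lo=-1.0, hi=1.0):
    # Stand-in for a correction step: project onto box constraints.
    return x.clamp(lo, hi)

def rollout(policy, batch=64, dim=8, steps=10):
    """Integrate the ODE with exploration noise, recording per-step log-probs
    so the whole trajectory can be scored like an RL episode."""
    x = torch.randn(batch, dim)
    dt = 1.0 / steps
    log_probs = []
    for k in range(steps):
        t = torch.full((batch, 1), k * dt)
        dist = torch.distributions.Normal(policy(x, t), policy.log_std.exp())
        v = dist.sample()
        log_probs.append(dist.log_prob(v).sum(dim=-1))
        x = x + v * dt                       # one refinement step = one transition
    x = minimal_correction(x)                # enforce feasibility at the end
    return -objective(x), torch.stack(log_probs).sum(dim=0)

policy = VelocityNet(dim=8)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    reward, log_prob = rollout(policy)
    # REINFORCE with a batch-mean baseline.
    loss = -((reward - reward.mean()) * log_prob).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point of the sketch is only that the ODE sampler itself becomes the trainable object; in the paper, the reward and the correction step are defined by each task's actual constraints.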
Offline Reinforcement Learning with Generative Trajectory Policies
Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
Submitted to ICLR 2026
We propose Generative Trajectory Policies (GTPs), an ODE-based framework that unifies generative policies in offline RL, overcoming the performance–efficiency trade-off and achieving state-of-the-art results on D4RL benchmarks.
Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models like diffusion policies are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap. The key to moving beyond the limitations of individual methods, we argue, lies in a unifying perspective that views modern generative models—including diffusion, flow matching, and consistency models—as specific instances of learning a continuous-time generative trajectory governed by an Ordinary Differential Equation (ODE). This principled foundation provides a clearer design space for generative policies in RL and allows us to propose Generative Trajectory Policies (GTPs), a new and more general policy paradigm that learns the entire solution map of the underlying ODE. To make this paradigm practical for offline RL, we further introduce two key theoretically principled adaptations. Empirical results demonstrate that GTP achieves state-of-the-art performance on D4RL benchmarks — it significantly outperforms prior generative policies, achieving perfect scores on several notoriously hard AntMaze tasks.
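To make the "solution map" idea concrete, here is a simplified sketch under my own assumptions rather than the paper's training procedure: a network G_theta(x, t, s) jumps from time t directly to time s and is distilled against a short Euler step of a frozen velocity field, so that at inference a single call can replace many sampler steps. The names `SolutionMap` and `distill_step`, and the placeholder velocity field, are illustrative; the offline-RL-specific adaptations are not shown.

```python
# Rough illustration of learning an ODE solution map: G_theta(x, t, s) jumps
# from time t to time s, distilled against a short Euler step of a frozen
# velocity field. All components here are placeholders, not the paper's method.
import torch
import torch.nn as nn

class SolutionMap(nn.Module):
    """G_theta(x, t, s) approximates the ODE flow map from time t to s, so at
    inference any number of sampler steps can be used, including one."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t, s):
        # Residual form: the jump scales with the time gap (s - t).
        return x + (s - t) * self.net(torch.cat([x, t, s], dim=-1))

def distill_step(gmap, velocity, x, t, s, opt, dt=0.05):
    """One update: match a direct jump t -> s against a small Euler step of a
    pretrained velocity field followed by the (frozen) map's own jump."""
    with torch.no_grad():
        x_mid = x + dt * velocity(x, t)          # teacher: short numerical step
        target = gmap(x_mid, t + dt, s)          # then complete the jump
    loss = ((gmap(x, t, s) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Minimal usage with a stand-in "pretrained" velocity field.
dim = 6
gmap = SolutionMap(dim)
opt = torch.optim.Adam(gmap.parameters(), lr=1e-4)
velocity = lambda x, t: -x                       # placeholder, not a learned field
x, t, s = torch.randn(32, dim), torch.zeros(32, 1), torch.ones(32, 1)
print(distill_step(gmap, velocity, x, t, s, opt))
```

Used as a policy, such a map can emit an action in a single call (e.g., `gmap(noise, t0, t1)`) or take a few intermediate jumps when extra compute is available, which is the sense in which the speed-versus-quality trade-off above can be narrowed.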
Sequential Stochastic Combinatorial Optimization Using Hierarchical Reinforcement Learning
Xinsong Feng, Zihan Yu, Yanhai Xiong, Haipeng Chen
ICLR 2025
We propose Wake-Sleep Option (WS-option), a hierarchical reinforcement learning framework for sequential stochastic combinatorial optimization that jointly optimizes budget allocation and node selection in a two-layer MDP.
Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose Wake-Sleep Option (WS-option), a generic two-layer, option-based hierarchical RL (HRL) framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance training stability and computational efficiency, preventing vicious cyclic interference between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model generalizes to larger graphs, which significantly reduces computational overhead.
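The two-layer loop can be pictured with the schematic below; the policies and environment are placeholders rather than the paper's implementation, and the uniform high-level policy shown corresponds to the fixed budget-allocation heuristic that WS-option is designed to improve upon.

```python
# Schematic of the two-layer decision loop (placeholder policies and
# environment, not the paper's implementation): the higher layer commits a
# budget for the current step, the lower layer spends it node by node.
import random

def run_episode(graph_nodes, total_budget, horizon,
                high_policy, low_policy, env_step):
    remaining = total_budget
    total_reward = 0.0
    state = set()                                 # e.g., currently activated nodes
    for t in range(horizon):
        # Higher layer: how many seeds to commit at this time step.
        budget_t = min(high_policy(state, remaining, horizon - t), remaining)
        # Lower layer: select that many nodes, one at a time.
        chosen = set()
        for _ in range(budget_t):
            node = low_policy(state | chosen, graph_nodes - state - chosen)
            chosen.add(node)
        state, reward = env_step(state, chosen)   # stochastic spread / intervention
        total_reward += reward
        remaining -= budget_t
    return total_reward

# Placeholder components so the loop runs end to end. `uniform_high` mirrors
# the uniform budget-allocation baseline discussed above.
nodes = set(range(100))
uniform_high = lambda state, remaining, steps_left: max(1, remaining // max(steps_left, 1))
random_low = lambda state, candidates: random.choice(sorted(candidates))
dummy_env = lambda state, chosen: (state | chosen, float(len(chosen)))

print(run_episode(nodes, total_budget=20, horizon=5,
                  high_policy=uniform_high, low_policy=random_low,
                  env_step=dummy_env))
```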