RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
Mian Wu, Gavin Zhang, Sewon Min, Sergey Levine, Aviral Kumar
arXiv preprint, 2025
We propose a post-training methodology for improving language models on open-ended generation tasks. Rather than relying on a static reward model, RLAC trains a dynamic LLM critic alongside the generator through an adversarial game: the critic identifies the most likely failure modes in each output, which are then checked by an external validator, and the critic adapts as the generator improves. We demonstrate improvements in factual accuracy for text generation and correctness for code generation across multiple benchmarks.
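To make the training dynamic concrete, here is a minimal sketch of one adversarial round as the abstract describes it: the generator produces a response, the critic proposes a likely failure mode, and an external validator adjudicates, with rewards split accordingly. All function names (generate, propose_flaw, verify, rlac_step) and the exact reward scheme are hypothetical placeholders assumed for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    prompt: str
    response: str
    flaw: str           # critic's claimed failure mode (e.g., an unsupported fact)
    flaw_is_real: bool  # external validator's verdict on that claim


def generate(prompt: str) -> str:
    """Hypothetical generator call (sampling from the current policy)."""
    return "draft response for: " + prompt


def propose_flaw(prompt: str, response: str) -> str:
    """Hypothetical critic call: name the most likely failure mode in the response."""
    return "claim X in the response may be unsupported"


def verify(flaw: str, response: str) -> bool:
    """Hypothetical external validator (fact checker, test suite, etc.)."""
    return False  # stub verdict: the claimed flaw is not confirmed


def rlac_step(prompt: str) -> Episode:
    """One adversarial round: generator writes, critic attacks, validator adjudicates.

    Reward sketch (assumed): the generator is rewarded when no flaw is confirmed;
    the critic is rewarded when its proposed flaw is confirmed externally.
    """
    response = generate(prompt)
    flaw = propose_flaw(prompt, response)
    flaw_is_real = verify(flaw, response)

    generator_reward = 0.0 if flaw_is_real else 1.0
    critic_reward = 1.0 if flaw_is_real else 0.0

    # In a full pipeline these rewards would feed policy-gradient updates for
    # both models; this sketch only records the episode.
    _ = (generator_reward, critic_reward)
    return Episode(prompt, response, flaw, flaw_is_real)


if __name__ == "__main__":
    print(rlac_step("Summarize the history of the Riemann hypothesis."))
```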