My research focuses on robot learning, which I approach as a deeply systematic challenge spanning hardware understanding, data acquisition, and scalable training. Currently, I work on humanoid whole-body manipulation, covering both robotic foundation models and humanoid control. Pursuing general intelligence on today's humanoids is like building autonomous driving on a unicycle; my goal is to make it as tractable as driving a car by tackling both ends of the stack together.
We present HuMI (Humanoid Manipulation Interface), the first robot-free framework for learning diverse humanoid whole-body manipulation tasks across various environments.
We introduce OneTwoVLA, a single unified vision-language-action model capable of both acting (System One) and reasoning (System Two). Importantly, it adaptively determines when to engage each mode.
We propose HuB (Humanoid Balance), a framework that enables humanoids to perform challenging quasi-static balance tasks, including extreme single-legged poses such as the Swallow Balance and Bruce Lee's Kick.