My research focuses on advancing embodied intelligence: enabling robots to perceive, reason, and learn through machine learning. I believe the pursuit of embodied intelligence follows a cycle of understanding, reasoning, and reinforcement learning: agents first acquire knowledge from data, then make decisions grounded in that knowledge, and continuously improve through their own experience.
We introduce OneTwoVLA, a single unified vision-language-action model capable of both acting (System One) ⚡ and reasoning (System Two) 🤔. Importantly, it adaptively determines when to engage each mode.
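To make the adaptive switching concrete, here is a minimal Python sketch of the idea: a single policy whose forward pass either emits a low-level action directly (System One) or first produces a reasoning trace (System Two). Everything here, from the class name to the uncertainty heuristic, is illustrative and is not OneTwoVLA's actual interface; in the real model, the mode decision would come from the model's own outputs rather than a hand-written threshold.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StepOutput:
    mode: str                  # "act" (System One) or "reason" (System Two)
    reasoning: str | None      # natural-language trace, only in reasoning mode
    action: np.ndarray | None  # low-level action, only in acting mode


class UnifiedTwoModePolicy:
    """One model, two behaviors: the same step decides whether to think
    before acting. The 'uncertainty' score below is a mock stand-in for
    a signal a real VLA would derive from its own predictions."""

    def __init__(self, think_threshold: float = 0.5):
        self.think_threshold = think_threshold

    def _uncertainty(self, observation: np.ndarray) -> float:
        # Placeholder heuristic; a real model would use its own logits.
        return float(np.clip(observation.std(), 0.0, 1.0))

    def step(self, observation: np.ndarray, instruction: str) -> StepOutput:
        if self._uncertainty(observation) > self.think_threshold:
            # System Two: emit an intermediate reasoning trace first.
            trace = f"Plan: decompose '{instruction}' into subgoals."
            return StepOutput(mode="reason", reasoning=trace, action=None)
        # System One: react directly with a low-level action.
        action = np.tanh(observation[:7])  # e.g., a 7-DoF arm command
        return StepOutput(mode="act", reasoning=None, action=action)


if __name__ == "__main__":
    policy = UnifiedTwoModePolicy()
    rng = np.random.default_rng(0)
    for t in range(3):
        obs = rng.normal(scale=0.3 * (t + 1), size=32)
        out = policy.step(obs, "make a pourover coffee")
        print(t, out.mode, out.reasoning or out.action[:3])
```

Run as-is, the mock uncertainty grows with the observation noise, so the policy acts directly on the first step and switches to reasoning on later, harder steps, which is the behavior the blurb describes.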
We propose HuB (Humanoid Balance) 🤖, a framework that enables humanoids to perform challenging quasi-static balance tasks ⚖️, including extreme single-legged poses 🦵 such as the Swallow Balance 🕊️ and Bruce Lee's Kick 🦶🥋.
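For readers unfamiliar with the term, quasi-static balance reduces to a classical criterion: a pose is stable when the ground projection of the center of mass (CoM) lies inside the support polygon. The sketch below illustrates only that textbook check, not HuB's training pipeline, and all segment masses and positions are invented for the example.

```python
import numpy as np


def point_in_convex_polygon(point: np.ndarray, polygon: np.ndarray) -> bool:
    """Containment test for a convex polygon with counter-clockwise vertices:
    the point is inside iff it lies to the left of every edge."""
    for i in range(len(polygon)):
        edge = polygon[(i + 1) % len(polygon)] - polygon[i]
        to_point = point - polygon[i]
        # 2-D cross product; a negative sign means the point is outside.
        if edge[0] * to_point[1] - edge[1] * to_point[0] < 0:
            return False
    return True


def com_ground_projection(masses: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Mass-weighted center of mass of body segments, projected to (x, y)."""
    com = (masses[:, None] * positions).sum(axis=0) / masses.sum()
    return com[:2]


if __name__ == "__main__":
    # Single-leg stance: the support polygon is one foot's outline (meters).
    foot = np.array([[0.00, -0.05], [0.25, -0.05], [0.25, 0.05], [0.00, 0.05]])
    masses = np.array([30.0, 20.0, 15.0])      # torso, legs, arms (kg, invented)
    positions = np.array([[0.12, 0.00, 1.00],  # segment CoMs in (x, y, z)
                          [0.10, 0.01, 0.50],
                          [0.15, -0.02, 1.20]])
    com_xy = com_ground_projection(masses, positions)
    print("quasi-statically balanced:", point_in_convex_polygon(com_xy, foot))
```

Extreme single-legged poses are hard precisely because the support polygon shrinks to a single foot, leaving little margin for the CoM projection to wander.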