We introduce OneTwoVLA, a single unified vision-language-action model capable of both acting (System One) and reasoning (System Two). Importantly, it adaptively determines when to engage each mode.
We propose HuB (Humanoid Balance), a framework that enables humanoids to perform challenging quasi-static balance tasks, including extreme single-legged poses such as the Swallow Balance and Bruce Lee's Kick.