Qwen Drops RobotSuite: 3 Open Models for Robots

Alibaba's Qwen team released RobotSuite, three open embodied-AI models for robot manipulation, video world modeling and navigation, with code on GitHub.

Published 2026-06-16 · 5 min read

Evgenii Arsentev · PhD

Alibaba's Qwen team has released Qwen-RobotSuite, a set of three open embodied-AI models that each tackle a different layer of the robotics stack: RobotManip for manipulation, RobotWorld for video world modeling, and RobotNav for navigation. Two of the three — RobotManip and RobotNav — ship with public GitHub repositories, while RobotWorld arrives as a research paper for now. The framing is deliberate: robotics has been fragmented into one-off models per task and per robot, and Qwen is pitching a single family that spans the major jobs.

RobotManip is a vision-language-action model built on the Qwen3.5-4B backbone that outputs continuous robot actions. It was trained on roughly 38,100 hours of data drawn only from open datasets and human video, with no proprietary data, and it tops the RoboChallenge Table30-v1 leaderboard with a 20% relative gain over the previous best. The technical hook is a unified 80-dimensional canonical action vector with per-dimension masking: each robot fills only the dimensions it actually has and the rest are masked out, which lets data from very different robots train together without interfering. On cross-embodiment transfer it reaches 23.9% success, about 3.2 times the 7.5% of the π0.5 baseline.
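To make the masking idea concrete, here is a minimal sketch of how a shared action vector with a per-dimension mask could work. Only the 80-dimension size comes from the article; the slot layouts, the `CanonicalAction` type, and the helper names are hypothetical and are not taken from the RobotManip code.

```python
from dataclasses import dataclass
from typing import List

CANONICAL_DIM = 80  # vector size reported in the article


@dataclass
class CanonicalAction:
    values: List[float]  # length 80, zero in slots this robot does not use
    mask: List[bool]     # True only where the slot is meaningful for this robot


def to_canonical(action: List[float], slots: List[int]) -> CanonicalAction:
    """Scatter a robot-specific action into the shared 80-dim layout."""
    if len(action) != len(slots):
        raise ValueError("action and slots must have the same length")
    values = [0.0] * CANONICAL_DIM
    mask = [False] * CANONICAL_DIM
    for value, slot in zip(action, slots):
        values[slot] = value
        mask[slot] = True
    return CanonicalAction(values, mask)


def masked_mse(pred: List[float], target: CanonicalAction) -> float:
    """Mean squared error over the unmasked dimensions only."""
    active = [i for i, used in enumerate(target.mask) if used]
    if not active:
        return 0.0
    return sum((pred[i] - target.values[i]) ** 2 for i in active) / len(active)


# Hypothetical layouts: a 7-DoF arm and a 3-DoF mobile base share one vector.
ARM_7DOF_SLOTS = list(range(0, 7))
MOBILE_BASE_SLOTS = [40, 41, 42]

arm_target = to_canonical([0.5] * 7, ARM_7DOF_SLOTS)

# A prediction that is wrong on the arm slots and arbitrary elsewhere.
prediction = [0.0] * CANONICAL_DIM
prediction[50] = 99.0  # slot the arm does not own, so the mask ignores it

arm_loss = masked_mse(prediction, arm_target)
```

Because the loss is averaged over the masked-in slots only, whatever the model emits in dimensions a given robot lacks contributes nothing to that robot's training signal, which is one plausible reading of "train together without interfering".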

Three jobs, one family

RobotWorld is the heavyweight: a 20-billion-parameter, 60-layer multimodal diffusion transformer that predicts future video frames from a language instruction, trained on 8.6 million video-text pairs spanning more than 200 million observation frames. It ranks first overall on EWMBench and DreamGen Bench and first among open models on WorldModelBench. RobotNav comes in 2B, 4B and 8B sizes, emits eight-waypoint trajectories, and posts solid numbers on standard navigation tests — 76.5% success on VLN-CE RxR and 75.6% on HM3D ObjectNav — with an agentic mode that cuts steps by 77% on one embodied-question benchmark.
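As a rough illustration of what an eight-waypoint output might look like to downstream code, the sketch below treats a trajectory as eight 2D points in the robot's frame. The article states only the waypoint count; the `(x, y)` format, the metre units, and the helper names are assumptions made for the example, not RobotNav's actual interface.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]  # assumed (x, y) in metres, robot frame
NUM_WAYPOINTS = 8               # count reported in the article


def validate_trajectory(points: List[Waypoint]) -> List[Waypoint]:
    """Reject trajectories that do not have exactly eight waypoints."""
    if len(points) != NUM_WAYPOINTS:
        raise ValueError(f"expected {NUM_WAYPOINTS} waypoints, got {len(points)}")
    return list(points)


def path_length(points: List[Waypoint]) -> float:
    """Distance travelled from the robot origin through each waypoint in order."""
    total = 0.0
    previous: Waypoint = (0.0, 0.0)
    for point in points:
        total += math.dist(previous, point)
        previous = point
    return total


# Hypothetical output: eight waypoints spaced 0.25 m apart, straight ahead.
straight_ahead = validate_trajectory([(0.25 * i, 0.0) for i in range(1, 9)])
length_m = path_length(straight_ahead)
```

A fixed-length output like this is convenient for a controller, since it can check the shape once and then track the points in order without parsing free-form text.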

Why it matters to you

For most people the useful signal here isn't any single leaderboard number — it's that the open-model wave that reshaped chatbots is now reaching robots. Shipping usable code instead of a glossy demo lowers the bar for university labs, startups and hobbyist builders who could never license a closed robotics stack. That's how the chatbot field accelerated two years ago: once capable weights and code were public, thousands of people improved on them in parallel rather than waiting on a handful of vendors.

The detail I'd actually watch is that 38,100-hour figure assembled without any proprietary data. If competitive robot models can be trained on public datasets and human video alone, the advantage shifts away from whoever owns the largest pile of robot logs and toward whoever engineers the training best — and that kind of shift tends to speed the whole field up.

What I'd actually do

If you follow robotics, bookmark the GitHub repos rather than the headline benchmarks: the leaderboard scores will be beaten within months, but the released code and the 80-dimensional action format are the durable, reusable parts. Treat the RobotWorld paper as a preview, not a product, until the weights actually appear.

#qwen #robotics #open-source

Author

Evgenii Arsentev

PhD · Chief Product Officer at a tech company

Source: marktechpost.com