An open-source generalist model for robot object manipulation
These are the robots we tested Octo on – you can see that there is a wide range of robot arms, from small to large, single-arm to bimanual. Octo was able to control all of these robots. Credit: Team et al.

The public release of ChatGPT and other large language models (LLMs) has allowed developers worldwide to start experimenting with these models to enhance the interactive capabilities of their own systems. Similar generalizable models for robotic manipulation, however, remain scarce.

Researchers at the University of California, Berkeley (UC Berkeley), Stanford University and Carnegie Mellon University (CMU) recently introduced Octo, an open-source generalist model for robotic manipulation that could allow different robotic systems to manipulate a wide range of objects effectively. The model, presented in a paper pre-published on the arXiv preprint server, could open new avenues for the development of robots that can tackle manual tasks.

"Much of the current progress in AI is driven by [large datasets] and large models," Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black and Oier Mees told Tech Xplore. "In the robotics community, we recently assembled the Open X-Embodiment dataset, a big manipulation dataset that pools data from many [research labs]. While this new dataset is a really exciting resource, at the time there weren't many models that could make use of it yet."

The team's recent work had two main objectives: first, to develop a generalist robotics model that could be applied to many different robots, and second, to release open-source code that would allow other researchers to build similar models in the future.

"Octo is what we call a 'generalist' model, a [robot policy] that can control many different types of robots and make them fulfill requests like 'pick up the spoon,' 'close the drawer,' 'wipe the table' etc.," Ghosh, Walke, Pertsch, Black and Mees explained.

"Being a generalist and working on many robots is key, because if you look at research labs around the world, many of them use different robots, so the only way to ensure Octo can be used by many researchers is by supporting a wide range of robots."

Within the AI research community, large models that are trained on broad data and can be applied across many different systems and tasks are often referred to as foundation models. An example is ChatGPT, which can be used to equip various agents and systems with natural language processing (NLP) capabilities.

"We want to build similar foundation models, but for robot control, or in other words, models that can control many robots and make them solve many different tasks," Ghosh, Walke, Pertsch, Black and Mees said.

"Octo is a first step towards that goal. Its training looks very similar to models like ChatGPT: we curate a large and diverse dataset, in our case robot data instead of text, and train a large model to predict the next action the robot should execute given the current robot state and a task instruction."
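The training recipe described in this quote, learning to predict the next action from the robot's current state and a task instruction, is a form of supervised imitation learning, often called behavior cloning. The sketch below illustrates only that objective, under heavy simplifying assumptions: Octo itself is a large transformer trained on real robot data, whereas here a tiny linear policy is fit with stochastic gradient descent to toy demonstrations from a hypothetical scripted "expert". The instruction strings, the `TARGETS` table and all function names are illustrative and are not part of Octo's code.

```python
import random

# Hypothetical task vocabulary; Octo conditions on free-form language instead.
INSTRUCTIONS = ["pick up the spoon", "close the drawer"]
# Hypothetical 2-D target positions that a scripted "expert" moves toward.
TARGETS = {"pick up the spoon": (0.5, -0.2), "close the drawer": (-0.3, 0.4)}


def expert_action(state, instruction):
    """Toy demonstrator: step halfway from the current state toward the target."""
    tx, ty = TARGETS[instruction]
    return [0.5 * (tx - state[0]), 0.5 * (ty - state[1])]


def featurize(state, instruction):
    """Concatenate the robot state, a one-hot instruction code and a bias input."""
    one_hot = [1.0 if instruction == name else 0.0 for name in INSTRUCTIONS]
    return list(state) + one_hot + [1.0]


def predict(weights, state, instruction):
    """Linear policy: one weight row per action dimension."""
    x = featurize(state, instruction)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_policy(demos, action_dim, lr=0.05, epochs=200, seed=0):
    """Fit the policy by SGD on the squared error to the demonstrated actions."""
    rng = random.Random(seed)
    demos = list(demos)  # copy so shuffling does not reorder the caller's list
    n_features = len(featurize(demos[0][0], demos[0][1]))
    weights = [[0.0] * n_features for _ in range(action_dim)]
    for _ in range(epochs):
        rng.shuffle(demos)
        for state, instruction, action in demos:
            x = featurize(state, instruction)
            pred = predict(weights, state, instruction)
            for i in range(action_dim):
                err = pred[i] - action[i]  # gradient of 0.5 * err**2 w.r.t. pred[i]
                for j in range(n_features):
                    weights[i][j] -= lr * err * x[j]
    return weights


# Collect toy (state, instruction, action) demonstrations from the expert.
data_rng = random.Random(1)
demos = []
for _ in range(50):
    state = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
    instruction = data_rng.choice(INSTRUCTIONS)
    demos.append((state, instruction, expert_action(state, instruction)))

weights = train_policy(demos, action_dim=2)
action = predict(weights, (0.0, 0.0), "close the drawer")
print(action)  # values very close to [-0.15, 0.2]
```

Because this toy expert is exactly linear in the chosen features, the fitted policy reproduces it almost perfectly. Real robot data is far noisier and higher-dimensional, which is where a much more expressive model, such as the transformer Octo is built on, becomes necessary.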

Octo, the model developed by Ghosh, Walke, Pertsch, Black and Mees, is based on the same type of neural network as ChatGPT, known as a transformer. Its key advantages over previously developed robotics models are the scale of the data used to train it and its flexibility.
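Transformers are built around an operation called attention, in which each element of an input sequence computes a weighted average over the other elements, with weights derived from pairwise similarity scores. The following is a minimal, dependency-free sketch of single-head scaled dot-product attention. It assumes the queries, keys and values have already been computed and omits the learned projections, multiple heads and feed-forward layers that full transformer models add and stack many times. The optional mask shows how a model can be prevented from looking at certain positions, for example future tokens in a GPT-style language model.

```python
import math


def softmax(scores):
    """Numerically stable softmax; positions scored -inf receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, mask=None):
    """Single-head scaled dot-product attention over lists of vectors.

    mask[i][j] == False blocks query position i from attending to key position j.
    Each query must be allowed to attend to at least one key.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            score = sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
            if mask is not None and not mask[i][j]:
                score = float("-inf")
            scores.append(score)
        if all(s == float("-inf") for s in scores):
            raise ValueError(f"query {i} is masked from every key")
        weights = softmax(scores)
        # Output i is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Self-attention over three toy token vectors with a causal mask:
# each position may attend only to itself and earlier positions.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal = [[j <= i for j in range(len(tokens))] for i in range(len(tokens))]
out = attention(tokens, tokens, tokens, mask=causal)
print(out[0])  # [1.0, 0.0]: the first token can only attend to itself
```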

The model was trained on the Open X-Embodiment dataset, the largest dataset of robotic manipulation trajectories compiled to date. Octo can also process a diverse range of inputs, including different types of camera images, robot joint readings, language instructions and goal images (images showing the desired outcome of a task).
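Accepting such different inputs fits naturally with a transformer, which operates on a sequence of vectors (tokens). One common way to feed it heterogeneous data, assumed in the sketch below, is to give each input type its own tokenizer, concatenate the resulting tokens into one sequence, and simply leave out inputs that a given robot or task does not provide. The tokenizers here are deliberately trivial, hypothetical stand-ins (real systems use learned image and language encoders), and `TOKEN_DIM`, the input names and the helper functions are illustrative rather than Octo's actual interface.

```python
from typing import Callable, Dict, List, Tuple

TOKEN_DIM = 4  # hypothetical width shared by all tokens


def pad(vec: List[float]) -> List[float]:
    """Truncate or zero-pad a vector to exactly TOKEN_DIM entries."""
    vec = [float(x) for x in vec[:TOKEN_DIM]]
    return vec + [0.0] * (TOKEN_DIM - len(vec))


def tokenize_image(pixels: List[float]) -> List[List[float]]:
    """Toy image tokenizer: one token per consecutive chunk of TOKEN_DIM pixels."""
    return [pad(pixels[i:i + TOKEN_DIM]) for i in range(0, len(pixels), TOKEN_DIM)]


def tokenize_joints(joints: List[float]) -> List[List[float]]:
    """Toy proprioception tokenizer: all joint readings packed into one token."""
    return [pad(joints)]


def tokenize_text(text: str) -> List[List[float]]:
    """Toy language tokenizer: one token per word from simple character statistics."""
    return [pad([len(word), sum(map(ord, word)) % 97]) for word in text.split()]


# Fixed order in which input types appear in the sequence (dicts keep insertion order).
TOKENIZERS: Dict[str, Callable[..., List[List[float]]]] = {
    "image": tokenize_image,
    "joints": tokenize_joints,
    "instruction": tokenize_text,
    "goal_image": tokenize_image,
}


def build_sequence(inputs: Dict[str, object]) -> List[Tuple[str, List[float]]]:
    """Concatenate the tokens of every provided input, tagging each with its source."""
    sequence = []
    for name, tokenizer in TOKENIZERS.items():
        if inputs.get(name) is None:
            continue  # missing input (e.g. no goal image): contribute no tokens
        sequence.extend((name, token) for token in tokenizer(inputs[name]))
    return sequence


# The same builder handles a language-conditioned and a goal-image-conditioned task.
with_language = build_sequence(
    {"image": [0.1] * 8, "joints": [0.0, 0.5, 1.0], "instruction": "close the drawer"}
)
with_goal = build_sequence(
    {"image": [0.1] * 8, "joints": [0.0, 0.5, 1.0], "goal_image": [0.2] * 4}
)
print([name for name, _ in with_language])
# ['image', 'image', 'joints', 'instruction', 'instruction', 'instruction']
print([name for name, _ in with_goal])
# ['image', 'image', 'joints', 'goal_image']
```

Leaving absent inputs out of the sequence, rather than filling them with placeholders, is one simple way to let a single model serve robots and tasks that provide different combinations of inputs.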

More information: Dibya Ghosh et al, Octo: An Open-Source Generalist Robot Policy, arXiv (2024). DOI: 10.48550/arxiv.2405.12213

Journal information: arXiv

© 2024 Science X Network

Citation: An open-source generalist model for robot object manipulation (2024, June 10) retrieved 10 June 2024 from https://techxplore.com/news/2024-06-source-generalist-robot.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.