Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces (buttons, text boxes, menus) that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that a person can use.
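The scan-act-scan cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not OpenAI's implementation: `FakeScreen`, `Action`, `choose_action`, and `run_agent` are hypothetical names, the "screenshot" is a small dictionary rather than pixels, and a hard-coded rule stands in for the model's decision.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the UI element the action applies to
    text: str = ""     # text to enter, used only by "type"


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a browser page: a search box and a button."""
    box_text: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this toy returns a dict instead.
        return {"box_text": self.box_text, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target == "search_box":
            self.box_text = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.submitted = True


def choose_action(observation: dict, goal: str) -> Action:
    """Toy rule standing in for the model's decision step (an assumption)."""
    if observation["submitted"]:
        return Action("done")
    if observation["box_text"] != goal:
        return Action("type", "search_box", goal)
    return Action("click", "search_button")


def run_agent(screen: FakeScreen, goal: str, max_steps: int = 10) -> list:
    """Observe, act, observe again, until done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()
        action = choose_action(observation, goal)
        if action.kind == "done":
            break
        screen.apply(action)
        history.append(action)
    return history
```

Calling `run_agent(FakeScreen(), "cheap flights")` makes the toy agent type the query and then click the button, looking at the screen again before each step. The `max_steps` budget is a design choice in this sketch so that a confused agent cannot loop forever.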
“Traditionally the way models have used software is through specialized APIs,” says Reiichiro Nakano, a scientist at OpenAI. (An API, or application programming interface, is a piece of code that acts as a kind of connector, allowing different bits of software to be hooked up to one another.) That puts a lot of apps and most websites off limits, he says: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”
CUA also breaks tasks down into smaller steps and tries to work through them one by one, backtracking when it gets stuck. OpenAI says CUA was trained with techniques similar to those used for its so-called reasoning models, o1 and o3.
OpenAI has tested CUA against a number of industry benchmarks designed to assess the ability of an agent to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner in all of them.
For example, on OSWorld, which tests how well an agent performs tasks such as merging PDF files or manipulating an image, CUA scores 38.1% to Computer Use’s 22.0%. In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well an agent performs tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore doesn’t score on OSWorld.)
For now, Operator can also only carry out tasks in a browser. OpenAI plans to make CUA’s wider abilities available in the future via an API that other developers can use to build their own apps. This is how Anthropic released Computer Use in December.
OpenAI says it has tested CUA’s safety, using red teams to explore what happens when users ask it to do unacceptable tasks (such as researching how to make a bioweapon), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. “We’ve trained the model to stop and ask the user for information before doing anything with external side effects,” says Casey Chu, another researcher on the team.
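The "stop and ask" behavior Chu describes is something OpenAI says it trained into the model itself. For readers who think in code, the same idea can be approximated as an explicit gate around actions. The sketch below is an assumption-laden illustration, not a description of how CUA works: `SIDE_EFFECT_KINDS` and `execute_with_confirmation` are hypothetical names, and the list of risky actions is invented for the example.

```python
from typing import Callable

# Hypothetical set of action kinds treated as having external side effects.
SIDE_EFFECT_KINDS = {"submit_order", "send_email", "delete_file"}


def execute_with_confirmation(
    kind: str,
    perform: Callable[[], str],
    ask_user: Callable[[str], bool],
) -> str:
    """Run `perform` only if the action is harmless or the user approves it."""
    if kind in SIDE_EFFECT_KINDS:
        approved = ask_user(f"About to perform '{kind}'. Continue?")
        if not approved:
            return "skipped"
    return perform()
```

In this sketch a harmless action such as scrolling runs immediately, while an action in the risky set runs only if `ask_user` returns `True`. Passing `ask_user` in as a function keeps the gate independent of how the question reaches the person, whether through a chat window or a command-line prompt.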
Look! No hands
To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends your instructions to a remote browser running on an OpenAI server. OpenAI claims that this makes the system more efficient. It’s another key difference between Operator, Computer Use and Mariner (which runs inside Google’s Chrome browser on your own computer).