OpenAI introduces ChatGPT agent: from research to real-world automation

by Brenden Burgess

On July 17, 2025, OpenAI launched ChatGPT agent, transforming ChatGPT from a conversational assistant into a unified AI agent capable of autonomously performing complex, multi-step tasks, from web navigation to code execution, in a virtual computer environment.

Bridging previous capabilities

ChatGPT agent builds on two earlier tools:

  • Operator: Enabled limited web interactions (clicking, scrolling, and form filling) through a browser-based agent.
  • Deep research: Provided autonomous browsing and report synthesis over longer time horizons.

Individually, each had limitations: Operator could interact with interfaces but could not carry out in-depth analysis; deep research could analyze but could not interact dynamically with sites. ChatGPT agent merges both strengths, unifying navigation, tool use, and reasoning in a single agent architecture.

Internal architecture and workflow

At its core is a virtual computer environment that combines:

  1. A visual browser for websites designed for human users,
  2. A text browser optimized for structured reasoning,
  3. A shell / terminal to execute code,
  4. Integrated API connectors for services like Gmail or GitHub.

The agent adapts continuously, deciding whether to click buttons, run scripts, or parse content, while maintaining state across tools. All actions occur within the agent's controlled context, ensuring traceability and flexibility.
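To make the idea of choosing among tools while keeping shared state more concrete, here is a minimal sketch in Python. It is not OpenAI's implementation: `AgentState`, `run_step`, and the three stand-in tool functions are hypothetical names invented for illustration, and the tools only return strings instead of driving a real browser or shell.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """Hypothetical shared state that persists across tool calls."""
    notes: Dict[str, str] = field(default_factory=dict)
    log: List[Tuple[str, str]] = field(default_factory=list)  # (tool, argument) audit trail


# Stand-in tools (assumptions): each receives the shared state and one argument.
def visual_browser(state: AgentState, target: str) -> str:
    return f"clicked {target}"


def text_browser(state: AgentState, url: str) -> str:
    state.notes[url] = "page text"  # placeholder for fetched content
    return f"read {url}"


def terminal(state: AgentState, command: str) -> str:
    return f"ran {command}"


TOOLS: Dict[str, Callable[[AgentState, str], str]] = {
    "visual_browser": visual_browser,
    "text_browser": text_browser,
    "terminal": terminal,
}


def run_step(state: AgentState, tool: str, argument: str) -> str:
    """Dispatch one step to the named tool and record it for traceability."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    result = TOOLS[tool](state, argument)
    state.log.append((tool, argument))
    return result


# A fixed plan stands in for the model's step-by-step decisions.
plan = [
    ("visual_browser", "Sign in button"),
    ("text_browser", "https://example.com"),
    ("terminal", "python analyze.py"),
]
state = AgentState()
results = [run_step(state, tool, arg) for tool, arg in plan]
```

Because every step goes through `run_step`, the `log` list gives a single ordered record of what was done, which is one simple way to get the traceability the article describes.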

Examples of tasks: from planning to execution

ChatGPT agent can tackle tasks such as:

  • Calendar briefing: Analyze your calendar, retrieve related news, and summarize upcoming events.
  • Grocery ordering: Gather ingredients, compare prices, and place orders.
  • Competitive analysis: Retrieve competitors' pages, scrape the data, and create slides or spreadsheets.
  • Financial modeling: Download data and update spreadsheets while preserving formatting.

These workflows involve multi-modal tool use: logging in to sites, running scripts in the terminal, then packaging the results into editable documents, all under your supervision.
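As a rough illustration of the last two stages of such a workflow (processing gathered data, then packaging it into an editable document), the sketch below turns already-fetched page text into a CSV string. Everything here is an assumption made for the example: the `name=price` page format, the function names `extract_prices` and `to_csv`, and the sample data are invented, and no real site is contacted.

```python
import csv
import io
from typing import Dict, List


def extract_prices(pages: Dict[str, str]) -> List[dict]:
    """Parse 'product=price' lines from pre-fetched page text (invented format)."""
    rows = []
    for competitor, text in pages.items():
        for line in text.splitlines():
            if "=" not in line:
                continue  # skip lines that are not product entries
            name, price = line.split("=", 1)
            rows.append(
                {"competitor": competitor, "product": name.strip(), "price": float(price)}
            )
    return rows


def to_csv(rows: List[dict]) -> str:
    """Package the rows as CSV text that a spreadsheet application can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["competitor", "product", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder input standing in for pages the agent would have retrieved.
sample_pages = {"acme": "Pricing page\nwidget=9.99\n", "globex": "widget = 11.5\n"}
report = to_csv(extract_prices(sample_pages))
```

The point of the split is that the packaging step never needs to know how the data was gathered, so the same `to_csv` stage could sit behind a browser step, a terminal step, or an API connector.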

Performance: benchmarks and human comparisons

OpenAI reports significant gains on several benchmarks:

  • Humanity's Last Exam: pass@1 rate of 41.6% (best agentic result); up to 44.4% with parallel attempts
  • FrontierMath: 27.4% accuracy using terminal and code support, outperforming previous models.
  • SpreadsheetBench: 45.5% overall score with XLSX editing, compared with 20% for Copilot in Excel and human scores of ≈71%
  • Internal knowledge work benchmark: Agent outputs meet or exceed expert performance around 50% of the time
  • BrowseComp & WebArena: New state of the art, with 68.9% on browsing-based tasks

These evaluations demonstrate a marked improvement in autonomy and in the sophistication of tasks the agent can handle.
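To put a few of these figures side by side, the short calculation below uses only the percentages quoted above for the spreadsheet benchmark and for the pass@1 versus parallel-attempt results. The variable names are illustrative, and the human score is treated as exactly 71 even though the article gives it as approximate.

```python
# Scores as quoted in this article, in percent.
agent_spreadsheet = 45.5
copilot_excel = 20.0
human_spreadsheet = 71.0  # the article states this value as approximate
exam_single = 41.6        # pass@1
exam_parallel = 44.4      # with parallel attempts

ratio_vs_copilot = agent_spreadsheet / copilot_excel
share_of_human = agent_spreadsheet / human_spreadsheet
parallel_gain_points = exam_parallel - exam_single

print(f"Agent vs. Copilot in Excel: {ratio_vs_copilot:.1f}x")
print(f"Agent as share of human score: {share_of_human:.0%}")
print(f"Gain from parallel attempts: {parallel_gain_points:.1f} points")
```

On these numbers the agent scores a little over twice the Copilot in Excel figure while reaching roughly two thirds of the human score, and parallel attempts add just under three percentage points.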

Safety and risk mitigation

Agentic autonomy introduces new risks. OpenAI has implemented several safeguards:

  • Explicit confirmation: Required before any consequential action (for example, purchases or publishing).
  • Watch mode: Some sensitive tasks require active supervision.
  • Robust prompt injection defenses: Including training to detect anomalous web prompts and monitoring of tool output.
  • Privacy mechanisms: Session-specific takeover mode with no storage of sensitive inputs such as passwords.
  • Biorisk measures: Classified as high risk for biological agents, triggering enhanced threat modeling, refusal training, live monitoring, and bug bounty programs.

These layers aim to reduce misuse, from data leaks to task hijacking.
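The first safeguard, asking for explicit confirmation before a consequential action, follows a common gating pattern that can be sketched in a few lines. This is an illustration only, not OpenAI's mechanism: the `CONSEQUENTIAL` set, the `execute` function, and the `confirm` callback are hypothetical, and a real system would classify actions far more carefully than a fixed set of names.

```python
from typing import Callable

# Assumption: a fixed set of action names that always need user approval.
CONSEQUENTIAL = {"purchase", "publish"}


def execute(action: str, confirm: Callable[[str], bool]) -> str:
    """Run an action, asking the user first when it is consequential."""
    if action in CONSEQUENTIAL and not confirm(action):
        return f"{action}: blocked (user declined)"
    return f"{action}: executed"


# Stand-in confirmation callbacks; a real one would prompt the user.
always_decline = lambda action: False
always_approve = lambda action: True

outcomes = [
    execute("scroll", always_decline),    # not consequential, no prompt needed
    execute("purchase", always_decline),  # consequential and declined
    execute("purchase", always_approve),  # consequential and approved
]
```

Passing the confirmation step in as a callback keeps the gate itself easy to test, since approval and refusal can both be simulated without any user interface.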

How to get started

Available now for ChatGPT Pro, Plus, and Team users:

  • Pro users get access today, with 400 agent-mode messages per month.
  • Plus and Team users will gain access gradually over the coming days (40 messages per month).
  • Enterprise and Education tiers will follow in the coming weeks.
  • Rollout to regions outside the United States (EEA, Switzerland) is underway.

You can switch to “agent mode” via the tools menu in any conversation and describe the desired workflow. Progress is narrated in real time, and you can pause, take over, or stop at any time.
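The pause, take over, and stop controls can be pictured as a small state machine. The sketch below is an assumption about how such controls might be modeled, not a description of the product: the `RunState` values, the `resume` and `hand_back` commands, and the transition table are all invented for illustration.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    USER_CONTROL = "user_control"  # the user has taken over
    STOPPED = "stopped"


# Hypothetical transition table: (current state, command) -> next state.
TRANSITIONS = {
    (RunState.RUNNING, "pause"): RunState.PAUSED,
    (RunState.PAUSED, "resume"): RunState.RUNNING,
    (RunState.RUNNING, "take_over"): RunState.USER_CONTROL,
    (RunState.PAUSED, "take_over"): RunState.USER_CONTROL,
    (RunState.USER_CONTROL, "hand_back"): RunState.RUNNING,
}


def apply(state: RunState, command: str) -> RunState:
    """Return the next state, rejecting commands that make no sense here."""
    if command == "stop" and state is not RunState.STOPPED:
        return RunState.STOPPED  # stopping is allowed from any active state
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command!r} while {state.value}") from None


# Walk through one possible session.
state = RunState.RUNNING
for command in ["pause", "take_over", "hand_back", "stop"]:
    state = apply(state, command)
```

Making `stop` valid from every active state mirrors the article's point that the user can halt the agent at any time, while the explicit table keeps every other transition easy to audit.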

Implications for AI-augmented workflows

ChatGPT agent represents a leap from passive question-answering systems to proactive digital workers. By combining:

  • Language reasoning (via GPT-4-class models),
  • Tool orchestration (browsers, terminals),
  • Context-preserving execution environments,

… OpenAI enables more autonomous, reliable, and use-case-focused workflows. Although controls are essential to guard against misuse, this release widens the scope of what AI assistants can actually do, not just say.

For developers and data scientists, ChatGPT agent becomes a platform: a programmable and observable agent capable of scraping, analyzing, synthesizing, and exporting on demand. It opens up next-generation workflow possibilities in research, business automation, and personal productivity.

Conclusion

ChatGPT agent is not merely a conversational upgrade; it is a strategic pivot toward generalized, autonomous AI workflows. Its debut marks the transition of LLMs from passive advisers to active agents that carry out research, creation, and real-world action in a unified, controllable environment. Expect it to mature into a foundational capability across AI-driven fields.



Michal Sutter is a data science professional with a master's degree in data science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
