
Google has just announced Gemini 2.5 Computer Use, a new AI model that can interact with a web browser the way a real user does.

Its capabilities include clicking, scrolling, typing, dragging and dropping, and navigating websites.

This is an important step toward AI handling tasks on interfaces that offer no APIs or direct integrations.

According to Google, Gemini 2.5 Computer Use combines visual understanding with reasoning, letting it interpret on-screen content and carry out user requests such as filling out forms, submitting data, or navigating user interfaces (UI testing).

Earlier versions of the model were tested in internal projects such as AI Mode and Project Mariner, where the AI could complete browser tasks automatically, for example adding products to a shopping cart from a user-provided ingredient list.

Notably, Google's announcement comes just a day after OpenAI unveiled a series of new applications for ChatGPT at its Dev Day event, while Anthropic also introduced a "computer use" feature for its Claude model last year.

According to Google, Gemini 2.5 Computer Use outperforms competing models on many web and mobile benchmarks.

However, unlike ChatGPT Agent or Claude, Google's model works only in a browser environment and is not optimized for full control of the computer's operating system.

It currently supports 13 types of actions, including opening a browser, entering text, dragging and dropping, and moving interface elements.

The model is available to developers through Google AI Studio and Vertex AI, and a live demo on Browserbase shows the AI performing tasks such as "play 2048" or "find controversial topics on Hacker News."
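The setup described above is an agent loop: the model proposes a structured UI action (click, type, scroll, drag), a client executes it in the browser, and a fresh screenshot is fed back for the next step. A minimal sketch of such an action-dispatch loop follows; the class and function names here are illustrative assumptions, not Google's actual API, and the `BrowserStub` stands in for a real browser driver such as Playwright.

```python
from dataclasses import dataclass

# Hypothetical structured UI action, similar in spirit to the fixed
# action set (click, type, scroll, drag, ...) the article describes.
@dataclass
class UIAction:
    kind: str       # e.g. "click", "type_text", "scroll"
    x: int = 0
    y: int = 0
    text: str = ""

class BrowserStub:
    """Stand-in for a real browser driver; records executed actions."""
    def __init__(self):
        self.log = []

    def click(self, x, y):
        self.log.append(f"click@{x},{y}")

    def type_text(self, text):
        self.log.append(f"type:{text}")

    def scroll(self, y):
        self.log.append(f"scroll:{y}")

def execute(browser, action):
    # Dispatch one model-proposed action onto the browser; anything
    # outside the supported action set is rejected.
    if action.kind == "click":
        browser.click(action.x, action.y)
    elif action.kind == "type_text":
        browser.type_text(action.text)
    elif action.kind == "scroll":
        browser.scroll(action.y)
    else:
        raise ValueError(f"unsupported action: {action.kind}")

# One pass of the loop with a canned action sequence; in a real agent,
# each action would come from the model after seeing a screenshot.
browser = BrowserStub()
for act in [UIAction("click", x=120, y=300),
            UIAction("type_text", text="2048"),
            UIAction("scroll", y=400)]:
    execute(browser, act)
print(browser.log)
```

In a real integration, the loop would also capture a screenshot after each action and return it to the model, which is what lets the system operate interfaces that expose no API.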
Source: https://khoahocdoisong.vn/ai-google-gemini-25-thao-tac-voi-trinh-duyet-nhu-nguoi-that-post2149059532.html