Meet the new king of AI coding: Google's Gemini 2.5 Pro I/O Edition dethrones Claude 3.7 Sonnet

There is a new king on the throne of AI coding models: today, Google's DeepMind AI research unit unveiled Gemini 2.5 Pro "I/O" edition, a new version of its hit Gemini 2.5 Pro multimodal large language model (LLM), which was released in March and which DeepMind CEO Demis Hassabis described as "[...] built!"
Indeed, initial benchmarks published by the company indicate that Google has taken the lead over all other models on at least one important coding benchmark, for the first time since the generative AI race began with the launch of ChatGPT at the end of 2022.
The new version, labeled "gemini-2.5-pro-preview-05-06", replaces the previous 03-25 version and is now available to independent developers in Google AI Studio and to enterprises on the Vertex AI cloud platform, as well as to individual users of the Gemini app. Google's blog post said it also powers Canvas and other features in the Gemini mobile app.
The new version powers feature development in applications like Gemini 95, where the model automatically helps match visual styles across components. It also enables workflows such as converting YouTube videos into complete learning apps and building highly styled components, such as responsive video players or animated dictation UIs, with little or no manual CSS editing.
It is a proprietary model, which means that companies will have to pay Google to use it and can access it only through Google's web services. However, pricing and rate limits are unchanged: current Gemini 2.5 Pro users will automatically be routed to the updated model, which costs $1.25/$10 per million input/output tokens (for context lengths of up to 200,000 tokens), compared with $3/$15 for Claude 3.7 Sonnet.
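To put those list prices in perspective, here is a minimal sketch that estimates the cost of a single request under each model. The per-million-token prices are the ones quoted above; the dictionary keys, the function name and the example token counts are hypothetical, chosen only for illustration, and the Gemini price is assumed to apply because the prompt stays under 200,000 tokens.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
# The keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "gemini-2.5-pro": (1.25, 10.00),   # assumes a prompt of at most 200,000 tokens
    "claude-3.7-sonnet": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request at list price."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical coding request: 50,000 tokens in, 5,000 tokens out.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, input_tokens=50_000, output_tokens=5_000)
        print(f"{model}: ${cost:.4f}")
```

For this hypothetical request, the Gemini estimate comes out at roughly half the Claude estimate, though real bills depend on actual token counts and on any pricing tiers not covered here.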
The company is framing this move, which comes ahead of the annual Google I/O developer conference taking place later this month in Mountain View and online from May 20 to 21, as a response to strong community feedback about Gemini's practical usefulness in real-world code generation and interface design.
Logan Kilpatrick, senior product manager for the Gemini API and Google AI Studio, confirmed in a developer blog post that the update also addresses key developer feedback around function calling, with improvements in error reduction and trigger reliability.
Top scores from human evaluators for generating web apps
On the WebDev Arena Leaderboard, a third-party metric that ranks models by human preference for their ability to generate visually appealing and functional web apps, Gemini 2.5 Pro Preview (05-06) has now overtaken Anthropic's Claude 3.7 Sonnet for first place.
The new version scored 1499.95 on the leaderboard, placing it well ahead of Sonnet 3.7's 1377.10. The previous Gemini 2.5 Pro (03-25) model held third place with a score of 1278.96, which means that the I/O edition represents a jump of 221 points.
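The arithmetic behind those leaderboard figures can be checked with a short sketch. The three scores are the ones reported above. The win-probability function is an assumption: it applies the standard Elo expected-score formula, on the premise that the leaderboard's ratings sit on an Elo-like scale, which the article itself does not state. The names used here are hypothetical.

```python
# Leaderboard scores as reported in the article.
SCORES = {
    "Gemini 2.5 Pro Preview (05-06)": 1499.95,
    "Claude 3.7 Sonnet": 1377.10,
    "Gemini 2.5 Pro (03-25)": 1278.96,
}


def score_gap(model_a: str, model_b: str) -> float:
    """Return how many points model_a leads model_b by, rounded to 2 decimals."""
    return round(SCORES[model_a] - SCORES[model_b], 2)


def elo_expected_preference(gap: float) -> float:
    """ASSUMPTION: standard Elo expected-score formula on a 400-point scale.

    Gives the estimated share of head-to-head votes the higher-rated model
    would win, if the ratings behave like Elo ratings.
    """
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


if __name__ == "__main__":
    new, old, rival = (
        "Gemini 2.5 Pro Preview (05-06)",
        "Gemini 2.5 Pro (03-25)",
        "Claude 3.7 Sonnet",
    )
    print("Jump over previous Gemini version:", score_gap(new, old))
    print("Lead over Claude 3.7 Sonnet:", score_gap(new, rival))
    print("Estimated preference rate vs. Sonnet:",
          round(elo_expected_preference(score_gap(new, rival)), 2))
```

The first figure rounds to the 221-point jump cited above. The preference rate is only as reliable as the Elo-scale assumption behind it.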

As the AI power user "Lisan al Gaib" noted on X, even OpenAI's "o3" model could not unseat Sonnet 3.7, underscoring the significance of Gemini's advance.
Gemini's performance boost reflects improved reliability, aesthetics and usability in its outputs.
Already winning rave reviews
Several developers and platform leaders have highlighted the model's improved reliability and its applicability in production scenarios.
Silas Alberti of Cognition noted that Gemini 2.5 Pro successfully completed a complex refactor of a backend routing system, demonstrating the kind of decision-making expected of a senior developer.
Michael Truell, CEO of the AI coding tool Cursor, said internal tests show a marked decrease in tool call failures, a previously noted problem. He expects users to find the latest version much more effective in practical environments. Cursor has already integrated Gemini 2.5 Pro into its own code agent, reflecting how developers are using the model as a key component in smarter developer workflows.
Michele Catasta, president of Replit, described Gemini 2.5 Pro as the best frontier model for balancing capability against latency. His comments suggest that Replit is considering integrating the model into its own tools, particularly for tasks where high responsiveness and reliability are crucial.
Likewise, AI educator and founder of the private AI chatbot Blueshell, Paul Couvert, noted on X that "its code generation and user interface capabilities are impressive."
And as Pietro Schirano, CEO of the AI art tool EverArt, noted on X, the new Gemini 2.5 Pro I/O edition was able to generate, from a single prompt, an interactive simulation of the "1 gorilla vs. 100 men" meme that was circulating on social media recently.
Showing off another interactive Tetris-style puzzle game with sound effects, reportedly created in less than a minute, the X user "RameshR" (@rezmeram) wrote that "the casual game industry is dead!!"
These endorsements add weight to DeepMind's claims of practical improvements and may encourage broader adoption across developer platforms.
Complete applications and programs from a text prompt
One of the standout features of the update is its ability to create complete interactive web applications or simulations from a single prompt.
This aligns with DeepMind's vision of simplifying the prototyping and development process.
Demonstrations within the Gemini app show how users can turn visual templates or thematic prompts into usable code, lowering the barrier to entry for developers and teams exploring new ideas.
Although the architecture and under-the-hood changes of Gemini 2.5 Pro have not been publicly detailed, the emphasis remains on enabling faster and more intuitive development experiences.
By leaning on its strengths in code generation and multimodal input, Gemini 2.5 Pro is positioned less as a research novelty and more as a practical tool for real-world coding challenges. The early release reflects a clear intention on the part of Google DeepMind to meet developer demand and maintain momentum ahead of its main conference announcements.


