Google Unveils Gemini 3 Pro: Capabilities, Benchmarks, and Product Integration

Sam Witteveen

Summary Date: 2025-11-18 20:57:23

### Overview

Google released Gemini 3 Pro today, the long-awaited successor to Gemini 2.5 Pro. Early testers, including the author, have been given preview access. The launch is paired with a new product called **Antigravity**, focused on agentic coding.

### Core Model Improvements

- **Reasoning Jump:** Gemini 3 Pro shows a noticeable boost in multi-step and long-horizon reasoning.
- **Concise, Direct Assistant:** Unlike personality-heavy models, Gemini 3 aims to be a practical tool that does heavy lifting for users.
- **Enhanced Coding & Agent Skills:** Better function calling, tool use, and the ability to plan and execute complex tasks.

### Benchmark Performance

| Benchmark | Gemini 3 Pro result | Comparison |
|-----------|---------------------|------------|
| LMArena (Elo) | Above 1500 Elo | About 50 points above Gemini 2.5 Pro; Grok 4.1 models sit in between |
| Humanity's Last Exam | 37.5% | Significant jump over Gemini 2.5 Pro |
| GPQA Diamond | Top score | Surpasses prior models |
| ARC-AGI | Substantially higher | Ahead of Claude Sonnet 4.5 and GPT-5.1 |
| Terminal-Bench 2 | Substantially better | Ahead of Gemini 2.5 and rival models |
| Agentic tool use | Edges ahead | Narrowly beats Claude Sonnet 4.5 |

*Overall, Gemini 3 Pro outperforms competitors on almost every listed benchmark, with SWE-bench as a minor exception.*

### Coding & Agentic Capabilities Demonstrated in AI Studio

- **Multi-tool Workflows:** The model can chain searches, grounding, code execution, and citation generation to produce comprehensive analysis tables.
- **Dynamic UI Generation:** It builds interactive web pages (e.g., a 3D Golden Gate Bridge simulation) with sliders, fog, lighting, and responsive elements.
- **One-Shot Game Creation:** From a brief prompt, Gemini 3 Pro generated playable versions of *Crossy Road* and a *Don't Starve*-style 2D game, complete with scoring and crafting mechanics.
- **Creative Site Building:** Produced a slick, cat-themed tech-news website, automatically sourcing images and adapting layout to screen size.

### Product Rollout Across Google Ecosystem

1. **AI Studio & API:** Primary playground for developers; free access for experimentation.
2. **Gemini App:** Now hosts Gemini 3 Pro. The app has added over 300M users since the Gemini 2.5 Pro preview, and over 200M since July. New features include:
   - **Visual Layouts:** Model returns images and arranges them into interactive webpages.
   - **Dynamic View:** On-the-fly interactive portals for any topic.
   - **Gemini Agent:** Agentic assistant that can act on tasks (e.g., organize inbox) using built-in tools, moving beyond simple chat.
3. **Search Integration:** AI mode in Search appears to use the Gemini 3 Pro model rather than the Flash models used previously, enabling:
   - Query fan-out and multi-query rewriting.
   - Generative UI elements like mortgage calculators embedded directly in search results.
4. **Google Labs & Emerging Apps:** Early access fuels products such as Opal, Stitch, NotebookLM, and upcoming "Plan Anything" tools.

### Future Outlook: Gemini 3 Deep Think

- A forthcoming variant designed for prolonged reasoning (tens of minutes per response).
- Early scores show strong performance on Humanity's Last Exam and ARC-AGI.
- Anticipated release will be covered in a dedicated video.

### How to Get Started

- Visit **AI Studio** (free) to test prompts, set thinking levels, and explore tool integrations.
- For agentic coding, check out the **Antigravity** video and download the tool, which includes a bunch of free calls to Gemini 3 Pro.
- Follow upcoming tutorials on using Gemini 3 Pro with the ADK (Agent Development Kit).

Gemini 3 Pro marks a significant leap in reasoning, coding, and multi-modal interaction. It outperforms rivals on most benchmarks and is being woven into Google's core products, from AI Studio to Search, which makes it a versatile model for both developers and end users, though it is still a preview release.

Full Transcript

Okay, so the model that people have been
waiting for for quite some time, Gemini
3, is being released today. And as one
of the early testers, DeepMind has given
me access to this model. And in this
video, I'm going to go through a whole
number of the different aspects of the
release. I'll talk about what Gemini 3
Pro is good at. I'll talk about where
Google is focusing with this model to
actually get it to be much better at
certain kinds of tasks. And then I'll
also talk about how Google plans to use
this model to support a whole suite of
new features and new products. Now along
with this Gemini 3 Pro release today,
Google's also released a product called
Antigravity, which is all about agentic
coding. Now I've actually made a
separate video where I go in depth into
that. So if you're interested in that,
check out that video as well. All right.
So first off, this model has taken quite
a while to arrive. And I'm not talking
about just the sort of few months that
people have been talking about rumors of
the model. I'm talking about that this
model is really the culmination of
Google focusing on a whole bunch of
things over the past few years. And
that's things on the sort of AI
infrastructure level with things like
TPUs and data centers etc. right through
to the cutting edge research that's
driving these mixture of experts models
that not only started at Google but have
been advanced in many ways over the past
couple of years and that brings us to
today of Gemini 3 and really for this
launch Google's sort of focusing on a
number of key capabilities for the model
and then how they're actually going to
use those in various products so
internally Google's been focused on
giving Gemini 3 a series of capabilities
and these revolve around a few different
things. One being that the model should
have a jump in reasoning which it
certainly does when you start playing
with it. But in many ways that reasoning
is focused on certain skills. So when
you're using the actual model via an API
or via one of the chat apps etc. The
whole goal there was to make the model
clever, concise and direct. And this
seems to be a really clear direction
that Google's going in in that it seems
that they're not going for the heavy
sort of personality driven style model
perhaps like what OpenAI is doing.
They're going for something that's more
both an assistant and a tool that will
actually do work for you and do a lot of
the heavy lifting for you. And this
really shows up in a number of the
skills that Gemini 3 Pro is actually a
lot better at. We see this, for example,
not only in things like the
state-of-the-art reasoning, but we also
see it in things like long horizon
tasks, in the ability to build and to
plan. And when we look at that, that's
not only things like coding tasks, but
being able to build dynamic UIs on the
fly so that the user can then interact
with the model far beyond just a text
driven interface. And while I'm
mentioning coding, this has clearly been
an area of focus for Google: both the
coding and the whole idea of agents
being able to follow through and do more
sophisticated tasks with things like
function calling with long horizon
tasks, etc. And we can even see this
when we start to look at things like the
benchmarks. So, not only does Gemini 3
Pro outperform the 2.5 Pro on all the
major benchmarks and often by a
significant margin, it's also the first
model to top LMArena with a score that
goes beyond 1500 Elo. And that's 50
points up on the Gemini 2.5 Pro. And
really, the only thing that's in between
those is some of the Grok 4.1 models.
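
To put the 50-point Elo gap mentioned above in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head score. It assumes the standard Elo logistic formula with a 400-point scale; the leaderboard's actual scoring method may differ in detail, and the function name `elo_expected_score` is made up for this illustration.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score for the higher-rated side under the standard Elo model.

    rating_gap is (higher rating - lower rating). A draw counts as half a win.
    Assumption: classic Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    gap = 50  # approximate gap over Gemini 2.5 Pro quoted in the video
    print(f"Expected score at a {gap}-point gap: {elo_expected_score(gap):.3f}")
```

Under that assumption, a 50-point lead works out to the higher-rated model being preferred in roughly 57% of head-to-head comparisons, so it is a clear edge rather than a sweep.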
Some other benchmarks where this does
really well is that it achieves 37.5%
on Humanity's Last Exam. The progress on
that exam clearly has gone a lot faster
than the creators of it probably
intended, but it does really measure
some of the genuine sort of
comprehension and multi-step logic in
there, which you can see Gemini 3 Pro is
a lot better at. The other benchmark of
note is that they top the GPQA Diamond.
So that's the Google-proof question
answering and the whole idea there is to
measure both reasoning but also sort of
deep knowledge that someone like a PhD
would actually have in their specific
topic. If we come in and look at the
benchmarks from the model card, it's
pretty damn impressive, right? Not only
have we got Humanity's Last Exam, which
I've talked about, you've got ARC-AGI
being substantially higher than both
Claude Sonnet 4.5 and GPT-5.1. And then
we've got some of the agentic coding ones
where we look at things like
Terminal-Bench 2, again substantially better not
only than Gemini 2.5, but than
competitors' models in there. The agentic
tool use benchmark is doing really well,
edging out Claude Sonnet 4.5. And as you
can see, pretty much overall on all of
these, with perhaps the exception of
SWE-bench, Gemini 3 Pro is beating out
the competition here. Right, let's jump
into AI Studio and have a look at some
of the actual examples of building
things with this using the AI Studio
build tool. Okay, so coming into AI
Studio, what I thought would be
interesting is to try out a number of
different examples where it's got to use
multiple searches and multiple sort of
tools for doing this. So, while I'm not
actually giving it a huge amount of
tools, we can see from this prompt that
the idea here is that it's going to need
to do searches, respond to those
searches, write some code, and then
gradually put all this together. And you
can see sure enough it kicks off and
actually does that. Now in the end you
can see down here I've given it code
execution. I've given it grounding and
I've given it the URL context to
basically pull things back and we've
asked it to basically do an analysis of
different coding tools out there. You
can see in the end it's able to write a
bunch of code for this. It's able then
to sort of execute that, use this to
actually make a comparison table and put
it all together. And then it gives us
citations along the way of what it
actually found at each site. And I've
gone through some of them. They seem to
be quite accurate. And you can see to
get to this point, it's used a lot of
different sources, but it's also used a
large amount of searches with each of
these different sort of keywords being
searched to find different benchmarks,
to find different reviews, etc. And this
is a common pattern that I see through
lots of the different tests that I've
run doing a similar kind of thing of
asking it to basically come up with a
state of AI agents doing this. You'll
see that it goes through and in this
case it was supposed to make some
slides. So I haven't actually given it a
set of slides tool here. We're just in a
sandbox. But I've given it code
execution. I've given it the grounding
search etc. And to do these kind of
tasks, it needs to do a lot of multihop
sort of steps going through this. And we
can see that again when we're looking at
all the sources that it's used and
the large amount of searches that it's
actually done to find the different
information and then collect it
together. This is a really good sign for
a lot more of the long horizon sort of
tasks. And it definitely shows up when
we're using the API version as well and
giving it a series of tasks where it's
basically required to go through and put
together a lot of these things over a
lot of multihop or multi-step reasoning
points going through it. All right, if
we come and look at some of the actual
coding abilities for this and I'll go
into the actual build tool in a second.
You can see here, this is actually an
example that came from one of the
Googlers of making a 3D voxel image. And
there have been quite a lot of these out
there that have sort of leaked over the
last few weeks. This one I find very
interesting because this is basically an
interactive simulation. And this is of
the Golden Gate Bridge using Three.js. It's
nothing too surprising there, but having
the different lighting sliders, fog,
describing all of these things in there,
the model's got to pay attention to
actually be able to do this. And you can
see that when we look at its thoughts,
it's addressing each of the elements
that were brought up, which is really
good. And then finally, it gives us this
simple HTML page out. So I can zoom in
and zoom out. I can change the time of
the day so we can actually see things.
So, with the way I've spun this around, I
can do things now. I can increase the
traffic density. We can actually see
some shining off the water. We can see
some of the buildings in San Francisco
on the back end there. And we can even
add in things like fog so that we can
actually see what's actually sort of
going on here. I think we've got some
boats bobbing around there. And this is
actually pretty impressive for what it's
actually doing in here. So another area
where the model actually really shines
well is this whole sort of build your
ideas with Gemini which is in AI Studio.
So this is like their vibe coding tool.
You can one-shot a lot of things and you
see when you actually go to create
something you can pick different
elements that you want to be in that app
and they will get added in here. Now I
got to say that the model is
particularly good at things relating to
games which kind of surprises me. Okay,
so here is one that I basically gave it
a very short prompt. I seem to have lost
my prompts, but this is basically a
one-shot game of building something like
Crossy Road. So, if you know Crossy
Road, you've got different characters.
It's a voxel environment, and it's kind
of like the old school game Frogger. You
can see here that it's basically done
the whole thing in one shot. And you
could argue that, yeah, okay, it's not
as nice. We've got parts of the board
cut off and stuff like that. But you can
see that I've got a fully functioning
game here that I can play. And so I can
come around, I can crash into a car
even. And you can see that this is one
shot for actually putting this together.
We've got a score system. We've got a
best score system. We've got each of
these things in here. Another example is
this one that I've asked it to make sort
of a clone of the game Don't Starve. So,
if you know Don't Starve, it's got a
certain cartoon kind of look to it. All
I've asked it in the prompt is, "Can you
build me a 2D cartoon game in the style
of Don't Starve where you control the
character as you walk around the world
and find different elements to use for
crafting?" That's the entire prompt. And
it's gone off and been able to put
together something that allows me to do
this. Now, I haven't actually tried
playing this. Now, it's gone off and
actually made something where I can walk
around the world. Perhaps not the best
graphics ever, but you can see in here
it's also got the font very similar to
the real game, but I've also got a
crafting thing here where I can see
that, okay, these are the things that I
would need for actually crafting in
here, which is very similar to the real
game. So, it's kind of interesting that
it not only has a good sort of sense of
what is in the real game, but it also
has the ability to actually code it and
put this together. Okay. If you wanted
to build something that's not an actual
game, here's an example that I did last
week where I basically asked it to build
me a professional looking news site that
parodies tech news but written for
cats. Make it look slick and have good
graphics in there. And you can see that
in here it's built a whole sort of
website. It's obviously used, I'm
guessing, Nano Banana or image gen to
actually create the images in here. But
you can see that it's gone through and
put all of these together as different
elements. And we've got like trending
news and stuff like that. And even if we
change the size of the screen in here,
it can adapt to that. So rather than
just go through a bunch of prompts and
show you different things, I would
suggest that you come in here yourself
to AI Studio. Don't forget this is free.
You can try this out and see for your
particular use cases. How is it actually
responding? You can come in here and set
a thinking level. You can set obviously
a lot of the sort of standard tools and
stuff in here. And it's very easy to get
started with either the AI Studio
version or the API version for actually
testing out your prompts and your use
cases. So, one of the things that's
really interesting about this release is
the actual platforms that Google is
releasing the Gemini 3 Pro model on. So,
if we look back to when 2.5 Pro came
out, it was basically just AI Studio
back then. These models were only being
used by devs. The preview models in
particular were only being supported on
AI Studio, often not even on Vertex or
GCP back then. So for Gemini 3 Pro, this
is a very different story 8 months
later. So of course, we've got the whole
sort of AI Studio in there. That's not a
surprise. And having Vertex, I guess, is
also not a surprise these days. But
beyond that, we've got the Gemini app,
which we've started to see the models
come out on day one on the Gemini app
only in the last 6 months or so. And the
Gemini app itself is seeing massive
momentum. It's added over 300 million
users since the 2.5 Pro version of
Gemini came out in preview, and it's
added over 200 million users since July.
And while I think for us as developers,
we tend to think of AI Studio as being
the place to go to for the latest Gemini
models, clearly the Gemini app is what's
getting the most amount of traction
nowadays.
So in the Gemini app, they're actually
adding some really interesting things
which are features that Gemini 3 is able
to actually generate. So the first up is
visual layouts. So the idea here is that
rather than just return text, the model
is able then to go and find images and
then lay them out as well as being able
to generate different outputs via code
that don't get shown as code but get
shown as a website that is interactive
that people can engage with. Another one
along these lines is the idea of dynamic
view. And you can see in some of the
examples that they're showing for this
that you will be able to make fully
interactive portals on the fly in the
Gemini app for a particular topic or
subject that you're interested in. And
you got to think that this takes it to a
whole new level for things like the
Gemini app's learning mode where you can
basically get things made specifically
for what you're after to be able to
interact, learn, use it, etc. Another
new feature which in some ways you could
even think of as being a whole new
product is coming to the Gemini app
which is Gemini agent. And this is where
you can think of the Gemini app actually
making use of the sort of long horizon
tasks fully agentic sort of stuff with
tools etc in the Gemini app. So the idea
here is that rather than you just ask
for information or rather than you just
have a chat in here, this is like a
doing thing where you can basically tell
it things like go and organize my inbox,
go and do this task and as an agent with
a set of tools that Google's basically
empowered it with, it will be able to go
off and do this. Now, we don't know yet
whether that's going to also include
MCPs or any form of sort of custom
tools, but certainly you can see them
going down the similar sort of line to
Claude Skills and moving away from these
apps being purely a chat interface to
now being much more something where not
only can you get work done, but it can
get work done for you. The other big
platform that they've announced with
Gemini 3 Pro and is going to actually be
using the Gemini 3 models is search. And
we know that this is Google's bread and
butter. We know that they've taken quite
a while to basically implement some of
the features that some of their other
competitors or sort of answer engines
have already been adding in there. But
for me, the interesting thing here was
that up until recently, Gemini has
always used the flash models for search.
So even if you were doing things in AI
mode and stuff like that, you generally
were using the flash model of Gemini. In
this case, it seems with Gemini 3, they
may actually be using the pro model and
allowing you to get the best out of the
pro model there. And it does seem from
some of the things that they've shown
that there's some really interesting
ideas that they're using the Gemini 3
models for in search. So the whole idea
of fanning out a query, basically taking
a query and then rewriting it into
multiple queries and then doing checks
on that is something that perhaps up
until now has just been too compute
intensive to do. Google has now worked
this out and put it into the AI mode so
they can actually get you the best
responses back. On top of this, AI mode
is getting a whole sort of generative UI
in the actual AI mode. And there are
some really nice examples of how they're
using this to basically if somebody asks
a particular question about a mortgage,
they can actually have the model write a
mortgage calculator and show it in
relation to your specific query. And
this generative UI doesn't stop there.
The whole idea now is that we're moving into
a world where the UI that you're
basically engaging with can dynamically
change based on what you're asking of
the model. So that's going to come to AI
mode I think first in the US but I
imagine it'll gradually be rolled out to
the rest of the world as things go along
and it doesn't stop there. Clearly a lot
of the product teams inside Google have
had access to this model and been
planning for this model for a while. So
I think you'll see a lot of interesting
stuff come out of Google Labs that are
ideas that they've had perhaps for a
while but just haven't had the model
that was strong enough to be able to
deliver this. So obviously over the last
few months we've seen Opal come out of
Google Labs. We've seen Stitch. We've
seen a whole bunch of different ideas
that have started their journey there.
Not to mention NotebookLM which has had
the biggest turnaround for an app I
think ever in that when I first made a
video about it there were only a couple
of thousand people in the world using
it. Now it's become one of Google's
hottest products. So this whole idea of
learn anything across the different
apps, build anything perhaps mostly in
AI Studio but also in things like Gemini
CLI and of course Antigravity as well
through to things like plan anything and
the built-in agents that the various
platforms are using just show how strong
this model is and how Google has planned
for this model to be able to be rolled
out across many platforms
and many apps and features that people
are using in the Google ecosystem.
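
The query fan-out idea described above for AI mode (rewrite one query into several, run them all, then check and merge what comes back) can be sketched roughly as follows. This is only an illustration of the general pattern, not how Google Search implements it. Every name here (`fan_out_search`, `toy_rewrite`, `toy_search`, `TOY_INDEX`) is hypothetical, the rewrite step would be a model call in practice, and the "check" step is reduced to simple de-duplication.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical in-memory "search index" so the sketch runs without a network.
TOY_INDEX: Dict[str, List[str]] = {
    "coding agents": ["a.example/overview"],
    "coding agents benchmarks": ["b.example/bench", "a.example/overview"],
    "coding agents reviews": ["c.example/review"],
}


def toy_rewrite(query: str) -> List[str]:
    """Stand-in for a model that rewrites one query into several variants."""
    return [f"{query} benchmarks", f"{query} reviews"]


def toy_search(query: str) -> List[str]:
    """Stand-in for a search backend; returns result URLs for a query."""
    return TOY_INDEX.get(query, [])


def fan_out_search(
    query: str,
    rewrite: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    max_workers: int = 4,
) -> List[str]:
    """Fan a query out into sub-queries, search them in parallel, merge results."""
    # Keep the original query and drop duplicate rewrites, preserving order.
    sub_queries = list(dict.fromkeys([query, *rewrite(query)]))

    # Run the searches concurrently; pool.map keeps results in query order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        result_lists = list(pool.map(search, sub_queries))

    # Minimal "check" step: merge while dropping results seen already.
    merged: List[str] = []
    seen = set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    for url in fan_out_search("coding agents", toy_rewrite, toy_search):
        print(url)
```

The cost concern raised in the video shows up directly in this shape: each user query turns into several backend searches plus at least one extra model call for the rewrite, so the compute per query multiplies.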
So lastly, just one more thing that they
did announce but is not actually
released yet and I can't talk too much
about this is Gemini 3 Deep Think. I
made a video of Gemini 2.5 Deep Think
when that came out and I talked about
how there were a lot of challenges
around that model that the time to first
token could be 15 minutes that it would
think so long and stuff like that and
unfortunately the model didn't make it
into the APIs I suspect purely because a
lot of the challenges around actually
running that model and perhaps even the
sort of cost of running that model but
the good thing is the DeepMind team
have now announced Gemini 3 Deep
Think which is quite a substantial
improvement on the previous one. Again
with the same idea that this is the kind
of model that you use when you're happy
for it to go off and think for tens of
minutes at a time before it will even
respond to you. And with this
announcement, they've also rolled out
some pretty impressive performance
scores on things like Humanity's
Last Exam and, for me more interestingly,
the ARC-AGI challenge. So, as that gets
closer to release, I will make a
separate video just for that and walk
through some examples of what that can
do. Overall, the Gemini 3 Pro release is
very impressive, both from a model
perspective, if you're going to be using
it through the API, but also from a
products perspective, if you're going to
be using it in the various Google apps,
etc. And while this still is a preview
release, we can expect just like with
the 2.0 models, the 2.5 Pro models, and
Flash models, etc., where we saw
multiple iterations of improvement
before it got to the GA release. I
expect we'll see the same thing with 3
Pro and perhaps not that far off in the
distance with the Flash and with the
Flash-Lite models. So, if you've got
some use cases, go over to AI Studio,
have a play with the model. It's totally
free. You can try it out. You can see
how it works. If you're interested in
agentic coding, definitely go and check
out Antigravity. Like I mentioned
earlier on, I've made a full video about
that. That's something that you can
download today. You can start using and
gives you a bunch of free calls that you
can actually use the Gemini 3 Pro model
for trying out different coding tasks,
etc. And over the next couple of weeks,
I'll make some follow-up videos about
how you can actually use these things
with ADK. And also, I think there are
some other models in the works that
perhaps will see the light of day in the
not too distant future. All right, as
always, let me know what you think in
the comments. I'd love to hear what you
see if you're trying out prompts
yourself. I've tried to make this video
not just a video of me walking through
prompts, but to actually talk about some
of the key thinking behind what's
actually going on. And I'd love to hear
from people where they see themselves
using this both from an API perspective,
which I guess is the main thing that I
would be interested in, but also from
the products perspective, I think is
really interesting. So anyway, as
always, if you found the video useful,
please click like and subscribe and I
will talk to you in the next video. Bye
for now.
