Name: I Tested New Sonnet 4.6 vs Opus 4.6: Speed, Token Usage, Code Quality
Uploaded: 2026-03-01T11:29:27.354078+00:00
Channel: AI Coding Daily
Description: Anthropic recently released the Sonnet 4.6 model, now available through Claude Code. The presenter set out to compare Sonnet 4.

YouTube video ID: dThh2V7B9OQ

Source: YouTube video by AI Coding Daily

Anthropic recently released the Sonnet 4.6 model, now available through Claude Code. The presenter set out to compare Sonnet 4.6 with the existing Opus 4.6 model using a consistent experimental framework, motivated by Boris’s claim that Sonnet “nears Opus‑level intelligence.”

Methodology

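The test suite comprised seven tasks on Laravel 12 projects. Five were CRUD-generation tasks, each producing a few dozen files: one for each default starter kit (React, Vue, Livewire), one for an API-only project, and one for a Filament admin-panel project. The remaining two were a prompt to refactor the database structure and a prompt to fix a bug hidden out of plain sight. Evaluation relied on external automated tests that the agent did not know about, followed by a manual review of code quality. The experiment ran sequentially: Opus completed all seven tasks first, then Sonnet tackled the same set. The presenter was on the Claude Max 5x plan ($100 per month). Session usage started at 0%, read 37% after Opus, and read 49% after Sonnet, so Sonnet's own run accounted for roughly 12 percentage points.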
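The usage figures are consecutive readings of one session meter, so each model's share is the difference between neighboring readings. A minimal sketch of that arithmetic, assuming the meter only increases within the session and did not reset between the two runs (the function and variable names are illustrative, not taken from the video):

```python
def per_model_usage(start, readings):
    """Convert cumulative session-meter readings into per-model shares.

    readings: ordered list of (model_name, meter_percent_after_run).
    Assumes the meter never resets between runs.
    """
    shares = {}
    previous = start
    for model, after in readings:
        # Each model's share is the increase since the previous reading.
        shares[model] = after - previous
        previous = after
    return shares


# Readings reported in the video: 0% at the start, 37% after Opus, 49% after Sonnet.
usage = per_model_usage(0, [("Opus 4.6", 37), ("Sonnet 4.6", 49)])
print(usage)  # {'Opus 4.6': 37, 'Sonnet 4.6': 12}
```

Read this way, Sonnet consumed about a third of what Opus did, which matches the presenter's remark that Sonnet is much cheaper.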

Performance Results

Opus required 39 minutes to finish the seven tasks, while Sonnet completed them in 26 minutes. Token consumption was also lower for Sonnet: the session meter rose by about 12 percentage points during its run (from 37% to 49%), against 37 points for Opus, a gap the presenter calls staggering. On test outcomes, Sonnet had zero failed tests, whereas Opus produced a single failure: a seeder called a method that does not exist (the transcript renders the name as "create our first").
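To put the timing gap in relative terms, the two totals can be compared directly. This is only a sketch of the arithmetic on the reported totals; the variable names are illustrative, and a single sequential run per model gives no indication of run-to-run variance.

```python
# Total wall-clock minutes reported for the seven tasks.
opus_minutes = 39
sonnet_minutes = 26

minutes_saved = opus_minutes - sonnet_minutes                 # absolute gap
speedup = opus_minutes / sonnet_minutes                       # how many times faster Sonnet was
time_reduction_pct = round(100 * minutes_saved / opus_minutes, 1)

print(minutes_saved, speedup, time_reduction_pct)  # 13 1.5 33.3
```

In other words, Sonnet finished in about two thirds of the time Opus needed.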

Code Quality Analysis

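Opus tended to go deeper into the codebase. For validation it used the object-oriented Rule::unique() form with ignore(), where Sonnet used the older string form (unique:categories,slug). In the React project it used Wayfinder and the newer Inertia form syntax, where Sonnet used the older useForm hook from Inertia's React package. Opus also wrote richer queries, for example loading each category with its post count and showing that count in the table, which the prompt had not asked for. Sonnet's older syntax still produced working code, and it was stronger on the UI side: in the Livewire project it used the Flux library more fully (buttons with icons and variants, and a Flux table rather than a plain Tailwind-styled table), and in the navigation menu it placed Categories and Posts in a new "Blog" group instead of appending them to the existing "Platform" group next to Dashboard.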

Interpretation and Conclusion

Opus's higher token usage appears linked to its deeper analysis, more frequent consultation of documentation, and closer adherence to current best practices. Sonnet's faster turnaround likely stems from a leaner approach that skips many of those guideline checks. The presenter questions whether these modest quality differences matter to most project owners, who would probably prefer cheaper and faster delivery if the code is 80 to 90% of the way to the latest standards, and cites a community tweet in which one developer reports that 95% of their work runs fine on Sonnet. For small Laravel projects Sonnet is cheaper and quicker, making it the recommended daily‑work model, while Opus remains valuable for more complex tasks where thoroughness outweighs speed and cost.

Future Plans

The presenter will continue running comparative experiments with upcoming models and encourages readers to subscribe to AI Coding Daily for updates. Earlier comparisons with open‑source alternatives are available to premium subscribers.

Takeaways

Sonnet 4.6 completed the seven Laravel projects in 26 minutes, beating Opus 4.6's 39‑minute runtime.
Sonnet was also cheaper: the session meter read 37% after Opus's run and 49% after Sonnet's, so Sonnet added only about 12 percentage points.
Opus produced deeper, more modern Laravel code but incurred one test failure, while Sonnet delivered flawless test results with slightly older syntax.
One developer quoted in the video reports that 95% of their work runs fine on Sonnet, which supports using it as the default model for small projects.
The recommendation is to use Sonnet for routine work and reserve Opus for complex scenarios where thoroughness outweighs speed and cost.

Full Transcript

Hello guys. So yesterday Anthropic released the Sonnet 4.6 model, which is now available in Claude Code. This is the tweet by Claude and Boris, and I decided to try it out and compare it with Opus 4.6. Boris is claiming that it nears Opus-level intelligence, but what exactly does that mean? I tried it on seven Laravel projects and tasks, because now I have a methodology. I've created my own eval projects, internal for now, and yesterday I published a video testing six LLMs on them for premium members of AI Coding Daily; I will link that in the description below. So in this video I will show you the result of the same methodology on Opus versus Sonnet. We will be talking about this table, blurred out for now, and I will show you the results in a few minutes and explain them. For the same seven tasks on different kinds of Laravel projects, I will compare the time of both models, the token usage, which basically means cost, and how many failed tests they produced, if any. Later we will dive into the code and compare the differences.

First, really quickly, about my methodology. So those seven projects are actually five projects on default Laravel 12 starter kits: React, Vue, and Livewire, and the prompts were to generate a few CRUD-related files. So a pretty big task, with a few dozen files to generate. Then I repeat the same for an API project and for a Filament admin panel project. So that's five tests, and the evaluation was automated tests for all of them: external automated tests which the AI agent doesn't know about. Then there are two more, kind of extra, tests: a prompt to refactor the database structure, and another prompt to fix a bug which is kind of hidden, not in plain sight, to see how the model thinks.

In terms of usage, I started the experiment with these numbers this morning, so 0% for the current session, and this was the number that I compared for Opus. I launched all seven tasks in Opus first and calculated the usage and the time, and then ran the same tasks for Sonnet. After Opus finished the tasks, the current session usage was 37%.
And by the way, I'm on the Claude Max 5x plan, $100 per month. So Opus used 37% of my session. Then I switched to Sonnet, and after Sonnet was done with all of its tasks the usage was 49%. So compare the difference.

And this is the actual full table of time, token usage, and failed tests. Task by task I was comparing the time, and the total time is 39 minutes versus 26. That whole evaluation took me almost 2 hours to complete, including filling in this Google Sheet and some Obsidian notes. So yeah, running AI evaluations is not a quick task, and sometimes it's even expensive, but in this case I was on an Anthropic plan. The token usage is a staggering difference. And the most important part is the evaluation on failed tests. Sonnet delivered on all seven projects without any failed tests, and Opus, ironically, slipped once. I call it "slipped" because sometimes models hallucinate or miss some minor detail, and this can happen to basically any LLM. This was that error by Opus: the seeders failed in Laravel because it came up with a method, "create our first", which doesn't exist. Kind of a lame thing, but again, I've noticed that this can happen to all models, including frontier models like Opus or GPT-5.3.
So that's why automated tests are a crucial thing for any project where you use LLMs. And this, by the way, is one example of those eval tests. It's basically testing whether the page loads and whether the data is there, so testing all the CRUDs from a feature point of view without really looking at the code that much. But if the code fails somewhere, then the automated test would probably flag that some CRUD behavior doesn't work. So in this case, for all those projects, both models delivered working code, except that Opus slipped once.

But now let's take a look at the code quality inside. Looking deeper into the code, it starts making sense why Sonnet was faster: it was cutting corners here and there. The code was working, but some things in the code went deeper in the case of Opus. I will show you the examples: on the left we'll have Opus, and on the right we'll have Sonnet. For example, for validation rules, Opus uses the class-based Rule::unique with ignore and the route parameter, and Sonnet uses the string unique:categories,slug, kind of the old version, which still works, but the object-oriented approach is probably more appropriate, with autocomplete and everything.
Another example where Opus went a bit deeper and delivered one extra function. In the list of categories for the blog, for example, it gets the categories from the database with the count of posts. Sonnet doesn't do that; Sonnet just goes Category, get, order by name. And in the React components in this case, Opus does use the posts count and shows it in the table, while Sonnet does not use the posts count and has only name, slug, and actions in the table. Again, it's not a bug, and I didn't specify in the prompt how it should work, but Opus went a bit deeper.
Another example is that Opus uses the latest features in forms, for example for React. It uses Wayfinder and the relatively new Form syntax, which is now kind of officially recommended, or a first-party citizen, so to speak, in the Laravel ecosystem. Sonnet used the old form with, on top, it should be somewhere here, useForm. So this one: useForm from Inertia React. Again, it still works, but Opus went deeper to deliver kind of the latest best practice.

But Opus was not always the winner of the head-to-head. Sonnet's strength was in the UI things. For example, in the Livewire project, which uses the Flux library, the button in Opus just says variant primary and link. That's it. In the case of Sonnet it says Flux button and then icon plus. So here and there Sonnet was adding more UI elements, icons, components, basically taking care of the UI better than Opus. In the same file, for example, the button to edit the category: a Flux button with size small, that's all Opus did. In the case of Sonnet, if we scroll down, the button for the same thing is a Flux button with size small, variant ghost, and icon pencil. Even the usage of component libraries is better on the Sonnet side, because it uses the Flux table, with components from the Flux library that comes with the Livewire starter kit. Opus just delivered a table with Tailwind classes. So Flux is used here, but only for the buttons, not for the full table.

Even the positioning of the menu items Sonnet did better. In Opus, the new menu items, Categories and Posts, were added to the same group, which is called Platform. So Dashboard was here, and Opus added two items to the same group. Sonnet left the Dashboard as it was and created a new group called Blog, with Categories and Posts in a separate group. Again, a better UI decision for the user experience.

So yeah, if we come back to the original table and try to understand the reasons behind the differences: Opus gets much more token usage because it dives deeper into documentation and the latest practices, comparing the code to guidelines and probably retesting it to perform well. Sonnet delivers the result quicker, in many cases, as I understand it, with the first result that it was trained on, without consulting the docs and the guidelines very often. So it didn't use the latest Inertia features, or Wayfinder syntax, or the latest Laravel features that much, but Sonnet's code still works.

The open question is: do we really care? And I've kind of started changing my opinion: for such small details it doesn't matter that much. Those corners Sonnet cut were not really crucial. So in most cases, most project owners would probably be happier with things being cheaper and faster if the code quality is like 80 or 90% of the way to the latest standards. And not every developer even cares about the latest standards that much. This is where I want to show some final tweets from the community that I see about Sonnet 4.6. This is one tweet: one user notices that it's noticeably faster. Then another tweet, by Jordan, kind of summarizing what I said a minute ago: Opus tasks are not really needed that much in many cases.
And this is kind of a great tip by Jordan. I'm yet to try it out, but basically what he's saying is "95% of my work runs fine on Sonnet", and I do agree with that. Jordan is trying to instruct Claude Code to flag if the model is not appropriate: Opus may be a waste, but Sonnet may be underpowered. But this is proof that Sonnet may be fine for many development cases.
And this is another tweet, by Akash, for another use case. If you work on bigger projects, probably with multiple agents and more complex tasks, then you shouldn't care about slowness or lower token usage. An interesting, kind of poetic quote: a lion doesn't concern itself with token counting. They care about delivery, right? And what Akash is saying here is that people often miscalculate the cost if Sonnet works in circles debugging things and ends up using pretty much the same amount of tokens. I also heard the same thought from Boris from Claude Code. But my experiments are on really small Laravel projects, basically. I will try to do better and more complex evals in the future, so subscribe to the channel to get those.
But for my experiments, Sonnet is much cheaper. So yeah, what do you think about my experiment with Sonnet versus Opus? I kind of agree with Boris that it nears Opus-level intelligence in terms of delivering the code. So I guess it kind of swings the situation back to Sonnet being the daily work model and Opus being used for more complex tasks. This is probably my recommendation for now, as of February 18th, 2026. The recommendations may change daily. This is why you should subscribe to my channel, AI Coding Daily, to get all the latest news, and I will continue my evaluation experiments with new models that come out. As I mentioned, I now have evaluations here, and I will link that premium tutorial and 17-minute video in the description below. So you may want to take a deeper look at my methodology and how I compared other models, open-source, open-weight models like Kimi, MiniMax, and GLM, against the frontier Western models. The same video I made available for premium Substack subscribers of AI Coding Daily.
I hope you appreciate and support my mission of doing these evals by subscribing to the premium tier of AI Coding Daily. All the links are in the video description below. Have you tried Sonnet 4.6? What do you think? Let's discuss all of that in the comments below. That's it for this time, and see you guys in other