Introducing GPT 5.4



Source: YouTube video by Matthew Berman


OpenAI has announced GPT 5.4 as its newest flagship model, branding it as “the best model on the planet.” Early‑access users describe the experience as “incredible,” noting that the model unifies the previously separate GPT 5.2 (general purpose) and GPT 5.3 CodeX (coding‑focused) into a single system. The release is positioned alongside Anthropic’s Opus 4.6, with both companies aiming at real‑world knowledge work and agentic tasks.

Model Features and Performance

GPT 5.4 combines world knowledge, logical reasoning, and a “great personality,” making it a strong candidate for personal AI assistants. Its capabilities span coding, creative writing, tool calling, and autonomous agent workflows. A headline feature is the 1 million‑token context window, matching the length offered by leading competitors. In addition to the larger context, the model is reported to be faster and more token‑efficient than its predecessors, allowing knowledge workers to read PDFs, generate PowerPoints, and conduct web searches with less overhead.

Benchmarking Results

OpenAI’s internal benchmarks illustrate the performance edge of GPT 5.4:

  • OS World (computer use) – GPT 5.4 Thinking scored 75 %, a slight lead over GPT 5.3 CodeX’s 74 % and Anthropic’s Opus 4.6 at 72.7 %.
  • SWE Bench Pro – GPT 5.4 Thinking achieved 57.7 %, surpassing GPT 5.3 CodeX (56.8 %) and Google’s Gemini 3.1 Pro (54.2 %).
  • GDP Val (real‑world knowledge work) – GPT 5.4 Thinking posted 83 %, outpacing Opus 4.6’s 78 % and even the GPT 5.4 Pro variant.

The model also led the Frontier Math benchmarks. Note, however, that companies often select benchmarks that favor their own systems, so direct comparisons can be nuanced.

Demos and Real‑World Use Cases

Live demonstrations highlighted GPT 5.4’s versatility:

  • Gmail automation – starring, labeling, and generating calendar invites directly from natural‑language prompts.
  • Bulk data entry – converting a JSON object into structured entries at real‑time speed.
  • Game development – building a theme‑park simulation and an RPG from simple textual descriptions.
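The bulk data entry demo amounts to a simple pattern: take a JSON payload and flatten it into structured rows. A minimal sketch of that conversion, assuming a hypothetical employee schema (the field names here are illustrative, not taken from the demo):

```python
import json

# Hypothetical JSON payload of the kind the demo converted; the
# name/role/email schema is an assumption, not from the video.
RAW = """
{
  "employees": [
    {"name": "Ada Lovelace", "role": "Engineer", "email": "ada@example.com"},
    {"name": "Alan Turing", "role": "Researcher", "email": "alan@example.com"}
  ]
}
"""

def json_to_rows(raw: str) -> list:
    """Flatten a JSON object into structured (name, role, email) rows."""
    data = json.loads(raw)
    return [(e["name"], e["role"], e["email"]) for e in data["employees"]]

rows = json_to_rows(RAW)
for row in rows:
    print(row)
```

In the live demo the model produced the entries itself; a harness like this is only the receiving end that turns its JSON output into rows for a spreadsheet or database.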

The model can generate code that drives browsers and computers via libraries such as Playwright, interpreting screenshots to issue appropriate commands. In the OS World test, GPT 5.4 achieved 75 % accuracy with only 15 tool calls, compared with GPT 5.2’s sub‑50 % accuracy and 42 tool calls.
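The screenshot-driven control described here follows an observe → decide → act loop. A minimal sketch with the model stubbed out — in a real harness the stub would be an API call, and the actions would be executed through a library such as Playwright rather than printed:

```python
# Sketch of a computer-use agent loop. fake_model is a stand-in for the
# real model; its "logic" is hypothetical and only illustrates the shape
# of the loop, not how GPT 5.4 actually reasons over screenshots.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # e.g. "click", "type", "done"
    target: str    # a selector, text payload, or empty when done

def fake_model(screenshot: bytes, goal: str) -> Action:
    """Stand-in for the model: looks at a screenshot, picks one action."""
    if b"login form" in screenshot:
        return Action("click", "#auth-button")  # hypothetical selector
    return Action("done", "")

def agent_loop(goal: str, max_steps: int = 15) -> list:
    """Run observe -> decide -> act until the model reports done."""
    history = []
    screenshot = b"page with login form"        # stand-in for a capture
    for _ in range(max_steps):
        action = fake_model(screenshot, goal)
        history.append(action)
        if action.kind == "done":
            break
        # A real harness would execute the action, then re-capture.
        screenshot = b"page after " + action.target.encode()
    return history

steps = agent_loop("sign in to the dashboard")
```

The `max_steps` budget mirrors the tool-call counts reported above: a more capable model finishes the loop in fewer observe/act cycles.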

Pricing and Token Efficiency

While performance has improved, the cost has risen:

  • Input tokens – $2.50 per million for GPT 5.4 (up from $1.75 for GPT 5.2).
  • Input tokens (Pro) – $30 per million (up from $21).
  • Output tokens – $15 per million (vs. $14 for 5.2).
  • Output tokens (Pro) – $180 per million (vs. $168).

The higher output prices make the model “expensive,” especially for heavy‑usage scenarios, even though it is more token‑efficient than earlier versions.
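To put the “expensive” claim in concrete terms, a quick cost estimate at the listed rates — a small sketch using only the prices quoted above (the model-name keys are informal labels, not official API identifiers):

```python
# Per-million-token prices in USD, as listed above.
PRICES = {
    "gpt-5.4":     {"input": 2.50, "output": 15.0},
    "gpt-5.4-pro": {"input": 30.0, "output": 180.0},
    "gpt-5.2":     {"input": 1.75, "output": 14.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a long-context request using most of the 1M-token window.
cost = request_cost("gpt-5.4", input_tokens=900_000, output_tokens=20_000)
print(f"${cost:.2f}")  # -> $2.55
```

The same request on GPT 5.2’s rates comes to about $1.86, so per-token costs are up roughly a third for this mix — though greater token efficiency can claw some of that back if the model needs fewer tokens to finish the task.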

Prompting and Usage Guidelines

Prompting GPT 5.4 differs from the approaches used with Opus or Claude models. Users are encouraged to consult the latest prompting guide and to maintain separate prompt sets for GPT 5.4 and other models. The “Thinking” mode can provide an upfront plan before execution, similar to the planning feature in Cursor, helping to steer the model and conserve tokens.
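The advice to maintain separate prompt sets per model family can be captured with a simple registry keyed by model. A sketch under that assumption — the prompt texts here are hypothetical placeholders, not excerpts from any official prompting guide:

```python
# Hypothetical per-model system prompts. Keeping them in separate entries
# avoids reusing Opus/Claude-tuned instructions on GPT-family models.
SYSTEM_PROMPTS = {
    "gpt-5.4": (
        "Before executing, produce a short upfront plan, then carry it out. "
        "Keep tool calls minimal."
    ),
    "opus-4.6": (
        "Work through the task step by step, explaining decisions briefly."
    ),
}

def build_messages(model: str, user_prompt: str) -> list:
    """Assemble a chat payload using the prompt set matched to the model."""
    system = SYSTEM_PROMPTS.get(model, "You are a helpful assistant.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

msgs = build_messages("gpt-5.4", "Refactor this module and add tests.")
```

The plan-first instruction in the GPT 5.4 entry mirrors the upfront-planning behavior described above; asking for the plan explicitly is one way to steer the model before it spends tokens on execution.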

Rapid Model Development

OpenAI and Anthropic are releasing new models at lightning speed, often on a weekly cadence. Both companies have streamlined their pre‑training cycles, enabling continuous improvements. This rapid pace contrasts with earlier releases such as GPT 4.5, which was described as massive, slow, and costly.

Industry Reactions

  • Matt Schumer – Calls GPT 5.4 “the best model on the planet,” finds the Thinking variant sufficient for his work, and praises its coding abilities while noting some front‑end limitations.
  • Flavio Adamo – Is impressed by the million‑token context window and the model’s performance on SWE Bench and CodeX 5.4 tasks that previously challenged older versions.
  • Peter Steinberger (OpenAI) – Highlights the “coding jump” as comparable to the leap from 5.0 to 5.1, now unified with general reasoning and agentic capabilities.
  • Sam Altman – Acknowledged reported issues and pledged immediate fixes.

These early testimonials suggest strong enthusiasm among testers, even as the community watches for rapid iteration and bug resolution.

Takeaways

  • GPT 5.4 is marketed as the new best model, merging coding and general AI capabilities into a single flagship system.
  • With a 1 million‑token context window, faster speed, and higher token efficiency, GPT 5.4 outperforms GPT 5.2, GPT 5.3 CodeX, and rivals on OS World, SWE Bench Pro, and GDP Val benchmarks.
  • Live demos show GPT 5.4 automating Gmail, handling bulk data entry, and creating games, proving its usefulness for knowledge work and agentic tasks.
  • The cost has risen to $2.50 / M input tokens (or $30 / M for Pro) and $15 / M output tokens (or $180 / M for Pro), making it more expensive than earlier versions.
  • Early testers like Matt Schumer, Flavio Adamo, and OpenAI’s Peter Steinberger praise its performance, while Sam Altman commits to fixing reported issues quickly.

