AI vs Human Workers: What the Remote Labor Index Study Really Shows

Name: AI Fails at 96% of Jobs (New Study)
Uploaded: 2026-02-14T13:41:28.045450+00:00
Channel: ColdFusion

YouTube video ID: z3kaLM8Oj4o

Source: YouTube video by ColdFusion

Overview

A recent study introduced the Remote Labor Index (RLI), a real‑world benchmark that pits AI models against human freelancers on actual paid jobs from Upwork. The goal was to see whether AI can truly replace human workers in professional, computer‑based tasks.

Methodology

Jobs Tested: 240 diverse gigs (video creation, CAD, graphic design, game dev, audio work, architecture, etc.)
Payment: Average $630 per job, paid to human freelancers.
Process: Both the human and the AI received the exact same brief and any supporting files. After the AI completed a task, human evaluators judged the output against professional standards.
Metric: Success = output equal to or better than a human’s work; Failure = any result below human level.

Key Findings

Best Performer: Claude Opus 4.5 succeeded on only 3.75% of tasks, which means even the top model failed 96.25% of the time.
Worst Performer: Gemini, with a 1.25% success rate.
Failure Categories:
Corrupt or unusable file formats.
Incomplete deliverables (e.g., truncated videos, missing assets).
Poor quality that does not meet professional standards.
Inconsistencies within the output (e.g., mismatched 3D views).
Success Niches: The few successes clustered in creative audio and image work, report writing, simple data retrieval/web‑scraping, logo and advertisement design, and basic code for interactive data visualizations.
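To make the percentages concrete, the short sketch below converts the quoted success rates into job counts. It assumes each model was scored on all 240 jobs, which the summary implies but does not state per model. The helper names `jobs_passed` and `failure_rate_pct` are illustrative and not taken from the study.

```python
from fractions import Fraction

# Number of Upwork jobs in the benchmark, per the summary above.
TOTAL_JOBS = 240

# Success rates quoted in the summary, kept as exact fractions of a percent
# so the arithmetic does not pick up floating-point noise.
SUCCESS_RATES_PCT = {
    "Claude Opus 4.5": Fraction("3.75"),
    "Gemini": Fraction("1.25"),
}


def jobs_passed(rate_pct, total=TOTAL_JOBS):
    """Convert a percentage success rate into a number of jobs.

    Assumption: the rate applies to the full set of `total` jobs.
    """
    return rate_pct * total / 100


def failure_rate_pct(rate_pct):
    """Failure rate is simply the complement of the success rate."""
    return 100 - rate_pct


for model, rate in SUCCESS_RATES_PCT.items():
    print(
        f"{model}: {float(rate)}% success = {jobs_passed(rate)} of "
        f"{TOTAL_JOBS} jobs, {float(failure_rate_pct(rate))}% failure"
    )

# The video's hypothetical: a newer model scoring 5 percentage points higher
# than the current best would still fail on more than 9 in 10 jobs.
hypothetical = SUCCESS_RATES_PCT["Claude Opus 4.5"] + 5
print(f"Hypothetical +5 points: {float(failure_rate_pct(hypothetical))}% failure")
```

Under that assumption, the best model's 3.75% corresponds to 9 acceptable deliverables out of 240, and the weakest model's 1.25% corresponds to 3.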

What This Means for the Job Market

Limited Replacement: While AI can speed up narrow, well‑defined tasks, it is far from ready to replace humans in most freelance or professional work.
Economic Overvaluation: Current hype inflates AI’s near‑term value; a PwC report found that the majority of CEOs see no financial returns from AI.
Human Oversight Required: Even in areas where AI shows promise, a skilled human must verify and refine the output.
Future Outlook: Gartner predicts that by next year, half of the companies that fired workers for AI will hire them back, and the industry is still in a formative stage.

Broader Implications

Medical Risks: Reuters reported that the FDA has received 100 reports of AI malfunctions, botched surgeries, and misidentified body parts, underscoring the danger of premature adoption.
Scaling Limits: Throwing more data and compute at current architectures (large language models) is unlikely to solve fundamental reasoning gaps.
Research Direction: Foundational work on reinforcement learning and world‑modeling is needed before AI can truly understand and act autonomously.

Takeaway

AI is a powerful tool, not a replacement for human expertise. Expect modest gains in specific, narrow tasks, but plan for continued human involvement and careful evaluation.

Practical Advice for Professionals

Software Engineers: Consider offering services that audit and fix AI‑generated code ("vibe‑coded" apps).
Creative Freelancers: Leverage AI for idea generation, but deliver the final polished product yourself.
Business Leaders: Implement AI with a clear roadmap, realistic expectations, and dedicated oversight teams.

This article condenses the ColdFusion episode and the underlying research.

Current AI systems are still far from matching human performance on real freelance work; they are valuable assistants for specific tasks but cannot replace human workers at scale today.

Full Transcript

In the absence of AI and robotics, we're
actually totally screwed.
>> We are working to build tools that one
day could help us make new discoveries
and address some of humanity's biggest
challenges like climate change and
curing cancer.
>> Hi, welcome to another episode of
ColdFusion. Here's a question. How can AI be
disrupting the job market but also be
losing billions of dollars at the same
time? Well, this video will answer that.
The truth is, while AI helps make some
jobs easier, when compared to a human,
it performs worse a whopping 96.25%
of the time, which basically means give
an AI 10 tasks and it will perform at
least nine of them worse than when
compared to a human. That's at least
according to a new study. It's such an
interesting finding and begs the
question, why has no one systematically
compared how well AI does versus a human
who's done exactly the same job? All
previous benchmarks have been simulated
human work, not real generalized work.
The results from the team of researchers
who did the study makes one think maybe
the true value of consumer AI isn't
hundreds of billions of dollars, but
orders of magnitudes less. I'm not
saying that all AI sucks. This study is
just a general reminder that AI is a
time-saving tool and not a replacement.
Just maybe the economy is valuing it too
highly when it comes to near-term
capabilities.
In this episode, we'll take a look at
the study in detail and discuss what it
all means.
>> You are watching ColdFusion TV.
>> So, the synopsis of the study was
straightforward enough. Give paid jobs
already completed by real people to AI
models and then see how well the results
compare. Once the AI completes the
tasks, humans evaluate the results. The
researchers called this method the
remote labor index or RLI. It's so
simple. Most of us use a computer to do
modern work, right? So why not just
directly compare how well AIs compete on
a professional computer-based job? The
jobs to be completed were real ones from
the freelancer site Upwork, a site where
you pay remote workers to complete any
given task. The jobs were varied from
video creation, computer-aided design,
graphic design, game development, audio
work, architecture, and more. Both
humans and AI were given the same brief
and any attached files that were
necessary for the job. For example, an
Excel spreadsheet of data or
instructional images.
The AI models were tested on 240 jobs,
each paying $630 on average. So, how did
they perform? The performance was
abysmal. The best AI was Claude Opus 4.5
with a 3.75% success rate when it came
to producing work of an acceptable
quality. You heard that right, a 96.25%
failure rate was the best performer.
Interestingly, Gemini was the loser with
a 1.25% success rate. Now, Claude Opus
4.6 might score 5% better, but that's
still a 91% failure rate. When these
scores get to 35% or 40%, then we can
talk. So, a couple of things to note.
The original paper used AI models that
were 6 months or so old, but their
website has up-to-date results, which
are the scores that I'm referring to in
this episode. I'll leave a link for the
website below.
So, where exactly did the AI systems
fail? Well, first we need to define
exactly what failure means. Failure
counts as not performing a task at or
better than a human level. This is
specifically in the context of a
freelancing environment, an environment
where people actually pay money directly
for the work. With that in mind, the
paper lists four main failure points for
AI systems.
Number one, sometimes the AI would
produce quote corrupt or empty files or
deliver work in incorrect or unusable
formats. Number two, AI quote frequently
submitted incomplete work characterized
by missing components, truncated videos,
or absent source assets. For example, a
video of 8 seconds when an 8-minute
video was required. Number three,
another one was quality issues. Quote,
"Even when agents produce a complete
deliverable, the quality of work is
frequently poor and does not meet
professional standards." End quote. And
finally, number four, inconsistencies
with AI generated work. This includes a
house's appearance changing across
different 3D views or digital floor
plans that don't match the supplied
sketches. It's all very interesting. So
for years now we've been told that AI is
going to replace humans everywhere. But
the truth is we are nowhere near that
point. At least not yet anyway.
So then where did the AI succeed?
Success would mean that the AI does the
same work at the same quality or better
quality than human output. They note
that AI was proficient in creative ideas
like audio and image related work along
with writing, data retrieval or web
scraping. And that kind of checks out.
The success of OpenClaw attests to the
latter, and AI images and audio are
already good enough to fool a lot of
people. Advertisement and logo creation
was another successful area. It's also
no surprise that AI was good at report
writing and generating simple code for
an interactive data visualization.
Competent video generation is coming
very shortly. Just take a look at
Seedance 2.0.
So, the main takeaway is AI is pretty
good at some things, but horrendous for
general work.
But what else do we learn? This paper
exposes a lot, much of it negative, but
it does show that the RLI format is a
very useful measure of AI performance in
the real world. Reason being,
current-day benchmarks aren't reflective
of real world performance. As the paper
puts it, quote, while AI systems have
saturated many existing benchmarks, we
find that the state-of-the-art AI agents
perform near the floor on RLI. End
quote. I found the study to be very
robust, by the way. So, I'll leave a
link to it below.
According to this study, AI may impact
jobs with lots of language requirements,
audio, simple advertising, or data
retrieval, but human oversight is still
needed. A PwC report found that the
majority of CEOs see no financial
returns from AI. Upper management and
CEOs just command workers to use AI and
expect it to all work. For AI to work
within a corporation, there needs to be
a planned and skilled implementation of
the technology with the knowledge of its
shortcomings. And that doesn't happen a
lot of the time. Gartner predicts that
by next year, half of the companies that
fired workers for AI are going to hire
them back. Also, 9 months ago, Microsoft
proudly proclaimed that 30% of their
code was written by AI. And since then,
we've seen some of the worst software
issues at the company in its history.
Now, it's obvious that AI is disruptive
and some jobs will be lost to the
technology. For example, diffusion
models are proficient in the visual arts
as you saw earlier. But as for LLMs and
the general workforce, this study
indicates that job losses could be a lot
less. The AI space does move fast. So, I
could be wrong, but that's how things
are looking today in early 2026. To sum
up the job prognosis in one line, if
you're a software engineer, set up a
business that fixes vibe-coded apps and
you'll make a lot of money. I think the
thing is artificial intelligence really
is going to transform the world like in
ways we can't even imagine. But it's not
going to do it now. Not with this
technology. My favorite example of this
is one trains them on the whole
internet. So they get access to a lot of
written rules of chess and lots of games
of chess and they still make illegal
moves. They never really abstract the
model of how chess works. That's just so
damning. You would not be able to learn
chess after seeing a million games,
reading the rules in Wikipedia and
chess.com. Just making it bigger is not
going to solve these problems. We need to
do foundational research. That's what I
was saying for the last 5 years. What is
intelligence? The problem is to
understand your world and um
reinforcement learning is about
understanding your world. Whereas large
language models are about mimicking
people, doing what people say you should
do. They're not about figuring out what
to do. Just to mimic what people
say is not really to build a model of
the world at all. I don't think
>> so. I'm not saying that AI will never
work or it's not genuinely useful
already. There will be some narrow AI
products that work really well. I'm just
warning that there's a significant
financial risk in the current AI space.
The investment ethos and the rollout of
AI everywhere might be misallocating
hundreds of billions of dollars. Even in
the medical field, Reuters has just
reported that the FDA has received 100
reports of AI malfunctions, botched
surgeries, and misidentified body parts.
In a few cases, a lawsuit alleges that
the AI misinformed the surgeons on the
locations of their instruments, causing
one to mistakenly puncture the base of a
patient's skull, and causing strokes
from the damage to a major artery in two
others. We don't need to put AI in every
field. It's just not ready yet. Again,
in some fields like coding, high maths,
and writing, AI is pretty good. and can
make jobs a lot easier, but we can't
pretend like it's going to replace
everyone perfectly right now. Now, I was
going to stop the video here, but just a
couple of personal thoughts. Back in
2016 when I started covering AI, it was
fun and fascinating to see how these
things worked. But ever since the big
money started coming in, the hype has
just gone off the charts. CNBC just
reported that companies like Anthropic,
Google, and Microsoft have paid
individual content creators $400,000 to
half a million dollars each to promote
their AI models. Now, brand deals are
fine, but if the current generation of
AI was as revolutionary as being
advertised, they wouldn't need to spend
so much money to convince us. It's a
jarring disconnect. One last thing.
>> We're fooled into thinking those
machines are intelligent because they
can manipulate language. And we're used
to the fact that people who can
manipulate language very well are
implicitly smart. But we're being
fooled. Now, they're useful.
There's no question. They're great tools
like, you know, computers have been for
the last five decades. But let me
make an interesting historical point and
this is maybe due to my age. Uh there's
been generation after generation of AI
scientists
since the 1950s claiming that the
technique that they just discovered was
going to be the ticket for human level
intelligence. You see declarations
of Marvin Minsky, Newell and Simon, you
know, Frank Rosenblatt, who invented the
perceptron, the first learning machine,
in the 1950s, saying like within 10 years
we'll have machines that are as smart as
humans. They were all wrong. This
generation with LLMs is also wrong. I've
seen three of those generations in my
lifetime. Okay. So, you know, it's
just another example of being
fooled.
>> That's Yann LeCun, the creator
of convolutional neural networks. He's
been outspoken in saying that the
current AI architecture is reaching its
peak. He thinks that throwing more data
and power at the problem isn't going to
solve it. And I think that's what the
early data is showing us. It's called
the scaling problem, and it's a large
part of my upcoming video about how
OpenAI is in big trouble. When it's
complete, I'll leave a link for that
episode below, so be sure to check it
out after this. Anyway, that's about it
from me. You've been watching
ColdFusion. Let me know your thoughts. I'm
sure the comment section will be very
very full of very good discussion.
Anyway, that's it. My name's Dagogo and
I'll see you again soon for the next
episode. Cheers guys. Have a good one.
ColdFusion. It's new thinking.