AI Agents fail 97% of real-world tasks, studies show

Research shows humans still outperform AI across real workflows

Several recent studies indicate that AI agents still cannot match humans at real-world tasks.

According to research by Scale AI and the Center for AI Safety, artificial intelligence agents were unable to complete 97% of Upwork tasks even at a basic level. The study tested six AI models across 240 Upwork projects in categories such as writing, design, and data analysis, comparing their results with those of real freelancers.

The best-performing AI model, Manus, successfully completed only 2.5% of tasks, earning about $1,810 out of $143,991 in available work. Other models, such as Claude Sonnet and Grok 4, managed only 2.1%. Researchers concluded that AI agents struggle with multi-step workflows, initiative, and decision-making, suggesting that AI will not replace human jobs anytime soon.

A separate study by the European Broadcasting Union and BBC found that AI models — including ChatGPT, Copilot, and Perplexity — are ineffective at news reporting. They fail to meet key journalistic criteria such as source verification, accuracy, text generation, and distinguishing fact from opinion.

Researchers found at least one significant issue in 45% of AI-generated answers; 31% of responses had serious sourcing problems, and 20% contained major accuracy issues such as outdated, misleading, or false information.

Meanwhile, Freelance.com reported that AI-generated cover letters are undermining the job application process — leading to fewer hires or misaligned matches. The company also found that top-skilled professionals (top quintile) are 19% less likely to be hired than before, while lower-skilled candidates (bottom quintile) are 14% more likely to be hired.

Without humans, the world grows empty

These findings are consistent with an MIT study from August, which concluded that 95% of organizations saw no return on their $30 billion AI investments.

According to WorldTest, a study conducted by MIT and Basis Research, AI agents can match patterns and predict words — but struggle to build internal models of the world.

The MIT research involved 129 tasks across 43 interactive environments, requiring AI to predict hidden aspects of the world, plan sequences of actions to reach goals, and detect rule changes. In comparison, 517 human participants performed nearly optimally, while AI models often failed.

Researchers suggest humans excel because they intuitively understand environments, adjust perspectives, experiment, reset, and explore strategically. Increasing computational power helped existing models only partially: it improved performance in just 25 of 43 environments.

David Sacks, policy advisor on crypto and AI under the Trump administration, also warned that social media and search engine censorship could become deeply dystopian with generative AI.

He argued that the term “woke AI” understates the issue, describing instead an “Orwellian AI” that distorts answers, lies, and rewrites history in real time to align with the prevailing political narrative.

