Tech leaders spent 2024 promising that autonomous AI agents would join the workforce in 2025. They didn't. Despite impressive demos in coding environments, generalized agents faltered at real-world tasks such as navigating websites, handling mouse-driven interfaces, and executing multi-step workflows without derailing. Startups built "shadow" versions of popular sites to train agents, and OpenAI launched an early ChatGPT Agent, but reviewers flagged slow, error-prone behavior, with agents sometimes getting stuck on basics like drop-down menus.

The core issue is structural: agents rely on the same large language models that still hallucinate and struggle with spatiotemporal reasoning, so small mistakes compound across long task chains. Proposals to retrofit the internet for bots, via standardized interfaces such as the Model Context Protocol or agent-to-agent handoffs, may help but will take time. OpenAI has since de-emphasized agents to focus on core chatbot improvements, and even boosters now frame progress on a decade-long timeline rather than a single "year of the agent."

For businesses, the takeaway is to temper expectations: coding copilots and narrow automations are here, but the broad "digital labor revolution" is still in the shop.
Related articles:
— ReAct: Synergizing Reasoning and Acting in Language Models
— Toolformer: Language Models Can Teach Themselves to Use Tools
— Reflexion: Language Agents with Verbal Reinforcement Learning
— Voyager: An Open-Ended Embodied Agent with Large Language Models
— WebGPT: Browser-assisted Question-Answering with Human Feedback