Even though ChatGPT and the large language models (LLMs) behind it are fascinating technology, many people still question how this technology will turn into the next big thing. The most probable answer is AI Agents.
But what exactly are AI Agents, and how do their capabilities differ from what ChatGPT currently offers? At present, you can converse with ChatGPT, ask it questions, or assign it tasks, and receive responses. In this sense, ChatGPT is reactive: it is a productivity tool that accelerates your work.
For example, suppose you don't know HTML and want to set up a website; you can use ChatGPT to guide you. You can ask how to log into a Linux machine, how to format files, how to insert code that turns text into a title, and so on. However, you still need to ask each question yourself and use the responses to build your website. This is much like using Google Search before ChatGPT existed, and given the propensity of LLMs to hallucinate, many people might still find Google Search faster or more accurate. In that sense, ChatGPT is not much of an improvement over Google Search and doesn't justify the hype around LLMs.
AI Agents will likely justify that hype. Unlike ChatGPT, AI Agents will be able to solve high-level tasks by breaking them down into smaller subtasks and working out how to resolve each one. They will do this autonomously, and their capabilities will grow over time as our ability to train and deploy agents improves.
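To make the idea of breaking a high-level task into subtasks a little more concrete, here is a minimal sketch of a recursive decompose-then-execute loop. It is an illustration under stated assumptions, not how any particular agent framework works: `plan` stands in for an LLM call that returns subtasks (or nothing, if the task is atomic), `execute` stands in for whatever tool actually performs the work, and the names `run_agent`, `toy_plan`, `toy_execute`, and the `TOY_PLAN` table are all hypothetical.

```python
from typing import Callable, List

# Hypothetical interfaces: a planner (in a real agent, an LLM call) that
# returns subtasks, or an empty list when a task is atomic, and an executor
# that performs an atomic task and reports the result.
Planner = Callable[[str], List[str]]
Executor = Callable[[str], str]


def run_agent(task: str, plan: Planner, execute: Executor,
              max_depth: int = 3, depth: int = 0) -> List[str]:
    """Recursively decompose `task` and execute the leaf subtasks in order.

    Returns the results of every executed leaf task. `max_depth` caps how
    far the task may be broken down, so the recursion always terminates.
    """
    subtasks = plan(task) if depth < max_depth else []
    if not subtasks:
        # Atomic task (or depth limit reached): act on it directly.
        return [execute(task)]
    results: List[str] = []
    for sub in subtasks:
        results.extend(run_agent(sub, plan, execute, max_depth, depth + 1))
    return results


# Toy stand-ins for the planner and executor (hypothetical, no LLM involved).
TOY_PLAN = {
    "set up website": ["write html", "deploy to server"],
    "deploy to server": ["log into server", "copy files"],
}


def toy_plan(task: str) -> List[str]:
    return TOY_PLAN.get(task, [])


def toy_execute(task: str) -> str:
    return f"done: {task}"


if __name__ == "__main__":
    for line in run_agent("set up website", toy_plan, toy_execute):
        print(line)
```

Running it prints `done: write html`, `done: log into server`, and `done: copy files`. The `max_depth` cap is a simple guard against a planner that keeps splitting tasks forever; a real agent would also need to check results, retry failures, and ask clarifying questions.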
Returning to the website example: with AI Agents at your disposal, you could outline your website's requirements, point the agent at your server, and let it autonomously write the code and set up the site to your specifications. You could then converse with the agent and refine the website over time, asking it to try a different color scheme, add information to parts of the site, or test variations of the logo.
This process closely mirrors hiring a freelancer through a marketplace like Fiverr: the freelancer takes high-level instructions, sets up the website, asks clarifying questions along the way, and takes feedback on progress. AI Agents should eventually replicate this exact workflow, but without a human freelancer doing the work.
There are already initiatives today, like SuperGPT/AutoGPT, that offer glimpses of how AI Agents could eventually work. However, most of these are still demos rather than dependable tools. Given the pace of development in the space and the emerging architectures for orchestrating LLMs, though, we should expect workable solutions within months.
The immediate question is what the first practical uses of AI Agents will be, and the answer will depend on the economics of running an agent. If deploying an agent to build a website costs $1000/month in hardware, the market will be limited, since marketplaces like Fiverr can supply a human freelancer at a much lower cost. The first agents will almost certainly have a hefty hardware footprint, but this should shrink rapidly over time even as agent capabilities improve.
Over the next year or so, we should expect AI Agents to become proficient at most jobs that humans perform on sites like Fiverr, autonomously executing digital tasks at the skill level of entry-level outsourced workers. Text-based jobs like proofreading and copywriting will likely be automated first, with simple coding jobs following soon after. Within roughly the same timeframe, we should see AI Agents matching the capabilities of entry-level professionals with 0 to 5 years of experience. At that point, almost all digital jobs, such as marketing, sales, and program management, could be handled by AI Agents.
What this means for the wider outsourcing industry is anyone's guess. We should see manifold productivity gains once agents are deployed, and new professions may emerge whose sole task is to manage these AI Agents. In the slightly longer term, however, we should also expect agents advanced enough to manage other agents.
There are some technological obstacles along the way. For example, current LLMs don't handle multimodal inputs (video, images, audio) well and are trained only on text. But efforts to overcome these challenges are already underway, and we should expect workable solutions within the next year.
You might enjoy this: https://edge.aampe.com/p/what-is-an-ai-agent