The best way to learn about AI Agents
The best way to learn about AI Agents is to understand their evolution. With the rise of Generative AI technology like LLMs, there has been a growing desire for a well-equipped autonomous system that can use a wide range of tools while remaining highly conversational — one that builds on the very foundation of GenAI and adds other cognitive features like memory. After several iterations we finally arrived at Agents, but it was not a straightforward change.
📌 Here are a few of the iterations made along the way to where we are today as an Agentic system:
- The Very Beginning - A Large Language Model
- Workflow: Input: Text → LLM → Output: Text
- These are transformer-based architectures trained on large, varied datasets to create intelligent chatbots.
- This laid the very foundation of the AI Agent’s future.
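The Text → LLM → Text workflow above can be sketched in a few lines. The `toy_llm` function below is a stand-in, not a real model; a real system would call an actual LLM API at that point.

```python
# Minimal sketch of the original Text -> LLM -> Text workflow.
# `toy_llm` is a hypothetical stand-in for a transformer-based model.

def toy_llm(prompt: str) -> str:
    """Stand-in for an LLM: maps input text to output text."""
    canned = {"Hello": "Hi there! How can I help you today?"}
    return canned.get(prompt, f"You said: {prompt}")

def run_workflow(user_text: str) -> str:
    # Input: Text -> LLM -> Output: Text
    return toy_llm(user_text)
```

Every later stage in this post just adds components around this same core loop.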
- LLMs with larger context inputs and outputs
- Workflow: Input: Text/Document → LLM → Output: Text/Document
- As LLMs became smarter, they needed to process information beyond the 8k context window.
- The LLM architecture was updated with a larger context window to parse bigger documents.
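Until context windows grew, documents larger than the window had to be split into pieces that fit. A minimal sketch of that chunking step, approximating the token budget by a simple word count (real systems use the model's own tokenizer):

```python
# Split a document into chunks that fit a fixed "context window".
# Word count is a rough stand-in for token count.

def chunk_document(text: str, max_tokens: int = 8000) -> list[str]:
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Larger context windows reduce (but do not eliminate) the need for this kind of splitting.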
- RAG and tool use
- Workflow: Input: Text/Document corpus → LLM + Tool use + RAG → Output: Text/Document
- Access to the latest information soon became the new trend in GenAI, so Retrieval-Augmented Generation (RAG) was introduced.
- Alongside RAG, we started seeing more tool integrations, such as search APIs, to further improve LLM responses.
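The RAG step above boils down to: retrieve the most relevant passage from a corpus, then prepend it to the prompt. A toy sketch using keyword overlap as the relevance score — real systems use embeddings and a vector store instead:

```python
# Minimal RAG sketch: keyword-overlap retrieval + prompt assembly.
# The scoring here is a toy stand-in for embedding similarity.

def retrieve(query: str, corpus: list[str]) -> str:
    q = set(query.lower().split())
    return max(corpus, key=lambda doc: len(q & set(doc.lower().split())))

def build_prompt(query: str, corpus: list[str]) -> str:
    context = retrieve(query, corpus)
    return f"Context: {context}\nQuestion: {query}"
```

The retrieved context grounds the LLM's answer in up-to-date information instead of relying only on its training data.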
- Introduction of the Multi-Modal workflow
- Workflow: Multi-Modal Input → LLM + Tool use + RAGs → Multi-modal Output
- This laid the very foundation of a simple agentic architecture.
- With the support of real-time data retrieval, these agents could already perform complex tasks with a reasonable degree of human intervention.
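One way to picture the multi-modal front end is as a router that sends each input to a modality-specific encoder before the LLM sees it. The encoders below are placeholders (a real system would use vision or audio models here):

```python
# Sketch of multi-modal input routing. `encode_image` and `encode_text`
# are hypothetical placeholders for real modality encoders.

def encode_image(data: bytes) -> str:
    return f"<image:{len(data)} bytes>"

def encode_text(data: str) -> str:
    return data

def route_input(item) -> str:
    # bytes -> image encoder, str -> text encoder
    if isinstance(item, bytes):
        return encode_image(item)
    return encode_text(item)
```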
- Our current AI Agent architecture
- Workflow: Text + Multi-Modal Data Input → LLM + Tool Use + Memory → Decision → Text + Multi-Modal Data Output
- AI agents are now equipped with advanced structures:
a. Memory types: short-term, long-term, and episodic, used to remember past activities and improve over time.
b. Tool calling: utilizing third-party APIs to perform a wide range of tasks like search, flight booking, etc.
c. Decision: rather than simply failing to act, the agent uses ReAct to restart the process with a different approach until it produces an output.
d. Better knowledge bases: current agents utilize semantic databases to form better connections across different nodes for improved reasoning.
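The memory, tool-calling, and decision pieces above can be combined into a minimal ReAct-style loop: think, act with a tool, observe, remember, and retry with a different approach if the result is unusable. The "policy" below is a hand-written stand-in for an LLM choosing the next action, and `search_tool` is a toy substitute for a third-party search API:

```python
# Minimal ReAct-style agent loop (toy sketch).
# Thought -> Action (tool call) -> Observation, stored in memory;
# on failure the agent retries with a modified approach.

def search_tool(query: str) -> str:
    # Hypothetical stand-in for a third-party search API.
    facts = {"capital of France": "Paris"}
    return facts.get(query, "no result")

TOOLS = {"search": search_tool}

def agent(task: str, max_steps: int = 3):
    memory = []  # short-term memory of (thought, action, observation)
    for _ in range(max_steps):
        thought = f"I should look up: {task}"
        observation = TOOLS["search"](task)               # Act
        memory.append((thought, "search", observation))   # Observe + remember
        if observation != "no result":
            return observation, memory                    # Decision: done
        task = task.lower()                               # retry differently
    return "failed", memory
```

A production agent would swap the hand-written policy for LLM-generated thoughts and tool choices, and back the memory list with short-term, long-term, and episodic stores.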
- Future architecture of AI Agents
What are your views on the future of AI Agents?
Link: https://www.facebook.com/share/p/1CsXRDVFPX/
This post is licensed under CC BY 4.0 by the author.
