AI Agents from Zero to Hero — Part 3

In Part 1 of this tutorial series, we introduced AI Agents, autonomous programs that perform tasks, make decisions, and communicate with others.

In Part 2 of this tutorial series, we saw how to make the Agent try and retry until the task is completed through Iterations and Chains.

A single Agent can usually operate effectively using one tool, but it can be less effective when using many tools simultaneously. One way to tackle complicated tasks is through a “divide-and-conquer” approach: create a specialized Agent for each task and have them work together as a Multi-Agent System (MAS).

In a MAS, multiple Agents collaborate to achieve common goals, often tackling challenges that are too difficult for a single Agent to handle alone. There are two main ways they can interact:

Sequential flow – The Agents do their work in a specific order, one after the other. For example, Agent 1 finishes its task, and then Agent 2 uses the result to do its task. This is useful when tasks depend on each other and must be done step-by-step.

Hierarchical flow – Usually, one higher-level Agent manages the whole process and gives instructions to lower-level Agents which focus on specific tasks. This is useful when the final output requires some back-and-forth.
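To make the difference concrete, the two flows can be sketched with plain Python functions standing in for the Agents. This is only an illustration under assumptions: describe, investigate, and the fixed delegation plan are hypothetical stubs, not LLM calls, and a real manager Agent would decide the plan itself.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for Agents: each one is just a function from text to text.
def describe(task: str) -> str:
    return f"description of {task}"

def investigate(description: str) -> str:
    return f"findings based on {description}"

def run_sequential(agents: List[Callable[[str], str]], task: str) -> str:
    """Sequential flow: each Agent receives the previous Agent's output."""
    result = task
    for agent in agents:
        result = agent(result)
    return result

def run_hierarchical(plan: List[str],
                     workers: Dict[str, Callable[[str], str]],
                     task: str) -> List[str]:
    """Hierarchical flow: a manager follows a plan, delegating to named
    workers and collecting what each one reports back."""
    reports = []
    payload = task
    for worker_name in plan:
        payload = workers[worker_name](payload)
        reports.append(f"{worker_name}: {payload}")
    return reports

sequential_result = run_sequential([describe, investigate], "photo.jpeg")
hierarchical_reports = run_hierarchical(
    ["junior", "senior"],
    {"junior": describe, "senior": investigate},
    "user request",
)
```

In the sequential version nobody supervises the hand-off, while in the hierarchical version every result passes back through the manager, which is where the back-and-forth happens.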

In this tutorial, I’m going to show how to build from scratch different types of Multi-Agent Systems, from simple to more advanced. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article).

Setup

Please refer to Part 1 for the setup of Ollama and the main LLM.

import ollama

llm = "qwen2.5"

In this example, I will ask the model to process images, therefore I’m also going to need a Vision LLM. It is a specialized version of a Large Language Model that, integrating NLP with CV, is designed to understand visual inputs, such as images and videos, in addition to text.

LLaVA, an open-source model developed by researchers from the University of Wisconsin-Madison, Microsoft Research, and Columbia University, is an efficient choice as it can also run without a GPU.

After the download is completed (for example with ollama pull llava), you can move on to Python and start writing code. Let’s load an image so that we can try out the Vision LLM.

from matplotlib import image as pltimg, pyplot as plt

image_file = "draghi.jpeg"
plt.imshow(pltimg.imread(image_file))
plt.show()

In order to test the Vision LLM, you can just pass the image as an input:

import ollama

ollama.generate(model="llava", prompt="describe the image",
                images=[image_file])["response"]

Sequential Multi-Agent System

I shall build two Agents that will work in a sequential flow, one after the other, where the second takes the output of the first as an input, just like a Chain.

The first Agent must process an image provided by the user and return a verbal description of what it sees.

The second Agent will search the internet and try to understand where and when the picture was taken, based on the description provided by the first Agent.

Both Agents shall use one Tool each. The first Agent will have the Vision LLM as a Tool. Please remember that with Ollama, in order to use a Tool, the function must be described in a dictionary.

def process_image(path: str) -> str:
    return ollama.generate(model="llava", prompt="describe the image", images=[path])["response"]

tool_process_image = {'type':'function', 'function':{
  'name': 'process_image',
  'description': 'Load an image for a given path and describe what you see',
  'parameters': {'type': 'object',
                 'required': ['path'],
                 'properties': {
                    'path': {'type':'string', 'description':'the path of the image'},
}}}}
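Writing these dictionaries by hand gets repetitive once there are several Tools. As a rough sketch, the same structure could be generated from the function signature. Note that describe_tool is a hypothetical helper of mine, not part of Ollama, and it assumes every parameter is a required string.

```python
import inspect
from typing import Callable, Dict

def describe_tool(func: Callable, description: str, param_docs: Dict[str, str]) -> dict:
    """Build a Tool dictionary in the same shape as the hand-written one.
    Hypothetical helper: every parameter is declared as a required string."""
    names = list(inspect.signature(func).parameters)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "required": names,
                "properties": {
                    n: {"type": "string", "description": param_docs.get(n, "")}
                    for n in names
                },
            },
        },
    }

def process_image_stub(path: str) -> str:
    # Stand-in for the real Tool so the example runs without a Vision LLM.
    return f"description of {path}"

tool_stub = describe_tool(
    process_image_stub,
    "Load an image from a given path and describe what you see",
    {"path": "the path of the image"},
)
```

The hand-written dictionary remains the more explicit option when parameters have different types or some of them are optional.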

The second Agent should have a web-searching Tool. In the previous articles of this tutorial series, I showed how to leverage the DuckDuckGo package for searching the web. So, this time, we can use a new Tool: Wikipedia (pip install wikipedia==1.4.0). You can directly use the original library or import the LangChain wrapper.

from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

def search_wikipedia(query:str) -> str:
    return WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()).run(query)

tool_search_wikipedia = {'type':'function', 'function':{
  'name': 'search_wikipedia',
  'description': 'Search on Wikipedia by passing some keywords',
  'parameters': {'type': 'object',
                 'required': ['query'],
                 'properties': {
                    'query': {'type':'string', 'description':'The input must be short keywords, not a long text'},
}}}}

## test
search_wikipedia(query="draghi")

First, you need to write a prompt to describe the task of each Agent (the more detailed, the better), and that will be the first message in the chat history with the LLM.

prompt = '''
You are a photographer that analyzes and describes images in detail.
'''
messages_1 = [{"role":"system", "content":prompt}]

One important decision to make when building a MAS is whether the Agents should share the chat history or not. The management of chat history depends on the design and goals of the system:

Shared chat history – Agents have access to a common conversation log, allowing them to see what other Agents have said or done in previous interactions. This can enhance the collaboration and the understanding of the overall context.

Separate chat history – Agents only have access to their own interactions, focusing only on their own communication. This design is typically used when independent decision-making is important.

I recommend keeping the chats separate unless it is necessary to do otherwise. LLMs might have a limited context window, so it’s better to keep the history as light as possible.
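The two options can be compared with plain lists of messages. This is a minimal sketch with made-up message contents, and history_size is a hypothetical helper that uses character count as a very rough proxy for context usage.

```python
# Shared history: both Agents append to, and read from, the same list.
shared_history = [{"role": "system", "content": "You are a team of two Agents."}]
shared_history.append({"role": "assistant", "content": "Agent 1: the image shows a podium."})
shared_history.append({"role": "assistant", "content": "Agent 2: searching for the podium."})

# Separate histories: each Agent keeps its own list, and only the
# piece of information it needs is handed over explicitly.
history_1 = [{"role": "system", "content": "You describe images."}]
history_2 = [{"role": "system", "content": "You search Wikipedia."}]
description = "the image shows a podium"
history_1.append({"role": "assistant", "content": description})
history_2.append({"role": "user", "content": "Picture: " + description})

def history_size(history: list) -> int:
    """Rough proxy for context usage: total characters across messages."""
    return sum(len(m["content"]) for m in history)
```

With separate histories, the second Agent only ever sees its own system prompt plus the description, so its context grows much more slowly than a shared log would.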

prompt = '''
You are a detective. You read the image description provided by the photographer, and you search Wikipedia to understand when and where the picture was taken.
'''
messages_2 = [{"role":"system", "content":prompt}]

For convenience, I shall use the function defined in the previous articles to process the model’s response.

def use_tool(agent_res:dict, dic_tools:dict) -> dict:
    ## defaults, so the return below never hits an undefined name
    res, t_name, t_inputs = '', '', ''
    ## use tool (.get() tolerates a missing or empty 'tool_calls' entry)
    for tool in agent_res["message"].get("tool_calls") or []:
        t_name, t_inputs = tool["function"]["name"], tool["function"]["arguments"]
        if f := dic_tools.get(t_name):
            ### calling tool
            print('🔧 >', f"\x1b[1;31m{t_name} -> Inputs: {t_inputs}\x1b[0m")
            ### tool output
            t_output = f(**t_inputs)
            print(t_output)
            ### final res
            res = t_output
        else:
            print('🤬 >', f"\x1b[1;31m{t_name} -> NotFound\x1b[0m")
    ## don't use tool
    if agent_res['message']['content'] != '':
        res = agent_res["message"]["content"]
        t_name, t_inputs = '', ''
    return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs}
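To see the dispatch step in isolation, here is a simplified sketch that runs on hand-written responses instead of real ollama.chat output. The dispatch function, the shout Tool, and both fake responses are hypothetical; unlike the function above, this version stops at the first registered Tool call.

```python
def dispatch(agent_res: dict, dic_tools: dict) -> dict:
    """Simplified sketch of the dispatch step: run the requested Tool if it is
    registered, otherwise fall back to the plain-text content."""
    message = agent_res["message"]
    for call in message.get("tool_calls") or []:
        name, args = call["function"]["name"], call["function"]["arguments"]
        if name in dic_tools:
            return {"res": dic_tools[name](**args), "tool_used": name, "inputs_used": args}
    return {"res": message.get("content", ""), "tool_used": "", "inputs_used": ""}

# Hypothetical Tool and hand-written responses standing in for the model's output.
def shout(text: str) -> str:
    return text.upper()

fake_tool_response = {"message": {"content": "", "tool_calls": [
    {"function": {"name": "shout", "arguments": {"text": "hello"}}}]}}
fake_text_response = {"message": {"content": "no tool needed"}}

with_tool = dispatch(fake_tool_response, {"shout": shout})
without_tool = dispatch(fake_text_response, {"shout": shout})
```

The key idea is the same in both versions: the model only names a function and its arguments, and it is the Python code that looks the function up in a dictionary and actually calls it.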

As we already did in previous tutorials, the interaction with the Agents can be started with a while loop. The user is asked to provide an image that the first Agent will process.

dic_tools = {'process_image':process_image,
             'search_wikipedia':search_wikipedia}

while True:
    ## user input
    try:
        q = input('📷 > give me the image to analyze:')
    except EOFError:
        break
    if q == "quit":
        break
    if q.strip() == "":
        continue
    messages_1.append( {"role":"user", "content":q} )

    plt.imshow(pltimg.imread(q))
    plt.show()

    ## Agent 1
    agent_res = ollama.chat(model=llm, tools=[tool_process_image], messages=messages_1)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]

    print("👽📷 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_1.append( {"role":"assistant", "content":res} )

The first Agent used the Vision LLM Tool and recognized text within the image. Now, the description will be passed to the second Agent, which shall extract some keywords to search Wikipedia.

    ## Agent 2 (still inside the while loop)
    messages_2.append( {"role":"system", "content":"-Picture: "+res} )

    agent_res = ollama.chat(model=llm, tools=[tool_search_wikipedia], messages=messages_2)
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]

The second Agent used the Tool and extracted information from the web, based on the description provided by the first Agent. Now, it can process everything and give a final answer.

    if tool_used == "search_wikipedia":
        messages_2.append( {"role":"system", "content":"-Wikipedia: "+res} )
        agent_res = ollama.chat(model=llm, tools=[], messages=messages_2)
        dic_res = use_tool(agent_res, dic_tools)
        res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    else:
        messages_2.append( {"role":"assistant", "content":res} )

    print("👽📖 >", f"\x1b[1;30m{res}\x1b[0m")

This is literally perfect! Let’s move on to the next example.

Hierarchical Multi-Agent System

Imagine having a squad of Agents that operates with a hierarchical flow, just like a human team, with distinct roles to ensure smooth collaboration and efficient problem-solving. At the top, a manager oversees the overall strategy, talking to the customer (the user), making high-level decisions, and guiding the team toward the goal. Meanwhile, other team members handle operative tasks. Just like humans, Agents can work together and delegate tasks appropriately.

I shall build a tech team of 3 Agents with the objective of querying a SQL database per user’s request. They must work in a hierarchical flow:

The Lead Agent talks to the user and understands the request. Then, it decides which team member is the most appropriate for the task.

The Junior Agent has the job of exploring the db and building SQL queries.

The Senior Agent shall review the SQL code, correct it if necessary, and execute it.

LLMs know how to code because they have been exposed to a large corpus of both code and natural language text, where they learn patterns, syntax, and semantics of programming languages. The model learns the relationships between different parts of the code by predicting the next token in a sequence. In short, LLMs can generate SQL code but can’t execute it; Agents can.

First of all, I am going to create a database and connect to it, then I shall prepare a series of Tools to execute SQL code.

## Read dataset
import pandas as pd

dtf = pd.read_csv('http://bit.ly/kaggletrain')
dtf.head(3)

## Create db
import sqlite3

dtf.to_sql(index=False, name="titanic", con=sqlite3.connect("database.db"),
           if_exists="replace")

## Connect db
from langchain_community.utilities.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///database.db")

Let’s start with the Junior Agent. LLMs don’t need Tools to generate SQL code, but the Agent doesn’t know the table names and structure. Therefore, we need to provide Tools to investigate the database.

from langchain_community.tools.sql_database.tool import ListSQLDatabaseTool

def get_tables() -> str:
    return ListSQLDatabaseTool(db=db).invoke("")

tool_get_tables = {'type':'function', 'function':{
  'name': 'get_tables',
  'description': 'Returns the name of the tables in the database.',
  'parameters': {'type': 'object',
                 'required': [],
                 'properties': {}
}}}

## test
get_tables()

That shows the available tables in the db, and the following Tool prints the columns in a table.

from langchain_community.tools.sql_database.tool import InfoSQLDatabaseTool

def get_schema(tables: str) -> str:
    tool = InfoSQLDatabaseTool(db=db)
    return tool.invoke(tables)

tool_get_schema = {'type':'function', 'function':{
  'name': 'get_schema',
  'description': 'Returns the name of the columns in the table.',
  'parameters': {'type': 'object',
                 'required': ['tables'],
                 'properties': {'tables': {'type':'string', 'description':'table name. Example Input: table1, table2, table3'}}
}}}

## test
get_schema(tables='titanic')
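If you prefer not to depend on the LangChain wrappers, the same two pieces of information can be read directly with the standard sqlite3 module. This is a sketch on a tiny in-memory table: the passengers table and the helpers list_tables and list_columns are hypothetical names used only for illustration.

```python
import sqlite3

# In-memory database with a tiny hypothetical table, so nothing is written to disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE passengers (id INTEGER, name TEXT, survived INTEGER)")

def list_tables(con: sqlite3.Connection) -> list:
    """Table names, read from SQLite's catalog."""
    rows = con.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [r[0] for r in rows]

def list_columns(con: sqlite3.Connection, table: str) -> list:
    """(column name, declared type) pairs for one table."""
    if table not in list_tables(con):  # avoid interpolating unknown names into SQL
        raise ValueError(f"unknown table: {table}")
    rows = con.execute(f"PRAGMA table_info({table})")
    return [(r[1], r[2]) for r in rows]

tables = list_tables(con)
columns = list_columns(con, "passengers")
```

Either way, the point is the same: the Agent needs table and column names in its context before it can write a query that actually runs.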

Since this Agent must use more than one Tool, and any of them might fail, I will write a solid prompt, following the structure of the previous article.

prompt_junior = '''
[GOAL] You are a data engineer who builds efficient SQL queries to get data from the database.
[RETURN] You must return a final SQL query based on user's instructions.
[WARNINGS] Use your tools only once.
[CONTEXT] In order to generate the perfect SQL query, you need to know the name of the table and the schema.
First ALWAYS use the tool 'get_tables' to find the name of the table.
Then, you MUST use the tool 'get_schema' to get the columns in the table.
Finally, based on the information you got, generate an SQL query to answer user question.
'''
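Since every Agent in this example follows the same four-section layout, the prompt can also be assembled programmatically. The helper below is a hypothetical convenience of mine, assuming the sections are always [GOAL], [RETURN], [WARNINGS], and [CONTEXT] in that order.

```python
def build_prompt(goal: str, ret: str, warnings: str, context: str) -> str:
    """Assemble a prompt using the [GOAL]/[RETURN]/[WARNINGS]/[CONTEXT] sections."""
    sections = [("GOAL", goal), ("RETURN", ret), ("WARNINGS", warnings), ("CONTEXT", context)]
    return "\n".join(f"[{title}] {text.strip()}" for title, text in sections)

example_prompt = build_prompt(
    goal="You are a data engineer who builds SQL queries.",
    ret="You must return a final SQL query.",
    warnings="Use your tools only once.",
    context="First use 'get_tables', then use 'get_schema'.",
)
```

Keeping the sections in a fixed order makes it easier to compare the prompts of different Agents and to spot a missing instruction.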

Moving to the Senior Agent. Code checking doesn’t require any particular trick: you can just use the LLM.

def sql_check(sql: str) -> str:
    p = f'''Double check if the SQL query is correct: {sql}. You MUST return just SQL code without comments'''
    res = ollama.generate(model=llm, prompt=p)["response"]
    fence = '`'*3  ## markdown code fence the model may wrap around its answer
    return res.replace(fence+'sql','').replace(fence,'').replace('\n',' ').strip()

tool_sql_check = {'type':'function', 'function':{
  'name': 'sql_check',
  'description': 'Before executing a query, always review the SQL query and correct the code if necessary',
  'parameters': {'type': 'object',
                 'required': ['sql'],
                 'properties': {'sql': {'type':'string', 'description':'SQL code'}}
}}}

## test
sql_check(sql='SELECT * FROM titanic TOP 3')
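The cleaning step deserves a closer look, because models often wrap their answer in a Markdown code fence. Here is a self-contained sketch of that post-processing, tested on a hand-written string rather than real model output. The helper clean_sql is hypothetical, and it assumes the fence, if present, sits at the very start and end of the answer.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal code fence

def clean_sql(llm_output: str) -> str:
    """Strip an optional Markdown fence (with or without a 'sql' tag)
    and collapse whitespace, leaving a one-line statement."""
    text = llm_output.strip()
    if text.startswith(FENCE):
        text = text[len(FENCE):]
        if text.lower().startswith("sql"):
            text = text[3:]
    if text.endswith(FENCE):
        text = text[:-len(FENCE)]
    return re.sub(r"\s+", " ", text).strip()

raw = FENCE + "sql\nSELECT *\nFROM titanic\nLIMIT 3;\n" + FENCE
cleaned = clean_sql(raw)
```

Only the language tag right after the opening fence is removed, so a column or table whose name happens to contain "sql" is left untouched.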

Executing code on the database is a different story: LLMs can’t do that alone.

from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

def sql_exec(sql: str) -> str:
    return QuerySQLDataBaseTool(db=db).invoke(sql)

tool_sql_exec = {'type':'function', 'function':{
  'name': 'sql_exec',
  'description': 'Execute a SQL query',
  'parameters': {'type': 'object',
                 'required': ['sql'],
                 'properties': {'sql': {'type':'string', 'description':'SQL code'}}
}}}

## test
sql_exec(sql='SELECT * FROM titanic LIMIT 3')
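For reference, an execution Tool can also be sketched with nothing but the standard sqlite3 module. The example below uses a hypothetical in-memory passengers table, and run_query is my own illustrative helper. Returning the error message as text, instead of raising, is a design assumption that lets an Agent read what went wrong.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE passengers (name TEXT, survived INTEGER)")
con.executemany("INSERT INTO passengers VALUES (?, ?)", [("Ann", 1), ("Bob", 0)])

def run_query(con: sqlite3.Connection, sql: str) -> str:
    """Execute a query and return the rows as text. Errors are returned as
    text too, so an Agent could read the message and try again."""
    try:
        return str(con.execute(sql).fetchall())
    except sqlite3.Error as e:
        return f"Error: {e}"

ok = run_query(con, "SELECT name FROM passengers WHERE survived = 1")
bad = run_query(con, "SELECT * FROM passengers TOP 3")  # TOP is not valid SQLite syntax
```

The second call shows why the review step matters: TOP 3 is SQL Server syntax, so SQLite rejects it, whereas LIMIT 3 would work.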

And of course, a good prompt.

prompt_senior = '''
[GOAL] You are a senior data engineer who reviews and executes the SQL queries written by others.
[RETURN] You must return data from the database.
[WARNINGS] Use your tools only once.
[CONTEXT] ALWAYS check the SQL code before executing on the database.
First ALWAYS use the tool 'sql_check' to review the query. The output of this tool is the correct SQL query.
You MUST use ONLY the correct SQL query when you use the tool 'sql_exec'.
'''

Finally, we can create the Lead Agent. It has the most important job: invoking other Agents and telling them what to do. There are many ways to achieve that, but I find creating a simple Tool the most accurate one.

def invoke_agent(agent:str, instructions:str) -> str:
    return agent+" - "+instructions if agent in ['junior','senior'] else f"Agent '{agent}' Not Found"

tool_invoke_agent = {'type':'function', 'function':{
  'name': 'invoke_agent',
  'description': 'Invoke another Agent to work for you.',
  'parameters': {'type': 'object',
                 'required': ['agent', 'instructions'],
                 'properties': {
                    'agent': {'type':'string', 'description':'the Agent name, one of "junior" or "senior".'},
                    'instructions': {'type':'string', 'description':'detailed instructions for the Agent.'}
                 }
}}}

## test
invoke_agent(agent="intern", instructions="build a query")

Describe in the prompt what kind of behavior you’re expecting. Try to be as detailed as possible, for hierarchical Multi-Agent Systems can get very confusing.

prompt_lead = '''
[GOAL] You are a tech lead.
You have a team with one junior data engineer called 'junior', and one senior data engineer called 'senior'.
[RETURN] You must return data from the database based on user's requests.
[WARNINGS] You are the only one that talks to the user and gets the requests from the user.
The 'junior' data engineer only builds queries.
The 'senior' data engineer checks the queries and executes them.
[CONTEXT] First ALWAYS ask the users what they want.
Then, you MUST use the tool 'invoke_agent' to pass the instructions to the 'junior' for building the query.
Finally, you MUST use the tool 'invoke_agent' to pass the instructions to the 'senior' for retrieving the data from the database.
'''

I shall keep chat history separate so each Agent will know only a specific part of the whole process.

## the keys must match the Tool names exactly, otherwise use_tool reports NotFound
dic_tools = {'get_tables':get_tables,
             'get_schema':get_schema,
             'sql_exec':sql_exec,
             'sql_check':sql_check,
             'invoke_agent':invoke_agent}

messages_junior = [{"role":"system", "content":prompt_junior}]
messages_senior = [{"role":"system", "content":prompt_senior}]
messages_lead = [{"role":"system", "content":prompt_lead}]

Everything is ready to start the workflow. After the user starts the chat, the first to respond is the Lead, which is the only one that directly interacts with the human.

while True:
    ## user input
    q = input('🙂 >')
    if q == "quit":
        break
    messages_lead.append( {"role":"user", "content":q} )

    ## Lead Agent
    agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
    dic_res = use_tool(agent_res, dic_tools)
    res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
    ## split only on the first " - " so hyphens inside the instructions are kept
    parts = res.split(" - ", 1)
    agent_invoked = parts[0].strip() if len(parts) > 1 else ''
    instructions = parts[1].strip() if len(parts) > 1 else ''

    ###-->CODE TO INVOKE OTHER AGENTS HERE<--###

    ## Lead Agent final response
    print("👩‍💼 >", f"\x1b[1;30m{res}\x1b[0m")
    messages_lead.append( {"role":"assistant", "content":res} )
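The routing step that turns the Lead's Tool output into an Agent name plus instructions can be isolated in a small helper, which makes it easy to test without a model. parse_invocation is a hypothetical name, and the sketch assumes the "agent - instructions" format returned by invoke_agent.

```python
from typing import Tuple

def parse_invocation(res: str, known_agents=("junior", "senior")) -> Tuple[str, str]:
    """Split 'agent - instructions' on the first separator only, so hyphens
    inside the instructions are preserved. Returns ('', '') when the text
    does not name a known Agent."""
    agent, sep, instructions = res.partition(" - ")
    agent = agent.strip()
    if not sep or agent not in known_agents:
        return "", ""
    return agent, instructions.strip()

routed = parse_invocation("junior - build a query for first-class passengers")
unrouted = parse_invocation("What would you like to know?")
```

Checking the name against the list of known Agents also covers the case where the Lead simply answers the user in plain text that happens to contain a hyphen.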

The Lead Agent decided to invoke the Junior Agent giving it some instruction, based on the interaction with the user. Now the Junior Agent shall start working on the query.

    ## Invoke Junior Agent
    if agent_invoked == "junior":
        print("😎 >", f"\x1b[1;32mReceived instructions: {instructions}\x1b[0m")
        messages_junior.append( {"role":"user", "content":instructions} )

        ### use the tools
        available_tools = {"get_tables":tool_get_tables, "get_schema":tool_get_schema}
        context = ''
        while available_tools:
            agent_res = ollama.chat(model=llm, messages=messages_junior,
                                    tools=[v for v in available_tools.values()])
            dic_res = use_tool(agent_res, dic_tools)
            res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
            if tool_used:
                available_tools.pop(tool_used, None)  #->ignore names that are not on offer
            context = context + f"\nTool used: {tool_used}. Output: {res}" #->add tool usage context
            messages_junior.append( {"role":"user", "content":context} )

        ### response
        agent_res = ollama.chat(model=llm, messages=messages_junior)
        dic_res = use_tool(agent_res, dic_tools)
        res = dic_res["res"]
        print("😎 >", f"\x1b[1;32m{res}\x1b[0m")
        messages_junior.append( {"role":"assistant", "content":res} )
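The "use each Tool once" loop can be studied without a model by replacing the LLM with a stub. In the sketch below, run_tools_once and choose_tool are hypothetical names, and the max_steps guard is my own addition for the case where the model never picks a Tool, which would otherwise keep a while loop running forever.

```python
def run_tools_once(choose_tool, tools: dict, max_steps: int = 5) -> str:
    """Offer the remaining Tools to a chooser until each has been used once,
    accumulating their outputs. max_steps guards against a chooser that never
    picks a Tool. choose_tool is a hypothetical stand-in for the LLM call."""
    remaining = dict(tools)  # copy, so the caller's dictionary is not modified
    context = ""
    for _ in range(max_steps):
        if not remaining:
            break
        name = choose_tool(list(remaining))
        if name in remaining:
            output = remaining.pop(name)()
            context += f"\nTool used: {name}. Output: {output}"
    return context

# Stub Tools and a stub "model" that always picks the first Tool on offer.
stub_tools = {"get_tables": lambda: "titanic",
              "get_schema": lambda: "PassengerId, Survived"}
gathered = run_tools_once(lambda names: names[0], stub_tools)
stalled = run_tools_once(lambda names: "", stub_tools)
```

Removing a Tool from the menu after it has been used is a simple way to enforce the "use your tools only once" instruction in code rather than relying on the prompt alone.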

The Junior Agent activated all its Tools to explore the database and collected the necessary information to generate some SQL code. Now, it must report back to the Lead.

        ## update Lead Agent (still inside the Junior branch)
        context = "Junior already wrote this query: "+res+ "\nNow invoke the Senior to review and execute the code."
        print("👩‍💼 >", f"\x1b[1;30m{context}\x1b[0m")
        messages_lead.append( {"role":"user", "content":context} )

        agent_res = ollama.chat(model=llm, messages=messages_lead, tools=[tool_invoke_agent])
        dic_res = use_tool(agent_res, dic_tools)
        res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
        ## split only on the first " - " so hyphens inside the instructions are kept
        parts = res.split(" - ", 1)
        agent_invoked = parts[0].strip() if len(parts) > 1 else ''
        instructions = parts[1].strip() if len(parts) > 1 else ''

The Lead Agent received the output from the Junior and asked the Senior Agent to review and execute the SQL query.

    ## Invoke Senior Agent
    if agent_invoked == "senior":
        print("🧓 >", f"\x1b[1;34mReceived instructions: {instructions}\x1b[0m")
        messages_senior.append( {"role":"user", "content":instructions} )

        ### use the tools
        available_tools = {"sql_check":tool_sql_check, "sql_exec":tool_sql_exec}
        context = ''
        while available_tools:
            agent_res = ollama.chat(model=llm, messages=messages_senior,
                                    tools=[v for v in available_tools.values()])
            dic_res = use_tool(agent_res, dic_tools)
            res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]
            if tool_used:
                available_tools.pop(tool_used, None)  #->ignore names that are not on offer
            context = context + f"\nTool used: {tool_used}. Output: {res}" #->add tool usage context
            messages_senior.append( {"role":"user", "content":context} )

        ### response
        print("🧓 >", f"\x1b[1;34m{res}\x1b[0m")
        messages_senior.append( {"role":"assistant", "content":res} )

The Senior Agent executed the query on the db and got an answer. Finally, it can report back to the Lead which will give the final answer to the user.

        ### update Lead Agent (still inside the Senior branch)
        context = "Senior agent returned this output: "+res
        print("👩‍💼 >", f"\x1b[1;30m{context}\x1b[0m")
        messages_lead.append( {"role":"user", "content":context} )

Conclusion

This article has covered the basic steps of creating Multi-Agent Systems from scratch using only Ollama. With these building blocks in place, you are already equipped to start developing your own MAS for different use cases.

Stay tuned for Part 4, where we will dive deeper into more advanced examples.

Full code for this article: GitHub

I hope you enjoyed it! Feel free to contact me for questions and feedback or just to share your interesting projects.

👉 Let’s Connect 👈

All images, unless otherwise noted, are by the author