Build Your First AI Agent in 5 Easy Steps (100% local)
Creating AI agents with CrewAI and running them 100% locally with Ollama, in five easy steps!
AI agents and RAG are the hot topics of the 2024 AI community, and a general, hands-on understanding of agents and how they work is essential if you work in this field. In this article, we will build a simple crew of AI agents that reads a scientific paper (the world-famous “Attention Is All You Need”), writes a blog post about it, and picks a title for it. I will use Ollama to run the LLM locally.
AI Chatbot vs. AI Agent
AI chatbots, such as ChatGPT, are designed to communicate with humans through text or speech. Their primary function is to answer questions, provide insights, and use natural language processing (NLP) to understand and respond to human language.
AI chatbots speak and AI agents act.
On the other hand, an AI agent is a broader concept within artificial intelligence. Agents are not just limited to conversations; they have a clear goal to satisfy, and they use tools and reasoning to plan various methods to reach that goal.
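To make that distinction concrete, here is a toy sketch in plain Python, with no LLM involved: a chatbot maps one question to one answer and stops, while an agent loops, picking a tool at each step until its goal is reached. The greedy tool-picking below is only a stand-in for the reasoning an LLM would do.

```python
# Toy illustration of the chatbot-vs-agent distinction (no LLM involved).
def toy_agent(goal, state, tools):
    """Repeatedly pick the tool that moves the state closest to the goal."""
    steps = []
    while state != goal:
        # "Reasoning" stand-in: greedily choose the most promising tool.
        name, fn = min(tools.items(), key=lambda item: abs(goal - item[1](state)))
        state = fn(state)
        steps.append(name)
    return state, steps

tools = {"increment": lambda x: x + 1, "double": lambda x: x * 2}
print(toy_agent(8, 1, tools))  # -> (8, ['increment', 'double', 'double'])
```

The point is the loop: the agent keeps acting with its tools until the goal is satisfied, rather than producing a single reply.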
Making Crews of Agents with CrewAI
It’s hard to work with LLMs and not know Langchain. I have used it in a previous article to play Flappy Bird in ChatGPT with Prompt Engineering. But coding your first AI agents with Langchain can be frustrating because of its learning curve. CrewAI makes it simple!
Built on top of Langchain, CrewAI takes a modular approach to building agents, which makes it much more intuitive and faster to use. If you want to get started with agents, start with CrewAI.
Let’s Code
Inspired by Foad Kesheh’s Optimizing Everyday Tasks with CrewAI, we’re going to create agents that read a scientific paper and write a blog post plus a title for it. To do so, we must first…
1. Set Up the LLM Locally
For various reasons (cost, open-weight models, security, etc.), you might want to run your AI locally, on your own hardware. Ollama is the go-to tool for this.
To run an LLM locally with Ollama, follow the instructions on its website to download and install the software. Then open a terminal and run:
ollama serve
Now Ollama acts as a server, and you need an LLM to run on top of it. To do so, you have to pull a model from the internet. I use openhermes, and I will explain this choice later on (it matters).
ollama pull openhermes
Now the rest is code. First, import the necessary packages.
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI
from langchain.tools import tool
from PyPDF2 import PdfReader
import re
We also need to connect to the Ollama server running locally. Take note of the terminal where you have ollama serve running and use the URL it prints for the base_url here.
model = ChatOpenAI(
    model="openhermes",
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Ollama ignores the key, but the client requires one
)
2. Tools
Agents might need specific tools to do their job. Many tools ship with CrewAI, but you can also create your own custom tools. Here, we use a tool that reads a PDF and returns its contents.
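Under the hood, the @tool decorator exposes a plain function to the agent together with a description taken from its docstring, which is why the docstring in the tool below matters. A toy stand-in (not CrewAI’s or Langchain’s actual implementation) illustrates the idea:

```python
def toy_tool(fn):
    """Toy stand-in for @tool: attach the docstring as the tool's description."""
    fn.description = (fn.__doc__ or "").strip()
    return fn

@toy_tool
def word_count(text: str) -> int:
    """Counts the words in a piece of text."""
    return len(text.split())

print(word_count.description)  # -> Counts the words in a piece of text.
print(word_count("attention is all you need"))  # -> 5
```

This is why a clear docstring on a custom tool is not optional: it is what the agent reads when deciding whether and how to call the tool.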
# Tool for loading and reading a PDF locally
@tool
def fetch_pdf_content(pdf_path: str):
    """
    Reads a local PDF and returns the content
    """
    with open(pdf_path, 'rb') as f:
        pdf = PdfReader(f)
        text = '\n'.join(page.extract_text() for page in pdf.pages if page.extract_text())
    processed_text = re.sub(r'\s+', ' ', text).strip()
    return processed_text
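The last two lines of the tool do the text clean-up: re.sub(r'\s+', ' ', …) collapses every run of whitespace (spaces, newlines, tabs) into a single space, and .strip() trims the ends. In isolation:

```python
import re

raw = "  Attention   Is\nAll\tYou   Need  "
cleaned = re.sub(r'\s+', ' ', raw).strip()
print(cleaned)  # -> Attention Is All You Need
```

This matters because text extracted from PDFs tends to be full of stray line breaks that would otherwise confuse the downstream agents.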
3. Agents
Now, we must define three agents:
🤖 Reads the paper
🤖 Writes a blog post about it
🤖 Comes up with a title
Each agent needs a specific role and a goal; giving it a backstory also helps it play the role you want it to play.
pdf_reader = Agent(
    role='PDF Content Extractor',
    goal='Extract and preprocess text from a PDF located in current local directory',
    backstory='Specializes in handling and interpreting PDF documents',
    verbose=True,
    tools=[fetch_pdf_content],
    allow_delegation=False,
    llm=model
)

article_writer = Agent(
    role='Article Creator',
    goal='Write a concise and engaging article',
    backstory='Expert in creating informative and engaging articles',
    verbose=True,
    allow_delegation=False,
    llm=model
)

title_creator = Agent(
    role='Title Generator',
    goal='Generate a compelling title for the article',
    backstory='Skilled in crafting engaging and relevant titles',
    verbose=True,
    allow_delegation=False,
    llm=model
)
4. Tasks
Each task is something you want your agents to perform. In our case, we define three tasks and assign each of them to an agent. Each task needs:
📝 description: A concise description of what needs to be done
👤 agent: Which agent is responsible for it
🏁 expected_output: The output we would like to receive
def pdf_reading_task(pdf):
    return Task(
        description=f"Read and preprocess the PDF at this local path: {pdf}",
        agent=pdf_reader,
        expected_output="Extracted and preprocessed text from a PDF",
    )
task_article_drafting = Task(
    description="Create a concise article with 8-10 paragraphs based on the extracted PDF content.",
    agent=article_writer,
    expected_output="8-10 paragraphs describing the key points of the PDF",
)

task_title_generation = Task(
    description="Generate an engaging and relevant title for the article.",
    agent=title_creator,
    expected_output="A title of about 5-7 words",
)
5. Make the Crew and Kickoff! 🚀
As the name CrewAI suggests, the goal is to have multiple agents perform multiple tasks. For this, we just instantiate the Crew class with all the tasks and agents we defined earlier. Calling the kickoff() method on the crew object is when the magic 🪄 starts to happen.
pdf_local_relative_path = "attention_is_all_you_need.pdf"  # path to your local copy of the paper

crew = Crew(
    agents=[pdf_reader, article_writer, title_creator],
    tasks=[pdf_reading_task(pdf_local_relative_path),
           task_article_drafting,
           task_title_generation],
    verbose=2
)
# Let's start!
result = crew.kickoff()
After some time, the result of our agents’ hard work is a blog post and a title that you can see here (actually, the post is 2–3 times longer):
Transformers Unleashed: Harnessing Attention for Natural Language Processing
=====================================================
In "Attention Is All You Need" by Ashish Vaswani et al., a transformer
architecture that utilizes an attention mechanism is proposed as the primary
means for information interaction among its components. This attention mechanism
allows the model to focus on relevant input elements and capture long-range
dependencies in the input sequence, resulting in improved performance over
traditional recurrent neural networks (RNNs)....
Believe it or not, building AI agents that perform small, specific tasks really is this easy. Fortunately, much of today’s AI tooling is highly modular and comes with many layers of abstraction, so you only need a few lines of code; the sophisticated libraries take care of the rest.
What Model to Use for AI Agents?
Some models are fine-tuned for agentic purposes, meaning they are better at reasoning and coming up with strategies for performing tasks. You want models that don’t get confused by their tools, don’t forget which step comes next, and don’t fall into infinite loops of making the same mistakes over and over again.
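One cheap defence against the infinite-loop failure mode is to cap the number of reasoning steps; CrewAI exposes a similar cap via the Agent’s max_iter parameter. Stripped of any LLM, the bare pattern looks like this (the function names here are my own):

```python
def run_with_cap(step, max_iter=5):
    """Run an iterative step function, giving up after max_iter attempts."""
    for attempt in range(max_iter):
        result = step(attempt)
        if result is not None:  # the step succeeded
            return result
    raise RuntimeError(f"gave up after {max_iter} iterations")

# A flaky step that only succeeds on its third attempt:
flaky = lambda attempt: "done" if attempt == 2 else None
print(run_with_cap(flaky))  # -> done
```

A hard cap like this turns a silent infinite loop into a visible failure you can debug, which is exactly what you want when a model keeps mangling its tool calls.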
I have tested multiple models for this use case, and here is my experience with each one:
gpt-3.5-turbo: the model behind the free version of OpenAI’s ChatGPT pulled off the task like a piece of cake. It was fast and made no mistakes along the way.
llama-3-8b and llama-3-70b-instruct: Meta’s Llama 3 was my first choice, and I couldn’t have been more disappointed. The lightweight 8B version fell into an infinite loop where it didn’t know what argument to pass to the fetch_pdf_content tool. I think the problem was that when it tried to pass the PDF’s file name to the function, it dropped the closing " of the name, which threw an exception. Surprisingly, the fine-tuned ~45GB version of Llama 3 with 70B parameters made no difference and fell into the very same trap.
openhermes: after much frustration with running this locally, I came across this fine-tuned 7B version of Mistral, which is very lightweight and handles agentic tasks with ease.
All the code for this project is published in my Lightning AI ⚡ Studio; no special setup is needed. The best thing about Lightning AI ⚡ is that your libraries and environments are ready to use in a matter of seconds, and you also get free compute for running these models. It’s as easy as using VSCode.
🌟 Join +1000 people learning about Python🐍, ML/MLOps/AI🤖, Data Science📈, and LLM 🗯
Also, you are welcome to follow me on Medium and check out my X/Twitter, where I keep you updated daily. 😉
Thanks for reading,
— Hesam