Building Your Own Agent from Scratch
In this section, we will build a customized Agent from scratch, step by step. Following our previous exploration of Agent principles, we will adopt the ReAct paradigm (see previous articles for background) and gradually implement a complete workflow for a lightweight LLM Agent.
Step 1: Building the Tool Library
In the ReAct paradigm, Agents rely on external tools to execute tasks. The following example is from the implementation in tools.py, which includes a Google search tool function.
{
    'name_for_human': 'Google Search',
    'name_for_model': 'google_search',
    'description_for_model': 'Google Search is a general-purpose search engine that can be used to access the internet, query encyclopedic knowledge, stay updated on current events, and more.',
    'parameters': [
        {
            'name': 'search_query',
            'description': 'Search keywords or phrases',
            'required': True,
            'schema': {'type': 'string'},
        }
    ],
}
Tool definitions need to specify four core elements:
- Human-readable name (name_for_human)
- Model call identifier (name_for_model)
- Function description (description_for_model)
- Parameter specification (parameters)
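These four elements exist so they can be stitched into the system prompt that the Agent sends to the LLM (as seen in Step 2). A minimal sketch of that rendering, assuming a template in the style of the prompt excerpt later in this article (the exact wording in the repository's agent.py may differ):

```python
import json

# Template describing one tool inside the system prompt (an assumption,
# modeled on the prompt excerpt shown in Step 2).
TOOL_DESC = (
    "{name_for_model}: Call this tool to interact with the {name_for_human} API. "
    "What is the {name_for_human} API useful for? {description_for_model} "
    "Parameters: {parameters} Format the arguments as a JSON object."
)

def render_tool(tool: dict) -> str:
    """Fill the template with a tool definition's four core elements."""
    return TOOL_DESC.format(
        name_for_model=tool["name_for_model"],
        name_for_human=tool["name_for_human"],
        description_for_model=tool["description_for_model"],
        parameters=json.dumps(tool["parameters"]),
    )

google_tool = {
    "name_for_human": "Google Search",
    "name_for_model": "google_search",
    "description_for_model": "Google Search is a general-purpose search engine...",  # abbreviated
    "parameters": [
        {"name": "search_query", "description": "Search keywords or phrases",
         "required": True, "schema": {"type": "string"}},
    ],
}

print(render_tool(google_tool))
```

Rendering each registered tool this way is what lets the LLM know which tools exist and how to format their arguments.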
The implementation of the Google search function is as follows, based on the Serper API (new registrations get 2,500 free calls):
import json
import requests

def google_search(search_query: str):
    url = "https://google.serper.dev/search"
    payload = json.dumps({"q": search_query})
    headers = {
        'X-API-KEY': 'xxxxxx',  # replace with your Serper API key
        'Content-Type': 'application/json'
    }
    response = requests.request("POST", url, headers=headers, data=payload).json()
    # Return only the snippet of the top organic result
    return response['organic'][0]['snippet']
With this, one tool is complete. Next, to align with the theme of code intelligence, let's add another tool for code compilation checking.
First, the code-check tool definition is as follows:
{
    'name_for_human': 'Code Check',
    'name_for_model': 'code_check',
    'description_for_model': 'Code Check is a code checking tool that can be used to check code errors and issues.',
    'parameters': [
        {
            'name': 'language',
            'description': 'Full language type name',
            'required': True,
            'schema': {'type': 'string'},
        },
        {
            'name': 'source_code',
            'description': 'Source code',
            'required': True,
            'schema': {'type': 'string'},
        }
    ]
}
The code checking function involves two parameters: language type and source code. The implementation of the check_code function can be found in tree_sitter_parser.py in the repository.
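The repository's check_code is built on tree-sitter; as a rough stand-in that shows the expected interface without that dependency, here is a Python-only sketch using the built-in compile(). This is our own simplification, not the project's implementation:

```python
def check_code(language: str, source_code: str) -> str:
    """Stand-in for the tree-sitter-based check_code in tree_sitter_parser.py.

    The real implementation parses any supported language with tree-sitter and
    reports the span of ERROR nodes; this sketch only handles Python, using the
    built-in compile() to surface syntax errors.
    """
    if language.lower() != "python":
        return f"language {language!r} not supported in this sketch"
    try:
        compile(source_code, "<source>", "exec")
        return "no compile errors found"
    except SyntaxError as e:
        return f"code compile error at line {e.lineno}, col {e.offset}: {e.msg}"

# The buggy example from Step 3 below triggers the error branch:
print(check_code("Python", "def f():::::\n    pass"))
```

The key design point is the return type: the tool hands back a plain string, because whatever it returns is pasted into the prompt as the Observation for the LLM to read.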
Now that both tools are ready, let's start constructing the Agent workflow.
Step 2: Building the ReAct Workflow
The ReAct paradigm completes tasks by iterating through four stages: Thought, Action, Observation, and Final Answer.
- Thought: Agent analyzes context and task objectives
- Action: Agent makes decisions and calls tools
- Observation: Agent parses tool return results
- Final Answer: Agent outputs final conclusions
Based on the above workflow, the corresponding ReAct prompt in the project is as follows, found in agent.py:
Answer the following questions as best you can. You have access to the following tools:
google_search: Call this tool to interact with the Google Search API. What is the Google Search API useful for? Google Search is a general-purpose search engine that can be used to access the internet, query encyclopedic knowledge, stay updated on current events, and more. Parameters: [{'name': 'search_query', 'description': 'Search keywords or phrases', 'required': True, 'schema': {'type': 'string'}}] Format the arguments as a JSON object.
code_check: Call this tool to interact with the Code Check API. What is the Code Check API useful for? Code Check is a code checking tool that can be used to check code errors and issues. Parameters: [{'name': 'language', 'description': 'Full language type name', 'required': True, 'schema': {'type': 'string'}}, {'name': 'source_code', 'description': 'Source code', 'required': True, 'schema': {'type': 'string'}}] Format the arguments as a JSON object.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can be repeated zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
The prompt is divided into two parts. First, it describes to the LLM the tools it can call, including Google Search and Code Check.
Then, the prompt describes the ReAct workflow, including Thought, Action, Observation, and Final Answer.
The Agent workflow is established in two stages:
Stage 1 (Decision): the model parses the user's question and generates a tool-call instruction.
The first stage informs the LLM of the overall workflow and the user's question, allowing the LLM to think and decide which tool to use.
This logic is found in the first three lines of the text_completion function in agent.py.
def text_completion(self, text, history=[]):
    text = "\nQuestion:" + text
    response, his = self.model.chat(text, history, self.system_prompt)
    print("first response:\n")
    print(response)
    print("-" * 100)
Stage 2 (Execution): parse the model's output, call the requested tool, and summarize the result. The program reads the method the LLM requested and executes it:
# Parse the method requested by the LLM
def parse_latest_plugin_call(self, text):
    plugin_name, plugin_args = '', ''
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:  # If the text has `Action` and `Action Input`,
        if k < j:  # but does not contain `Observation`,
            text = text.rstrip() + '\nObservation:'  # add it back.
            k = text.rfind('\nObservation:')
        plugin_name = text[i + len('\nAction:'):j].strip()
        plugin_args = text[j + len('\nAction Input:'):k].strip()
        text = text[:k]
    return plugin_name, plugin_args, text
# Execute the method requested by the LLM
def call_plugin(self, plugin_name, plugin_args):
    plugin_args = json5.loads(plugin_args)
    if plugin_name == 'google_search':
        return '\nObservation:' + self.tool.google_search(**plugin_args)
    elif plugin_name == 'code_check':
        return '\nObservation:' + self.tool.code_check(**plugin_args)
As parse_latest_plugin_call shows, the current approach parses LLM output by exact string matching. If the model's output deviates even slightly from the expected format, parsing fails silently.
Only two tools are supported for now. Registering many more tools can also degrade the model's output quality, so a larger number of callable functions is not necessarily better.
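To see that fragility concretely, here is the same parsing logic reproduced as a free function (for illustration only), applied to a well-formed output and to one where the model slightly mislabels a field:

```python
def parse_latest_plugin_call(text):
    """Standalone copy of the method above, for demonstration."""
    plugin_name, plugin_args = '', ''
    i = text.rfind('\nAction:')
    j = text.rfind('\nAction Input:')
    k = text.rfind('\nObservation:')
    if 0 <= i < j:
        if k < j:
            text = text.rstrip() + '\nObservation:'
            k = text.rfind('\nObservation:')
        plugin_name = text[i + len('\nAction:'):j].strip()
        plugin_args = text[j + len('\nAction Input:'):k].strip()
        text = text[:k]
    return plugin_name, plugin_args, text

good = ('Thought: I should search for this.\n'
        'Action: google_search\n'
        'Action Input: {"search_query": "tree-sitter"}')
name, args, _ = parse_latest_plugin_call(good)
# name == 'google_search', args holds the JSON arguments

# A lowercase "input" breaks the exact string match:
bad = good.replace('Action Input:', 'Action input:')
name2, _, _ = parse_latest_plugin_call(bad)
# name2 == '' -- parsing silently fails, no tool is called
```

A more robust parser might use a regular expression with case-insensitive matching, or ask the model for structured output, but exact matching keeps the tutorial simple.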
After obtaining results through the call_plugin function, we continue to the second stage of ReAct: summarizing execution results.
function_call_result = self.call_plugin(plugin_name, plugin_args)
response += function_call_result
response, his = self.model.chat(response, his, self.system_prompt)
The Observation content is appended by our code, not generated by the model. During the second-stage summary, the LLM therefore sees the previous Thought, Action, and Observation, and summarizes on that basis.
The above is the Agent workflow based on the ReAct paradigm.
Step 3: Let's Run a Practical Test
Finally, let's look at a simple test case to see how the Agent performs.
The following test case demonstrates the Agent's ability to fix Python syntax errors:
def hello_world():
    print("Hello, World!")

def hello_world2():::::
    print("Hello, World2!")
Please fix the errors in this code.
Here I used a simple case. In the hello_world2 function, I intentionally wrote multiple colons incorrectly, expecting the Agent to fix this error.
Stage 1 Output:
Thought: There is a syntax error in the code, so I need to call the code checking tool to fix it.
Action: code_check
Action Input: {"language": "Python", "source_code": "def hello_world():\n print("Hello, World!")\n\n def hello_world2():::::\n print("Hello, World2!")\n\n "}
After thinking, the LLM decided to call the code checking tool to fix the error. It provided the code_check method call with correct parameters.
Tool Execution Result:
Observation:code compile error at (3, 23) to (3, 27), the error text is: "b'::::'"
Through TreeSitter, we can obtain the compile exception location and error code, so we return this information to the LLM.
Agent Final Output:
Thought: The observed error message indicates there are extra colons when defining the hello_world2 function. Now that I know the cause of the error, I can directly provide the fixed code.
Final Answer: The fixed code is as follows:

def hello_world():
    print("Hello, World!")

def hello_world2():
    print("Hello, World2!")

In the original code, def hello_world2()::::: had extra colons. Fixing it to def hello_world2(): resolves the issue.
With the help of tools, the Agent successfully fixed the errors in the code.
Of course, this example is trivial; even without tools, the LLM could likely output the fixed code on its own. Still, it walks through the full Agent execution workflow end to end.
Summary and Outlook
Today, through a simple example, we learned about the Agent execution workflow. We can see that the Agent workflow can be summarized as:
- Problem Parsing: Model analyzes the task and selects tools
- Tool Calling: Executes external tools to get results
- Result Integration: Generates final output based on feedback
The complete code has been open-sourced at TinyCodeBase. Welcome to star the project.
We can see that the currently implemented Agent still has many issues, such as:
- The ReAct workflow can only execute one round, unable to perform multi-turn interactions with tools.
- The code checking tool has limited capabilities, only able to check basic static syntax errors, and cannot detect runtime errors.
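The first limitation could be addressed by wrapping the decision and execution stages in a loop that keeps calling tools until the model emits a Final Answer. A minimal sketch of such a multi-turn driver, using our own helper names and a scripted stub in place of a real chat model (the project's actual chat interface may differ):

```python
import json

def parse_action(response: str):
    """Pull the last Action / Action Input pair out of a model response."""
    i = response.rfind('\nAction:')
    j = response.rfind('\nAction Input:')
    name = response[i + len('\nAction:'):j].strip()
    args = json.loads(response[j + len('\nAction Input:'):].strip())
    return name, args

def run_react(chat, tools, question, max_turns=5):
    """Loop Thought -> Action -> Observation until a Final Answer appears."""
    text = '\nQuestion: ' + question
    history = []
    for _ in range(max_turns):
        response, history = chat(text, history)
        if 'Final Answer:' in response:
            return response.split('Final Answer:', 1)[1].strip()
        name, args = parse_action(response)
        observation = tools[name](**args)
        # Feed the Observation back so the model can think again next turn.
        text = response + '\nObservation: ' + observation
    return 'No final answer within the turn budget.'

# A scripted stub standing in for the LLM, to exercise the loop:
script = iter([
    'Thought: search first.\nAction: google_search\nAction Input: {"search_query": "ReAct"}',
    'Thought: I now know the final answer.\nFinal Answer: ReAct interleaves reasoning and acting.',
])
def fake_chat(text, history):
    return next(script), history + [text]

answer = run_react(fake_chat,
                   {'google_search': lambda search_query: 'ReAct is a paradigm.'},
                   'What is ReAct?')
```

The max_turns cap matters: without it, a model that never produces a Final Answer would loop (and call tools) forever.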
Later, we will further optimize the TinyCodeBase project to make it more capable.
Before continuing to enhance capabilities, we are missing an important step: LLM capability evaluation.
Therefore, we will learn about LLM evaluation methods based on the TinyEval project and apply them to the TinyCodeBase project. Stay tuned.