Building & Developing Agents

Tool Use in AI Agents: How Agents Interact with the Real World

Tools are how AI agents escape the text box and act in the world. The LLM reads a tool schema, outputs a structured function call, the runtime executes it, and the result feeds back as an observation. The quality of the schema — not the tool itself — determines whether the agent uses it correctly.

By Nora LinJune 1, 20257 min read

An AI agent without tools is just a chatbot that plans. **Tool use in ai agents** is the mechanism that transforms text-generating models into systems that search the web, run code, call APIs, read files, and take actions in external software. Getting tool design right is what separates agents that reliably do useful work from agents that hallucinate their way through tasks.

Quick answer

Tools in AI agents are callable functions with a name, description, and parameter schema. The LLM reads the schema, decides which tool to call and with what arguments, outputs a structured JSON tool call, and the runtime executes it. The result returns as an observation. The description quality — not the function logic — determines 80% of tool-use accuracy.

What exactly is a tool in an AI agent context?

A tool is a callable function that the agent can invoke during its reasoning loop. The function can do anything: query a database, execute Python code, fetch a web page, send an email, or control a browser. What makes it a 'tool' in the agent sense is that it has a formal schema — a JSON description of its name, purpose, and parameters — that the LLM can read to understand when and how to use it.

OpenAI introduced function calling in June 2023, and it has become the universal mechanism for tool use across all major model providers. OpenAI's function calling documentation shows that models now achieve over 90% correct tool selection accuracy on benchmarks when schemas are well-written — dropping to under 60% with poorly written descriptions. The schema is not metadata; it is the primary interface.

How does an LLM decide which tool to call?

Tool selection happens in the LLM's reasoning pass. The model receives a list of available tool schemas alongside the current messages, reasons about what action would best advance the goal, and outputs either a plain text response or a structured tool call — a JSON object with the tool name and arguments.

The decision is driven entirely by the tool descriptions. The model performs a semantic match between 'what I need to do next' and 'what each tool says it does.' This is why:

Vague tool names like `helper()` cause random selection behavior.
Overlapping descriptions (two tools that sound similar) cause the model to pick unpredictably between them.
Missing 'when NOT to use this' guidance causes the model to invoke tools in wrong situations.
Well-scoped, specific descriptions produce near-deterministic tool selection for clear use cases.

The tool call cycle: LLM reads schemas → selects tool → outputs JSON call → runtime executes → result returned as observation.

What are the main categories of agent tools?

Virtually any API or function can be a tool. In practice, tools cluster into six categories:

Information retrieval

Web search — Tavily, Serper, Bing Search API. Returns current information the model wasn't trained on.
Vector store lookup — semantic search over a private knowledge base.
Database query — SQL queries against structured data stores.
Document retrieval — read a specific file, PDF, or code repository.

Code and computation

Code interpreter — runs Python (or other languages) and returns stdout/stderr. OpenAI's sandbox and E2B are popular implementations.
Calculator — isolated math evaluation to prevent hallucinated arithmetic.
Data analysis — pandas operations on uploaded CSVs.

External APIs and services

Calendar and email — Google Calendar, Gmail, Outlook APIs.
CRM and ticketing — Salesforce, HubSpot, Jira, Linear.
Payment and e-commerce — Stripe, Shopify APIs.
Communication — Slack, Teams, Twilio.

Browser and UI control

Playwright/Puppeteer — navigate, click, fill forms, extract DOM content.
Screenshot and visual analysis — capture and analyze browser state.
Computer use — full desktop control (Anthropic's computer use API).

How should you handle tool errors and retries?

Tool errors are inevitable in any production agent. APIs time out, search results are empty, code throws runtime exceptions. The worst thing you can do is swallow the error — the agent will hallucinate a result for the step that failed and proceed on false premises.

The correct pattern is to return the error as an observation:

The tool call fails with an exception or a non-200 HTTP status.
The tool wrapper catches the error and returns a structured error string: `ERROR: HTTP 429 rate limited. Retry after 60 seconds.`
This error string is appended to messages as the tool result observation.
The LLM reads the observation and decides: wait and retry, switch to an alternative tool, or escalate to the user.

For transient errors (rate limits, timeouts), add exponential backoff at the tool wrapper level. For permanent errors (resource not found, permission denied), let the LLM decide the recovery strategy — it has the context to know whether this is recoverable.

What are the security risks of tool use in AI agents?

Tool use is where AI agents go from 'potentially incorrect' to 'potentially harmful.' A misconfigured agent with write access to a production database or the ability to send emails on your behalf can cause real damage. Key security considerations:

Principle of least privilege — give tools only the permissions they need. A research agent doesn't need write access to anything.
Prompt injection via tool results — malicious content in a web page or API response can instruct the agent to take unintended actions. Sanitize tool outputs before feeding them back to the model.
Irreversible actions — file deletion, API calls that charge money, or emails to external parties should require explicit human confirmation before execution.
Tool call validation — validate that tool arguments match the schema before executing. An agent that constructs a SQL query should have that query reviewed by a parameter-binding layer, not executed as raw string interpolation.

For a full treatment of the security surface introduced by agentic AI, see AI Agent Security Risks. For the foundational context of what agents are before adding tools, see What Is an AI Agent.

What is the difference between single-tool and multi-tool agents?

A single-tool agent has one function it can call. This is fine for narrow, well-defined tasks — a code review agent that only needs a code interpreter, or a summarization agent that only needs a document retriever.

A multi-tool agent can select from several tools and chain them together in novel sequences the developer never explicitly programmed. This is more powerful but also more unpredictable. The LLM invents sequences like 'search → extract entity → query database with entity → format result' — which is impressive when it works and cryptic to debug when it doesn't.

For multi-tool agents, tool naming conventions become critical. Prefix related tools consistently (`web_search`, `web_scrape` rather than `search` and `get_page`) and add explicit guidance in the system prompt about which tool categories to prefer for which task types.

Frequently asked questions

What is function calling in AI agents?

Function calling is the mechanism that lets an LLM output a structured request to execute a specific function with specific arguments, rather than generating free text. The model produces a JSON object like `{"name": "web_search", "arguments": {"query": "LangGraph tutorial"}}`. The runtime intercepts this output, executes the actual function, and returns the result to the model as the next observation.

How do I write a good tool description for an AI agent?

Write it like documentation for a junior engineer: state what the tool does, what it returns, when to use it, and when NOT to use it. Include example inputs. Avoid jargon the model might misinterpret. A description like 'Searches the web for current information not in training data. Returns top 3 result snippets. Use when you need live information. Do not use for mathematical calculations.' is far more effective than 'Search tool.'

Can AI agents use tools in parallel?

Yes, with a parallel tool execution pattern. Instead of calling tools sequentially (one at a time), the agent outputs multiple tool calls simultaneously, the runtime executes them in parallel, and all results are injected as observations before the next reasoning step. LangGraph supports this via fan-out edges. Parallel tool use dramatically reduces latency for tasks that require independent information gathering.

What happens when a tool returns an empty result?

An empty result should be returned as an explicit observation: 'Search returned 0 results for query X.' The LLM can then reason about why — maybe the query was too specific — and try a reformulated query, switch to a different tool, or ask the user for clarification. Never let an empty result silently stop the agent without informing the model.

How many tools should a single AI agent have?

As few as possible while still being able to complete the target task. Tool sets with 10+ tools increase the probability of wrong tool selection and slow down the reasoning step. A focused agent with 3-5 high-quality tools consistently outperforms a general agent with 20 mediocre ones. If you need broad capability, consider a multi-agent architecture where specialist sub-agents each have a small, focused tool set.

tools function calling security error handling building agents

Written by

Nora Lin

Senior AI Research Analyst & Technical Reviewer

Nora researches AI agent capabilities, safety, and practical deployment patterns. She reviews every guide on agent2agent to ensure technical accuracy and current best practices.

This article is for educational purposes only. It does not constitute professional software, legal, or financial advice. Read our full disclaimer.