Tools

In ORS, tools are the actions that agents can take. Every interaction with an environment happens through tool calls - this is the fundamental principle of ORS.

Core Principle: Actions are Tools

The only way agents interact with environments is by calling tools.
This design choice:
  • Leverages existing function calling support from LLM providers
  • Provides a clear, structured interface
  • Makes agent actions explicit and traceable
  • Enables type-safe interactions with JSON Schema

What is a Tool?

A tool is a function that:
  1. Has a name and description
  2. Optionally defines input parameters (via JSON Schema)
  3. Returns a ToolOutput with content, reward, and finished flag
Example tools:
  • submit - Submit an answer to a problem
  • bash - Execute a bash command
  • read_file - Read a file's contents
  • web_search - Search the web
  • python - Execute Python code

Tool Specification

Tools are advertised via the GET /{env_name}/tools endpoint:
{
  "tools": [
    {
      "name": "submit",
      "description": "Submit your answer to the current math problem",
      "input_schema": {
        "type": "object",
        "properties": {
          "answer": {
            "type": "number",
            "description": "Your numeric answer"
          }
        },
        "required": ["answer"]
      }
    }
  ]
}
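As an illustration, a client can parse this response and index the advertised tools by name before the agent selects an action. The dict literal below mirrors the JSON payload shown above:

```python
import json

# JSON payload as returned by GET /{env_name}/tools (copied from above)
spec_json = '''
{
  "tools": [
    {
      "name": "submit",
      "description": "Submit your answer to the current math problem",
      "input_schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "number", "description": "Your numeric answer"}
        },
        "required": ["answer"]
      }
    }
  ]
}
'''

spec = json.loads(spec_json)
# Index tools by name for quick lookup when the agent picks an action
tools_by_name = {t["name"]: t for t in spec["tools"]}
print(tools_by_name["submit"]["input_schema"]["required"])  # ['answer']
```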

Tool Spec Fields

name (string, required):
  • Tool identifier used in tool calls
  • Should be descriptive (e.g., bash, not b)
  • Convention: lowercase with underscores

description (string, required):
  • Human-readable explanation of what the tool does
  • Used by LLMs to decide when to call the tool
  • Should be clear and specific

input_schema (object, optional):
  • JSON Schema defining tool parameters
  • If omitted, the tool takes no parameters
  • Enables validation and type checking

JSON Schema for Parameters

The input_schema follows JSON Schema specification:
{
  "type": "object",
  "properties": {
    "command": {
      "type": "string",
      "description": "The bash command to execute",
      "examples": ["ls -la", "cat file.txt"]
    },
    "timeout": {
      "type": "number",
      "description": "Command timeout in seconds",
      "default": 30
    }
  },
  "required": ["command"]
}
Supported types:
  • string, number, boolean
  • object (nested parameters)
  • array (lists of values)
  • null
Schema features:
  • required - Mandatory fields
  • default - Default values
  • enum - Allowed values
  • description - Field documentation
  • examples - Example values
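As an illustration, a minimal validator for the schema subset above (required fields and primitive type checks only; a real environment would likely use a full JSON Schema library) might look like:

```python
# Minimal, illustrative validator: handles "required" and primitive "type"
# checks only, not the full JSON Schema specification.
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool,
            "object": dict, "array": list, "null": type(None)}

def validate_input(schema: dict, params: dict) -> list:
    """Return a list of error messages (an empty list means valid input)."""
    errors = []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in params:
            errors.append(f"missing required field: {field}")
    for field, value in params.items():
        spec = props.get(field)
        if spec is None:
            errors.append(f"unexpected field: {field}")
        elif not isinstance(value, TYPE_MAP[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}")
    return errors

bash_schema = {
    "type": "object",
    "properties": {"command": {"type": "string"}, "timeout": {"type": "number"}},
    "required": ["command"],
}
print(validate_input(bash_schema, {"command": "ls -la"}))  # []
print(validate_input(bash_schema, {"timeout": "soon"}))    # two errors
```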

Tool Output

Every tool call returns a ToolOutput:
interface ToolOutput {
  blocks: Blocks  // Content (text/images)
  reward?: number  // RL feedback signal
  finished: boolean  // Episode termination
  metadata?: JSONObject  // Optional extra data
}
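The interface above is written in TypeScript; a Python equivalent can be sketched with dataclasses. Modeling Blocks as a plain list of block objects is an assumption here:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextBlock:
    text: str
    type: str = "text"
    detail: Optional[str] = None

@dataclass
class ToolOutput:
    blocks: list                      # content observed by the agent (text/image blocks)
    finished: bool                    # True terminates the episode
    reward: Optional[float] = None    # RL feedback signal, may be absent
    metadata: Optional[dict] = None   # extra data for logging/analysis

out = ToolOutput(blocks=[TextBlock(text="Correct! The answer is 4.")],
                 reward=1.0, finished=True)
print(out.finished, out.reward)  # True 1.0
```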

Example Outputs

Successful completion:
{
  "blocks": [
    {"text": "Correct! The answer is 4.", "detail": null, "type": "text"}
  ],
  "metadata": null,
  "reward": 1.0,
  "finished": true
}
Intermediate step:
{
  "blocks": [
    {"text": "total 48\ndrwxr-xr-x  8 user  staff  256 Jan  1 12:00 .", "detail": null, "type": "text"}
  ],
  "metadata": null,
  "reward": 0.0,
  "finished": false
}
Error:
{
  "blocks": [
    {"text": "Error: File not found", "detail": null, "type": "text"}
  ],
  "metadata": null,
  "reward": -0.1,
  "finished": true
}

Key Fields

blocks: The content returned by the tool
  • Always an array (even for a single text output)
  • Can be text, images, or both
  • This is what the agent observes as the next state

reward: Feedback for RL training
  • Optional (can be null)
  • Typically 0.0 for intermediate steps, 1.0 for success, 0.0 or -1.0 for failure
  • See Rewards for design patterns

finished: Episode termination signal
  • Required boolean field
  • When true, the episode is complete
  • The agent should stop calling tools and clean up the session

metadata: Optional structured data
  • Not shown to the agent (in a typical RL setup)
  • Used for logging, debugging, and analysis
  • Can include execution time, resource usage, etc.

Tool Design Patterns

Pattern 1: Parameterless Tools

Tools that don’t need input:
{
  "name": "get_hint",
  "description": "Get a hint for the current problem",
  "input_schema": null
}
Called as:
{"name": "get_hint", "input": {}}

Pattern 2: Simple Parameter Tools

Tools with basic parameters:
{
  "name": "submit",
  "description": "Submit an answer",
  "input_schema": {
    "type": "object",
    "properties": {
      "answer": {"type": "number"}
    },
    "required": ["answer"]
  }
}

Pattern 3: Complex Parameter Tools

Tools with rich parameters:
{
  "name": "web_search",
  "description": "Search the web",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "max_results": {
        "type": "number",
        "description": "Maximum results",
        "default": 10,
        "minimum": 1,
        "maximum": 100
      },
      "filters": {
        "type": "object",
        "properties": {
          "date_range": {"type": "string"},
          "domain": {"type": "string"}
        }
      }
    },
    "required": ["query"]
  }
}

Pattern 4: Enum Parameters

Tools with constrained choices:
{
  "name": "change_difficulty",
  "description": "Change problem difficulty",
  "input_schema": {
    "type": "object",
    "properties": {
      "difficulty": {
        "type": "string",
        "enum": ["easy", "medium", "hard"],
        "description": "Target difficulty level"
      }
    },
    "required": ["difficulty"]
  }
}

Tool Categories

Observation Tools

Tools that let agents observe environment state:
  • bash - Execute read-only commands
  • read_file - Read file contents
  • list_files - List directory contents
  • get_status - Check task status
Characteristics:
  • Usually finished: false
  • Reward typically 0.0
  • Return information in blocks

Action Tools

Tools that modify environment state:
  • bash - Execute commands with side effects
  • write_file - Modify files
  • submit - Submit a solution
  • reset - Reset the environment
Characteristics:
  • May set finished: true (if the task completes)
  • May have a non-zero reward
  • Represent meaningful progress

Termination Tools

Tools that end the episode:
  • submit - Submit the final answer
  • give_up - Abandon the task
  • end_episode - Explicitly end the episode
Characteristics:
  • Always finished: true
  • Carry the final reward (success/failure)
  • No tool calls are allowed afterward

Multi-Modal Tools

Tools can return images and text:
{
  "blocks": [
    {"text": "Here's a visualization of the solution:", "detail": null, "type": "text"},
    {
      "data": "iVBORw0KGgoAAAANSUhEUgA...",
      "mimeType": "image/png",
      "detail": null,
      "type": "image"
    }
  ],
  "metadata": null,
  "reward": 0.0,
  "finished": false
}
Use cases:
  • Plotting graphs from code execution
  • Screenshots from web navigation
  • Diagrams explaining solutions
  • Visual feedback for agents
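As a sketch, an image block in the shape shown above can be built from raw PNG bytes with the standard library's base64 module. The helper name and placeholder bytes are illustrative; in practice the bytes would come from a plotting library or a screenshot:

```python
import base64

def image_block(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes as an image block in the shape shown above."""
    return {
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
        "detail": None,
        "type": "image",
    }

# The PNG magic bytes plus padding serve purely as placeholder data here.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
block = image_block(fake_png)
print(block["type"], block["mimeType"])  # image image/png
```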

Tool Calling Flow

┌─────────────┐
│   Agent     │
└──────┬──────┘
       │ 1. Observes state (prompt + previous tool outputs)
       │ 2. Selects tool and parameters
       ▼
┌─────────────────────────────────────┐
│  POST /{env_name}/call              │
│  {"name": "bash",                   │
│   "input": {"command": "ls"}}       │
└──────────┬──────────────────────────┘
           ▼
┌─────────────────────────────────────┐
│  Environment executes tool          │
│  - Validates input                  │
│  - Runs tool logic                  │
│  - Generates ToolOutput             │
└──────────┬──────────────────────────┘
           ▼
┌─────────────────────────────────────┐
│  Returns ToolOutput                 │
│  {blocks, reward, finished}         │
└──────────┬──────────────────────────┘
           ▼
┌─────────────────────────────────────┐
│  Agent processes result             │
│  - Updates state representation     │
│  - Checks if finished               │
│  - Selects next action (if not done)│
└─────────────────────────────────────┘
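The flow above can be sketched as a simple loop. Here call_tool is a stand-in for POST /{env_name}/call, implemented as a stub so the example runs without a server:

```python
# Stub standing in for POST /{env_name}/call; a real client would send
# an HTTP request and parse the JSON ToolOutput from the response.
def call_tool(name: str, input: dict) -> dict:
    if name == "submit" and input.get("answer") == 4:
        return {"blocks": [{"type": "text", "text": "Correct!"}],
                "reward": 1.0, "finished": True}
    return {"blocks": [{"type": "text", "text": "Try again."}],
            "reward": 0.0, "finished": False}

def run_episode(actions):
    """Replay a fixed action list, stopping once the environment sets finished."""
    transcript = []
    for name, params in actions:
        out = call_tool(name, params)
        transcript.append(out)
        if out["finished"]:  # stop calling tools once the episode ends
            break
    return transcript

log = run_episode([("submit", {"answer": 5}), ("submit", {"answer": 4})])
print(len(log), log[-1]["reward"])  # 2 1.0
```

A real agent would select each action from its policy given the transcript so far, rather than replaying a fixed list.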

Best Practices

1. Clear Tool Names

# Good - descriptive
"submit_answer"
"read_file"
"execute_python"

# Bad - unclear
"do_thing"
"action1"
"tool"

2. Comprehensive Descriptions

# Good - specific and helpful
"Submit your final answer to the current math problem. The problem will be graded and you'll receive a reward."

# Bad - vague
"Submit answer"

3. Validate Tool Inputs

def submit(self, params: SubmitParams) -> ToolOutput:
    if not isinstance(params.answer, (int, float)):
        return ToolOutput(
            blocks=[TextBlock(text="Error: Answer must be a number")],
            reward=-0.1,
            finished=True
        )

    # ... check answer logic

4. Provide Informative Outputs

# Good - explains what happened
ToolOutput(
    blocks=[TextBlock(text="Incorrect. Your answer was 5, but the correct answer is 7.")],
    reward=0.0,
    finished=True
)

# Bad - no context
ToolOutput(
    blocks=[TextBlock(text="Wrong")],
    reward=0.0,
    finished=True
)

5. Use finished Correctly

# Good - clear termination
if answer_is_correct:
    return ToolOutput(..., finished=True)
else:
    return ToolOutput(..., finished=True)  # Task failed

# Bad - ambiguous
return ToolOutput(..., finished=False)  # Never terminates?

Tool Security

Input Validation

Always validate tool inputs:
  • Check that types match the schema
  • Validate ranges and constraints
  • Sanitize strings (prevent injection attacks)
  • Reject malformed inputs

Command Injection

Be careful with bash tools:
# Dangerous - command injection
def bash(self, params):
    command = params.command
    os.system(command)  # User can inject: "ls; rm -rf /"

# Safe - use subprocess with shell=False
import subprocess

def bash(self, params):
    # Validate command
    if any(dangerous in params.command for dangerous in [';', '&&', '|']):
        return ToolOutput(
            blocks=[TextBlock(text="Error: Command contains forbidden characters")],
            finished=True
        )

    # Execute safely
    result = subprocess.run(
        params.command.split(),
        shell=False,
        capture_output=True,
        timeout=30
    )
    return ToolOutput(...)

Resource Limits

Prevent resource exhaustion:
  • Set timeouts on tool execution
  • Limit output size
  • Rate limit tool calls
  • Sandbox execution environments
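The first two limits can be sketched with the standard library; MAX_OUTPUT and the helper name are illustrative values, not part of ORS:

```python
import subprocess
import sys

MAX_OUTPUT = 4096  # cap on bytes returned to the agent (illustrative value)

def run_limited(argv, timeout=30.0):
    """Run a command with a wall-clock timeout and a capped output size."""
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "Error: command timed out"
    out = result.stdout[:MAX_OUTPUT]
    if len(result.stdout) > MAX_OUTPUT:
        out += b"\n[output truncated]"
    return out.decode("utf-8", errors="replace")

print(run_limited([sys.executable, "-c", "print('ok')"]))
```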

Key Takeaway: Tools are the agent’s interface to the environment. Design them carefully with clear names, comprehensive descriptions, proper validation, and informative outputs. The quality of your tools directly impacts agent performance.