In ORS, tools are the actions that agents can take. Every interaction with an environment happens through tool calls. This is the fundamental principle of ORS.

Core Principle: Actions are Tools

The only way agents interact with environments is by calling tools.
This design choice:
  • Leverages existing function calling support from LLM providers
  • Provides a clear, structured interface
  • Makes agent actions explicit and traceable
  • Enables type-safe interactions with JSON Schema

What is a Tool?

A tool is a function that:
  1. Has a name and description
  2. Optionally defines input parameters (via JSON Schema)
  3. Returns a ToolOutput with content, reward, and finished flag
Example tools:
  • submit — Submit an answer to a problem
  • bash — Execute a bash command
  • read_file — Read a file’s contents
  • web_search — Search the web
  • python — Execute Python code

Tool Specification

Tools are advertised via two endpoints:
  • GET /{env_name}/tools — shared tools available to all tasks (no session required)
  • GET /{env_name}/task_tools — shared tools plus task-specific tools (requires X-Session-ID, since task-specific tools depend on the active task)
Example response from GET /{env_name}/tools:
{
  "tools": [
    {
      "name": "submit",
      "description": "Submit your answer to the current math problem",
      "input_schema": {
        "type": "object",
        "properties": {
          "answer": {
            "type": "number",
            "description": "Your numeric answer"
          }
        },
        "required": ["answer"]
      }
    }
  ]
}
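A client can fetch and parse this list with nothing but the standard library. The base URL and environment name below are placeholders:

```python
import json
import urllib.request

def parse_tools(payload: bytes) -> list:
    """Extract the tool specs from a /tools response body."""
    return json.loads(payload)["tools"]

def list_tools(base_url: str, env_name: str) -> list:
    with urllib.request.urlopen(f"{base_url}/{env_name}/tools") as resp:
        return parse_tools(resp.read())

# tools = list_tools("http://localhost:8000", "math")
# names = [t["name"] for t in tools]
```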

Tool Spec Fields

name (string, required):
  • Tool identifier used in tool calls
  • Should be descriptive (e.g., bash, not b)
  • Convention: lowercase with underscores
description (string, required):
  • Human-readable explanation of what the tool does
  • Used by LLMs to decide when to call the tool
  • Should be clear and specific
input_schema (object, nullable):
  • JSON Schema defining tool parameters
  • null if the tool takes no parameters (always present in output, never omitted)
  • Enables validation and type checking

JSON Schema for Parameters

The input_schema follows JSON Schema specification:
{
  "type": "object",
  "properties": {
    "command": {
      "type": "string",
      "description": "The bash command to execute",
      "examples": ["ls -la", "cat file.txt"]
    },
    "timeout": {
      "type": "number",
      "description": "Command timeout in seconds",
      "default": 30
    }
  },
  "required": ["command"]
}
Supported types:
  • string, number, boolean
  • object (nested parameters)
  • array (lists of values)
  • null
Schema features:
  • required — Mandatory fields
  • default — Default values
  • enum — Allowed values
  • description — Field documentation
  • examples — Example values
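As a sketch of how a server might enforce this schema subset (type, required, enum, default) without pulling in a full JSON Schema library — this only handles flat object schemas and ignores corner cases such as bool being a subclass of int in Python:

```python
# Map JSON Schema type names to Python runtime types.
_TYPES = {"string": str, "number": (int, float), "boolean": bool,
          "object": dict, "array": list, "null": type(None)}

def validate_input(schema: dict, params: dict) -> dict:
    """Return params with defaults filled in; raise ValueError if invalid."""
    for name in schema.get("required", []):
        if name not in params:
            raise ValueError(f"missing required field: {name}")
    out = dict(params)
    for name, spec in schema.get("properties", {}).items():
        if name not in out:
            if "default" in spec:
                out[name] = spec["default"]   # fill in the declared default
            continue
        expected = _TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(out[name], expected):
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and out[name] not in spec["enum"]:
            raise ValueError(f"{name}: must be one of {spec['enum']}")
    return out
```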

Calling a Tool

Tools are called via POST /{env_name}/call with a JSON body:
{
  "name": "submit",
  "input": {"answer": "42"},
  "task_id": "call-001"
}
  • name: Tool to call (required)
  • input: Parameters matching the tool’s input_schema (required)
  • task_id: Optional identifier for SSE reconnection — clients can reconnect and retrieve results within a 60-second window

Tool Output

Every tool call returns a ToolOutput:
interface ToolOutput {
  blocks: Blocks  // Content (text/images)
  reward?: number  // RL feedback signal
  finished?: boolean  // Episode termination (default: false)
  metadata?: JSONObject  // Optional extra data
}

Wire Format

Tool call responses are delivered via Server-Sent Events, wrapped in a RunToolSuccess or RunToolError envelope.
Successful completion:
{
  "ok": true,
  "output": {
    "blocks": [
      {"text": "Correct! The answer is 4.", "detail": null, "type": "text"}
    ],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
Intermediate step:
{
  "ok": true,
  "output": {
    "blocks": [
      {"text": "total 48\ndrwxr-xr-x  8 user  staff  256 Jan  1 12:00 .", "detail": null, "type": "text"}
    ],
    "metadata": null,
    "reward": 0.0,
    "finished": false
  }
}
Error:
{
  "ok": false,
  "error": "Tool 'read_file' failed: File not found"
}
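Once the SSE payload has been parsed to JSON, client code can unwrap either envelope shape with a small helper such as:

```python
def unwrap_envelope(envelope: dict) -> dict:
    """Return the ToolOutput payload, or raise on a RunToolError."""
    if not envelope.get("ok"):
        raise RuntimeError(envelope.get("error", "unknown tool error"))
    return envelope["output"]
```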

Key Fields

blocks: The content returned by the tool
  • Always an array (even for single text output)
  • Can be text, images, or both
  • This is what the agent observes
reward: Feedback for RL training
  • Optional (can be null)
  • Environment-defined; see Rewards for design patterns
finished: Episode termination signal
  • Defaults to false; always present in serialized output
  • When true, episode is complete
  • The agent should stop calling tools and clean up the session
metadata: Optional structured data
  • By convention, not included in the agent’s context window, though this is not enforced by the protocol
  • Used for logging, debugging, analysis
  • Can include execution time, resource usage, etc.

Tool Design Patterns

Pattern 1: Parameterless Tools

Tools that don’t need input:
{
  "name": "get_hint",
  "description": "Get a hint for the current problem",
  "input_schema": null
}
Called as:
{"name": "get_hint", "input": {}}

Pattern 2: Simple Parameter Tools

Tools with basic parameters:
{
  "name": "submit",
  "description": "Submit an answer",
  "input_schema": {
    "type": "object",
    "properties": {
      "answer": {"type": "number"}
    },
    "required": ["answer"]
  }
}

Pattern 3: Complex Parameter Tools

Tools with rich parameters:
{
  "name": "web_search",
  "description": "Search the web",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "max_results": {
        "type": "number",
        "description": "Maximum results",
        "default": 10,
        "minimum": 1,
        "maximum": 100
      },
      "filters": {
        "type": "object",
        "properties": {
          "date_range": {"type": "string"},
          "domain": {"type": "string"}
        }
      }
    },
    "required": ["query"]
  }
}

Pattern 4: Enum Parameters

Tools with constrained choices:
{
  "name": "change_difficulty",
  "description": "Change problem difficulty",
  "input_schema": {
    "type": "object",
    "properties": {
      "difficulty": {
        "type": "string",
        "enum": ["easy", "medium", "hard"],
        "description": "Target difficulty level"
      }
    },
    "required": ["difficulty"]
  }
}

Task-Specific Tools

Some environments need different tools depending on the task. For example, a multiple-choice task might offer a select_option tool, while an open-ended task offers submit_answer instead. ORS supports this through two separate tool-listing endpoints:
  • GET /{env_name}/tools — no session required; returns shared tools (constant across all tasks)
  • GET /{env_name}/task_tools — requires a session (X-Session-ID); returns shared tools plus task-specific tools
Since task-specific tools depend on the active task, they can only be resolved within a session. Clients should call task_tools instead of tools after creating a session to get the complete tool set. If an environment has no task-specific tools, both endpoints return the same result.
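Fetching the complete tool set for an active session might look like the sketch below. The session id comes from whatever session-creation call the client already made, and the base URL is a placeholder:

```python
import json
import urllib.request

def build_task_tools_request(base_url: str, env_name: str,
                             session_id: str) -> urllib.request.Request:
    """GET /{env_name}/task_tools, scoped to a session via X-Session-ID."""
    return urllib.request.Request(
        f"{base_url}/{env_name}/task_tools",
        headers={"X-Session-ID": session_id},
    )

def list_task_tools(base_url: str, env_name: str, session_id: str) -> list:
    req = build_task_tools_request(base_url, env_name, session_id)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["tools"]
```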

Multi-Modal Tools

Tools can return images and text:
{
  "blocks": [
    {"text": "Here's a visualization of the solution:", "detail": null, "type": "text"},
    {
      "data": "iVBORw0KGgoAAAANSUhEUgA...",
      "mimeType": "image/png",
      "detail": null,
      "type": "image"
    }
  ],
  "metadata": null,
  "reward": 0.0,
  "finished": false
}
Example use cases:
  • Screenshots from web navigation
  • Visual feedback for agents
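A client consuming such a response has to handle both block types; a minimal sketch that collects the text and decodes any image data:

```python
import base64

def split_blocks(blocks: list) -> tuple:
    """Separate a blocks array into joined text and raw image bytes."""
    texts, images = [], []
    for block in blocks:
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "image":
            images.append(base64.b64decode(block["data"]))  # raw bytes
    return "\n".join(texts), images
```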

Tool Calling Flow

[Diagram: Tool Calling Flow (the agent-to-environment tool execution cycle)]
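The cycle can be sketched as a client-side loop. Here choose_call is a stand-in for the LLM, and the client object wraps the endpoints described on this page:

```python
def run_episode(client, choose_call, max_steps: int = 50):
    """Drive one episode: list tools, call until finished or budget spent."""
    tools = client.list_task_tools()            # GET /{env}/task_tools
    for _ in range(max_steps):
        name, tool_input = choose_call(tools)   # the LLM picks a tool call
        output = client.call(name, tool_input)  # POST /{env}/call
        if output.get("finished"):
            return output                       # episode complete
    return None                                 # step budget exhausted
```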

Best Practices

1. Clear Tool Names

# Good - descriptive
"submit_answer"
"read_file"
"execute_python"

# Bad - unclear
"do_thing"
"action1"
"tool"

2. Comprehensive Descriptions

# Good - specific and helpful
"Submit your final answer to the current math problem. The problem will be graded and you'll receive a reward."

# Bad - vague
"Submit answer"

3. Validate Tool Inputs

def submit(self, params: SubmitParams) -> ToolOutput:
    if not isinstance(params.answer, (int, float)):
        return ToolOutput(
            blocks=[TextBlock(text="Error: Answer must be a number")],
            reward=-0.1,
            finished=True
        )

    # ... check answer logic

4. Provide Informative Outputs

# Good - explains what happened
ToolOutput(
    blocks=[TextBlock(text="Incorrect. Your answer was 5, but the correct answer is 7.")],
    reward=0.0,
    finished=True
)

# Bad - no context
ToolOutput(
    blocks=[TextBlock(text="Wrong")],
    reward=0.0,
    finished=True
)

5. Use finished Correctly

# Good - clear termination
if answer_is_correct:
    return ToolOutput(..., finished=True)
else:
    return ToolOutput(..., finished=True)  # Task failed

# Bad - ambiguous
return ToolOutput(..., finished=False)  # Never terminates?

Tool Security

Input Validation

Always validate tool inputs:
  • Check types match schema
  • Validate ranges and constraints
  • Reject malformed inputs

Resource Limits

Prevent resource exhaustion:
  • Set timeouts on tool execution
  • Limit output size
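For example, a bash tool might wrap execution as below; the timeout and output cap are illustrative values, not part of the protocol:

```python
import subprocess

MAX_OUTPUT = 10_000  # characters of command output kept for the agent

def run_bash(command: str, timeout: float = 30.0) -> str:
    """Run a bash command with a wall-clock timeout and output cap."""
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, timeout=timeout,
        )
        out = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout}s"
    if len(out) > MAX_OUTPUT:
        out = out[:MAX_OUTPUT] + "\n[output truncated]"
    return out
```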

Next Steps

Tasks & Splits

Organise problems for training and evaluation

Rewards

Design reward signals for RL

Implementing a Server

Build an ORS server with custom tools

HTTP API

See how tools are listed and called

Key Takeaway: Tools are the agent’s interface to the environment. Design them carefully with clear names, comprehensive descriptions, proper validation, and informative outputs. The quality of your tools directly impacts agent performance.