In ORS, tools are the actions that agents can take. Every interaction with an environment happens through tool calls. This is the fundamental principle of ORS.

Core Principle: Actions are Tools

The only way agents interact with environments is by calling tools.
This design choice:
  • Leverages existing function calling support from LLM providers
  • Provides a clear, structured interface
  • Makes agent actions explicit and traceable
  • Enables type-safe interactions with JSON Schema

What is a Tool?

A tool is a function that:
  1. Has a name and description
  2. Optionally defines input parameters (via JSON Schema)
  3. Returns a ToolOutput with content, reward, and finished flag
Example tools:
  • submit — Submit an answer to a problem
  • bash — Execute a bash command
  • read_file — Read a file’s contents
  • web_search — Search the web
  • python — Execute Python code

Tool Specification

Tools are advertised via two endpoints:
  • GET /{env_name}/tools — shared tools available to all tasks (no session required)
  • GET /{env_name}/task_tools — shared tools plus task-specific tools (requires X-Session-ID, since task-specific tools depend on the active task)
Example response from GET /{env_name}/tools:
{
  "tools": [
    {
      "name": "submit",
      "description": "Submit your answer to the current math problem",
      "input_schema": {
        "type": "object",
        "properties": {
          "answer": {
            "type": "number",
            "description": "Your numeric answer"
          }
        },
        "required": ["answer"]
      }
    }
  ]
}
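A client can fetch and parse this list with nothing but the standard library. The base URL and environment name below are placeholders:

```python
import json
import urllib.request

def parse_tools(payload: bytes) -> list:
    """Extract the tool specs from a /tools response body."""
    return json.loads(payload)["tools"]

def list_tools(base_url: str, env_name: str) -> list:
    with urllib.request.urlopen(f"{base_url}/{env_name}/tools") as resp:
        return parse_tools(resp.read())

# tools = list_tools("http://localhost:8000", "math")
# names = [t["name"] for t in tools]
```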

Tool Spec Fields

name (string, required):
  • Tool identifier used in tool calls
  • Should be descriptive (e.g., bash, not b)
  • Convention: lowercase with underscores
description (string, required):
  • Human-readable explanation of what the tool does
  • Used by LLMs to decide when to call the tool
  • Should be clear and specific
input_schema (object, nullable):
  • JSON Schema defining tool parameters
  • null if the tool takes no parameters (always present in output, never omitted)
  • Enables validation and type checking

JSON Schema for Parameters

The input_schema follows JSON Schema specification:
{
  "type": "object",
  "properties": {
    "command": {
      "type": "string",
      "description": "The bash command to execute",
      "examples": ["ls -la", "cat file.txt"]
    },
    "timeout": {
      "type": "number",
      "description": "Command timeout in seconds",
      "default": 30
    }
  },
  "required": ["command"]
}
Supported types:
  • string, number, boolean
  • object (nested parameters)
  • array (lists of values)
  • null
Schema features:
  • required — Mandatory fields
  • default — Default values
  • enum — Allowed values
  • description — Field documentation
  • examples — Example values
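As a sketch of how a server might enforce this schema subset (type, required, enum, default) without pulling in a full JSON Schema library — this only handles flat object schemas and ignores corner cases such as bool being a subclass of int in Python:

```python
# Map JSON Schema type names to Python runtime types.
_TYPES = {"string": str, "number": (int, float), "boolean": bool,
          "object": dict, "array": list, "null": type(None)}

def validate_input(schema: dict, params: dict) -> dict:
    """Return params with defaults filled in; raise ValueError if invalid."""
    for name in schema.get("required", []):
        if name not in params:
            raise ValueError(f"missing required field: {name}")
    out = dict(params)
    for name, spec in schema.get("properties", {}).items():
        if name not in out:
            if "default" in spec:
                out[name] = spec["default"]   # fill in the declared default
            continue
        expected = _TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(out[name], expected):
            raise ValueError(f"{name}: expected {spec['type']}")
        if "enum" in spec and out[name] not in spec["enum"]:
            raise ValueError(f"{name}: must be one of {spec['enum']}")
    return out
```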

Calling a Tool

Tools are called via POST /{env_name}/call with a JSON body:
{
  "name": "submit",
  "input": {"answer": "42"},
  "task_id": "call-001"
}
  • name: Tool to call (required)
  • input: Parameters matching the tool’s input_schema (required)
  • task_id: Optional identifier for SSE reconnection — clients can reconnect and retrieve results within a 60-second window

Tool Output

Every tool call returns a ToolOutput:
interface ToolOutput {
  blocks: Blocks  // Content (text/images)
  reward?: number  // RL feedback signal
  finished?: boolean  // Episode termination (default: false)
  metadata?: JSONObject  // Optional extra data
}

Wire Format

Tool call responses are delivered via Server-Sent Events, wrapped in a RunToolSuccess or RunToolError envelope.
Successful completion:
{
  "ok": true,
  "output": {
    "blocks": [
      {"text": "Correct! The answer is 4.", "detail": null, "type": "text"}
    ],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
Intermediate step:
{
  "ok": true,
  "output": {
    "blocks": [
      {"text": "total 48\ndrwxr-xr-x  8 user  staff  256 Jan  1 12:00 .", "detail": null, "type": "text"}
    ],
    "metadata": null,
    "reward": 0.0,
    "finished": false
  }
}
Error:
{
  "ok": false,
  "error": "Tool 'read_file' failed: File not found"
}
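Once the SSE payload has been parsed to JSON, client code can unwrap either envelope shape with a small helper such as:

```python
def unwrap_envelope(envelope: dict) -> dict:
    """Return the ToolOutput payload, or raise on a RunToolError."""
    if not envelope.get("ok"):
        raise RuntimeError(envelope.get("error", "unknown tool error"))
    return envelope["output"]
```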

Key Fields

blocks: The content returned by the tool
  • Always an array (even for single text output)
  • Can be text, images, or both
  • This is what the agent observes
reward: Feedback for RL training
  • Optional (can be null)
  • Environment-defined; see Rewards for design patterns
finished: Episode termination signal
  • Defaults to false; always present in serialized output
  • When true, episode is complete
  • The agent should stop calling tools and clean up the session
metadata: Optional structured data
  • By convention, not included in the agent’s context window, though this is not enforced by the protocol
  • Used for logging, debugging, analysis
  • Can include execution time, resource usage, etc.

Tool Design Patterns

Pattern 1: Parameterless Tools

Tools that don’t need input:
{
  "name": "get_hint",
  "description": "Get a hint for the current problem",
  "input_schema": null
}
Called as:
{"name": "get_hint", "input": {}}

Pattern 2: Simple Parameter Tools

Tools with basic parameters:
{
  "name": "submit",
  "description": "Submit an answer",
  "input_schema": {
    "type": "object",
    "properties": {
      "answer": {"type": "number"}
    },
    "required": ["answer"]
  }
}

Pattern 3: Complex Parameter Tools

Tools with rich parameters:
{
  "name": "web_search",
  "description": "Search the web",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query"
      },
      "max_results": {
        "type": "number",
        "description": "Maximum results",
        "default": 10,
        "minimum": 1,
        "maximum": 100
      },
      "filters": {
        "type": "object",
        "properties": {
          "date_range": {"type": "string"},
          "domain": {"type": "string"}
        }
      }
    },
    "required": ["query"]
  }
}

Pattern 4: Enum Parameters

Tools with constrained choices:
{
  "name": "change_difficulty",
  "description": "Change problem difficulty",
  "input_schema": {
    "type": "object",
    "properties": {
      "difficulty": {
        "type": "string",
        "enum": ["easy", "medium", "hard"],
        "description": "Target difficulty level"
      }
    },
    "required": ["difficulty"]
  }
}

Task-Specific Tools

Some environments need different tools depending on the task. For example, a multiple-choice task might offer a select_option tool, while an open-ended task offers submit_answer instead. ORS supports this through two separate tool-listing endpoints:
  • GET /{env_name}/tools — no session required; returns shared tools (constant across all tasks)
  • GET /{env_name}/task_tools — requires a session (X-Session-ID); returns shared tools plus task-specific tools
Since task-specific tools depend on the active task, they can only be resolved within a session. Clients should call task_tools instead of tools after creating a session to get the complete tool set. If an environment has no task-specific tools, both endpoints return the same result.
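Fetching the complete tool set for an active session might look like the sketch below. The session id comes from whatever session-creation call the client already made, and the base URL is a placeholder:

```python
import json
import urllib.request

def build_task_tools_request(base_url: str, env_name: str,
                             session_id: str) -> urllib.request.Request:
    """GET /{env_name}/task_tools, scoped to a session via X-Session-ID."""
    return urllib.request.Request(
        f"{base_url}/{env_name}/task_tools",
        headers={"X-Session-ID": session_id},
    )

def list_task_tools(base_url: str, env_name: str, session_id: str) -> list:
    req = build_task_tools_request(base_url, env_name, session_id)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["tools"]
```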

Multi-Modal Tools

Tools can return images and text:
{
  "blocks": [
    {"text": "Here's a visualization of the solution:", "detail": null, "type": "text"},
    {
      "data": "iVBORw0KGgoAAAANSUhEUgA...",
      "mimeType": "image/png",
      "detail": null,
      "type": "image"
    }
  ],
  "metadata": null,
  "reward": 0.0,
  "finished": false
}
Example use cases:
  • Screenshots from web navigation
  • Visual feedback for agents
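A client consuming such a response has to handle both block types; a minimal sketch that collects the text and decodes any image data:

```python
import base64

def split_blocks(blocks: list) -> tuple:
    """Separate a blocks array into joined text and raw image bytes."""
    texts, images = [], []
    for block in blocks:
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "image":
            images.append(base64.b64decode(block["data"]))  # raw bytes
    return "\n".join(texts), images
```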

Tool Calling Flow

[Diagram: Tool Calling Flow (the agent-to-environment tool execution cycle)]
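The cycle can be sketched as a client-side loop. Here choose_call is a stand-in for the LLM, and the client object wraps the endpoints described on this page:

```python
def run_episode(client, choose_call, max_steps: int = 50):
    """Drive one episode: list tools, call until finished or budget spent."""
    tools = client.list_task_tools()            # GET /{env}/task_tools
    for _ in range(max_steps):
        name, tool_input = choose_call(tools)   # the LLM picks a tool call
        output = client.call(name, tool_input)  # POST /{env}/call
        if output.get("finished"):
            return output                       # episode complete
    return None                                 # step budget exhausted
```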

Best Practices

1. Clear Tool Names

# Good - descriptive
"submit_answer"
"read_file"
"execute_python"

# Bad - unclear
"do_thing"
"action1"
"tool"

2. Comprehensive Descriptions

# Good - specific and helpful
"Submit your final answer to the current math problem. The problem will be graded and you'll receive a reward."

# Bad - vague
"Submit answer"

3. Validate Tool Inputs

def submit(self, params: SubmitParams) -> ToolOutput:
    if not isinstance(params.answer, (int, float)):
        return ToolOutput(
            blocks=[TextBlock(text="Error: Answer must be a number")],
            reward=-0.1,
            finished=True
        )

    # ... check answer logic

4. Provide Informative Outputs

# Good - explains what happened
ToolOutput(
    blocks=[TextBlock(text="Incorrect. Your answer was 5, but the correct answer is 7.")],
    reward=0.0,
    finished=True
)

# Bad - no context
ToolOutput(
    blocks=[TextBlock(text="Wrong")],
    reward=0.0,
    finished=True
)

5. Use finished Correctly

# Good - clear termination
if answer_is_correct:
    return ToolOutput(..., finished=True)
else:
    return ToolOutput(..., finished=True)  # Task failed

# Bad - ambiguous
return ToolOutput(..., finished=False)  # Never terminates?

Tool Security

Input Validation

Always validate tool inputs:
  • Check types match schema
  • Validate ranges and constraints
  • Reject malformed inputs

Resource Limits

Prevent resource exhaustion:
  • Set timeouts on tool execution
  • Limit output size
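For example, a bash tool might wrap execution as below; the timeout and output cap are illustrative values, not part of the protocol:

```python
import subprocess

MAX_OUTPUT = 10_000  # characters of command output kept for the agent

def run_bash(command: str, timeout: float = 30.0) -> str:
    """Run a bash command with a wall-clock timeout and output cap."""
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, timeout=timeout,
        )
        out = result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout}s"
    if len(out) > MAX_OUTPUT:
        out = out[:MAX_OUTPUT] + "\n[output truncated]"
    return out
```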

Next Steps

Tasks & Splits

Organise problems for training and evaluation

Rewards

Design reward signals for RL

Implementing a Server

Build an ORS server with custom tools

HTTP API

See how tools are listed and called

Key Takeaway: Tools are the agent’s interface to the environment. Design them carefully with clear names, comprehensive descriptions, proper validation, and informative outputs. The quality of your tools directly impacts agent performance.