The Open Reward Standard is an HTTP-based protocol for connecting language model agents to reinforcement learning environments. It uses standard REST endpoints for control operations and Server-Sent Events (SSE) for delivering tool outputs.

Design Principles

1. Language-Agnostic

ORS uses HTTP, making it implementable in any programming language:
  • Python, TypeScript, Go, Rust, Java, etc.
  • Any web framework or HTTP library
  • Standard REST patterns

2. Episode-Centric

The protocol is organized around RL episodes (sessions):
  • One session = one episode
  • Episode continues until finished: true
  • Stateful interaction across multiple tool calls

3. Tool-Based Interaction

All agent actions are tool calls:
  • Discovered via GET /{env_name}/tools
  • Executed via POST /{env_name}/call
  • Return structured outputs with rewards

Protocol Architecture

[Figure: ORS Protocol Architecture. Agent communicating with ORS Server and Environment Logic.]

Key Components

Agent Side:
  • Makes HTTP requests
  • Parses SSE responses
  • Maintains session ID
ORS Server:
  • Implements HTTP endpoints
  • Manages episode state
  • Executes tools and returns rewards

Episode Lifecycle

An episode (session) follows this lifecycle:
1. Create Session

2. Create Episode Instance

3. Get Prompt (initial state)

4. Call Tools (actions)
   ├─ Receive reward
   ├─ Check finished flag
   └─ If not finished, repeat step 4

5. Delete Episode (cleanup)

Example Flow

# 1. Create session ID
POST /create_session
→ {"sid": "abc-123"}

# 2. Create episode instance with task
POST /create
Headers: X-Session-ID: abc-123
Body: {
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?", "answer": "4"},
  "secrets": {}
}
→ {"sid": "abc-123"}

# 3. Get initial prompt
GET /math/prompt
Headers: X-Session-ID: abc-123
→ [{"text": "What is 2+2?", "detail": null, "type": "text"}]

# 4. Call tool
POST /math/call
Headers: X-Session-ID: abc-123
Accept: text/event-stream
Body: {"name": "submit", "input": {"answer": "4"}}
→ SSE stream (see below for format)

# 5. Delete episode
POST /delete
Headers: X-Session-ID: abc-123
→ {"sid": "abc-123"}
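The five steps above can be sketched as a small Python driver. This is a minimal sketch, not the reference client: `send` is a hypothetical transport callable (it is not part of ORS) that performs one HTTP request and returns the decoded response, and for `/call` it is assumed to return the JSON payload of the final SSE `end` event.

```python
from typing import Any, Callable, Optional, Tuple

# Hypothetical transport: (method, path, headers, body) -> decoded response.
# A real client would back this with an HTTP library plus an SSE reader.
Send = Callable[[str, str, dict, Optional[dict]], Any]


def run_episode(send: Send, env_name: str, task_spec: dict, answer: str) -> Tuple[Any, dict]:
    """Run the example flow once and return (prompt, tool output)."""
    # 1. Create session ID
    sid = send("POST", "/create_session", {}, None)["sid"]
    headers = {"X-Session-ID": sid}

    # 2. Create episode instance with task
    send("POST", "/create", headers,
         {"env_name": env_name, "task_spec": task_spec, "secrets": {}})
    try:
        # 3. Get initial prompt (this is what would be shown to the model)
        prompt = send("GET", f"/{env_name}/prompt", headers, None)

        # 4. Call tool; the result is delivered over SSE
        result = send("POST", f"/{env_name}/call",
                      {**headers, "Accept": "text/event-stream"},
                      {"name": "submit", "input": {"answer": answer}})
        if not result["ok"]:
            raise RuntimeError(result["error"])
        return prompt, result["output"]
    finally:
        # 5. Delete episode, even if a step above failed
        send("POST", "/delete", headers, None)
```

Injecting the transport keeps the protocol sequence separate from the choice of HTTP library, and makes the sequence easy to exercise against a fake server.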

Endpoint Categories

ORS endpoints fall into four categories:

1. Discovery Endpoints

Get information about the environment:
GET /list_environments        # List available environments
GET /{env_name}/tools        # List available tools
GET /{env_name}/splits       # List available splits
POST /{env_name}/tasks       # List tasks for a split
These are stateless; no session is required.

2. Session Management

Create and manage episodes:
POST /create_session         # Generate session ID
POST /create                 # Create episode instance
POST /delete                 # Delete episode
POST /delete_session         # Delete session (optional cleanup)
POST /ping                   # Keep session alive
These require the X-Session-ID header (except create_session).

3. Episode Interaction

Interact with the active episode:
GET /{env_name}/prompt       # Get initial prompt
POST /{env_name}/call        # Call a tool
These require X-Session-ID and an active episode.
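The session requirements of the categories above can be encoded in a small client-side helper. This is a sketch under stated assumptions: `build_headers` and the two constants are illustrative names, and any route not listed as session-bound in the text is simply treated as needing no header.

```python
from typing import Optional

# Routes the text above describes as requiring the X-Session-ID header.
SESSION_ROUTES = {"/create", "/delete", "/delete_session", "/ping"}
# Per-environment routes that additionally need an active episode.
EPISODE_SUFFIXES = ("/prompt", "/call")


def build_headers(path: str, sid: Optional[str] = None) -> dict:
    """Return the headers for `path`, adding X-Session-ID only where it is required."""
    needs_session = path in SESSION_ROUTES or path.endswith(EPISODE_SUFFIXES)
    if not needs_session:
        return {}  # discovery, /create_session, and anything else not listed above
    if sid is None:
        raise ValueError(f"{path} requires a session ID")
    return {"X-Session-ID": sid}
```

For example, `build_headers("/math/call", "abc-123")` yields the session header, while `build_headers("/list_environments")` yields an empty dict.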

4. Health

GET /health                  # Server health check

Session Management

X-Session-ID Header

Episodes are identified by a session ID passed in the X-Session-ID header:
POST /create
X-Session-ID: abc-123
Flow:
  1. Call POST /create_session to get a session ID
  2. Use that ID in all subsequent requests
  3. Server maintains episode state for that ID
  4. Call POST /delete to clean up

Session Timeout

Sessions automatically expire after 15 minutes of inactivity. To prevent timeout:
POST /ping
X-Session-ID: abc-123
Call /ping periodically to keep the session alive. The reference SDK pings every 10 seconds; at minimum, ping well before the 15-minute timeout.
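One way to keep a session alive is a background thread that pings on a fixed interval. The sketch below is not the reference SDK's implementation; `KeepAlive` is a hypothetical helper, and `ping` is an injected callable that is assumed to send `POST /ping` with the session header.

```python
import threading
from typing import Callable


class KeepAlive:
    """Call `ping` every `interval` seconds on a background thread until the block exits."""

    def __init__(self, ping: Callable[[], None], interval: float = 10.0) -> None:
        self._ping = ping
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to ping) and True once stopped.
        # Note: if ping() raises, this thread ends and pinging stops silently.
        while not self._stop.wait(self._interval):
            self._ping()

    def __enter__(self) -> "KeepAlive":
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        self._thread.join()
```

Used as `with KeepAlive(send_ping): ...`, the pings stop as soon as the block exits, so a finished episode is not kept alive by accident.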

Tool Execution with SSE

Tool calls return results via Server-Sent Events:
POST /{env_name}/call
Headers:
  X-Session-ID: abc-123
  Accept: text/event-stream
Body: {
  "name": "bash",
  "input": {"command": "ls -la"}
}
Response (SSE stream):
event: task_id
data: 877bb56c594e4a0f921ad55c439a3762

event: end
data: {"ok": true, "output": {"blocks": [{"text": "Output text", "detail": null, "type": "text"}], "metadata": null, "reward": 0.0, "finished": false}}
For large responses (>4KB), the result is split across chunk events before the final end event. See Server-Sent Events for the full event type reference.
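A client can read this stream with a small line-oriented parser. The sketch below handles only the `task_id`, `end`, and `error` events shown on this page; reassembling `chunk` events is deliberately omitted because their format is defined in the Server-Sent Events reference. The function names are illustrative, and raising `RuntimeError` on an `error` event is a design choice of this sketch.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the text lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a heartbeat
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    if data:
        # Lenient: accept a final event that lacks its trailing blank line.
        yield event, "\n".join(data)


def read_tool_result(lines: Iterable[str]) -> Tuple[Optional[str], dict]:
    """Return (task_id, ToolOutput payload) from a /call stream."""
    task_id = None
    for event, data in iter_sse_events(lines):
        if event == "task_id":
            task_id = data
        elif event == "error":
            raise RuntimeError(data)  # server-level failure
        elif event == "end":
            return task_id, json.loads(data)
    raise RuntimeError("stream closed before an end event")
```

Keeping the task ID alongside the result matters because it is what a client would present when reconnecting to retrieve a completed result.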

Why SSE?

Server-Sent Events are used because:
  • Long-running tool calls: Keeps connections alive while tools execute (bash commands, LLM calls, etc.)
  • Chunking large responses: Results over 4KB are split into chunks for reliable delivery
  • Reconnection: Clients can reconnect with a task ID and retrieve completed results
  • Standard protocol: Built into browsers and HTTP libraries

Error Handling

HTTP Status Codes

ORS uses standard HTTP status codes:
  • 200 OK: Successful request
  • 400 Bad Request: Invalid input
  • 404 Not Found: Session/environment/tool not found
  • 500 Internal Server Error: Server error

SSE Errors

Errors can also arrive as SSE events during tool execution streaming:
event: task_id
data: task-xyz-790

event: error
data: Session not found
These represent server-level failures (session not found, tool not recognized, internal error). They are distinct from tool logic errors, which return {"ok": false, "error": "..."} in an end event. See Server-Sent Events for details.

Tool Errors

Tool execution errors are returned in the ToolOutput:
{
  "ok": false,
  "error": "Tool 'submit' failed: Invalid answer format"
}
Successful tool calls:
{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
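On the client, the two shapes above can be unwrapped in one place so that callers only ever see a successful output. This is a minimal sketch; `ToolError`, `unwrap_tool_output`, and `output_text` are hypothetical names, not part of ORS.

```python
class ToolError(Exception):
    """Raised when a tool reports a logic error ({"ok": false, ...})."""


def unwrap_tool_output(payload: dict) -> dict:
    """Return the `output` object of a successful call, or raise ToolError."""
    if payload.get("ok"):
        return payload["output"]
    raise ToolError(payload.get("error", "unknown tool error"))


def output_text(output: dict) -> str:
    """Concatenate the text of all blocks whose type is "text"."""
    return "".join(block["text"] for block in output["blocks"] if block["type"] == "text")
```

A tool error leaves the episode in place, so an agent can catch `ToolError` and retry with different input, whereas a server-level SSE error usually means the request itself could not be served.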

Stateful Sessions

Sessions maintain state across tool calls:
# Episode state persists between calls
session.call_tool("bash", {"command": "echo 'hello' > file.txt"})
session.call_tool("bash", {"command": "cat file.txt"})
# → "hello"
What’s maintained:
  • Environment-specific state (variables, files, etc.)
  • Task context
  • Episode progress
What ends an episode:
  • Client calls POST /delete (explicit cleanup — calls environment’s teardown())
  • Session times out after 15 minutes of inactivity (automatic reaper)
The finished: true signal indicates the episode is logically complete from an RL perspective (e.g., the agent submitted a correct answer). It does not trigger automatic server-side teardown. The client should call POST /delete to free resources after receiving finished: true.
What’s NOT maintained:
  • State across different sessions
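Because finished: true does not free server resources, a client loop typically pairs it with an explicit delete. The sketch below assumes three injected callables that are not part of ORS: `call_tool` returns the `output` object of one tool call, `delete` issues `POST /delete`, and `policy` picks the next tool call. Summing rewards into an undiscounted return is also an assumption of this sketch.

```python
from typing import Callable, Optional, Tuple


def rollout(
    call_tool: Callable[[str, dict], dict],
    delete: Callable[[], None],
    policy: Callable[[Optional[dict]], Tuple[str, dict]],
    max_steps: int = 50,
) -> float:
    """Call tools until `finished` is true (or max_steps), then always delete."""
    total_reward = 0.0
    last_output: Optional[dict] = None
    try:
        for _ in range(max_steps):
            name, tool_input = policy(last_output)
            last_output = call_tool(name, tool_input)
            total_reward += last_output["reward"]
            if last_output["finished"]:
                break  # logically complete; the server has NOT torn anything down
    finally:
        # Explicit cleanup; otherwise the episode lingers until the inactivity timeout.
        delete()
    return total_reward
```

The `finally` clause ensures the episode is released even when a tool call raises, rather than waiting for the 15-minute reaper.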

Security Considerations

Secrets

Tasks can receive secrets via the secrets field:
POST /create
Body: {
  "env_name": "web_env",
  "task_spec": {...},
  "secrets": {
    "api_key": "sk-..."
  }
}
Secrets are passed to the environment when the episode instance is created with POST /create. The environment can use them in its internal logic. For example, a model provider API key can be used to initialise an LLM client within the environment for an LLM grader.
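On the client side, it is usually safer to read secret values from the process environment than to hard-code them in the request. The helper below is a sketch with hypothetical names (`build_create_body`, `secret_env_vars`); only the `env_name`, `task_spec`, and `secrets` fields come from the request shown above.

```python
import os
from typing import Mapping, Optional


def build_create_body(
    env_name: str,
    task_spec: dict,
    secret_env_vars: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build a POST /create body, filling `secrets` from environment variables.

    `secret_env_vars` maps each secret name to the variable that holds its value.
    """
    env = os.environ if environ is None else environ
    missing = [var for var in secret_env_vars.values() if var not in env]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "env_name": env_name,
        "task_spec": task_spec,
        "secrets": {name: env[var] for name, var in secret_env_vars.items()},
    }
```

Failing fast on a missing variable avoids creating an episode whose grader would only fail later, when it first tries to use the key.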

Isolation

ORS sessions are isolated at the protocol level: each session ID maps to its own environment instance with independent state. Deeper isolation, however, depends on the server implementation:
  • Protocol-level: Each session has its own environment instance and state
  • Implementation-level: Filesystem, network, and process isolation require additional infrastructure (e.g., containers, sandboxes)

Implementation Approaches

Option 1: Use ORS Python SDK

The Python SDK implements the full ORS protocol:
from ors import Environment, Server, tool

class MyEnvironment(Environment):
    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    # ... implement other methods

server = Server([MyEnvironment])
server.run(port=8080)
The SDK handles:
  • HTTP endpoint routing
  • Session management
  • SSE response delivery
  • Error handling

Option 2: Implement from Scratch

Implement the protocol in any language:
  1. Create HTTP server
  2. Implement required endpoints
  3. Manage session state
  4. Return tool outputs via SSE
See Implementation Guide for details.
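Two of the steps above, managing session state and returning tool outputs via SSE, can be sketched without any web framework. This is an illustrative sketch, not the SDK's implementation: `SessionStore`, `format_sse`, and `tool_result_stream` are hypothetical names, chunking of responses over 4KB is omitted, and a real server would also call the environment's teardown when a session is reaped.

```python
import json
import time
import uuid
from typing import Callable, Dict, Iterator, List

SESSION_TIMEOUT_SECONDS = 15 * 60  # inactivity timeout described above


def format_sse(event: str, data: str) -> str:
    """Serialize one SSE event; `data` must be a single line."""
    return f"event: {event}\ndata: {data}\n\n"


def tool_result_stream(task_id: str, result: dict) -> Iterator[str]:
    """Yield the frames of a small (unchunked) tool result."""
    yield format_sse("task_id", task_id)
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    yield format_sse("end", json.dumps(result))


class SessionStore:
    """In-memory session registry with inactivity-based expiry."""

    def __init__(self, timeout: float = SESSION_TIMEOUT_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def create_session(self) -> str:
        sid = str(uuid.uuid4())
        self._last_seen[sid] = self._clock()
        return sid

    def touch(self, sid: str) -> None:
        """Record activity (any request, including /ping); KeyError if unknown or expired."""
        self.reap()
        if sid not in self._last_seen:
            raise KeyError(sid)
        self._last_seen[sid] = self._clock()

    def reap(self) -> List[str]:
        """Drop sessions idle for longer than the timeout and return their IDs."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items() if now - seen > self._timeout]
        for sid in expired:
            del self._last_seen[sid]  # a real server would tear down the environment here
        return expired
```

A request handler would call `touch` with the X-Session-ID value before doing any work, and map the resulting `KeyError` to a 404 response.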

Next Steps

HTTP API Reference

Complete endpoint documentation

Data Types

Request and response schemas

Session Management

Deep dive on episodes and sessions

Key Takeaway: ORS is a straightforward HTTP protocol with RESTful endpoints for discovery and management, plus SSE for delivering tool execution results. It’s designed to be simple to implement in any language.