Protocol Overview

The Open Reward Standard is an HTTP-based protocol for connecting language model agents to reinforcement learning environments. It uses standard REST endpoints for control operations and Server-Sent Events (SSE) for streaming tool outputs.

Design Principles

1. Language-Agnostic

ORS runs over plain HTTP, so it can be implemented in any programming language:

Python, TypeScript, Go, Rust, Java, etc.
Any web framework or HTTP library
Standard REST patterns

2. Episode-Centric

The protocol is organized around RL episodes (sessions):

One session corresponds to one episode
An episode continues until a tool call returns finished: true
State persists across the tool calls within an episode

3. Tool-Based Interaction

All agent actions are tool calls. Tools are:

Discovered via GET /{env_name}/tools
Executed via POST /{env_name}/call
Answered with a structured output that includes a reward and a finished flag

Protocol Architecture

┌─────────────┐                  ┌─────────────┐
│             │   HTTP + SSE     │             │
│    Agent    │ ◄──────────────► │ ORS Server  │
│             │                  │             │
└─────────────┘                  └─────────────┘
       │                                │
       │                                │
       │                                ▼
       │                         ┌─────────────┐
       │                         │ Environment │
       └────── Episodes ─────────┤   Logic     │
                                 └─────────────┘

Key Components

Agent Side:

Makes HTTP requests
Handles SSE streams
Maintains session ID

ORS Server:

Implements HTTP endpoints
Manages episode state
Executes tools and returns rewards

Episode Lifecycle

An episode (session) follows this lifecycle:

1. Create Session
   ↓
2. Create Episode Instance
   ↓
3. Get Prompt (initial state)
   ↓
4. Call Tools (actions)
   ├─ Receive reward
   ├─ Check finished flag
   └─ If not finished, repeat step 4
   ↓
5. Delete Episode (cleanup)
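The repeat-until-finished loop in step 4 can be sketched in Python as follows. This is a minimal sketch, not part of the ORS specification: `run_episode`, `call_tool`, `choose_action`, and `max_steps` are hypothetical names, and `call_tool` stands in for whatever transport issues POST /{env_name}/call and returns the `output` object of the response.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical aliases for this sketch (not defined by ORS).
ToolOutput = Dict[str, Any]          # the "output" object of a tool call
Action = Tuple[str, Dict[str, Any]]  # (tool name, tool input)


def run_episode(
    call_tool: Callable[[str, Dict[str, Any]], ToolOutput],
    choose_action: Callable[[int], Action],
    max_steps: int = 50,
) -> float:
    """Repeat step 4 until the environment reports finished: true.

    Returns the sum of the rewards received. `max_steps` is a safety cap
    added for this sketch; the protocol itself does not define one.
    """
    total_reward = 0.0
    for step in range(max_steps):
        name, tool_input = choose_action(step)
        output = call_tool(name, tool_input)
        # Assumption: a missing or null reward counts as 0.0.
        total_reward += output.get("reward") or 0.0
        if output.get("finished"):
            break
    return total_reward


# Usage with a fake environment that finishes once the answer is "4".
def fake_call_tool(name: str, tool_input: Dict[str, Any]) -> ToolOutput:
    done = tool_input["answer"] == "4"
    return {"reward": 1.0 if done else 0.0, "finished": done}


answers = ["5", "4", "unused"]
total = run_episode(fake_call_tool, lambda step: ("submit", {"answer": answers[step]}))
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library, which also makes it easy to exercise against a fake environment as above.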

Example Flow

# 1. Create session ID
POST /create_session
→ {"sid": "abc-123"}

# 2. Create episode instance with task
POST /create
Headers: X-Session-ID: abc-123
Body: {
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?", "answer": "4"},
  "secrets": {}
}
→ {"sid": "abc-123"}

# 3. Get initial prompt
GET /math/prompt
Headers: X-Session-ID: abc-123
→ [{"text": "What is 2+2?", "detail": null, "type": "text"}]

# 4. Call tool
POST /math/call
Headers:
  X-Session-ID: abc-123
  Accept: text/event-stream
Body: {"name": "submit", "input": {"answer": "4"}}
→ SSE stream (see Tool Execution with SSE for the format)

# 5. Delete episode
POST /delete
Headers: X-Session-ID: abc-123
→ {"sid": "abc-123"}
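The five requests above could be assembled with Python's standard library roughly as shown here. This is a sketch under stated assumptions: `build_request` is a hypothetical helper, the server address is made up, and sending a `Content-Type: application/json` header with JSON bodies is an assumption rather than something this page specifies. The requests are only built, not sent.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    method: str,
    path: str,
    sid: Optional[str] = None,
    body: Optional[Dict[str, Any]] = None,
    stream: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) one ORS request."""
    headers: Dict[str, str] = {}
    if sid is not None:
        headers["X-Session-ID"] = sid
    if stream:
        headers["Accept"] = "text/event-stream"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"  # assumed by this sketch
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


BASE_URL = "http://localhost:8080"  # assumed server address
SID = "abc-123"  # in practice, read from the POST /create_session response

flow = [
    build_request(BASE_URL, "POST", "/create_session"),
    build_request(BASE_URL, "POST", "/create", SID, {
        "env_name": "math",
        "task_spec": {"question": "What is 2+2?", "answer": "4"},
        "secrets": {},
    }),
    build_request(BASE_URL, "GET", "/math/prompt", SID),
    build_request(BASE_URL, "POST", "/math/call", SID,
                  {"name": "submit", "input": {"answer": "4"}}, stream=True),
    build_request(BASE_URL, "POST", "/delete", SID),
]
# Each request would then be sent with urllib.request.urlopen(request).
```

Note that `urllib.request.Request` stores header names in capitalized form (for example `X-session-id`); HTTP header names are case-insensitive, so this does not change what the server sees.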

Endpoint Categories

ORS endpoints fall into four categories:

1. Discovery Endpoints

Get information about the environment:

GET /list_environments       # List available environments
GET /{env_name}/tools        # List available tools
GET /{env_name}/splits       # List available splits
POST /{env_name}/tasks       # List tasks for a split

These are stateless; no session is required.

2. Session Management

Create and manage episodes:

POST /create_session         # Generate session ID
POST /create                 # Create episode instance
POST /delete                 # Delete episode
POST /delete_session         # Clean up the session (optional)
POST /ping                   # Keep session alive

All of these except POST /create_session require the X-Session-ID header.

3. Episode Interaction

Interact with the active episode:

GET /{env_name}/prompt       # Get initial prompt
POST /{env_name}/call        # Call a tool

These require X-Session-ID and an active episode.

4. Health

GET /health                  # Server health check
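The four categories can also be written down as data, which is handy for a client or server that needs to decide whether a request must carry X-Session-ID. The table below is a sketch: `Endpoint`, `ENDPOINTS`, and `match` are hypothetical names, and treating GET /health as session-free is an assumption, since this page does not say so explicitly.

```python
from typing import List, NamedTuple, Optional


class Endpoint(NamedTuple):
    method: str
    pattern: str
    category: str
    needs_session: bool


ENDPOINTS: List[Endpoint] = [
    Endpoint("GET", "/list_environments", "discovery", False),
    Endpoint("GET", "/{env_name}/tools", "discovery", False),
    Endpoint("GET", "/{env_name}/splits", "discovery", False),
    Endpoint("POST", "/{env_name}/tasks", "discovery", False),
    Endpoint("POST", "/create_session", "session", False),
    Endpoint("POST", "/create", "session", True),
    Endpoint("POST", "/delete", "session", True),
    Endpoint("POST", "/delete_session", "session", True),
    Endpoint("POST", "/ping", "session", True),
    Endpoint("GET", "/{env_name}/prompt", "episode", True),
    Endpoint("POST", "/{env_name}/call", "episode", True),
    Endpoint("GET", "/health", "health", False),  # assumption: no session needed
]


def match(method: str, path: str) -> Optional[Endpoint]:
    """Return the endpoint a request maps to, or None if nothing matches."""
    segments = path.strip("/").split("/")
    for endpoint in ENDPOINTS:
        if endpoint.method != method:
            continue
        pattern = endpoint.pattern.strip("/").split("/")
        if len(pattern) != len(segments):
            continue
        # "{env_name}" matches any single path segment.
        if all(p == "{env_name}" or p == s for p, s in zip(pattern, segments)):
            return endpoint
    return None
```

Fixed routes have one path segment and per-environment routes have two, so the lookup is unambiguous regardless of table order.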

Session Management

X-Session-ID Header

Episodes are identified by a session ID passed in the X-Session-ID header:

POST /create
X-Session-ID: abc-123

Flow:

1. Call POST /create_session to get a session ID
2. Use that ID in all subsequent requests
3. Server maintains episode state for that ID
4. Call POST /delete to clean up

Session Timeout

Sessions automatically expire after 15 minutes of inactivity. To prevent timeout:

POST /ping
X-Session-ID: abc-123

Call /ping periodically (e.g., every 5 minutes) to keep the session alive.
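One way to send the periodic ping without blocking the agent is a small background thread. The helper below is a sketch: `start_keepalive` and `send_ping` are hypothetical names, and `send_ping` stands in for whatever code issues POST /ping with the X-Session-ID header.

```python
import threading
import time
from typing import Callable


def start_keepalive(send_ping: Callable[[], None], interval_seconds: float = 300.0) -> threading.Event:
    """Call send_ping every interval_seconds on a daemon thread.

    Returns an Event; set it to stop the pings.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout (time to ping) and True once stop is set.
        while not stop.wait(interval_seconds):
            send_ping()

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Usage with a fake ping and a very short interval so the effect is visible quickly.
pings = []
stop = start_keepalive(lambda: pings.append("ping"), interval_seconds=0.01)
time.sleep(0.2)
stop.set()
```

The default of 300 seconds matches the five-minute suggestion above and leaves a comfortable margin under the 15-minute timeout.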

Tool Execution with SSE

Tool calls use Server-Sent Events for streaming responses:

POST /{env_name}/call
Headers:
  X-Session-ID: abc-123
  Accept: text/event-stream
Body: {
  "name": "bash",
  "input": {"command": "ls -la"}
}

Response (SSE stream):

event: task_id
data: 877bb56c594e4a0f921ad55c439a3762

event: end
data: {"ok": true, "output": {"blocks": [{"text": "Output text", "detail": null, "type": "text"}], "metadata": null, "reward": 0.0, "finished": false}}
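A client has to split this stream into events and decode the JSON payload of the final one. The parser below is a minimal sketch that handles the `event:` and `data:` fields shown above; `iter_sse_events` and `read_tool_result` are hypothetical names, and whether a server emits other event types between `task_id` and `end` is not covered here.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of an SSE stream."""
    event = "message"  # SSE default when no "event:" field is given
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a heartbeat
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    if data:
        # Lenient choice for this sketch: accept a final event with no trailing blank line.
        yield event, "\n".join(data)


def read_tool_result(lines: Iterable[str]) -> Dict[str, Any]:
    """Return the decoded payload of the 'end' event."""
    for event, data in iter_sse_events(lines):
        if event == "end":
            return json.loads(data)
    raise ValueError("stream closed before an 'end' event")


# Usage with the example stream from this section.
stream = [
    "event: task_id\n",
    "data: 877bb56c594e4a0f921ad55c439a3762\n",
    "\n",
    "event: end\n",
    'data: {"ok": true, "output": {"blocks": [{"text": "Output text", "detail": null, '
    '"type": "text"}], "metadata": null, "reward": 0.0, "finished": false}}\n',
    "\n",
]
result = read_tool_result(stream)
```

With a real connection, `lines` would be the decoded lines of the HTTP response body, read incrementally so that events are handled as they arrive.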

Why SSE?

Server-Sent Events enable:

Streaming long-running operations: Bash commands, LLM calls, etc.
Progressive output: Send results as they’re generated
Clean error handling: Structured error messages
Standard protocol: Built into browsers and HTTP libraries

Error Handling

HTTP Status Codes

ORS uses standard HTTP status codes:

200 OK: Successful request
400 Bad Request: Invalid input
404 Not Found: Session/environment/tool not found
500 Internal Server Error: Server error

Tool Errors

Tool execution errors are reported inside the ToolOutput, with ok set to false:

{
  "ok": false,
  "error": "Tool 'submit' failed: Invalid answer format"
}

Successful tool calls set ok to true and carry the result in output:

{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
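On the client side, the two shapes above can be folded into one function that either returns a typed result or raises. This is a sketch: `ToolCallError`, `ToolResult`, and `parse_tool_result` are hypothetical names, and treating a missing or null reward as 0.0 is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class ToolCallError(Exception):
    """Raised when a tool call reports ok: false."""


@dataclass
class ToolResult:
    blocks: List[Dict[str, Any]]
    metadata: Optional[Dict[str, Any]]
    reward: float
    finished: bool


def parse_tool_result(payload: Dict[str, Any]) -> ToolResult:
    """Convert a decoded tool-call payload into a ToolResult or raise ToolCallError."""
    if not payload.get("ok"):
        raise ToolCallError(payload.get("error", "unknown tool error"))
    output = payload["output"]
    reward = output.get("reward")
    return ToolResult(
        blocks=output.get("blocks", []),
        metadata=output.get("metadata"),
        reward=0.0 if reward is None else float(reward),  # assumption: null means 0.0
        finished=bool(output.get("finished", False)),
    )


# Usage with the two payloads shown above.
success = parse_tool_result({
    "ok": True,
    "output": {
        "blocks": [{"text": "Correct!", "detail": None, "type": "text"}],
        "metadata": None,
        "reward": 1.0,
        "finished": True,
    },
})

try:
    parse_tool_result({"ok": False, "error": "Tool 'submit' failed: Invalid answer format"})
    failure_message = None
except ToolCallError as exc:
    failure_message = str(exc)
```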

Stateful Sessions

Sessions maintain state across tool calls:

# Illustrative pseudocode: `session` stands for a client-side handle to an active episode.
# Episode state persists between calls
session.call_tool("bash", {"command": "echo 'hello' > file.txt"})
session.call_tool("bash", {"command": "cat file.txt"})
# → "hello"

What’s maintained:

Environment-specific state (variables, files, etc.)
Task context
Episode progress

What’s NOT maintained:

State after finished: true
State after session timeout
State across different sessions

Concurrency

Multiple agents can interact with the same ORS server concurrently:

Each session is independent
Sessions are isolated from each other
Server handles concurrent requests

Agent A (session-1) ─┐
                     ├─► ORS Server
Agent B (session-2) ─┘
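On the server side, one simple way to keep sessions independent is a lock-protected map from session ID to per-session state. The class below is a sketch, not how any particular ORS server is implemented; `SessionStore` and its methods are hypothetical names.

```python
import threading
import uuid
from typing import Any, Dict


class SessionStore:
    """Thread-safe map from session ID to that session's private state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def create(self) -> str:
        sid = uuid.uuid4().hex
        with self._lock:
            self._sessions[sid] = {}
        return sid

    def get(self, sid: str) -> Dict[str, Any]:
        # Raises KeyError for unknown IDs; a server would map that to 404.
        with self._lock:
            return self._sessions[sid]

    def delete(self, sid: str) -> None:
        with self._lock:
            self._sessions.pop(sid, None)


# Usage: two agents create sessions concurrently and write to their own state.
store = SessionStore()
results: Dict[str, str] = {}


def agent(label: str) -> None:
    sid = store.create()
    store.get(sid)["owner"] = label
    results[label] = sid


threads = [threading.Thread(target=agent, args=(label,)) for label in ("A", "B")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The lock only guards the map itself. State that a single session mutates from several concurrent requests would need its own synchronization.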

Security Considerations

Secrets

Tasks can receive secrets via the secrets field:

POST /create
Body: {
  "env_name": "web_env",
  "task_spec": {...},
  "secrets": {
    "api_key": "sk-..."
  }
}

Secrets are:

Passed to the environment at episode creation
Available to tools during execution
Not logged or persisted
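A server that logs incoming requests needs to mask the secrets field before writing anything out. The helper below is one possible approach, shown as a sketch; `redact_secrets` is a hypothetical name and the `"***"` placeholder is arbitrary.

```python
import copy
from typing import Any, Dict


def redact_secrets(create_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a POST /create body that is safe to log."""
    safe = copy.deepcopy(create_body)
    if isinstance(safe.get("secrets"), dict):
        # Keep the key names for debugging, but never the values.
        safe["secrets"] = {key: "***" for key in safe["secrets"]}
    return safe


# Usage
body = {
    "env_name": "web_env",
    "task_spec": {},
    "secrets": {"api_key": "sk-example"},
}
loggable = redact_secrets(body)
```

Working on a deep copy leaves the original body untouched, so the environment still receives the real values.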

Isolation

Episodes should be isolated from one another:

File system changes in one episode should not affect others
Environment variables should be episode-specific
Network access should be controlled per episode

Implementation Approaches

Option 1: Use OpenReward Python SDK

The Python SDK implements the full ORS protocol:

from openreward.environments import Environment, Server, tool

class MyEnvironment(Environment):
    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    # ... implement other methods

server = Server([MyEnvironment])
server.run(port=8080)

The SDK handles:

HTTP endpoint routing
Session management
SSE streaming
Error handling

Option 2: Implement from Scratch

Implement the protocol in any language:

Create HTTP server
Implement required endpoints
Manage session state
Stream tool outputs via SSE

See Implementation Guide for details.
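To give a sense of scale, the sketch below wires three of the endpoints into Python's built-in `http.server`. It is a starting point under stated assumptions, not a conforming implementation: the response bodies for /health and /ping and the error body shape are invented for illustration, there is no session timeout, and the per-environment endpoints and SSE streaming are left out.

```python
import json
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Optional, Tuple

SESSIONS: Dict[str, Dict[str, Any]] = {}  # session ID -> episode state


def route(method: str, path: str, sid: Optional[str]) -> Tuple[int, Dict[str, Any]]:
    """Map a request to (HTTP status, JSON payload)."""
    if (method, path) == ("GET", "/health"):
        return 200, {"status": "ok"}  # assumed response shape
    if (method, path) == ("POST", "/create_session"):
        new_sid = uuid.uuid4().hex
        SESSIONS[new_sid] = {}
        return 200, {"sid": new_sid}
    if (method, path) == ("POST", "/ping"):
        if sid not in SESSIONS:
            return 404, {"error": "session not found"}  # assumed error shape
        return 200, {"sid": sid}  # assumed response shape
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def _respond(self) -> None:
        status, payload = route(self.command, self.path, self.headers.get("X-Session-ID"))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = _respond
    do_POST = _respond


def serve(port: int = 8080) -> None:
    """Start the server and block until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Keeping the routing logic in a plain function (`route`) separate from the HTTP handler makes it testable without opening a socket; calling `serve()` starts the actual server.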

Next Steps

HTTP API Reference: Complete endpoint documentation
Data Types: Request and response schemas
Session Management: Deep dive on episodes and sessions

Key Takeaway: ORS is a straightforward HTTP protocol with RESTful endpoints for discovery and management, plus SSE for streaming tool execution. It’s designed to be simple to implement in any language.
