
Protocol Overview

The Open Reward Standard is an HTTP-based protocol for connecting language model agents to reinforcement learning environments. It uses standard REST endpoints for control operations and Server-Sent Events (SSE) for streaming tool outputs.

Design Principles

1. Language-Agnostic

ORS uses HTTP, making it implementable in any programming language:
  • Python, TypeScript, Go, Rust, Java, etc.
  • Any web framework or HTTP library
  • Standard REST patterns

2. Episode-Centric

The protocol is organized around RL episodes (sessions):
  • One session = one episode
  • An episode continues until a tool call returns finished: true
  • Stateful interaction across multiple tool calls

3. Tool-Based Interaction

All agent actions are tool calls:
  • Discovered via GET /{env_name}/tools
  • Executed via POST /{env_name}/call
  • Return structured outputs with rewards

Protocol Architecture

┌─────────────┐                  ┌─────────────┐
│             │   HTTP + SSE     │             │
│    Agent    │ ◄──────────────► │ ORS Server  │
│             │                  │             │
└─────────────┘                  └─────────────┘
       │                                │
       │                                │
       │                                ▼
       │                         ┌─────────────┐
       │                         │ Environment │
       └────── Episodes ─────────┤   Logic     │
                                 └─────────────┘

Key Components

Agent Side:
  • Makes HTTP requests
  • Handles SSE streams
  • Maintains session ID

ORS Server:
  • Implements HTTP endpoints
  • Manages episode state
  • Executes tools and returns rewards

Episode Lifecycle

An episode (session) follows this lifecycle:
1. Create Session

2. Create Episode Instance

3. Get Prompt (initial state)

4. Call Tools (actions)
   ├─ Receive reward
   ├─ Check finished flag
   └─ If not finished, repeat step 4

5. Delete Episode (cleanup)
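The five lifecycle steps map onto a simple agent-side loop. The sketch below makes several assumptions. EpisodeClient is a hypothetical interface, and its method names are not part of ORS or the OpenReward SDK. call_tool is assumed to return the inner "output" object of a successful ToolOutput. Summing per-call rewards into one episode return is also an assumption. FakeMathClient is an in-memory stand-in so that the loop can run without a server.

```python
from typing import Callable, Protocol


class EpisodeClient(Protocol):
    """Hypothetical client interface; these method names are illustrative, not an ORS or SDK API."""

    def create_session(self) -> str: ...
    def create(self, sid: str, env_name: str, task_spec: dict) -> None: ...
    def get_prompt(self, sid: str, env_name: str) -> list: ...
    def call_tool(self, sid: str, env_name: str, name: str, tool_input: dict) -> dict: ...
    def delete(self, sid: str) -> None: ...


# A policy maps (prompt blocks, outputs so far) to the next (tool name, tool input).
Policy = Callable[[list, list], tuple]


def run_episode(client: EpisodeClient, env_name: str, task_spec: dict,
                policy: Policy, max_steps: int = 50) -> float:
    """Walk the lifecycle steps and return the summed reward (summing is an assumption)."""
    sid = client.create_session()                      # 1. create session
    client.create(sid, env_name, task_spec)            # 2. create episode instance
    try:
        prompt = client.get_prompt(sid, env_name)      # 3. get prompt (initial state)
        outputs, total = [], 0.0
        for _ in range(max_steps):                     # 4. call tools until finished
            name, tool_input = policy(prompt, outputs)
            output = client.call_tool(sid, env_name, name, tool_input)
            outputs.append(output)
            total += output.get("reward") or 0.0
            if output.get("finished"):
                break
        return total
    finally:
        client.delete(sid)                             # 5. delete episode, even on error


class FakeMathClient:
    """In-memory stand-in used only to exercise run_episode without a server."""

    def __init__(self):
        self.deleted = []

    def create_session(self):
        return "abc-123"

    def create(self, sid, env_name, task_spec):
        self.task = task_spec

    def get_prompt(self, sid, env_name):
        return [{"text": self.task["question"], "detail": None, "type": "text"}]

    def call_tool(self, sid, env_name, name, tool_input):
        correct = tool_input.get("answer") == self.task["answer"]
        return {"blocks": [], "metadata": None,
                "reward": 1.0 if correct else 0.0, "finished": correct}

    def delete(self, sid):
        self.deleted.append(sid)
```

The max_steps cap is a client-side safeguard against an environment that never sets finished. The try/finally ensures that step 5 runs even if a tool call raises.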

Example Flow

# 1. Create session ID
POST /create_session
→ {"sid": "abc-123"}

# 2. Create episode instance with task
POST /create
Headers: X-Session-ID: abc-123
Body: {
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?", "answer": "4"},
  "secrets": {}
}
→ {"sid": "abc-123"}

# 3. Get initial prompt
GET /math/prompt
Headers: X-Session-ID: abc-123
→ [{"text": "What is 2+2?", "detail": null, "type": "text"}]

# 4. Call tool
POST /math/call
Headers: X-Session-ID: abc-123
         Accept: text/event-stream
Body: {"name": "submit", "input": {"answer": "4"}}
→ SSE stream (see below for format)

# 5. Delete episode
POST /delete
Headers: X-Session-ID: abc-123
→ {"sid": "abc-123"}
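The same five requests can be built with nothing but the Python standard library. This is a sketch under stated assumptions. The server is assumed to listen on http://localhost:8080, matching the SDK example's port. ors_request and send_json are hypothetical helper names. ors_request only constructs urllib Request objects. Sending them with urllib.request.urlopen is left to the caller, and the SSE /call response needs a stream parser rather than send_json.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: wherever your ORS server listens


def ors_request(method, path, sid=None, body=None, accept_sse=False):
    """Build (but do not send) a urllib Request for one ORS endpoint."""
    headers = {}
    data = None
    if sid is not None:
        headers["X-Session-ID"] = sid
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if accept_sse:
        headers["Accept"] = "text/event-stream"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send_json(req):
    """Send a request and decode a JSON response (not suitable for the SSE /call endpoint)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# The five steps of the example flow, expressed as requests:
steps = [
    ors_request("POST", "/create_session"),
    ors_request("POST", "/create", sid="abc-123", body={
        "env_name": "math",
        "task_spec": {"question": "What is 2+2?", "answer": "4"},
        "secrets": {},
    }),
    ors_request("GET", "/math/prompt", sid="abc-123"),
    ors_request("POST", "/math/call", sid="abc-123",
                body={"name": "submit", "input": {"answer": "4"}}, accept_sse=True),
    ors_request("POST", "/delete", sid="abc-123"),
]
```

urllib normalizes header names to "X-session-id" when storing them. HTTP header names are case-insensitive, so a compliant server still sees the session ID.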

Endpoint Categories

ORS endpoints fall into four categories:

1. Discovery Endpoints

Get information about the environment:
GET /list_environments        # List available environments
GET /{env_name}/tools        # List available tools
GET /{env_name}/splits       # List available splits
POST /{env_name}/tasks       # List tasks for a split
These are stateless: no session is required.

2. Session Management

Create and manage episodes:
POST /create_session         # Generate session ID
POST /create                 # Create episode instance
POST /delete                 # Delete episode
POST /delete_session         # Delete session (optional cleanup)
POST /ping                   # Keep session alive
These require the X-Session-ID header (except create_session).

3. Episode Interaction

Interact with the active episode:
GET /{env_name}/prompt       # Get initial prompt
POST /{env_name}/call        # Call a tool
These require X-Session-ID and an active episode.

4. Health

GET /health                  # Server health check
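A server can encode the four categories as a routing table that records which endpoints need X-Session-ID. The sketch below transcribes the endpoint lists above. ENDPOINTS and classify are hypothetical names. Treating {env_name} as any single path segment is an assumption about valid environment names.

```python
import re

# (HTTP method, path pattern, category, requires X-Session-ID)
ENDPOINTS = [
    ("GET",  r"/list_environments",        "discovery", False),
    ("GET",  r"/(?P<env>[^/]+)/tools",     "discovery", False),
    ("GET",  r"/(?P<env>[^/]+)/splits",    "discovery", False),
    ("POST", r"/(?P<env>[^/]+)/tasks",     "discovery", False),
    ("POST", r"/create_session",           "session",   False),
    ("POST", r"/create",                   "session",   True),
    ("POST", r"/delete",                   "session",   True),
    ("POST", r"/delete_session",           "session",   True),
    ("POST", r"/ping",                     "session",   True),
    ("GET",  r"/(?P<env>[^/]+)/prompt",    "episode",   True),  # also needs an active episode
    ("POST", r"/(?P<env>[^/]+)/call",      "episode",   True),  # also needs an active episode
    ("GET",  r"/health",                   "health",    False),
]


def classify(method, path):
    """Return (category, requires_session, env_name), or None if no route matches."""
    for m, pattern, category, needs_sid in ENDPOINTS:
        match = re.fullmatch(pattern, path)
        if m == method and match:
            return category, needs_sid, match.groupdict().get("env")
    return None
```

Using re.fullmatch keeps "/create" from matching "/create_session". A None result corresponds to a 404 response.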

Session Management

X-Session-ID Header

Episodes are identified by a session ID passed in the X-Session-ID header:
POST /create
X-Session-ID: abc-123
Flow:
  1. Call POST /create_session to get a session ID
  2. Use that ID in all subsequent requests
  3. Server maintains episode state for that ID
  4. Call POST /delete to clean up

Session Timeout

Sessions automatically expire after 15 minutes of inactivity. To prevent timeout:
POST /ping
X-Session-ID: abc-123
Call /ping periodically (e.g., every 5 minutes) to keep the session alive.
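The timeout can be illustrated on both sides with one sketch. SessionStore is a hypothetical server-side structure that expires sessions after 15 minutes without activity. It assumes that every request, not only POST /ping, resets the timer. start_keepalive is a hypothetical client helper that calls a ping function at a fixed interval until it is told to stop. The clock is injectable, so expiry can be checked without waiting.

```python
import threading
import time

SESSION_TIMEOUT_S = 15 * 60   # 15 minutes of inactivity
PING_INTERVAL_S = 5 * 60      # suggested client ping interval


class SessionStore:
    """Hypothetical server-side store that expires idle sessions."""

    def __init__(self, clock=time.monotonic, timeout=SESSION_TIMEOUT_S):
        self._clock = clock
        self._timeout = timeout
        self._last_seen = {}   # sid -> time of last request

    def touch(self, sid):
        """Record activity (assumption: any request, including POST /ping, counts)."""
        self._last_seen[sid] = self._clock()

    def is_alive(self, sid):
        last = self._last_seen.get(sid)
        return last is not None and self._clock() - last < self._timeout

    def expire_idle(self):
        """Drop sessions idle for at least the timeout and return their IDs."""
        now = self._clock()
        dead = [sid for sid, t in self._last_seen.items() if now - t >= self._timeout]
        for sid in dead:
            del self._last_seen[sid]
        return dead


def start_keepalive(ping, interval=PING_INTERVAL_S):
    """Client side: call ping() every `interval` seconds until the returned Event is set."""
    stop = threading.Event()

    def loop():
        while not stop.wait(interval):   # wait() returns False on timeout, True once stopped
            ping()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

In a real client, ping would send POST /ping with the session's X-Session-ID header. Here it is any zero-argument callable.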

Tool Execution with SSE

Tool calls use Server-Sent Events for streaming responses:
POST /{env_name}/call
Headers:
  X-Session-ID: abc-123
  Accept: text/event-stream
Body: {
  "name": "bash",
  "input": {"command": "ls -la"}
}
Response (SSE stream):
event: task_id
data: 877bb56c594e4a0f921ad55c439a3762

event: end
data: {"ok": true, "output": {"blocks": [{"text": "Output text", "detail": null, "type": "text"}], "metadata": null, "reward": 0.0, "finished": false}}

Why SSE?

Server-Sent Events enable:
  • Streaming long-running operations: Bash commands, LLM calls, etc.
  • Progressive output: Send results as they’re generated
  • Clean error handling: Structured error messages
  • Standard protocol: Built into browsers and HTTP libraries

Error Handling

HTTP Status Codes

Standard HTTP status codes:
  • 200 OK: Successful request
  • 400 Bad Request: Invalid input
  • 404 Not Found: Session/environment/tool not found
  • 500 Internal Server Error: Server error

Tool Errors

Tool execution errors are returned in the ToolOutput:
{
  "ok": false,
  "error": "Tool 'submit' failed: Invalid answer format"
}
Successful tool calls:
{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
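An agent usually converts these two shapes into either a step result or an exception. In the sketch below, ToolCallError and StepResult are hypothetical names, not SDK types. Joining only the text blocks, and treating a missing or null reward as 0.0, are assumptions.

```python
from dataclasses import dataclass


class ToolCallError(Exception):
    """Raised when a ToolOutput has ok: false (hypothetical exception type)."""


@dataclass
class StepResult:
    text: str
    reward: float
    finished: bool


def unpack_tool_output(payload: dict) -> StepResult:
    """Convert a ToolOutput dict into a StepResult, raising ToolCallError on ok: false."""
    if not payload.get("ok"):
        raise ToolCallError(payload.get("error", "unknown tool error"))
    out = payload["output"]
    # Keep only text blocks; other block types are skipped in this sketch.
    text = "\n".join(b["text"] for b in out.get("blocks", []) if b.get("type") == "text")
    return StepResult(text=text,
                      reward=out.get("reward") or 0.0,      # assumption: null reward means 0.0
                      finished=bool(out.get("finished")))
```

Raising on ok: false keeps tool failures, which arrive inside a normal stream, distinct from HTTP-level failures such as 404 or 500.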

Stateful Sessions

Sessions maintain state across tool calls:
# Episode state persists between calls
session.call_tool("bash", {"command": "echo 'hello' > file.txt"})
session.call_tool("bash", {"command": "cat file.txt"})
# → "hello"
What’s maintained:
  • Environment-specific state (variables, files, etc.)
  • Task context
  • Episode progress
What’s NOT maintained:
  • State after finished: true
  • State after session timeout
  • State across different sessions
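On the server, "state persists between calls" means that each session's episode object lives until the episode finishes or is deleted. The toy below is hypothetical. Its tools write_file, read_file, and submit stand in for the bash example, and an in-memory dict plays the role of the file system. Rejecting calls after finished: true with a tool error is one possible reading of "not maintained". Raising KeyError for unknown tools, so that the HTTP layer can return 404, is also an assumption.

```python
class NotesEpisode:
    """Hypothetical episode: its state (a dict of files) lives as long as the episode."""

    def __init__(self, task_spec):
        self.task_spec = task_spec   # task context
        self.files = {}              # environment-specific state
        self.finished = False        # episode progress

    def call(self, name, tool_input):
        if self.finished:
            # Assumption: calls after finished: true are rejected as a tool error.
            return {"ok": False, "error": "episode already finished"}
        if name == "write_file":
            self.files[tool_input["path"]] = tool_input["content"]
            return self._output("written", 0.0)
        if name == "read_file":
            path = tool_input["path"]
            if path not in self.files:
                return {"ok": False, "error": f"no such file: {path}"}
            return self._output(self.files[path], 0.0)
        if name == "submit":
            self.finished = True
            correct = tool_input.get("answer") == self.task_spec.get("answer")
            return self._output("Correct!" if correct else "Incorrect.", 1.0 if correct else 0.0)
        raise KeyError(name)  # unknown tool: the HTTP layer would answer 404

    def _output(self, text, reward):
        return {"ok": True, "output": {
            "blocks": [{"text": text, "detail": None, "type": "text"}],
            "metadata": None, "reward": reward, "finished": self.finished}}
```

Each session gets its own NotesEpisode, so files written in one session are never visible in another.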

Concurrency

Multiple agents can interact with the same ORS server concurrently:
  • Each session is independent
  • Sessions are isolated from each other
  • Server handles concurrent requests
Agent A (session-1) ─┐
                     ├─► ORS Server
Agent B (session-2) ─┘
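Session independence is what makes concurrent agents safe. In the sketch below, InMemoryServer is a hypothetical stand-in for an ORS server, and a counter stands in for episode state. Two agents run in parallel threads, each with its own session. Because each session's state is separate, each agent's final result does not depend on how the threads interleave.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class InMemoryServer:
    """Hypothetical stand-in for an ORS server: one independent state dict per session."""

    def __init__(self):
        self._lock = threading.Lock()   # guards the shared session map
        self._sessions = {}

    def create_session(self):
        sid = str(uuid.uuid4())
        with self._lock:
            self._sessions[sid] = {"counter": 0}
        return sid

    def call(self, sid, increment):
        with self._lock:
            state = self._sessions[sid]   # KeyError here would map to 404 Not Found
            state["counter"] += increment
            return state["counter"]


def agent(server, increment, steps):
    """One agent: its own session, several tool calls, returns its final state."""
    sid = server.create_session()
    result = None
    for _ in range(steps):
        result = server.call(sid, increment)
    return result


server = InMemoryServer()
with ThreadPoolExecutor(max_workers=2) as pool:
    a = pool.submit(agent, server, 1, 100)    # Agent A, session 1
    b = pool.submit(agent, server, 10, 100)   # Agent B, session 2
    results = (a.result(), b.result())
```

A single global lock keeps the sketch short but serializes every call. A real server would more likely lock per session, so that slow tools in one episode do not block others.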

Security Considerations

Secrets

Tasks can receive secrets via the secrets field:
POST /create
Body: {
  "env_name": "web_env",
  "task_spec": {...},
  "secrets": {
    "api_key": "sk-..."
  }
}
Secrets are:
  • Passed to the environment at episode creation
  • Available to tools during execution
  • Not logged or persisted
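One way for a server to honor "not logged" is to log only a redacted copy of the /create body, while passing the real values to the episode in memory. In the sketch below, redact_secrets and handle_create are hypothetical names, and the "***" mask is an arbitrary choice. Avoiding persistence also requires never writing the episodes dict to disk, which this sketch does not do.

```python
import copy
import json
import logging

log = logging.getLogger("ors.server")


def redact_secrets(body: dict) -> dict:
    """Return a copy of a /create body that is safe to log: secret values are masked."""
    safe = copy.deepcopy(body)
    if isinstance(safe.get("secrets"), dict):
        safe["secrets"] = {key: "***" for key in safe["secrets"]}
    return safe


def handle_create(body: dict, episodes: dict, sid: str) -> None:
    """Hypothetical /create handler: log the redacted body, keep real secrets in memory only."""
    log.info("create %s %s", sid, json.dumps(redact_secrets(body)))
    episodes[sid] = {
        "task_spec": body["task_spec"],
        "secrets": dict(body.get("secrets", {})),   # available to tools during execution
    }
```

Masking the values while keeping the keys still shows in logs which secrets a task received, without revealing them.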

Isolation

Episodes should be isolated:
  • File system changes in one episode don’t affect others
  • Environment variables are episode-specific
  • Network access is controlled per-episode
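The first two isolation properties can be approximated with standard-library tools. Each episode gets its own temporary working directory and its own copy of the environment variables. IsolatedEpisode is a hypothetical class. Note the sketch's limits. It performs no path-traversal checks, and it does not control network access, which in practice needs OS-level mechanisms such as containers or network namespaces.

```python
import os
import subprocess
import sys
import tempfile


class IsolatedEpisode:
    """Hypothetical per-episode sandbox: private working directory and environment."""

    def __init__(self, extra_env=None):
        self._tmp = tempfile.TemporaryDirectory(prefix="ors-episode-")
        self.workdir = self._tmp.name
        # Copy the server's environment instead of mutating os.environ.
        self.env = {**os.environ, **(extra_env or {})}

    def write(self, relpath, text):
        # No path-traversal checks: a real sandbox must reject paths like "../x".
        with open(os.path.join(self.workdir, relpath), "w", encoding="utf-8") as f:
            f.write(text)

    def read(self, relpath):
        with open(os.path.join(self.workdir, relpath), encoding="utf-8") as f:
            return f.read()

    def run(self, argv):
        """Run a command inside the episode's directory with the episode's environment."""
        return subprocess.run(argv, cwd=self.workdir, env=self.env,
                              capture_output=True, text=True, check=False)

    def close(self):
        self._tmp.cleanup()   # deleting the episode removes its files
```

Two episodes can write the same relative path without conflict, because each path resolves inside a different directory.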

Implementation Approaches

Option 1: Use OpenReward Python SDK

The Python SDK implements the full ORS protocol:
from openreward.environments import Environment, Server, tool

class MyEnvironment(Environment):
    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    # ... implement other methods

server = Server([MyEnvironment])
server.run(port=8080)
The SDK handles:
  • HTTP endpoint routing
  • Session management
  • SSE streaming
  • Error handling

Option 2: Implement from Scratch

Implement the protocol in any language:
  1. Create HTTP server
  2. Implement required endpoints
  3. Manage session state
  4. Stream tool outputs via SSE
See Implementation Guide for details.
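To show the shape of a from-scratch implementation, here is a minimal sketch using only Python's http.server. It is a toy subset, not a conforming implementation. It supports one hypothetical "math" environment with a single "submit" tool. It omits the discovery endpoints and the session timeout, and it keeps all state in a module-level dict. The status codes follow the list above, 400 for invalid input and 404 for unknown sessions or tools.

```python
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

EPISODES = {}            # sid -> None (session only) or {"task": ..., "finished": bool}
LOCK = threading.Lock()


class ORSHandler(BaseHTTPRequestHandler):
    """Toy ORS subset: one hypothetical 'math' environment with a 'submit' tool."""

    def log_message(self, *args):
        pass  # silence per-request logging

    def _json(self, status, obj):
        data = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def _sse(self, event, data):
        self.wfile.write(f"event: {event}\ndata: {data}\n\n".encode())

    def do_GET(self):
        if self.path == "/health":
            return self._json(200, {"status": "ok"})
        if self.path == "/math/prompt":
            ep = EPISODES.get(self.headers.get("X-Session-ID"))
            if not ep:
                return self._json(404, {"error": "no active episode"})
            text = str(ep["task"].get("question", ""))
            return self._json(200, [{"text": text, "detail": None, "type": "text"}])
        self._json(404, {"error": "not found"})

    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length") or 0)
            body = json.loads(self.rfile.read(length) or b"{}")
            if not isinstance(body, dict):
                raise ValueError("body must be a JSON object")
        except ValueError:
            return self._json(400, {"error": "invalid request body"})
        if self.path == "/create_session":
            sid = uuid.uuid4().hex
            with LOCK:
                EPISODES[sid] = None
            return self._json(200, {"sid": sid})
        sid = self.headers.get("X-Session-ID")
        if sid not in EPISODES:
            return self._json(404, {"error": "session not found"})
        if self.path == "/create":
            if not isinstance(body.get("task_spec"), dict):
                return self._json(400, {"error": "missing task_spec"})
            with LOCK:
                EPISODES[sid] = {"task": body["task_spec"], "finished": False}
            return self._json(200, {"sid": sid})
        if self.path == "/ping":
            return self._json(200, {"sid": sid})
        if self.path == "/delete":              # drop the episode, keep the session
            with LOCK:
                EPISODES[sid] = None
            return self._json(200, {"sid": sid})
        if self.path == "/delete_session":      # drop the session entirely
            with LOCK:
                EPISODES.pop(sid, None)
            return self._json(200, {"sid": sid})
        if self.path == "/math/call":
            return self._call(EPISODES[sid], body)
        self._json(404, {"error": "not found"})

    def _call(self, ep, body):
        if ep is None or body.get("name") != "submit":
            return self._json(404, {"error": "no active episode or unknown tool"})
        correct = str(body.get("input", {}).get("answer")) == str(ep["task"].get("answer"))
        ep["finished"] = True
        end = {"ok": True, "output": {
            "blocks": [{"text": "Correct!" if correct else "Incorrect.", "detail": None, "type": "text"}],
            "metadata": None, "reward": 1.0 if correct else 0.0, "finished": True}}
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        # The handler speaks HTTP/1.0, so closing the connection afterwards ends the stream.
        self._sse("task_id", uuid.uuid4().hex)
        self._sse("end", json.dumps(end))


def serve(port=8080):
    ThreadingHTTPServer(("127.0.0.1", port), ORSHandler).serve_forever()
```

Writing each SSE event as soon as it is ready, rather than buffering the whole response, is what would let a longer-running tool stream progressive output.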

Key Takeaway: ORS is a straightforward HTTP protocol with RESTful endpoints for discovery and management, plus SSE for streaming tool execution. It’s designed to be simple to implement in any language.