The Open Reward Standard (ORS) is an open-source HTTP-based protocol for connecting AI models to reinforcement learning (RL) environments. It specifies how an AI model can interact with an environment to manipulate its state and obtain results and rewards.
[Diagram: ORS Architecture, showing an agent interacting with an ORS Environment Server]

Key Features

ORS is designed for reinforcement learning and agentic evaluation. Its key features include:
  • Episodes: Sessions are RL episodes that run until the environment signals finished
  • Rewards: Numeric feedback signals that can be used for reinforcement learning
  • Tool calling: Actions are tools; agents interact with an environment via function calling
  • Tasks & Splits: Tasks are organised into splits for training and evaluation
  • Language-agnostic: The underlying HTTP protocol can be implemented in any language

Example Server

Here is a server written with the example Python SDK.
from pydantic import BaseModel
from ors import Environment, Server, tool, ToolOutput, TextBlock, Split


class SubmitInput(BaseModel):
    answer: str


class MathEnv(Environment):
    def __init__(self, task_spec=None, secrets=None):
        super().__init__(task_spec=task_spec or {}, secrets=secrets or {})

    @classmethod
    def list_splits(cls):
        # A single "test" split for this minimal example.
        return [Split(name="test", type="test")]

    @classmethod
    def list_tasks(cls, split: str):
        # Each task spec carries the question and its expected answer.
        return [{"question": "What is 2+2?", "answer": "4"}]

    def get_prompt(self):
        # The prompt shown to the agent at the start of an episode.
        return [TextBlock(text=self.task_spec["question"])]

    @tool
    def submit(self, params: SubmitInput) -> ToolOutput:
        # Grade the answer, emit a reward, and end the episode.
        correct = params.answer.strip() == self.task_spec["answer"]
        return ToolOutput(
            blocks=[TextBlock(text="Correct!" if correct else "Incorrect")],
            reward=1.0 if correct else 0.0,
            finished=True,
        )


if __name__ == "__main__":
    Server([MathEnv]).run(port=8080)

Example Client

Here is a client written with the example Python SDK.
from ors.client import ORS

client = ORS(base_url="http://localhost:8080")
env = client.environment("mathenv")
tasks = env.list_tasks(split="test")
tools = env.list_tools()

with env.session(task=tasks[0]) as session:
    prompt = session.get_prompt()
    print(f"Question: {prompt[0].text}")

    result = session.call_tool("submit", {"answer": "4"})
    print(f"Reward: {result.reward}, Finished: {result.finished}")

Core Concepts

An ORS server provides access to:
  1. Tools - Core methods for interacting with environments (e.g., bash, submit_solution)
  2. Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
  3. Splits - Categorised lists of tasks (e.g., train, validation, test)
  4. Prompts - Instructions given to the agent for each task
  5. Rewards - Numeric feedback signals for RL training
  6. Episodes - Stateful sessions that continue until finished: true
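The episode lifecycle above can be sketched in plain Python. This is an illustrative stub, not the ORS wire protocol: StubSession is a hypothetical stand-in for a live environment session (a real client speaks HTTP to an ORS server), but the loop shows how an episode accumulates reward until a tool call returns finished.

```python
class StubSession:
    """Stand-in for a live ORS session; grades answers against a fixed task,
    mirroring the MathEnv example above."""

    def __init__(self, task):
        self.task = task

    def get_prompt(self):
        return self.task["question"]

    def call_tool(self, name, params):
        # Only a "submit" tool exists in this stub.
        correct = params["answer"].strip() == self.task["answer"]
        return {"reward": 1.0 if correct else 0.0, "finished": True}


def run_episode(session, policy):
    """Drive one episode: call tools until the environment signals finished."""
    total_reward = 0.0
    finished = False
    prompt = session.get_prompt()
    while not finished:
        name, params = policy(prompt)      # the agent picks an action (a tool call)
        result = session.call_tool(name, params)
        total_reward += result["reward"]
        finished = result["finished"]
    return total_reward


session = StubSession({"question": "What is 2+2?", "answer": "4"})
reward = run_episode(session, policy=lambda prompt: ("submit", {"answer": "4"}))
print(reward)  # 1.0
```

The same loop shape works for multi-step environments: the policy keeps choosing tool calls, and the episode only ends when a tool output carries finished.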

Actions are Tools

A key principle in ORS: the only way agents interact with environments is by calling tools. This design:
  • Leverages existing function calling support from LLM providers
  • Provides a clear interface boundary
  • Makes agent actions explicit and traceable
This design also enforces a Cartesian boundary between the agent and the environment. ORS environments make no assumptions about the agent that is interacting with them. For example, there is no notion of chat messages or tokenised outputs. This avoids entanglement with the core agent loop.
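Because actions are plain tools, their input schemas map directly onto the function-calling formats LLM providers already accept. A minimal sketch, assuming a hand-written JSON Schema for the example server's SubmitInput model and the common OpenAI-style function-declaration layout (the exact wrapper shape varies by provider):

```python
import json

# Hand-written JSON Schema matching the example server's SubmitInput model.
submit_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


def to_function_declaration(name, description, input_schema):
    """Wrap a tool's input schema in an OpenAI-style function declaration."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": input_schema,
        },
    }


decl = to_function_declaration("submit", "Submit a final answer.", submit_schema)
print(json.dumps(decl, indent=2))
```

An agent loop can hand declarations like this to any function-calling model, then forward the model's chosen call to the environment, keeping the agent and environment fully decoupled.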

Why ORS?

Primary use case: RL training

ORS allows you to write environments for reinforcement learning:
  • Reward signals: Actions yield numeric rewards that can be used in RL
  • Episode structure: Sessions are episodes with finished signals from tools
  • State manipulation: Agents interact with stateful environments over multiple steps by calling tools
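The per-step reward stream from an episode feeds standard RL machinery. As one illustration (standard RL maths, not part of the ORS specification), the discounted return G_t = r_t + gamma * G_{t+1} can be computed over an episode's rewards with a backwards scan:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step, scanning backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


# Three intermediate steps with no reward, then a terminal reward of 1.0.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
```

Sparse terminal rewards like the MathEnv example's still assign credit to earlier steps this way, which is what makes multi-step ORS episodes trainable.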

Secondary use case: Evaluation

ORS also excels at agentic evaluation:
  • Structured benchmarks with train/test splits
  • Reproducible evaluation across different agents
  • Standard interface for diverse task types
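Because every episode ends with a numeric reward, benchmark scoring reduces to aggregating rewards across a split. A sketch with stubbed per-task rewards standing in for real sessions (the aggregation is illustrative, not an ORS-mandated metric):

```python
def evaluate(episode_rewards):
    """Aggregate per-task terminal rewards into a simple benchmark score."""
    n = len(episode_rewards)
    return {
        "tasks": n,
        "mean_reward": sum(episode_rewards) / n,
        "solved": sum(1 for r in episode_rewards if r == 1.0),
    }


# Stubbed rewards for a four-task test split (1.0 = solved, 0.0 = failed).
print(evaluate([1.0, 0.0, 1.0, 1.0]))
# {'tasks': 4, 'mean_reward': 0.75, 'solved': 3}
```

Running the same split against different agents with the same server yields directly comparable scores, which is what makes ORS evaluations reproducible.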

How does ORS compare to MCP?

The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. But it serves the narrower purpose of providing tool access, rather than the full set of primitives needed for reinforcement learning:
Feature               MCP                      ORS
Purpose               Tool access, workflows   RL training environments
Episode termination   No                       Yes - finished signal
Rewards               No                       Yes - for RL training
Tasks & Splits        No                       Yes - train/validation/test
Tool calling          Yes                      Yes
Protocol              JSON-RPC                 HTTP/REST + SSE
Key difference: ORS includes reward and finished signals that enable reinforcement learning, plus task organisation for training and evaluation.
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.

Next Steps

1. Understand the Protocol - Read the introduction to learn core concepts
2. See it in Action - Follow the quick start to run a local ORS server
3. Build Your Own - Use the implementation guide to create an ORS server

Example Implementations

Looking for existing ORS environment implementations to reference or use? Browse the reference implementations in EnvCommons.

Note: The ORS Python SDK is one implementation of ORS. The standard itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.