The Open Reward Standard (ORS) is an open-source HTTP-based protocol for connecting AI models to reinforcement learning (RL) environments. It specifies how an AI model can interact with an environment to manipulate its state and obtain results and rewards.
[Diagram: ORS Architecture, showing an agent interacting with an ORS Environment Server]

Key Features

ORS is designed for reinforcement learning and agentic evaluation. Its key features include:
  • Episodes: Sessions are RL episodes that run until the environment signals finished
  • Rewards: Numeric feedback signals that can be used for reinforcement learning
  • Tool calling: Actions are tools; agents interact with an environment via function calling
  • Tasks & Splits: Tasks are organised into splits for training and evaluation
  • Language-agnostic: The underlying HTTP protocol can be implemented in any language

Example Server

Here is a server written with the example Python SDK.
from pydantic import BaseModel
from ors import Environment, Server, tool, ToolOutput, TextBlock, Split


class SubmitInput(BaseModel):
    answer: str


class MathEnv(Environment):
    def __init__(self, task_spec=None, secrets=None):
        super().__init__(task_spec=task_spec or {}, secrets=secrets or {})

    @classmethod
    def list_splits(cls):
        # A single "test" split for this minimal example.
        return [Split(name="test", type="test")]

    @classmethod
    def list_tasks(cls, split: str):
        # Each task spec carries the question and its expected answer.
        return [{"question": "What is 2+2?", "answer": "4"}]

    def get_prompt(self):
        # The prompt shown to the agent at the start of an episode.
        return [TextBlock(text=self.task_spec["question"])]

    @tool
    def submit(self, params: SubmitInput) -> ToolOutput:
        # Grade the answer, emit a reward, and end the episode.
        correct = params.answer.strip() == self.task_spec["answer"]
        return ToolOutput(
            blocks=[TextBlock(text="Correct!" if correct else "Incorrect")],
            reward=1.0 if correct else 0.0,
            finished=True,
        )


if __name__ == "__main__":
    Server([MathEnv]).run(port=8080)

Example Client

Here is a client written with the example Python SDK.
from ors.client import ORS

client = ORS(base_url="http://localhost:8080")
env = client.environment("mathenv")
tasks = env.list_tasks(split="test")
tools = env.list_tools()

with env.session(task=tasks[0]) as session:
    prompt = session.get_prompt()
    print(f"Question: {prompt[0].text}")

    result = session.call_tool("submit", {"answer": "4"})
    print(f"Reward: {result.reward}, Finished: {result.finished}")

Core Concepts

An ORS server provides access to:
  1. Tools - Core methods for interacting with environments (e.g., bash, submit_solution)
  2. Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
  3. Splits - Categorised lists of tasks (e.g., train, validation, test)
  4. Prompts - Instructions given to the agent for each task
  5. Rewards - Numeric feedback signals for RL training
  6. Episodes - Stateful sessions that continue until finished: true
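The episode lifecycle above can be sketched in plain Python. This is an illustrative stub, not the ORS wire protocol: StubSession is a hypothetical stand-in for a live environment session (a real client speaks HTTP to an ORS server), but the loop shows how an episode accumulates reward until a tool call returns finished.

```python
class StubSession:
    """Stand-in for a live ORS session; grades answers against a fixed task,
    mirroring the MathEnv example above."""

    def __init__(self, task):
        self.task = task

    def get_prompt(self):
        return self.task["question"]

    def call_tool(self, name, params):
        # Only a "submit" tool exists in this stub.
        correct = params["answer"].strip() == self.task["answer"]
        return {"reward": 1.0 if correct else 0.0, "finished": True}


def run_episode(session, policy):
    """Drive one episode: call tools until the environment signals finished."""
    total_reward = 0.0
    finished = False
    prompt = session.get_prompt()
    while not finished:
        name, params = policy(prompt)      # the agent picks an action (a tool call)
        result = session.call_tool(name, params)
        total_reward += result["reward"]
        finished = result["finished"]
    return total_reward


session = StubSession({"question": "What is 2+2?", "answer": "4"})
reward = run_episode(session, policy=lambda prompt: ("submit", {"answer": "4"}))
print(reward)  # 1.0
```

The same loop shape works for multi-step environments: the policy keeps choosing tool calls, and the episode only ends when a tool output carries finished.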

Actions are Tools

A key principle in ORS: the only way agents interact with environments is by calling tools. This design:
  • Leverages existing function calling support from LLM providers
  • Provides a clear interface boundary
  • Makes agent actions explicit and traceable
This design also enforces a Cartesian boundary between the agent and the environment. ORS environments make no assumptions about the agent that is interacting with them. For example, there is no notion of chat messages or tokenised outputs. This avoids entanglement with the core agent loop.
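Because actions are plain tools, their input schemas map directly onto the function-calling formats LLM providers already accept. A minimal sketch, assuming a hand-written JSON Schema for the example server's SubmitInput model and the common OpenAI-style function-declaration layout (the exact wrapper shape varies by provider):

```python
import json

# Hand-written JSON Schema matching the example server's SubmitInput model.
submit_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}


def to_function_declaration(name, description, input_schema):
    """Wrap a tool's input schema in an OpenAI-style function declaration."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": input_schema,
        },
    }


decl = to_function_declaration("submit", "Submit a final answer.", submit_schema)
print(json.dumps(decl, indent=2))
```

An agent loop can hand declarations like this to any function-calling model, then forward the model's chosen call to the environment, keeping the agent and environment fully decoupled.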

Why ORS?

Primary use case: RL training

ORS allows you to write environments for reinforcement learning:
  • Reward signals: Actions yield numeric rewards that can be used in RL
  • Episode structure: Sessions are episodes with finished signals from tools
  • State manipulation: Agents interact with stateful environments over multiple steps by calling tools
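The per-step reward stream from an episode feeds standard RL machinery. As one illustration (standard RL maths, not part of the ORS specification), the discounted return G_t = r_t + gamma * G_{t+1} can be computed over an episode's rewards with a backwards scan:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step, scanning backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


# Three intermediate steps with no reward, then a terminal reward of 1.0.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
```

Sparse terminal rewards like the MathEnv example's still assign credit to earlier steps this way, which is what makes multi-step ORS episodes trainable.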

Secondary use case: Evaluation

ORS also excels at agentic evaluation:
  • Structured benchmarks with train/test splits
  • Reproducible evaluation across different agents
  • Standard interface for diverse task types
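Because every episode ends with a numeric reward, benchmark scoring reduces to aggregating rewards across a split. A sketch with stubbed per-task rewards standing in for real sessions (the aggregation is illustrative, not an ORS-mandated metric):

```python
def evaluate(episode_rewards):
    """Aggregate per-task terminal rewards into a simple benchmark score."""
    n = len(episode_rewards)
    return {
        "tasks": n,
        "mean_reward": sum(episode_rewards) / n,
        "solved": sum(1 for r in episode_rewards if r == 1.0),
    }


# Stubbed rewards for a four-task test split (1.0 = solved, 0.0 = failed).
print(evaluate([1.0, 0.0, 1.0, 1.0]))
# {'tasks': 4, 'mean_reward': 0.75, 'solved': 3}
```

Running the same split against different agents with the same server yields directly comparable scores, which is what makes ORS evaluations reproducible.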

How does ORS compare to MCP?

The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. But it serves the narrower purpose of providing tool access, rather than the full set of primitives needed for reinforcement learning:
Feature               MCP                      ORS
Purpose               Tool access, workflows   RL training environments
Episode termination   No                       Yes - finished signal
Rewards               No                       Yes - for RL training
Tasks & Splits        No                       Yes - train/validation/test
Tool calling          Yes                      Yes
Protocol              JSON-RPC                 HTTP/REST + SSE
Key difference: ORS includes reward and finished signals that enable reinforcement learning, plus task organisation for training and evaluation.
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.

Next Steps

1. Understand the Protocol - Read the introduction to learn core concepts
2. See it in Action - Follow the quick start to run a local ORS server
3. Build Your Own - Use the implementation guide to create an ORS server

Example Implementations

Looking for existing ORS environment implementations to reference or use? Browse the reference implementations in EnvCommons.

Note: The ORS Python SDK is one implementation of ORS. The standard itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.