Key Features
ORS is designed for reinforcement learning and agentic evaluation. Its key features include:

- Episodes: Sessions are RL episodes that continue until a `finished` signal
- Rewards: Numeric feedback that can be used for reinforcement learning
- Tool calling: Actions are tools - agents interact with an environment via function calling
- Tasks & Splits: Tasks are organised into splits for training and evaluation
- Language-agnostic: The underlying HTTP protocol can be implemented in any language
Example Server
Here is a server written with the example Python SDK.

Example Client
Here is a client written with the example Python SDK.

Core Concepts
An ORS server provides access to:

- Tools - Core methods for interacting with environments (e.g., `bash`, `submit_solution`)
- Tasks - Specific problems to be accomplished (e.g., math problems, coding challenges)
- Splits - Categorised lists of tasks (e.g. train, validation, test)
- Prompts - Instructions given to the agent for each task
- Rewards - Numeric feedback signals for RL training
- Episodes - Stateful sessions that continue until `finished: true`
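To make these primitives concrete, here is a minimal in-memory sketch of an episode: the agent calls tools, every tool call returns a numeric reward, and the session ends when the environment reports `finished: true`. The stub environment and its tools (`increment`, `submit_solution`) are assumptions for illustration only - they are not part of the ORS spec or SDK, which expose the same concepts over HTTP.

```python
# Illustrative stub: mirrors the ORS episode shape (tools, rewards, finished)
# without a network. A real ORS client would make the same calls over HTTP.

class StubEnvironment:
    """Toy environment: count up to a target; `submit_solution` ends the episode."""

    def __init__(self, target: int):
        self.target = target
        self.count = 0

    def call_tool(self, name: str, **args) -> dict:
        if name == "increment":
            self.count += 1
            return {"result": self.count, "reward": 0.0, "finished": False}
        if name == "submit_solution":
            correct = args.get("answer") == self.target
            return {"result": correct, "reward": 1.0 if correct else 0.0, "finished": True}
        raise ValueError(f"unknown tool: {name}")


def run_episode(env: StubEnvironment) -> float:
    """Call tools until the environment signals the episode is finished."""
    total_reward = 0.0
    finished = False
    while not finished:
        # A trivial hard-coded policy: increment three times, then submit.
        if env.count < 3:
            step = env.call_tool("increment")
        else:
            step = env.call_tool("submit_solution", answer=env.count)
        total_reward += step["reward"]
        finished = step["finished"]
    return total_reward


print(run_episode(StubEnvironment(target=3)))  # 1.0 - the submitted answer matches the target
```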
Actions are Tools
A key principle in ORS: the only way agents interact with environments is by calling tools. This design:

- Leverages existing function calling support from LLM providers
- Provides a clear interface boundary
- Makes agent actions explicit and traceable
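As a rough illustration of this principle, an agent's action and the environment's response can be pictured as plain JSON payloads in the style of LLM tool-calling APIs. The field names below are assumptions for illustration, not the normative ORS wire format:

```python
import json

# Hypothetical shape of an agent action: a function call, as in LLM tool-calling APIs.
tool_call = {
    "name": "bash",
    "arguments": {"command": "ls /workspace"},
}

# Hypothetical shape of the environment's reply: the tool result plus the
# RL-specific fields ORS layers on top of plain tool calling.
tool_result = {
    "result": "solution.py\ntests/",
    "reward": 0.0,       # numeric feedback for this step
    "finished": False,   # episode continues
}

# Both sides are plain JSON, which is what keeps the protocol language-agnostic.
print(json.dumps(tool_call))
```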
Why ORS?
Primary use case: RL training
ORS allows you to write environments for reinforcement learning:

- Reward signals: Actions yield numeric rewards that can be used in RL
- Episode structure: Sessions are episodes with `finished` signals from tools
- State manipulation: Agents interact with stateful environments over multiple steps by calling tools
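Because every tool call yields a reward and episodes terminate explicitly, a trajectory maps directly onto standard RL quantities. As a sketch, the per-step rewards from one episode fold into a discounted return (the reward values below are made up):

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Fold per-step rewards into a discounted return: G = sum(gamma^t * r_t)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Rewards collected from one episode's tool calls (illustrative values:
# two intermediate steps, then a final reward when the episode finishes).
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_return(episode_rewards, gamma=1.0))  # 1.0 with no discounting
```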
Secondary use case: Evaluation
ORS also excels at agentic evaluation:

- Structured benchmarks with train/test splits
- Reproducible evaluation across different agents
- Standard interface for diverse task types
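An evaluation run reduces to: fetch the tasks in a split, run one episode per task, and aggregate the rewards. A minimal sketch of that aggregation, with `run_task` stubbed by fixed scores in place of real episodes against an ORS server:

```python
from statistics import mean
from typing import Callable


def evaluate_split(tasks: list[str], run_task: Callable[[str], float]) -> float:
    """Run every task in a split and report the mean episode reward."""
    return mean(run_task(task) for task in tasks)


# Stubbed per-task rewards, as if each episode had already been run.
scores = {"task-1": 1.0, "task-2": 0.0, "task-3": 1.0}
test_split = list(scores)

print(evaluate_split(test_split, scores.__getitem__))  # mean reward across the split
```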
How does ORS compare to MCP?
The Model Context Protocol (MCP) is excellent for connecting LLMs to tools and data sources. But it serves the more specific purpose of providing tool access, rather than the full set of primitives for reinforcement learning:

| Feature | MCP | ORS |
|---|---|---|
| Purpose | Tool access, workflows | RL training environments |
| Episode termination | No | Yes - `finished` signal |
| Rewards | No | Yes - for RL training |
| Tasks & Splits | No | Yes - Train/validation/test |
| Tool calling | Yes | Yes |
| Protocol | JSON-RPC | HTTP/REST + SSE |
ORS and MCP serve complementary purposes. Use MCP for general tool access, ORS for RL training and structured evaluation.
Next Steps
Understand the Protocol
Read the introduction to learn core concepts
See it in Action
Follow the quick start to run a local ORS server
Build Your Own
Use the implementation guide to create an ORS server
Example Implementations
Looking for existing ORS environment implementations to reference or use?

EnvCommons
Browse reference ORS environment implementations
Note: The ORS Python SDK is one implementation of ORS. The standard itself is language-agnostic and can be implemented in Python, TypeScript, Go, Rust, or any language with HTTP support.

