Quick Start

Get started with ORS by building a simple math environment server and testing it locally.

What You’ll Build

A working ORS server for math problems (GSM8K-style) with:

One tool (submit) for submitting answers
Train and test task splits
Reward signals for RL training
Local HTTP server you can test with curl or Python

Time: ~15 minutes

Prerequisites

Python 3.8+ installed
Basic Python knowledge
Terminal/command line access

Step 1: Install Dependencies

The OpenReward Python SDK is one implementation of the ORS specification. We’ll use it for this quickstart.

pip install openreward pandas

Remember: The Python SDK is ONE way to implement ORS. You can also implement the HTTP protocol from scratch in any language.

Step 2: Download GSM8K Data

Download the two GSM8K parquet files from the Hugging Face repository:

Download train-00000-of-00001.parquet
Download test-00000-of-00001.parquet

Place both files in your working directory.
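
Before moving on, it can help to confirm that both files are where the server will look for them. The following is a minimal sketch using only the standard library; the helper name missing_files is hypothetical and not part of the SDK.

```python
from pathlib import Path

# File names taken from the download step above.
REQUIRED_FILES = [
    "train-00000-of-00001.parquet",
    "test-00000-of-00001.parquet",
]


def missing_files(directory="."):
    """Return the required parquet files that are not present in `directory`."""
    return [name for name in REQUIRED_FILES if not (Path(directory) / name).is_file()]
```

Calling missing_files() from your working directory should return an empty list once both downloads are in place.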

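GSM8K is a dataset of grade school math word problems. Each task has a question and an answer field; the answer field holds a worked solution that ends with the final numeric answer after a #### marker.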
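
In the raw data, the answer field is a worked solution whose last line has the form "#### <final answer>" (see the task listing in Step 5). As a small illustration, the sketch below pulls out that final value with the same split-and-strip logic the submit tool uses in Step 3; the function name extract_final_answer is an assumption made for this example.

```python
def extract_final_answer(answer_field: str) -> str:
    """Return the text after the last '####' marker, stripped of whitespace."""
    return answer_field.split("####")[-1].strip()


# Answer text for the first training task, as shown later on this page.
example = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)

print(extract_final_answer(example))  # prints 72
```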

Step 3: Create Your Environment

Create a file gsm8k_env.py:

from openreward.environments import Environment, Server, tool
from openreward.environments.types import ToolOutput, TextBlock
from pydantic import BaseModel
import pandas as pd

# Load GSM8K tasks from parquet files
train_tasks = pd.read_parquet("train-00000-of-00001.parquet").to_dict(orient="records")
test_tasks = pd.read_parquet("test-00000-of-00001.parquet").to_dict(orient="records")

# Add IDs to tasks
for i, task in enumerate(train_tasks):
    task['id'] = str(i)
for i, task in enumerate(test_tasks):
    task['id'] = str(i)

# Tool parameter schema (must be defined before GSM8KEnvironment)
class SubmitParams(BaseModel):
    answer: str

class GSM8KEnvironment(Environment):
    """GSM8K math problem environment"""

    @classmethod
    def list_splits(cls):
        return ["train", "test"]

    @classmethod
    def list_tasks(cls, split: str):
        if split == "train":
            return train_tasks
        elif split == "test":
            return test_tasks
        raise ValueError(f"Unknown split: {split}")

    def get_prompt(self):
        question = self.task_spec["question"]
        return [TextBlock(text=question)]

    @tool
    def submit(self, params: SubmitParams) -> ToolOutput:
        """Submit your answer to the math problem"""
        # Extract the final answer from GSM8K format (after ####)
        gold_answer = self.task_spec["answer"].split("####")[-1].strip()
        user_answer = str(params.answer).strip()

        if user_answer == gold_answer:
            return ToolOutput(
                blocks=[TextBlock(text="Correct!")],
                reward=1.0,
                finished=True
            )
        else:
            return ToolOutput(
                blocks=[TextBlock(text=f"Incorrect. The answer was {gold_answer}.")],
                reward=0.0,
                finished=True
            )

# Create and run server
if __name__ == "__main__":
    server = Server([GSM8KEnvironment])
    server.run(port=8080)

What this code does:

Loads GSM8K tasks from the parquet files
Defines an ORS environment with math tasks
Implements list_splits(), list_tasks(), and get_prompt()
Creates a submit tool that checks answers and returns rewards
Starts an HTTP server on port 8080
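
Note that the submit tool compares strings exactly, so a submission such as "72.0" or " 1,000 " would receive reward 0.0 against gold answers "72" or "1000". If more lenient grading is wanted, one option is to normalize both sides before comparing. The sketch below is one possible approach, not part of the SDK; normalize_answer and answers_match are hypothetical helpers.

```python
from decimal import Decimal, InvalidOperation


def normalize_answer(raw: str) -> str:
    """Canonicalize a numeric answer string; fall back to the cleaned text."""
    cleaned = raw.strip().replace(",", "").lstrip("$")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return cleaned  # not a number, compare as plain text
    if not value.is_finite():
        return cleaned  # leave NaN/Infinity untouched
    # Whole numbers print without a decimal point.
    if value == value.to_integral_value():
        return str(int(value))
    # Other values drop trailing zeros, e.g. "3.50" -> "3.5".
    return format(value.normalize(), "f")


def answers_match(user_answer: str, gold_answer: str) -> bool:
    """Compare two answers after normalization."""
    return normalize_answer(user_answer) == normalize_answer(gold_answer)
```

To use it, the equality check in submit could be swapped for answers_match(user_answer, gold_answer). Whether lenient matching is desirable depends on how strictly you want the agent to follow the output format.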

Step 4: Run the Server

python gsm8k_env.py

You should see:

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

Your ORS server is now running!

Step 5: Test with HTTP

Let’s test the server with curl. Open a new terminal:

List environments

curl http://localhost:8080/list_environments

Response:

["gsm8kenvironment"]

List tools

curl http://localhost:8080/gsm8kenvironment/tools

Response:

{
  "tools": [
    {
      "name": "submit",
      "description": "Submit your answer to the math problem",
      "input_schema": {...}
    }
  ]
}

List splits

curl http://localhost:8080/gsm8kenvironment/splits

Response:

[
  {"name": "train", "type": "train"},
  {"name": "test", "type": "test"}
]

List tasks

curl -X POST http://localhost:8080/gsm8kenvironment/tasks \
  -H "Content-Type: application/json" \
  -d '{"split": "train"}'

Response (first 2 tasks shown):

{
  "tasks": [
    {
      "id": "0",
      "question": "Natalia sold clips to 48 of her friends in April...",
      "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n...#### 72"
    },
    {
      "id": "1",
      "question": "Weng earns $12 an hour for babysitting...",
      "answer": "...#### 20"
    }
  ],
  "env_name": "gsm8kenvironment"
}

The full dataset contains 7,473 training tasks and 1,319 test tasks.
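
The same four discovery calls can be made from Python without the SDK. The sketch below uses only the standard library and assumes the server from Step 4 is listening on localhost:8080. It builds the requests without sending them, so the URLs and bodies can be inspected first; build_discovery_requests and fetch_json are names chosen for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"
ENV_NAME = "gsm8kenvironment"


def build_discovery_requests(base_url=BASE_URL, env_name=ENV_NAME):
    """Build (without sending) the four discovery requests from Step 5."""
    tasks_body = json.dumps({"split": "train"}).encode("utf-8")
    return {
        "environments": urllib.request.Request(f"{base_url}/list_environments"),
        "tools": urllib.request.Request(f"{base_url}/{env_name}/tools"),
        "splits": urllib.request.Request(f"{base_url}/{env_name}/splits"),
        "tasks": urllib.request.Request(
            f"{base_url}/{env_name}/tasks",
            data=tasks_body,
            headers={"Content-Type": "application/json"},
            method="POST",
        ),
    }


def fetch_json(request):
    """Send a request and decode the JSON response body (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, fetch_json(build_discovery_requests()["splits"]) should return the same list shown in the curl example.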

Step 6: Run an Episode

Now let’s run a complete episode (session):

Create session ID

curl -X POST http://localhost:8080/create_session

Response:

{"sid": "abc-123-def-456"}

Save this session ID for the next steps.

Create episode

Use a task from the dataset:

curl -X POST http://localhost:8080/create \
  -H "X-Session-ID: abc-123-def-456" \
  -H "Content-Type: application/json" \
  -d '{
    "env_name": "gsm8kenvironment",
    "task_spec": {
      "id": "0",
      "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
      "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
    },
    "secrets": {}
  }'

Response:

{"sid": "abc-123-def-456"}

Get prompt

curl http://localhost:8080/gsm8kenvironment/prompt \
  -H "X-Session-ID: abc-123-def-456"

Response:

[
  {
    "text": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "detail": null,
    "type": "text"
  }
]

Call submit tool

curl -N -X POST http://localhost:8080/gsm8kenvironment/call \
  -H "X-Session-ID: abc-123-def-456" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"name": "submit", "input": {"answer": "72"}}'

Response (SSE stream):

event: task_id
data: 877bb56c594e4a0f921ad55c439a3762

event: end
data: {"ok":true,"output":{"blocks":[{"text":"Correct!","detail":null,"type":"text"}],"metadata":null,"reward":1.0,"finished":true}}

Success! The agent got reward 1.0 and finished: true.
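
If you are consuming this stream from your own client rather than curl, the response has to be split into events before the JSON payload can be read. The sketch below is a deliberately small parser that handles only the event: and data: fields seen in the sample response above; it is not a complete Server-Sent Events implementation, and parse_sse is a name chosen for this example.

```python
import json


def parse_sse(stream_text):
    """Split an SSE payload into a list of (event, data) pairs."""
    events = []
    event_name, data_lines = None, []
    # A blank line ends an event; append one so the last event is flushed.
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if event_name is not None or data_lines:
                events.append((event_name or "message", "\n".join(data_lines)))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return events


# The sample stream from the curl call above.
sample = (
    "event: task_id\n"
    "data: 877bb56c594e4a0f921ad55c439a3762\n"
    "\n"
    "event: end\n"
    'data: {"ok":true,"output":{"blocks":[{"text":"Correct!","detail":null,"type":"text"}],'
    '"metadata":null,"reward":1.0,"finished":true}}\n'
)

events = dict(parse_sse(sample))
result = json.loads(events["end"])
print(result["output"]["reward"], result["output"]["finished"])  # prints: 1.0 True
```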

Cleanup

curl -X POST http://localhost:8080/delete \
  -H "X-Session-ID: abc-123-def-456"
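
Every call in this step after create_session carries the same X-Session-ID header. To make that pattern explicit, the sketch below builds the create, prompt, call, and delete requests with the standard library. It only constructs the requests; sending them requires the running server and a session ID obtained from /create_session. The helper names session_request and episode_requests are assumptions made for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"
ENV_NAME = "gsm8kenvironment"


def session_request(path, sid, body=None, method="GET", accept=None):
    """Build a request that carries the session ID header used in Step 6."""
    headers = {"X-Session-ID": sid}
    if accept is not None:
        headers["Accept"] = accept
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


def episode_requests(sid, task_spec, answer):
    """Return the create, prompt, call, and delete requests in order."""
    create_body = {"env_name": ENV_NAME, "task_spec": task_spec, "secrets": {}}
    call_body = {"name": "submit", "input": {"answer": answer}}
    return [
        session_request("/create", sid, create_body, method="POST"),
        session_request(f"/{ENV_NAME}/prompt", sid),
        session_request(
            f"/{ENV_NAME}/call", sid, call_body, method="POST",
            accept="text/event-stream",
        ),
        session_request("/delete", sid, method="POST"),
    ]
```

Passing each request to urllib.request.urlopen in order should reproduce the curl sequence above; the call response arrives as an SSE stream rather than plain JSON.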

Step 7: Test with Python Client

Create test_client.py:

from openreward import OpenReward

# Connect to local server
client = OpenReward()
env = client.environments.get(
    name="gsm8kenvironment",
    base_url="http://localhost:8080"
)

# Get tasks
tasks = env.list_tasks(split="train")
print(f"Found {len(tasks)} training tasks")

# Run an episode
task = tasks[0]  # First task from GSM8K

with env.session(task=task) as session:
    # Get prompt
    prompt = session.get_prompt()
    print(f"Question: {prompt[0].text[:80]}...")  # Show first 80 chars

    # Submit answer (the correct answer is 72)
    result = session.call_tool("submit", {"answer": "72"})

    print(f"Result: {result.blocks[0].text}")
    print(f"Reward: {result.reward}")
    print(f"Finished: {result.finished}")

Run it:

python test_client.py

Output:

Found 7473 training tasks
Question: Natalia sold clips to 48 of her friends in April, and then she sold half...
Result: Correct!
Reward: 1.0
Finished: True
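
Once a single episode works, a natural next check is the average reward over several tasks. The sketch below relies only on the client calls shown in test_client.py (env.session, get_prompt, call_tool) and assumes they behave as in that script. The solve argument is a hypothetical callable that maps the prompt text to an answer string, standing in for your agent.

```python
def mean_reward(env, tasks, solve):
    """Run one submit call per task and return the average reward.

    `solve` is a hypothetical callable: prompt text in, answer string out.
    """
    rewards = []
    for task in tasks:
        with env.session(task=task) as session:
            prompt = session.get_prompt()
            answer = solve(prompt[0].text)
            result = session.call_tool("submit", {"answer": answer})
            # Treat a missing reward as 0.0 so the average stays defined.
            rewards.append(result.reward or 0.0)
    return sum(rewards) / len(rewards) if rewards else 0.0
```

For example, mean_reward(env, tasks[:10], my_solver) would score a solver on the first ten training tasks, where my_solver is whatever function produces your agent's answers.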

Understanding the Code

Key Components

1. Environment Class

class GSM8KEnvironment(Environment):

Inherits from the Environment base class, which handles HTTP protocol details.

2. Splits and Tasks

@classmethod
def list_splits(cls):
    return ["train", "test"]

@classmethod
def list_tasks(cls, split: str):
    return [...]  # Task list

Organize problems into train/test sets.

3. Prompt Generation

def get_prompt(self):
    return [TextBlock(text=self.task_spec["question"])]

Converts the task into the initial agent prompt.

4. Tools

@tool
def submit(self, params: SubmitParams) -> ToolOutput:
    # Check answer, return reward and finished signal
    ...

Actions agents can take. Each tool returns a ToolOutput with a reward and a finished flag.

5. Tool Output

ToolOutput(
    blocks=[TextBlock(text="Correct!")],
    reward=1.0,  # RL feedback signal
    finished=True  # Episode termination
)

Structured response with content, reward, and termination signal.

What You’ve Learned

How to implement an ORS server using the Python SDK
Core ORS concepts: splits, tasks, tools, prompts, rewards
How sessions (episodes) work
The HTTP API for ORS
How to test an ORS server locally

Next Steps

Add More Tools

Add bash, calculator, or other tools to your environment

Design Rewards

Learn reward design patterns for RL

Implementation Guide

Deep dive into building ORS servers

Specification

Understand the complete ORS protocol

Common Issues

“ModuleNotFoundError: No module named ‘openreward’”

Install the SDK:

pip install openreward

“404 Environment not found”

Check that the environment name in the URL matches the class name in lowercase:

Class: GSM8KEnvironment
Name: gsm8kenvironment
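
The naming rule can be checked directly in Python. This sketch assumes, as the note above states, that the server registers each environment under its lowercased class name; the class here is an empty stand-in, not the SDK base class.

```python
class GSM8KEnvironment:  # empty stand-in for the class defined in Step 3
    pass


env_name = GSM8KEnvironment.__name__.lower()
print(env_name)  # prints gsm8kenvironment
```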

“Connection refused”

Make sure the server is running:

python gsm8k_env.py

“Session not found”

Create a new session ID:

curl -X POST http://localhost:8080/create_session

Congratulations! You’ve built your first ORS server. You now understand the core concepts and can start building more complex environments for RL training and agent evaluation.
