Tasks and splits are how ORS organises problems for training and evaluation. Tasks are the individual problems agents solve, while splits group these tasks into named sets, for example train/test splits or splits for different kinds of problem (e.g. those that require CPUs versus GPUs).

Tasks

What is a Task?

A task is a specific problem for an agent to solve. Each task is represented as a JSON object with task-specific data.

Task Examples

Math environment:
{
  "question": "If x + 5 = 12, what is x?",
  "answer": "7",
  "difficulty": "easy"
}
Coding environment:
{
  "problem_id": "reverse_string",
  "description": "Write a function to reverse a string",
  "test_cases": [
    {"input": "hello", "output": "olleh"},
    {"input": "world", "output": "dlrow"}
  ],
  "time_limit_seconds": 5
}
Web navigation:
{
  "task_id": "find_price",
  "goal": "Find the price of iPhone 15",
  "start_url": "https://example.com",
  "success_criteria": "Price found and extracted correctly"
}

Task Lifecycle

1. Environment defines tasks
2. Tasks organized into splits (e.g. train/test)
3. Agent requests tasks from a split
4. For each task:
   a. Create episode with task
   b. Get prompt (derived from task)
   c. Solve task via tool calls and receive rewards
   d. Receive finished signal
   e. Cleanup episode
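
The lifecycle above can be sketched as a loop. This is a toy, in-memory illustration only: ToyMathEnv, its method names, and the 1.0/0.0 reward values are hypothetical stand-ins and not part of ORS. A real client would perform each step over HTTP.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMathEnv:
    """Hypothetical in-memory environment; NOT the ORS API."""
    splits: dict = field(default_factory=lambda: {
        "train": [
            {"question": "What is 2+2?", "answer": "4"},
            {"question": "If x + 5 = 12, what is x?", "answer": "7"},
        ]
    })
    active: dict = field(default_factory=dict)  # episode_id -> task
    next_id: int = 0

    def list_tasks(self, split):            # step 3: agent requests tasks
        return self.splits[split]

    def create_episode(self, task):         # step 4a
        self.next_id += 1
        self.active[self.next_id] = task
        return self.next_id

    def get_prompt(self, episode_id):       # step 4b: prompt derived from task
        return self.active[episode_id]["question"]

    def submit(self, episode_id, answer):   # steps 4c-4d: returns (reward, finished)
        correct = answer == self.active[episode_id]["answer"]
        return (1.0 if correct else 0.0), True

    def cleanup(self, episode_id):          # step 4e
        del self.active[episode_id]


def run_split(env, split, solve):
    """Run one episode per task in the split and return the rewards collected."""
    rewards = []
    for task in env.list_tasks(split):
        episode_id = env.create_episode(task)
        try:
            prompt = env.get_prompt(episode_id)
            finished = False
            while not finished:
                reward, finished = env.submit(episode_id, solve(prompt))
                rewards.append(reward)
        finally:
            env.cleanup(episode_id)         # release the episode even on error
    return rewards
```

The try/finally mirrors step (e): cleanup runs whether or not the agent finishes the task.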

Accessing Tasks

The simplest way to retrieve tasks is to list all tasks in a split:
POST /math/tasks
Content-Type: application/json

{
  "split": "train"
}
Response:
{
  "tasks": [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "If x + 5 = 12, what is x?", "answer": "7"},
    ...
  ],
  "env_name": "math"
}
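
As a minimal sketch, the request above could be issued from Python with the standard library. The helper names and the base URL passed in (for example http://localhost:8080) are assumptions for illustration; only the path, the request body, and the "tasks" response field come from the example above.

```python
import json
import urllib.request


def build_tasks_request(base_url, env_name, split):
    """Build (but do not send) the POST /{env_name}/tasks request."""
    body = json.dumps({"split": split}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{env_name}/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_tasks(base_url, env_name, split):
    """Send the request and return the "tasks" array from the response."""
    request = build_tasks_request(base_url, env_name, split)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["tasks"]
```

Separating request construction from sending keeps the payload easy to inspect without a running server.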
For large datasets, loading all tasks at once is wasteful. Use these endpoints for efficient access:

Count tasks (POST /{env_name}/num_tasks):
POST /math/num_tasks
{"split": "train"}
Returns the number of tasks in a split.

Get a single task (POST /{env_name}/get_task):
POST /math/get_task
{"split": "train", "index": 42}
Returns the task at a specific index.

Get a range of tasks (POST /{env_name}/get_task_range):
POST /math/get_task_range
{"split": "train", "start": 0, "stop": 100}
Returns tasks in the range [start, stop). Both start and stop are optional.
The server validates split names on all task endpoints and returns HTTP 400 (Bad Request) for invalid splits.
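
One way to combine num_tasks and get_task_range is to walk a split in fixed-size batches. The sketch below assumes a caller-supplied post(path, payload) function (hypothetical) that sends a JSON POST and returns the already-unwrapped result: an integer for num_tasks and a list of tasks for get_task_range. The exact response envelopes are not specified here, so that unwrapping is left to the adapter.

```python
def iter_task_batches(post, env_name, split, batch_size=100):
    """Yield lists of tasks, requesting at most batch_size tasks at a time.

    `post(path, payload)` is a hypothetical adapter supplied by the caller.
    """
    total = post(f"/{env_name}/num_tasks", {"split": split})
    for start in range(0, total, batch_size):
        # The range is half-open: [start, stop)
        stop = min(start + batch_size, total)
        yield post(
            f"/{env_name}/get_task_range",
            {"split": split, "start": start, "stop": stop},
        )
```

Because the range is half-open, consecutive batches neither overlap nor skip an index.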

Task as Episode Input

Tasks are passed when creating episodes. You can provide the task inline via task_spec, or reference it by split and index:
POST /create
X-Session-ID: abc-123

{
  "env_name": "math",
  "task_spec": {
    "question": "What is 2+2?",
    "answer": "4"
  },
  "secrets": {}
}
Or load directly from a split (the server resolves the task):
POST /create
X-Session-ID: abc-123

{
  "split": "train",
  "index": 0,
  "secrets": {}
}
Both env_name and secrets are optional: env_name defaults to the first registered environment, and secrets defaults to {}. The task source is not optional: exactly one of task_spec or split+index must be provided (see CreateSession).

The environment uses the task to:
  • Generate the initial prompt
  • Determine correct answers
  • Calculate rewards
  • Track episode progress
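
The rule that exactly one of task_spec or split+index is supplied can also be checked on the client before calling POST /create. The following is a minimal sketch: build_create_payload is a hypothetical helper, and the server remains the authority on validation.

```python
def build_create_payload(task_spec=None, split=None, index=None,
                         env_name=None, secrets=None):
    """Return the JSON body for POST /create, validating the task source."""
    if (split is None) != (index is None):
        raise ValueError("split and index must be provided together")
    has_inline = task_spec is not None
    has_reference = split is not None and index is not None
    if has_inline == has_reference:
        raise ValueError("provide exactly one of task_spec or split+index")

    payload = {"secrets": secrets if secrets is not None else {}}
    if env_name is not None:
        payload["env_name"] = env_name  # omitted: server falls back to its default
    if has_inline:
        payload["task_spec"] = task_spec
    else:
        payload.update({"split": split, "index": index})
    return payload
```

For example, build_create_payload(split="train", index=0) produces the second request body shown above, while passing neither or both task sources raises ValueError.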

Splits

What is a split?

A split is a named category of tasks. Splits organise tasks for different purposes in ML workflows. An example split structure:
  • train - Tasks for training
  • validation - Tasks for hyperparameter tuning
  • test - Tasks for evaluation

Split Structure

interface Split {
  name: string  // Split identifier
  type: "train" | "validation" | "test"  // Category
}
Examples:
[
  {"name": "train", "type": "train"},
  {"name": "validation", "type": "validation"},
  {"name": "test", "type": "test"}
]

Accessing Splits

List available splits:
GET /math/splits
Response:
[
  {"name": "train", "type": "train"},
  {"name": "test", "type": "test"}
]
Then request tasks from a specific split:
POST /math/tasks
{"split": "train"}

Custom Splits

Environments can define custom splits beyond train/validation/test:
[
  {"name": "easy", "type": "train"},
  {"name": "medium", "type": "train"},
  {"name": "hard", "type": "test"},
  {"name": "expert", "type": "test"}
]
Use cases:
  • Difficulty-based splits (easy/medium/hard)
  • Domain-specific splits (algebra/geometry/calculus)
  • Time-based splits (before_2020/after_2020)
  • Resource-based splits (CPU/GPU sandboxes)
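
Since custom split names carry no fixed meaning, a client can rely on the type field to decide how each split is used. A small sketch follows; group_splits_by_type is a hypothetical helper operating on the list returned by GET /{env_name}/splits.

```python
from collections import defaultdict


def group_splits_by_type(splits):
    """Map each split type to the names of the splits that have it."""
    grouped = defaultdict(list)
    for split in splits:
        grouped[split["type"]].append(split["name"])
    return dict(grouped)
```

Applied to the custom splits above, this groups easy and medium under "train" and hard and expert under "test".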
Type defaults: Environments can return splits as either Split objects or bare strings. The server normalises bare strings: "train", "validation", and "test" map to their corresponding type, while any other name defaults to "type": "validation".

Convention: When using Split objects explicitly, map to standard types:
  • Training-related → "type": "train"
  • Evaluation-related → "type": "test"
  • Tuning-related → "type": "validation"
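
The normalisation rule for bare strings can be illustrated as follows. This is a sketch of the described behaviour, not the server's actual code: normalize_split is a hypothetical name, and rejecting Split objects with an unknown type is an assumption made here.

```python
STANDARD_TYPES = {"train", "validation", "test"}


def normalize_split(split):
    """Return a {"name", "type"} dict for a bare string or a Split object."""
    if isinstance(split, str):
        # Standard names map to their own type; anything else is "validation".
        split_type = split if split in STANDARD_TYPES else "validation"
        return {"name": split, "type": split_type}
    if split.get("type") not in STANDARD_TYPES:
        # Assumption: unknown types are treated as an error.
        raise ValueError(f"invalid split type: {split!r}")
    return {"name": split["name"], "type": split["type"]}
```

Note that a bare string such as "easy" becomes a validation split under this rule, so an environment that wants it treated as training data should return an explicit Split object.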

Next Steps

  • Tools: Design tools for solving tasks
  • Rewards: Create reward signals for tasks
  • Implementing a Server: Build an ORS server with tasks
  • HTTP API: See how tasks are accessed via API

Key Takeaway: Tasks are the problems agents solve. Splits organize tasks for proper ML workflows. Design task structures that are clear, validated, and organized into train/test splits to enable both learning and fair evaluation.