An episode is a complete run of experience from start to termination. In ORS, a session with a server is equivalent to an episode if the agent terminates the session after receiving a finished signal.

Sessions as Episodes

In ORS, an episode:
  • Starts with a specific task
  • Continues through one or more tool calls
  • Ends when a ToolOutput with finished: true is received

RL Episode Terminology

| RL Term | ORS Term | Description |
| --- | --- | --- |
| Episode | Session | One complete trajectory |
| State | Environment instance | Full internal state on the server |
| Observation | Blocks (prompt + tool outputs) | Partial view of state returned to the agent |
| Action | Tool call | Agent action |
| Reward | ToolOutput.reward | Feedback signal |
| Terminal state | finished: true | Episode complete |

Episode Lifecycle

Complete Flow

[Figure: Episode lifecycle, 5 steps from session creation to cleanup]

States in Detail

1. Session ID Generation

POST /create_session
Purpose: Generate a unique identifier for this episode.
Response:
{"sid": "abc-123-def-456"}
Note: This just creates an ID. No environment is instantiated yet.

2. Episode Initialization

POST /create
X-Session-ID: abc-123-def-456
Content-Type: application/json

{
  "env_name": "math",
  "task_spec": {"question": "What is 2+2?"},
  "secrets": {"api_key": "sk-..."}
}
Each body field is optional on its own, but either task_spec or both split and index must be provided (see CreateSession). env_name defaults to the first registered environment.
What happens:
  1. Server resolves env_name (or defaults to first environment) and task_spec (or loads from split/index)
  2. Instantiates the environment class with task_spec and secrets
  3. Calls environment.setup() (async)
  4. Marks session as “ready” when setup completes
Blocking: Subsequent requests wait for setup to complete before proceeding.
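The "task_spec or split + index" rule can be checked on the client before sending /create. The following is a minimal sketch, assuming the JSON keys are literally named split and index as the text suggests. build_create_body is a hypothetical helper, not part of ORS, and what the server does when both task_spec and split/index are supplied is not covered here.

```python
from typing import Any, Optional


def build_create_body(
    env_name: Optional[str] = None,
    task_spec: Optional[dict[str, Any]] = None,
    split: Optional[str] = None,
    index: Optional[int] = None,
    secrets: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Assemble a /create body, enforcing 'task_spec, or both split and index'."""
    has_split_and_index = split is not None and index is not None
    if task_spec is None and not has_split_and_index:
        raise ValueError("provide task_spec, or both split and index")

    candidates = {
        "env_name": env_name,  # omitted -> server picks the first registered environment
        "task_spec": task_spec,
        "split": split,        # assumed key name
        "index": index,        # assumed key name
        "secrets": secrets,
    }
    # Leave out anything the caller did not set, so server defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, build_create_body(env_name="math", task_spec={"question": "What is 2+2?"}) produces the request body shown above minus the secrets field.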

3. Initial Observation

GET /math/prompt
X-Session-ID: abc-123-def-456
Purpose: Get the initial observation (o₀) for the episode.
Response:
[
  {"text": "What is 2+2?", "detail": null, "type": "text"}
]
RL Interpretation: This is the initial observation that the agent uses to select its first action.
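An agent usually needs the blocks as plain text before it can choose an action. The sketch below assumes each block is a dict with type and text keys, as in the response above, and simply skips any block whose type is not "text". blocks_to_text is a hypothetical helper name, and how non-text blocks should be handled is left open.

```python
from typing import Any


def blocks_to_text(blocks: list[dict[str, Any]]) -> str:
    """Join the text of every block whose type is "text"; skip all others."""
    return "\n".join(
        block["text"]
        for block in blocks
        if block.get("type") == "text" and block.get("text") is not None
    )
```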

4. Action-Observation Loop

POST /math/call
X-Session-ID: abc-123-def-456

{"name": "submit", "input": {"answer": "4"}}
Response (SSE):
{
  "ok": true,
  "output": {
    "blocks": [{"text": "Correct!", "detail": null, "type": "text"}],
    "metadata": null,
    "reward": 1.0,
    "finished": true
  }
}
What happens:
  1. Agent takes action (calls tool)
  2. Environment executes action
  3. Environment returns the next observation (blocks), reward, and termination flag
  4. If finished: false, repeat from step 1
  5. If finished: true, episode is complete
RL Interpretation: This is the core RL loop:
  • Action: Tool call
  • Observation: Blocks
  • Reward: Reward signal
  • Terminal: Finished flag
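The loop above can be sketched independently of any HTTP client. In this sketch, call_tool stands in for whatever function sends POST /{env_name}/call and returns the output object, and policy is the agent's decision function; both names, the max_steps safety cap, and the choice to sum rewards are assumptions made for illustration, not something ORS prescribes.

```python
from typing import Any, Callable

Blocks = list[dict[str, Any]]
ToolCall = dict[str, Any]
Output = dict[str, Any]


def run_episode(
    call_tool: Callable[[ToolCall], Output],
    policy: Callable[[Blocks], ToolCall],
    initial_blocks: Blocks,
    max_steps: int = 50,
) -> tuple[float, int]:
    """Run the action-observation loop; return (total reward, steps taken)."""
    blocks = initial_blocks
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(blocks)                       # agent picks a tool call
        output = call_tool(action)                    # environment executes it
        total_reward += output.get("reward") or 0.0   # reward may be null or absent
        blocks = output["blocks"]                     # next observation
        if output.get("finished", False):             # terminal: stop calling tools
            return total_reward, step
    # Safety cap reached without a finished signal (cap is an assumption).
    return total_reward, max_steps
```

Passing the transport in as a function keeps the loop easy to test against a fake environment before pointing it at a real server.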

5. Episode Termination

POST /delete
X-Session-ID: abc-123-def-456
Purpose: Clean up episode resources.
What happens:
  1. Calls environment.teardown()
  2. Removes session from active sessions
  3. Frees memory and resources
Important: Always call /delete when done, even if the episode finished naturally.

Episode Termination

The finished Signal

The finished field in ToolOutput signals whether the episode has ended:
interface ToolOutput {
  blocks: Blocks
  metadata?: JSONObject
  reward?: number
  finished?: boolean  // ← Episode termination signal (default: false)
}
When finished: true:
  • Episode is complete
  • Agent should stop calling tools
  • Agent should call /delete to cleanup
  • Task succeeded or failed (check reward or blocks for details)
When finished: false:
  • Episode continues
  • Agent should take another action
  • State may have changed (reflected in blocks)
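On the client side, the optional fields mean a parser should fill in defaults rather than assume every key is present. The following is a minimal Python sketch of the interface above. The dataclass and its from_json method are illustrative names rather than an official client type, and blocks are kept as plain dicts for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolOutput:
    blocks: list[dict[str, Any]]
    metadata: Optional[dict[str, Any]] = None
    reward: Optional[float] = None
    finished: bool = False  # default per the interface: the episode continues

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "ToolOutput":
        """Build a ToolOutput from a decoded JSON object, applying defaults."""
        return cls(
            blocks=data["blocks"],
            metadata=data.get("metadata"),
            reward=data.get("reward"),
            # A missing or null finished field is treated as False.
            finished=bool(data.get("finished", False)),
        )
```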

Termination Patterns

Pattern 1: Immediate Termination

Task completes in one step:
# Single-action episode: the first tool call ends it
result = session.call_tool("submit", {"answer": "42"})
assert result.finished

Pattern 2: Multi-Step Termination

Task requires multiple actions:
# Multi-step episode
result1 = session.call_tool("bash", {"command": "cat file.txt"})
assert not result1.finished  # Continue

result2 = session.call_tool("submit", {"answer": "Paris"})
assert result2.finished  # Complete

Pattern 3: Failure Termination

Task fails (but episode still terminates):
# Wrong answer: the episode still terminates, with zero reward
result = session.call_tool("submit", {"answer": 999})
assert result.finished
assert result.reward == 0.0  # Failed

State Management

What’s Preserved in a Session?

Environment state:
  • Instance variables in environment class
  • Files created during episode (if environment has filesystem or persistent sandbox)
  • Any side effects from tool executions
Example:
# State persists across tool calls
session.call_tool("bash", {"command": "export VAR=hello"})
result = session.call_tool("bash", {"command": "echo $VAR"})
# → "hello" (state preserved)

What’s NOT Preserved?

Across episodes:
  • Each session is independent at the protocol level
  • Session 1 and Session 2 have separate environment instances
  • No shared instance state between sessions (though implementations may share class-level or cached state)
After timeout:
  • 15 minutes of inactivity → session deleted
  • State is lost
  • Must create new session
After finished: true:
  • Episode data is final
  • Further tool calls should not be made
  • Call /delete for cleanup

Session Timeout

Sessions automatically expire after 15 minutes of inactivity.

Inactivity Definition

“Inactivity” means no requests with that session’s X-Session-ID. Any such request resets the timer, for example:
  • /ping
  • /{env_name}/call
  • /{env_name}/prompt
The exception is /delete, which removes the session.

Keeping Sessions Alive

For long-running episodes, periodically call /ping:
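import threading

import requests

PING_INTERVAL_SECONDS = 300  # Every 5 minutes, well under the 15-minute timeout


def keep_alive(session_id, stop_event):
    """Ping the server until stop_event is set."""
    while not stop_event.is_set():
        requests.post(
            "http://server/ping",
            headers={"X-Session-ID": session_id},
            timeout=10,
        )
        # Sleeps for the interval, but wakes immediately once stop_event is set
        stop_event.wait(PING_INTERVAL_SECONDS)


# Start background thread; call stop_event.set() once the episode ends
stop_event = threading.Event()
threading.Thread(target=keep_alive, args=(session_id, stop_event), daemon=True).start()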

Timeout Cleanup

When a session times out:
  1. Server calls environment.teardown()
  2. Session removed from active sessions
  3. Subsequent requests with that session ID → 404 Not Found
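To make the timer semantics concrete, here is a hypothetical server-side sketch of inactivity tracking. SessionTable and its methods are invented for illustration and say nothing about how an actual ORS server is implemented. The clock is injectable so the behaviour can be exercised without waiting 15 minutes.

```python
import time
from typing import Callable

TIMEOUT_SECONDS = 15 * 60


class SessionTable:
    """Tracks last-activity times and expires idle sessions (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_seen: dict[str, float] = {}
        self._teardowns: dict[str, Callable[[], None]] = {}

    def add(self, sid: str, teardown: Callable[[], None]) -> None:
        """Register a session together with its environment's teardown hook."""
        self._last_seen[sid] = self._clock()
        self._teardowns[sid] = teardown

    def expire_idle(self) -> list[str]:
        """Tear down and remove every session idle for the full timeout."""
        now = self._clock()
        expired = [
            sid for sid, seen in self._last_seen.items()
            if now - seen >= TIMEOUT_SECONDS
        ]
        for sid in expired:
            self._teardowns.pop(sid)()  # 1. environment.teardown()
            del self._last_seen[sid]    # 2. remove from active sessions
        return expired

    def touch(self, sid: str) -> int:
        """Record a request for sid; return an HTTP-style status code."""
        self.expire_idle()
        if sid not in self._last_seen:
            return 404                  # 3. unknown or expired session
        self._last_seen[sid] = self._clock()  # any request resets the timer
        return 200
```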

Session Best Practices

1. Always Delete Sessions

# Good - cleanup
session_id = create_session()
try:
    # ... episode logic
    pass
finally:
    delete_session(session_id)

# Bad - resource leak
session_id = create_session()
# ... episode logic
# Forgot to delete!

2. Check finished Flag

# Good - respect termination
result = session.call_tool("submit", {"answer": "42"})
if result.finished:
    delete_session(session_id)
else:
    # Continue episode
    pass

# Bad - ignore termination
result = session.call_tool("submit", {"answer": "42"})
# Continue calling tools even if finished=True

3. Handle Errors Gracefully

# Good - cleanup on error
try:
    result = session.call_tool("bash", {"command": "rm -rf /"})
except Exception:
    delete_session(session_id)
    raise

4. Use Context Managers

# Best - automatic cleanup
with session_manager.session(task=task) as session:
    result = session.call_tool("submit", {"answer": "42"})
    # Automatically deleted when exiting context
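A context manager like the one above can be built with contextlib. This is one possible sketch: the Client protocol and its create_session, create, and delete methods are hypothetical wrappers around POST /create_session, POST /create, and POST /delete, not an existing library API. It also assumes that if /create itself fails there is no environment to tear down, so /delete is skipped in that case.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Protocol


class Client(Protocol):
    """Hypothetical thin wrapper over the session endpoints."""

    def create_session(self) -> str: ...
    def create(self, sid: str, task: dict[str, Any]) -> None: ...
    def delete(self, sid: str) -> None: ...


@contextmanager
def episode(client: Client, task: dict[str, Any]) -> Iterator[str]:
    """Yield a ready session ID and always delete the session on exit."""
    sid = client.create_session()  # POST /create_session
    client.create(sid, task)       # POST /create
    try:
        yield sid
    finally:
        client.delete(sid)         # POST /delete, also when the body raises
```

Because the cleanup sits in a finally clause, it runs whether the episode finishes normally or an exception escapes the with block.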

Debugging Sessions

Common Issues

Issue: “404 Session not found”
  • Cause: Session timed out or was deleted
  • Fix: Make sure no gap between requests exceeds 15 minutes, or call /ping periodically during long pauses
Issue: “Session already exists”
  • Cause: Trying to create episode with already-used session ID
  • Fix: Generate new session ID with /create_session
Issue: “Session deleted” (410)
  • Cause: Calling tool after /delete was called
  • Fix: Don’t reuse session IDs after deletion
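A client can turn the two status codes named above into distinct exceptions so callers can react differently to each. The exception classes and raise_for_session_status are hypothetical names chosen for this sketch. The "Session already exists" case is left out because its status code is not stated here.

```python
class SessionNotFound(Exception):
    """The session timed out or was deleted (HTTP 404)."""


class SessionDeleted(Exception):
    """A tool was called after /delete (HTTP 410)."""


def raise_for_session_status(status_code: int) -> None:
    """Raise a session-specific exception for 404 and 410; otherwise do nothing."""
    if status_code == 404:
        raise SessionNotFound("session not found: create a new session")
    if status_code == 410:
        raise SessionDeleted("session deleted: do not reuse this session ID")
```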

Next Steps

Rewards Concept

Understand reward signals in episodes

Implementing a Client

Build a client that manages sessions

Key Takeaway: Sessions are RL episodes. They start with a task, continue until finished: true, and should always be cleaned up with /delete.