A reward signal defines the goal in a reinforcement learning problem. Typically this is a single number (scalar feedback) that tells an agent how good or bad an outcome is in an immediate sense.

In ORS, rewards are obtained by executing tools and are returned as part of a ToolOutput. They can be used for training in reinforcement learning or for agentic evaluation. For example, binary rewards can be used to compute accuracy.
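As a minimal sketch of how a reward might travel with a tool result, the example below uses a hypothetical `ToolOutput` dataclass and a hypothetical `submit_answer` tool. The field names (`text`, `reward`, `finished`) and the tool itself are assumptions made for illustration, not the ORS schema. It shows a binary reward and why the mean of binary rewards is the accuracy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolOutput:
    """Hypothetical stand-in for an ORS tool result (field names are assumptions)."""
    text: str
    reward: Optional[float] = None
    finished: bool = False


def submit_answer(guess: int, answer: int) -> ToolOutput:
    """Hypothetical tool: binary reward, 1.0 if the guess is correct, else 0.0."""
    correct = guess == answer
    return ToolOutput(
        text="Correct" if correct else "Incorrect",
        reward=1.0 if correct else 0.0,
        finished=True,
    )


# Four made-up (guess, answer) pairs: two correct, two incorrect.
outputs = [
    submit_answer(4, 4),
    submit_answer(5, 9),
    submit_answer(7, 7),
    submit_answer(1, 2),
]

# With binary rewards, the mean reward over tasks is the accuracy.
accuracy = sum(o.reward or 0.0 for o in outputs) / len(outputs)
print(f"Accuracy: {accuracy:.1%}")  # Accuracy: 50.0%
```

Treating a missing reward as 0.0 (`o.reward or 0.0`) keeps the average well defined when a tool returns no reward at all.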
```python
def calculate_progress_reward(self, state) -> float:
    """Reward based on how close to solution"""
    # Example: Math problem solving

    # Check if agent has seen the problem
    if not state["viewed_problem"]:
        return 0.0

    # Check if agent has attempted calculation
    if state["attempted_calculation"]:
        reward = 0.3

        # Check if calculation is close to answer.
        # Compare against None so that a guess of 0 is still scored.
        if state["last_guess"] is not None:
            error = abs(state["last_guess"] - self.task["answer"])
            if error < 5:
                reward = 0.7  # Getting close!
            if error < 1:
                reward = 0.9  # Very close!

        return reward

    return 0.1  # Viewed problem
```
Pros:
Strong learning signal
Guides exploration
Accelerates training
Cons:
Requires domain knowledge
Can create reward hacking opportunities
Complex to implement
Best for: Well-understood domains, complex navigation
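To see how the graded values play out, the sketch below restates the progress reward above as a standalone function and evaluates it on a few hand-written states. The `progress_reward` name, the `answer` parameter, and the sample states are assumptions for illustration; the thresholds and reward values are copied from the example above.

```python
def progress_reward(state: dict, answer: float) -> float:
    """Standalone variant of the progress reward (hypothetical helper)."""
    if not state["viewed_problem"]:
        return 0.0  # Nothing done yet
    if not state["attempted_calculation"]:
        return 0.1  # Viewed the problem only

    reward = 0.3  # Attempted a calculation
    guess = state.get("last_guess")
    if guess is not None:
        error = abs(guess - answer)
        if error < 1:
            reward = 0.9  # Very close
        elif error < 5:
            reward = 0.7  # Getting close
    return reward


# Made-up states, ordered from no progress to nearly solved (answer is 42).
states = [
    {"viewed_problem": False, "attempted_calculation": False, "last_guess": None},
    {"viewed_problem": True, "attempted_calculation": False, "last_guess": None},
    {"viewed_problem": True, "attempted_calculation": True, "last_guess": None},
    {"viewed_problem": True, "attempted_calculation": True, "last_guess": 45},
    {"viewed_problem": True, "attempted_calculation": True, "last_guess": 42.5},
]
rewards = [progress_reward(s, answer=42) for s in states]
print(rewards)  # [0.0, 0.1, 0.3, 0.7, 0.9]
```

The guess is compared against `None` rather than tested for truthiness, so a guess of 0 still counts as a guess.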
Rewards in ORS can be used for evaluation as well as training:
```python
# Run evaluation
total_rewards = []
for task in test_tasks:
    session = create_session(task)
    episode_reward = 0.0
    finished = False  # Reset for every episode

    while not finished:
        result = agent.step(session)
        episode_reward += result.reward or 0.0  # Treat a missing reward as 0.0
        finished = result.finished

    total_rewards.append(episode_reward)

# Evaluation metrics
avg_reward = sum(total_rewards) / len(total_rewards)
success_rate = sum(1 for r in total_rewards if r > 0.9) / len(total_rewards)

print(f"Average reward: {avg_reward:.3f}")
print(f"Success rate: {success_rate:.1%}")
```
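The loop above depends on `test_tasks`, `create_session`, and `agent`, which are not defined on this page. As a rough, self-contained sketch, the version below replaces them with hypothetical stubs (`StepResult`, `run_episode`, `summarize`, and a scripted list of step results per episode) so the aggregation logic can be run and checked in isolation. None of these names come from ORS.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StepResult:
    """Hypothetical stand-in for what agent.step(session) returns."""
    reward: Optional[float]
    finished: bool


def run_episode(steps: List[StepResult]) -> float:
    """Sum rewards until a step reports finished; a missing reward counts as 0.0."""
    episode_reward = 0.0
    for result in steps:
        episode_reward += result.reward or 0.0
        if result.finished:
            break
    return episode_reward


def summarize(total_rewards: List[float], threshold: float = 0.9) -> Tuple[float, float]:
    """Return (average reward, success rate); success means reward > threshold."""
    if not total_rewards:
        return 0.0, 0.0  # Avoid dividing by zero on an empty task list
    avg = sum(total_rewards) / len(total_rewards)
    success = sum(1 for r in total_rewards if r > threshold) / len(total_rewards)
    return avg, success


# Scripted episodes standing in for real agent rollouts (made-up values).
episodes = [
    [StepResult(None, False), StepResult(1.0, True)],
    [StepResult(0.25, False), StepResult(0.25, True)],
]
total_rewards = [run_episode(steps) for steps in episodes]
avg_reward, success_rate = summarize(total_rewards)

print(f"Average reward: {avg_reward:.3f}")  # Average reward: 0.750
print(f"Success rate: {success_rate:.1%}")  # Success rate: 50.0%
```

The default `threshold` of 0.9 mirrors the success cutoff used in the loop above; with graded rewards, the right cutoff depends on how the per-step values are meant to add up over an episode.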
Key Takeaway: Rewards are the learning signal for RL. Design them carefully to align with your true objective, provide timely feedback, and avoid unintended behaviors. Good reward design is critical for successful RL training.