openreward

Python SDK for the OpenReward platform.

145 versions · Python >=3.11
Install
pip install openreward
poetry add openreward
pipenv install openreward
conda install openreward
Description

OpenReward Python SDK


The official Python SDK for OpenReward — a platform for building, hosting, and training on RL environments for language models.

The SDK has two complementary roles:

  • Build environments — define evaluation tasks, expose tools, and serve them via a standards-compliant API that can be deployed on the OpenReward platform.
  • Train agents — connect to any environment (local or hosted), run agent loops, and log rollouts with rewards back to OpenReward.

Installation

pip install openreward

For environments that process documents (PDF, DOCX, Excel, PowerPoint):

pip install "openreward[tools]"

Requires Python 3.11+.

Core concepts

Environment

An Environment subclass defines a benchmark or task distribution. Implement three required methods:

| Method | Purpose |
| --- | --- |
| list_splits() | Return split names, e.g. ["train", "test"] |
| list_tasks(split) | Return a deterministically ordered list of task dicts |
| get_prompt() | Return the task instructions as a list of TextBlock / ImageBlock |

Actions are defined as async methods decorated with @tool. Each tool receives a Pydantic model as input and returns a ToolOutput.
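
The shape of this contract can be sketched without the SDK. The block below uses plain Python only: TextBlock, AnswerInput, and ArithmeticEnv are hypothetical stand-ins, a dataclass replaces the Pydantic input model, and the tool returns a dict instead of a ToolOutput. Whether the real methods are class methods, and how @tool is imported, are assumptions to check against the OpenReward docs.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Stand-in for the SDK's TextBlock (hypothetical shape)."""
    text: str


@dataclass(frozen=True)
class AnswerInput:
    """Stand-in for the Pydantic input model a tool would receive."""
    answer: int


class ArithmeticEnv:
    """Hypothetical environment: each task asks for the sum of two integers."""

    _TASKS = {
        "train": [{"id": "t1", "a": 3, "b": 4}, {"id": "t0", "a": 1, "b": 2}],
        "test": [{"id": "e0", "a": 5, "b": 6}],
    }

    def __init__(self, task: dict):
        self.task = task

    @classmethod
    def list_splits(cls) -> list[str]:
        return ["train", "test"]

    @classmethod
    def list_tasks(cls, split: str) -> list[dict]:
        # Sort by id so the order is identical on every call and every process.
        return sorted(cls._TASKS[split], key=lambda t: t["id"])

    def get_prompt(self) -> list[TextBlock]:
        return [TextBlock(f"Compute {self.task['a']} + {self.task['b']}.")]

    async def submit(self, params: AnswerInput) -> dict:
        # With the SDK this method would carry @tool and return a ToolOutput.
        correct = params.answer == self.task["a"] + self.task["b"]
        return {"reward": 1.0 if correct else 0.0, "finished": True}


env = ArithmeticEnv(ArithmeticEnv.list_tasks("train")[0])
result = asyncio.run(env.submit(AnswerInput(answer=3)))
```

Sorting by a stable key is one simple way to satisfy the "deterministically ordered" requirement of list_tasks(split), so that a task index refers to the same task across runs.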

ToolOutput

Every tool returns a ToolOutput containing:

  • blocks — a list of TextBlock or ImageBlock results
  • reward — optional float reward signal
  • finished — whether the episode is complete
  • metadata — optional arbitrary metadata
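
To make the four fields concrete, here is a dependency-free sketch of how a training loop might consume a sequence of tool results. The ToolOutput dataclass below is a stand-in that only mirrors the field names listed above, with plain strings in place of TextBlock / ImageBlock. Summing rewards until finished is one possible aggregation chosen for illustration, not something the SDK prescribes.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolOutput:
    """Stand-in mirroring the four documented fields (not the SDK class)."""
    blocks: list[str]
    reward: Optional[float] = None
    finished: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def total_reward(outputs: list[ToolOutput]) -> float:
    """Sum rewards up to and including the first output with finished=True."""
    total = 0.0
    for out in outputs:
        if out.reward is not None:  # reward is optional, so skip missing values
            total += out.reward
        if out.finished:
            break  # the episode is over; later outputs are ignored
    return total


episode = [
    ToolOutput(blocks=["ran tests"]),
    ToolOutput(blocks=["partial credit"], reward=0.25),
    ToolOutput(blocks=["done"], reward=0.75, finished=True),
    ToolOutput(blocks=["never reached"], reward=5.0),
]
```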

Server

Server wraps one or more Environment classes in a FastAPI app and exposes the Open Reward Standard API over HTTP with SSE streaming.

Key endpoints:

| Endpoint | Description |
| --- | --- |
| POST /create | Spawn a new environment session |
| POST /{env}/call | Execute a tool (streamed via SSE) |
| GET /{env}/prompt | Get the current task prompt |
| GET /{env}/tools | List available tools |
| POST /{env}/tasks | List all tasks for a split |
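
Because tool calls are streamed via SSE, a client that does not use the SDK has to split the response body into events. The following is a minimal sketch of a Server-Sent Events parser based on the generic SSE wire format. The event names (chunk, result) and the JSON payload fields in the sample are assumptions for illustration; the actual payloads of the Open Reward Standard API are not described here.

```python
import json


def parse_sse(stream_text: str) -> list[dict]:
    """Parse Server-Sent Events text into a list of {"event", "data"} dicts."""
    events = []
    event_type, data_lines = "message", []
    # The extra "" flushes a final event that lacks a trailing blank line
    # (a lenient choice; strict SSE parsers discard such an event).
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            name, _, value = line.partition(":")
            value = value.removeprefix(" ")  # one leading space is not part of the value
            if name == "event":
                event_type = value
            elif name == "data":
                data_lines.append(value)
    return events


# Hypothetical stream: event names and payload fields are illustrative only.
sample = (
    ": keep-alive\n"
    "event: chunk\n"
    'data: {"text": "hello"}\n'
    "\n"
    "event: result\n"
    'data: {"reward": 1.0, "finished": true}\n'
    "\n"
)
events = parse_sse(sample)
final = json.loads(events[-1]["data"])
```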

Sandboxes

Environments that need isolated compute (e.g. code execution) can spin up Docker containers via the sandbox API using SandboxSettings. Containers are managed automatically — started in setup() and torn down in teardown().
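
The lifecycle guarantee matters mostly when a tool fails mid-session. The sketch below shows the pattern with a fake environment that records calls instead of starting Docker. FakeSandboxEnv, run_tool, and run_session are hypothetical names, and the try/finally runner is an assumption about how a session driver could honor the setup()/teardown() pairing, not a description of the SDK's internals.

```python
import asyncio


class FakeSandboxEnv:
    """Hypothetical environment that logs lifecycle calls instead of using Docker."""

    def __init__(self):
        self.log = []

    async def setup(self):
        self.log.append("container started")

    async def run_tool(self, fail: bool):
        if fail:
            raise RuntimeError("tool crashed")
        self.log.append("tool ran")

    async def teardown(self):
        self.log.append("container removed")


async def run_session(env: FakeSandboxEnv, fail: bool = False) -> list[str]:
    await env.setup()
    try:
        await env.run_tool(fail)
    except RuntimeError:
        env.log.append("tool failed")
    finally:
        # teardown runs whether the tool succeeded or raised
        await env.teardown()
    return env.log
```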

Overriding session env vars

When calling a hosted environment, pass env_overrides to Environment.session(...) to point the session at a custom inference backend. Owners of the environment may override any environment variable. Other callers are restricted to OPENAI_BASE_URL, OPENAI_API_KEY, ANTHROPIC_BASE_URL, and ANTHROPIC_API_KEY; any other key is rejected with an HTTP 400 response.

# `client` is an already-constructed OpenReward client
env = client.environments.get("GeneralReasoning/counter")
session = env.session(
    env_overrides={"OPENAI_BASE_URL": "https://my-vllm.example.com/v1"},
)
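
The override rule can be checked client-side before a request is sent, which avoids a round trip that would end in a 400. This is a small sketch of that rule as stated above; rejected_override_keys is a hypothetical helper and is not part of the SDK.

```python
ALLOWED_FOR_NON_OWNERS = frozenset({
    "OPENAI_BASE_URL",
    "OPENAI_API_KEY",
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_API_KEY",
})


def rejected_override_keys(env_overrides: dict[str, str], is_owner: bool) -> list[str]:
    """Return the keys that would be refused; an empty list means the overrides are acceptable."""
    if is_owner:
        return []  # owners may override any variable
    return sorted(k for k in env_overrides if k not in ALLOWED_FOR_NON_OWNERS)
```

For example, a non-owner passing {"OPENAI_BASE_URL": "...", "HF_TOKEN": "..."} would get ["HF_TOKEN"] back and can drop that key before opening the session.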

Toolsets

Group reusable tools into Toolset classes and compose them across environments via the toolsets class attribute.
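
As an illustration of the composition idea, the sketch below marks methods with a stand-in tool decorator and gathers them from an environment class and the classes named in its toolsets attribute. Everything except the attribute name toolsets is hypothetical (the decorator, the _is_tool marker, the class names, and collect_tool_names), and the SDK may discover tools differently.

```python
import inspect


def tool(fn):
    """Stand-in decorator: only marks the function as a tool."""
    fn._is_tool = True
    return fn


class FileTools:
    @tool
    async def read_file(self, path: str) -> str:
        return f"<contents of {path}>"

    def _resolve(self, path: str) -> str:  # undecorated helper, not exposed
        return path


class SearchTools:
    @tool
    async def grep(self, pattern: str) -> str:
        return f"<matches for {pattern}>"


class MyEnv:
    toolsets = [FileTools, SearchTools]

    @tool
    async def submit(self, answer: str) -> str:
        return "submitted"


def collect_tool_names(env_cls) -> list[str]:
    """List tool names from the environment first, then from each toolset in order."""
    names = []
    for cls in [env_cls, *getattr(env_cls, "toolsets", [])]:
        for name, member in inspect.getmembers(cls, inspect.isfunction):
            if getattr(member, "_is_tool", False):
                names.append(name)
    return names
```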

Rollout logging

Log agent trajectories with reward signals back to OpenReward for analysis and training. The client's rollout API supports normalized message types as well as raw outputs from Anthropic, OpenAI, and Google GenAI SDKs.

CLI

The orwd CLI helps you scaffold and create environments.

Scaffold a new environment locally

# Minimal environment
orwd init my-env

# Environment with a Docker sandbox for code execution
orwd init my-env --template sandbox

Create an environment on OpenReward

Registers a new environment under your account (requires OPENREWARD_API_KEY):

orwd create my-env --description "A short description of my environment"

By default the environment is created under your personal namespace. To create it under an organisation you are a member of, pass --namespace:

orwd create my-env --description "A short description" --namespace my-org

Pass --private to make the environment private:

orwd create my-env --description "A short description" --private

Deploying to OpenReward

  1. Push your environment to a GitHub repository.
  2. Connect the repository in the OpenReward dashboard.
  3. Configure compute resources (CPU, memory, scaling).
  4. Every push to the connected branch triggers an automatic build and deployment.

Your environment is then accessible to any agent via the OpenReward API using the username/environment-name namespace.

Environment variables

| Variable | Description |
| --- | --- |
| OPENREWARD_API_KEY | API key for authentication |
| OPENREWARD_URL | Override base URL (default: https://openreward.ai) |
| OPENREWARD_USE_STRUCTURED_LOGS | Set to 1 for JSON logging (recommended in production) |
| OPENREWARD_ROLLOUT_LOGGING_FORMAT | pretty or structured for rollout log output |
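
If your own scripts need the same settings, they can be read with the standard library. The sketch below follows the table: the variable names, the default URL, and the two accepted format values come from it, while Settings and load_settings are hypothetical helpers. The default rollout log format is not stated above, so it is left as None when unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    base_url: str
    structured_logs: bool
    rollout_log_format: Optional[str]


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read the OPENREWARD_* variables from a mapping (the process environment by default)."""
    fmt = environ.get("OPENREWARD_ROLLOUT_LOGGING_FORMAT")
    if fmt is not None and fmt not in ("pretty", "structured"):
        raise ValueError(f"unsupported rollout logging format: {fmt!r}")
    return Settings(
        api_key=environ.get("OPENREWARD_API_KEY"),
        base_url=environ.get("OPENREWARD_URL", "https://openreward.ai").rstrip("/"),
        structured_logs=environ.get("OPENREWARD_USE_STRUCTURED_LOGS") == "1",
        rollout_log_format=fmt,
    )
```

Passing a plain dict instead of os.environ keeps the function easy to test.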

Documentation

Full documentation, guides, and examples are at docs.openreward.ai.

License

Apache 2.0

Version list
0.1.134 2026-06-18
0.1.133 2026-06-17
0.1.132 2026-06-16
0.1.131 2026-06-15
0.1.130 2026-06-15
0.1.129 2026-06-15
0.1.128 2026-06-15
0.1.127 2026-06-12
0.1.127.dev0 2026-06-12
0.1.126 2026-06-04
0.1.125 2026-05-21
0.1.124 2026-05-20
0.1.123 2026-05-19
0.1.123.dev1 2026-05-19
0.1.122 2026-05-19
0.1.122.dev1 2026-05-19
0.1.121 2026-05-18
0.1.121.dev0 2026-05-14
0.1.120 2026-05-11
0.1.119 2026-05-11
0.1.118 2026-05-10
0.1.117 2026-05-10
0.1.116 2026-05-10
0.1.115 2026-05-08
0.1.115.dev1 2026-05-08
0.1.114 2026-05-08
0.1.114.dev1 2026-05-08
0.1.113 2026-05-08
0.1.112 2026-05-05
0.1.111 2026-05-05
0.1.110 2026-05-05
0.1.109 2026-04-28
0.1.108 2026-04-28
0.1.107 2026-04-28
0.1.106 2026-04-24
0.1.105 2026-04-23
0.1.104 2026-04-22
0.1.103 2026-04-22
0.1.102 2026-04-22
0.1.101 2026-04-22
0.1.101.dev2 2026-04-22
0.1.100 2026-04-22
0.1.99 2026-04-21
0.1.98 2026-04-20
0.1.97 2026-04-15
0.1.96 2026-04-15
0.1.96.dev2 2026-04-15
0.1.96.dev1 2026-04-14
0.1.96.dev0 2026-04-14
0.1.95 2026-04-14
0.1.95.dev0 2026-04-14
0.1.94 2026-04-12
0.1.93 2026-04-10
0.1.93.dev0 2026-04-09
0.1.92 2026-04-08
0.1.91 2026-04-07
0.1.90 2026-04-03
0.1.89 2026-04-01
0.1.89.dev1 2026-04-03
0.1.88 2026-03-31
0.1.87 2026-03-30
0.1.86 2026-03-23
0.1.85 2026-03-23
0.1.84 2026-03-23
0.1.83 2026-03-22
0.1.82 2026-03-22
0.1.81 2026-03-20
0.1.80 2026-03-20
0.1.79 2026-03-19
0.1.78 2026-03-19
0.1.77 2026-03-19
0.1.76 2026-03-19
0.1.75 2026-03-18
0.1.74 2026-03-18
0.1.73 2026-03-18
0.1.72 2026-03-18
0.1.71 2026-03-17
0.1.70 2026-03-17
0.1.69 2026-03-17
0.1.68 2026-03-17
0.1.67 2026-03-17
0.1.66 2026-03-17
0.1.65 2026-03-17
0.1.64 2026-03-16
0.1.63 2026-03-16
0.1.62 2026-03-16
0.1.61 2026-03-16
0.1.60 2026-03-13
0.1.59 2026-03-12
0.1.58 2026-03-12
0.1.57 2026-03-12
0.1.56 2026-03-11
0.1.56.dev0 2026-03-11
0.1.55 2026-03-11
0.1.54 2026-03-11
0.1.53 2026-03-11
0.1.53.dev0 2026-03-11
0.1.52 2026-03-11
0.1.51 2026-03-11
0.1.50 2026-03-09
0.1.49 2026-03-07
0.1.48 2026-03-06
0.1.47 2026-03-06
0.1.46 2026-03-05
0.1.45 2026-03-05
0.1.44 2026-03-05
0.1.43 2026-03-05
0.1.42 2026-03-05
0.1.41 2026-03-05
0.1.40 2026-03-05
0.1.39 2026-03-05
0.1.38 2026-03-04
0.1.37 2026-03-04
0.1.36 2026-03-03
0.1.35 2026-03-03
0.1.34 2026-03-03
0.1.33 2026-03-02
0.1.32 2026-03-01
0.1.31 2026-02-28
0.1.30 2026-02-27
0.1.29 2026-02-26
0.1.28 2026-02-26
0.1.27 2026-02-23
0.1.26 2026-02-20
0.1.25 2026-02-19
0.1.22 2026-02-02
0.1.21 2026-02-01
0.1.19 2026-01-21
0.1.18 2026-01-18
0.1.17 2026-01-18
0.1.16 2026-01-17
0.1.14 2026-01-16
0.1.13 2026-01-15
0.1.11 2026-01-11
0.1.10 2026-01-10
0.1.9 2026-01-09
0.1.7 2026-01-01
0.1.6 2025-12-19
0.1.5 2025-12-18
0.1.4 2025-12-18
0.1.3 2025-12-08
0.1.2 2025-12-08
0.1.1 2025-12-08
0.1.0 2025-12-08
0.0.1 2025-08-11