This example shows how to integrate Steel with OpenAI's Computer Use Assistant (CUA) API to create a browser automation agent. The assistant sees the browser through Steel's cloud sessions, analyzes the screen, and performs actions like clicking, typing, and navigating.
- A Steel API key
- An OpenAI API key with access to the Computer Use Assistant preview
Clone this repository and navigate to the project directory:
git clone https://github.com/steel-dev/steel-cookbook
cd steel-cookbook/examples/steel-oai-computer-use-python-starter
# Create and activate virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
- Create a .env file in the project directory by copying the example:
cp .env.example .env
- Edit the .env file and add your API keys:
STEEL_API_KEY=your_steel_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
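A missing or placeholder key tends to surface later as an opaque authentication error, so it can help to check both values up front. The snippet below is a minimal sketch, not part of the example script: `missing_keys` is a hypothetical helper, and it assumes the values have already been loaded into the process environment (or are passed in as a plain mapping). Treating values that start with `your_` as unset is also an assumption, based on the placeholders shown above.

```python
import os

# The two variables named in the .env file above.
REQUIRED_KEYS = ("STEEL_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset, empty, or still placeholders.

    `env` is any mapping of names to values; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # Assumption: placeholder values look like "your_..._here".
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    problems = missing_keys()
    if problems:
        raise SystemExit("Set these keys in .env first: " + ", ".join(problems))
    print("Both API keys are set.")
```

Running the check before `python main.py` gives a clear message about which key still needs to be filled in.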
This example demonstrates:
- Creating a Steel browser session - Launches a remote browser in the cloud
- Connecting with Playwright - Establishes a direct connection to control the browser
- Integrating with OpenAI's Computer Use Assistant - Sends screenshots to OpenAI and receives actions to execute
- Action execution - Translates OpenAI's commands into browser actions (click, type, scroll, etc.)
- Continuous interaction loop - Maintains a cycle of screenshots and actions until the task is complete
Execute the main script:
python main.py
You'll be prompted to enter a task for the assistant to perform. Examples:
- "Search for Steel browser on Bing and tell me about it"
- "Find today's weather for New York City"
- "Go to Wikipedia and find information about machine learning"
The script will:
- Create a Steel session (you'll see a URL where you can watch the session live)
- Send the initial screenshot to OpenAI
- Execute the commands received from OpenAI
- Send updated screenshots after each action
- Continue this loop until the task is complete
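The steps above amount to a loop that alternates between capturing the screen and carrying out whatever the model asks for. The sketch below shows that control flow only. `run_task` and its three callable parameters are hypothetical names, not the functions in `main.py`, and the `max_steps` cap is an added safeguard rather than something the example is known to include.

```python
def run_task(take_screenshot, ask_model, execute_action, max_steps=50):
    """Run the screenshot/action cycle until the model requests no more actions.

    take_screenshot() -> bytes of the current page image
    ask_model(screenshot) -> (actions, text); `actions` is a list, `text` may be None
    execute_action(action) -> None
    Returns the list of text replies collected along the way.
    """
    transcript = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        actions, text = ask_model(screenshot)
        if text:
            transcript.append(text)
        if not actions:
            # No further actions requested: the task is treated as complete.
            return transcript
        for action in actions:
            execute_action(action)
    raise RuntimeError(f"task did not finish within {max_steps} steps")


if __name__ == "__main__":
    # Stand-in callables so the loop can be exercised without a browser or API key.
    scripted = iter([([{"type": "click", "x": 1, "y": 2}], None), ([], "Done.")])
    print(run_task(lambda: b"", lambda shot: next(scripted), print))
```

Passing the three steps in as callables keeps the loop independent of Steel, Playwright, and OpenAI, which makes it easy to test with stand-ins like the ones in the `__main__` block.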
The example includes a wrapper around the Steel session and the Playwright browser that provides methods for:
- Creating and managing a browser session
- Taking screenshots
- Executing various browser actions (click, type, scroll, etc.)
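To illustrate the action-execution part, here is a minimal sketch of a dispatcher that maps an action description onto Playwright's mouse and keyboard calls. The action dictionaries (`type`, `x`, `y`, `text`, `scroll_x`, `scroll_y`, `keys`) are an assumed shape modeled on computer-use style actions, and `execute_action` is a hypothetical name; the wrapper in this example may be organized differently. `page` is expected to be a Playwright sync `Page`, or any object exposing the same `mouse` and `keyboard` methods.

```python
def execute_action(page, action):
    """Translate one action dict into calls on a Playwright-style page."""
    kind = action["type"]
    if kind == "click":
        # Playwright accepts "left", "right", or "middle" for the button.
        page.mouse.click(action["x"], action["y"], button=action.get("button", "left"))
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "scroll":
        # Move the pointer first so the wheel event targets the intended element.
        page.mouse.move(action["x"], action["y"])
        page.mouse.wheel(action.get("scroll_x", 0), action.get("scroll_y", 0))
    elif kind == "keypress":
        for key in action["keys"]:
            page.keyboard.press(key)
    else:
        raise ValueError(f"unsupported action type: {kind!r}")
```

Key names and button values produced by the model may not match Playwright's naming exactly (for example `ENTER` versus `Enter`), so a real implementation would likely need a small translation table before calling `press` or `click`.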
The script connects to OpenAI's Computer Use Assistant API to:
- Send browser screenshots
- Receive actions to execute
- Process text responses from the assistant
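Each reply from the assistant can mix plain text with requests for browser actions, so the script has to separate the two. The sketch below shows one way to do that. It assumes the reply's output has already been converted to a list of plain dictionaries, with `message` items holding `output_text` parts and `computer_call` items holding a `call_id` and an `action`. Those item shapes and the `split_output` name are assumptions for illustration, not confirmed details of `main.py`.

```python
def split_output(items):
    """Separate assistant text from requested computer actions.

    Returns (texts, calls): `texts` is a list of strings, `calls` is a list of
    {"call_id": ..., "action": ...} dicts in the order they were received.
    """
    texts, calls = [], []
    for item in items:
        item_type = item.get("type")
        if item_type == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
        elif item_type == "computer_call":
            calls.append({"call_id": item["call_id"], "action": item["action"]})
        # Other item types (for example reasoning summaries) are ignored here.
    return texts, calls
```

Keeping the `call_id` alongside each action matters because the follow-up screenshot is normally sent back as the result of that specific call.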
You can modify the example to:
- Change the initial URL (currently Bing.com)
- Adjust the browser dimensions
- Add more action types
- Implement additional error handling
- Customize the UI/UX of the interaction