Skyvern: Automating Browser-Based Workflows with AI
Skyvern is an open-source project that automates browser-based workflows using Large Language Models (LLMs) and computer vision. Traditional automation tools rely on rigid scripts built on DOM parsing and XPath selectors, which often break after minor website changes. Skyvern instead uses vision-enabled LLMs to interpret and interact with web interfaces at run time. This lets it handle websites it has not seen before, adapt to layout variations, and apply one workflow across different sites without site-specific code.
Core Functionality
At its heart, Skyvern operates as a swarm of AI agents inspired by task-driven autonomous systems like BabyAGI and AutoGPT, but enhanced with browser automation capabilities via libraries such as Playwright. The system comprehends web pages visually, plans actions, and executes them to achieve user-defined goals. Key advantages include:
- Adaptability: Operates on novel websites by mapping visual elements to required actions.
- Resilience: No predefined selectors, making it robust against UI changes.
- Scalability: Applies a single workflow to multiple websites through reasoned interactions.
- Complex Reasoning: Handles nuanced scenarios, such as inferring eligibility from user data in insurance quotes or equating similar products despite minor discrepancies in descriptions.
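The observe, plan, and act cycle described above can be illustrated with a toy loop. This is a conceptual sketch only, not Skyvern's code: `Page`, `Element`, `Action`, `plan_next_action`, and `run_agent` are hypothetical names, and the rule-based planner stands in for what would be a vision-enabled LLM call. It shows why no selectors are needed: the planner matches on visible labels rather than DOM paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    label: str       # visible text a vision model might read off the page
    kind: str        # "input" or "button"
    value: str = ""


@dataclass
class Page:
    elements: list
    submitted: bool = False


@dataclass
class Action:
    name: str                     # "fill", "click", or "done"
    target: Optional[str] = None  # label of the element to act on
    text: Optional[str] = None    # text to type for "fill"


def plan_next_action(page: Page, user_data: dict) -> Action:
    """Stand-in planner (assumption): a real agent would ask an LLM here."""
    # Fill the first empty input whose visible label matches a known data field.
    for el in page.elements:
        if el.kind == "input" and not el.value and el.label in user_data:
            return Action("fill", el.label, user_data[el.label])
    # Once no matching input is empty, click the first button to submit.
    if not page.submitted:
        for el in page.elements:
            if el.kind == "button":
                return Action("click", el.label)
    return Action("done")


def execute(page: Page, action: Action) -> None:
    """Apply the action to the first element with the matching label."""
    for el in page.elements:
        if el.label == action.target:
            if action.name == "fill":
                el.value = action.text
            elif action.name == "click":
                page.submitted = True
            return


def run_agent(page: Page, user_data: dict, max_steps: int = 10) -> list:
    """Loop: observe the page, plan one action, execute it, repeat."""
    history = []
    for _ in range(max_steps):
        action = plan_next_action(page, user_data)
        if action.name == "done":
            break
        execute(page, action)
        history.append(action)
    return history
```

Because the planner keys on labels, renaming a field's internal id or moving it on the page would not change the plan, which is the resilience property listed above.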
A detailed technical report on Skyvern 2.0 highlights its state-of-the-art performance, achieving 85.8% on the WebVoyager evaluation.
Performance and Evaluation
On the WebBench benchmark, Skyvern attains 64.4% accuracy overall and leads in WRITE tasks (for example form filling, logins, and file downloads), the category most relevant to Robotic Process Automation (RPA). Its lead on these WRITE tasks is the basis for positioning it for enterprise automation.
Key Features
- Tasks and Workflows: Define single tasks with URLs, prompts, and optional data schemas for structured outputs. Chain tasks into workflows supporting loops, validations, file parsing, email sending, HTTP requests, and custom code blocks.
- Livestreaming: Real-time viewport streaming for debugging and intervention.
- Form Filling and Data Extraction: Natively fills forms using provided context and extracts data per JSON schemas.
- File Handling: Downloads files and uploads them to block storage.
- Authentication: Supports logins with 2FA (TOTP, email, SMS), password managers (Bitwarden, 1Password, LastPass), and secure credential management.
- Integrations: Model Context Protocol (MCP) support for use with any LLM that supports MCP; Zapier, Make.com, and n8n for no-code workflows.
- Supported LLMs: OpenAI (GPT-4o, etc.), Anthropic (Claude 3.5), Azure OpenAI, AWS Bedrock, Gemini, Ollama, OpenRouter, and OpenAI-compatible endpoints.
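For readers unfamiliar with the TOTP option listed under Authentication, the sketch below shows how a time-based one-time code is derived from a shared secret, following RFC 6238 with HMAC-SHA1. It uses only the Python standard library and is not Skyvern's implementation; the function name `totp` and its parameters are illustrative assumptions.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    if for_time is None:
        for_time = time.time()
    # The moving factor is the number of whole time steps since the epoch.
    counter = int(for_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Authenticator secrets are usually distributed base32-encoded, so in practice the raw bytes would first be obtained with `base64.b32decode`. With the RFC test secret `b"12345678901234567890"` and `for_time=59`, the function returns `"287082"`.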
Real-World Applications
Skyvern powers diverse automations:
- Downloading invoices from various portals.
- Automating job applications by navigating career sites.
- Procuring materials for manufacturing via supplier searches.
- Registering on government sites and filling forms.
- Submitting contact forms at scale.
- Retrieving insurance quotes in multiple languages.
Published demos include retrieving insurance quotes from Geico and BCI Seguros.
Installation and Usage
Quickstart
Requires Python 3.11+ and Node.js, plus Rust on Windows.
- Install: pip install skyvern
- Setup: skyvern quickstart
- Run: skyvern run all (starts the service and the UI at http://localhost:8080)
Code Example
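import asyncio
from skyvern import Skyvern

async def main():
    skyvern = Skyvern()
    # run_task is awaited, so it has to be called from inside a coroutine.
    task = await skyvern.run_task(prompt="Find the top post on hackernews today")
    print(task)

asyncio.run(main())

Advanced options include custom browsers (Chrome over CDP), remote connections, consistent output schemas, and Docker Compose for deployment.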
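To make the idea of a consistent schema concrete, here is a minimal sketch of a JSON-schema-style description for the "top post" task and a small checker for a returned result. The schema fields, the constant `TOP_POST_SCHEMA`, and the helper `schema_errors` are hypothetical and written for illustration; the checker covers only flat objects and is not how Skyvern validates output. How the schema is passed to a task should be taken from the Skyvern documentation.

```python
from typing import Any

# Hypothetical schema for the "top post" task; the field names are illustrative.
TOP_POST_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
    "required": ["title", "url"],
}

# Map JSON Schema type names to the Python types that satisfy them.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_errors(data: Any, schema: dict) -> list:
    """Return a list of problems; supports only flat objects with
    'properties' and 'required' (a deliberately small subset)."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = []
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        expected = _JSON_TYPES[rule["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_as_number = isinstance(value, bool) and rule["type"] in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{key}: expected {rule['type']}")
    return errors
```

A result such as `{"title": "Example", "url": "https://example.com", "points": 10}` (placeholder data) yields an empty error list, while a result missing `url` or carrying `points` as a string is reported, which is the kind of check a downstream pipeline might run before trusting extracted data.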
Cloud Option
Skyvern Cloud offers managed hosting with anti-bot measures, proxies, and CAPTCHA solving for production use.
Roadmap and Community
Recent milestones include open-sourcing, workflow support, UI builders, and caching. Upcoming work includes debug modes, Chrome extensions, action recorders, and observability integrations. Contributions are welcome via PRs; join Discord for support.
Licensed under AGPL-3.0, with telemetry opt-out available. For enterprise needs, contact the Skyvern AI team.
