LmCast :: Stay tuned in

Fara-7B: An efficient agentic model for computer use

Recorded: Nov. 27, 2025, 1:02 a.m.


microsoft/fara

Repository contents:

- autogen @ a9d2927 (submodule)
- endpoint_configs/
- figures/
- scripts/
- src/fara/
- webeval/
- .gitattributes, .gitignore, .gitmodules
- CODE_OF_CONDUCT.md, LICENSE (MIT), README.md, SECURITY.md, TRANSPARENCY_NOTE.md
- pyproject.toml
Fara-7B: An Efficient Agentic Model for Computer Use

Overview
Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.
Try Fara-7B locally as follows (see Installation for detailed instructions):

```
# 1. Clone the repository
git clone https://github.com/microsoft/fara.git
cd fara

# 2. Set up the environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install
```

Then, in one process, host the model:

```
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
```

Then you can iteratively query it with:

```
fara-cli --task "whats the weather in new york now"
```

Hint: you may need to add --tensor-parallel-size 2 to the vllm command if you run out of memory.
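As a quick sanity check, the VLLM server exposes an OpenAI-compatible HTTP API, so you can also query the hosted model directly. The sketch below is illustrative (not part of the repo) and assumes the default serve command above; only the request construction is shown, and the message format is an assumption:

```python
# Hypothetical sketch: build a chat-completion request against the
# OpenAI-compatible endpoint that `vllm serve` exposes on port 5000.
import json
import urllib.request

def build_request(task: str, base_url: str = "http://localhost:5000/v1"):
    """Construct an OpenAI-style chat request for the local vLLM server."""
    payload = {
        "model": "microsoft/Fara-7B",          # must match the served model name
        "messages": [{"role": "user", "content": task}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("whats the weather in new york now")
print(req.full_url)  # http://localhost:5000/v1/chat/completions
```

With the server running, `urllib.request.urlopen(req)` would return the model's JSON response; in practice `fara-cli` manages this loop for you.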
What Makes Fara-7B Unique
Unlike traditional chat models that generate text-based responses, Fara-7B leverages computer interfaces—mouse and keyboard—to perform multi-step tasks on behalf of users. The model:

Operates visually by perceiving webpages and taking actions like scrolling, typing, and clicking on directly predicted coordinates
Uses the same modalities as humans to interact with computers—no accessibility trees or separate parsing models required
Enables on-device deployment due to its compact 7B parameter size, resulting in reduced latency and improved privacy as user data remains local
Completes tasks efficiently, averaging only ~16 steps per task compared to ~41 for comparable models
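The pixel-coordinate action space above can be pictured with a small, purely illustrative schema. Fara's real action definitions live in the repo's tool specifications, so the field names here are assumptions:

```python
# Illustrative only: a minimal action schema for a pixel-coordinate
# computer-use agent. Field names are assumptions, not Fara's actual API.
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class BrowserAction:
    kind: str                 # e.g. "click" | "type" | "scroll" | "terminate"
    x: Optional[int] = None   # pixel coordinates predicted from the screenshot
    y: Optional[int] = None
    text: Optional[str] = None

    def to_tool_call(self) -> dict:
        """Serialize to a tool-call style dict, dropping unused fields."""
        return {k: v for k, v in asdict(self).items() if v is not None}

click = BrowserAction(kind="click", x=412, y=88)
print(click.to_tool_call())  # {'kind': 'click', 'x': 412, 'y': 88}
```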

Fara-7B is trained using a novel synthetic data generation pipeline built on the Magentic-One multi-agent framework, with 145K trajectories covering diverse websites, task types, and difficulty levels. The model is based on Qwen2.5-VL-7B and trained with supervised fine-tuning.
Key Capabilities
Fara-7B can automate everyday web tasks including:

Searching for information and summarizing results
Filling out forms and managing accounts
Booking travel, movie tickets, and restaurant reservations
Shopping and comparing prices across retailers
Finding job postings and real estate listings

Performance Highlights
Fara-7B achieves state-of-the-art results across multiple web agent benchmarks, outperforming comparable-sized models and remaining competitive with larger systems:

| Model | Params | WebVoyager | Online-M2W | DeepShop | WebTailBench |
|---|---|---|---|---|---|
| **SoM Agents** | | | | | |
| SoM Agent (GPT-4o-0513) | - | 90.6 | 57.7 | 49.1 | 60.4 |
| SoM Agent (o3-mini) | - | 79.3 | 55.4 | 49.7 | 52.7 |
| SoM Agent (GPT-4o) | - | 65.1 | 34.6 | 16.0 | 30.8 |
| GLM-4.1V-9B-Thinking | 9B | 66.8 | 33.9 | 32.0 | 22.4 |
| **Computer Use Models** | | | | | |
| OpenAI computer-use-preview | - | 70.9 | 42.9 | 24.7 | 25.7 |
| UI-TARS-1.5-7B | 7B | 66.4 | 31.3 | 11.6 | 19.5 |
| **Fara-7B** | 7B | 73.5 | 34.1 | 26.2 | 38.4 |

Table: Online agent evaluation results showing success rates (%) across four web benchmarks. Results are averaged over 3 runs.
WebTailBench: A New Benchmark for Real-World Web Tasks
We are releasing WebTailBench, a new evaluation benchmark focusing on 11 real-world task types that are underrepresented or missing in existing benchmarks. The benchmark includes 609 tasks across diverse categories, with the first 8 segments testing single skills or objectives (usually on a single website), and the remaining 3 evaluating more difficult multi-step or cross-site tasks.
WebTailBench Detailed Results

| Task Segment | Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI Comp-Use | UI-TARS-1.5 | Fara-7B |
|---|---|---|---|---|---|---|---|---|
| **Single-Site Tasks** | | | | | | | | |
| Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4 |
| Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9 |
| Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8 |
| Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4 |
| Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3 |
| Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6 |
| Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6 |
| Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0 |
| **Multi-Step Tasks** | | | | | | | | |
| Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0 |
| Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7 |
| Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0 |
| **Overall** | | | | | | | | |
| Macro Average | 609 | 59.7 | 51.7 | 30.1 | 22.0 | 25.3 | 19.9 | 38.4 |
| Micro Average | 609 | 60.4 | 52.7 | 30.8 | 22.4 | 25.7 | 19.5 | 38.4 |

Table: Breakdown of WebTailBench results across all 11 segments. Success rates (%) are averaged over 3 independent runs. Fara-7B achieves the highest success rate among computer-use models in all but one segment (Ticketing, where OpenAI computer-use-preview leads).
Coming Soon:

Task Verification pipeline for LLM-as-a-judge evaluation
Official human annotations of WebTailBench (in partnership with BrowserBase)

Evaluation Infrastructure
Our evaluation setup leverages:

Playwright - A cross-browser automation framework that replicates browser environments
Abstract Web Agent Interface - Allows integration of any model from any source into the evaluation environment
Fara-Agent Class - Reference implementation for running the Fara model
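To make the "Abstract Web Agent Interface" idea concrete, here is a minimal, hypothetical sketch of what such a plug-in boundary could look like; the repo's actual interface will differ in names and richness:

```python
# Hypothetical sketch of an abstract web agent interface: any model that
# maps an observation (screenshot + task) to an action can be plugged
# into the evaluation loop. Names are illustrative, not the repo's API.
from typing import Protocol

class WebAgent(Protocol):
    def act(self, screenshot: bytes, task: str) -> str:
        """Return the next action given the current page screenshot."""
        ...

class EchoAgent:
    """Trivial stand-in agent used to exercise the interface."""
    def act(self, screenshot: bytes, task: str) -> str:
        return f"terminate: {task}"

def run_step(agent: WebAgent, screenshot: bytes, task: str) -> str:
    # The evaluation harness only depends on the Protocol, so any model
    # from any source can be swapped in behind this boundary.
    return agent.act(screenshot, task)

print(run_step(EchoAgent(), b"", "noop"))  # terminate: noop
```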

Note: Fara-7B is an experimental release designed to invite hands-on exploration and feedback from the community. We recommend running it in a sandboxed environment, monitoring its execution, and avoiding sensitive data or high-risk domains.

Installation
Install the package using either UV or pip:
uv sync --all-extras
or
pip install -e .
Then install Playwright browsers:
playwright install

Hosting the Model
Recommended: The easiest way to get started is using Azure Foundry hosting, which requires no GPU hardware or model downloads. Alternatively, you can self-host with VLLM if you have GPU resources available.
Azure Foundry Hosting (Recommended)
Deploy Fara-7B on Azure Foundry without needing to download weights or manage GPU infrastructure.
Setup:

Deploy the Fara-7B model on Azure Foundry and obtain your endpoint URL and API key
Add your endpoint details to the existing endpoint_configs/ directory (example configs are already provided):

```
# Edit one of the existing config files or create a new one, e.g.
# endpoint_configs/fara-7b-hosting-ansrz.json:
{
  "model": "Fara-7B",
  "base_url": "https://your-endpoint.inference.ml.azure.com/",
  "api_key": "YOUR_API_KEY_HERE"
}
```

Run the Fara agent:

fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com"
That's it! No GPU or model downloads required.
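If you script against these endpoint configs yourself, a small validation helper can catch malformed files early. This is a hypothetical convenience, not part of the repo; it only checks for the three keys shown in the example config:

```python
# Hypothetical helper: load an endpoint config JSON and verify it has the
# fields the example config uses ("model", "base_url", "api_key").
import json
from pathlib import Path

def load_endpoint_config(path: str) -> dict:
    cfg = json.loads(Path(path).read_text())
    missing = {"model", "base_url", "api_key"} - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return cfg
```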
Self-hosting with VLLM
If you have access to GPU resources, you can self-host Fara-7B using VLLM. This requires a GPU machine with sufficient VRAM.
All that is required is to run the following command to start the VLLM server:
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
Testing the Fara Agent
Run the test script to see Fara in action:
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com" --endpoint_config endpoint_configs/azure_foundry_config.json [--headful] [--downloads_folder "/path/to/downloads"] [--save_screenshots] [--max_rounds 100] [--browserbase]
In the self-hosting scenario, --endpoint_config should point to endpoint_configs/vllm_config.json, which corresponds to the VLLM server started above.
If you set --browserbase, export environment variables for the API key and project ID.
Expected Output
```
Initializing Browser...
Browser Running... Starting Fara Agent...
##########################################
Task: how many pages does wikipedia have
##########################################
Running Fara...

Thought #1: To find the current number of Wikipedia pages, I'll search for the latest Wikipedia page count statistics.
Action #1: executing tool 'web_search' with arguments {"action": "web_search", "query": "Wikipedia total number of articles"}
Observation#1: I typed 'Wikipedia total number of articles' into the browser search bar.

Thought #2: Wikipedia currently has 7,095,446 articles.
Action #2: executing tool 'terminate' with arguments {"action": "terminate", "status": "success"}
Observation#2: Wikipedia currently has 7,095,446 articles.

Final Answer: Wikipedia currently has 7,095,446 articles.

Enter another task (or press Enter to exit):
```

Reproducibility
We provide a framework in webeval/ to reproduce our results on WebVoyager and OnlineMind2Web.
Agentic evaluations on live websites present unique challenges due to day-to-day changes. We implement several measures to ensure reliable and comparable evaluations:
BrowserBase Integration
We employ BrowserBase to manage browser session hosting, enabling reliable browser instance management.
Time-sensitive Task Updates
Tasks in benchmarks like WebVoyager can become stale or impossible. We:

Removed ~48 impossible tasks from the original WebVoyager benchmark
Updated ~50 tasks with future dates to keep them achievable
Example: "Search for a hotel in Bali from Jan 1 to Jan 4, 2024" → "Search for a hotel in Bali from Jan 1 to Jan 4, 2026"
Our updated WebVoyager benchmark is available at webeval/data/webvoyager/WebVoyager_data_08312025.jsonl

Environment Error Handling
Browser errors (connection drops, page timeouts) are handled robustly:

Trajectories are retried up to 5 times when environment errors occur
Complete yet incorrect trajectories are never retried
Each retry starts with a fresh browser session, with no retained state
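Under the assumption that environment errors surface as exceptions while completed-but-incorrect trajectories return normally, the retry policy above can be sketched as follows (`run_trajectory` is a hypothetical callable, not the repo's API):

```python
# Sketch of the retry policy: retry only on environment errors, up to 5
# attempts, each with a fresh browser session. A trajectory that
# completes (even incorrectly) returns normally and is never retried.
def run_with_retries(run_trajectory, max_retries: int = 5):
    last_err = None
    for _ in range(max_retries):
        try:
            # Each attempt gets a fresh browser session (no retained state).
            return run_trajectory(fresh_session=True)
        except EnvironmentError as err:   # connection drops, page timeouts
            last_err = err
    raise RuntimeError(f"giving up after {max_retries} environment errors") from last_err

# Demo: fails twice with environment errors, then completes.
attempts = {"n": 0}
def flaky(fresh_session: bool):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise EnvironmentError("page timeout")
    return "trajectory complete"

print(run_with_retries(flaky))  # trajectory complete
```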

Step Budget
Each trajectory is capped at a maximum of 100 actions across all online benchmarks. Trajectories exceeding this budget without choosing to stop are considered incorrect.
WebEval Package Installation
```
conda create --name fara_webeval python=3.12
conda activate fara_webeval

# Install fara package
pip install -e .

# Install autogen submodule
git submodule update --init --recursive
cd autogen/python/packages
pip install -e autogen-core
pip install -e autogen-ext

# Install webeval
cd webeval
pip install -e .

# Install playwright
playwright install
```
Running Evaluations
Navigate to the scripts directory:
cd webeval/scripts
Make sure you set a valid OpenAI GPT-4o endpoint in endpoint_configs_gpt4o/dev in order to run the WebVoyager LLM-as-a-judge!
Option 1: Self-hosted VLLM
python webvoyager.py --model_url /path/where/you/want/to/download/model/ --model_port 5000 --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --device_id 0,1 --processes 1 --run_id 1 --max_rounds 100
Option 2: Azure Foundry Deployment
Deploy Fara-7B on Foundry endpoint(s), then place endpoint URLs and keys in JSONs under endpoint_configs/:
python webvoyager.py --model_endpoint ../../endpoint_configs/ --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --processes 1 --run_id 1_endpoint --max_rounds 100
Notes

We use the same LLM-as-a-judge prompts and model (GPT-4o) as WebVoyager, hence the --eval_oai_config argument
Set --browserbase for browser session management (requires exported API key and project ID environment variables)
Avoid overloading a single VLLM deployment with more than ~10 concurrent processes due to known issues
See debugging output in fara/webeval/scripts/stdout.txt

Analyzing Evaluation Results
Evaluation Output Structure
Evaluation results are stored under --out_url in folders organized by:

Model name
Dataset
Username
Run ID

Example path:
/runs/WebSurfer-fara-100-max_n_images-3/fara-7b/<username>/WebVoyager_WebVoyager_data_08312025.jsonl/<run_id>

Each evaluation folder contains:

- gpt_eval/ - LLM-as-a-judge evaluation results
- traj/ - per-task trajectory subdirectories containing:
  - final_answer.json (e.g., Amazon--1_final_answer.json) - <no_answer> indicates the trajectory was aborted or exceeded the step budget
  - scores/gpt_eval.json - LLM judge scores
  - web_surfer.log - action history and errors
  - screenshot_X.png - screenshot captured before each action X

Running Analysis
Use the analysis notebook to compute metrics:
cd webeval/scripts/analyze_eval_results/
jupyter notebook analyze.ipynb
The script:

Identifies trajectories aborted mid-execution and diagnostic reasons
Computes average scores across non-aborted trajectories
Distinguishes between aborted trajectories (errors during sampling) and completed trajectories (with terminate() call or step budget exceeded)
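The core aggregation the notebook performs can be sketched roughly as follows. The folder layout follows the description above and the `score` field matches the example eval result in this README; everything else (function name, skipping rule) is an illustrative assumption:

```python
# Rough sketch: walk per-task trajectory folders, read the LLM-judge
# score from scores/gpt_eval.json, skip tasks without a score, average.
import json
from pathlib import Path

def average_score(run_dir: str) -> float:
    """Average LLM-judge scores across scored (non-aborted) trajectories."""
    scores = []
    for eval_file in Path(run_dir).glob("traj/*/scores/gpt_eval.json"):
        data = json.loads(eval_file.read_text())
        if "score" in data:          # aborted tasks carry no judge score
            scores.append(float(data["score"]))
    if not scores:
        raise ValueError(f"no scored trajectories under {run_dir}")
    return sum(scores) / len(scores)
```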

To re-run failed tasks, execute the evaluation script again with the same run_id and username - it will skip non-aborted tasks.

Example WebVoyager GPT Eval Result
```
{
  "score": 1.0,
  "gpt_response_text": "To evaluate the task, we need to verify if the criteria have been met:\n\n1. **Recipe Requirement**: A vegetarian lasagna recipe with zucchini and at least a four-star rating.\n\n2. **Search and Results**:\n - The screenshots show that the search term used was \"vegetarian lasagna zucchini.\"\n - Among the search results, \"Debbie's Vegetable Lasagna\" is prominently featured.\n \n3. **Evaluation of the Recipe**:\n - Rating: \"Debbie's Vegetable Lasagna\" has a rating of 4.7, which satisfies the requirement of being at least four stars.\n - The presence of zucchini in the recipe is implied through the search conducted, though the screenshots do not explicitly show the ingredients list. However, the result response confirms the match to the criteria.\n\nGiven the information provided, the task seems to have fulfilled the requirement of finding a vegetarian lasagna recipe with zucchini and a four-star rating or higher. \n\n**Verdict: SUCCESS**"
}
```

Citation
If you use Fara in your research, please cite our work:


Summary

Fara-7B is Microsoft’s first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, it is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and remains competitive with larger, more resource-intensive agentic systems. Fara-7B uses computer interfaces—mouse and keyboard—to perform multi-step tasks on behalf of users, mirroring how humans interact with technology. Unlike traditional chat models, it directly perceives webpages and takes actions like scrolling, typing, and clicking, relying on the same modalities as humans. Its compact size enables on-device deployment, reducing latency and enhancing privacy by keeping user data local. Fara-7B averages only ~16 steps per task, compared to roughly 41 for comparable models, demonstrating efficient task completion.

The model is trained utilizing a novel synthetic data generation pipeline built on the Magentic-One multi-agent framework, leveraging 145K trajectories that cover diverse websites, task types, and levels of difficulty. It's based on Qwen2.5-VL-7B and refined through supervised fine-tuning.

Key Capabilities and Performance Highlights:

Fara-7B can automate typical everyday web tasks, including searching for information and summarizing results, filling out forms and managing accounts, booking travel, movie tickets, and restaurant reservations, and shopping and comparing prices across retailers. It achieves state-of-the-art results across multiple web agent benchmarks, surpassing comparable-sized models and many larger systems. In evaluations on WebVoyager, OnlineMind2Web, and DeepShop, Fara-7B reached success rates of 73.5%, 34.1%, and 26.2%, respectively. It was also evaluated on WebTailBench, a new benchmark focusing on 11 real-world task types that are underrepresented or missing in existing benchmarks. WebTailBench includes 609 tasks across diverse categories, with the first 8 segments testing single skills or objectives (usually on a single website) and the remaining 3 evaluating more difficult multi-step or cross-site tasks.

Fara-7B's strong results reflect both its efficiency and its accuracy, as demonstrated in the WebTailBench evaluation. Among computer-use models it posted the best success rates on the multi-step segments: 49.0% on two-item shopping lists, 32.7% on comparison shopping, and 23.0% on compositional tasks.

The evaluation framework, provided in the webeval/ package, incorporates several elements to ensure reproducibility and reliability: BrowserBase integration for robust browser session management, time-sensitive task updates to keep benchmarks achievable, error handling with up to five retries on environment failures, and a step budget capping each trajectory at 100 actions. An analysis notebook supports comprehensive review of the results.

The repository includes detailed documentation covering deployment, with instructions for hosting the model via VLLM or Azure Foundry, setting up environments, and running test scenarios. The data generation and evaluation procedures are described thoroughly, offering clear insight into the development and validation of Fara-7B.