UI-TARS Desktop
brew install --cask ui-tars
v0.2.4
GUI agent that uses vision-language AI to automate desktop tasks by understanding and controlling UI elements.
Why you might care
UI-TARS Desktop is an open-source desktop application from ByteDance that provides a native GUI agent based on the UI-TARS vision-language model. The agent performs computer tasks autonomously by visually interpreting your GUI and interacting with it. It's useful if you want to automate repetitive UI-based workflows without manual scripting and prefer a vision-based approach over traditional automation tools.
30-day installs: 287 · #715
90-day installs: 894 · #694
365-day installs: 3.4k · #679
★ GitHub stars: 37.0k · updated 3d ago
GitHub topics
agent
agent-tars
browser-use
computer-use
cowork
gui-agent
gui-operator
mcp
mcp-server
multimodal
tars
ui-tars
vision
vlm
Links
- Homepage: https://github.com/bytedance/UI-TARS-desktop
- GitHub: bytedance/UI-TARS-desktop
- Brew formula source: Casks/u/ui-tars.rb
Blurb generated by claude-haiku-4-5 on 2026-06-20.
Raw metadata
{
"alternatives": [
"Agent TARS",
"AutoGPT",
"Claude Computer Use"
],
"artifacts": [
{
"uninstall": [
{
"quit": "com.bytedance.uitars"
}
]
},
{
"app": [
"UI TARS.app"
],
"target": "/Applications/UI TARS.app"
},
{
"zap": [
{
"trash": [
"~/Library/Application Support/ui-tars-desktop",
"~/Library/Logs/ui-tars-desktop"
]
}
]
}
],
"auto_updates": 1,
"categories": [
"automation",
"ai",
"dev-tools"
],
"deprecated": 0,
"deprecation_reason": null,
"desc": "GUI Agent for computer control using UI-TARS vision-language model",
"disable_reason": null,
"disabled": 0,
"display_name": "UI-TARS Desktop",
"enrichment_fetched_at": "2026-06-20T22:50:46+00:00",
"first_seen": "2026-06-20T00:47:34+00:00",
"full_token": "ui-tars",
"github_default_branch": "main",
"github_last_commit_at": "2026-06-18T03:47:02Z",
"github_readme_excerpt": "\u003cpicture\u003e\n \u003cimg alt=\"Agent TARS Banner\" src=\"./images/tars.png\"\u003e\n\u003c/picture\u003e\n\n\u003cbr/\u003e\n\n## Introduction\n\nEnglish | [\u7b80\u4f53\u4e2d\u6587](./README.zh-CN.md)\n\n[](https://trendshift.io/repositories/13584)\n\n\u003cb\u003eTARS\u003csup\u003e\\*\u003c/sup\u003e\u003c/b\u003e is a Multimodal AI Agent stack, currently shipping two projects: [Agent TARS](#agent-tars) and [UI-TARS-desktop](#ui-tars-desktop):\n\n\u003ctable\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth width=\"50%\" align=\"center\"\u003e\u003ca href=\"#agent-tars\"\u003eAgent TARS\u003c/a\u003e\u003c/th\u003e\n \u003cth width=\"50%\" align=\"center\"\u003e\u003ca href=\"#ui-tars-desktop\"\u003eUI-TARS-desktop\u003c/a\u003e\u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"center\"\u003e\n \u003cvideo src=\"https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d\" width=\"50%\"\u003e\u003c/video\u003e\n \u003c/td\u003e\n \u003ctd align=\"center\"\u003e\n \u003cvideo src=\"https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27\" width=\"50%\"\u003e\u003c/video\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cb\u003eAgent TARS\u003c/b\u003e is a general multimodal AI Agent stack, it brings the power of GUI Agent and Vision into your terminal, computer, browser and product.\n \u003cbr\u003e\n \u003cbr\u003e\n It primarily ships with a \u003ca href=\"https://agent-tars.com/guide/basic/cli.html\" target=\"_blank\"\u003eCLI\u003c/a\u003e and \u003ca href=\"https://agent-tars.com/guide/basic/web-ui.html\" target=\"_blank\"\u003eWeb UI\u003c/a\u003e for usage.\n It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world \u003ca 
href=\"https://agent-tars.com/guide/basic/mcp.html\" target=\"_blank\"\u003eMCP\u003c/a\u003e tools.\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cb\u003eUI-TARS Desktop\u003c/b\u003e is a desktop application that provides a native GUI Agent based on the \u003ca href=\"https://github.com/bytedance/UI-TARS\" target=\"_blank\"\u003eUI-TARS\u003c/a\u003e model.\n \u003cbr\u003e\n \u003cbr\u003e\n It primarily ships a\n \u003ca href=\"https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/",
"github_repo": "bytedance/UI-TARS-desktop",
"github_stars": 36995,
"github_topics": [
"agent",
"agent-tars",
"browser-use",
"computer-use",
"cowork",
"gui-agent",
"gui-operator",
"mcp",
"mcp-server",
"multimodal",
"tars",
"ui-tars",
"vision",
"vlm"
],
"homepage": "https://github.com/bytedance/UI-TARS-desktop",
"homepage_og_description": null,
"homepage_og_image": null,
"homepage_title": null,
"installs_30d": 287,
"installs_365d": 3353,
"installs_90d": 894,
"last_seen": "2026-06-20T00:47:34+00:00",
"llm_generated_at": "2026-06-20T23:05:18+00:00",
"llm_model": "claude-haiku-4-5",
"names": [
"UI-TARS Desktop"
],
"one_liner": "GUI agent that uses vision-language AI to automate desktop tasks by understanding and controlling UI elements.",
"rank_30d": 715,
"rank_365d": 679,
"rank_90d": 694,
"raw_hash": "c763ae57e4f61b10",
"ruby_source_path": "Casks/u/ui-tars.rb",
"tap": "homebrew/cask",
"token": "ui-tars",
"version": "0.2.4",
"why_use_this": "UI-TARS is an open-source multimodal AI agent built by ByteDance that can autonomously perform computer tasks by visually understanding and interacting with your GUI. It\u0027s useful if you want to automate repetitive UI-based workflows without manual scripting, and prefer a vision-based approach over traditional automation tools."
}
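The "artifacts" array in the raw metadata records what the cask installs and what a full cleanup would remove. As a minimal sketch, assuming the array keeps the shape shown above (a list of objects, with cleanup paths nested under "zap" then "trash"), the paths that `brew uninstall --zap --cask ui-tars` would be expected to trash can be listed like this. The helper name `zap_trash_paths` is hypothetical and not part of Homebrew.

```python
import json

# Excerpt of the "artifacts" array, copied from the raw metadata above.
ARTIFACTS_JSON = """
[
  {"uninstall": [{"quit": "com.bytedance.uitars"}]},
  {"app": ["UI TARS.app"], "target": "/Applications/UI TARS.app"},
  {"zap": [{"trash": [
    "~/Library/Application Support/ui-tars-desktop",
    "~/Library/Logs/ui-tars-desktop"
  ]}]}
]
"""


def zap_trash_paths(artifacts):
    """Collect every path listed under a zap -> trash stanza (hypothetical helper)."""
    paths = []
    for artifact in artifacts:
        for stanza in artifact.get("zap", []):
            trash = stanza.get("trash", [])
            # Assumption: "trash" may hold a single string or a list of strings.
            if isinstance(trash, str):
                trash = [trash]
            paths.extend(trash)
    return paths


artifacts = json.loads(ARTIFACTS_JSON)
paths = zap_trash_paths(artifacts)
for path in paths:
    print(path)
```

The sketch only reads the metadata; it does not touch the filesystem, so the `~` prefixes are left unexpanded.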