🍺 BREW Explorer

UI-TARS Desktop

brew install --cask ui-tars

Version 0.2.4
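
A minimal sketch of inspecting and installing the cask from a terminal, assuming Homebrew is already set up on macOS. The token `ui-tars` comes from this listing. The `run` helper and the `RUN` variable are hypothetical conveniences added here so the snippet only prints the commands unless you opt in to executing them.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set RUN=1 to execute for real.
# (run and RUN are illustrative helpers, not part of Homebrew.)
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Cask token as shown in this listing.
token="ui-tars"

# Show the cask's version and artifacts before installing.
run brew info --cask "$token"

# Install the app.
run brew install --cask "$token"
```

Saved as a script, this prints the two commands as written; running it with `RUN=1` set in the environment executes them instead.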

GUI agent that uses vision-language AI to automate desktop tasks by understanding and controlling UI elements.

Why you might care

UI-TARS Desktop is an open-source desktop application from ByteDance that provides a native GUI agent based on the UI-TARS vision-language model. The agent carries out computer tasks by visually interpreting the screen and operating UI elements. It is useful if you want to automate repetitive UI-based workflows without writing scripts by hand and prefer a vision-based approach over traditional automation tools.

Categories

automation, ai, dev-tools

Alternatives

Agent TARS, AutoGPT, Claude Computer Use

30-day installs: 287 (rank #715)
90-day installs: 894 (rank #694)
365-day installs: 3.4k (rank #679)
GitHub stars: 37.0k (updated 3d ago)

GitHub topics

agent, agent-tars, browser-use, computer-use, cowork, gui-agent, gui-operator, mcp, mcp-server, multimodal, tars, ui-tars, vision, vlm

Links

Homepage: https://github.com/bytedance/UI-TARS-desktop

Blurb generated by claude-haiku-4-5 on 2026-06-20.

Raw metadata
{
  "alternatives": [
    "Agent TARS",
    "AutoGPT",
    "Claude Computer Use"
  ],
  "artifacts": [
    {
      "uninstall": [
        {
          "quit": "com.bytedance.uitars"
        }
      ]
    },
    {
      "app": [
        "UI TARS.app"
      ],
      "target": "/Applications/UI TARS.app"
    },
    {
      "zap": [
        {
          "trash": [
            "~/Library/Application Support/ui-tars-desktop",
            "~/Library/Logs/ui-tars-desktop"
          ]
        }
      ]
    }
  ],
  "auto_updates": 1,
  "categories": [
    "automation",
    "ai",
    "dev-tools"
  ],
  "deprecated": 0,
  "deprecation_reason": null,
  "desc": "GUI Agent for computer control using UI-TARS vision-language model",
  "disable_reason": null,
  "disabled": 0,
  "display_name": "UI-TARS Desktop",
  "enrichment_fetched_at": "2026-06-20T22:50:46+00:00",
  "first_seen": "2026-06-20T00:47:34+00:00",
  "full_token": "ui-tars",
  "github_default_branch": "main",
  "github_last_commit_at": "2026-06-18T03:47:02Z",
  "github_readme_excerpt": "\u003cpicture\u003e\n  \u003cimg alt=\"Agent TARS Banner\" src=\"./images/tars.png\"\u003e\n\u003c/picture\u003e\n\n\u003cbr/\u003e\n\n## Introduction\n\nEnglish | [\u7b80\u4f53\u4e2d\u6587](./README.zh-CN.md)\n\n[![](https://trendshift.io/api/badge/repositories/13584)](https://trendshift.io/repositories/13584)\n\n\u003cb\u003eTARS\u003csup\u003e\\*\u003c/sup\u003e\u003c/b\u003e is a Multimodal AI Agent stack, currently shipping two projects: [Agent TARS](#agent-tars) and [UI-TARS-desktop](#ui-tars-desktop):\n\n\u003ctable\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth width=\"50%\" align=\"center\"\u003e\u003ca href=\"#agent-tars\"\u003eAgent TARS\u003c/a\u003e\u003c/th\u003e\n      \u003cth width=\"50%\" align=\"center\"\u003e\u003ca href=\"#ui-tars-desktop\"\u003eUI-TARS-desktop\u003c/a\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cvideo src=\"https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d\" width=\"50%\"\u003e\u003c/video\u003e\n      \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cvideo src=\"https://github.com/user-attachments/assets/e0914ce9-ad33-494b-bdec-0c25c1b01a27\" width=\"50%\"\u003e\u003c/video\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"left\"\u003e\n        \u003cb\u003eAgent TARS\u003c/b\u003e is a general multimodal AI Agent stack, it brings the power of GUI Agent and Vision into your terminal, computer, browser and product.\n        \u003cbr\u003e\n        \u003cbr\u003e\n        It primarily ships with a \u003ca href=\"https://agent-tars.com/guide/basic/cli.html\" target=\"_blank\"\u003eCLI\u003c/a\u003e and \u003ca href=\"https://agent-tars.com/guide/basic/web-ui.html\" target=\"_blank\"\u003eWeb UI\u003c/a\u003e for usage.\n        It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world \u003ca href=\"https://agent-tars.com/guide/basic/mcp.html\" target=\"_blank\"\u003eMCP\u003c/a\u003e tools.\n      \u003c/td\u003e\n      \u003ctd align=\"left\"\u003e\n        \u003cb\u003eUI-TARS Desktop\u003c/b\u003e is a desktop application that provides a native GUI Agent based on the \u003ca href=\"https://github.com/bytedance/UI-TARS\" target=\"_blank\"\u003eUI-TARS\u003c/a\u003e model.\n        \u003cbr\u003e\n        \u003cbr\u003e\n        It primarily ships a\n        \u003ca href=\"https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/",
  "github_repo": "bytedance/UI-TARS-desktop",
  "github_stars": 36995,
  "github_topics": [
    "agent",
    "agent-tars",
    "browser-use",
    "computer-use",
    "cowork",
    "gui-agent",
    "gui-operator",
    "mcp",
    "mcp-server",
    "multimodal",
    "tars",
    "ui-tars",
    "vision",
    "vlm"
  ],
  "homepage": "https://github.com/bytedance/UI-TARS-desktop",
  "homepage_og_description": null,
  "homepage_og_image": null,
  "homepage_title": null,
  "installs_30d": 287,
  "installs_365d": 3353,
  "installs_90d": 894,
  "last_seen": "2026-06-20T00:47:34+00:00",
  "llm_generated_at": "2026-06-20T23:05:18+00:00",
  "llm_model": "claude-haiku-4-5",
  "names": [
    "UI-TARS Desktop"
  ],
  "one_liner": "GUI agent that uses vision-language AI to automate desktop tasks by understanding and controlling UI elements.",
  "rank_30d": 715,
  "rank_365d": 679,
  "rank_90d": 694,
  "raw_hash": "c763ae57e4f61b10",
  "ruby_source_path": "Casks/u/ui-tars.rb",
  "tap": "homebrew/cask",
  "token": "ui-tars",
  "version": "0.2.4",
  "why_use_this": "UI-TARS is an open-source multimodal AI agent built by ByteDance that can autonomously perform computer tasks by visually understanding and interacting with your GUI. It\u0027s useful if you want to automate repetitive UI-based workflows without manual scripting, and prefer a vision-based approach over traditional automation tools."
}
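
The `artifacts` entries in the metadata describe what removal touches: the `uninstall` stanza quits `com.bytedance.uitars`, the `app` stanza places `UI TARS.app` in `/Applications`, and the `zap` stanza trashes two directories under `~/Library`. A minimal sketch of how that maps onto Homebrew commands follows. The `run` helper and the `RUN` and `ZAP` variables are hypothetical conveniences introduced here, and the snippet only prints the uninstall command unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command. Set RUN=1 to execute for real.
# (run, RUN and ZAP are illustrative helpers, not part of Homebrew.)
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

token="ui-tars"

# Paths listed under "zap" -> "trash" in the metadata, one per line.
zap_paths="$HOME/Library/Application Support/ui-tars-desktop
$HOME/Library/Logs/ui-tars-desktop"

# Report which of those directories currently exist on this machine.
printf '%s\n' "$zap_paths" | while IFS= read -r path; do
  if [ -e "$path" ]; then echo "present: $path"; else echo "absent:  $path"; fi
done

if [ "${ZAP:-0}" = "1" ]; then
  # Zap: uninstall and also trash the support and log directories.
  run brew uninstall --cask --zap "$token"
else
  # Plain uninstall: quits the app and removes it from /Applications.
  run brew uninstall --cask "$token"
fi
```

Checking the paths first shows what a zap would remove beyond the app bundle; a plain uninstall leaves those directories in place.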