The Most Intuitive Operating System.
The Most Powerful Agentic Platform on the Planet.
It started as a tailor-made environment for agentic development. It ended up solving the biggest problems users face when switching to Linux.
Arch Linux + Hyprland + Ollama + Claude Code. Free download, open source.
Run LLMs Locally, Sub-500ms
A local LLM runs on your GPU, automatically selected to fit your VRAM. System queries are answered in under 500ms without sending a single byte to the internet. Your data never leaves your machine unless you choose to escalate.
Routed to local model (best fit for your VRAM). Runs wpctl via MCP. <500ms, 0 tokens sent anywhere.
Local model queries /sys/class/drm and radeontop. Returns structured data via MCP tools.
systemctl restart docker via MCP system_command. No escalation, no cloud roundtrip.
Auto-escalates to Claude Sonnet. Router detects code_debug category, selects cloud model.
Multi-Model Routing
Every query is automatically routed to the best model for the job, backed by verified benchmark data from 12 sources. Works with zero API keys out of the box.
| Query Type | Routes To | Cost |
|---|---|---|
| Math & reasoning | Gemini 3 Flash / Pro | $0.50/M tokens |
| Frontend & web dev | Claude (Sonnet / Opus) | Plan or API key |
| General coding | Claude (Sonnet / Opus) | Plan or API key |
| System commands | Best local model (VRAM-aware) | $0 |
| Local reasoning | GPT-OSS:20b | $0 |
| Quick web queries | Groq / Gemini Flash | $0 |
| Budget code tasks | Devstral / qwen3-coder | $0 |
| DevOps & terminal | GPT-5.3 Codex | API key |
Local Ollama + Groq free (14,400 req/day, includes 70B) + Gemini free (15 RPM) + Devstral (unlimited). Zero API keys needed.
Bring your own API keys for Anthropic, OpenAI, Google, Groq, or Mistral. All providers except Anthropic use one OpenAI-compatible endpoint.
PyTorch MLP classifier retrains every 50 queries on your usage patterns. Fallback chain: ML classifier, regex patterns, default route.
Why We Still Use Claude for Most Things
The routing table sends math to Gemini and quick lookups to Groq. But the majority of queries still go to Claude. Here is why.
Claude Pro / Max, not API
Costa OS authenticates with your Claude subscription. No API billing, no metered usage. You pay for the plan you already have, and the OS routes through it. This makes Claude effectively free for plan subscribers while other cloud providers charge per token.
Chatbot Arena #1, SWE-bench 80.8%
Claude leads Chatbot Arena (overall quality), tops WebDev Arena for frontend work, and holds 80.8% on SWE-bench Verified for real-world code fixes. Gemini wins on specific benchmarks (GPQA, AIME), but Claude is the most reliable generalist across task types.
System prompts that actually work
The local router depends on models following structured instructions: output format, tool selection, safety constraints. Claude follows complex multi-step system prompts more reliably than alternatives, which matters when the output drives real system commands.
The Integration Infrastructure
This is the real reason. Benchmark scores shift every quarter, but the tooling ecosystem is a durable advantage that no other provider matches.
The agent runtime that powers Costa OS. Hooks, plugins, slash commands, custom agents, autonomous sessions, and MCP server integration. No other LLM has an equivalent local agent framework with this depth of system access.
The protocol that connects the agent to 30+ system tools. Screen reading, window management, file ops, Obsidian vault, CLI wrappers. Claude created MCP and has first-class support. Other models can use MCP tools through Claude Code, but the native integration is deeper.
PreToolUse / PostToolUse hooks validate every action before execution. Custom agents with scoped tools and resource queues. Session scheduling with budget caps. Costa Flow YAML workflows with claude-code step types. This is infrastructure, not a chat wrapper.
21 knowledge files, CLAUDE.md project instructions, Obsidian vault via MCP, persistent memory across sessions. The agent accumulates context about your system, your preferences, and your projects. Switching the underlying model would lose all of this integration.
The Agent Has Real System Access
Not a chatbot. The AI executes commands, manages windows, controls audio, and navigates apps through a proprietary MCP server. "Close Firefox and open VS Code on workspace 3" just works.
30+ MCP Tools
AT-SPI screen reading, window management, typing, clicking, vault search, CLI registry. The agent interacts with your desktop through real APIs, not screenshots.
CLI-Anything Fast Path
12 CLI wrappers respond in ~50ms with 0 LLM tokens. Firefox tabs, VS Code workspace, OBS status, Strawberry playback. Deterministic, extensible registry.
Invisible Virtual Monitor
The agent operates on HEADLESS-2, a virtual monitor. Opens browsers, navigates pages, fills forms, and researches without touching your displays.
Claude Code + 5 MCP Servers
5 MCP servers (costa-system, code-review-graph, context7, claude-code-enhanced, voicemode), knowledge files, Obsidian vault, and 7 specialized agents. costa-session schedules autonomous Claude Code sessions with budget caps. Costa Flow defines YAML workflows.
Talk to Your Computer
Push-to-talk voice that actually understands your system. GPU-accelerated transcription, noise cancellation, 2-5 second end-to-end response.
Push-to-Talk
Hold a hotkey and speak. Release to submit. Say 'draft' to review first.
GPU Transcription
Vulkan-accelerated speech-to-text. Your voice becomes text in half a second.
Noise Cancellation
LADSPA noise reduction crushes background noise. Auto-detects when you stop talking.
Every Input Works
Speak, type in the bar, use the terminal, paste a screenshot. All inputs feed the same agent.
Just Describe What You Want
You don't need to know Linux. You don't need to know how to code. Just describe what you want in plain English.
Beautiful and Developer-Ready
A custom Mediterranean coastal theme across 15+ config domains. Pre-configured dev tools. Purpose-built desktop shell. Or change it all with one prompt.
Music Widget
Floating MPRIS controller with album art, queue browsing, library search, playlist switching, live audio quality badge, and support for 10+ players.
Keybind Editor
Visual GUI with keyboard recorder, conflict detection, mouse button discovery, per-device bindings, and support for every Hyprland bind type.
Settings Hub
Central GTK4 panel for display, input, model configuration, Claude Plan login or API keys, dev tools, and system updates.
Clipboard Intelligence
Auto-classifies pasted content including errors, URLs, JSON, commands, and code, then offers contextual actions.
Screenshot Analysis
Select any region, get instant analysis with OCR extraction, error detection, and auto-classification.
Persistent Memory
Obsidian vault connected via MCP. Claude reads and writes notes to remember your preferences, track projects, and store corrections across sessions.
Autonomous Sessions
costa-session schedules Claude Code sessions with budget caps and tool restrictions. Come back to commits, summaries, and desktop notifications.
7-Agent System
Specialized agents for deploys, server ops, architecture review, ISO builds, monitoring, cleanup, and screen navigation.
AI-Assisted Updates
Run costa-update and Claude pulls from GitHub, reviews every change, and fixes breakage automatically. Updates never touch our servers.
Why This Requires Linux
These capabilities depend on architectural features that Windows and macOS do not expose to applications.
Direct Compositor Access
Hyprland exposes every window, workspace, and input event via IPC. The agent reads and controls your desktop directly without accessibility hacks, screen scraping, or COM automation. Windows has no equivalent.
GPU Memory Control
Linux lets you query, allocate, and release VRAM programmatically. The VRAM manager hot-swaps ML models in real time based on what you're doing. Windows locks GPU memory behind driver abstractions you can't touch.
System-Wide Agent Integration
PipeWire, systemd, pacman, and hyprctl all have scriptable interfaces. The agent wires into audio routing, service management, package installs, and window control natively. On Windows, each one requires a different proprietary API with different permissions.
Zero-Overhead Voice Pipeline
Raw audio capture, LADSPA noise reduction, Vulkan-accelerated transcription, and direct text injection, all in user space with no kernel mode switches.
Filesystem as API
Everything is a file. Config changes take effect immediately. No registry, no restart, no 'applying changes.' The agent edits a config and reloads. Response times are measured in milliseconds.
Full Root Intelligence
The agent has the same access you do. Install any package, modify any service, create any systemd unit, change any config. No UAC, no Group Policy, no Defender blocking legitimate operations.
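The "Filesystem as API" point above can be sketched concretely: a config change is just a text edit followed by a reload command. This is a hypothetical illustration; the key name and reload call are examples, not Costa OS internals.

```python
import re

def set_option(conf_text: str, key: str, value: str) -> str:
    """Set `key = value` in a Hyprland-style config, replacing an
    existing line or appending a new one."""
    pattern = re.compile(rf"^(\s*){re.escape(key)}\s*=.*$", re.M)
    if pattern.search(conf_text):
        return pattern.sub(rf"\g<1>{key} = {value}", conf_text)
    return conf_text.rstrip("\n") + f"\n{key} = {value}\n"

conf = "general {\n    gaps_in = 5\n}\n"
conf = set_option(conf, "gaps_in", "10")
print(conf)

# After writing the file back, applying it is one process call away:
# subprocess.run(["hyprctl", "reload"], check=True)
```

No registry, no restart: the edit and the reload are the whole transaction.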
Companion Projects
Open source tools built alongside Costa OS. Ship with the ISO and work standalone on any Linux distribution.
airpods-helper
Native AirPods Pro on Linux & Windows
Full AirPods Pro integration for Linux and Windows. A Rust daemon speaks the Apple Accessory Protocol over raw Bluetooth for features Apple restricts to macOS/iOS. Optional install with Costa OS. Works standalone on any distro.
Rust + Tokio + BlueZ + Tauri + PipeWire + D-Bus
costa-terminal
Multi-Provider AI Terminal
A native terminal app for local and cloud AI. Routes queries across Ollama, Groq, Gemini, Mistral, and Claude using an ML classifier. Streaming responses, full Claude Code sessions with tool call visibility, and a settings wizard that auto-detects your hardware.
Rust + Tauri v2 + SolidJS + Tailwind + ONNX
Included in v1.3.4
Adapts to Your Hardware
The VRAM manager automatically selects the largest model your GPU can fit. Launch a game and models unload. Close it and they reload in seconds.
| Model | Params | Q4 VRAM | Q8 VRAM | Notes |
|---|---|---|---|---|
| qwen3.5:0.8b | 0.8B | 1.5 GB | 2 GB | |
| qwen3.5:2b | 2B | 2.5 GB | 3.5 GB | |
| qwen3.5:4b | 4.7B | 4 GB | 6.5 GB | |
| qwen3.5:9b | 9.7B | 7 GB | 12 GB | |
| qwen3:14b | 14B | 9 GB | 16 GB | |
| apriel-15b-thinker | 15B | 10 GB | 17 GB | |
| gpt-oss:20b | 21B (3.6B active) | 16 GB (MXFP4) | — | MoE |
| llama3.1:8b | 8B | 5.5 GB | 9.5 GB | |
| llama3.3:70b | 70B | 40 GB | 74 GB | |
| mistral-small3.1:24b | 24B | 14 GB | 26 GB | |
| mistral-nemo:12b | 12B | 8 GB | 14 GB | |
| gemma3:4b | 4B | 3.5 GB | 5.5 GB | |
| gemma3:12b | 12B | 8 GB | 14 GB | |
| gemma3:27b | 27B | 17 GB | 30 GB | |
| phi4:14b | 14B | 9 GB | 16 GB | |
| deepseek-r1:7b | 7B | 5 GB | 8.5 GB | |
| deepseek-r1:14b | 14B | 9 GB | 16 GB | |
| deepseek-r1:32b | 32B | 19 GB | 35 GB | |
| devstral | 24B | 14 GB | 26 GB | MoE |
| qwen3-coder | 30B (3.3B active) | 8.5 GB | 18 GB | MoE |
| qwen2.5-coder:14b | 14B | 10 GB | 16 GB | |
VRAM figures are approximate at default context length. MoE models load all parameters but only activate a fraction per token.
Install Costa OS
From download to a working desktop in about 15 minutes.
Download the ISO
~2.1 GB file. Save it anywhere on your computer.
Flash to USB
Use balenaEtcher (free, works on Windows/Mac/Linux). Select the ISO, select your USB drive, click Flash.
Boot from USB
Restart your computer and press the boot menu key (F12 for Dell, F9 for HP, F12 for Lenovo, F8 for ASUS).
Run the installer
Graphical installer launches automatically. Pick a disk, set a username, done.
First boot setup
Log into Claude Code first (it can fix everything else). Then hardware detection, model download, and voice setup run automatically.
Requirements
64-bit processor, 4GB+ RAM
Any GPU from the last ~10 years (including integrated graphics)
USB drive, 8GB or larger
Internet connection for first boot setup
Anthropic account (Claude Pro, Max, or API key) for cloud models
Support Costa OS
Costa OS is free and fully functional. If you'd like to support development, a one-time $9.99 purchase removes the small "Costa" watermark from the status bar. That's it. No features locked, no subscriptions, no telemetry.
Costa OS Pro
One-time purchase. No subscription.
- ✓ Remove status bar watermark
- ✓ Offline license. Works forever, no account needed
- ✓ Support an independent developer
The Costa OS intelligence layer is open source under the Apache License 2.0. The installer and ISO distribution are proprietary.