Stop optimizing tokens.
Start optimizing outcomes.

Compass is the first model selection engine that balances cost against real business KPIs — so you spend less where outcomes don't suffer, and invest more where they matter most.

Request Early Access · See How It Works ↓
compass dashboard — customer-support workflow

Live Model Decisions ● Auto
Sonnet 4.5 · Billing dispute · CSAT ↑12% · worth 3x cost
GPT-4.1 · Password reset · same KPI · –78% cost
Gemini 2.5 · Tech support · Escalation ↓23% · –31% cost
Llama 4 · FAQ queries · same KPI · –91% cost

Cost ↔ Outcome (30-day) ● Live
CSAT Score: 4.6 (↑ 0.4 vs baseline)
Cost per Resolution: $0.08 (↓ 54% vs single-model)
Escalation Rate: 8.2% (↓ 31% vs baseline)
Cost / CSAT Point: $0.02 (↓ 61% vs GPT-4 only)
The Problem

Choosing the right model is still a manual, never-ending process.

Without Compass

Manual experimentation, blind cost cuts

Teams either overspend by defaulting to the most expensive model for everything, or cut costs blindly and watch quality metrics drop — with no systematic way to find the right balance.

One model for all tasks — overspending on simple ones
Cost cuts that silently degrade business outcomes
No visibility into cost-per-outcome by workflow
Re-evaluation every time a new model ships
With Compass

Automatic, outcome-aware selection

Compass doesn't just cut cost — it finds the optimal tradeoff between what you spend and the business outcomes you get. Less where quality doesn't suffer, more where it matters.

Cost per resolved ticket, not cost per token
CSAT and escalation rate as optimization targets
Spend allocated by outcome impact, not uniformly
Continuous learning as models and costs change
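To make "cost per resolved ticket, not cost per token" concrete, here is a minimal sketch — the types and function names are hypothetical illustrations, not the Compass SDK — showing why a cheaper model can still be the more expensive choice per outcome:

```typescript
// Illustrative only — hypothetical types, not the Compass SDK.
// "Cost per outcome" divides spend by achieved outcomes, so a cheap
// model that resolves fewer tickets can be the costlier choice.
interface TicketRecord {
  costCents: number;  // total LLM spend on this ticket, in cents
  resolved: boolean;  // did the workflow achieve the outcome?
}

function costPerResolvedTicket(tickets: TicketRecord[]): number {
  const spend = tickets.reduce((sum, t) => sum + t.costCents, 0);
  const resolved = tickets.filter((t) => t.resolved).length;
  return resolved === 0 ? Infinity : spend / resolved;
}

// A cheap model: 2¢ per ticket, but only 1 of 4 tickets resolved.
const cheapModel: TicketRecord[] = [
  { costCents: 2, resolved: false },
  { costCents: 2, resolved: false },
  { costCents: 2, resolved: false },
  { costCents: 2, resolved: true },
];
// A frontier model: 5¢ per ticket, all 3 tickets resolved.
const frontierModel: TicketRecord[] = [
  { costCents: 5, resolved: true },
  { costCents: 5, resolved: true },
  { costCents: 5, resolved: true },
];

console.log(costPerResolvedTicket(cheapModel));    // 8 — costlier per outcome
console.log(costPerResolvedTicket(frontierModel)); // 5
```

On a per-token basis the cheap model wins; on a per-outcome basis it loses — which is the tradeoff Compass optimizes.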
How It Works

Three steps to outcome-driven model selection

01

Connect

Drop in our SDK or use the OpenAI-compatible API. Point your LLM calls through Compass. Define the KPIs that matter to your business.

import { Compass } from '@compass/sdk'

const compass = new Compass({
  kpis: ['csat', 'escalation_rate'],
  workflow: 'customer-support'
})
02

Explore

Compass intelligently distributes traffic across models, running controlled experiments to learn which models drive the best business outcomes for each request type.

// Compass explores automatically
// No manual A/B test setup
const response = await compass.complete({
  messages: conversation,
  // model selected automatically
})
03

Converge

As KPI telemetry flows back, Compass converges on optimal selection policies per workflow. It continuously adapts as models improve and your product evolves.

// Report outcomes back
compass.reportOutcome({
  requestId: response.id,
  csat: 4.8,
  escalated: false,
  resolveTime: 240 // seconds
})
The Closed Loop

What makes Compass different

A continuous feedback loop between model decisions and business outcomes — fully automated.

Your App (LLM Requests) → Compass (Selection Engine) → Claude Sonnet 4.5 / GPT-4.1 / Gemini 2.5 Flash / Llama 4 Scout → Output (Response)

← KPI Telemetry: CSAT · Escalation Rate · Time-to-Close · Conversion · Custom Metrics → Selection Policy Update
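The loop above — select a model, observe KPI telemetry, update the selection policy — can be sketched as a simple bandit-style policy. This is a conceptual illustration only: the class and method names here are assumptions for the sketch, not the Compass engine, which this page describes only at the level of the diagram.

```typescript
// Conceptual sketch of a closed selection loop — NOT the Compass
// engine. An epsilon-greedy bandit: explore occasionally, otherwise
// exploit the model with the best observed KPI-minus-cost reward.
type ModelId =
  | "claude-sonnet-4.5"
  | "gpt-4.1"
  | "gemini-2.5-flash"
  | "llama-4-scout";

interface ArmStats { pulls: number; meanReward: number; }

class SelectionPolicy {
  private arms = new Map<ModelId, ArmStats>();

  constructor(models: ModelId[], private epsilon = 0.1) {
    for (const m of models) this.arms.set(m, { pulls: 0, meanReward: 0 });
  }

  // Explore with probability epsilon, otherwise pick the best arm.
  select(): ModelId {
    const ids = [...this.arms.keys()];
    if (Math.random() < this.epsilon) {
      return ids[Math.floor(Math.random() * ids.length)];
    }
    return ids.reduce((best, id) =>
      this.arms.get(id)!.meanReward > this.arms.get(best)!.meanReward
        ? id
        : best
    );
  }

  // KPI telemetry closes the loop: reward trades outcome against
  // cost, folded into a running mean per model.
  reportOutcome(model: ModelId, kpi: number, costUsd: number, costWeight = 1): void {
    const arm = this.arms.get(model)!;
    const reward = kpi - costWeight * costUsd;
    arm.pulls += 1;
    arm.meanReward += (reward - arm.meanReward) / arm.pulls;
  }
}

// Usage: select, observe, report — repeat per request.
const policy = new SelectionPolicy(
  ["claude-sonnet-4.5", "gpt-4.1", "gemini-2.5-flash", "llama-4-scout"]
);
const model = policy.select();          // explore or exploit
policy.reportOutcome(model, 4.6, 0.08); // CSAT and cost flow back
```

A production engine would condition the policy on request features (the "per workflow" and "per request type" learning described above) rather than keeping one global estimate per model; this sketch shows only the feedback mechanism.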
The Insight

We've seen this problem before —
in a different industry.

Compass was born from a pattern we recognized across two seemingly different domains.

Co-founder Alex spent years building ad optimization engines for Return on Ad Spend (ROAS) campaigns. The breakthrough insight: ad platforms that optimized only on click-through rates and impressions consistently underperformed those that closed the loop with actual business outcomes — revenue, customer lifetime value, margin per acquisition.

The ad industry spent years shifting away from pure technical metrics (CTR, CPM) that are easy to measure but often misleading. The campaigns that actually moved the needle were the ones where ad telemetry was blended with downstream business data — creating a closed feedback loop that could learn what "good" really meant for each customer.

Today's LLM model selection typically relies on token cost, latency, and benchmark quality scores — proxies that are easy to measure but don't capture what matters downstream. The model that scores highest on a benchmark isn't necessarily the model that closes more tickets, converts more leads, or reduces churn. Compass brings the same closed-loop approach to model selection that transformed ad optimization.

📡 The Ad Engine Playbook

Ad Tech: 1. Stop optimizing for clicks · 2. Blend ad telemetry + business data · 3. Optimize for ROAS directly

Same playbook, new domain

Compass: 1. Stop optimizing for tokens · 2. Blend model telemetry + KPI data · 3. Optimize for business outcomes
Use Cases

Every AI workflow has a business metric

Compass learns the right model for each one — automatically.

Customer Support

Send complex billing issues to frontier models, simple FAQs to fast/cheap models. Optimize for CSAT and resolution time.

KPI: CSAT + Escalation Rate

Sales Outreach

Maximize reply rates and meeting bookings by learning which models generate the most effective personalized messaging.

KPI: Reply Rate + Meetings Booked

Content Generation

Balance creative quality with production speed. Compass learns which models produce content that drives the most engagement.

KPI: Engagement + Publish Rate

RAG & Search

Optimize retrieval-augmented generation for answer accuracy and user satisfaction across different query complexity levels.

KPI: Answer Accuracy + Click-through

Compliance Review

Direct sensitive regulatory content to models with the lowest error rates while keeping costs manageable for routine checks.

KPI: Error Rate + Processing Time

Product Recommendations

Increase conversion by learning which models generate the most compelling product suggestions for each customer segment.

KPI: Conversion Rate + AOV
Early Access

Spend less where it doesn't matter.
Invest more where it does.

Join our early access program. Compass finds the optimal balance between model cost and business outcomes — automatically.

Request Early Access · Talk to Founders