utahgpu.com
Instant, Private LLM API — Hosted in Utah
Spin up a dedicated Mistral-7B endpoint in seconds.
Enjoy ≤ 65 ms latency to the Western US, zero rate limits, and pricing that's significantly lower than OpenAI's.
🔒 Private by Default
Your prompts stay on a single-tenant RTX 3090 inside a secured Utah colocation facility.
⚡️ Blazingly Fast
Sub-75 ms median latency and ~135 tokens/sec throughput.
🧠 Cost Smart
After the free credit, pay just $0.20 / 100k tokens or $0.28 / GPU-hour.
curl https://api.utahgpu.com/v1/chat/completions \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a Utah fun fact:"}
    ],
    "max_tokens": 64
  }'
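
Prefer calling the API from code? Here is a minimal Python sketch using the requests library. It assumes the endpoint accepts the same OpenAI-style chat payload and returns the usual choices[0].message.content response shape shown implicitly by the curl example above; replace <YOUR_KEY> with your API key.

# Minimal Python sketch of the same request (assumes an OpenAI-style
# /v1/chat/completions payload and response, as in the curl example above).
import requests

API_KEY = "<YOUR_KEY>"  # your utahgpu.com API key

response = requests.post(
    "https://api.utahgpu.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a Utah fun fact:"},
        ],
        "max_tokens": 64,
    },
    timeout=30,
)
response.raise_for_status()

# Assumes an OpenAI-style response: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])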
| Plan | What You Get | Who It's For | Price |
|---|---|---|---|
| Starter | 1M tokens, 1 model, email support | Kick-the-tires devs | Free |
| Builders | 10M tokens, 2 models, chat & webhook support | Indie apps / side projects | $20 one-time |
| Dedicated GPU | Full GPU access (SSH or REST, 24/7 uptime SLA) | Production or fine-tunes | $0.28 / hr |
Usage beyond included credits is billed at $0.20 per additional 100k tokens. Hourly rental is billed per second.
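
As a quick back-of-the-envelope check, here's a short Python sketch of how the published rates translate into a bill. The token and hour counts below are hypothetical examples, not plan limits.

# Back-of-the-envelope cost estimate using the published rates.
# The traffic figures below are hypothetical examples.
TOKEN_RATE = 0.20 / 100_000   # $0.20 per 100k tokens
GPU_HOUR_RATE = 0.28          # $0.28 per GPU-hour

extra_tokens = 5_000_000      # tokens used beyond included credits
gpu_hours = 72                # hours of dedicated GPU rental

token_cost = extra_tokens * TOKEN_RATE   # 5M tokens -> $10.00
gpu_cost = gpu_hours * GPU_HOUR_RATE     # 72 hours  -> $20.16

print(f"Token overage: ${token_cost:.2f}")
print(f"GPU rental:    ${gpu_cost:.2f}")
print(f"Total:         ${token_cost + gpu_cost:.2f}")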
Thank you!
We've added you to our waitlist. You'll be notified when we're ready to onboard you!