Simple, transparent pricing
Licensed per deployment. No per-request fees. No usage metering. Just run it.
All plans include VRAM AI Gateway and X-Ray Dashboard.
Starter
For teams running a small number of models on a single GPU.
- VRAM AI Gateway
- X-Ray Dashboard
- LRU model eviction
- NVMe spill tier
- OpenAI-compatible API
- Docker deployment
- Email support
Growth
For teams scaling inference across multiple GPUs and models.
- Everything in Starter
- Multi-GPU support
- LoRA adapter switching
- Helm chart (Kubernetes)
- Prefetch engine
- Prometheus metrics
- Priority support
Enterprise
For large-scale GPU fleets with custom requirements.
- Everything in Growth
- GPU Direct Storage (GDS)
- Custom GPU limits
- Private Docker registry access
- SLA guarantee
- Dedicated Slack channel
- Custom onboarding
Frequently asked questions
How does the license work?
Each license key is cryptographically signed and contains your GPU and model limits. Validation is fully offline: no license server and no internet connection are required at runtime.
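As a rough illustration of how a signed key can be checked without contacting a server, the sketch below verifies a signature over a small payload of limits entirely on the local machine. Everything in it is an assumption made for illustration: the key layout, the field names (`max_gpus`, `max_models`, `expires`), the helper names, and the use of HMAC-SHA256 from the Python standard library as a stand-in for the public-key signature a real license would likely use. It is not the gateway's actual implementation.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical demo secret. A real scheme would more likely verify against
# a vendor public key, so that no signing secret ships with the product.
DEMO_KEY = b"demo-signing-key"


def issue_license(limits: dict) -> str:
    """Build a hypothetical license string: base64(payload) + "." + hex signature."""
    payload = json.dumps(limits, sort_keys=True).encode()
    signature = hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def validate_license(license_key: str) -> dict:
    """Verify the signature locally and return the embedded limits.

    Raises ValueError if the key is malformed or has been tampered with.
    """
    try:
        encoded, signature = license_key.split(".")
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError as exc:
        raise ValueError("malformed license key") from exc
    expected = hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("invalid license signature")
    return json.loads(payload)
```

Because the limits travel inside the signed payload, editing them (for example, raising `max_gpus`) invalidates the signature, which is what makes a check with no network access possible.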
Can I upgrade my plan mid-term?
Yes. Contact our team and we will issue a new license key with updated limits. No reinstallation required.
What happens when my license expires?
The gateway will refuse to start after the license expires. We send a reminder 30 days before the expiry date. Renewal takes under 5 minutes.
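For teams that want their own early warning in deployment tooling, the expiry rule can be sketched as a small date check. The function name, the status labels, and the choice to treat the expiry date itself as the last valid day are assumptions for illustration; only the 30-day reminder window comes from the answer above.

```python
from datetime import date, timedelta

REMINDER_WINDOW = timedelta(days=30)  # reminder lead time stated in the FAQ


def license_status(expires: date, today: date) -> str:
    """Classify a license as 'expired', 'renew_soon', or 'valid'.

    Assumes the expiry date is the last day the license is valid.
    """
    if today > expires:
        return "expired"      # the gateway would refuse to start
    if expires - today <= REMINDER_WINDOW:
        return "renew_soon"   # inside the 30-day reminder window
    return "valid"
```

A scheduled job could call `license_status(expiry_date, date.today())` and raise an alert on `renew_soon`, so that renewal happens before a restart is blocked.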
Do you support air-gapped deployments?
Yes. License validation is fully offline. The Docker image can be pushed to a private registry and deployed with no internet access.
What GPUs are supported?
Any NVIDIA GPU with CUDA 12.1 or later. Tested on A100, H100, A40, RTX 4090, RTX 3090, and T4, as well as all major cloud GPU types.
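A pre-deployment script can compare a host's CUDA version against the 12.1 minimum with a check like the sketch below. The helper names are hypothetical, and how the version string is obtained is left to the reader (the header of `nvidia-smi` output reports a CUDA version, for example).

```python
MIN_CUDA = (12, 1)  # minimum CUDA version stated in the FAQ


def parse_version(text: str) -> tuple[int, int]:
    """Turn a version string such as '12.4' or '12' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supported(version: str) -> bool:
    """Return True if the given CUDA version meets the 12.1 minimum."""
    # Tuples compare element by element, so (12, 4) >= (12, 1) is True.
    return parse_version(version) >= MIN_CUDA
```

Comparing parsed integer tuples rather than raw strings avoids the trap where `"12.10"` sorts before `"12.2"` as text.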
Is there a free trial?
Yes. Contact our team for a 30-day trial license key with Starter limits.