The world’s highest-quality, lowest-cost inference engine

Deploy any open source model, auto-scale instantly, and pay for what you use


20X cheaper than GPT-4o


Deploy any model in seconds

Set up inference in minutes


Deploy any open source or fine-tuned model


Customize your hardware configuration


Deploy a model in as little as 5 minutes


Support for privacy-preserving ML through trusted execution environments (TEEs)


Serverless and Dedicated endpoints for any model


No need to build your own ML infrastructure
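
Once a model is deployed, a serverless endpoint is typically called over plain HTTPS. The snippet below is a minimal sketch assuming an OpenAI-compatible chat-completions route; the base URL, endpoint path, model identifier, and API-key environment variable are illustrative placeholders, not documented values from this page.

```python
# Minimal sketch of calling a deployed model through a hypothetical
# OpenAI-compatible serverless endpoint. The URL, model name, and
# API-key environment variable are placeholders, not a documented API.
import os
import requests

API_URL = "https://api.example-inference.com/v1/chat/completions"  # placeholder
API_KEY = os.environ["INFERENCE_API_KEY"]  # placeholder env var

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize TEEs in one sentence."}],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

In a setup like this, a dedicated endpoint would work the same way, with the placeholder URL swapped for the address assigned to your deployment.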

Pricing

You only pay for what you use

| Model type | Model size | Price (per 1M tokens) |
| --- | --- | --- |
| Llama 3.2 | 1B & 3B | $0.03 |
| Llama 3.1 & Llama 3.2 | 8B & 11B | $0.09 |
| Llama 3.1 & Llama 3.3 | 70B | $0.60 |
| Llama 3.2 | 90B | $0.90 |
| Llama 3.1 | 405B | $2.00 |

| Model type | Model size | Price (per minute) |
| --- | --- | --- |
| Llama 3.2 | 1B & 3B | $0.01 |
| Llama 3.1 & Llama 3.2 | 8B & 11B | $0.03 |
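
To make per-token billing concrete, here is a small worked example using the per-1M-token prices above; the token count is a made-up illustration, not a usage figure from this page.

```python
# Rough cost estimate from the per-1M-token prices listed above.
# The token count below is a made-up example number.
PRICE_PER_1M_TOKENS = {
    "Llama 3.2 (1B/3B)": 0.03,
    "Llama 3.1/3.2 (8B/11B)": 0.09,
    "Llama 3.1/3.3 (70B)": 0.60,
    "Llama 3.2 (90B)": 0.90,
    "Llama 3.1 (405B)": 2.00,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Cost in USD for the given token count at the listed per-1M-token rate."""
    return PRICE_PER_1M_TOKENS[model] * tokens / 1_000_000

# e.g. 25M tokens on the 70B tier: 25 x $0.60 = $15.00
print(f"${estimate_cost('Llama 3.1/3.3 (70B)', 25_000_000):.2f}")
```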