The world’s highest-quality, lowest-cost inference engine

Deploy any open source model, auto-scale instantly, and pay for what you use


20X cheaper than GPT-4o


Deploy any model in seconds

Set up inference in minutes


Deploy any open source or fine-tuned model


Customize your hardware configuration


Deploy a model in as little as 5 minutes


Support for privacy-preserving ML through trusted execution environments (TEEs)


Serverless and Dedicated endpoints for any model


No need to build your own ML infrastructure
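
Once a model is deployed, a serverless endpoint is typically called over plain HTTPS. The snippet below is a minimal sketch assuming an OpenAI-compatible chat-completions route; the base URL, endpoint path, model identifier, and API-key environment variable are illustrative placeholders, not documented values from this page.

```python
# Minimal sketch of calling a deployed model through a hypothetical
# OpenAI-compatible serverless endpoint. The URL, model name, and
# API-key environment variable are placeholders, not a documented API.
import os
import requests

API_URL = "https://api.example-inference.com/v1/chat/completions"  # placeholder
API_KEY = os.environ["INFERENCE_API_KEY"]  # placeholder env var

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize TEEs in one sentence."}],
        "max_tokens": 128,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

In a setup like this, a dedicated endpoint would work the same way, with the placeholder URL swapped for the address assigned to your deployment.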

Pricing

You only pay for what you use

| Model type | Model size | Price (per 1M tokens) |
| --- | --- | --- |
| Llama 3.2 | 1B & 3B | $0.03 |
| Llama 3.1 & Llama 3.2 | 8B & 11B | $0.09 |
| Llama 3.1 & Llama 3.3 | 70B | $0.60 |
| Llama 3.2 | 90B | $0.90 |
| Llama 3.1 | 405B | $2.00 |

| Model type | Model size | Price (per minute) |
| --- | --- | --- |
| Llama 3.2 | 1B & 3B | $0.01 |
| Llama 3.1 & Llama 3.2 | 8B & 11B | $0.03 |
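
To make per-token billing concrete, here is a small worked example using the per-1M-token prices above; the token count is a made-up illustration, not a usage figure from this page.

```python
# Rough cost estimate from the per-1M-token prices listed above.
# The token count below is a made-up example number.
PRICE_PER_1M_TOKENS = {
    "Llama 3.2 (1B/3B)": 0.03,
    "Llama 3.1/3.2 (8B/11B)": 0.09,
    "Llama 3.1/3.3 (70B)": 0.60,
    "Llama 3.2 (90B)": 0.90,
    "Llama 3.1 (405B)": 2.00,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Cost in USD for the given token count at the listed per-1M-token rate."""
    return PRICE_PER_1M_TOKENS[model] * tokens / 1_000_000

# e.g. 25M tokens on the 70B tier: 25 x $0.60 = $15.00
print(f"${estimate_cost('Llama 3.1/3.3 (70B)', 25_000_000):.2f}")
```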