Documentation

Everything you need to start using Waterfall. OpenAI-compatible API -- change two lines and go.

Quick Start

Get up and running in under a minute. Waterfall is a drop-in replacement for the OpenAI API.

1. Get an API key

Create an account and grab your key from the dashboard.

2. Point your SDK at Waterfall

Set base_url to https://api.getwaterfall.org/v1 and use your Waterfall API key.

3. Send requests

We handle routing, fallbacks, and caching automatically. Use model: "auto" for smart routing, or specify a model directly.

API Reference

Base URL: https://api.getwaterfall.org/v1
Authentication: Authorization: Bearer wf-sk-...
Format: OpenAI-compatible JSON

POST /chat/completions

Create a chat completion. Fully compatible with the OpenAI chat completions API. Supports streaming, function calling, and all standard parameters.
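Because the endpoint mirrors the OpenAI schema, function calling uses the standard `tools` parameter in the request body. A minimal sketch of such a body (the `get_weather` tool is a made-up illustration, not part of Waterfall's API):

```python
import json

# Standard OpenAI-style tool definition; get_weather is a hypothetical example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# POST this body to /chat/completions exactly like a plain request;
# the response's message.tool_calls will carry the model's function call.
body = {
    "model": "auto",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}

print(json.dumps(body, indent=2))
```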

Models

  • auto -- Smart routing to the cheapest capable model (default)
  • deepseek/deepseek-chat -- Specify a model directly
  • anthropic/claude-sonnet-4 -- Any supported model ID

See the models page for all available models and pricing.

Routing Strategies

Control how Waterfall selects models by passing a routing strategy header or using the model field.

auto (default)

Free models first, then cheapest capable paid model. Best for most use cases.

free-only

Only route to free models (Groq, Cerebras, Gemini free tier). Requests fail if no free model is available.

speed-first

Route to the fastest available provider. Great for real-time applications.

quality-first

Route to the best model available regardless of cost. Use when accuracy matters most.

Usage via header:

X-Waterfall-Strategy: free-only
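The header rides on an ordinary chat-completions request, so any HTTP client can send it. A minimal stdlib sketch (the key is a placeholder, and the actual send is left commented out so the snippet stays self-contained):

```python
import json
import urllib.request

def strategy_request(strategy: str, content: str) -> urllib.request.Request:
    """Build a /chat/completions request carrying an explicit routing strategy."""
    return urllib.request.Request(
        "https://api.getwaterfall.org/v1/chat/completions",
        data=json.dumps(
            {"model": "auto", "messages": [{"role": "user", "content": content}]}
        ).encode(),
        headers={
            "Authorization": "Bearer wf-sk-your-key-here",  # placeholder key
            "Content-Type": "application/json",
            "X-Waterfall-Strategy": strategy,
        },
        method="POST",
    )

req = strategy_request("free-only", "Hello!")
# with urllib.request.urlopen(req) as resp:  # send once you have a real key
#     print(json.load(resp))
```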

Code Examples

Python (OpenAI SDK)
import openai

client = openai.OpenAI(
    base_url="https://api.getwaterfall.org/v1",
    api_key="wf-sk-your-key-here"
)

response = client.chat.completions.create(
    model="auto",  # Smart routing to cheapest capable model
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

print(response.choices[0].message.content)

Install: pip install openai

TypeScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.getwaterfall.org/v1",
  apiKey: "wf-sk-your-key-here",
});

const response = await client.chat.completions.create({
  model: "auto", // Smart routing to cheapest capable model
  messages: [{ role: "user", content: "Explain quantum computing" }],
});

console.log(response.choices[0].message.content);

Install: npm install openai

cURL (direct HTTP)
curl https://api.getwaterfall.org/v1/chat/completions \
  -H "Authorization: Bearer wf-sk-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Pricing & Limits

Pricing

Waterfall adds a flat 4% markup on token costs. No credit purchase fees, no hidden charges. Free models remain free.

Rate limits

Rate limits depend on the upstream provider. Waterfall automatically handles rate limit errors by falling through to the next provider in the cascade.

Caching

Identical requests are automatically cached. Cached responses are free and instant. Your dashboard shows your cache hit rate.

Ready to get started?

Get your API key