Swan Inference provides an OpenAI-compatible REST API for accessing decentralized AI models. If you've used the OpenAI API or any OpenAI-compatible client, you already know how to use Swan Inference — just change the base URL and API key.
```bash
curl https://inference.swanchain.io/v1/chat/completions \
  -H "Authorization: Bearer sk-swan-YOUR-API-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Swan Chain?"}
    ]
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.swanchain.io/v1",
    api_key="sk-swan-YOUR-API-KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Swan Chain?"},
    ],
)
print(response.choices[0].message.content)
```
That's it — any library or tool that supports the OpenAI API format works with Swan Inference.
Authentication
All API requests require an API key passed in the Authorization header:
| Key Prefix | Purpose |
| --- | --- |
| `sk-swan-*` | Consumer API key, used for making inference requests |
| `sk-prov-*` | Provider API key, used by GPU providers connecting to the network |
API Endpoints
List Models
Retrieve all available models and their current status.
Model availability depends on online providers. Check real-time status at inference.swanchain.io/models or call GET /v1/models.
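Assuming `GET /v1/models` returns the standard OpenAI list shape (`{"object": "list", "data": [...]}`), a minimal sketch for pulling out the available model IDs might look like this. The sample payload below is illustrative, not real output:

```python
def list_model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response."""
    return [m["id"] for m in models_response.get("data", [])]

# Illustrative payload in the standard OpenAI list shape.
sample = {
    "object": "list",
    "data": [
        {"id": "deepseek-r1-distill-llama-70b", "object": "model"},
        {"id": "llama-3.2-3b", "object": "model"},
    ],
}
print(list_model_ids(sample))  # ['deepseek-r1-distill-llama-70b', 'llama-3.2-3b']
```

With the official client, the equivalent call is `client.models.list()`.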
Rate Limits
Requests are rate-limited per API key:
| Model Category | Requests per Minute |
| --- | --- |
| LLM | 200 |
| Image | 60 |
| Embedding | 500 |
| Other | 200 |
Maximum concurrent requests: 100 per API key.
When rate-limited, the API returns HTTP 429 Too Many Requests with a Retry-After header.
Request Limits
| Parameter | Limit |
| --- | --- |
| Max input tokens (LLM) | 128,000 |
| Max output tokens (LLM) | 16,384 |
| Max input tokens (Embedding) | 8,192 |
| Max request body size | 10 MB |
| Max messages per request | 100 |
| Max message length | 100,000 characters |
| Request timeout | 120 seconds |
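The two message limits are easy to check client-side before sending a request. A pre-flight validator sketch (the function and constant names are hypothetical, not part of any SDK):

```python
MAX_MESSAGES = 100
MAX_MESSAGE_CHARS = 100_000

def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError if a request would exceed the documented message limits."""
    if len(messages) > MAX_MESSAGES:
        raise ValueError(f"too many messages: {len(messages)} > {MAX_MESSAGES}")
    for i, msg in enumerate(messages):
        if len(msg.get("content", "")) > MAX_MESSAGE_CHARS:
            raise ValueError(f"message {i} exceeds {MAX_MESSAGE_CHARS} characters")
```

Rejecting an oversized request locally avoids burning a rate-limited API call on a guaranteed 400.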
Error Handling
The API returns standard HTTP error codes with JSON error bodies:
| Status Code | Meaning |
| --- | --- |
| 400 | Bad request: check your request body |
| 401 | Unauthorized: invalid or missing API key |
| 404 | Model not found or no providers available |
| 429 | Rate limit exceeded: slow down |
| 500 | Internal server error |
| 503 | Service unavailable: all providers busy |
The platform automatically retries failed requests (up to 2 retries with exponential backoff) when a provider is temporarily unavailable, so most transient errors are handled transparently.
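For errors that do reach your client, the useful distinction is usually retryable versus not. A sketch classifying the status codes from the table above; treating 429/500/503 as transient is a common convention, not something the API mandates:

```python
RETRYABLE = {429, 500, 503}

def should_retry(status_code: int) -> bool:
    """True for transient errors worth retrying; False for client errors
    (bad body, bad key, unknown model) that will fail the same way again."""
    return status_code in RETRYABLE

print(should_retry(503))  # → True
print(should_retry(401))  # → False
```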
Response Headers
Swan Inference includes helpful headers in every response:
| Header | Description |
| --- | --- |
| `X-Request-ID` | Unique request correlation ID for tracing |
| `X-Swan-Connection-Mode` | How the request was routed: `websocket` or `external` |
Use X-Request-ID when contacting support or debugging request issues.
Using with LLM Frameworks
Swan Inference works with any framework that supports OpenAI-compatible APIs.
A non-streaming chat completion returns the standard OpenAI response schema:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709500000,
  "model": "llama-3.2-3b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Blockchain is a decentralized, distributed digital ledger that records transactions across many computers so that no single record can be altered retroactively without the alteration of all subsequent blocks."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 38,
    "total_tokens": 50
  }
}
```
Streaming example (Python):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.swanchain.io/v1",
    api_key="sk-swan-YOUR-API-KEY",
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
Streaming example (TypeScript):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference.swanchain.io/v1",
  apiKey: "sk-swan-YOUR-API-KEY",
});

const stream = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Write a haiku about GPUs." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```