# Decentralized AI Inference Marketplace for Real-Time Model Serving
The Inference Marketplace is Swan Chain's decentralized platform for AI model serving, introduced as part of Swan 2.0. Unlike the existing AI Computing Marketplace, which handles training workloads through task auctions, the Inference Marketplace provides real-time, low-latency AI inference through persistent WebSocket connections and an OpenAI-compatible API.
Swan Inference routes requests using configurable load balancing strategies:
| Strategy | Description |
| --- | --- |
| Health-Aware | Routes to the provider with the best health score (default) |
| Round-Robin | Distributes requests evenly across providers |
| Least-Connections | Routes to the provider with the fewest active requests |
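The three strategies reduce to simple selection rules over provider state. A minimal sketch in Python — the provider records and field layout here are illustrative, not the marketplace's actual data model:

```python
import itertools

# Hypothetical provider records: (name, health_score, active_requests).
providers = [
    ("provider-a", 0.98, 3),
    ("provider-b", 0.91, 1),
    ("provider-c", 0.99, 7),
]

def pick_health_aware(providers):
    # Default strategy: route to the provider with the best health score.
    return max(providers, key=lambda p: p[1])[0]

def pick_least_connections(providers):
    # Route to the provider with the fewest active requests.
    return min(providers, key=lambda p: p[2])[0]

def round_robin(providers):
    # Distribute requests evenly by cycling through the provider list.
    return itertools.cycle(p[0] for p in providers)
```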
Additional routing features:
- Health monitoring with automatic circuit breaker for unhealthy providers
- Model warmup to pre-load models and reduce cold-start latency
- Retry and failover: up to 2 retries with exponential backoff if a provider fails
- Rate limiting per API key with tiered limits by model category
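The retry-and-failover behavior described above (up to 2 retries with exponential backoff) can be sketched as follows; `send` and the provider list are hypothetical stand-ins for the router's transport layer:

```python
import time

def call_with_failover(send, providers, max_retries=2, base_delay=0.5):
    """Attempt a request, retrying up to `max_retries` times with
    exponential backoff and failing over to the next provider on each
    retry. A sketch of the documented behavior, not the router's code."""
    last_error = None
    for attempt in range(max_retries + 1):
        # Fail over: each attempt targets the next provider in the list.
        provider = providers[attempt % len(providers)]
        try:
            return send(provider)
        except Exception as exc:  # a real router would catch transport errors only
            last_error = exc
            if attempt < max_retries:
                time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, ...
    raise last_error
```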
### Rate Limits (Default)

| Category | Requests/min |
| --- | --- |
| LLM | 200 |
| Image | 60 |
| Embedding | 500 |
| Other | 200 |
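A per-key, per-category limiter with these defaults could look like the fixed-window sketch below. The marketplace's actual algorithm (e.g. sliding window or token bucket) is not specified here:

```python
import time

# Default per-minute limits by model category, from the table above.
DEFAULT_LIMITS = {"llm": 200, "image": 60, "embedding": 500, "other": 200}

class RateLimiter:
    """Minimal fixed-window limiter keyed by (api_key, category) —
    an illustrative sketch, not the marketplace's implementation."""

    def __init__(self, limits=DEFAULT_LIMITS, window=60.0):
        self.limits = limits
        self.window = window
        self.counts = {}  # (api_key, category) -> [window_start, count]

    def allow(self, api_key, category, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.counts.setdefault((api_key, category), [now, 0])
        if now - bucket[0] >= self.window:
            # New window: reset the counter.
            bucket[0], bucket[1] = now, 0
        if bucket[1] >= self.limits.get(category, self.limits["other"]):
            return False  # over the tier limit for this category
        bucket[1] += 1
        return True
```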
## Pricing Model

### Consumer Pricing
Pricing varies by model category:
| Category | Pricing Unit | Examples |
| --- | --- | --- |
| LLM | Per input token + per output token | Chat completions, text generation |
| Embedding | Per token | Text embeddings |
| Image | Per request | Image generation |
| Audio | Per request | Transcription |
Prices are listed transparently in the model catalog at inference.swanchain.io/models. The default marketplace currency is USDC.
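Putting the pricing units together, the cost of a single request can be computed per category. The rates below are invented for illustration; real prices live in the model catalog:

```python
def request_cost(category, usage, prices):
    """Compute one request's cost from its category's pricing unit.
    `prices` holds illustrative per-unit USDC rates, not catalog values."""
    if category == "llm":
        # Per input token + per output token.
        return (usage["input_tokens"] * prices["input"]
                + usage["output_tokens"] * prices["output"])
    if category == "embedding":
        # Per token.
        return usage["tokens"] * prices["per_token"]
    # Image and audio: flat per-request pricing.
    return prices["per_request"]

# Example: a chat completion at made-up rates of $0.50 / $1.50 per 1M tokens.
cost = request_cost(
    "llm",
    {"input_tokens": 1_000, "output_tokens": 500},
    {"input": 0.50e-6, "output": 1.50e-6},
)  # ≈ 0.00125 USDC
```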
### Platform Fee
The platform charges a 5% fee on each transaction. This fee funds protocol operations, staking rewards, and SWAN token burns.
### Revenue Distribution
When a consumer pays for an inference request:
| Recipient | Share | Description |
| --- | --- | --- |
| Provider | 70% | Paid in the request currency (USDC) |
| Protocol Treasury | 20% | Funds ecosystem development |
| SWAN Buyback & Burn | 10% | Deflationary mechanism |
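The split in the table applies directly to each payment. A sketch — rounding to micro-USDC precision is an assumption, not documented behavior:

```python
def split_revenue(amount_usdc):
    """Apply the documented 70/20/10 revenue split to one payment.
    Rounding to 6 decimals (micro-USDC) is an illustrative choice."""
    return {
        "provider": round(amount_usdc * 0.70, 6),
        "treasury": round(amount_usdc * 0.20, 6),
        "buyback_burn": round(amount_usdc * 0.10, 6),
    }
```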
## Settlement

### Off-Chain Ledger
Usage is tracked in real time on an off-chain payment ledger:
- Every inference request records: tokens processed, latency, model used, provider, consumer
- Provider earnings accumulate in the ledger
- Consumers are billed based on aggregated usage
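A minimal model of the ledger: one record per request carrying the fields listed above, with provider balances aggregated from raw records. Field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class InferenceRecord:
    """One off-chain ledger entry per inference request (illustrative schema)."""
    consumer: str
    provider: str
    model: str
    tokens: int
    latency_ms: float
    cost_usdc: float

def provider_balances(records):
    """Aggregate provider earnings from raw usage records."""
    balances = {}
    for r in records:
        balances[r.provider] = balances.get(r.provider, 0.0) + r.cost_usdc
    return balances
```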
### On-Chain Settlement

Settlement uses a `MerkleDistributor` smart contract for gas-efficient batch payouts:
1. **Daily batches**: provider earnings are aggregated into settlement batches
2. **Merkle tree**: a Merkle tree is computed from all provider balances in the batch
3. **On-chain submission**: the Merkle root is submitted to the smart contract
4. **Provider claims**: providers claim their earnings by submitting a Merkle proof

Settlement status follows: `pending` → `submitted` → `confirmed`
This approach settles many provider payments in a single on-chain transaction, minimizing gas costs.
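The batch-settlement steps can be illustrated with a toy Merkle tree. This sketch hashes `address:amount` strings with SHA-256 and duplicates the last node on odd-sized levels; the actual contract's leaf encoding and hash function (likely keccak256 over ABI-encoded data) will differ:

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(address: str, amount_micro_usdc: int) -> bytes:
    # One leaf per provider balance; integer micro-USDC avoids float issues.
    return _h(f"{address}:{amount_micro_usdc}".encode())

def merkle_root(leaves):
    """Step 2: compute one root from all provider balances in the batch."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Step 4: the sibling path a provider submits to claim earnings."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1                # the paired node at this level
        proof.append((level[sibling], sibling < index))
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root, leaf_hash, proof):
    """What the contract checks against the submitted root (step 3)."""
    node = leaf_hash
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

Because only the 32-byte root goes on chain, one transaction commits to every provider balance in the batch; each provider later proves membership with a logarithmic-size path.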
### Minimum Payout
Providers must accumulate a minimum balance (default: $50) before a payout is triggered. This prevents dust transactions and reduces gas costs.
## Provider Earnings
Providers earn through two complementary streams:
### 1. Inference Revenue (Stablecoins)
Direct payment for serving inference requests, paid in the consumer's currency (typically USDC). This is the primary revenue stream and scales with the number of requests served.
### 2. Contribution Rewards (SWAN Tokens)
Daily SWAN token rewards allocated proportionally based on the provider's Contribution Score. This replaces the legacy UBI model and rewards providers for:
- Inference volume (requests processed)
- Token throughput (tokens generated)
- Uptime and availability
- Quality (success rate, latency)
- Model diversity (number of models served)
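One plausible reading of the Contribution Score is a weighted sum of normalized metrics, with the daily SWAN pool split pro rata. The weights below are invented for illustration; the actual formula is not specified in this document:

```python
# Hypothetical weights over the five documented factors (sum to 1.0).
WEIGHTS = {
    "inference_volume": 0.30,
    "token_throughput": 0.25,
    "uptime": 0.20,
    "quality": 0.15,
    "model_diversity": 0.10,
}

def contribution_score(metrics, weights=WEIGHTS):
    """Weighted sum of normalized (0..1) provider metrics."""
    return sum(weights[k] * metrics[k] for k in weights)

def daily_rewards(scores, pool_swan):
    """Split a daily SWAN pool proportionally to contribution scores."""
    total = sum(scores.values())
    return {p: pool_swan * s / total for p, s in scores.items()}
```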
### Earnings Dashboard
The provider dashboard provides:
- Daily/weekly/monthly earnings views with CSV export
- Wallet verification via MetaMask for secure payouts
## Comparison with AI Computing Marketplace
| Feature | AI Computing Marketplace | Inference Marketplace |
| --- | --- | --- |
| Workload | Training, batch compute | Real-time inference |
| Latency | Minutes to hours | Milliseconds to seconds |
| Allocation | Task auction (bidding) | Real-time routing (WebSocket) |
| Payment | Per task | Per token / per request |
| Connection | Job-based | Persistent WebSocket |
| API | Swan SDK / Orchestrator | OpenAI-compatible REST API |
Both marketplaces coexist within the Swan ecosystem, serving different use cases. The Inference Marketplace is optimized for interactive AI applications, while the AI Computing Marketplace handles batch training and compute-intensive tasks.