Endpoints
Base URL: https://sn1.api.macrocosmos.ai
We provide two primary API endpoints:
POST /v1/chat/completions
POST /web_retrieval
/v1/chat/completions
This endpoint supports various inference strategies, which can be configured via the request parameters.
Supported inference modes:
Standard Chat Completion
Multistep Reasoning
Test-Time Inference
Mixture of Miners
It selects appropriate miners either from explicitly provided UIDs or by applying internal filtering logic that matches miners to the task and model requirements.
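The sketch below illustrates how these strategies might be selected through the request body. It is a sketch only: the JSON field names (`uids`, `test_time_inference`, `mixture_of_miners`, `messages`) are assumptions inferred from the parameter descriptions later on this page, not confirmed keys.

```python
# Illustrative request bodies; field names are assumptions inferred from
# the parameter descriptions below, not confirmed JSON keys.

standard = {
    "messages": [{"role": "user", "content": "Explain quantum entanglement."}],
}

# Explicit miner selection: pin the request to specific miner UIDs.
pinned = {**standard, "uids": [1, 2, 3]}

# Test-time inference: surface the model's step-by-step reasoning.
reasoning = {**standard, "test_time_inference": True}

# Mixture of miners: combine responses from multiple miners.
mixture = {**standard, "mixture_of_miners": True}
```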
/web_retrieval
This endpoint enables distributed web search via a network of miners. A search query is dispatched to multiple miners, each of which performs an independent web retrieval. The aggregated results are deduplicated by URL before being returned to the client.
Retrieves information from the web based on a search query using multiple miners.
Request model for the /web_retrieval endpoint.
List of specific miner UIDs to query. If not provided, miners will be selected automatically. Example: [1, 2, 3]
The query to search for on the web. Example: "latest advancements in quantum computing"
Number of miners to query for results. Default: 3. Example: 15
Maximum number of results to return in the response. Default: 1. Example: 5
Maximum time to wait for responses, in seconds. Default: 10. Example: 15
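A minimal sketch of calling this endpoint with Python's `requests` library follows. The JSON field names (`search_query`, `n_miners`, `n_results`, `max_response_time`), the `api-key` header, and the `results` key in the response are assumptions inferred from the descriptions above, not a confirmed contract.

```python
import requests

# Field names are assumptions inferred from the parameter descriptions above.
payload = {
    "search_query": "latest advancements in quantum computing",
    "n_miners": 3,            # number of miners to query (default: 3)
    "n_results": 1,           # max results to return (default: 1)
    "max_response_time": 10,  # seconds to wait for responses (default: 10)
}

response = requests.post(
    "https://sn1.api.macrocosmos.ai/web_retrieval",
    json=payload,
    headers={"api-key": "YOUR_API_KEY"},  # auth scheme is an assumption
    timeout=30,
)
response.raise_for_status()

# Results are deduplicated by URL before being returned to the client.
for result in response.json().get("results", []):
    print(result)
```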
Main endpoint that handles standard chat completion as well as multistep reasoning, test-time inference, and mixture-of-miners modes.
Request model for the /v1/chat/completions endpoint.
List of specific miner UIDs to query. If not provided, miners will be selected automatically. Example: [1, 2, 3]
Random seed for reproducible results. If not provided, a random seed will be generated. Example: 42
Task identifier to choose the inference type. Default: InferenceTask. Example: InferenceTask
Model identifier to filter available miners. Example: hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
Enable step-by-step reasoning mode that shows the model's thinking process. Default: false
Enable mixture of miners mode that combines responses from multiple miners. Default: false
Parameters to control text generation, such as temperature, top_p, etc. Default: {"temperature":0.7,"top_p":0.95,"top_k":50,"max_new_tokens":1024,"do_sample":true}. Example: {"do_sample":true,"max_new_tokens":512,"temperature":0.7,"top_k":50,"top_p":0.95}
Inference mode to use for the task. Example: Reasoning-Fast
Enable JSON format for the response. Default: false. Example: true
Enable streaming for the response. Default: false. Example: true
Successful response with streaming text
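As a closing sketch, the snippet below issues a streaming chat completion request with Python's `requests`. Values mirror the parameter list above where given; everything else (the exact JSON keys, the `messages` array, the `api-key` header, and the framing of streamed chunks) is an assumption rather than a confirmed contract.

```python
import requests

# Field names are assumptions inferred from the parameter descriptions above.
payload = {
    "uids": [1, 2, 3],        # optional; omit to let the API select miners
    "seed": 42,               # for reproducible results
    "task": "InferenceTask",
    "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    "sampling_parameters": {
        "temperature": 0.7,
        "top_p": 0.95,
        "top_k": 50,
        "max_new_tokens": 1024,
        "do_sample": True,
    },
    "stream": True,
    # OpenAI-style message list; assumed, since this page does not show it.
    "messages": [{"role": "user", "content": "Summarize subnet 1 in one line."}],
}

with requests.post(
    "https://sn1.api.macrocosmos.ai/v1/chat/completions",
    json=payload,
    headers={"api-key": "YOUR_API_KEY"},  # auth header is an assumption
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    # Print streamed text as it arrives; the chunk framing is an assumption.
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)
```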