Endpoints

Base URL: https://sn1.api.macrocosmos.ai

We provide two primary API endpoints:

  • POST /v1/chat/completions

  • POST /web_retrieval

Chat completions endpoint

POST /v1/chat/completions

Handles standard chat completion as well as multi-step reasoning, test-time inference, and mixture-of-miners modes.

Header parameters

  • api-key (string | null, optional)

  • authorization (string | null, optional)
Body

Request model for the /v1/chat/completions endpoint.

  • uids (integer[] | null, optional): List of specific miner UIDs to query. If not provided, miners will be selected automatically. Example: [1, 2, 3]

  • seed (integer | null, optional): Random seed for reproducible results. If not provided, a random seed will be generated. Example: 42

  • task (string | null, optional): Task identifier to choose the inference type. Default: "InferenceTask". Example: "InferenceTask"

  • model (string | null, optional): Model identifier to filter available miners. Example: "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4"

  • test_time_inference (boolean, optional): Enable step-by-step reasoning mode that shows the model's thinking process. Default: false

  • mixture (boolean, optional): Enable mixture-of-miners mode that combines responses from multiple miners. Default: false

  • sampling_parameters (object | null, optional): Parameters to control text generation, such as temperature, top_p, etc. Default: {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "max_new_tokens": 1024, "do_sample": true}

  • inference_mode (string | null, optional): Inference mode to use for the task. Example: "Reasoning-Fast"

  • json_format (boolean, optional): Enable JSON format for the response. Default: false. Example: true

  • stream (boolean, optional): Enable streaming for the response. Default: false. Example: true
Responses

  • 200: Successful response with streaming text (response body: any)
Example request:
POST /v1/chat/completions HTTP/1.1
Host: sn1.api.macrocosmos.ai
Content-Type: application/json
Accept: */*
Content-Length: 397

{
  "uids": [
    1,
    2,
    3
  ],
  "messages": [
    {
      "content": "Tell me about neural networks",
      "role": "user"
    }
  ],
  "seed": 42,
  "task": "InferenceTask",
  "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
  "test_time_inference": false,
  "mixture": false,
  "sampling_parameters": {
    "do_sample": true,
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.95
  },
  "inference_mode": "Reasoning-Fast",
  "json_format": true,
  "stream": true
}
Example response:

{
  "id": "ca0b8681-7b78-4234-8868-71ad1ebfa9ed",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Neural networks are a type of machine learning model inspired by the human brain's structure. They consist of interconnected nodes arranged in layers, including input, hidden, and output layers. These networks learn by adjusting weights during training using optimization algorithms. Neural networks find applications in image recognition, speech processing, and many other domains requiring pattern recognition and prediction capabilities.",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1743016348,
  "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": null
}

/v1/chat/completions supports various inference strategies, which can be configured via the request parameters.

Supported inference modes:

  • Standard Chat Completion

  • Multi-step Reasoning

  • Test-Time Inference

  • Mixture of Miners

The endpoint selects miners either from explicitly provided UIDs or by applying internal filtering logic that matches miners to the task and model requirements.
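The request shown above can be issued from Python using only the standard library. The following is a minimal sketch rather than an official client: the api-key header and field names simply mirror the schema documented above, and YOUR_API_KEY is a placeholder.

```python
import json
import urllib.request

API_URL = "https://sn1.api.macrocosmos.ai/v1/chat/completions"

def build_payload(prompt, stream=False, mixture=False):
    """Assemble a request body matching the documented schema."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "seed": 42,
        "task": "InferenceTask",
        "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
        "test_time_inference": False,
        "mixture": mixture,
        "sampling_parameters": {
            "do_sample": True,
            "max_new_tokens": 512,
            "temperature": 0.7,
            "top_k": 50,
            "top_p": 0.95,
        },
        "stream": stream,
    }

def chat_completion(prompt, api_key):
    """POST the payload and decode the (non-streaming) JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    reply = chat_completion("Tell me about neural networks", api_key="YOUR_API_KEY")
    print(reply["choices"][0]["message"]["content"])
```

With stream set to true the server returns incremental chunks instead of a single JSON object, so the response would need to be read line by line rather than with a single json.loads.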

Web retrieval endpoint

POST /web_retrieval

Retrieves information from the web based on a search query using multiple miners.

Header parameters

  • api-key (string | null, optional)

  • authorization (string | null, optional)
Body

Request model for the /web_retrieval endpoint.

  • uids (integer[] | null, optional): List of specific miner UIDs to query. If not provided, miners will be selected automatically. Example: [1, 2, 3]

  • search_query (string, required): The query to search for on the web. Example: "latest advancements in quantum computing"

  • n_miners (integer, min 1, optional): Number of miners to query for results. Default: 3. Example: 15

  • n_results (integer, min 1, optional): Maximum number of results to return in the response. Default: 1. Example: 5

  • max_response_time (integer, min 1, optional): Maximum time to wait for responses, in seconds. Default: 10. Example: 15
Responses

  • 200: Successful response with web search results (application/json)
Example request:
POST /web_retrieval HTTP/1.1
Host: sn1.api.macrocosmos.ai
Content-Type: application/json
Accept: */*
Content-Length: 125

{
  "uids": [
    1,
    2,
    3
  ],
  "search_query": "latest advancements in quantum computing",
  "n_miners": 15,
  "n_results": 5,
  "max_response_time": 15
}
Example response:

{
  "results": [
    {
      "url": "https://example.com/article",
      "content": "Quantum computing has seen significant advancements in the past year...",
      "relevant": "This article discusses the latest breakthroughs in quantum computing research."
    }
  ]
}

/web_retrieval enables distributed web search via a network of miners. A search query is dispatched to multiple miners, each of which performs an independent web retrieval. The aggregated results are deduplicated by URL before being returned to the client.
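The retrieval flow can be sketched the same way. As before, this is a hedged, standard-library-only example rather than an official client; the field names and defaults mirror the schema above, and YOUR_API_KEY is a placeholder.

```python
import json
import urllib.request

WEB_RETRIEVAL_URL = "https://sn1.api.macrocosmos.ai/web_retrieval"

def build_payload(query, n_miners=3, n_results=1, max_response_time=10):
    """Assemble a request body; defaults mirror the documented schema."""
    return {
        "search_query": query,
        "n_miners": n_miners,
        "n_results": n_results,
        "max_response_time": max_response_time,
    }

def web_retrieval(query, api_key, **kwargs):
    """POST a search query and decode the JSON result list."""
    req = urllib.request.Request(
        WEB_RETRIEVAL_URL,
        data=json.dumps(build_payload(query, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    out = web_retrieval(
        "latest advancements in quantum computing",
        api_key="YOUR_API_KEY",
        n_miners=15,
        n_results=5,
        max_response_time=15,
    )
    for item in out["results"]:
        print(item["url"], item["relevant"])
```

Raising n_miners increases the number of independent retrievals that feed the deduplicated result set, at the cost of waiting on more responses within max_response_time.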
