Subnet 9 Mining Setup Guide
Introduction
IOTA (Incentivized Orchestrated Training Architecture) is a data- and pipeline-parallel training algorithm designed to operate on a network of heterogeneous, unreliable devices in adversarial and trustless environments.
The purpose of miners in IOTA
In a decentralized LLM-training network, miners are the workers that supply GPU compute, memory, and bandwidth to collaboratively train models. IOTA uses data- and pipeline-parallelism, meaning that miners run sections of the model rather than the entire model, which reduces the hardware requirement for participation. Each miner downloads its assigned section of the model, runs forward and backward passes on activations, and periodically syncs its weights with peers in the same layer via a merging process. By distributing workloads across a large number of independent miners, the network achieves massive parallelism, fault tolerance, and censorship resistance while eliminating single-point infrastructure costs.
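To make "running a section of the model" concrete, the sketch below shows a minimal pipeline stage in PyTorch: a contiguous slice of transformer blocks that receives activations from the previous stage and passes its outputs downstream. The class name, block type, and sizes are illustrative assumptions, not the actual IOTA miner code.

# Minimal sketch of a pipeline-parallel "section" of a model.
# Illustrative only; not the actual IOTA miner implementation.
import torch
import torch.nn as nn

class ModelSection(nn.Module):
    """A contiguous slice of transformer blocks owned by one miner."""
    def __init__(self, hidden_size: int = 1024, num_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=hidden_size, nhead=8, batch_first=True)
             for _ in range(num_blocks)]
        )

    def forward(self, activations: torch.Tensor) -> torch.Tensor:
        # Forward pass: transform activations received from the previous stage
        # and hand the result to the next stage in the pipeline. During the
        # backward pass, gradients flow back through these same blocks.
        for block in self.blocks:
            activations = block(activations)
        return activations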
The IOTA incentive mechanism continuously scores miners on the throughput and quality of their work during the training and merging processes, and rewards them with subnet 9 alpha tokens in proportion to those contributions.
Operations explained
Joining the network
Miners join the network and register with the orchestrator via their API client, which assigns them to a training layer. Up to 50 miners operate per layer. Miners download the current global weights for their layer and begin processing activations.
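The registration call itself is handled by the miner's API client; the sketch below only illustrates the general shape of that step against the orchestrator endpoint. The /register path, payload fields, and response format are hypothetical, not the real IOTA API.

# Hypothetical illustration of registering with the orchestrator over HTTPS.
# The endpoint path and payload fields are assumptions, not the real IOTA API.
import os
import requests

ORCHESTRATOR_URL = "{scheme}://{host}:{port}".format(
    scheme=os.environ.get("ORCHESTRATOR_SCHEME", "https"),
    host=os.environ.get("ORCHESTRATOR_HOST", "iota.api.macrocosmos.ai"),
    port=os.environ.get("ORCHESTRATOR_PORT", "443"),
)

def register_miner(hotkey: str) -> int:
    """Ask the orchestrator for a layer assignment (illustrative only)."""
    response = requests.post(f"{ORCHESTRATOR_URL}/register",
                             json={"hotkey": hotkey}, timeout=30)
    response.raise_for_status()
    layer = response.json()["layer"]  # assumed response field
    print(f"Assigned to training layer {layer}")
    return layer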
Activations
There are two activation types: forward and backward.
Forward activations propagate samples through the model to produce losses.
Backward activations propagate the samples in the opposite direction to produce the gradients used to train each miner's layer weights.
Backward activations take precedence over forward activations because they carry the learning signal.
If a miner fails to process an activation that it has been assigned, it is penalized. The design resembles an assembly line, where work is handed between adjacent stages, with one important difference: samples propagate through the pipeline along randomly chosen, stochastic paths.
Activation processing happens in all layers at once, but miners process samples and train asynchronously. Miners must process as many activations as possible in each epoch — their score is based on throughput.
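As a rough illustration of the precedence rule above, the sketch below drains a work queue while always handling backward activations first. The queue structure and function names are assumptions for illustration, not the actual miner code.

# Illustrative work loop that gives backward activations precedence over
# forward activations (assumed structure, not the actual IOTA miner).
import queue

BACKWARD, FORWARD = 0, 1  # lower number = higher priority

work = queue.PriorityQueue()

def enqueue(direction: int, activation_id: str) -> None:
    work.put((direction, activation_id))

def process_available_activations(process_backward, process_forward) -> None:
    # Drain the queue, always handling backward passes first because they
    # carry the learning signal; unprocessed assignments are penalised.
    while not work.empty():
        direction, activation_id = work.get()
        if direction == BACKWARD:
            process_backward(activation_id)
        else:
            process_forward(activation_id)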
Merging
Once the orchestrator signals that enough samples have been processed in the network, the state of the system changes from training mode to merging mode.
In merging mode, the miners perform a multi-stage process which is a modified version of Butterfly All-Reduce.
Miners upload their local weights and optimiser states to the S3 bucket.
They are assigned a set of random weight partitions.
Importantly, multiple miners are assigned to the same partitions, which provides redundant measurement of results for improved fault tolerance and robustness!
Miners then download their partitions, perform a local merge (currently the element-wise geometric mean, as sketched below) and upload their merged partitions.
This design is tolerant to miner failures, so merging is not blocked if some miners do not successfully complete this stage.
Finally, miners download the complete set of merged weights and optimiser states. The merging stage is currently the slowest, so we amortise this by running the training stage for longer and effectively training on larger batch sizes.
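As a concrete illustration of the local merge step, the sketch below computes an element-wise geometric mean over redundant copies of a weight partition. It is only a sketch of the operation named above: the sign handling, epsilon guard, and tensor shapes are assumptions rather than IOTA's actual implementation.

# Element-wise geometric mean of redundant weight partitions (illustrative sketch).
import torch

def geometric_mean_merge(partitions, eps: float = 1e-12) -> torch.Tensor:
    """Merge several copies of the same partition into one tensor."""
    # Work in log-space for numerical stability; signs are handled separately
    # because a plain geometric mean is only defined for positive values.
    stacked = torch.stack(partitions)  # shape: (num_copies, *partition_shape)
    signs = torch.sign(stacked)
    log_mean = torch.log(stacked.abs().clamp_min(eps)).mean(dim=0)
    merged_magnitude = torch.exp(log_mean)
    # Use the sign of the first copy as a simple tie-break (assumption).
    return signs[0] * merged_magnitude

# Example: three miners uploaded the same partition; merge the copies locally.
copies = [torch.randn(4, 4) for _ in range(3)]
merged = geometric_mean_merge(copies)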
Once merging is complete, the orchestrator state returns to training mode and the miners continue processing activations. The miners cycle between training mode and merging mode in perpetuity.
Figure 1 below illustrates the training loop.
Figure 1 Explanation - Inside the training loop, the miner performs forward and backward passes while uploading its activations to the dedicated storage bucket. In the forward direction, miners receive activations from the previous layer, compute transformed outputs, and propagate them downstream. During the backward pass, they consume gradients, compute local weight updates, and send gradients upstream. Importantly, the number of forward and backward passes per training loop is controlled via an orchestrator-level hyperparameter called BATCHES_BEFORE_MERGING.
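To summarise the loop in Figure 1, here is a pseudocode sketch of a miner's outer loop. Only the hyperparameter name BATCHES_BEFORE_MERGING comes from the description above; the value shown and all function names are placeholders.

# Pseudocode for the miner's outer loop (function names are placeholders).
BATCHES_BEFORE_MERGING = 64  # orchestrator-level hyperparameter; value is illustrative

def run_miner(next_batch, forward_pass, backward_pass, upload_activations, merge_weights):
    while True:
        # Training mode: forward and backward passes, uploading activations as we go.
        for _ in range(BATCHES_BEFORE_MERGING):
            batch = next_batch()
            outputs = forward_pass(batch)
            upload_activations(outputs)
            gradients = backward_pass(outputs)
            upload_activations(gradients)
        # Merging mode: butterfly all-reduce style weight merge, then back to training.
        merge_weights()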
Setting Up Mining
This section provides instructions for setting up an IOTA miner. The process is described as a full flow, assuming use of the terminal; adjust individual steps as needed if you set up through UI-based developer tools. At the bottom of the page you will find RunPod setup examples for those less familiar with infrastructure setup.
If you have any questions not covered in these instructions, or you run into installation issues, reach out to us for support in:
Prerequisites
Setting up a miner on IOTA requires:
Training infrastructure: Miners must run on GPUs with at least 80 GB of VRAM (A100-class or higher); hardware with less memory will process updates more slowly and consequently may earn markedly lower rewards.
HuggingFace access: a basic HuggingFace access token to pull the model from HuggingFace - no need to modify the permissions
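Before installing, you can optionally verify that your GPU and HuggingFace token meet the prerequisites above. The snippet below is a local sanity check, not part of the IOTA tooling; it assumes PyTorch is installed and that the token is exported as HF_TOKEN.

# Quick local sanity check for the prerequisites (not part of the IOTA tooling).
import os
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.0f} GB")
if vram_gb < 80:
    print("Warning: less than 80 GB of VRAM; expect slower updates and lower rewards")

if not os.environ.get("HF_TOKEN"):
    print("Warning: HF_TOKEN is not set; the model cannot be pulled from HuggingFace")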
Installation
Give the files executable rights so they can run on your local machine
Bittensor Wallet Registration
Miner Registration and Launch
Register on Mainnet
Change the miner values where necessary and copy them:
wallet_name="wallet_name" #change the wallet_name to your wallet name (coldkey name)
wallet_hotkey="wallet_hotkey" #change the wallet_hotkey to your hotkey name
netuid=9
network="finney"
MOCK=False
BITTENSOR=True
HF_TOKEN="hf_token" #change to your HuggingFace Access token
ORCHESTRATOR_HOST="iota.api.macrocosmos.ai"
ORCHESTRATOR_PORT=443
ORCHESTRATOR_SCHEME=https
Expected output:
🎉Welcome to the Cosmos!
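Optionally, before launching, you can confirm that the values above are actually visible to your environment. The sketch below assumes they are stored in a .env file and uses python-dotenv, which is an assumption rather than a required part of the IOTA setup.

# Optional sanity check: load the values from a .env file and confirm the
# required ones are present (python-dotenv is an assumption, not a requirement).
import os
from dotenv import load_dotenv

load_dotenv(".env")

required = ["wallet_name", "wallet_hotkey", "HF_TOKEN", "ORCHESTRATOR_HOST"]
missing = [key for key in required if not os.environ.get(key)]
if missing:
    raise SystemExit(f"Missing required settings: {', '.join(missing)}")
print("Configuration looks complete; ready to launch the miner.")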