Subnet 1 Current Competitions
Registry of competitions currently active on SN1 APEX.
1. iota Simulator
The iota Simulator models a distributed compute network where activations flow through layers of miners. Miners submit routing and load-balancing algorithms to guide activations through the network as fast as possible. The top-performing algorithms from this competition will be considered for use in subnet 9 iota's orchestration layer.
Simulator Details
A detailed simulator, submission, and log file description can be found in the info doc within the iota_simulator folder. The simulator code is currently proprietary.
96 simulated miners (by default, number may change from round to round) are distributed across layers in a simulated network with bandwidth, latency, queuing, and caching.
Each activation travels forward through layers 0→N-1, then backward N-1→0.
An activation completes when it has finished all forward and backward passes, ending back at layer 0.
If an activation takes too long to complete, it will become stale and is dropped, removing it from all applicable caches.
The default timeout for staleness is 60 simulated seconds.
Miners have forward and backward queues.
Backward activations take priority over forward activations.
Miners cache activations after forward processing; cache pressure can stall forward queues.
A miner, by default, has a cache size of 5, meaning it can hold space for up to 5 in-flight activations at a time.
An activation is added to a simulated miner's cache after it has been forward-processed.
An activation is removed from a simulated miner's cache after it has been backward-processed.
An epoch completes when 500 activations (by default) finish their full round-trip.
Between epochs, a merge phase occurs: queues clear, your /balance-orchestrator endpoint is called, and miner properties may drift slightly.
All time is simulated: HTTP latency to your server does not count toward your score.
Simulated miners may differ in dropout and rejoin probabilities, bandwidth, latency, and other properties.
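The queue and cache rules above can be illustrated with a small toy model. This is a sketch only, not the proprietary simulator: the class name `SimMiner` and its methods are hypothetical, and timing, bandwidth, and dropout are left out. It shows backward priority, the forward stall when the cache is full, and removal of a stale activation.

```python
from collections import deque


class SimMiner:
    """Toy model of one simulated miner (hypothetical, for illustration only)."""

    def __init__(self, cache_size=5):
        self.cache_size = cache_size
        self.cache = set()        # forward-processed activations awaiting backward
        self.forward_q = deque()
        self.backward_q = deque()

    def next_job(self):
        """Return ("backward" | "forward", activation) or None if idle or stalled."""
        if self.backward_q:       # backward work always goes first
            act = self.backward_q.popleft()
            self.cache.discard(act)   # backward processing frees a cache slot
            return ("backward", act)
        if self.forward_q and len(self.cache) < self.cache_size:
            act = self.forward_q.popleft()
            self.cache.add(act)       # forward processing occupies a cache slot
            return ("forward", act)
        return None               # nothing queued, or forward queue stalled

    def drop_stale(self, act):
        """Remove a stale activation from the cache and from both queues."""
        self.cache.discard(act)
        self.forward_q = deque(a for a in self.forward_q if a != act)
        self.backward_q = deque(a for a in self.backward_q if a != act)
```

With `cache_size=2`, two forward jobs fill the cache and the third waits until a backward activation arrives and frees a slot.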
Simulator Diagram Examples - Queues and Caches
NOTE:
These images don't reflect actual time steps and are just to illustrate the general sequence of events for specific activations and nodes. In the actual simulation, all nodes are active.
During the simulation, different nodes may process activations at different rates. Some may drop out unexpectedly, some may rejoin, and some have higher latency or lower bandwidth than others.
Example 1: 2 Layers, 2 Miners
In this example, 1 activation is routed between 2 miners in 2 layers from start to finish.









Example 2: 3 Layers, 6 Miners
In this example, we follow activations within an in-progress network.

In this image:
Miner A in Layer 0 just received the red backward activation from miner C in Layer 1. Miner C's cache was decreased from 2 to 1.
Note: Miner A in layer 0 cannot process any more forward activations until it has finished processing the backward activation in its queue, because its cache is full (at 5).
Miner B in Layer 0 is processing the green forward activation.
Miner E in Layer 2 is currently processing the yellow backward activation.
Miner F in Layer 2 is currently processing the blue forward activation.

In this image:
Miner A in Layer 0 is currently processing the red backward activation.
Miner B in Layer 0 finished processing the green forward activation. Its cache is increased by 1, and it sends the activation to Miner D's forward queue in Layer 1.
At the same time, Miner E in Layer 2 finished processing the yellow backward activation. Its cache is decreased by 1, and it sends the activation to Miner D's backward queue in Layer 1.
Miner C in Layer 1 is currently processing the purple forward activation.
Miner F in Layer 2 continues processing the blue forward activation.

In this image:
Miner A in Layer 0 finished processing the red backward activation. Its cache is decreased by 1 and the activation is completed.
The forward queue is no longer stalled - this node can process 1 more forward activation from the queue before its cache is filled again.
Miner C in Layer 1 is still processing the purple forward activation.
Miner D in Layer 1 is currently processing the yellow backward activation.
The backward activation was prioritized over the forward activation, and thus processed first.
Miner F in Layer 2 has finished processing the blue forward activation, and has added the activation to its backward queue.
Evaluation
Miners implement an HTTP server with two endpoints, /route and /balance-orchestrator, that control how activations are routed through a multi-layer miner network. Each evaluation task runs one simulation with its own random seed and number of layers (3-8), simulating 5 epochs of 500 activations that traverse the network forward and backward through all layers.
/route is called once per activation routing decision.
/balance-orchestrator is called once per epoch, between epochs.
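As a rough sketch of the server shape, the following uses only the Python standard library to expose the two endpoints. The request and response schemas are not given here, so the HTTP method (POST), the `candidates` and `target` fields, and the empty reply from /balance-orchestrator are all hypothetical placeholders; the info doc in the iota_simulator folder defines the real format.

```python
import json
import random
from http.server import BaseHTTPRequestHandler, HTTPServer


def choose_target(candidates, rng=random):
    """Pick one candidate miner uniformly at random (a naive baseline)."""
    return rng.choice(candidates)


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/route":
            # HYPOTHETICAL schema: "candidates" in, "target" out.
            body = {"target": choose_target(payload["candidates"])}
        elif self.path == "/balance-orchestrator":
            # HYPOTHETICAL: an empty object stands in for "no rebalancing".
            body = {}
        else:
            self.send_error(404)
            return
        data = json.dumps(body).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the console quiet during evaluation


def serve(port=8000):
    """Start the server; the port number is an arbitrary example."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping the decision logic in a plain function such as `choose_target` makes it easy to test without starting the server.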
An example of a submission implementing random routing and balancing can be found in the iota simulator folder.
Scoring
Score is calculated by:
total_epoch_time: sum of all epoch durations (simulated seconds), excluding merge phases.
max_epoch_time: analytically computed time ceiling with a safety multiplier.
To surpass the current winner, a miner must earn a raw score at least 1% higher than the current top raw score. If there is no current winner, the miner must beat the baseline raw score by at least 1%.
The score_to_beat is displayed in the Apex CLI dashboard, under competition information.
Miner Submissions
Miners submit a single .py file.
Maximum submission size: 50,000 characters.
Submission Fee: $10.00 USD.
Default round length: 1 day.
Standard Incentive mechanism.
Miner code is revealed 1 day after evaluation.
Logs are opened after the current round is completed.
Multiple submissions:
The rate limit is 4 submissions per hotkey within 24 hours, across all competitions.
The information about enabled packages is in requirements.txt.
All matches produce a history file, with activation logs and simulation timestamps detailing miner metrics at the given point in the simulation.
2. Energy Arbitrage
Energy storage arbitrage is a core problem in modern electricity markets: a battery operator can profit by purchasing power when prices are low and selling it back when prices are high, but must act under uncertainty as real-time prices deviate from day-ahead forecasts due to weather, demand shocks, and transmission congestion.
The Energy Arbitrage competition challenges miners to optimize battery dispatch decisions across a simulated electrical grid. Miners submit algorithmic policies that decide when to charge and discharge batteries at each time step to maximize profit, while respecting physical constraints including battery state-of-charge limits, network power flow limits, and transaction/degradation costs.
Evaluation Overview
Each evaluation runs the miner's policy across 100 challenge instances. Instances cycle through 5 scenarios of increasing difficulty:
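Scenario | Nodes | Lines | Batteries | Time Steps | Horizon
Baseline | 20 | 30 | 10 | 96 | 1 Day
Congested | 40 | 60 | 20 | 96 | 1 Day
Multiday | 80 | 120 | 40 | 192 | 2 Days
Dense | 100 | 200 | 60 | 192 | 2 Days
Capstone | 150 | 300 | 100 | 192 | 2 Days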
Each time step represents 15 minutes. Scenarios increase in network size, congestion, price volatility, and battery heterogeneity.
Each round's evaluation set is seeded to ensure determinism.
This seed changes from round to round.
Step by Step
At each time step, the miner's policy function receives:
The current state: battery state-of-charge levels, real-time nodal electricity prices, exogenous grid injections, feasible action bounds per battery, and accumulated profit.
The challenge view: network topology (nodes, lines, PTDF matrix, flow limits), battery parameters (capacity, power limits, efficiency), exogenous injection schedule for all time steps, and day-ahead prices.
The policy returns a list of actions (MW), one per battery. Negative values charge; positive values discharge.
Actions must stay within the provided bounds.
Real-time prices are generated stochastically at each step from a hidden seed -- miners cannot predict future RT prices.
Day-ahead prices are known in advance for the full horizon.
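To make the per-step interface concrete, here is a minimal sketch of a policy that compares the real-time price at each battery with the average day-ahead price. The `PolicyView` and `State` classes below are stand-ins with hypothetical field names (`day_ahead_prices`, `rt_prices`, `action_bounds`); the real objects are defined by the competition code and may be laid out differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PolicyView:
    """Stand-in for the challenge view (hypothetical field names)."""
    day_ahead_prices: list[list[float]]   # [time_step][battery]


@dataclass
class State:
    """Stand-in for the current state (hypothetical field names)."""
    rt_prices: list[float]                     # real-time price at each battery
    action_bounds: list[tuple[float, float]]   # (min MW, max MW) per battery


def policy(challenge: PolicyView, state: State) -> list[float]:
    actions = []
    for b, (lo, hi) in enumerate(state.action_bounds):
        # Reference price: mean day-ahead price for this battery's node.
        reference = mean(step[b] for step in challenge.day_ahead_prices)
        price = state.rt_prices[b]
        if price > reference:
            target = hi        # price is high: discharge (positive MW)
        elif price < reference:
            target = lo        # price is low: charge (negative MW)
        else:
            target = 0.0
        # Never leave the provided bounds.
        actions.append(min(max(target, lo), hi))
    return actions
```

This heuristic ignores network flow limits, so pushing every battery to its bound could still cause a step to fail on a congested network.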
Constraints
Battery SOC: Must remain between 10% and 90% of capacity. Starts at 50%.
Charge/discharge efficiency: 95% each direction.
Network flow limits: Actions must not cause line flows to exceed limits (DC power flow model). Violations cause the step to fail.
Action bounds: Pre-computed at each step based on current SOC and battery power limits.
Timeouts:
Per-step timeout = 30 seconds.
Total evaluation timeout = 1200 seconds.
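The constraints above can be checked locally before returning an action. The sketch below is an assumption-laden illustration: it assumes SOC is tracked in MWh, that efficiency is applied as a loss on both charging and discharging, and that line flows are the PTDF matrix multiplied by net nodal injections, as in a standard DC power flow model. The function names are hypothetical and pure Python is used for clarity.

```python
DT_HOURS = 0.25            # one time step is 15 minutes
ETA = 0.95                 # charge/discharge efficiency, each direction
SOC_MIN, SOC_MAX = 0.10, 0.90


def next_soc(soc_mwh, action_mw):
    """SOC after one step. Positive action discharges, negative charges.

    ASSUMPTION: discharging drains action*dt/eta, charging stores |action|*dt*eta.
    """
    if action_mw >= 0:
        return soc_mwh - action_mw * DT_HOURS / ETA
    return soc_mwh - action_mw * DT_HOURS * ETA


def soc_ok(soc_mwh, capacity_mwh):
    """True if SOC stays within 10% to 90% of capacity."""
    return SOC_MIN * capacity_mwh <= soc_mwh <= SOC_MAX * capacity_mwh


def line_flows(ptdf, injections):
    """DC power flow: flow on each line = PTDF row dotted with nodal injections."""
    return [sum(p * x for p, x in zip(row, injections)) for row in ptdf]


def flows_ok(ptdf, injections, limits, tol=1e-9):
    """True if no line exceeds its limit in either direction."""
    return all(abs(f) <= lim + tol
               for f, lim in zip(line_flows(ptdf, injections), limits))
```

A policy could scale its actions down until `flows_ok` passes, at the cost of some profit.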
Profit Calculation
At each time step, per battery:
Scoring
Each instance is scored by comparing the miner's total profit against a baseline (the better of two built-in heuristic policies -- greedy and conservative):
quality = (miner_profit - baseline_profit) / (baseline_profit + 1e-6)
quality_int = round(clamp(quality, -10, +10) * 1,000,000)
The miner's final score is the average quality across all 100 instances.
To surpass the current winner, a miner must earn a raw score > 1% higher than the current top raw score.
If there is no current winner, the miner must beat the baseline raw score by at least 1%.
The score_to_beat is displayed in the Apex CLI dashboard under competition information.
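The quality formulas in this section translate directly into a few lines of Python. One detail is an assumption: the text does not say whether the final average is taken over `quality` or `quality_int`, so this sketch averages `quality_int`. The helper names are illustrative.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def quality(miner_profit, baseline_profit):
    """Relative improvement over the baseline profit."""
    return (miner_profit - baseline_profit) / (baseline_profit + 1e-6)


def quality_int(q):
    """Clamp to [-10, +10] and scale to an integer."""
    return round(clamp(q, -10, 10) * 1_000_000)


def final_score(results):
    """Average over instances; results is a list of (miner, baseline) profits.

    ASSUMPTION: the average is taken over quality_int.
    """
    values = [quality_int(quality(m, b)) for m, b in results]
    return sum(values) / len(values)
```

For example, a miner earning 110 against a baseline of 100 gets a quality of about 0.10, while any profit more than ten times above the baseline is clamped to +10.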
Miner Submissions
Miners submit a single .py file implementing:
def policy(challenge: PolicyView, state: State) -> list[float]:
Maximum submission size: 50,000 characters.
Default round length: 1 day.
Submission Fee: $1.00 USD.
Miner code is revealed 1 day after evaluation.
Logs are opened after the current round is completed.
The submission rate limit is 4 submissions per hotkey within 24 hours, across all competitions.
Examples of baseline solver implementations can be found in the energy_arbitrage/python folder.
The information about enabled packages is in requirements.txt. Only numpy is available beyond the standard library.
3. Reinforcement Learning: Tron
In this head-to-head reinforcement learning competition for Apex, miners train RL agents to play Tron on their own machines and submit their models in TorchScript (.pt files) for evaluation. All miners face off in duels in a single-elimination, bracket-style tournament, where the winner takes emissions.
Tron Settings
Be sure the model loads with torch.jit.load() before submitting.
Submissions must be .pt files to be accepted.
Max submission size is 100 MB.
The model must accept an input tensor of shape (1, 5, H, W) and output Q-values / logits of shape (4,) over the action space [UP, RIGHT, DOWN, LEFT].
Round Structure
A round is run as a single-elimination bracket. Miners can make submissions while the round is OPEN. Miners get one submission per hotkey - if multiple submissions are made under the same hotkey, the most recent submission is used during the evaluation phase.
During the evaluation phase:
Miners are seeded into a single elimination bracket.
Every match is a head-to-head duel between two miners.
The winner advances to the next round of the bracket.
The last surviving miner is the round winner and receives all competition emissions, annealing with the burn.
Match Structure
A single match is a duel between two miners, consisting of several Tron games played head-to-head. The miner with the higher win rate across the games wins the match and advances in the bracket.
The default is 3 games per match. Player spawn slots alternate every game to minimize positional advantages.
Each game is played on a 30×30 playable grid (default) bordered by 1-block walls.
Miners spawn in opposite corners (top-left and bottom-right) for maximum separation.
Each game runs for at most 500 ticks.
A game ends when:
only one player is alive (that player wins),
both players die on the same tick (draw),
the tick limit is reached with both still alive (draw).
Trails are permanent - riding into your own trail, the opponent's trail, or a wall kills you.
A miner that does not respond in time on a given tick is defaulted to "continue straight" - there is no per-tick penalty, but continuing straight may run the miner into a wall or trail.
Per-Tick Miner I/O
The miner runs as a long-running HTTP server inside its sandbox. The orchestrator calls the miner every game tick (not just at the start) with the current state.
Each tick, the miner receives:
The full grid as a 2D array (0=empty, 1=wall, 2+=trail cells)
Its own position [y, x], direction (0=UP, 1=RIGHT, 2=DOWN, 3=LEFT), and alive status
The opponent's position(s) and alive status
The pre-filtered list of valid actions (excludes reversing direction)
The miner returns a single integer action 0–3.
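To illustrate how the per-tick inputs can be turned into an action, here is a simple hand-written heuristic rather than a trained model. It is a sketch under assumptions: row 0 is taken to be the top of the grid (so UP decreases y), any non-zero cell is treated as blocked, and the function names are hypothetical. It picks the valid action whose next cell opens onto the largest reachable empty area.

```python
from collections import deque

# (dy, dx) per action; ASSUMES row 0 is the top of the grid.
DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def open_area(grid, start):
    """Count empty cells reachable from start with a breadth-first flood fill."""
    seen = {start}
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        for dy, dx in DELTAS.values():
            ny, nx = y + dy, x + dx
            if (0 <= ny < len(grid) and 0 <= nx < len(grid[0])
                    and grid[ny][nx] == 0 and (ny, nx) not in seen):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)


def choose_action(grid, position, direction, valid_actions):
    """Return the valid action leading to the most open space.

    Falls back to the current direction if every neighbour is blocked.
    """
    y, x = position
    best_action, best_area = direction, -1
    for action in valid_actions:
        dy, dx = DELTAS[action]
        ny, nx = y + dy, x + dx
        inside = 0 <= ny < len(grid) and 0 <= nx < len(grid[0])
        if not inside or grid[ny][nx] != 0:
            continue   # wall, trail, or off-grid
        area = open_area(grid, (ny, nx))
        if area > best_area:
            best_action, best_area = action, area
    return best_action
```

A heuristic like this can also serve as a sanity-check opponent when training an RL agent locally.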
Timing
Per-tick move timeout: 0.1 seconds. Exceeding this defaults the miner to its current direction.
Per-game wall-clock timeout: 120 real-time seconds maximum per simulated game.
Evaluation
Game Score
Every game produces a per-game score for each miner based on a death-cause cascade. Rules apply in order; the first rule that matches your situation becomes your score.
Rule | Situation | Score
1 | You killed your opponent (their killed_by is you) and you're still alive | 1.00
2 | You're still alive and your opponent self-destructed (hit a wall or their own trail) | 0.80
3 | You killed your opponent but also died on the same step (head-on, mutual trail-kill, or you wall-died while your trail killed them) | 0.40
4 | Both players alive at max_steps (timeout draw) | 0.25
5 | Your opponent killed you and you did not kill them | 0.10
6 | You died alone (wall or your own trail), no kill credit | 0.00
If a game fails to start due to model load failures, both miners receive 0.0 for that game.
The scoring rewards aggression: a clean kill (1.00) is worth more than waiting for your opponent to crash (0.80).
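The death-cause cascade can be written as an ordered chain of checks, where the first matching rule decides the score. This is a sketch for local experimentation: the four boolean inputs are hypothetical names for the information described in the rules, not fields from the evaluator.

```python
def game_score(alive, opp_alive, killed_opp, killed_by_opp):
    """Per-game score for one miner; rules are checked in order."""
    if killed_opp and alive:
        return 1.00   # rule 1: clean kill
    if alive and not opp_alive:
        return 0.80   # rule 2: opponent self-destructed
    if killed_opp:
        return 0.40   # rule 3: killed the opponent but died on the same step
    if alive and opp_alive:
        return 0.25   # rule 4: timeout draw
    if killed_by_opp:
        return 0.10   # rule 5: killed by the opponent without a kill of your own
    return 0.00       # rule 6: died alone
```

Because the checks run top to bottom, a mutual kill reaches rule 3 before rule 5 and scores 0.40 rather than 0.10.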
Match Score
A match's score for each miner is the average of its per-game scores across all games in the match.
Because every per-game score lies in [0.0, 1.0], the match score is also bounded in [0.0, 1.0].
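As a small worked example of the averaging described above, the sketch below computes a match score from per-game scores and compares two miners. How ties are broken is not stated here, so the hypothetical `match_winner` helper simply returns `None` when the two averages are equal.

```python
def match_score(game_scores):
    """Average of per-game scores; each score is in [0.0, 1.0]."""
    return sum(game_scores) / len(game_scores)


def match_winner(scores_a, scores_b):
    """Return "a" or "b" for the higher match score, or None on a tie.

    ASSUMPTION: tie handling is unspecified, so ties are reported as None.
    """
    a, b = match_score(scores_a), match_score(scores_b)
    if a == b:
        return None
    return "a" if a > b else "b"
```

With the default of 3 games, per-game scores of 1.00, 0.80, and 0.00 average to a match score of 0.60.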
Match Outcome
The miner with the higher match score wins the match and advances in the bracket.
A losing miner is eliminated from the bracket. Surviving miners are paired up for the next round of the bracket and play another match. This continues until one miner remains.
Information on game stats and outcomes can be found in the eval metadata.
Aggregate Stats
eval_raw_score = the number of rounds this submission has survived.
eval_score = rounds_survived / total_rounds
These numbers do not determine the bracket winner - they are tracking stats. The round winner is the last surviving miner in the bracket.
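The bracket flow and the tracking stats can be simulated in a few lines. This is a sketch with several assumptions that are not confirmed here: seeding is a random shuffle, an odd miner out receives a bye that counts as surviving the round, and `play_match` is a hypothetical callback that returns the winner of a duel.

```python
import random


def run_bracket(miners, play_match, seed=0):
    """Run a single-elimination bracket over a non-empty list of miners.

    Returns (winner, rounds_survived, eval_scores).
    """
    alive = list(miners)
    random.Random(seed).shuffle(alive)     # ASSUMPTION: random seeding
    survived = {m: 0 for m in alive}
    total_rounds = 0
    while len(alive) > 1:
        total_rounds += 1
        advancing = []
        if len(alive) % 2 == 1:
            advancing.append(alive.pop())  # ASSUMPTION: odd miner gets a bye
        for a, b in zip(alive[0::2], alive[1::2]):
            advancing.append(play_match(a, b))
        for m in advancing:
            survived[m] += 1
        alive = advancing
    eval_scores = {m: (r / total_rounds if total_rounds else 0.0)
                   for m, r in survived.items()}
    return alive[0], survived, eval_scores
```

Under these assumptions the round winner always ends with an eval_score of 1.0, while a miner eliminated in its first match ends with 0.0.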
Additional Details
View results on the website's competition dashboard.
Miner code is revealed 2 days after evaluation.
Round length: 2 days.
Submission fee: $1.40 USD, converted to the current TAO price.
Logs are opened after the current round is completed.
Multiple submissions:
The rate limit is 4 submissions per hotkey within 24 hours, across all competitions.
A guide on training a baseline model can be found in the train folder.
Information on the RL Tron Player API can be found in launch_tron_rl.py.
See requirements.txt for information on allowed packages.
All matches produce a replay file (per-game grid history and tick-by-tick actions).