Subnet 9 - Pre-training
Bittensor can build the best models
Subnet 9 is Bittensor's premier training subnet. It incentivises miners to optimise model training within given constraints such as model size and training dataset, providing intelligence that's ready for fine-tuning towards specific use cases.
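As a rough illustration of what "training within given constraints" can mean for a miner, the sketch below checks a candidate model against a fixed parameter budget and context length before it would be submitted for evaluation. The budget values, model name and helper function are illustrative assumptions, not SN9's actual competition parameters.

```python
# Minimal sketch: check a candidate model against illustrative size constraints.
# The budget and context limit below are assumptions, not SN9's real rules.
from transformers import AutoConfig, AutoModelForCausalLM

PARAM_BUDGET = 7_000_000_000   # hypothetical cap on trainable parameters
MAX_CONTEXT_LENGTH = 4096      # hypothetical cap on context length

def fits_constraints(model_name: str) -> bool:
    """Return True if the model respects the (illustrative) constraints."""
    config = AutoConfig.from_pretrained(model_name)
    # Instantiate from config only, so no pretrained weights are downloaded.
    model = AutoModelForCausalLM.from_config(config)
    num_params = sum(p.numel() for p in model.parameters())

    context_length = getattr(config, "max_position_embeddings", None)
    within_context = context_length is not None and context_length <= MAX_CONTEXT_LENGTH
    return num_params <= PARAM_BUDGET and within_context

if __name__ == "__main__":
    print(fits_constraints("gpt2"))  # a small model, comfortably within the budget
```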
Pre-training is expensive, and the computing power it demands keeps growing. Research estimates that the dollar cost of LLM development is growing by roughly 0.49 orders of magnitude per year, i.e. approximately tripling annually (10^0.49 ≈ 3.1). As this increases, so does the risk that pre-training and fine-tuning at scale will only be accessible to the very largest companies.
By rewarding miners for sharing the best pre-training models, subnet 9’s design ensures a continuously improving baseline of intelligence on Bittensor. Marginal improvements can have significant downstream benefits. Over time, we aim to build a library of open-source models at different sizes, modalities and architectures, which can then be fine-tuned across subnets or even teams outside the Bittensor ecosystem.
Our vision is to create an AI flywheel with subnet 9 at its center: foundation models trained on Bittensor, then fine-tuned, specialized and deployed on inference subnets, serving models on-demand across different datasets and architectures. These new models will power agentic tools and other apps both inside and outside the Bittensor ecosystem.
Our latest experiments not only proved that dataset mixing is viable on Bittensor, but also showed how it can stimulate competition, encourage innovation, and excite the community. What started as an experiment with different pre-training architectures has evolved into a fundamentally stronger subnet.
We’re confident that, beyond the volume and quality of public data, SN9 has no limits. However, most proprietary models don’t reveal their exact datasets, closing the door on open-source alternatives. Therefore, if we want to keep pushing state-of-the-art results, we must expand our repertoire of open-source, accessible datasets and scale up so our models are well fed with information. Training with synthetic data is also becoming a mainstream approach in machine learning, so this will be part of our plans too.
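To make the idea of dataset mixing concrete, here is a minimal sketch using the Hugging Face `datasets` library to interleave two open corpora with fixed sampling weights. The corpora and mixing probabilities are illustrative assumptions, not SN9's actual training mixture.

```python
# Minimal sketch of dataset mixing; the corpora and weights are illustrative,
# not SN9's actual mixture.
from datasets import interleave_datasets, load_dataset

# Stream two open corpora so nothing is downloaded up front, and keep only the
# shared "text" column so the streams can be interleaved cleanly.
web = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
web = web.select_columns(["text"])
wiki = wiki.select_columns(["text"])

# Interleave the streams with fixed sampling probabilities to form one mixed stream.
mixed = interleave_datasets([web, wiki], probabilities=[0.7, 0.3], seed=42)

# Feed the mixed stream to a tokenizer / trainer as usual.
for i, example in enumerate(mixed):
    print(example["text"][:80])
    if i >= 2:
        break
```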
We can elevate the entire protocol and push SN9 to the top of the LLM-training ecosystem by highlighting Bittensor's capacity to build top-tier models. The success of dataset mixing is a step towards making that vision a reality.
For more details about subnet 9's R&D work, take a look at our Substack articles:
Other related resources
The provides a detailed view of our pre-training efforts.