
Scraping YouTube Data



Requirements

  • YouTube API Key (generate via the Google Cloud Console)

  • Python 3.10+ installed

  • A Linux or WSL environment

Getting Started (Backend CLI Setup)

  1. Clone the Repository

git clone https://github.com/victorchimakanu/macrocosmos-youtube-scrapper.git
cd macrocosmos-youtube-scrapper

  2. Create and Activate a Virtual Environment

python -m venv venv 
source venv/bin/activate

3. Install Required Packages

pip install -r requirements.txt

This might take a while depending on your environment.

4. Get a YouTube API Key

Generate an API key in the Google Cloud Console: create (or select) a project, enable the YouTube Data API v3, and create an API key under APIs & Services → Credentials.

  5. Set Up Environment Variables

Create a .env file in the root directory and add your API key:

YOUTUBE_API_KEY="YOUR_API_KEY_HERE"
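The scraper reads YOUTUBE_API_KEY from the environment at runtime. As a rough illustration of what a .env loader does (the project most likely uses a library such as python-dotenv; this stdlib-only sketch is for illustration, not the project's actual code):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader (stdlib only): put KEY=VALUE lines into os.environ.

    Illustration of what a library like python-dotenv does; not the project's code.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a variable already set in the shell wins over .env.
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Usage (after creating the .env file above):
# load_env()
# api_key = os.environ["YOUTUBE_API_KEY"]
```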

6. Finalize Package Setup

pip install -e .

This validates and installs local dependencies including the data-universe package.

Running the Scraper (CLI Mode)

1. Navigate to the YouTube Scraper Module

cd scraping/youtube

2. Run the Scraper

python youtube_custom_scraper.py

If you encounter a ModuleNotFoundError such as:

ModuleNotFoundError: No module named 'common.data'

Go back to the root directory and run the scraper as a module, with the repository root on PYTHONPATH:

PYTHONPATH="." python3.11 -m scraping.youtube.youtube_custom_scraper

3. Choose an Option:

You’ll be prompted to select one of the following:

  1. Scrape using a default test script

  2. Scrape any video of your choice

  3. Scrape up to 5 random videos from a specific channel

Transcripts are returned in the terminal. For local downloads, use the Custom API endpoints below.

Custom APIs – Download Transcripts via HTTP

1. Start the Backend API Server

Navigate to the project root and run:

python backend/app.py

Available Custom Endpoints

Video Scraper

POST http://127.0.0.1:5001/api/scrape/video

Downloads the transcript to your local machine.

Headers

Name: X-API-KEY
Value: your YouTube API key

Body (JSON)

Name: video_id
Description: the ID of the YouTube video to scrape

Example body:

{
  "video_id": "UH_sOZSIk10"
}

Response (success):

{
  "job_type": "video",
  "status": "started",
  "video_id": "UH_sOZSIk10"
}

Response (error):

{
  "error": "Invalid request"
}
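A minimal Python client for this endpoint might look like the following. This is a hypothetical sketch: it assumes the backend started from backend/app.py is listening on 127.0.0.1:5001 and uses only the request shape documented above.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:5001"  # backend/app.py default from this guide

def scrape_video_request(video_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for /api/scrape/video with the documented headers and body."""
    body = json.dumps({"video_id": video_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/scrape/video",
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# To actually fire the request (the backend server must be running):
# with urllib.request.urlopen(scrape_video_request("UH_sOZSIk10", "your_key")) as resp:
#     print(json.load(resp))
```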

Channel Scraper

POST http://127.0.0.1:5001/api/scrape/channel

Scrapes random videos from a channel, letting you specify how many videos to scrape.

Headers

Name: X-API-KEY
Value: your YouTube API key

Body (JSON)

Name: channel_id
Description: the ID of the YouTube channel to scrape

Name: max_videos
Description: the number of random videos to scrape from the channel

Example body:

{
  "channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
  "max_videos": 3
}

Response (success):

{
  "channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
  "job_type": "channel",
  "max_videos": 3,
  "status": "started"
}

Response (error):

{
  "error": "Invalid request"
}

These endpoints wrap the CLI scraper logic and save the output to your local Transcripts folder in .txt and .pdf formats.
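A matching sketch for the channel endpoint, under the same assumptions (local backend on port 5001; field names taken from the example request and response above):

```python
import json
import urllib.request

def scrape_channel_request(channel_id: str, max_videos: int, api_key: str,
                           base: str = "http://127.0.0.1:5001") -> urllib.request.Request:
    """Build the POST request for /api/scrape/channel with the documented headers and body."""
    body = json.dumps({"channel_id": channel_id, "max_videos": max_videos}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/scrape/channel",
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Example (the backend server must be running):
# req = scrape_channel_request("UC92OMuTHmkrk0Crz5Xqi-5w", 3, "your_key")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```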

Testing APIs with Postman

You can test these APIs using Postman by:

  • Setting request method to POST

  • Using the appropriate URL

  • Providing the correct JSON body

  • Viewing transcript generation in your terminal and output folder

For this example, we're using the custom video scraper endpoint. When you send the request, a Transcripts folder is generated and your desired transcript is downloaded to your local machine in PDF and .txt formats.

Frontend Interface

Now let's interact with our custom scraper using a sample frontend application.

  1. To set up the frontend, open a split terminal and navigate into the frontend folder

cd frontend 

2. Install Dependencies

npm install

  3. Set Up Frontend Environment Variables

Create a .env file in the frontend directory:

VITE_API_BASE="http://127.0.0.1:5001/"
VITE_API_KEY="YOUR_YOUTUBE_API_KEY"

  4. Launch the Frontend App

npm run dev

Follow any of the printed links and it will spin up a sample application for the YouTube scraper on your local machine.

🧑‍💻 Using the Frontend

  • Paste a YouTube video URL or ID

  • Click Scrape Video

  • Then click Download Transcript

Transcripts are saved to the local Transcripts folder in .txt and .pdf formats. You can monitor scraper activity via the terminal.
