# Scraping YouTube Data

### Requirements

* YouTube API Key (generate via [Google Cloud Console](https://console.cloud.google.com/))
* Python 3.10+ installed
* Linux environment or WSL environment

### Getting Started (Backend CLI Setup)

#### 1. Clone the Repository

```bash
git clone https://github.com/victorchimakanu/macrocosmos-youtube-scrapper.git
cd macrocosmos-youtube-scrapper
```

#### 2. Create and Activate a Virtual Environment

```bash
python -m venv venv 
source venv/bin/activate
```
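
Optionally, confirm the virtual environment is active by checking which interpreter your shell now resolves (a quick sanity check; the exact path depends on where you cloned the repository):

```bash
# Should point to the interpreter inside the new virtual environment
which python
```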

#### 3. Install Required Packages

```bash
pip install -r requirements.txt
```

> This might take a while depending on your environment.

#### 4. Get a YouTube API Key

Visit [Google Cloud Console](https://console.cloud.google.com/) and follow [tutorial documentation](https://developers.google.com/youtube/v3/getting-started) to generate your API key.

#### 5. Set Up Environment Variables

Create a `.env` file in the **root** directory and add your API key:

```
YOUTUBE_API_KEY="YOUR_API_KEY_HERE"
```
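
Because this file contains a secret, you may also want to make sure git ignores it (an optional precaution; skip it if `.env` is already listed in the repository's `.gitignore`):

```bash
# Keep the API key out of version control
echo ".env" >> .gitignore
```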

#### 6. Finalize Package Setup

```bash
pip install -e .
```

This validates and installs local dependencies including the `data-universe` package.
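
As a quick sanity check, you can try importing one of the installed modules from the project root. This is a hedged example: it assumes the editable install exposes the `common` package referenced in the troubleshooting step further below.

```bash
# Run from the repository root; exits silently if the package imports cleanly
python -c "import common.data"
```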

### Running the Scraper (CLI Mode)

#### 1. Navigate to the YouTube Scraper Module

```bash
cd scraping/youtube
```

#### 2. Run the Scraper

```bash
python youtube_custom_scraper.py
```

If you encounter a `ModuleNotFoundError` like the one below:

```bash
ModuleNotFoundError: No module named 'common.data'
```

Go back to the **root directory** and run:

```bash
PYTHONPATH="." python3.11 -m scraping.youtube.youtube_custom_scraper
```
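
Equivalently, you can export `PYTHONPATH` once for the shell session instead of prefixing every command (a minimal sketch; it assumes you run it from the repository root):

```bash
# Make the repository root importable for the rest of the session
export PYTHONPATH="$PWD"
python3.11 -m scraping.youtube.youtube_custom_scraper
```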

#### 3. Choose an Option:

<figure><img src="/files/4MBULK4urC9n7tVuMwf1" alt=""><figcaption></figcaption></figure>

You’ll be prompted to select one of the following:

1. Scrape using a default test script
2. Scrape **any video** of your choice
3. Scrape up to 5 random videos from a **specific channel**

Transcripts are returned in the terminal. For local downloads, use the **Custom API endpoints** below.

### Custom APIs – Download Transcripts via HTTP

#### 1. Start the Backend API Server

Navigate to the project root and run:

```bash
python backend/app.py
```

## **Available Custom Endpoints**

### Video Scraper

<mark style="color:green;">`POST`</mark> [`http://127.0.0.1:5001/api/scrape/video`](http://127.0.0.1:5001/api/scrape/video)

Downloads the video's transcript to your local machine.

**Headers**

| Name      | Value                |
| --------- | -------------------- |
| X-API-KEY | Your YouTube API key |

**Body (JSON)**

<table><thead><tr><th>Name</th><th>Details</th><th>Description</th></tr></thead><tbody><tr><td><code>video_id</code></td><td><pre class="language-json"><code class="lang-json">{
  "video_id": "UH_sOZSIk10"
}
</code></pre></td><td>Video ID of the YouTube video</td></tr></tbody></table>

**Response**

{% tabs %}
{% tab title="202" %}

```json
{
  "job_type": "video",
  "status": "started",
  "video_id": "UH_sOZSIk10"
}
```

{% endtab %}

{% tab title="400" %}

```json
{
  "error": "Invalid request"
}
```

{% endtab %}
{% endtabs %}
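
For example, you can trigger a video job from the command line with `curl` (a sketch that assumes the backend is running locally on port 5001 and that your API key is exported as `YOUTUBE_API_KEY`):

```bash
curl -X POST http://127.0.0.1:5001/api/scrape/video \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $YOUTUBE_API_KEY" \
  -d '{"video_id": "UH_sOZSIk10"}'
```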

### Channel Scraper

<mark style="color:green;">`POST`</mark> [`http://127.0.0.1:5001/api/scrape/channel`](http://127.0.0.1:5001/api/scrape/channel)

Scrapes random videos from a channel, allowing you to specify the total number of videos you'd like to scrape.

**Headers**

| Name      | Value                |
| --------- | -------------------- |
| X-API-KEY | Your YouTube API key |

**Body (JSON)**

<table><thead><tr><th>Name</th><th>Details</th><th>Description</th></tr></thead><tbody><tr><td><code>channel_id</code></td><td><pre class="language-json"><code class="lang-json">{
  "channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
  "max_videos": 3
}
</code></pre></td><td>YouTube channel ID; <code>max_videos</code> sets how many videos to scrape</td></tr></tbody></table>

**Response**

{% tabs %}
{% tab title="202" %}

```json
{
  "channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
  "job_type": "channel",
  "max_videos": 3,
  "status": "started"
}
```

{% endtab %}

{% tab title="400" %}

```json
{
  "error": "Invalid request"
}
```

{% endtab %}
{% endtabs %}
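
The channel endpoint can be exercised the same way (same assumptions: server running locally on port 5001, API key exported as `YOUTUBE_API_KEY`):

```bash
curl -X POST http://127.0.0.1:5001/api/scrape/channel \
  -H "Content-Type: application/json" \
  -H "X-API-KEY: $YOUTUBE_API_KEY" \
  -d '{"channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w", "max_videos": 3}'
```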

These endpoints wrap the CLI scraper logic and save the output to your local `Transcripts` folder in `.txt` and `.pdf` formats.
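
Once a job finishes, the new files should appear on disk (assuming the folder name above and that you started the server from the project root):

```bash
ls Transcripts/
```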

### Testing APIs with Postman

You can test these APIs using **Postman** by:

* Setting the request method to `POST`
* Using the appropriate endpoint URL
* Adding the `X-API-KEY` header
* Providing the correct JSON body
* Viewing transcript generation in your terminal and output folder

<figure><img src="/files/a7pyJzqvxazmEVb2FbpS" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/Q4hBJ9cRzI7zpUNcsxFv" alt=""><figcaption></figcaption></figure>

For this example, we're using the custom video scraper endpoint. When you send the request, a `Transcripts` folder is generated and your desired transcript is downloaded to your local machine in `.pdf` and `.txt` formats!

## Frontend Interface

Now let's interact with our custom scraper using a sample frontend application.

1. To set up the frontend, open a split terminal and navigate into the frontend folder:

```bash
cd frontend 
```

2. Install Dependencies

```bash
npm install
```

3. Set Up Frontend Environment Variables

Create a `.env` file in the frontend directory:

```properties
VITE_API_BASE="http://127.0.0.1:5001/"
VITE_API_KEY="YOUR_YOUTUBE_API_KEY"
```

4. Launch the Frontend App

```bash
npm run dev
```

Follow any of the links shown and it will spin up a sample application for the YouTube scraper on your local machine.

### 🧑‍💻 Using the Frontend

* Paste a **YouTube video URL or ID**
* Click **Scrape Video**
* Then click **Download Transcript**

<figure><img src="/files/o3aMZCoBPWDxL8xbII5W" alt=""><figcaption></figcaption></figure>

Transcripts are saved to the local `Transcripts` folder in **.pdf** format. You can monitor scraper activity via the terminal.

