Scraping YouTube Data
Requirements
YouTube API Key (generate via Google Cloud Console)
Python 3.10+ installed
Linux environment or WSL environment
Getting Started (Backend CLI Setup)
1. Clone the Repository
git clone https://github.com/victorchimakanu/macrocosmos-youtube-scrapper.git
cd macrocosmos-youtube-scrapper
2. Create and Activate a Virtual Environment
python -m venv venv
source venv/bin/activate
3. Install Required Packages
pip install -r requirements.txt
This might take a while depending on your environment.
4. Get a YouTube API Key
Visit the Google Cloud Console and follow the tutorial documentation to generate your API key.
5. Set Up Environment Variables
Create a .env file in the root directory and add your API key:
YOUTUBE_API_KEY="YOUR_API_KEY_HERE"
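If the project's dependencies include python-dotenv, `load_dotenv()` will pick this file up automatically. As an illustration of what that loading amounts to, here is a minimal stdlib-only sketch (the function name `load_env` is ours, not part of the project):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader; python-dotenv's load_dotenv() does the same job."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and lines without a KEY=VALUE shape.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the shell.
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# After load_env(), the key is available as os.environ["YOUTUBE_API_KEY"].
```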
6. Finalize Package Setup
pip install -e .
This validates and installs local dependencies, including the data-universe package.
Running the Scraper (CLI Mode)
1. Navigate to the YouTube Scraper Module
cd scraping/youtube
2. Run the Scraper
python youtube_custom_scraper.py
If you encounter an error like:
ModuleNotFoundError: No module named 'common.data'
Go back to the root directory and run:
PYTHONPATH="." python3.11 -m scraping.youtube.youtube_custom_scraper
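An alternative to setting PYTHONPATH on every invocation is to prepend the project root to sys.path before the scraper's own imports run. This is a sketch, assuming the snippet sits at the very top of scraping/youtube/youtube_custom_scraper.py, two directory levels below the project root:

```python
import os
import sys

# Resolve the project root relative to this file (scraping/youtube/ -> root).
PROJECT_ROOT = os.path.abspath(
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..")
)
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

# Imports such as `from common.data import ...` now resolve from the root.
```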
3. Choose an Option:

You’ll be prompted to select one of the following:
Scrape using a default test script
Scrape any video of your choice
Scrape up to 5 random videos from a specific channel
Transcripts are returned in the terminal. For local downloads, use the Custom API endpoints below.
Custom APIs – Download Transcripts via HTTP
1. Start the Backend API Server
Navigate to the project root and run:
python backend/app.py
Available Custom Endpoints
Video Scraper
POST http://127.0.0.1:5001/api/scrape/video
Downloads the transcript to your local machine.
Headers
X-API-KEY: "youtube_api_key"
Body (JSON)
{
"video_id": "UH_sOZSIk10"
}
video_id is the ID of the YouTube video.
Response
{
"job_type": "video",
"status": "started",
"video_id": "UH_sOZSIk10"
}
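As a sketch, the video endpoint can be called from Python using only the standard library; the URL, X-API-KEY header, and JSON body mirror the listing above, API_KEY is a placeholder for your own key, and the helper name `video_request` is ours:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5001/api/scrape/video"  # local backend from backend/app.py
API_KEY = "YOUR_YOUTUBE_API_KEY"                    # placeholder: your YouTube API key

def video_request(video_id: str) -> urllib.request.Request:
    """Build the POST request for the video scraper endpoint."""
    body = json.dumps({"video_id": video_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "X-API-KEY": API_KEY},
        method="POST",
    )

# With the server running, send the request and read the JSON reply:
# with urllib.request.urlopen(video_request("UH_sOZSIk10")) as resp:
#     print(json.load(resp))
```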
Channel Scraper
POST http://127.0.0.1:5001/api/scrape/channel
Scrapes random videos from a channel; max_videos lets you specify the total number of videos you'd like to scrape.
Headers
X-API-KEY: "youtube_api_key"
Body (JSON)
{
"channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
"max_videos": 3
}
channel_id is the YouTube channel ID.
Response
{
"channel_id": "UC92OMuTHmkrk0Crz5Xqi-5w",
"job_type": "channel",
"max_videos": 3,
"status": "started"
}
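The channel endpoint can be exercised the same way; again this is a stdlib-only sketch with a placeholder API_KEY, and the helper name `channel_request` is ours:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5001/api/scrape/channel"  # local backend from backend/app.py
API_KEY = "YOUR_YOUTUBE_API_KEY"                      # placeholder: your YouTube API key

def channel_request(channel_id: str, max_videos: int) -> urllib.request.Request:
    """Build the POST request for the channel scraper endpoint."""
    body = json.dumps({"channel_id": channel_id, "max_videos": max_videos})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": API_KEY},
        method="POST",
    )

# With the server running:
# with urllib.request.urlopen(channel_request("UC92OMuTHmkrk0Crz5Xqi-5w", 3)) as resp:
#     print(json.load(resp))
```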
These endpoints wrap the CLI scraper logic and save the output to your local Transcripts folder in .txt and .pdf formats.
Testing APIs with Postman
You can test these APIs using Postman by:
Setting the request method to POST
Using the appropriate URL
Providing the correct JSON body
Viewing transcript generation in your terminal and output folder


For this example, we're using the custom video scraper endpoint. When you send the request, a Transcripts folder is generated and your desired transcript is downloaded to your local machine in .pdf and .txt formats!
Frontend Interface
Now let's interact with our custom scraper using a sample frontend application.
1. Navigate to the Frontend Folder
Open a split terminal and navigate into the frontend folder:
cd frontend
2. Install Dependencies
npm install
3. Set Up Frontend Environment Variables
Create a .env file in the frontend directory:
VITE_API_BASE="http://127.0.0.1:5001/"
VITE_API_KEY="YOUR_YOUTUBE_API_KEY"
4. Launch the Frontend App
npm run dev
Follow any of the printed local links and a sample application for the YouTube scraper will spin up on your local machine.
🧑‍💻 Using the Frontend
Paste a YouTube video URL or ID
Click Scrape Video
Then click Download Transcript

Transcripts are saved to the local Transcripts folder in .pdf format. You can monitor scraper activity via the terminal.