This project was developed to learn web scraping, Flask API integration, and calling a third-party Google API.
It provides an overview of how to implement web scraping using Python.
Introduction to web scraping: web scraping is extracting data from a website so that we can run our own analysis on it.
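
As a rough illustration of the idea, the sketch below pulls link targets and link text out of an HTML snippet. It is not taken from this project: the project lists beautifulsoup4 and requests_html in its requirements, while this example swaps in the standard library `html.parser` so it runs without extra packages. The class name `LinkTextParser`, the helper `extract_links`, and the sample HTML are all made up for the example.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: parse an HTML string and return its links."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<ul>
  <li><a href="/watch?v=abc">First video</a></li>
  <li><a href="/watch?v=def">Second <b>video</b></a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

In a real scraper the HTML string would come from an HTTP response rather than a constant, and the extracted pairs would feed whatever analysis is needed.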
- app.py is the core API layer; it accepts a request and returns a response, or downloads/uploads a file.
  - Operations:
    - "/" : Redirects to the home screen
    - "/channel" : Redirects to the searched channel
    - "/channel/videos" : Redirects to the channel's videos
    - "/channel/video/comments" : Redirects to the comments of a channel video
    - "/channel/video/download" : Downloads the channel video
    - "/channel/video/s3upload" : Uploads the channel video to the S3 bucket
- AppConfig.py holds the application configuration.
- DbModel.py, MongoDbModel.py, SnowflakeDbModel.py contain the database connection settings and CRUD operations.
- YTChannels.py is the core file; it handles extraction from YouTube, upload to S3, and saving to the database.
- YTExceptions.py, YTLogger.py handle exceptions and logging.
- conf.ini holds the application configuration values.
- generate_secrets.py generates the secrets.
- requirements.txt lists the application's package dependencies:
  - requests==2.27.1
  - beautifulsoup4==4.11.1
  - requests
  - mysql-connector-python
  - flask
  - requests_html
  - pytube
  - pybase64
  - boto3
  - cryptography
  - pymongo
  - pymongo[srv]
  - snowflake-connector-python
  - gunicorn==20.0.4
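
The operations listed for app.py could be wired up in Flask roughly as follows. This is only a sketch of the route layout, not the project's actual code: the route paths come from the list above, but the handler names, the use of plain GET routes, and the placeholder JSON bodies are assumptions. The real handlers would call into YTChannels.py to scrape, download, and upload.

```python
from flask import Flask, jsonify

app = Flask(__name__)


# Handler names and return values below are hypothetical placeholders.
@app.route("/")
def home():
    return jsonify(operation="home")


@app.route("/channel")
def channel():
    return jsonify(operation="channel")


@app.route("/channel/videos")
def channel_videos():
    return jsonify(operation="channel_videos")


@app.route("/channel/video/comments")
def video_comments():
    return jsonify(operation="video_comments")


@app.route("/channel/video/download")
def video_download():
    return jsonify(operation="video_download")


@app.route("/channel/video/s3upload")
def video_s3upload():
    return jsonify(operation="video_s3upload")
```

Assuming the module is saved as app.py and the Flask object is named `app`, it could be served with the gunicorn package from the requirements list, for example `gunicorn app:app`.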