Skip to content

Designed and built a Selenium web scraper to automate data extraction from an e-commerce site. Scraped product details using XPath for accurate element location and handled dynamic content with sleep intervals. Data was cleaned and exported to CSV files using Pandas, streamlining product info collection for market research and inventory management.

License

Notifications You must be signed in to change notification settings

zshafique25/Web-Scraping

Repository files navigation

Web Scraping Project

This project scrapes product information from a website using Selenium WebDriver. The goal is to extract details about products, including names, prices, links, images, ratings, descriptions, and additional attributes.

Project Structure

  • xpaths.py: Contains XPath expressions used to locate elements on the web pages.
  • selenium_utils.py: Provides utility functions for setting up the Selenium WebDriver, navigating pages, and interacting with web elements.
  • scraper.py: Implements the main scraping logic, including extracting product details, descriptions, and color variations.
  • data_processing.py: Processes the scraped data, formats it, and prepares it for export.
  • main.py: The entry point of the application. Sets up the browser, performs the search, scrapes the data, and stores it in a CSV file.

Dependencies

  • selenium: For web scraping.
  • webdriver-manager: For managing ChromeDriver.
  • pandas: For data processing and exporting to CSV.
  • json: For parsing JSON data.

Setup

  1. Install Dependencies

    Install the required Python packages using pip:

    pip install selenium webdriver-manager pandas
    
  2. Setup WebDriver

    Make sure you have Chrome installed. The webdriver-manager package will handle the installation of the ChromeDriver automatically.

Usage

  1. Run the Main Script

    Execute the main.py script to start the scraping process:

    python main.py

    This will:

  • Set up the Selenium WebDriver.
  • Navigate to the specified URL and perform a search.
  • Scrape product details from the search results.
  • Process the data and save it to a CSV file named products.csv.

Notes

  • Adjust the sleep times if you encounter issues with page loading.
  • Update XPath expressions in 'xpaths.py' if the website structure changes.
  • Ensure that you comply with the website's terms of service and robots.txt file when scraping data.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Designed and built a Selenium web scraper to automate data extraction from an e-commerce site. Scraped product details using XPath for accurate element location and handled dynamic content with sleep intervals. Data was cleaned and exported to CSV files using Pandas, streamlining product info collection for market research and inventory management.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages