Dashboard Web Scraper

smsmith97 edited this page Mar 15, 2021 · 6 revisions

This script takes a council_id (e.g. DER) as a command line argument and opens every unique link to the longest postcodes/journeys on that council's dashboard page. It requires selenium and so far only seems to work with Chrome.

N.B.: Recently, webdriver has been unable to control Chrome version 89+. My workaround has been to download Chrome version 87 and lengthen Chrome's auto-update check interval (see below for the command to do this).

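import sys
from selenium import webdriver
from selenium.webdriver.common.by import By

# The council ID (e.g. DER) is the only command line argument.
if len(sys.argv) != 2:
    sys.exit("usage: python " + sys.argv[0] + " <council_id>")

url = "http://127.0.0.1:8000/dashboard/council/" + sys.argv[1] + "/"

driver = webdriver.Chrome()
driver.get(url)
# Match every anchor whose link text contains a space; the set removes duplicate hrefs.
anchors = driver.find_elements(By.PARTIAL_LINK_TEXT, ' ')
links = set(a.get_attribute('href') for a in anchors)
for link in links:
    # Skip anchors without an href and keep only postcode pages.
    if link and '/dashboard/postcode/' in link:
        # Pass the link as a script argument so quotes in the URL cannot break the JavaScript.
        driver.execute_script("window.open(arguments[0])", link)
        # Keep the driver attached to the original window.
        driver.switch_to.window(driver.current_window_handle)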
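
The URL building and link filtering above can be tried without a browser. The following is only a minimal sketch, not part of the original script: council_url and postcode_links are hypothetical helper names, the sample hrefs are made up for illustration, and the sketch additionally skips missing (None) hrefs.

```python
# Same local dashboard address as the script above.
BASE_URL = "http://127.0.0.1:8000/dashboard/council/"


def council_url(council_id):
    """Build the dashboard URL for a council ID such as 'DER' (hypothetical helper)."""
    return BASE_URL + council_id + "/"


def postcode_links(hrefs):
    """Return the unique hrefs that point at postcode pages (hypothetical helper).

    None values are skipped, and the result is sorted so the order is stable.
    """
    return sorted({h for h in hrefs if h and "/dashboard/postcode/" in h})


if __name__ == "__main__":
    # Made-up hrefs standing in for the values returned by get_attribute('href').
    sample = [
        "http://127.0.0.1:8000/dashboard/postcode/AB1/",
        "http://127.0.0.1:8000/dashboard/postcode/AB1/",
        "http://127.0.0.1:8000/about/",
        None,
    ]
    print(council_url("DER"))
    for link in postcode_links(sample):
        print(link)
```

Keeping the filter in a plain function like this means it could be checked against a list of strings before pointing selenium at the real dashboard.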

Changing the Chrome update check interval (uses the defaults command, so it may only work on OSX):

 defaults write com.google.Keystone.Agent checkInterval 200000000000000