
Writing an import script (WiP)

Will Roper edited this page Sep 26, 2024 · 1 revision

Accessing council exports

To do this you will need to speak to someone who can give you access to the relevant AWS accounts and s3 buckets.

There are two ways of running import scripts locally against the data we have received from councils: 1) be logged into the relevant AWS SSO profile when you run a script, and let the app handle grabbing the data; 2) maintain a local mirror of the s3 bucket. Here's a bit more detail about the two approaches.

1) Use SSO profile

This is useful for one-offs and reviews, and is easier in that you don't have to think about AWS S3 CLI commands and the sharp edges that go with them (overwriting/deleting data 🙀 ). On the other hand, if you're not on a decent network and want to run a script multiple times, it is a bit slower.

How

Set S3_DATA_BUCKET = "<bucket-name>" in settings/local.py.
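As a sketch, the relevant line in settings/local.py looks like this ("<bucket-name>" is a placeholder for the real bucket you've been given access to):

```python
# settings/local.py
# Point the app at the s3 bucket that holds the council exports.
S3_DATA_BUCKET = "<bucket-name>"
```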

Then to run an import script:

aws sso login --profile prod-wdiv-dc
AWS_PROFILE=prod-wdiv-dc ./manage.py import_council_x

2) Maintain a local mirror

This is a bit more faff, as you have to make sure your local copy of the bucket is up to date. Once that's done, though, import scripts will run faster as they don't need to fetch the CSV each time.

How

Sync a local folder with the s3 bucket:

cd path/to/local/mirror/
aws --profile prod-wdiv-dc s3 sync s3://<bucket-name> . --exclude '*' --include '*2024*' --dryrun

(drop the --dryrun flag to actually do something)

Set PRIVATE_DATA_PATH = "<bucket-name>" in settings/local.py.
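Again as a sketch, the corresponding settings/local.py line. Per the step above the value is the bucket name; the assumption here is that your local mirror folder is named after the bucket:

```python
# settings/local.py
# Tell the importers to read from the local mirror rather than fetching
# from s3. Assumption: as in the step above, the value is the bucket name.
PRIVATE_DATA_PATH = "<bucket-name>"
```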

Then to run an import script:

./manage.py import_council_x

General Pointers

  • Import scripts are Django management commands.
  • Import scripts extend one of a number of base classes defined in base_importers. These define functions for tidying up data and working with data in various formats. For a stations/districts importer your import script will need to implement the abstract methods station_record_to_dict(self, record) and district_record_to_dict(self, record).
  • We tag each import script with an array of election ids describing the election(s) this data relates to. This helps us to batch-import them.
  • Polling stations and districts should be linked by an id/code. This mapping can go in either direction, i.e. 'this polling station serves this district' or 'this polling district is served by this station'. Sometimes the data lets you go either way; sometimes the 'shape' of the data dictates that the mapping must go one way or the other. Once the data is imported, the website's data_finder app uses the logic in PollingStationManager to work out a user's polling station based on their location.
  • After you've run an import script, it will produce a summary report which you can use to validate the data and spot potential problems.
  • You can also use the --verbosity switch on your import scripts to spit out some debug info which might be helpful.
  • Unfortunately, importing the data is sometimes not straightforward. There are various special cases that we sometimes have to deal with; these are documented in the comments in import_polling_stations and import_polling_districts.
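The pointers above can be sketched as a minimal stations/districts import script. This is illustrative only: the abstract method names (station_record_to_dict, district_record_to_dict) and the elections tag come from the description above, but the base class is replaced with a bare stand-in so the sketch runs standalone, and all field names, the election id, and the record layout are hypothetical, not the real base_importers API:

```python
from collections import namedtuple

# Stand-in record type so the sketch runs standalone; in a real script the
# records come from the council's CSV via the base importer.
Record = namedtuple(
    "Record",
    ["station_code", "district_code", "address", "postcode", "district_name"],
)


class Command:
    # In the real app this would extend one of the base classes defined in
    # base_importers; everything below is illustrative.
    elections = ["parl.2024-07-04"]  # hypothetical election id(s) for this data

    def station_record_to_dict(self, record):
        # 'This polling station serves this district': here the mapping goes
        # from station to district via polling_district_id. The opposite
        # direction (district record carries its station's id) is also valid.
        return {
            "internal_council_id": record.station_code,
            "address": record.address,
            "postcode": record.postcode,
            "polling_district_id": record.district_code,
        }

    def district_record_to_dict(self, record):
        return {
            "internal_council_id": record.district_code,
            "name": record.district_name,
        }
```

Which direction the station/district mapping goes is usually decided for you by the shape of the council's data; pick whichever side of the relationship the source file actually states.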