Stable area identifiers #326

Closed
torotil opened this issue Jun 5, 2018 · 5 comments

Comments

@torotil
Contributor

torotil commented Jun 5, 2018

I’m trying to integrate an external service with mapit. For this I have to map data in the external service to areas in mapit.

Currently there are two ways to uniquely identify an area in mapit within an installation. Neither seems ideal, for different reasons:

  1. The area_id: A simple serial that is specific to the mapit installation. Another mapit installation might have a different area_id for the same physical area. → Using the area_id as identifier would bind the external service to one specific instance of mapit.
  2. Using the area name, type and parent area name. → The area names might change between different versions of the boundary dataset.

So I’m looking for a way to keep the mapping intact over updates of the mapit database or even reinstalls. Any suggestions?

@torotil torotil changed the title Provide stable area identifiers Stable area identifiers Jun 5, 2018
@dracos
Member

dracos commented Jun 5, 2018

I'm not sure I have enough information here to provide good guidance; it would depend upon whether you have control over this MapIt installation you mention, what boundaries you are talking about, and so on.

The generic import script has a number of options depending upon your source data and particular use case. If your source areas have their own stable identifiers, they can be imported and then you can use them. If they don't, then MapIt IDs are pretty "stable" in that, bar reinstalling in a different manner (reinstalling in the same manner should end up with the same IDs, I would have thought), they won't change. It depends upon what you consider a MapIt area to be representing, in some manner. We use MapIt IDs ourselves as stable identifiers – though for full openness I agree this is not perfect, see e.g. https://github.com/mysociety/mapit/pull/322/files#diff-6f3ff8adb016a075c8b27d028fbe36d8R17 . But we certainly do know that areas change name (and when that happens we keep the same area - another option would be to create a new area, again this would depend upon your precise use case). So in our own cases, we have generally gone with your option 1.

But in general, if you don't have any external stable identifiers for your areas, and you don't wish to state that the MapIt IDs generated upon import are to be used as stable identifiers, then there's not much I can think of that you can do beyond creating your own stable identifiers for your areas first, in whatever manner you wish to do so, and then importing them for use in MapIt.

@torotil
Contributor Author

torotil commented Jun 5, 2018

Thanks for your answer. For now I’m dealing mainly with UK boundaries in my own instance of mapit.

> If they don't, then MapIt IDs are pretty "stable" in that, bar reinstalling in a different manner (reinstalling in the same manner should end up with the same IDs, I would have thought), they won't change.

That’s only true when the whole history of imports is the same. So to get exactly the same IDs I’d have to import all generations in the exact same way. Importing just the current data would yield different area IDs.
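
To illustrate the point, here is a toy model of serial ID assignment. It is not MapIt's import code: the function `import_generations` and the ward names are hypothetical, and the only assumption is that each newly seen area gets the next free serial number.

```python
from itertools import count


def import_generations(generations):
    """Assign serial IDs to areas in the order they are first imported.

    Toy model only: `generations` is a list of lists of area names,
    oldest generation first. This is an assumption for illustration,
    not how MapIt stores or imports data.
    """
    next_id = count(1)
    ids = {}
    for generation in generations:
        for name in generation:
            if name not in ids:
                ids[name] = next(next_id)
    return ids


# Hypothetical data: "Old Ward" only existed in the first generation.
full_history = [["Old Ward", "North Ward"], ["North Ward", "South Ward"]]
current_only = [["North Ward", "South Ward"]]

ids_full = import_generations(full_history)
ids_current = import_generations(current_only)

# The same area ends up with a different serial ID in each installation.
print(ids_full["North Ward"], ids_current["North Ward"])
```

Any external mapping keyed on such serials would silently point at the wrong area after an import with a different history.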

I think source IDs are the closest to what I need. Thanks for the pointer! But in general I would be looking for the exact thing described in the diff you’ve referenced. Can you give me pointers to what exactly the currently included area codes (gss, ons, unit_id) mean?

@dracos
Member

dracos commented Jun 5, 2018

"Importing just the current data would yield different area IDs." - indeed, that's why I said in the same manner :) And I think we had to do that once, long ago. Anyway, in that case, dumping the database and reimporting would be easier.

ons is no longer used. gss are the identifiers given to administrative boundaries by the UK’s Office for National Statistics, and are stable whilst the boundary remains basically the same (I believe tiny modifications (to stop splitting a new housing estate or similar) do not lead to a new identifier, whilst e.g. ward redrawing etc do), but nevertheless are identifiers for an area, not a concept. The UK government provide stable identifiers for the concept of local authorities e.g. https://local-authority-eng.register.gov.uk/ which https://mapit.mysociety.org/ includes. unit_id are the identifiers included by Ordnance Survey in their Boundary-Line product and I have no insight into their stability or not, and generally ignore them, but do import them.

An actual stable identifier for a ward (assuming you care about modifications in whatever way ONS do it) would be something like <council identifier>/<date of ward redrawing>/<ward name> - as you see, if you actually want to do this conceptually properly it gets complicated quickly. Which is why we have managed to get by okay with what we have for 15 years or so.
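
A minimal sketch of what such a composite identifier could look like in code. The class `WardKey`, its field names, and the example values are all hypothetical; nothing like this exists in MapIt as far as this thread says, and ISO dates are just one possible choice for the middle component.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WardKey:
    """Hypothetical composite key: council / redrawing date / ward name."""

    council_id: str
    redrawn_on: date
    ward_name: str

    def __str__(self):
        return f"{self.council_id}/{self.redrawn_on.isoformat()}/{self.ward_name}"

    @classmethod
    def parse(cls, text):
        # Split on the first two slashes only, so a slash inside the
        # ward name (the last component) survives a round trip.
        council_id, redrawn_on, ward_name = text.split("/", 2)
        return cls(council_id, date.fromisoformat(redrawn_on), ward_name)


# Placeholder values, not real identifiers.
key = WardKey("EXAMPLE-COUNCIL", date(2018, 5, 3), "Example Ward")
print(key)
```

Even this small sketch shows the complication: the key changes whenever the ward is renamed, and it needs a stable council identifier to begin with.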

More generally, https://opencivicdata.readthedocs.io/en/latest/data/division.html or https://opencivicdata.readthedocs.io/en/latest/proposals/0002.html might be of interest, or in the UK what Democracy Club do for elections https://elections.democracyclub.org.uk/reference_definition/

@torotil
Contributor Author

torotil commented Jun 5, 2018

Perhaps I’ll use gss numbers for now. They seem a lot better suited than the area ids even if they are not perfect.

Thanks again for taking the time to explain things!

@dracos
Member

dracos commented Jun 5, 2018

No problem. Note MapIt has a special GSS-code lookup that might be useful to you at /area/<gss ID>. The underlying issue of concepts vs areas (see also e.g. #127) is something I have long carried and not been able to do anything about, which I feel bad about. But it on the whole works fine and provides use to many things, so I try not to let it get to me too much. We try our best :)
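
As a rough sketch of using that lookup from an external service: the helper `gss_lookup_url` and the code `E00000000` are hypothetical placeholders, and only the `/area/<gss ID>` path comes from the comment above. What the endpoint returns is not described in this thread, so the sketch stops at building the URL.

```python
from urllib.parse import quote


def gss_lookup_url(base_url, gss_code):
    """Build the GSS lookup URL for a MapIt instance.

    `base_url` is whichever installation you use; the path follows the
    /area/<gss ID> pattern mentioned in the thread.
    """
    return f"{base_url.rstrip('/')}/area/{quote(gss_code, safe='')}"


# Hypothetical external records keyed by GSS code instead of area_id,
# so the mapping does not depend on one installation's serial IDs.
external_records = {"E00000000": {"label": "placeholder record"}}

for gss_code in external_records:
    print(gss_lookup_url("https://mapit.mysociety.org/", gss_code))
```

Keeping the GSS code as the key on the external side means the same mapping can be pointed at a different MapIt instance by changing only `base_url`.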

@dracos dracos closed this as completed Jun 11, 2018