3. How the EC could improve these data

Alberto Cottica edited this page Apr 23, 2015 · 2 revisions

This project used the FP7 projects extract and the FP7 organisations files as its source files. The FP7 data are interesting, but while working on them we found room for improvement in the way they are released to the public. We recommend that the admins of the European Commission's open data portal do the following:

  1. store the data in a non-proprietary format. Excel, really? JSON would be great, but even CSV is an improvement.
  2. include a unique identifier field for organisations; the PIC code is a natural candidate. As it is, organisations are identified by a text field, and there are many, MANY mistakes and inconsistencies in how text fields are filled in. For the same entity, you will often find ten or more variants (Some Company, SomeCompany, Some Company Ltd, Some Company Ltd., SOME COMPANY, Some Comprany...). Without a unique identifier, re-users face, as we did, long and tedious hours of reconciliation with OpenRefine just to obtain minimally meaningful data. Doing that while knowing that the EC does have and use unique identifiers feels dreary and disrespectful.
  3. include a field for the nature of the organisation: university, corporation, NGO, etc. Again, the CORDIS validation service has this information, so it should be open.
  4. include the amount of the EC contributions allocated to each partner in the consortium, not just the overall contribution to the project. Ditto.
  5. include similarly structured data on unfunded projects. This would allow proper causal analysis of how network variables (e.g. network centrality) influence the success of an FP7 funding application.

All five moves are extremely simple and cheap, but they would dramatically improve the re-usability of the data.
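
To illustrate why recommendation 2 matters, here is a minimal Python sketch of the kind of key-based grouping one has to fall back on when only a free-text name is available. It is not the reconciliation we actually ran in OpenRefine; the `normalise` and `cluster` helpers and the `LEGAL_SUFFIXES` list are hypothetical names made up for this example, and the suffix list is far from complete.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately incomplete list of legal-form suffixes.
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "srl", "sa", "inc"}


def normalise(name: str) -> str:
    """Reduce an organisation name to a crude comparison key."""
    # Decompose accented characters, drop the accents, and lowercase.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    # Keep only runs of ASCII letters and digits; punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Drop trailing legal-form suffixes such as "Ltd".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    # Join without spaces so "Some Company" and "SomeCompany" share a key.
    return "".join(tokens)


def cluster(names):
    """Group name variants that share the same normalised key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return dict(groups)


variants = [
    "Some Company",
    "SomeCompany",
    "Some Company Ltd",
    "Some Company Ltd.",
    "SOME COMPANY",
    "Some Comprany",
]
clusters = cluster(variants)
for key, members in clusters.items():
    print(key, members)
```

On the variants listed above, the first five names end up under one key, but the misspelled "Some Comprany" still lands in a group of its own. Catching that kind of error takes fuzzy matching and manual review, which is exactly the work a PIC column would make unnecessary.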