1. Network structure
Rough first draft!! Handle with care
Under its 7th Framework Programme (FP7), the European Commission funded 25,788 research projects, delivered by about 31,000 organizations (normally organized in consortia). The Commission's choice not to publish its organization-related data by unique identifier (Participant Identification Code, or PIC in EC parlance) makes these data quite messy: we cannot be precise about the number of participating organizations, but it is definitely between 31,000 and 32,000. Together, projects and organizations give rise to a graph defined as follows: both projects and organizations are nodes; participation of organization 1 in project A is represented by an edge connecting 1 to A. This graph has 57,644 nodes and 135,911 edges.
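A minimal sketch of this construction, using made-up participation records (the project and organization names below are hypothetical and not taken from the FP7 data):

```python
# Hypothetical participation records: (project_id, organization_name).
participations = [
    ("A", "org1"), ("A", "org2"), ("A", "org3"),
    ("B", "org1"), ("B", "org2"),
    ("C", "org4"), ("C", "org5"),
]

def build_bipartite(records):
    """Return (projects, organizations, edges) for the two-mode graph."""
    projects = set()
    organizations = set()
    edges = set()
    for project, org in records:
        projects.add(project)
        organizations.add(org)
        edges.add((org, project))  # one edge per participation
    return projects, organizations, edges

projects, organizations, edges = build_bipartite(participations)
n_nodes = len(projects) + len(organizations)  # both kinds of node count
print(n_nodes, len(edges))
```

Using sets means a participation listed twice in the raw data would only count once; whether that is the right way to handle the messy organization names is a separate question.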
We next projected this graph onto an organization-to-organization graph: organization 1 is connected to organization 2 if they participated together in a project. 1 and 2 can be connected by more than one edge; if they participated together in 3 projects, they are connected by 3 edges. This graph has over 31,000 nodes and almost 900,000 edges. Like all affiliation networks, this graph is very dense and impossible to visualize in any meaningful way. We reduced it by "stacking" all edges connecting the same pair of organizations; in our example, organization 1 co-participating with organization 2 in 3 projects would now give rise to only one edge connecting them. The edge encodes the number of projects in which the two organizations have been partners. The reduced graph has the same number of nodes, over 31,000, and about 680,000 edges. The maximum number of collaborations for a pair of partners is 127, and it concerns two French organizations, the Centre National de la Recherche Scientifique and the Commissariat à l'Energie Atomique et aux Energies Alternatives.
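The projection and the "stacking" step can be sketched together as follows. The project memberships are hypothetical toy data, and the function name `stacked_projection` is ours, used only for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical project memberships (not FP7 data).
project_members = {
    "A": ["org1", "org2", "org3"],
    "B": ["org1", "org2"],
    "C": ["org1", "org2", "org4"],
}

def stacked_projection(members_by_project):
    """Map each unordered pair of organizations to its number of shared projects."""
    weights = Counter()
    for members in members_by_project.values():
        # Sorting gives every pair a single canonical orientation.
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return weights

weights = stacked_projection(project_members)
multi_edges = sum(weights.values())   # edge count before stacking
stacked_edges = len(weights)          # edge count after stacking
strongest_pair, n_projects = weights.most_common(1)[0]
print(multi_edges, stacked_edges, strongest_pair, n_projects)
```

In this toy example the 7 parallel edges stack into 5 weighted edges, and the pair (org1, org2) carries the highest weight, 3.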
This graph consists of 351 small "islands", representing organizations that co-participate only with each other, and only in one project; and one giant component that includes 98% of the nodes and over 99.9% of the edges.
The "stacked edges" FP7 graph. The giant component is, obviously, at the bottom left.
The giant component of the "stacked edges" FP7 graph
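Splitting a graph into islands and a giant component amounts to finding its connected components. A small sketch with a breadth-first search on a hypothetical toy edge list (node names are made up):

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the connected components as sets, largest first."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy graph: one larger component and one two-node island.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("e", "f")]

components = connected_components(nodes, edges)
giant = components[0]
node_share = len(giant) / len(nodes)
edge_share = sum(1 for a, b in edges if a in giant) / len(edges)
print(len(components), node_share, edge_share)
```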
This graph has a degree distribution that, at least on visual inspection (warning: no formal statistical goodness-of-fit test yet!), looks very fat-tailed. In fact, it looks like a power law with some extra fat in the tail. This is consistent with the following two statements (no causality implied at this stage):
- The probability that a newcomer to the network of FP7 participants partners up with an existing organization is proportional to that organization's degree, i.e. organizations that have co-participated with many other organizations are more likely to get even more new partners. This dynamic is known as preferential attachment.
- Even accounting for preferential attachment, there are more highly connected hubs in the network than you would expect. So, there could be forces giving extra advantages to large, highly connected organizations. The most connected organizations are the Fraunhofer Institut in Germany (partnered with 6,979 organizations); the Centre National de la Recherche Scientifique in France (partnered with 4,244 organizations); and the Consiglio Nazionale delle Ricerche in Italy (partnered with 4,098 organizations).
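To look at the tail more carefully than a raw histogram allows, one option is the empirical complementary cumulative distribution: the share of nodes with degree at least k. On a log-log scale a power law shows up as a roughly straight line. The sketch below is only an illustration on a hypothetical toy edge list, not a goodness-of-fit test:

```python
from collections import Counter

def degree_ccdf(edges):
    """Return [(k, share of nodes with degree >= k)] for each observed degree k.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    counts = Counter(degree.values())
    ccdf = []
    remaining = n
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf

# Hypothetical toy graph: a hub "h" with four partners, two of which also
# partner with each other.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
ccdf = degree_ccdf(edges)
print(ccdf)
```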
The degree distribution of the "stacked edges" FP7 graph, on a log-log scale
At the deep center of this component, we found a mathematical structure called a K-core, with K=159. This gathers 160 organizations that are each connected to at least 159 other organizations in the core. Interestingly, they give rise to a complete subnetwork: each of them is connected to every other organization in the core. This, however, turned out to be a mathematical artefact, generated by a single very large project (EGEE-III) with 160 partners.
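A K-core can be found by repeatedly peeling away nodes with fewer than K neighbours until none are left to remove. The sketch below shows this on a hypothetical 5-partner project, which by construction forms a clique and therefore a 4-core, exactly as the 160-partner project forms a 159-core. The helper names are ours:

```python
from itertools import combinations

def clique_adjacency(members):
    """Adjacency sets for a single project: every partner linked to every other."""
    members = list(members)
    adjacency = {m: set() for m in members}
    for a, b in combinations(members, 2):
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def k_core(adjacency, k):
    """Return the nodes of the k-core, peeling low-degree nodes iteratively."""
    core = {node: set(nb) for node, nb in adjacency.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in [n for n, nb in core.items() if len(nb) < k]:
            for neighbour in core[node]:
                core[neighbour].discard(node)
            del core[node]
            changed = True
    return set(core)

# Hypothetical project with 5 partners, plus one outsider linked to org0 only.
partners = [f"org{i}" for i in range(5)]
adjacency = clique_adjacency(partners)
adjacency["outsider"] = {"org0"}
adjacency["org0"].add("outsider")

core = k_core(adjacency, 4)
# A project with n partners contributes n * (n - 1) / 2 pairs.
pairs_from_160_partners = 160 * 159 // 2
print(sorted(core), pairs_from_160_partners)
```

A single project with 160 partners accounts for 12,720 pairs on its own, which is why it dominates the core.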
The EGEE-III project is the largest, but, by construction, all projects give rise to cliques in this network. This can be misleading, suggesting highly cohesive structures that are, really, only the reflection of several partners doing one project together. To get around the problem, we dropped the edges that encode co-participation in a single project only. The remaining graph shows stable partnerships: each edge now links two organisations that have collaborated with each other on at least two projects in the course of FP7. This graph has the same number of nodes as the unreduced one (over 31,000), but far fewer edges: about 98,000. Most organisations (over 22,000) turn out to have degree 0: that is, they have not partnered up with any other organisation more than once in the course of FP7. About 200 organisations with at least one stable partnership are to be found in small connected components with 2 to 7 nodes each. Many are 4- and 5-cliques, which seems to indicate consortia that applied successfully to more than one call while not engaging with the broader FP7 scene. Finally, the stable partnerships graph has its own giant component, which includes about 9,000 organisations connected by about 96,000 edges.
The giant component of the FP7 stable partnerships graph. Nodes are color-coded by number of stable relationships: redder nodes indicate a higher number of stable partners.
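On a stacked edge list, the stable partnerships filter is a simple threshold on the edge weight. A sketch with hypothetical weights (the threshold parameter `min_projects` is our naming):

```python
from collections import Counter

# Hypothetical stacked edges: pair -> number of shared projects.
weights = {
    ("org1", "org2"): 3,
    ("org1", "org3"): 1,
    ("org2", "org3"): 2,
    ("org3", "org4"): 1,
}

def stable_partnerships(weights, min_projects=2):
    """Keep only the pairs that collaborated on at least `min_projects` projects."""
    return {pair: w for pair, w in weights.items() if w >= min_projects}

stable = stable_partnerships(weights)

# All organizations stay in the graph, even if they lose every edge.
organizations = {org for pair in weights for org in pair}
degree = Counter()
for a, b in stable:
    degree[a] += 1
    degree[b] += 1
isolated = {org for org in organizations if degree[org] == 0}
print(sorted(stable), sorted(isolated))
```

Here org4 keeps its place as a node but ends up with degree 0, like most organisations in the real stable partnerships graph.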
The most connected organisations in the stable partnerships graph are the same as in the more general FP7 partnerships graph. Fraunhofer tops the bill with 1,936 stable partnerships, followed by Centre National de la Recherche Scientifique (1,471), Consiglio Nazionale delle Ricerche (1,246) and Commissariat à l'Energie Atomique et aux Energies Alternatives (1,121). No other organisation in FP7 has more than 1,000 stable partners.
It is worth noting that the stable partnerships graph has 1,836 nodes with degree 1. Of these, 1,697 are in the stable partnerships giant component. These peripheral organisations have participated in at least 2 FP7 projects each, but they only have access to the broader FP7 stable partnership network through one partner each. This might signal a de facto subordinate position of the peripheral organisations with respect to their better connected partners. We have found 707 intermediary organisations in the giant component; these have more than one stable partnership connection, and at least one of their connections is to a stable partner that has no stable partner other than them. Fraunhofer is far and away the most powerful intermediary: it is the only stable partner of 117 organisations. Centre National de la Recherche Scientifique is a distant second, at 28. Universidad de las Andes (Colombia) is the only stable partner of 25 organisations; Bundesministerium fuer Gesundheit (Austria) of 22; Stichting Dienst Landbouwkundig Onderzoek (The Netherlands) of 21; and Consiglio Nazionale delle Ricerche of 20.
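The definition of an intermediary used above can be sketched as a count, for each organisation, of the degree-1 partners that depend on it alone. The adjacency data below are hypothetical:

```python
from collections import Counter

# Hypothetical stable partnerships graph as adjacency sets.
adjacency = {
    "hub": {"p1", "p2", "x"},
    "p1": {"hub"},
    "p2": {"hub"},
    "x": {"hub", "q"},
    "q": {"x"},
}

def intermediaries(adjacency):
    """Map each intermediary to the number of partners whose only partner it is."""
    dependants = Counter()
    for node, neighbours in adjacency.items():
        if len(neighbours) == 1:
            (only_partner,) = neighbours
            # An intermediary must itself have more than one stable partner,
            # which excludes two-node islands.
            if len(adjacency[only_partner]) > 1:
                dependants[only_partner] += 1
    return dependants

peripheral = [node for node, nb in adjacency.items() if len(nb) == 1]
dependants = intermediaries(adjacency)
print(len(peripheral), dependants.most_common())
```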
Even after disposing of the mathematical artefact described above, we still find a highly cohesive K-core at the heart of the FP7 stable partnership network, with K=80. Its 153 members form a very dense subnetwork: there are 8,324 within-core stable collaboration edges out of a possible maximum of 11,628. Fraunhofer, Centre National de la Recherche Scientifique and Consiglio Nazionale delle Ricerche are all in the core. On visual inspection, diversity in the core seems low: all organisations appear to be universities or large research institutes. We cannot be sure of this because the data do not contain a field with the organisations' legal status (for example "university" or "private company"), though these data do exist in the EC's databases. A complete list of the organisations in the core can be found here.
The Death Star in the network
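The "possible maximum" quoted for the core is the number of distinct pairs among its 153 members, and the ratio of actual to possible edges is the density of the core. A short check of the arithmetic (the function name is ours):

```python
def density(n_nodes, n_edges):
    """Return (density, possible_edges) for a simple undirected graph."""
    possible_edges = n_nodes * (n_nodes - 1) // 2
    return n_edges / possible_edges, possible_edges

# Figures for the K=80 core as reported in the text.
core_density, possible_edges = density(153, 8324)
print(possible_edges, round(core_density, 3))
```

So roughly 72% of all possible stable partnerships within the core are actually present.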