Improve in-degree distributions visualizations #18

Open
albertocottica opened this issue Jul 1, 2015 · 9 comments

@albertocottica
Owner

This is what we have:
https://github.com/albertocottica/communities-network-design/blob/master/Pictures/inDegDIstroCompared.png

It would be nice to take full control of the drawings, so that they all have the same scale, etc.

Unfortunately, something broke in my configuration: I can still run powerlaw.py from IPython, but I can no longer produce pictures. What it boils down to is that I need a backend for Matplotlib.
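For anyone hitting the same backend problem, here is a minimal sketch, assuming the pictures only need to be written to files and not shown in a window. It selects Matplotlib's non-interactive `Agg` backend before `pyplot` is imported. The function name, the toy curve and the output file name are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt


def save_demo_figure(path):
    """Draw a toy log-log curve and write it to `path` as a PNG."""
    fig, ax = plt.subplots()
    ks = [1, 10, 100, 1000]  # made-up degrees, for illustration only
    ax.loglog(ks, [k ** -2.0 for k in ks], marker="o")
    ax.set_xlabel("in-degree k")
    ax.set_ylabel("p(k)")
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return path


out_path = save_demo_figure(os.path.join(tempfile.gettempdir(), "backend_check.png"))
```

If the file is written, the backend works, and the real plotting script can be run the same way.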

Ben: maybe you can try to do better?

@renoust
Collaborator

renoust commented Jul 9, 2015

Matplotlib seems like a good option!
I can automate the generation of pictures, similar to what you did,
and even combine them in a single figure, a bit like in the attached file
(it's just an example of overlays).
However, what do you mean by same scale?

Anyway, this is a distribution we're showing, so I've played around
with histograms and so on,
but this kind of drawing seems to show it best. I can tune
colors, legends, etc.
As a baseline, the outline of the PDF on log scales seems the best way to
represent the distributions.

Benjamin

On 2 July 2015 at 04:23, Alberto Cottica [email protected] wrote:

Assigned #18 to @renoust https://github.com/renoust.

@albertocottica
Owner Author

That is matplotlib.

Same scale means that the two boxes will have the same size, and that they will comprise the same intervals (on the x axis, from k^0 to k^3).
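A possible way to get this in Matplotlib is sketched below. It assumes the intended x interval is 10^0 to 10^3 and that both axes are logarithmic. The helper name `make_comparison_axes` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # file output only; drop this line for interactive use
import matplotlib.pyplot as plt


def make_comparison_axes(xmin=1, xmax=1000):
    """Return two equally sized log-log panels that share both axis ranges."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
    for ax in axes:
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_xlim(xmin, xmax)  # assumed interval: 10^0 to 10^3
    return fig, axes


fig, (ax_left, ax_right) = make_comparison_axes()
ax_left.set_title("InnovatoriPA")
ax_right.set_title("Edgeryders")
```

Each distribution can then be drawn into its own panel, for example by passing `ax=ax_left` to the plotting call, so that both boxes keep the same size and limits.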

@renoust
Collaborator

renoust commented Jul 10, 2015

Of course I know :) Powerlaw actually produces Matplotlib axes.
I see better what you mean now: it's to emphasize the comparison, right?
However, I can't find the Edgeryders and InnovatoriPA data.
Can you put the degree distributions in the datasets folder?
Thanks!

Benjamin


@albertocottica
Owner Author

@renoust
Collaborator

renoust commented Jul 17, 2015

Ok, here is a proposal:
https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_Edgeryders_figure_1.png
https://github.com/albertocottica/communities-network-design/blob/master/Pictures/PDF_InnovatoriPA_figure_1.png

I'm accumulating data from 10 generations for each simulation, but I'm
currently running up to 1000 of each for better accuracy.

I order the curves by (nu_1, nu_2) and plot them with a gradient of
green, then plot the real data in red and the corresponding simulation (no
onboarding for InnovatoriPA, (1,1) for Edgeryders) in green.

I'm planning to make the gradient ascending for Edgeryders and descending
for InnovatoriPA, but this gives you the idea.
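The pooling and ordering described above could look roughly like the sketch below. It is not the script that produced the pictures: the function names, the dictionary layout and the toy degree lists are all invented for illustration, and the colors are plain RGB tuples of the kind Matplotlib accepts.

```python
from collections import Counter


def pool_degrees(generations):
    """Concatenate the in-degree sequences of several generated networks."""
    pooled = []
    for degrees in generations:
        pooled.extend(degrees)
    return pooled


def empirical_pdf(degrees):
    """Map each positive degree k to its relative frequency."""
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def green_gradient(n):
    """Return n RGB tuples running from light green to dark green."""
    colors = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        # Interpolate from (0.8, 1.0, 0.8) to (0.0, 0.4, 0.0).
        colors.append((0.8 * (1 - t), 1.0 - 0.6 * t, 0.8 * (1 - t)))
    return colors


# Toy input (invented): (nu_1, nu_2) -> list of generations, each a degree list.
simulations = {
    (1.0, 0.0): [[1, 1, 2], [1, 3]],
    (0.0, 0.0): [[1, 2, 2], [1, 1]],
    (0.0, 1.0): [[1, 1, 1], [2, 4]],
}

ordered = sorted(simulations)           # sorts by nu_1 first, then nu_2
palette = green_gradient(len(ordered))  # one shade per parameter pair
curves = [
    (params, color, empirical_pdf(pool_degrees(simulations[params])))
    for params, color in zip(ordered, palette)
]
```

Reversing `palette` would give the descending gradient.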

Benjamin

On 10 July 2015 at 16:42, Alberto Cottica [email protected] wrote:

Done!

https://github.com/albertocottica/communities-network-design/tree/master/Datasets/RealWorldDegrees



@renoust
Collaborator

renoust commented Jul 23, 2015

I stopped the generation at 600 for each model because it was
time-consuming (it took a few days),
but if you feel it's statistically worth going up to 1000 for each, no
problem, I'll generate some more.
I've also pushed some more pictures, including a comparison of the degree
distributions of all generated models,
in the hope that it helps :)

Benjamin


@albertocottica
Owner Author

No, Ben. I think this does not tell the story we want.

First of all, there is an issue of consistency of the viz with the data:

  • Innovatori => no onboarding. So it should be compared with the control group.
  • Edgeryders => with onboarding (but we do not know how effective the onboarding is or how responsive the community is).

In the case of Innovatori, having the small curves is just misleading.

In the case of Edgeryders, the small curves make sense, but emphasising one in particular does not.

But the more important problem is this: if you draw a bunch of curves, they will look like a thick straight line. We know from the data that, when onboarding is present, this is not the case: the fit is strongly rejected by the goodness-of-fit test. If we want to make the point, I think we are down to comparing ONE curve (real-world data) with ONE curve (simulated data). Moreover, I am not convinced they should be in the same diagram: exponents could be different. All we want to illustrate is whether they are straight or not. The way that works best for me is still:

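Innovatori PA is a straight line:
https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/InnovatoriPA%20degree%20distribution.png
Edgeryders is arched downwards:
https://github.com/albertocottica/communities-network-design/blob/fe0bb05d5d069dcd5fd499bb3c485207f5d9c25e/Pictures/Edgeryders%20degree%20distribution.png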

We would need to do the same for generated data, with and without onboarding, and then we are done.
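As a rough numerical companion to "straight or not", the sketch below compares the slope of the upper half of a curve with the slope of its lower half in log-log space. This is only a crude visual aid under invented toy data, not the goodness-of-fit test mentioned above. The function name `loglog_bend` and the exponential cutoff used for the arched example are assumptions made for illustration.

```python
import math


def loglog_bend(ks, pk):
    """Slope of the upper half minus slope of the lower half, in log-log space.

    Close to 0 for a straight line; negative when the curve arches downwards.
    """
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(p) for p in pk]
    mid = len(xs) // 2
    lower = (ys[mid] - ys[0]) / (xs[mid] - xs[0])
    upper = (ys[-1] - ys[mid]) / (xs[-1] - xs[mid])
    return upper - lower


ks = [1, 10, 100]  # toy degrees, invented
straight = [k ** -2.0 for k in ks]                      # pure power law
arched = [k ** -2.0 * math.exp(-k / 50.0) for k in ks]  # power law with a cutoff

bend_straight = loglog_bend(ks, straight)
bend_arched = loglog_bend(ks, arched)
```

A value near zero matches the straight-line picture, and a clearly negative value matches the downward arch.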

@renoust
Collaborator

renoust commented Aug 25, 2015

After offline progress/discussions with @albertocottica:

Because we are submitting to a journal, potentially with "unlimited" space:

Benjamin


@renoust
Collaborator

renoust commented Aug 25, 2015

Following the last point, I've uploaded a series of pictures named
"comparison_"...
They display the PDFs of the 600 generations with different parameters, with the
following color coding:

  • in dotted red: generations with no onboarding
  • in dotted blue: onboarding with pref. att. nu1 = nu2 = 0
  • in dotted green: onboarding with nu1 = nu2 = 1
  • in thin solid light green: generated curves with the varying parameters

Most of the pictures only compare "no onboarding" with all generated curves
(the fixed parameter is in the title), even though we've shown nu2 to be
ineffective... so the most interesting ones essentially compare no onboarding
with different values of nu1, one by one.
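One way to keep that color coding consistent across pictures is a single style table, as in the sketch below. The dictionary keys, the helper name and the toy curves are hypothetical, and the line widths are a guess at what "thin" means.

```python
import matplotlib
matplotlib.use("Agg")  # file output only; drop this line for interactive use
import matplotlib.pyplot as plt

# One entry per curve type in the color coding listed above.
STYLES = {
    "no_onboarding": dict(color="red", linestyle=":", linewidth=2, zorder=3),
    "onboarding_nu_0_0": dict(color="blue", linestyle=":", linewidth=2, zorder=3),
    "onboarding_nu_1_1": dict(color="green", linestyle=":", linewidth=2, zorder=3),
    "varying": dict(color="lightgreen", linestyle="-", linewidth=0.5, zorder=1),
}


def plot_curve(ax, ks, pk, kind, label=None):
    """Draw one PDF on log-log axes with the style registered for `kind`."""
    (line,) = ax.loglog(ks, pk, label=label, **STYLES[kind])
    return line


fig, ax = plt.subplots()
ks = [1, 10, 100, 1000]  # toy degrees, invented
background = plot_curve(ax, ks, [k ** -2.2 for k in ks], "varying")
reference = plot_curve(ax, ks, [k ** -2.0 for k in ks], "no_onboarding",
                       label="no onboarding")
ax.legend()
```

The `zorder` values keep the dotted reference curves on top of the thin light green ones.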

2 other views are available:

Benjamin

