
STORM-150: Replace jar distribution strategy with bittorrent #71

Closed · wants to merge 20 commits

Conversation

@ptgoetz (Member) commented Apr 10, 2014

This is a resurrection of a pre-Apache pull request: nathanmarz/storm#629
(the original discussion and code review can be found there)

I've updated it to merge cleanly with the latest master branch and fixed a bug that would prevent multi-lang topologies from working.

Using bittorrent to distribute topology code will make it easier to implement HA Nimbus, since multiple Nimbus nodes could seed to one another when a topology is submitted (i.e., each would have a full copy of the code). We could also incorporate it into leader election, such that only a Nimbus node with a complete copy of the topology code could be elected leader.

It also helps alleviate the "thundering herd" problem, where all supervisors hit Nimbus at once to download jars.

I've tested this on multi-node clusters with regular, multi-lang, and trident topologies, as well as topology jars larger than 100MB.
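
For context, the supervisor side of this approach boils down to joining the torrent that Nimbus seeds and waiting for the jar to complete. A minimal sketch against the ttorrent client API this patch builds on (the class name and paths here are illustrative, not the PR's actual layout):

import com.turn.ttorrent.client.Client;
import com.turn.ttorrent.client.SharedTorrent;
import java.io.File;
import java.net.InetAddress;

public class SupervisorDownload {
    public static void main(String[] args) throws Exception {
        // Load the torrent metadata Nimbus published for this topology and
        // download the payload into the supervisor's stormdist directory.
        SharedTorrent torrent = SharedTorrent.fromFile(
                new File("/var/storm/supervisor/topo-1.torrent"),
                new File("/var/storm/supervisor/stormdist/topo-1"));
        Client client = new Client(InetAddress.getLocalHost(), torrent);
        client.download();          // leech; Nimbus (and any finished peers) seed
        client.waitForCompletion(); // block until every piece is verified
        client.stop();
    }
}

Because every supervisor that finishes the download becomes another seeder, the load on Nimbus drops as the swarm grows, which is what alleviates the thundering herd.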

public abstract class BasePeer {
    private static final Logger LOG = LoggerFactory.getLogger(BasePeer.class);

    // Maps a String key (per the review below, presumably the topology ID) to its ttorrent Client.
    protected HashMap<String, Client> clients = new HashMap<String, Client>();
Inline review comment from a Contributor on the excerpt above:

It is not very clear here what the key to clients is. Could we either have a comment saying this is the topology ID, or use a clearer name?

@revans2 (Contributor) commented Apr 15, 2014

For the most part things look OK. I have two concerns. The first is that the ttorrent library is very noisy: a line is logged on both Nimbus and the supervisors every 3 seconds for every topology. We can fix that with the correct logback config, but I would like quiet to be the default.
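
The chatter comes from the com.turn.ttorrent packages (the c.t.t.c.a.Announce line quoted below is the client announcer), so one way to make quiet the default would be an entry along these lines in the logback config Storm ships (a sketch; the right level is debatable):

<!-- Silence ttorrent's periodic per-topology announce logging by default. -->
<logger name="com.turn.ttorrent" level="WARN"/>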

My other concern is security. I know we have not put security into Storm yet, but I have patches that I would like to push in with it. If we only support BitTorrent for distribution, then I have to figure out how to secure it, which is going to slow things down a lot. Could we do something similar to what we did for the messaging layer and have a pluggable API? Or at least add a config option to choose between ttorrent and the Utils.downloadFile calls until I have time to dig into ttorrent.
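
One way to read the pluggable suggestion, modeled on the messaging-layer SPI but with hypothetical names (the PR defines no such interface, and the config key here is made up for illustration):

import java.io.File;
import java.io.IOException;

// Hypothetical SPI, chosen via a config key such as "storm.code.distributor.class".
public interface ICodeDistributor {
    /** Called on Nimbus after submission: start serving the topology's code. */
    void upload(String topologyId, File codeDir) throws IOException;

    /** Called on a supervisor: fetch the topology's code into destDir. One
     *  implementation would wrap ttorrent, another the existing Utils.downloadFile path. */
    void download(String topologyId, File destDir) throws IOException;
}

A config option selecting between the two implementations would then cover the interim case until the BitTorrent path can be secured.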

@revans2 (Contributor) commented Apr 15, 2014

I have been doing some manual testing, and if Nimbus goes down it does not reseed previously submitted torrents on startup. Steps to reproduce:

  1. Bring up Nimbus.
  2. Submit a topology.
  3. Kill Nimbus.
  4. Relaunch Nimbus.
  5. Launch a supervisor.

The supervisor then logs:

2014-04-15 19:05:33 c.t.t.c.a.Announce [WARN] The requested torrent does not exist on this tracker
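
A fix would presumably have Nimbus rescan its local storage on startup and re-register every torrent with the embedded tracker before seeding again. A rough sketch, assuming ttorrent's Tracker/TrackedTorrent API and a one-.torrent-file-per-topology layout (the layout is an assumption, not necessarily the PR's):

import com.turn.ttorrent.tracker.TrackedTorrent;
import com.turn.ttorrent.tracker.Tracker;
import java.io.File;

public class NimbusReseed {
    // On startup, re-announce previously submitted torrents so supervisors no
    // longer hit "The requested torrent does not exist on this tracker".
    public static void reseedAll(Tracker tracker, File stormdistRoot) throws Exception {
        File[] topologyDirs = stormdistRoot.listFiles(File::isDirectory);
        if (topologyDirs == null) {
            return;
        }
        for (File dir : topologyDirs) {
            File[] torrents = dir.listFiles((d, name) -> name.endsWith(".torrent"));
            if (torrents == null) {
                continue;
            }
            for (File f : torrents) {
                tracker.announce(TrackedTorrent.load(f));
                // ...then start a seeding Client for the payload, as on initial submit.
            }
        }
    }
}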

@ptgoetz (Member, Author) commented Apr 15, 2014

Thanks for the review, @revans2. I will address all of this when I'm better connected.

To be clear, I don't see this as a high priority -- it's more of a discussion point, so not something to target for the next release.

knusbaum pushed a commit to knusbaum/incubator-storm that referenced this pull request on Feb 11, 2015: "Handle nested config validation nulls distinctly"
@d2r commented Feb 23, 2015

This pull request has been open for most of a year. Should we close this for now if we are not planning to do anything about it?

@d2r commented Oct 8, 2015

@ptgoetz @revans2, shall we keep this pull request open?

@ptgoetz (Member, Author) commented Oct 8, 2015

@d2r Thanks for the reminder. Closing.

@ptgoetz closed this on Oct 8, 2015