Support SpreadSASLMemcachedCache: sharding values > 1MB #32

Open · reubano opened this issue Sep 21, 2013 · 12 comments

reubano commented Sep 21, 2013

Does this library support spreadsaslmemcachedcache? See http://pythonhosted.org/Flask-Cache/.

SpreadSASLMemcachedCache – spreadsaslmemcachedcache

Same as SASLMemcachedCache; however, it has the ability to spread a value across multiple keys if it is bigger than the memcached threshold, which by default is 1MB. Uses pickle.


alevy commented Sep 21, 2013

I mean... no. This would be very difficult (and probably not desirable) to do exactly, since pickle is very Python-specific. However, doing something similar, where values are broken up, would be relatively straightforward (I've done it for Dalli). Not sure if this makes sense as a separate library/code snippet. If you get around to this I'll definitely appreciate a pull request. Otherwise, I might get around to it.

In general, there are some tricky tradeoffs among the strategies you might use to spread values across multiple keys, and I'm not sure how best to generalize them (as opposed to giving code snippets for how to write a custom thing for each app, or a higher-level library with multiple options, or something). But I'm very interested in seeing what can be done.


reubano commented Sep 22, 2013

Can you post the code you used? The implementation doesn't matter (pickle vs. something else) as long as I can replicate the result.


reubano commented Nov 8, 2013

Bump

alevy closed this as completed Sep 11, 2014

reubano commented Sep 12, 2014

Just curious why you closed this since the issue isn't solved.


alevy commented Sep 12, 2014

Hrm... mostly I was just doing some cleaning, and hadn't touched this issue in almost a year... but...

I wasn't convinced this is a good feature to have at this level of library, since it would need to make a specific choice among a set of imperfect semantics: what happens when one of the keys is evicted? Do you store keys sequentially (i.e. mykey_1, mykey_2, etc.) and read until you get a miss, or have a base key that contains a list of subkeys? Do you split up values automatically when one is too large to fit, or do you have a separate command?

My hunch is that every one of these answers is right for somebody, and I'm not sure there is a "good" one for most.

However, I can describe what I've done in a certain app, algorithmically (a rough code sketch follows these steps):

set:

  1. Split the data into chunks < 1MB.
  2. Set the base key to the number of chunks, retrieving the version of the base key (this is not supported in memjs, but could and should be).
  3. Do a compare-and-swap with the version number from the base key on each of the subsequent keys. If it fails with an earlier version number, redo using the new version number. If it fails with a later version number, abort -- another thread has already overridden your previous work.

get:

  1. Get the base key along with its version number.
  2. Get all referenced subkeys 0-n, checking that each one's version matches. If a non-matching version is higher, start over -- someone has overridden the value.
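
To make those steps concrete, here's a rough JavaScript sketch. The getWithVersion/setWithVersion calls are hypothetical -- as noted in step 2, memjs doesn't expose item versions or compare-and-swap today -- so this only shows the shape of the algorithm; `split` is the chunking helper that appears later in this thread.

// Assumed item-size limit (memcached's default is 1MB).
var MAX_CHUNK = 1024 * 1024;

// set: store the chunk count under the base key, then write each chunk
// tagged with the base key's version.
async function spreadSet(client, key, value) {
  var chunks = split(value, MAX_CHUNK);
  // Step 2: the base key holds the chunk count; its version stamps the write.
  var base = await client.setWithVersion(key, String(chunks.length));
  for (var i = 0; i < chunks.length; i++) {
    // Step 3 (simplified): a full implementation would compare-and-swap here,
    // retrying on an earlier version and aborting on a later one.
    await client.setWithVersion(key + '_' + i, chunks[i], base.version);
  }
}

// get: read the base key, then every chunk, insisting on a matching version.
async function spreadGet(client, key) {
  var base = await client.getWithVersion(key); // step 1
  if (!base) return null;
  var n = parseInt(base.value.toString(), 10);
  var parts = [];
  for (var i = 0; i < n; i++) {
    var part = await client.getWithVersion(key + '_' + i);
    if (!part || part.version < base.version) return null; // evicted or stale chunk
    if (part.version > base.version) return spreadGet(client, key); // overridden: start over
    parts.push(part.value);
  }
  return Buffer.concat(parts); // reassemble the original value
}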


reubano commented Sep 12, 2014

Well, I'm far from an expert on this topic, but why not just do it the way the Python version does? There has to be a Node.js equivalent of pickle, right? At the least, can you post code showing how you worked around this issue before?


reubano commented Sep 12, 2014

And to answer one of the questions: you would select a spread mode upon instantiating memjs. In that mode, any value > max size would be split up automatically. Also, what's wrong with reading until you get a miss? Couldn't you just put it in a loop and place that in a try/except block?


alevy commented Sep 12, 2014

pickle doesn't help much here; it's just a serialization format, and in any case memjs only deals with values of type String or Buffer, which are already serialized. So as far as splitting things up goes, that's just:

// largeValue is a `Buffer` to split; maxSize is the maximum size of any part
function split(largeValue, maxSize) {
  var parts = [];
  while (largeValue.length > 0) {
    // Take the next chunk of at most maxSize bytes...
    parts.push(largeValue.slice(0, maxSize));
    // ...and advance past it.
    largeValue = largeValue.slice(maxSize);
  }
  return parts;
}
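
For what it's worth, reassembly on the way back out is just Buffer.concat over the parts, in order, using standard Node Buffer APIs:

// Round-trip example: split a 3MB buffer and restore it.
var big = Buffer.alloc(3 * 1024 * 1024, 'x');
var parts = split(big, 1024 * 1024);   // three 1MB parts
var restored = Buffer.concat(parts);   // byte-for-byte equal to `big`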

Here is a gist of some Ruby skeleton code for spreading values across the cache like this with Dalli: https://gist.github.com/alevy/5512907


alevy commented Sep 12, 2014

The problem with just iterating until you get a miss is twofold (see the sketch after this list):

  1. When you actually get a miss, how do you know whether the miss is because you finished reading all the parts, or because that sub-key was evicted from the cache? So you need some sort of terminator sequence -- but then, what if that sequence happens to exist in the value itself? So you need to escape it in the value, etc...
  2. It's very slow to retrieve a long value -- you have to wait for a response from each subkey in order to proceed to the next one (you could mitigate this by batching, say, 5 at a time, but that sucks for values smaller than 5 sub-keys). Ideally, you'd like to be able to run most of the requests "quietly", so that you're not waiting a full round trip on each request before sending the next one. This is the difference between retrieving a large value in <10ms vs. many hundreds of ms.
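
To put shape behind point 2, here's a sketch contrasting the two fetch strategies; getChunk is a stand-in for whatever single-key get the client exposes:

// Sequential: n network round trips; latency grows linearly with chunk count.
async function fetchSequential(getChunk, key, n) {
  var parts = [];
  for (var i = 0; i < n; i++) {
    parts.push(await getChunk(key + '_' + i)); // wait out a full round trip each time
  }
  return parts;
}

// Pipelined: every request in flight at once; roughly one round trip total.
// This is what "quiet" (getq) requests buy you in the binary protocol.
async function fetchPipelined(getChunk, key, n) {
  var keys = [];
  for (var i = 0; i < n; i++) keys.push(key + '_' + i);
  return Promise.all(keys.map(getChunk));
}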

I think spreading values automatically is problematic because it breaks performance assumptions. I expect that when I call a memcache function on a low-level library, I get the performance of a single request (on the order of 1ms). If I suddenly take a 2x-or-more performance hit because my value is slightly larger, that's less intuitive behavior (for a performance-sensitive library), IMO, than an error.


alevy commented Sep 12, 2014

Having said all that, I think that with this discussion you've helped highlight some missing lower-level features that would make doing this easier and would also be useful on their own (specifically, a way to get at the version in get and set responses, as well as support for compare-and-swap). I'll open new issues for those (and /cc you). I don't mind at all including a utility function (although probably in a different namespace) to split up values in a particular way -- we've basically already pseudo-coded the algorithm in this thread :).


reubano commented Oct 1, 2014

I think you've misunderstood me. Spreading values wouldn't happen automatically. It would be an option in the setup, e.g., client = memjs.Client.create({spread: true}). By default, spread would be false.
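
As a sketch of that shape (the spread option and wrapper are hypothetical -- memjs has no such flag), the client would only take the multi-key path when the caller opted in:

// Hypothetical opt-in wrapper around a memjs client; spreading is off by default.
function createSpreadClient(client, options) {
  var spread = !!(options && options.spread);
  return {
    set: function(key, value) {
      if (spread && value.length > MAX_CHUNK) {
        return spreadSet(client, key, value); // multi-key path sketched above
      }
      return client.set(key, value, {}); // normal single-key set
    }
  };
}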


alevy commented Oct 2, 2014

Oh, interesting. OK, I don't mind that as much... reopening.

alevy reopened this Oct 2, 2014
dterei changed the title support spreadsaslmemcachedcache → Support SpreadSASLMemcachedCache: sharding values > 1MB Apr 19, 2016