-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add a function for lazily generating integers #365
base: master
Are you sure you want to change the base?
Conversation
For the doc tests that are nondeterministic, use this Also, you should add test for this function. They should go in the |
@steven-cutting fixed the tests. There are two issues. For whatever reason, the python 2 tests are failing. Not sure why thats the case as The other issue is range is not lazy in python 2. What is your preference for doing either
or adding the future/builtins as a dependency? This project has no dependencies so don't want to add any. |
I should point out a couple of things. Before you put too much effort into this it would be a good idea to wait for feedback from the core maintainers (e.g. @mrocklin). They will have the final say about accepting this pull request. Also, try to reduce the number of commits you are generating to one. Squash your commits into one. I had the same problem the first time I submitted a pull request. As far as the issue with Here is my suggestion for the interface: # you can think of a better name.
def random_fixed_sample(seq, sample_size, random_state=None):
pass
In [2]: list(random_sample(range(100, 1000), 5))
Out[2]: [447, 484, 729, 770, 819] |
Also, a challenge with your current implementation is that each item does not have the same probability of being chosen. At the very least this should be pointed out in the documentation. |
In general I agree with @steven-cutting 's assessment of the interface. One goal of toolz is not to have too many functions. If we accept >>> list(take(5, random_sequence(range(100))))
[5, 29, 59, 19, 50] The new thing we would be adding is just the ability to generate a random stream from a fixed collection. I'm also not really in charge of this library any more though. @eriknw probably has the final word here. Just thought I'd put forward this idea of trying to find orthogonal functions. |
@mrocklin @steven-cutting I can agree with passing in the sequence, though you need to know the population size in order for this to work. If you pass in a generator, you can't find its size without evaluating it. @mrocklin Also if you don't pass the sample size into |
Ah sorry, I was thinking of consuming a fixed collection. But I see now that this wasn't your objective. I guess this opens up the question of "what is your objective?" If it is genuinely to produce a lazy sequence of integers in a particular range then I would respond that that this is too specific to be part of toolz. We endeavor to find operations that are generally useful in a wide variety of applications and operations that compose well with other operations to build more specific solutions. This may still be out of scope for toolz, but you might want to look into reservoir sampling? |
@mrocklin Yes this is an implementation of reservoir sampling. My reasoning for including it in the toolz library is it's a lazy method and generating random numbers seems like a generic enough of an operation. |
This PR adds a function for lazily generating a sample of numbers from a range of ints.