This repository has been archived by the owner on Nov 15, 2024. It is now read-only.

daedalus/fastBloomFilter

Simple and fast pythonic bloomfilter

From Wikipedia: "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter); the more elements that are added to the set, the larger the probability of false positives."
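
To make the definition above concrete, here is a minimal toy Bloom filter in pure Python. It is only a sketch of the general technique: the class name `ToyBloomFilter`, its parameters, and the use of salted SHA-256 are assumptions made for illustration and are not taken from fastBloomFilter's code.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT the fastBloomFilter implementation."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def query(self, item):
        # True means "possibly in set"; False means "definitely not in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


toy = ToyBloomFilter()
toy.add("30000")
print(toy.query("30000"))  # True: an added item is never reported as missing
```

Because every added item sets all of its bits, a later query for that item always finds them set, which is why false negatives cannot occur. False positives arise when the bits of an item that was never added happen to have been set by other items.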

This filter supports:

- Saving and reloading with pickle
- Stats
- Entropy analysis
- Internal and external hashing of data
- Raw filter merging
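
As a rough illustration of what "entropy analysis" of a filter could look like, the sketch below computes the Shannon entropy, in bits per byte, of a raw byte buffer. This is an assumption about the general idea only: the helper name `shannon_entropy` is hypothetical, and the library's own calculation may differ.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# An empty (all-zero) bit array carries no information.
print(shannon_entropy(bytes(16)))
# A buffer in which every byte value appears equally often is maximally mixed.
print(shannon_entropy(bytes(range(256))))
```

An entropy close to 0 suggests a nearly empty (or nearly full) bit array, while a value approaching 8 suggests the bits are well mixed.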

Installing:

sudo pip install fastbloomfilter

External creation of the bloom filter file:

python mkbloom.py /tmp/filter.blf

Importing:

>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(array_size=1024**3)

Or:

>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(filename='/tmp/filter.blf')

Adding data to it:

>>> bf.add('30000')
>>> bf.add('1230213')
>>> bf.add('1')

Printing stats:

>>> bf.stat()

Or:

>>> bf.info()
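
The output of `stat()` and `info()` is not shown here. As a hedged sketch of the kind of figures such statistics usually cover, the helpers below compute the fill ratio of a bit array and the standard false-positive estimate `(1 - e^(-k*n/m))^k` for `m` bits, `k` hash functions, and `n` inserted items. The function names are hypothetical and are not part of the fastBloomFilter API.

```python
import math


def fill_ratio(bits: bytearray) -> float:
    """Fraction of bits set to 1 in the underlying bit array."""
    set_bits = sum(bin(byte).count("1") for byte in bits)
    return set_bits / (len(bits) * 8)


def estimated_false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Textbook approximation of the false-positive probability."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


print(fill_ratio(bytearray([0xFF, 0x00])))  # 0.5: 8 of 16 bits are set
print(estimated_false_positive_rate(1024 ** 3, 3, 1_000_000))
```

The estimate grows with the number of inserted items, which matches the statement that adding more elements raises the probability of false positives.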

Querying data:

>>> print(bf.query('1'))
True
>>> print(bf.query('1230213'))
True
>>> print(bf.query('12'))
False
>>> print(bf['1'])
True

Querying data and at the same time adding it:

>>> print(bf.update('1'))
False
# False means the item was not already present and has now been added.
>>> print(bf.update('1'))
True
# True means the item was already reported as present (possibly a false positive), so nothing new was added.
>>> print(bf.update('2'))
False
>>> print(bf.update('2'))
True
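
The query-and-add behaviour shown above can be sketched on a toy filter as follows. This illustrates the semantics only: the class `ToyFilter` and its hashing scheme are assumptions made for the example, not the library's implementation of `update`.

```python
import hashlib


class ToyFilter:
    """Toy filter demonstrating query-and-add; NOT the library's code."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Salted SHA-256 stands in for the filter's hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def update(self, item):
        """Set the item's bits; return True only if all were already set."""
        already_present = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                already_present = False
                self.bits[byte] |= mask
        return already_present


f = ToyFilter()
print(f.update("1"))  # False: the filter was empty, so the item was added
print(f.update("1"))  # True: all of the item's bits are now set
```

Doing the test and the insertion in a single pass avoids hashing the item twice, which is the usual motivation for a combined operation of this kind.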

Merging two filters:

Create first filter:

>>> from fastBloomFilter import bloom
>>> bf1 = bloom.BloomFilter(array_size=1024**3)
>>> bf1.add("1")

Create second filter:

>>> from fastBloomFilter import bloom
>>> bf2 = bloom.BloomFilter(array_size=1024**3)
>>> bf2.add("2")

Merge the two filters into a third filter:

>>> bf3 = bf1 + bf2

Check the elements in the third filter:

>>> print(bf3["1"])
True
>>> print(bf3["2"])
True
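
Two Bloom filters with the same size and the same hash functions are commonly merged by OR-ing their bit arrays, so the result reports every item that either input reports. The sketch below shows that idea on plain byte arrays. Whether `bf1 + bf2` is implemented exactly this way is an assumption, and `merge_bit_arrays` is a hypothetical helper.

```python
def merge_bit_arrays(a: bytearray, b: bytearray) -> bytearray:
    """Bitwise OR of two equally sized bit arrays (union of two filters)."""
    if len(a) != len(b):
        raise ValueError("filters must have the same size to be merged")
    return bytearray(x | y for x, y in zip(a, b))


left = bytearray([0b0001, 0b0000])
right = bytearray([0b0100, 0b0001])
merged = merge_bit_arrays(left, right)
print([bin(byte) for byte in merged])  # ['0b101', '0b1']
```

A bit that is set in either input stays set in the result, so membership answers from both original filters are preserved. The false-positive rate of the merged filter is at least as high as that of either input.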

Contributing

Contributions are welcome! Criteria:

- They should not include hidden folders or files from any IDE.
- They should not delete large portions of the project.
- They should not include files unrelated to the project.
- They should not change the API. (API changes should be proposed through Issues as enhancements.)
- They should not include any obfuscated code.
- They should not include binaries.
- They should be small PRs, for a faster review process.
- They should include a small test case.
- Any contribution not honoring these criteria will be rejected until it does.