
Bloom filter implementation in Python with json serialization/deserialization wrappers for easy interfacing with MongoDB.

mohummedalee/persistent-bloom

persistent_bloom

Note 1: In this README, I refer to each Bloom filter as a haystack and to each key inserted into it as a pin.

Note 2: Whichever database you use must include a collection called bloom.
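To make the haystack/pin terminology concrete, here is a minimal, standard-library-only sketch of a Bloom filter that round-trips through JSON. It is not this library's code: the class name Haystack and its methods are hypothetical, it swaps the listed mmh3 and bitarray dependencies for hashlib (SHA-256 with double hashing) and a bytearray, and the base64-in-JSON layout is only an assumed way to fit the bits into a document in the bloom collection.

```python
import base64
import hashlib
import json
import math


class Haystack:
    """Hypothetical Bloom filter sketch, not this library's API.

    Uses hashlib.sha256 with double hashing instead of mmh3, and a
    bytearray instead of bitarray, so it needs only the standard library.
    """

    def __init__(self, capacity=1000, error_rate=0.01, num_bits=None, num_hashes=None):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        if num_bits is None:
            num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        if num_hashes is None:
            num_hashes = max(1, round(num_bits / capacity * math.log(2)))
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, pin):
        # Derive k bit positions from two 64-bit slices of one digest
        # (double hashing: h1 + i * h2 mod m).
        digest = hashlib.sha256(pin.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, pin):
        # Set every bit the pin hashes to.
        for pos in self._positions(pin):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, pin):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(pin))

    def to_json(self):
        # Base64-encode the bits so the whole filter fits in a JSON string.
        return json.dumps({
            "num_bits": self.num_bits,
            "num_hashes": self.num_hashes,
            "bits": base64.b64encode(bytes(self.bits)).decode("ascii"),
        })

    @classmethod
    def from_json(cls, text):
        # Rebuild a filter with the same parameters and bit contents.
        data = json.loads(text)
        haystack = cls(num_bits=data["num_bits"], num_hashes=data["num_hashes"])
        haystack.bits = bytearray(base64.b64decode(data["bits"]))
        return haystack


if __name__ == "__main__":
    haystack = Haystack()
    haystack.add("powpow")
    restored = Haystack.from_json(haystack.to_json())
    print(restored.exists("powpow"))  # True: added pins are never missed
```

Because a Bloom filter has no false negatives, exists() always returns True for a pin that was added; a True for a pin that was never added is a false positive, which the sizing formulas keep at roughly error_rate as long as no more than capacity pins are inserted.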

There are two interfaces to the library:

The Batch Loader: bloom-batch.py

Usage: python bloom-batch.py <events_file>, where events_file is a CSV file in which each line has the form: haystackID, pin
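As an illustration of the events file format, the sketch below groups pins by haystackID using Python's csv module. The helper read_events is hypothetical and is not the code in bloom-batch.py; stripping whitespace around each field and skipping malformed rows are assumptions.

```python
import csv
import io
from collections import defaultdict


def read_events(lines):
    """Hypothetical helper: group pins by haystackID.

    Each row is expected to look like `haystackID, pin`. Whitespace around
    both fields is stripped (assumption); blank or malformed rows are skipped.
    """
    pins_by_haystack = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank lines and rows without exactly two fields
        haystack_id, pin = (field.strip() for field in row)
        pins_by_haystack[haystack_id].append(pin)
    return dict(pins_by_haystack)


if __name__ == "__main__":
    sample = io.StringIO(
        "55488b38f405c70300efceb2, powpow\n"
        "55488b38f405c70300efceb2, zap\n"
    )
    print(read_events(sample))
```

For a real file, open it with `open(path, newline="")` and pass the file object to read_events, as the csv module documentation recommends.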

The environment variables you can change to configure this script are:

  • DB_HOST
  • DB_PORT

The API: bloom-api.py

Usage: the API provides two web services:

  • exists/haystackID,pin
  • add/haystackID,pin

For example, the following URL adds the pin powpow to the haystack 55488b38f405c70300efceb2: http://localhost:8888/add/55488b38f405c70300efceb2,powpow
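A minimal client-side sketch for the two services, using only the standard library. The helper names bloom_url and call_bloom_api are hypothetical; the base URL is taken from the example above, and percent-encoding the haystackID and pin is an assumption about how the server decodes the path (pins containing commas may not round-trip, depending on how the server splits the segment). The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import quote
from urllib.request import urlopen


def bloom_url(action, haystack_id, pin, base="http://localhost:8888"):
    """Build a URL of the form <base>/<action>/<haystackID>,<pin>.

    `action` must be "add" or "exists". Percent-encoding is an assumption.
    """
    if action not in ("add", "exists"):
        raise ValueError("action must be 'add' or 'exists'")
    return f"{base}/{action}/{quote(haystack_id, safe='')},{quote(pin, safe='')}"


def call_bloom_api(action, haystack_id, pin, base="http://localhost:8888"):
    # Requires a running bloom-api.py; returns the raw response body.
    with urlopen(bloom_url(action, haystack_id, pin, base)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(bloom_url("add", "55488b38f405c70300efceb2", "powpow"))
```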

The API writes its log to a file called bloom.log.

The following settings can be configured through environment variables:

  • DB_HOST
  • DB_PORT
  • DB_NAME
  • BLOOMAPI_PORT
  • BLOOMAPI_ADDR
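A sketch of how these variables might be read at startup. The BloomConfig class, load_config function, and every fallback value are hypothetical: 27017 is MongoDB's default port and 8888 matches the example URL above, but the actual defaults used by bloom-api.py and bloom-batch.py are not documented here.

```python
import os
from dataclasses import dataclass


@dataclass
class BloomConfig:
    db_host: str
    db_port: int
    db_name: str
    api_port: int
    api_addr: str


def load_config(env=os.environ):
    """Read the environment variables listed above.

    All fallback values are assumptions for illustration only.
    """
    return BloomConfig(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "27017")),      # MongoDB's default port
        db_name=env.get("DB_NAME", "bloom_db"),         # hypothetical default
        api_port=int(env.get("BLOOMAPI_PORT", "8888")),  # matches the example URL
        api_addr=env.get("BLOOMAPI_ADDR", "127.0.0.1"),  # hypothetical default
    )


if __name__ == "__main__":
    print(load_config({"DB_PORT": "27018"}).db_port)  # 27018
```

Passing a plain dict instead of os.environ makes the function easy to test without changing the process environment.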

Dependencies:

  • mmh3
  • bitarray
  • tornado

Unit tests are in the Tests folder and can be run with: python -m unittest Tests.test_name, where test_name is the name of a test module.
