Bloom Checker: A smart tool using Bloom filters to verify email lists efficiently with a user-friendly GUI, handling large datasets with ease and accuracy.


Nekyro/Bloom-Checker


Bloom Checker

Overview

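Bloom Checker is a fast, memory-efficient tool for checking whether an email (or any other dataset item) is present in a database. It uses a Bloom filter, so each answer is either "probably present" or "definitely not present": false positives are possible, false negatives are not. This trade-off keeps memory usage low and makes the tool well suited to large datasets.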

Background & Problem Context

The Cache Penetration Problem

Imagine an email verification service that needs to check if millions of email addresses exist in a database. A common implementation might look like this:

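def check_email(email):
    # `cache` and `database` stand in for the service's cache and datastore clients.
    # First, check the cache
    if cache.get(email):
        return True

    # On a cache miss, fall back to the database
    if database.exists(email):
        cache.set(email, True)  # only positive results are cached
        return True

    # Negative results are never cached, so repeated misses always reach the database
    return False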

This approach faces two significant challenges:

  1. Cache Miss: When a valid email isn't in the cache but exists in the database:

    Client → Cache (Miss) → Database (Found) → Update Cache
    

    This costs one extra lookup, but only once: later requests for the same email are served from the cache, so it's manageable.

  2. Cache Penetration: When checking non-existent emails:

    Client → Cache (Miss) → Database (Not Found) → No Cache Update
    

    This becomes problematic when:

    • Attackers deliberately query non-existent emails
    • Each query unnecessarily hits both cache and database
    • System resources are wasted on known-invalid queries
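The difference between the two cases can be made concrete with a small simulation. This is only a sketch: `CountingStore` is a hypothetical in-memory stand-in for both the cache and the database, and the email addresses are placeholders.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a cache or database that counts lookups."""

    def __init__(self, items=()):
        self.items = set(items)
        self.lookups = 0

    def contains(self, key):
        self.lookups += 1
        return key in self.items

    def add(self, key):
        self.items.add(key)


def check_email(email, cache, database):
    # Same flow as above: cache first, then database; only positive hits are cached.
    if cache.contains(email):
        return True
    if database.contains(email):
        cache.add(email)
        return True
    return False


database = CountingStore({"alice@example.com"})
cache = CountingStore()

# Cache miss: a valid email costs one database lookup, then is served from the cache.
check_email("alice@example.com", cache, database)
check_email("alice@example.com", cache, database)
valid_db_lookups = database.lookups  # 1

# Cache penetration: a non-existent email reaches the database on every request.
for _ in range(1000):
    check_email("ghost@example.com", cache, database)
penetrating_db_lookups = database.lookups - valid_db_lookups  # 1000
```

Two requests for a valid email cost a single database lookup, while 1,000 requests for a non-existent one cost 1,000.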

The Bloom Filter Solution

Bloom Checker solves this by adding a Bloom Filter as a preliminary check:

Client → Bloom Filter → Cache → Database

When checking an email:

  • If Bloom Filter says "No" → Email definitely doesn't exist (stop here)
  • If Bloom Filter says "Yes" → Email might exist (proceed to cache/database)
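The two-outcome check above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the `BloomFilter` class, its `might_contain` method, the SHA-256 double-hashing scheme, and the use of plain sets for the cache and database are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, for illustration only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def check_email(email, bloom, cache, database):
    if not bloom.might_contain(email):
        return False  # Bloom filter says "No": skip cache and database entirely
    if email in cache:
        return True
    if email in database:
        cache.add(email)
        return True
    return False  # false positive from the filter, resolved by the database


# Plain sets stand in for the real cache and database.
database = {"alice@example.com", "bob@example.com"}
cache = set()
bloom = BloomFilter(num_bits=1024, num_hashes=7)
for address in database:
    bloom.add(address)
```

Every address added to the filter is guaranteed to pass `might_contain`, so a "No" is always safe to trust. A "Yes" still has to be confirmed by the cache or database.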

Real-world example:

# Without Bloom Filter:
check_email("[email protected]")  # Cache miss + DB query wasted
check_email("[email protected]") # Cache miss + DB query wasted
check_email("[email protected]") # Cache miss + DB query wasted

# With Bloom Checker:
check_email("[email protected]")  # Bloom Filter: No (stops here)
check_email("[email protected]") # Bloom Filter: No (stops here)
check_email("[email protected]") # Bloom Filter: No (stops here)

Benefits:

  • Protects against DoS attacks using non-existent emails
  • Reduces unnecessary database load
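  • Extremely memory efficient (10 million emails ≈ 15MB of memory, i.e. about 12 bits per email, which corresponds to a false positive rate of roughly 0.3% for an optimally sized filter)
  • Quick response times (O(k) per lookup, where k is the number of hash functions, independent of how many emails are stored)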
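The memory figure depends on the target false positive rate. The standard Bloom filter sizing formulas are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n items and false positive rate p. The sketch below applies them to 10 million emails; the function names are hypothetical and are not taken from the project's code.

```python
import math


def optimal_parameters(num_items, false_positive_rate):
    """Return (num_bits, num_hashes) from the standard Bloom filter formulas."""
    num_bits = math.ceil(-num_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes


def expected_false_positive_rate(num_items, num_bits, num_hashes):
    """Approximate false positive rate once num_items have been inserted."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


n = 10_000_000
bits_1pct, hashes_1pct = optimal_parameters(n, 0.01)      # about 12 MB, k = 7
bits_01pct, hashes_01pct = optimal_parameters(n, 0.001)   # about 18 MB, k = 10

for rate, bits, hashes in ((0.01, bits_1pct, hashes_1pct), (0.001, bits_01pct, hashes_01pct)):
    print(f"p={rate}: {bits / 8 / 1e6:.1f} MB, k={hashes}")
```

A tighter false positive rate costs more memory and more hash computations per lookup, so the rate is best chosen from how expensive a wasted cache or database query is.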

Key Features

  • Fast Email Verification: Quickly checks whether an email is probably in the database or definitely not.
  • Bloom Filter Algorithm: Implements the space-efficient probabilistic data structure to minimize memory usage.
  • Low False Positive Rate: Configurable false positive rates to suit different application needs.
  • Customizable Parameters: Adjust the size of the Bloom Filter and the number of hash functions based on the dataset size.
  • Graphical User Interface (GUI): Intuitive and easy-to-use interface built with Tkinter.
  • File Input: Supports CSV files for email lists and results display.

Installation

  1. Clone the repository:

    git clone https://github.com/Yamil-Serrano/Bloom-Checker.git
  2. Navigate to the project directory:

    cd Bloom-Checker
  3. Install required dependencies:

    pip install -r requirements.txt

Usage

  1. Run the application:

    python main.py
  2. Use the interface to:

    • Select the initial database CSV file.
    • Select the verification CSV file.
    • View the verification results in the interface, with color-coded outputs:
      • Green: The email is probably in the database.
      • Red: The email is definitely not in the database.
  3. Adjust the false positive rate directly in the main.py script if needed.

Example CSV Format

Initial Database File

Email Address
[email protected]
[email protected]
[email protected]

Verification File

Email Address
[email protected]
[email protected]
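Files in this format can be read with Python's standard `csv` module. The sketch below assumes the header is exactly `Email Address`, as in the examples above. The `load_emails` helper, the lowercase normalization, and the sample addresses are illustrative assumptions rather than the project's own code.

```python
import csv
import io


def load_emails(csv_file, column="Email Address"):
    """Read one email per row from an open CSV file, skipping empty values."""
    reader = csv.DictReader(csv_file)
    # Trimming and lowercasing is an assumption; the project may compare emails as-is.
    return [row[column].strip().lower() for row in reader if row.get(column)]


# io.StringIO stands in for files opened with open(path, newline="", encoding="utf-8").
database_csv = io.StringIO("Email Address\nAlice@Example.com\nbob@example.com \n")
verification_csv = io.StringIO("Email Address\nalice@example.com\nghost@example.com\n")

known = set(load_emails(database_csv))  # a real run would populate the Bloom filter instead
results = [(email, email in known) for email in load_emails(verification_csv)]
```

Here `results` pairs each verification email with a flag that would map to the green or red output in the interface.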

Screenshot of the Interface

image

Icon Attribution

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Contact

For questions, suggestions, or contributions, please reach out via:
