Bloom Checker

Overview

Bloom Checker is a tool for quickly checking whether an email address (or other dataset item) may be present in a database. It uses a Bloom filter, a space-efficient probabilistic data structure, so lookups are fast and memory usage stays low even for large datasets. A "no" answer is always correct; a "yes" answer means the item is probably present.

Background & Problem Context

The Cache Penetration Problem

Imagine an email verification service that needs to check if millions of email addresses exist in a database. A common implementation might look like this:

```
def check_email(email):
    # First, check cache
    if cache.get(email):
        return True

    # If not in cache, check database
    if database.exists(email):
        cache.set(email, True)
        return True

    return False
```

This approach faces two significant challenges:

Cache Miss: When a valid email isn't in the cache but exists in the database:
```
Client → Cache (Miss) → Database (Found) → Update Cache
```
This costs one extra cache lookup, but only once: the result is then cached, so later requests for the same email are served from the cache.
Cache Penetration: When checking non-existent emails:
```
Client → Cache (Miss) → Database (Not Found) → No Cache Update
```
This becomes problematic because negative results are never cached:
- Attackers can deliberately query non-existent emails
- Every such query hits both the cache and the database
- System resources are wasted on queries that will always fail
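
To make the cost concrete, the sketch below replays the lookup pattern above with in-memory stand-ins and counts database hits. `CountingDatabase` and `DictCache` are hypothetical helpers invented for this illustration and are not part of Bloom Checker.

```python
class CountingDatabase:
    """Hypothetical in-memory stand-in for a database that counts lookups."""

    def __init__(self, emails):
        self._emails = set(emails)
        self.lookups = 0

    def exists(self, email):
        self.lookups += 1
        return email in self._emails


class DictCache:
    """Hypothetical stand-in for a cache exposing get/set."""

    def __init__(self):
        self._store = {}

    def get(self, email):
        return self._store.get(email)

    def set(self, email, value):
        self._store[email] = value


def check_email(email, cache, database):
    # Same cache-then-database logic as above, with dependencies passed in.
    if cache.get(email):
        return True
    if database.exists(email):
        cache.set(email, True)
        return True
    return False


database = CountingDatabase(["alice@example.com"])
cache = DictCache()

# A valid email costs one database lookup, then is served from the cache.
check_email("alice@example.com", cache, database)
check_email("alice@example.com", cache, database)
lookups_for_valid = database.lookups

# A non-existent email is never cached, so every repeat hits the database.
for _ in range(1000):
    check_email("nobody@example.com", cache, database)
lookups_for_invalid = database.lookups - lookups_for_valid
```

In this toy setup the repeated valid query reaches the database once, while each of the 1000 invalid queries reaches it every time.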

The Bloom Filter Solution

Bloom Checker solves this by adding a Bloom Filter as a preliminary check:

```
Client → Bloom Filter → Cache → Database
```

When checking an email:

- If Bloom Filter says "No" → Email definitely doesn't exist (stop here)
- If Bloom Filter says "Yes" → Email might exist (proceed to cache/database)

Real-world example:

```
# Without Bloom Filter:
check_email("[email protected]")  # Cache miss + DB query wasted
check_email("[email protected]") # Cache miss + DB query wasted
check_email("[email protected]") # Cache miss + DB query wasted

# With Bloom Checker:
check_email("[email protected]")  # Bloom Filter: No (stops here)
check_email("[email protected]") # Bloom Filter: No (stops here)
check_email("[email protected]") # Bloom Filter: No (stops here)
```
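
The flow above can be sketched as follows. This is a minimal illustration, not the code in the project's `bloom_filter.py`: the `BloomFilter` class, its `add` and `might_contain` methods, the SHA-256 double-hashing scheme, and the use of plain Python sets as the cache and database are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch; not the project's bloom_filter.py."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def check_email(email, bloom, cache, database):
    if not bloom.might_contain(email):
        return False  # definitely not stored: skip cache and database
    if email in cache:
        return True
    if email in database:
        cache.add(email)
        return True
    return False  # the filter reported a false positive


# Sets stand in for the real cache and database in this sketch.
database = {"alice@example.com", "bob@example.com"}
cache = set()
bloom = BloomFilter(num_bits=1024, num_hashes=7)
for email in database:
    bloom.add(email)

known = check_email("alice@example.com", bloom, cache, database)
unknown = check_email("nobody@example.com", bloom, cache, database)
```

Because an added item always sets all of its bits, a stored email can never be rejected by the filter. An unknown email is almost always rejected at the first step, and in the rare false-positive case the database check still returns the correct answer.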

Benefits:

- Protects against DoS attacks that use non-existent emails
- Reduces unnecessary database load
- Memory efficient (10 million emails ≈ 15 MB of memory, depending on the configured false positive rate)
- Fast lookups (O(k) per query, where k is the number of hash functions, regardless of dataset size)
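
As a rough check on the memory figure, the standard Bloom filter sizing formulas give the number of bits m = -n ln(p) / (ln 2)^2 and the number of hash functions k = (m / n) ln 2 for n items and a target false positive rate p. The helper below is a sketch of that arithmetic; the name `bloom_parameters` is hypothetical and the project may size its filter differently.

```python
import math


def bloom_parameters(num_items, false_positive_rate):
    """Return (num_bits, num_hashes) from the standard sizing formulas."""
    num_bits = math.ceil(
        -num_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes


for rate in (0.01, 0.001):
    bits, hashes = bloom_parameters(10_000_000, rate)
    print(f"p={rate}: {bits / 8 / 1_000_000:.1f} MB, {hashes} hash functions")
```

For 10 million items this works out to roughly 12 MB at a 1% false positive rate and roughly 18 MB at 0.1%, which brackets the 15 MB figure quoted above.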

Key Features

- Fast Email Verification: Quickly checks whether an email is probably in the database or definitely not.
- Bloom Filter Algorithm: Implements the space-efficient probabilistic data structure to minimize memory usage.
- Low False Positive Rate: Configurable false positive rates to suit different application needs.
- Customizable Parameters: Adjust the size of the Bloom Filter and the number of hash functions based on the dataset size.
- Graphical User Interface (GUI): Intuitive and easy-to-use interface built with Tkinter.
- File Input: Accepts CSV files as input for email lists and displays the results in the interface.

Installation

Clone the repository:

```
git clone https://github.com/Yamil-Serrano/Bloom-Checker.git
```

Navigate to the project directory:
```
cd Bloom-Checker
```
Install required dependencies:
```
pip install -r requirements.txt
```

Usage

Run the application:
```
python main.py
```
Use the interface to:
- Select the initial database CSV file.
- Select the verification CSV file.
- View the verification results in the interface, with color-coded outputs:
  - Green: The email is probably in the database.
  - Red: The email is definitely not in the database.
Adjust the false positive rate directly in the main.py script if needed.

Example CSV Format

Initial Database File

Email Address
[email protected]
[email protected]
[email protected]

Verification File

Email Address
[email protected]
[email protected]
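
Files in this format can be read with Python's standard `csv` module. The sketch below assumes a single column with the header `Email Address`, as shown above; the function name `load_emails` and the sample addresses are hypothetical, and the project's own loading code may differ.

```python
import csv
import io


def load_emails(csv_file):
    """Read the 'Email Address' column from an open CSV file, skipping blanks."""
    reader = csv.DictReader(csv_file)
    emails = []
    for row in reader:
        value = (row.get("Email Address") or "").strip()
        if value:
            emails.append(value)
    return emails


# In-memory sample; for a real file, use open(path, newline="", encoding="utf-8").
sample = io.StringIO("Email Address\nalice@example.com\nbob@example.com\n")
emails = load_emails(sample)
```

Each address loaded from the initial database file would then be added to the filter, and each address from the verification file would be tested against it.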

Screenshot of the Interface

Icon Attribution

Lotus flower icons created by Freepik - Flaticon
File icons created by Good Ware - Flaticon

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

Contact

For questions, suggestions, or contributions, please reach out via:

GitHub: Neowizen
