Enhance test for util function is_np_dict_equal #4092

Open
wants to merge 6 commits into base: master

DanielYang59
Contributor

Summary

@DanielYang59
Contributor Author

The experimental implementation (with try-except) is slower than the one using np.array_equal at every array size except 10, by roughly 8% to 74%:

Array size: 10
is_np_dict_equal: 3.38 µs
is_np_dict_equal_try_except: 2.03 µs
--------------------------------------------------
Array size: 100
is_np_dict_equal: 1.64 µs
is_np_dict_equal_try_except: 1.77 µs
--------------------------------------------------
Array size: 1000
is_np_dict_equal: 1.86 µs
is_np_dict_equal_try_except: 2.29 µs
--------------------------------------------------
Array size: 10000
is_np_dict_equal: 4.20 µs
is_np_dict_equal_try_except: 7.01 µs
--------------------------------------------------
Array size: 100000
is_np_dict_equal: 25.15 µs
is_np_dict_equal_try_except: 43.69 µs
--------------------------------------------------
Array size: 1000000
is_np_dict_equal: 330.77 µs
is_np_dict_equal_try_except: 424.17 µs
--------------------------------------------------
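
For context, the two strategies being timed could look roughly like the sketch below. This is a hypothetical illustration, not the code in this PR: the names `dict_equal_array_equal` and `dict_equal_try_except` are made up, and the real `is_np_dict_equal` variants may differ.

```python
import numpy as np


def dict_equal_array_equal(d1, d2):
    """Compare every value with np.array_equal (hypothetical sketch)."""
    if d1.keys() != d2.keys():
        return False
    return all(np.array_equal(d1[key], d2[key]) for key in d1)


def dict_equal_try_except(d1, d2):
    """Try a plain != first, fall back to np.array_equal (hypothetical sketch)."""
    if d1.keys() != d2.keys():
        return False
    for key in d1:
        v1, v2 = d1[key], d2[key]
        try:
            # Fine for scalars and strings; comparing multi-element arrays
            # yields an array, and coercing it to bool raises ValueError.
            if v1 != v2:
                return False
        except ValueError:
            if not np.array_equal(v1, v2):
                return False
    return True
```

In this sketch the try-except variant pays for a failed elementwise comparison before falling back, which is one plausible reason for it being slower on large arrays.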

I'm still not sure which dtype would cause np.array_equal to fail. Any input would be hugely appreciated.
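
One case worth ruling out, although it changes the result rather than raising: for floating-point arrays containing NaN, `np.array_equal` returns False by default because NaN does not compare equal to itself, unless `equal_nan=True` is passed. The snippet below is only a small illustration and may not be the failure this question is after.

```python
import numpy as np

a = np.array([1.0, np.nan])
b = a.copy()

# Default behaviour: NaN != NaN, so the arrays are reported as unequal
default_result = np.array_equal(a, b)

# With equal_nan=True, NaNs in matching positions are treated as equal
nan_aware_result = np.array_equal(a, b, equal_nan=True)

print(default_result, nan_aware_result)
```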

Benchmark script (GPT)
import timeit
import numpy as np
from pymatgen.util.misc import is_np_dict_equal, is_np_dict_equal_try_except


# Function to run the benchmarks
def benchmark(func, dict1, dict2, runs=10):
    timings = timeit.repeat(lambda: func(dict1, dict2), repeat=runs, number=1)
    avg_time = (sum(timings) / runs) * 1e6  # Convert to microseconds
    return avg_time

# Function to generate dictionaries with np arrays of different sizes
def generate_dict(size):
    arr = np.random.rand(size)
    dict1 = {"a": arr}
    dict2 = {"a": arr.copy()}  # Ensure they are equal but independent
    return dict1, dict2

# Sizes to test
sizes = [10, 100, 1000, 10_000, 100_000, 1_000_000]

# Benchmark each size
for size in sizes:
    dict1, dict2 = generate_dict(size)

    # Benchmark is_np_dict_equal
    avg_time_equal = benchmark(is_np_dict_equal, dict1, dict2)

    # Benchmark is_np_dict_equal_try_except
    avg_time_equal_try_except = benchmark(is_np_dict_equal_try_except, dict1, dict2)

    print(f"Array size: {size}")
    print(f"is_np_dict_equal: {avg_time_equal:.2f} µs")
    print(f"is_np_dict_equal_try_except: {avg_time_equal_try_except:.2f} µs")
    print("-" * 50)
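
Because each measurement above times a single call (`number=1`), results at the microsecond scale can be dominated by timer and first-call overhead, which may explain the odd ordering at size 10. A hedged alternative is to run many calls per repeat and report the minimum, as the `timeit` documentation suggests. The helper name `benchmark_min` and its defaults are assumptions, and `array_dict_equal` is a stand-in for the pymatgen functions so the snippet runs on its own.

```python
import timeit

import numpy as np


def benchmark_min(func, dict1, dict2, repeat=5, number=100):
    """Return the best per-call time in microseconds (hypothetical helper)."""
    timings = timeit.repeat(lambda: func(dict1, dict2), repeat=repeat, number=number)
    # Each timing covers `number` calls, so divide to get a per-call figure
    return min(timings) / number * 1e6


def array_dict_equal(d1, d2):
    # Stand-in for the functions under test, not the pymatgen implementation
    return d1.keys() == d2.keys() and all(np.array_equal(d1[k], d2[k]) for k in d1)


arr = np.random.rand(1000)
best_us = benchmark_min(array_dict_equal, {"a": arr}, {"a": arr.copy()})
print(f"best of 5: {best_us:.2f} µs per call")
```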

DanielYang59 marked this pull request as ready for review on October 13, 2024 05:35