Enhance test for util function `is_np_dict_equal` #4092

DanielYang59 · 2024-10-03T03:19:04Z

Summary

Enhance the test for is_np_dict_equal to cover different data types. Follow-up to "Fix dict equality check with numpy array" #4086.
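
As a rough illustration of the kind of cases such a test can cover, the sketch below runs a dict comparison over several value types. `np_dict_equal_sketch` is a hypothetical stand-in written for this example, not the pymatgen implementation, and the listed cases are assumptions rather than the tests added in this PR.

```python
import numpy as np


def np_dict_equal_sketch(dict1, dict2):
    """Hypothetical stand-in: compare dicts whose values may be NumPy arrays."""
    if dict1.keys() != dict2.keys():
        return False
    return all(np.array_equal(dict1[key], dict2[key]) for key in dict1)


# Value pairs expected to compare equal (illustrative choices only).
equal_cases = [
    (np.array([1, 2, 3]), np.array([1, 2, 3])),  # int arrays
    (np.array([1.0, 2.5]), np.array([1.0, 2.5])),  # float arrays
    (np.array(["a", "b"]), np.array(["a", "b"])),  # string arrays
    (np.array([True, False]), np.array([True, False])),  # bool arrays
    ([1, 2, 3], [1, 2, 3]),  # plain lists
    ("text", "text"),  # plain strings
    (1.5, 1.5),  # scalars
]

# Value pairs expected to compare unequal.
unequal_cases = [
    (np.array([1, 2, 3]), np.array([1, 2, 4])),  # one element differs
    (np.array([1, 2, 3]), np.array([1, 2])),  # shapes differ
    ("text", "other"),  # different strings
]

all_equal = all(np_dict_equal_sketch({"a": x}, {"a": y}) for x, y in equal_cases)
any_equal = any(np_dict_equal_sketch({"a": x}, {"a": y}) for x, y in unequal_cases)

# Plain dict equality is ambiguous for multi-element arrays and raises.
try:
    plain_result = {"a": np.array([1, 2])} == {"a": np.array([1, 2])}
except ValueError:
    plain_result = None
```

The last few lines show why a dedicated helper is useful at all: comparing two dicts with `==` asks NumPy for the truth value of an element-wise result, which raises `ValueError` for arrays with more than one element.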

DanielYang59 · 2024-10-03T03:26:57Z

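The experimental implementation (with try-except) is slower than using np.array_equal for arrays of 100 elements or more, by up to about 1.7x. Only the size-10 case goes the other way, which may reflect warm-up noise in the first measurement: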

Array size: 10
is_np_dict_equal: 3.38 µs
is_np_dict_equal_try_except: 2.03 µs
--------------------------------------------------
Array size: 100
is_np_dict_equal: 1.64 µs
is_np_dict_equal_try_except: 1.77 µs
--------------------------------------------------
Array size: 1000
is_np_dict_equal: 1.86 µs
is_np_dict_equal_try_except: 2.29 µs
--------------------------------------------------
Array size: 10000
is_np_dict_equal: 4.20 µs
is_np_dict_equal_try_except: 7.01 µs
--------------------------------------------------
Array size: 100000
is_np_dict_equal: 25.15 µs
is_np_dict_equal_try_except: 43.69 µs
--------------------------------------------------
Array size: 1000000
is_np_dict_equal: 330.77 µs
is_np_dict_equal_try_except: 424.17 µs
--------------------------------------------------

Still not sure what dtype would cause np.array_equal to fail. Input would be hugely appreciated.
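
For readers without the branch at hand, the two approaches being timed could look roughly like the sketch below. Both function names and bodies are assumptions made for illustration; the experimental function in this PR may have been written differently.

```python
import numpy as np


def dict_equal_array_equal(dict1, dict2):
    """Sketch: compare every value with np.array_equal (assumed approach)."""
    if dict1.keys() != dict2.keys():
        return False
    return all(np.array_equal(dict1[key], dict2[key]) for key in dict1)


def dict_equal_try_except(dict1, dict2):
    """Sketch: try plain ==, fall back to np.array_equal (hypothetical)."""
    if dict1.keys() != dict2.keys():
        return False
    for key in dict1:
        try:
            # Taking the truth value of a multi-element array comparison
            # raises ValueError, which triggers the fallback below.
            if not dict1[key] == dict2[key]:
                return False
        except ValueError:
            if not np.array_equal(dict1[key], dict2[key]):
                return False
    return True


# Usage: both sketches agree on equal and unequal inputs.
a = {"x": np.arange(5), "y": "label"}
b = {"x": np.arange(5), "y": "label"}
c = {"x": np.arange(5) + 1, "y": "label"}
print(dict_equal_array_equal(a, b), dict_equal_try_except(a, b))
print(dict_equal_array_equal(a, c), dict_equal_try_except(a, c))
```

In this sketch the try-except variant compares array values twice (once with `==`, once in the fallback) and pays for raising an exception, which would be consistent with the timings above, though that is only one possible explanation.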

Benchmark script (GPT)

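import timeit
import numpy as np

# Note: is_np_dict_equal_try_except was experimental and was later removed
# from this branch, so this import only works on the commit that added it.
from pymatgen.util.misc import is_np_dict_equal, is_np_dict_equal_try_except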


# Function to run the benchmarks
def benchmark(func, dict1, dict2, runs=10):
    timings = timeit.repeat(lambda: func(dict1, dict2), repeat=runs, number=1)
    avg_time = (sum(timings) / runs) * 1e6  # Convert to microseconds
    return avg_time

# Function to generate dictionaries with np arrays of different sizes
def generate_dict(size):
    arr = np.random.rand(size)
    dict1 = {"a": arr}
    dict2 = {"a": arr.copy()}  # Ensure they are equal but independent
    return dict1, dict2

# Sizes to test
sizes = [10, 100, 1000, 10_000, 100_000, 1_000_000]

# Benchmark each size
for size in sizes:
    dict1, dict2 = generate_dict(size)

    # Benchmark is_np_dict_equal
    avg_time_equal = benchmark(is_np_dict_equal, dict1, dict2)

    # Benchmark is_np_dict_equal_try_except
    avg_time_equal_try_except = benchmark(is_np_dict_equal_try_except, dict1, dict2)

    print(f"Array size: {size}")
    print(f"is_np_dict_equal: {avg_time_equal:.2f} µs")
    print(f"is_np_dict_equal_try_except: {avg_time_equal_try_except:.2f} µs")
    print("-" * 50)
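
The script above times a single call per run (`number=1`) and averages ten runs, so small sizes are sensitive to noise. A less noise-sensitive variant, following the `timeit` documentation's suggestion to look at the minimum of the repeats, might look like the sketch below. `benchmark_min`, its defaults, and the stand-in comparison function are all hypothetical and not part of this PR.

```python
import timeit

import numpy as np


def array_dict_equal(dict1, dict2):
    """Stand-in comparison function so the sketch runs without pymatgen."""
    if dict1.keys() != dict2.keys():
        return False
    return all(np.array_equal(dict1[key], dict2[key]) for key in dict1)


def benchmark_min(func, dict1, dict2, repeat=5, number=100):
    """Return the best per-call time in microseconds over `repeat` batches."""
    timings = timeit.repeat(lambda: func(dict1, dict2), repeat=repeat, number=number)
    # Each entry is the total time for `number` calls, so divide to get one call.
    return min(timings) / number * 1e6


arr = np.random.rand(1000)
best_us = benchmark_min(array_dict_equal, {"a": arr}, {"a": arr.copy()})
print(f"Array size: {arr.size}, best per-call time: {best_us:.2f} µs")
```

Batching many calls per measurement amortizes timer overhead, and taking the minimum discards runs disturbed by other processes.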

DanielYang59 added 2 commits October 3, 2024 11:11

add unit test for more data types

18d9d53

Add experimental is_np_dict_equal_try_except function

dfdf737

DanielYang59 added 3 commits October 10, 2024 12:28

Merge branch 'master' into enhance-test-np-dict-equal

8215660

remove experimental implementation for now

843bb2d

Merge branch 'master' into enhance-test-np-dict-equal

c8f2715

DanielYang59 marked this pull request as ready for review October 13, 2024 05:35

DanielYang59 requested review from shyuep and mkhorton as code owners October 13, 2024 05:35

Merge branch 'master' into enhance-test-np-dict-equal

e20103d
