close_already - speeding up programs writing lots of files on Windows

Closing files on Windows is slow, taking 1-10 milliseconds compared to microseconds on macOS, Linux, and friends. The "why?" is explained in this blog post by Gregory Szorc, which also suggests using thread pools to handle the closing of file handles on Windows. This is exactly what this crate implements, while being as unintrusive to the developer as possible. While not using this crate specifically, there are case studies in both rustup and Mercurial where this technique has massively improved performance

Should I use it?

If you're writing hundreds or more of relatively small files, you would most likely benefit from close_already. It's designed to be easy to switch to and use, so try it out and benchmark it! Note that if your code already uses multiple threads/cores to handle files (e.g. with rayon), your performance gains will be far more modest

Compatibility

Each listed backend comes with a corresponding feature backend-<name>. To use a non-default backend, set default-features = false and enable the corresponding backend-<name> feature

Supported backends:

  • threadpool - default, creates and uses its own OS-thread thread pool
  • blocking - uses blocking's thread pool
  • rayon - uses rayon's global thread pool
  • async-std - uses async-std's global executor. async_std's File is supported
  • smol - uses smol's global executor. smol's File is supported
  • tokio - uses tokio's global executor. tokio's File is supported. Enables the rt and fs features

How do I use it?

To add it to your project using the default threadpool backend:

cargo add close_already

Or with a different backend (see compatibility for available backends):

cargo add close_already -F backend-<name> --no-default-features

You can either construct a FastClose with FastClose::new, or take advantage of the FastCloseable trait and call .fast_close() to wrap your type. The File type of the standard library and any backends that provide an alternative are supported. That's it.

Or if you're more of a std::fs::read and std::fs::write user, then all the functions that can take advantage of close_already have been re-implemented in the fs module

What if I'm not always targeting/developing on Windows?

Not a problem! On other platforms, FastClose simply doesn't create a thread pool or send file closures to one, but all the same structs/methods/traits are still available, so you don't need conditional compilation #[cfg]s everywhere

How does close_already work?

As explained, the basic principle is to provide a threadpool which handles file closures

This implementation uses a thin wrapper type, FastClose, that adds no memory overhead (woo!). Its custom Drop implementation sends the file handle to a thread pool when it's no longer needed, so that multiple threads can parallelise the waiting time for file closures. The thread pool is lazily initialised when the first FastClose is dropped (using the newly stabilised OnceLock)*

The FastClose struct implements Deref and DerefMut, meaning you can completely ignore its existence for all intents and purposes, and then let the magic happen as it goes out of scope

The best part is how concise the solution is to implement, with the core logic taking under 30 lines. Most of the bulk comes from delegating trait implementations and providing equivalents of the standard library's convenience functions

(* on non-threadpool backends, the global thread pool / executor is used)
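As a rough illustration of the approach described above, here is a std-only sketch. It is not the crate's source: the DeferredClose name, the single worker thread standing in for a thread pool, and the use of ManuallyDrop are all assumptions made for the example.

```rust
use std::mem::ManuallyDrop;
use std::ops::{Deref, DerefMut};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical stand-in for a FastClose-style wrapper (illustration only).
/// `ManuallyDrop<H>` has the same size as `H`, so the wrapper adds no memory.
pub struct DeferredClose<H: Send + 'static>(ManuallyDrop<H>);

impl<H: Send + 'static> DeferredClose<H> {
    pub fn new(handle: H) -> Self {
        Self(ManuallyDrop::new(handle))
    }
}

// Deref/DerefMut let callers use the wrapper as if it were the inner handle.
impl<H: Send + 'static> Deref for DeferredClose<H> {
    type Target = H;
    fn deref(&self) -> &H {
        &self.0
    }
}

impl<H: Send + 'static> DerefMut for DeferredClose<H> {
    fn deref_mut(&mut self) -> &mut H {
        &mut self.0
    }
}

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Lazily starts one background worker the first time it is needed.
/// (Assumption: a single thread keeps the sketch short; a pool would use several.)
fn closer() -> &'static Mutex<Sender<Job>> {
    static CLOSER: OnceLock<Mutex<Sender<Job>>> = OnceLock::new();
    CLOSER.get_or_init(|| {
        let (tx, rx) = channel::<Job>();
        thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        Mutex::new(tx)
    })
}

impl<H: Send + 'static> Drop for DeferredClose<H> {
    fn drop(&mut self) {
        // SAFETY: the field is taken exactly once, here, and never used again.
        let handle = unsafe { ManuallyDrop::take(&mut self.0) };
        let job: Job = Box::new(move || drop(handle));
        // The lock guard is released at the end of this statement.
        let sent = closer().lock().unwrap().send(job);
        if let Err(unsent) = sent {
            // Worker is gone: fall back to closing on the current thread.
            (unsent.0)();
        }
    }
}

fn main() -> std::io::Result<()> {
    use std::io::Write;
    let path = std::env::temp_dir().join("deferred_close_demo.txt");
    let mut file = DeferredClose::new(std::fs::File::create(&path)?);
    file.write_all(b"hello")?; // reaches File's Write impl through DerefMut
    drop(file); // the actual close is handed to the worker thread
    println!("wrote {}", path.display());
    Ok(())
}
```

Note that the process may exit before the worker gets to the close; for plain files that is harmless, since the operating system releases handles on exit.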

Does it work?

Synthetic benchmarks

Below are the pure write performance times on my machine (Ryzen 5600, Sabrent Rocket 4 NVMe SSD) against the non-async backends. The benchmark involved writing the ~2300 .glif files from within the Roboto Regular UFO

Writing/std::fs/Roboto-Regular.ufo
                        time:   [1.4257 s 1.4484 s 1.4712 s]
Writing/close_already blocking/Roboto-Regular.ufo
                        time:   [1.3094 s 1.3155 s 1.3223 s]
Writing/close_already rayon/Roboto-Regular.ufo
                        time:   [1.2031 s 1.2134 s 1.2241 s]
Writing/close_already threadpool/Roboto-Regular.ufo
                        time:   [1.2057 s 1.2143 s 1.2241 s]

In summary, you can expect a 9-16% decrease in write times, though this of course will depend on the workload
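The 9-16% range can be reproduced from the middle value of each interval reported above. The snippet below is only a worked calculation; the percent_decrease helper is a name made up for this example.

```rust
/// Percentage decrease going from `baseline` to `new` (both in seconds).
fn percent_decrease(baseline: f64, new: f64) -> f64 {
    (baseline - new) / baseline * 100.0
}

fn main() {
    // Middle estimates copied from the benchmark output above.
    let std_fs = 1.4484;
    let backends = [("blocking", 1.3155), ("rayon", 1.2134), ("threadpool", 1.2143)];
    for (backend, secs) in backends {
        println!(
            "{backend}: {:.1}% less time than std::fs",
            percent_decrease(std_fs, secs)
        );
    }
}
```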

Case study: norad

norad is a library that supports the Unified Font Object standard, a source file format for fonts notorious for having a very large number of files. For example, check out Roboto Regular, the example 'decent size' font used in the below benchmark

Comparing single-threaded norad (i.e. default features) with and without close_already:

norad (default):
write Roboto-Regular.ufo
                        time:   [2.0756 s 2.0973 s 2.1211 s]
norad (default) + close_already (threadpool):
write Roboto-Regular.ufo
                        time:   [975.15 ms 1.0152 s 1.0596 s]

Twice as fast!

How about an already-multi-threaded workload? norad has opt-in rayon support:

norad (rayon):
write Roboto-Regular.ufo
                        time:   [867.16 ms 922.49 ms 985.35 ms]
norad (rayon) + close_already (rayon):
write Roboto-Regular.ufo
                        time:   [831.17 ms 871.48 ms 915.87 ms]

Still around 5% faster going by the middle estimates (922.49 ms vs 871.48 ms), despite the 2x speed-up norad already gained from using rayon!

You can run the numbers yourself on my fork using cargo bench and the before/after tags

close_already is being used in norad as of v0.14 for all workloads

Contributing

There's a Justfile for ease of running checks & tests across multiple backends. It requires cargo-hack to be installed, and the x86_64-pc-windows-msvc target for your toolchain. Run just to see available recipes

Please ensure your code is formatted with nightly rustfmt and there are no Clippy lints for any backend when submitting your PR

I want to add support for _____ backend!

Go for it! Put it behind a feature gate, add the feature name to the mutually_exclusive_features::exactly_one_of! block at the top of lib.rs, and then add a new definition of Drop::drop for windows::FastClose that's enabled by your feature flag. If you're lazily initialising your own thread pool / executor, you'll naturally need a static OnceLock as well, the same as how backend-threadpool works. That's it!

In the case of async backends that provide their own file types, you may also want to implement FastCloseable on that type, and forward any relevant traits (e.g. Async{Read,Seek,Write}). See mod smol_impls for an example

I want to add support for _____ trait that I need!

Go for it! Make sure the generic bounds include H: Send + 'static, and it should work out just fine. If the trait you're adding support for is not part of the standard library (or is on nightly), please put it behind a feature gate (default off)
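To give a feel for what forwarding a trait looks like, here is a minimal sketch using std's Read. The Wrapper type and count_bytes function are hypothetical stand-ins written for this example, not the crate's own definitions.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical stand-in for the crate's wrapper, reduced to a plain newtype.
pub struct Wrapper<H: Send + 'static>(H);

// Forward `Read` to the inner handle. The `Send + 'static` bounds repeat the
// ones on the struct; without them this impl would not compile.
impl<H: Read + Send + 'static> Read for Wrapper<H> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

/// A generic consumer that only accepts `R: Read`, to show why the impl matters.
pub fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut sink = Vec::new();
    reader.read_to_end(&mut sink)
}

fn main() -> io::Result<()> {
    let wrapped = Wrapper(Cursor::new(b"glif".to_vec()));
    println!("{} bytes read", count_bytes(wrapped)?); // prints "4 bytes read"
    Ok(())
}
```

Method calls already reach the inner type through Deref, so an explicit impl like this mainly matters when the wrapper itself has to satisfy a trait bound, as with count_bytes here.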

License

MIT or Apache 2, at your option (the same as Rust itself)