Closing files on Windows is slow, taking 1-10 milliseconds compared to microseconds on macOS, Linux, and friends. The "why?" is explained in this blog post by Gregory Szorc, which also suggests using thread pools to handle the closing of file handles on Windows. This is exactly what this crate implements, while being as unintrusive to the developer as possible. While not using this crate specifically, there are case studies in both rustup and Mercurial where this technique has massively improved performance.
If you're writing hundreds or more of relatively small files, you would most likely benefit from close_already.
It's designed to be easy to switch to and use, so try it out and benchmark it!
Note that if your code is already trying to use multiple threads/cores to handle files (e.g. with rayon), your performance gains will be far more modest.
Each listed backend comes with a corresponding feature backend-<name>.
To use a non-default backend, set default-features = false and enable the corresponding backend-<name> feature.
Supported backends:
threadpool - default, creates and uses its own OS-thread thread pool
blocking - uses blocking's thread pool
rayon - uses rayon's global thread pool
async-std - uses async-std's global executor. async_std's File is supported
smol - uses smol's global executor. smol's File is supported
tokio - uses tokio's global executor. tokio's File is supported. Enables the rt and fs features
To add it to your project using the default threadpool backend:
cargo add close_already
Or with a different backend (see compatibility for available backends):
cargo add close_already -F backend-<name> --no-default-features
You can either construct a FastClose with FastClose::new, or take advantage of the FastCloseable trait and call .fast_close() to wrap your type.
The standard library's File type is supported, as are the alternative File types provided by backends.
That's it.
Or if you're more of a std::fs::read and std::fs::write user, then all the functions that can take advantage of close_already have been re-implemented in the fs module.
Not on Windows? Not a problem!
FastClose simply won't create/use a threadpool and send file closures to it, but all the same structs/methods/traits will be available, so you don't need conditional compilation #[cfg]s everywhere.
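One way a wrapper can stay available on every platform while only changing behaviour on Windows is to gate just the Drop implementation. The following is a minimal sketch under that assumption, not the crate's source: PortableClose is a hypothetical name, and spawning a thread per file is a placeholder for handing the file to a real thread pool.

```rust
use std::fs::File;
use std::ops::{Deref, DerefMut};

/// Hypothetical wrapper: the same type and methods exist on every platform.
pub struct PortableClose(Option<File>);

impl PortableClose {
    pub fn new(file: File) -> Self {
        Self(Some(file))
    }
}

impl Deref for PortableClose {
    type Target = File;
    fn deref(&self) -> &File {
        self.0.as_ref().expect("file is only taken in drop")
    }
}

impl DerefMut for PortableClose {
    fn deref_mut(&mut self) -> &mut File {
        self.0.as_mut().expect("file is only taken in drop")
    }
}

// Only Windows gets custom drop logic. On other platforms this impl is
// compiled out, and the inner `File` is closed inline by its own destructor.
#[cfg(windows)]
impl Drop for PortableClose {
    fn drop(&mut self) {
        if let Some(file) = self.0.take() {
            // Placeholder: a real implementation would reuse a thread pool
            // rather than spawning a new thread for every file.
            std::thread::spawn(move || drop(file));
        }
    }
}
```

Calling code uses PortableClose identically everywhere; the only #[cfg] lives next to the Drop implementation.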
As explained, the basic principle is to provide a thread pool which handles file closures.
This implementation uses a thin wrapper type FastClose (no memory overhead, woo!) with a custom Drop implementation, which sends the file handle to a thread pool when it's no longer needed, allowing multiple threads to parallelise the waiting time for file closures.
The thread pool is lazily initialised when the first FastClose is dropped (using the newly stabilised OnceLock)*
The FastClose struct implements Deref and DerefMut, meaning you can completely ignore its existence for all intents and purposes, and then let the magic happen as it goes out of scope.
The best part is how concise the solution is to implement: the basic core logic takes under 30 lines, with most of the bulk coming from delegating trait implementations and providing standard library convenience function equivalents.
(* on non-threadpool backends, the global thread pool / executor is used)
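The mechanism described above can be sketched with the standard library alone. This is a simplified illustration rather than the crate's actual code: BackgroundClose, closer, and write_then_close_in_background are hypothetical names, a single worker thread stands in for a real thread pool, and an Option is used to move the handle out in drop using only safe code.

```rust
use std::fs::File;
use std::io::Write;
use std::ops::{Deref, DerefMut};
use std::path::Path;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical wrapper: owns a handle and queues it for closing when dropped.
pub struct BackgroundClose<H: Send + 'static>(Option<H>);

/// Lazily starts the worker the first time a wrapper is dropped.
/// Handles are boxed so that any `Send` type can share one queue.
fn closer() -> &'static Mutex<Sender<Box<dyn Send>>> {
    static CLOSER: OnceLock<Mutex<Sender<Box<dyn Send>>>> = OnceLock::new();
    CLOSER.get_or_init(|| {
        let (tx, rx) = channel::<Box<dyn Send>>();
        thread::spawn(move || {
            // Dropping each received handle is what actually closes it.
            for handle in rx {
                drop(handle);
            }
        });
        Mutex::new(tx)
    })
}

impl<H: Send + 'static> BackgroundClose<H> {
    pub fn new(handle: H) -> Self {
        Self(Some(handle))
    }
}

impl<H: Send + 'static> Deref for BackgroundClose<H> {
    type Target = H;
    fn deref(&self) -> &H {
        self.0.as_ref().expect("handle is only taken in drop")
    }
}

impl<H: Send + 'static> DerefMut for BackgroundClose<H> {
    fn deref_mut(&mut self) -> &mut H {
        self.0.as_mut().expect("handle is only taken in drop")
    }
}

impl<H: Send + 'static> Drop for BackgroundClose<H> {
    fn drop(&mut self) {
        if let Some(handle) = self.0.take() {
            if let Ok(sender) = closer().lock() {
                // If the worker has gone away, the failed send hands the
                // handle back inside the error, so it is closed here instead.
                let _ = sender.send(Box::new(handle));
            }
        }
    }
}

/// Writes through the wrapper, then lets `Drop` queue the close.
pub fn write_then_close_in_background(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let mut file = BackgroundClose::new(File::create(path)?);
    file.write_all(contents)?; // `DerefMut` exposes `File`'s own methods
    Ok(())
}
```

Because the wrapper derefs to the inner handle, the write itself looks exactly like ordinary File code; only the close is moved off the calling thread.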
Below are the pure write times on my machine (Ryzen 5600, Sabrent Rocket 4 NVMe SSD) for the non-async backends. The benchmark involved writing the ~2300 .glif files from within the Roboto Regular UFO.
Writing/std::fs/Roboto-Regular.ufo
time: [1.4257 s 1.4484 s 1.4712 s]
Writing/close_already blocking/Roboto-Regular.ufo
time: [1.3094 s 1.3155 s 1.3223 s]
Writing/close_already rayon/Roboto-Regular.ufo
time: [1.2031 s 1.2134 s 1.2241 s]
Writing/close_already threadpool/Roboto-Regular.ufo
time: [1.2057 s 1.2143 s 1.2241 s]
In summary, you can expect a 9-16% decrease in write times, though this will of course depend on the workload.
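For reference, the percentage range can be reproduced from the median (middle) values of the benchmark output above. The helper below is only an illustrative sketch; the function and constant names are made up for this example.

```rust
/// Percentage decrease of `new` relative to `baseline`.
pub fn percent_decrease(baseline: f64, new: f64) -> f64 {
    (baseline - new) / baseline * 100.0
}

// Median times in seconds, copied from the benchmark output above.
pub const STD_FS: f64 = 1.4484;
pub const BLOCKING: f64 = 1.3155;
pub const RAYON: f64 = 1.2134;
pub const THREADPOOL: f64 = 1.2143;

/// Decrease in write time for each backend compared with plain std::fs.
pub fn decreases() -> [(&'static str, f64); 3] {
    [
        ("blocking", percent_decrease(STD_FS, BLOCKING)),     // roughly 9.2
        ("rayon", percent_decrease(STD_FS, RAYON)),           // roughly 16.2
        ("threadpool", percent_decrease(STD_FS, THREADPOOL)), // roughly 16.2
    ]
}
```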
Case study: norad
norad is a library that supports the Unified Font Object standard, a source file format for fonts notorious for having a very large number of files.
For example, check out Roboto Regular, the example 'decent size' font used in the below benchmark.
Comparing single-threaded norad (i.e. default features) with and without close_already:
norad (default):
write Roboto-Regular.ufo
time: [2.0756 s 2.0973 s 2.1211 s]
norad (default) + close_already (threadpool):
write Roboto-Regular.ufo
time: [975.15 ms 1.0152 s 1.0596 s]
Twice as fast!
How about an already-multi-threaded workload? norad has opt-in rayon support:
norad (rayon):
write Roboto-Regular.ufo
time: [867.16 ms 922.49 ms 985.35 ms]
norad (rayon) + close_already (rayon):
write Roboto-Regular.ufo
time: [831.17 ms 871.48 ms 915.87 ms]
Still around 5% faster going by the median times, despite the 2x speed-up norad already gained from using rayon!
You can run the numbers yourself on my fork using cargo bench and the before/after tags.
close_already is being used in norad as of v0.14 for all workloads.
There's a Justfile for ease of running checks & tests across multiple backends.
It requires cargo-hack to be installed, and the x86_64-pc-windows-msvc target for your toolchain.
Run just to see available recipes.
Please ensure your code is formatted with nightly rustfmt and there are no Clippy lints for any backend when submitting your PR.
Adding a new backend? Go for it!
Put it behind a feature gate, add the feature name to the mutually_exclusive_features::exactly_one_of! block at the top of lib.rs, and then add a new definition of Drop::drop for windows::FastClose that's enabled by your feature flag.
If you're lazily initialising your own thread pool / executor, you'll naturally need a static OnceLock as well, the same as how backend-threadpool works.
That's it!
In the case of async backends that provide their own file types, you may also want to implement FastCloseable on that type, and forward any relevant traits (e.g. Async{Read,Seek,Write}).
See mod smol_impls for an example.
Adding support for another trait? Go for it!
Make sure the generic bounds include H: Send + 'static, and it should work out just fine.
If the trait you're adding support for is not part of the standard library (or is on nightly), please put it behind a feature gate (default off).
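As a rough illustration of what a forwarded trait implementation with those bounds might look like, here is a self-contained sketch. Wrapper is a hypothetical stand-in for the crate's wrapper type, and std::io::Read is used only because it is a familiar standard library trait.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the wrapper type; only the bounds matter here.
pub struct Wrapper<H: Send + 'static>(pub H);

// The impl repeats `Send + 'static` alongside the trait being forwarded,
// so the wrapped handle can still be moved to another thread later.
impl<H> Read for Wrapper<H>
where
    H: Read + Send + 'static,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }

    // Forwarding provided methods as well keeps any specialised
    // implementation of the inner type in use.
    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        self.0.read_to_end(buf)
    }
}
```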
MIT or Apache 2, at your option (the same as Rust itself)