
Lock-free processing #141

Open
raphlinus opened this issue Mar 10, 2017 · 4 comments

Comments

@raphlinus

For real, glitch-free, low-latency audio performance, the audio processing path needs to be lock-free. That, in turn, implies no allocations or deallocations. The music-synthesizer-for-android project (which was a testbed for high-performance audio on Android) dealt with this by using fixed-size ring buffers and preallocating the state for all DSP. That meets the lock-free requirement, but is limiting.

Thinking about this more and starting to prototype, I believe the answer is lock-free queues such as Treiber stacks (though a number of other lock-free queue data structures would also work). That way, a main thread allocates nodes at will, sends them over a lock-free queue to the realtime audio thread, which at that point basically has &mut access to the nodes. For garbage collection, the realtime thread can send them back over a return channel. Control and status can be multiplexed over the same channels (which has the additional benefit of ordering guarantees between these and graph mutation).
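The scheme described above can be sketched roughly as follows (all names are illustrative, not from any existing crate, and the safety argument assumes a single consumer per direction). The stack is intrusive, so `push` takes an already-boxed node and performs no allocation, making it callable from the realtime thread to return garbage:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Intrusive node: the link lives inside the node itself, so shipping a
// node back over the garbage channel requires no fresh allocation.
pub struct Node<T> {
    pub value: T,
    next: *mut Node<T>,
}

impl<T> Node<T> {
    pub fn new(value: T) -> Box<Self> {
        Box::new(Node { value, next: ptr::null_mut() })
    }
}

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

unsafe impl<T: Send> Send for Stack<T> {}
unsafe impl<T: Send> Sync for Stack<T> {}

impl<T> Stack<T> {
    pub fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Lock-free and allocation-free: safe to call from the audio thread.
    pub fn push(&self, node: Box<Node<T>>) {
        let node = Box::into_raw(node);
        loop {
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*node).next = head };
            if self
                .head
                .compare_exchange_weak(head, node, Ordering::Release, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
    }

    // Safe against ABA/use-after-free only because there is a single
    // consumer: no other thread frees `head` while we read `head.next`.
    pub fn pop(&self) -> Option<Box<Node<T>>> {
        loop {
            let head = self.head.load(Ordering::Acquire);
            if head.is_null() {
                return None;
            }
            let next = unsafe { (*head).next };
            if self
                .head
                .compare_exchange_weak(head, next, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return Some(unsafe { Box::from_raw(head) });
            }
        }
    }
}

fn main() {
    let to_audio = Stack::new();
    // Main thread: allocate a node and send it over.
    to_audio.push(Node::new("osc state"));
    // Audio thread side: take ownership (effectively &mut access).
    let node = to_audio.pop().unwrap();
    println!("{}", node.value);
}
```

Note that a stack delivers messages LIFO; for the ordering guarantees mentioned above, the consumer can pop a batch and reverse it, or a FIFO queue can be substituted.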

This is something of an exploratory issue. Is this a good repo in which to develop such ideas? The architectural changes are likely to be far-reaching. It may be less work to develop a new graph runner and modules from scratch than to retrofit the RustAudio stack, but I'd like to at least sound out working with existing projects.

@mitchmindtree (Member)

Thanks a lot for opening this @raphlinus, performing allocations/deallocations on a separate thread is something I've had in mind for a while now but have not yet had the chance to explore. If you'd find it beneficial to experiment and improve upon dsp-chain I'd be more than happy to discuss and review changes.

> The architectural changes are likely to be far-reaching.

I think the kind of changes you are discussing will be beneficial enough to warrant how far reaching they will be! I'd be happy to see them implemented here if you are still interested. Whatever you decide, I'd love to be involved.

> I believe the answer is lock-free queues such as Treiber stacks (though a number of other lock-free queue data structures would also work)

I wonder if std::sync::mpsc::channel could be a suitable, simple solution for this? I use several of these for communicating between the main, generator, and real-time audio threads in a generative music project of mine, and they've never come close to showing up as a bottleneck in the profiling I've performed. That said, I have not gone out of my way to benchmark them against other options - there are probably others much better suited to optimal performance!

Rough Design Ideas

I can picture the graph constructor API looking something like this:

let (api, audio) = dsp::Graph::new();

where the api side would be kept on the main thread and the audio side would be moved to the audio thread, where something along the lines of audio.fill_slice(slice, sample_hz) would be called at the rate of the OS's audio callback.

The user could call methods on the api end which would send Messages to the audio end, where Message might be an enum along the lines of:

enum Message<N, F> {
    AddNode(NodeIndex, N),
    AddConnection {
        index: EdgeIndex,
        buffer: Vec<F>,
        source: NodeIndex,
        destination: NodeIndex,
    },
    RemoveNode(NodeIndex),
    // etc
}

where N is the user's node type and F is the audio frame type stored within the buffer. The audio end could receive these messages at the beginning of each audio callback.

When RemoveNode messages are received by the audio end, it could send the node, along with any buffers that are no longer necessary, back to the api thread, which could call something like api.collect_audio_garbage(); toward the end of an update. Alternatively, we could spawn a thread upon the Graph's construction which waits on a std::sync::mpsc::Receiver and solely does garbage collection. Perhaps this thread could do the allocations requested by the api end too?
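A hedged sketch of how the audio end might drain these messages and return garbage, using std::sync::mpsc as proposed above (all type and method names here are hypothetical):

```rust
use std::sync::mpsc;

// Hypothetical message type, simplified from the enum sketched above.
enum Message<N> {
    AddNode(usize, N),
    RemoveNode(usize),
}

struct AudioEnd<N> {
    rx: mpsc::Receiver<Message<N>>,
    garbage_tx: mpsc::Sender<N>,
    // In a real implementation this storage would be preallocated so the
    // audio thread never grows it; resizing here is a simplification.
    nodes: Vec<Option<N>>,
}

impl<N> AudioEnd<N> {
    // Called at the start of each audio callback, before rendering.
    fn process_messages(&mut self) {
        while let Ok(msg) = self.rx.try_recv() {
            match msg {
                Message::AddNode(ix, node) => {
                    if ix >= self.nodes.len() {
                        self.nodes.resize_with(ix + 1, || None);
                    }
                    self.nodes[ix] = Some(node);
                }
                Message::RemoveNode(ix) => {
                    // Don't drop (deallocate) here; ship the node back so
                    // the api thread can free it during its update.
                    if let Some(node) = self.nodes[ix].take() {
                        let _ = self.garbage_tx.send(node);
                    }
                }
            }
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let (garbage_tx, garbage_rx) = mpsc::channel();
    let mut audio = AudioEnd { rx, garbage_tx, nodes: Vec::new() };

    tx.send(Message::AddNode(0, "sine")).unwrap();
    tx.send(Message::RemoveNode(0)).unwrap();
    audio.process_messages();
    println!("collected: {}", garbage_rx.recv().unwrap());
}
```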

Questions that come to mind

  • I wonder if we could work out a nice way of cloning the graph (or at least nodes from the graph)? E.g. I can picture a case where a user might be using a DAW and would like to spawn an asynchronous offline render of the current state while continuing to work. I guess most DAWs today don't let you do this due to the difficulty of safely cloning all the nodes without blocking the audio thread, but it seems like something that Rust's type system could assist with? I've come across the desire for this a lot while making music - rendering stems can take minutes to hours and is very jarring for the creative process.
  • How would controlling the nodes work? I imagine the api end could have a method that allowed sending something like Message::UpdateNode(NodeIndex, Box<FnOnce(&mut N)>) to the audio end. This could probably be more efficient using a type implementing a NodeUpdate trait instead of a boxed closure.

Thanks again for the issue, I'll keep thinking on this.

@raphlinus (Author)

Thanks for the response. I've started prototyping on my end, but still not sure whether it's better to push that to release or work here.

The problem with std::sync::mpsc is that it does allocations. For realtime audio, you don't care about throughput benchmarks; the only thing that matters is the worst case. Allocators tend to have locks internally, so if a lower-priority thread is holding an allocation lock, it can block the realtime thread. I think it's worth exploring whether it's possible to implement this with no allocation and no locking whatsoever, and my prototyping is encouraging.

Other than that, I think what you're saying is reasonable, especially having garbage collection done by the api object. There are some details I would do differently, for example I would atomically update a predecessor list rather than adding and removing edges one by one. I also have buffers on nodes rather than edges (and Sum is just another node, rather than having that be magical behavior in the graph runner). Each node has a fixed number (can be >1) of output buffers, but a variable number of input edges, reconfigurable dynamically.
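The predecessor-list point could look roughly like this (a sketch with hypothetical names, not code from either project): rather than mutating edges one by one, the main thread builds the complete input list and the audio thread swaps it in with a single mem::replace, shipping the old allocation back as garbage.

```rust
use std::mem;
use std::sync::mpsc;

// Hypothetical control message: replace a node's entire predecessor
// list in one step instead of adding/removing edges individually.
enum GraphMsg {
    SetInputs { node: usize, inputs: Vec<usize> },
}

fn apply(
    predecessors: &mut [Vec<usize>],
    garbage_tx: &mpsc::Sender<Vec<usize>>,
    msg: GraphMsg,
) {
    match msg {
        GraphMsg::SetInputs { node, inputs } => {
            // O(1) pointer swap on the audio thread; the old Vec is
            // deallocated later on the non-realtime side.
            let old = mem::replace(&mut predecessors[node], inputs);
            let _ = garbage_tx.send(old);
        }
    }
}

fn main() {
    let (garbage_tx, garbage_rx) = mpsc::channel();
    let mut preds = vec![vec![1, 2], vec![], vec![]];
    apply(&mut preds, &garbage_tx, GraphMsg::SetInputs { node: 0, inputs: vec![2] });
    println!("new inputs: {:?}", preds[0]);
    println!("garbage: {:?}", garbage_rx.recv().unwrap());
}
```

The atomic swap means the audio thread never observes a half-updated input list, which edge-by-edge mutation cannot guarantee.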

I haven't gone too far into the api yet, but yes, cloning is important. I'm kinda thinking along the lines of a "factory" which basically replicates a source graph into the audio graph; the source graph can be in a more friendly format. The factory would also take care of maintaining invariants, such as building the graph in topological sort order so that no node has an edge from a nonexistent node.

Your UpdateNode idea is powerful, but Box<FnOnce(...)> would cause a deallocation of the box on invoking the function. Changing it to FnMut would solve that particular problem. The other problem is that the graph is holding Module trait objects, which your function would need to downcast to the concrete type of your processing module. I actually do think downcasting is feasible, the Module trait could have this method:

fn to_any(&mut self) -> &mut dyn Any { self }

(Note: in playing with this, it's not quite so easy, the compiler complains about Sized bounds, but I think it can be made to work)
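One way the Sized complaint can be worked around (a sketch, not code from either codebase): a default method with a Self: Sized bound cannot be called through a trait object, but having each concrete module implement the conversion itself works fine.

```rust
use std::any::Any;

trait Module {
    fn process(&mut self, buf: &mut [f32]);
    // Implemented per concrete type rather than as a default method; a
    // default would need a `Self: Sized` bound and then couldn't be
    // invoked through `&mut dyn Module`.
    fn as_any(&mut self) -> &mut dyn Any;
}

struct Gain {
    amount: f32,
}

impl Module for Gain {
    fn process(&mut self, buf: &mut [f32]) {
        for s in buf {
            *s *= self.amount;
        }
    }
    fn as_any(&mut self) -> &mut dyn Any {
        self
    }
}

fn main() {
    let mut module: Box<dyn Module> = Box::new(Gain { amount: 1.0 });
    // Downcast from the trait object to the concrete type to update it.
    if let Some(gain) = module.as_any().downcast_mut::<Gain>() {
        gain.amount = 0.5;
    }
    let mut buf = [1.0f32, 2.0];
    module.process(&mut buf);
    println!("{:?}", buf);
}
```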

My original thought was just sending an enum of popular data types to an update method on the module trait. That's workable, I think, but less exciting.

@raphlinus (Author)

I released my prototype: https://github.com/google/synthesizer-io. I'm not sure how much I'm going to continue developing it, whether it's primarily useful as a source of ideas or as a real codebase. It does address a lot of the issues I outlined; it's fully lock-free, allows combinations other than sum, and has some other interesting properties.

@mitchmindtree (Member)

Excellent, thanks for sharing! I'm looking forward to having a closer look.

mitchmindtree added a commit to mitchmindtree/dasp that referenced this issue Jul 14, 2020
This adds a new crate for working with dynamic audio graphs.

From the new docs:

> `dasp_graph` is targeted towards users who require an efficient yet
> flexible and dynamically configurable audio graph. Use cases might
> include virtual mixers, digital audio workstations, game audio systems,
> virtual modular synthesizers and related use cases.

This work has been a long-time coming and is the result of many
discussions in the rust audio community and many lessons learned over
the last few years of working with rust and audio. In particular:

- the development of and reflection on [dsp-chain](https://crates.io/crates/dsp-chain) and its shortcomings.
- The (reasonable) limitations of [dasp_signal](https://crates.io/crates/dasp_signal) when dynamically configuring graphs.
- Discussion on the design of audio graphs with @raphlinus at RustAudio/dsp-chain#141.
- The development of the [spatial audio server](https://github.com/museumsvictoria/spatial_audio_server).
- A recent email discussion with Sami Perttu on DASP and audio graphs.

`dasp_graph` is of course not a one-size-fits-all solution. Instead, it
is designed specifically to work well alongside (and fill a gap within)
the rest of the `dasp` crate ecosystem. Please refer to the "Comparing
`dasp_signal`" section of the `dasp_graph` root documentation for a more
detailed overview of the design choices between the two, what
applications each are best suited toward and how the two best
interoperate together.

A small suite of node implementations are provided out of the box
including a `Delay`, `Sum`, `Pass`, `GraphNode` and `BoxedNode`, all of
which can be enabled/disabled via their associated features.

Following this, I have some ideas for adding an optional `sync` module
to the crate, aimed at controlling and monitoring a dasp graph and its
nodes from a separate thread (i.e. for convenient use alongside a GUI)
in a non-dynamically-allocating, non-blocking manner. The work so far
has been performed with these plans in mind. The ideas are for the most
part based on the discussion at RustAudio/dsp-chain#141.

Also, `no_std` support for `dasp_graph` is currently blocked on petgraph
support for `no_std`. A PR is open for adding `no_std` support at
petgraph/petgraph#238. In the meantime, the `std` feature must be
enabled to use the new `dasp::graph` module. This is also noted in the
updated docs.

For more information about the crate and inner workings feel free to
read through the new `dasp_graph` docs. I'm yet to add examples, but
hopefully the added tests can give a good idea of how to use the crate
in the meantime.