-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serial crash when receiving incoming data faster than handled #293
Comments
I could plug a debugger and got the following stack-trace on a crashed pico. It seems the ISR hangs when trying to acquire a lock that is already acquired by the interrupted code. Maybe TinyUSB_Device_FlushCDC should acquire __usb_mutex from Adafruit_TinyUSB_rp2040.cpp, like it's done for TinyUSB_Device_Task ? Stacktrace of the stuck pico :
|
FYI I ran in to a similar looking issue, so I speculatively tried this on a similar (?) codebase, ^ and it appears to have significantly improved on the issue of serial getting stuck (note, it still appears to happen, but less often.) |
Hi, I'm also running into a similar problem, where the RP2040 appears to lock up/freeze when connected over USB (serial and MIDI device). If I put a debugger on it, I'm also seeing that it seems to get stuck forever in __best_effort_wfe_or_timeout. The time it takes before this happens seems to vary, but nonetheless happens a lot. Serial isn't actually necessary for my app, as its mainly a MIDI thing, but nevertheless I find that if anything is sent on the serial connection but the host doesn't open the connection, the rp2040 will freeze; while it also seems to randomly freeze after a while anyway even with the host servicing the serial, but nothing being sent to it. @anrp-tri, I've attempted to apply your patch hoping it would ameliorate the problem, but when trying to compile the latest TinyUSB version with your patch applied, I hit up against problems. Are there any changes necessary to make it work with the up-to-date TinyUSB lib? |
Well, I managed to get @anrp-tri 's patch to apply OK in the end, but it hasn't seemed to improve much for me. I'm unclear if the problem I was running into is the same one reported here, but for me I was getting repeatedly and intermittently getting stuck in best_effort_wfe_or_timeout and finding the timeout set to OSAL_TIMEOUT_WAIT_FOREVER, so I guess I was stuck in a deadlock and never returning. Applying the patch below seems to have worked around this issue for me -- no other side effects observed yet, but maybe this causes problems elsewhere? index 95cb55a..7dccf4e 100644
--- a/src/tusb.c
+++ b/src/tusb.c
@@ -118,7 +118,7 @@ bool tu_edpt_claim(tu_edpt_state_t* ep_state, osal_mutex_t mutex)
// pre-check to help reducing mutex lock
TU_VERIFY((ep_state->busy == 0) && (ep_state->claimed == 0));
- (void) osal_mutex_lock(mutex, OSAL_TIMEOUT_WAIT_FOREVER);
+ (void) osal_mutex_lock(mutex, OSAL_TIMEOUT_NORMAL); //OSAL_TIMEOUT_WAIT_FOREVER);^M
// can only claim the endpoint if it is not busy and not claimed yet.
bool const available = (ep_state->busy == 0) && (ep_state->claimed == 0); |
Oh my god is this actually my issue the whole time? I'm currently working on an open source lightgun project using the RP2040 as my primary target and reference, using TinyUSB. At a certain point, one of the feature requests I've gotten is implementing support for MAMEHOOKER, which is a PC-side program that, among other things, sends a constant stream of serial commands. After some time, however, the same exact thing being reported here happens; the serial line dies, the board becomes unresponsive, whatever buttons that were depressed stay stuck on/off, and I have to physically pull and reinsert the USB connection to get it back up and running. This only happens with serial, but not just with ingest - even with adding just a simple I've tried to limit how often either serial is read or the button signals are sent (at this point now, serial and USB data essentially takes turns), but no matter what I do the crashing will happen eventually. And in this case, it's a key cornerstone of a feature request I've been adamant on implementing lately. |
I did start hosting a fork with anrp-tri's fix combined with doctea's addendum in the hopes that it resolves the issue in the short term. However, I did just receive a crash (lockup) report, even with the patched core. Is @doctea's patch applied on top of anrp-tri's modified core, or to upstream? Now I'm wondering; does this happen when using Pico SDK's USB stack? Or is this really exclusive to TinyUSB? I would test, but some of the input libraries I use rely on TinyUSB. I've also also noticed we're all using RP2040 hardware - is this exclusive to the Pico and its derivatives? Or has it also happened on other boards when using the TinyUSB stack? It seems like #238 is related. |
My patch applies fine onto the upstream without needing to apply anrp-tri's patches first. I've just ended up reinstalling my libraries and so lost all the patches I'd previously applied to it, and had the dreaded return of these intermittent freezes. Since applying my patch from up above, I haven't had a freeze so far. As the freezes were intermittent anyway its too early to say for sure, but this might have solved the problem entirely for me. Quite possible that @anrp-tri's patch fixes something important too, or that there are side-effects from my fix that I haven't noticed yet. I'll post back if I start getting crashes again! (edit -- I also seem to be able to print to the USB serial monitor without crashing now too -- not sure if this is because of this fix, or something else that's changed, but definitely not something I was able to do before!) |
I see! It may have been a mere misunderstanding on my part, because my initial instinct was to merge both your (and anrp-tri's) patches, but doing so I still got a report about those freezes. I've since made a second package, just based on the latest 3.6.1 core from upstream with only your one-line patch in the EDIT: Got another crash report. I don't know if it's just that my use case is much more severe or what, in terms of data being sent in both directions, but this remains a showstopping issue. |
Dang, sorry to hear that :(. FWIW my patch was originally intended to be applied in conjunction with @anrp-tri 's, so you weren't mistaken there, but my patch didn't rely on anything in anrp-tri's. Maybe I've just got lucky that I've not seen any crashes in my project lately. The only other things I can think that I've tweaked in my code since the last crashes is to be more careful about enabling/disabling interrupts in critical sections of code, perhaps that's contributed to the apparently increased stability. |
FWIW, my code doesn't use interrupts--minus one hardware timer that updates a camera tick flag every ~4ms, but that's all. I've looked at my external libraries that I use, and nothing else in either those libraries or the main sketch uses interrupts; there's only one other while() loop that gets triggered (for clearing an I2C buffer just-in-case for the camera) - the major things the code does is receives then processes serial data, and sends commands for the Arduino mouse/keyboard down the USB cable. Perhaps if either I or my sole tester had a different board that wasn't an RP2040, it might be even more helpful now, aha. But it just so happens we both only have Pico-adjacent boards, as does everyone in this thread. (^^; I could probably try and see if the Pico SDK USB stack changes things? But that would require redoing my project's input handling because we use a custom library tailored to TinyUSB for the absmouse & keyboard handling (and I'd probably explore that more if I didn't have other life obligations rn. Adulting is hard). |
Just wanted comment back here again as I've just spent several days trying to figure out why my project was crashing when running it from USB MIDI clock again, until reading through relevant issues here reminded me of my above patch and that I hadn't re-applied it since reinstalling libraries. Doing so seems again to have solved the problem with crashes. I also note that the same workaround/fix (changing OSAL_TIMEOUT_WAIT_FOREVER to OSAL_TIMEOUT_NORMAL) solved the problem for them too in #238 . What are the ramifications of this? Is there a more proper fix? |
I can say I haven't run into crashes personally when using a forked version of Earle's Pico core with the workaround applied (I've heard one conflicting report otherwise, but have yet to hear back from them), and at least for my purpose it hasn't affected the stability of what I'm trying to do either. Still, not an ideal circumstance, for sure. |
|
I've looked into the issue today so let me share some thoughts. The workaround @doctea shared long time ago: I think there can be at least 2 issues:
A reason why the mutex lock isn't released forever is very simple. It happens if:
Temporal fix for my project is to prevent the IRQ while calling irq_set_enabled(USBCTRL_IRQ, false);
bool result = usb_hid.sendReport(0, &report, sizeof(report));
irq_set_enabled(USBCTRL_IRQ, true);
// Might be a good idea to process events here?
// TinyUSB_Device_Task(); With this change, I don't see any deadlock/frame-drop on my device. However, I don't think it's feasible solution for many of you guys in this thread. Finally, my question to @hathach is: |
Let me note that a reason why it happens only if we use multiple USB classes (e.g. Serial and HID, Serial and MIDI) is; TU_VERIFY((ep_state->busy == 0) && (ep_state->claimed == 0)); |
Thank you for looking into this and identifying that my simple workaround this isn't a proper fix! Hoping this leads to a proper fix :). Wondering if the 10ms timeout on the deadlock explains some timing problems I see in my project, too (it is a MIDI app and the clock runs ever-so-slightly slower if a USB cable is connected..).
Indeed, my project doesn't call sendReport itself, so I don't think I can add the same "protection" to my project in this way? Would calling irq_set_enabled before I send data from my project perhaps be enough? Excited to see progress on a real solution! |
@doctea I am grateful to you. I didn't lose my motivation while prototyping because of your workaround :)
I'm not familiar with MIDI implementation but as far as I checked, Also don't forget to re-enable IRQ handler after |
I'll keep this info in mind! In my case, it might be useful; though it is possible that in my project, both cores might be trying to send input reports at around the same time (with one core sending mouse coords from camera processing, and the other handling button presses and analog polling). At least doctea's hacksaw fix has been working adequately for my needs in the meantime until I either perform all reports on a single thread (where IRQ flip-flopping would be safe) or the underlying deadlock is resolved. |
I'm sure that sending reports from multi-cores at the same time isn't problematic because mutex-lock works properly in this case. |
unfotunatly doing this did not work for me:
board still crashes for my application i was not able to find what .write() actually does though. I´m not a programmer and am having a difficult time digging in to the library. for now i will use the timeout_normal solution to finish prototyping. |
Just another data point - I am using the rp2040 with C code (no Adafruit libraries) using the pico-sdk v1.5.1 which bundles TinyUSB v0.15.0. I found that simply calling I implemented the |
Operating System
Linux
Arduino IDE version
Platform.io with
Board
pico
ArduinoCore version
earlephilhower#master
TinyUSB Library version
2.1.0
Sketch as ATTACHED TXT
The attached program just returns input or output '.' when no output is available, as fast as possible.
It limits the rate to 1000 characters per second, by using a delay in the loop.
serial-crash.zip
Using that program writing data from the host will quickly crash the pico (like by pasting 10 lines of codes again and again)
When crashed, the pico won't reboot for sketch upload.
Removing the delay in the loop seems to avoid crash of the program.
I've tried running on the second core, with the same results.
Compiled Log as ATTACHED TXT
No log available
What happened ?
The pico becomes unresponsive. Serial seems dead.
How to reproduce ?
Run the attached platform io program
Watch the serial monitor
Send bunch of data
The output will stop at some point, the pico won't even reboot.
Debug Log
No response
Screenshots
No response
The text was updated successfully, but these errors were encountered: