Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: ipc connect can fail on Windows, even after bind #4734

Merged
merged 2 commits into from
Aug 26, 2024

Conversation

minrk
Copy link
Member

@minrk minrk commented Aug 26, 2024

Solution:

  • move tcpip fallback to after connect instead of assuming successful ipc bind means ipc works (it's very weird and possibly a Windows bug that this can happen, but I can at least confirm that it does in very specific scenarios - Python from Windows Store + non-ascii username + default $env:TMP)
  • correct some Windows error code checks
  • set errno when connect fails (missing errno resulted in treating failure as success, leading to Problem: epoll crashes for some Windows users #4730)
  • make sure to clear _w back to retired_fd after closing it

Also addresses todo item to remember the tcpip fallback after ipc has failed once, so the failing path doesn't get retried every time. I can back that out if folks want.

I believe this actually closes #4730, though after some debugging, I do believe there are still missing checks to signaler::valid(), because even when make_fd_pair properly fails, send is still called on the invalid signaler, leading to the EBADFD reported in #4730, when it seems like it should probably be an assert in the signaler constructor (or immediately after calling the constructor) that it failed to create an fd pair.

@bluca
Copy link
Member

bluca commented Aug 26, 2024

Formatting with clang-format (using clang-format) and showing differences with latest commit
diff --git a/src/ip.cpp b/src/ip.cpp
index db10b13..30d70eb 100644
--- a/src/ip.cpp
+++ b/src/ip.cpp
@@ -24,8 +24,9 @@
 #ifdef ZMQ_HAVE_IPC
 #include "ipc_address.hpp"
 // Don't try ipc if it fails once
-namespace zmq {
-    static bool try_ipc_first = true;
+namespace zmq
+{
+static bool try_ipc_first = true;
 }
 #endif

@minrk
Copy link
Member Author

minrk commented Aug 26, 2024

The truly bizarre property of this situation is that IPC totally does work, but it doesn't work in the default temp directory for some process configurations and usernames. In the situations where I've reproduced this, setting $env:TMP to a location outside $HOME/AppData, e.g. $HOME/myAppData fixes everything, but any subdir of $HOME/AppData does not. I'm not sure if there's a property that can be interrogated to pick a different directory, or if just this catch on connect is enough. It's really weird that bind succeeds and connect fails.

@bluca bluca merged commit 1f4dd54 into zeromq:master Aug 26, 2024
65 checks passed
@minrk minrk deleted the ipc_connect_can_fail branch August 26, 2024 15:04
@BtbN
Copy link
Contributor

BtbN commented Sep 3, 2024

This introduced a compilation error:

15.95 /50-libzmq/src/ip.cpp: In function 'int zmq::make_fdpair(fd_t*, fd_t*)':
15.95 /50-libzmq/src/ip.cpp:678:1: error: jump to label 'try_tcpip'
15.95   678 | try_tcpip:
15.95       | ^~~~~~~~~
15.95 /50-libzmq/src/ip.cpp:568:14: note:   from here
15.95   568 |         goto try_tcpip;
15.95       |              ^~~~~~~~~
15.95 /50-libzmq/src/ip.cpp:572:18: note:   crosses initialization of 'const SOCKET listener'
15.95   572 |     const SOCKET listener = open_socket (AF_UNIX, SOCK_STREAM, 0);
15.95       |                  ^~~~~~~~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem: epoll crashes for some Windows users
3 participants