-
Notifications
You must be signed in to change notification settings - Fork 890
Linux: Always request and wait for an ACK from nl80211 #1510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for waiting. In my test with Without these changes:
With these changes:
The expected error message is:
|
I agree I don't see that error message, but I don't see that error message by checking out the commit before 0ac887b either. I also don't see it before 24cf361 either. In fact, I checked out libpcap-1.10.4 and built that, and linked tcpdump against it, and saw the same:
message. I also saw it when checking out libpcap-1.10.3, which surprised me somewhat as the RPi has libpcap-1.10.3 installed on it from Debian Bookworm, and the libpcap on the RPi did give the error message you're looking for. Then a lightbulb went off:
The libpcap installed by Debian isn't built against libnl, so it's unable to set monitor mode for any interface, and libpcap returns So I think that this PR does fix the issue that was introduced by my earlier commit. I do agree that the error message you mention above is the correct one to give, but that's a separate pre-existing issue. I am willing to work on it as well, but perhaps we can commit this fix first? |
It probably wasn't being built against libnl because of #1358. |
I added another commit to return |
You are right, the user-visible message is not a new problem and addressing one thing at a time would be best. I tried comprehending the first proposed commit, among other things it seems to plug a memory leak on error. Specifically, in --- a/pcap-linux.c
+++ b/pcap-linux.c
@@ -648,9 +648,6 @@ get_if_type(pcap_t *handle, int sock_fd, struct nl80211_state *state,
if (ifindex == -1)
return PCAP_ERROR;
- struct nl_cb *cb = nl_cb_alloc(NL_CB_DEFAULT);
- nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, if_type_cb, (void*)type);
-
msg = nlmsg_alloc();
if (!msg) {
snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
@@ -674,7 +671,6 @@ get_if_type(pcap_t *handle, int sock_fd, struct nl80211_state *state,
* about that.)
*/
nlmsg_free(msg);
- nl_cb_put(cb);
return 0;
} else {
/*
@@ -685,11 +681,13 @@ get_if_type(pcap_t *handle, int sock_fd, struct nl80211_state *state,
"%s: nl_send_auto_complete failed getting interface type: %s",
device, nl_geterror(-err));
nlmsg_free(msg);
- nl_cb_put(cb);
return PCAP_ERROR;
}
}
+ struct nl_cb *cb = nl_cb_alloc(NL_CB_DEFAULT);
+ nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, if_type_cb, (void*)type);
+
err = nl_recvmsgs(state->nl_sock, cb);
if (err < 0) {
snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
@@ -723,6 +721,7 @@ nla_put_failure:
"%s: nl_put failed getting interface type",
device);
nlmsg_free(msg);
+ // Do not call nl_cb_put(): nl_cb_alloc() has not been called.
return PCAP_ERROR;
}
Another change is replacing /**
* @deprecated Please use nl_send_auto()
*/
int nl_send_auto_complete(struct nl_sock *sk, struct nl_msg *msg)
{
return nl_send_auto(sk, msg);
} One other thing that looks not quite right is that (notwithstanding the logic explained for the kernel end of a netlink socket) libnl is supposed to set /**
* Disable automatic request for ACK
* @arg sk Netlink socket.
*
* The default behaviour of a socket is to request an ACK for
* each message sent to allow for the caller to synchronize to
* the completion of the netlink operation. This function
* disables this behaviour and will result in requests being
* sent which will not have the NLM_F_ACK flag set automatically.
* However, it is still possible for the caller to set the
* NLM_F_ACK flag explicitely.
*/
void nl_socket_disable_auto_ack(struct nl_sock *sk)
{
sk->s_flags |= NL_NO_AUTO_ACK;
} If this is the case, the change to |
e4b5d21
to
6b4510e
Compare
Sounds good, I went with your proposed changes. |
pcap-linux.c
Outdated
@@ -754,49 +754,73 @@ DIAG_ON_NARROWING | |||
|
|||
err = nl_send_auto_complete(state->nl_sock, msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, here nlmsg_free()
could be called exactly once; furthermore, it could be called implicitly from nl_send_sync()
, in which case this code block would be reduced to roughly this:
err = nl_send_sync(state->nl_sock, msg);
switch (err) {
// return one of the error codes
}
// success
Arguably, the latter would only make perfect sense if nl_wait_for_ack()
indeed returned -NLE_OPNOTSUPP
etc, which may be not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit surprised too that nl_wait_for_ack
returns -NLE_OPNOTSUPP
instead of the preceding send command, but apparently at least on the RPi that is the case, as seen in the error:
tcpdump: wlan0: nl_wait_for_ack failed adding mon0 interface: Operation not supported
Thank you, it is becoming less convoluted now. |
Thanks for all the review. I think I'm happy with the current state now; let me know if you want me to squash any commits or anything. |
Thank you for preparing solutions to the following problems:
On the one hand, some of these changes overwrite each other, but on the other hand it is desirable to make the commits easy to cherry-pick into the libpcap-1.10 branch. Let me see if there is a better way to arrange these commits. |
Thank you for waiting. Please see the "nl80211_no_mon_rejig" branch in my libpcap repository, this represents the sum of your changes almost exactly less a few trivial style fixups, but the order and contents of commits is different (one lot of clean-ups followed by three minimal bug fixes), also the bug fix commit messages now include "before" and "after". This should be much easier to follow later if necessary. I have tested that each of these commits works as expected both if applied to the master branch and if cherry-picked into the libpcap-1.10 branch. Please have a look and use for this pull request as much as it fits. One thing that is still missing is the rationale in the commit that fixes the original |
Use the constants NL_AUTO_PORT and NL_AUTO_SEQ instead of their value 0, for clarity. Use nl_send_auto() instead of nl_send_auto_complete(), which is deprecated. Factor out some common nlmsg_free() calls. Use nl_send_sync() in add_mon_if() and del_mon_if(), which combines nl_send_auto() and nl_wait_for_ack() when using the standard callback. Signed-off-by: John Thacker <johnthacker@gmail.com>
The function does not deallocate the struct nl_cb, no matter if it exits normally (wlan1) or due to an error (wlan0). To address the former code path, add a missing nl_cb_put() call and make it as early as possible; to address the latter code path, move the nl_cb_alloc() call to make it as late as possible. Before: # valgrind --quiet --leak-check=full ./testprogs/opentest -i wlan0 -I opentest: wlan0: Generic error (SIOCGIFINDEX: No such device) ==1748== 224 bytes in 1 blocks are definitely lost in loss record 2 of 3 ==1748== at 0x4889F94: calloc (vg_replace_malloc.c:1328) ==1748== by 0x48FF9BB: nl_cb_alloc (in /usr/lib/aarch64-linux-gnu/libnl-3.so.200.26.0) ==1748== by 0x12C6CB: get_if_type (pcap-linux.c:647) ==1748== by 0x12C6CB: enter_rfmon_mode (pcap-linux.c:4934) ==1748== by 0x12D157: setup_socket (pcap-linux.c:2505) ==1748== by 0x12D157: pcap_activate_linux (pcap-linux.c:1249) ==1748== by 0x110E2F: pcap_activate (pcap.c:2708) ==1748== by 0x10DF07: main (opentest.c:174) ==1748== # valgrind --quiet --leak-check=full ./testprogs/opentest -i wlan1 -I wlan1 opened successfully ==1754== 224 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==1754== at 0x4889F94: calloc (vg_replace_malloc.c:1328) ==1754== by 0x48FF9BB: nl_cb_alloc (in /usr/lib/aarch64-linux-gnu/libnl-3.so.200.26.0) ==1754== by 0x12C6CB: get_if_type (pcap-linux.c:647) ==1754== by 0x12C6CB: enter_rfmon_mode (pcap-linux.c:4934) ==1754== by 0x12D157: setup_socket (pcap-linux.c:2505) ==1754== by 0x12D157: pcap_activate_linux (pcap-linux.c:1249) ==1754== by 0x110E2F: pcap_activate (pcap.c:2708) ==1754== by 0x10DF07: main (opentest.c:174) ==1754== After: # valgrind --quiet --leak-check=full ./testprogs/opentest -i wlan0 -I opentest: wlan0: Generic error (SIOCGIFINDEX: No such device) # valgrind --quiet --leak-check=full ./testprogs/opentest -i wlan1 -I wlan1 opened successfully Signed-off-by: John Thacker <johnthacker@gmail.com>
ad41782
to
8d1f47b
Compare
get_if_type and add_mon_if share the same generic netlink socket. By default ACKs are sent for every transaction. If get_if_type does not wait for the ACK, then add_mon_if will receive the ACK and wrongly interpret its NL80211_CMD_NEW_INTERFACE to add the monitor device as a success even if it failed. This can cause libpcap to assume that a monitor device was added even when it failed, later causing a ENODEV error when trying to open the non-existent device. Before: # ./testprogs/opentest -i wlan0 -I opentest: wlan0: Generic error (SIOCGIFINDEX: No such device) After: # ./testprogs/opentest -i wlan0 -I opentest: wlan0: Generic error (wlan0: nl_send_sync failed adding mon0 interface: Operation not supported) Signed-off-by: John Thacker <johnthacker@gmail.com>
If Netlink reports that an interface is a mac80211 device (not in monitor mode), but then reports NLE_OPNOTSUPP when trying to add that interface as a monitor mode device, report that as the PCAP_ERROR_RFMON_NOTSUP so the libpcap error message is used instead of the generic Netlink error message. For other error types, continue to report a generic error. Before: # ./testprogs/opentest -i wlan0 -I opentest: wlan0: Generic error (wlan0: nl_send_sync failed adding mon0 interface: Operation not supported) After: # ./testprogs/opentest -i wlan0 -I opentest: wlan0: That device doesn't support monitor mode (That device doesn't support monitor mode) Signed-off-by: John Thacker <johnthacker@gmail.com>
8d1f47b
to
186f908
Compare
Thanks, I used your suggestion and added some details about why we must wait for the ACK when getting the interface type. |
The Linux kernel documentation for Netlink says:
"Note that unless the NLM_F_ACK flag is set on the request Netlink will not respond with NLMSG_ERROR if there is no error. To avoid having to special-case this quirk it is recommended to always set NLM_F_ACK." (https://docs.kernel.org/userspace-api/netlink/intro.html#nl-msg-type)
In other places, it suggests that sockets do have Auto-ACK enabled by default, however.
Regardless, follow instructions and always set NLM_F_ACK, and receive messages until we get the ACK.
Fix #1508