Ticket #72 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

avahi-daemon does not handle network disconnect and reconnect properly

Reported by: pharon@gmail.com
Assigned to: lathiat
Priority: critical
Milestone: Avahi 0.6.16
Component: avahi-daemon
Version:
Keywords:
Cc:

Description

I think I have found a problem with avahi-daemon.

There are several ways to reproduce but I will describe what I think is the simplest way.

Both the desktop and the laptop run a full GNOME desktop with service-discovery-applet in the panel. Lots of services are listed; we'll take the iTunes/DAAP share as an example. Rhythmbox on the laptop is sharing my music, and Rhythmbox on the desktop can connect to it and play without trouble.

Yank the network cable out of the laptop. The services disappear from both sides, as they should.

Plug the network cable back in. The services are not rediscovered. Closing and restarting Rhythmbox still does not publish the iTunes/DAAP share.

To fix the state of the services, I have to either restart avahi-daemon using the init script (which kills some services, like seahorse-daemon key publishing) or run avahi-discover-standalone (which has some funny effects, like duplicated services).

The same state is reached if avahi-daemon is started before the network interface is ready (address acquired).

I think avahi-daemon should be aware of network interface states. This could be done by polling or something similar. Another way would be to listen on D-Bus for NetworkManager messages (if it is available). But don't take my word for it; I am just a beginner.

I am using avahi-0.6.15 on the latest Gentoo.

Please request any extra information if you need it.

Thanks.

Attachments

avahi-log (18.5 kB) - added by pharon on 11/17/06 12:09:02.
avahi-discover-standalone.log (0.6 kB) - added by pharon on 11/17/06 12:09:49.

Change History

11/15/06 02:00:14 changed by lathiat

  • owner changed from lennart to lathiat.
  • status changed from new to assigned.

Avahi is in fact quite aware of both interface state changes and IP changes, it would appear you are hitting a bug.

Can you tell me what type of network cards you have on both ends?

11/15/06 03:05:04 changed by pharon

I am using NetworkManager on both sides, and both cards have link beat detection, since NetworkManager automatically detects when the cable is unplugged and plugged back in.

Something that might explain the weirdness on the laptop side is that the wireless driver is loaded and creates both wifi0 and ath0. Some applications (Ethereal, for example) mistake wifi0 for a real interface and listen on it, but that doesn't apply on the desktop side. On the laptop side the cards are:

Ethernet controller: Broadcom Corporation NetXtreme BCM5705M Gigabit Ethernet (rev 01)
Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)

The kernel says:

eth0: Tigon3 [partno(BCM5705mA1) rev 3001 PHY(5705)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet <mac removed>

On the desktop side:

Ethernet controller: VIA Technologies, Inc. VT6105 [Rhine-III] (rev 86)

The kernel says:

eth0: VIA Rhine III at <mac removed>

11/16/06 15:07:27 changed by pharon

I just had an idea. Could this happen if the interface is up but has no IP address when avahi-daemon starts, and only gets an IP assigned afterwards?

Something that might be related is described here:

http://www.hezmatt.org/~mpalmer/blog/general/severe_discomforts_the_joy_of_udp.html

11/17/06 01:51:47 changed by lathiat

Avahi also makes itself aware of IP changes, and I'm pretty sure we don't have the UDP problem described above, but I will take a quick look into it.
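For readers following along, here is a minimal C sketch of the general Linux rtnetlink mechanism a daemon can use to learn about link and address changes: a NETLINK_ROUTE socket subscribed to the link and address multicast groups, plus a loop that classifies the messages in one received datagram. This is not Avahi's actual code; the names open_rtnl_monitor, count_events and struct rtnl_counts are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: open a socket that receives the kernel's link and
 * IPv4/IPv6 address notifications. Returns the descriptor or -1. */
static int open_rtnl_monitor(void) {
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -1;
    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof sa);
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR;
    if (bind(fd, (struct sockaddr *) &sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical tally of the events seen in one datagram. */
struct rtnl_counts { int link_changes, addr_added, addr_removed; };

/* Walk every netlink message in buf (len bytes, as returned by recv())
 * and count link and address events. len is an int because NLMSG_NEXT
 * decrements it and must not wrap around. */
static void count_events(const void *buf, int len, struct rtnl_counts *out) {
    assert(out != NULL);
    const struct nlmsghdr *h = buf;
    for (; NLMSG_OK(h, len); h = NLMSG_NEXT(h, len)) {
        switch (h->nlmsg_type) {
        case RTM_NEWLINK:
        case RTM_DELLINK: out->link_changes++; break;
        case RTM_NEWADDR: out->addr_added++;   break;
        case RTM_DELADDR: out->addr_removed++; break;
        default: break;
        }
    }
}
```

In a real daemon the descriptor from open_rtnl_monitor() would be polled and each received buffer handed to a handler like count_events(); an RTM_NEWADDR arriving after startup is what lets a daemon notice an interface that only gets its address later.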

11/17/06 01:54:38 changed by lathiat

Can you start avahi-daemon on the console with --debug, reproduce this bug in as few steps as possible, and then send me the complete output?

Either attach it to this bug or, if you're not comfortable putting that dump in the bug, send a copy to lathiat(@)bur(.)st.

11/17/06 12:08:25 changed by pharon

OK. First I stopped the avahi-daemon init script, then I started avahi-daemon --debug --no-rlimits > avahi-log 2>&1

After it finished starting, I had some services listed (sftp, workstation and ssh), but it missed ntpd and vnc (probably unrelated).

Then I told NetworkManager to reinitialize the network interface.

avahi-daemon did notice the change, but after the IP was assigned I didn't have any services listed.

I ran avahi-discover, but the services didn't appear.

I ran avahi-discover-standalone and some services appeared. I put its output in avahi-discover-standalone.log.

I will attach the log files here.

11/17/06 12:09:02 changed by pharon

  • attachment avahi-log added.

11/17/06 12:09:49 changed by pharon

  • attachment avahi-discover-standalone.log added.

11/17/06 12:15:45 changed by pharon

I forgot to mention that all this was done on the desktop, to rule out possible confusion with other network interfaces. The desktop only has eth0.

Also, none of the steps above caused any services to appear on the laptop side.

Also, the services provided by the laptop did not appear on the desktop. I think I would have to restart avahi on the laptop to fix that.

To fix the missed services issue, I have to restart the services themselves (ntpd, vnc, etc.).

11/20/06 03:40:39 changed by lathiat

Are you using network manager on your desktop as well?

I recall that back in the early days we had a bug where NetworkManager triggered a different netlink code path than ifupdown/dhclient. We fixed it, but perhaps it has regressed.

I have received a report about this from someone else too (hence my suspicion that this is a NetworkManager-related bug).

11/21/06 12:22:16 changed by pharon

Yes, NetworkManager on both sides. I don't think NetworkManager uses ifup/ifdown on Gentoo.

What can I do to help?

12/07/06 14:43:29 changed by lathiat

  • priority changed from minor to critical.

I have done some research.

It seems that when NetworkManager is used, after the usual Interface Relevant / Withdrawn / Relevant sequence, the interface is withdrawn and made relevant again, but these netlink messages carry an nlmsg_pid that is neither 0 nor avahi's own PID, and we currently drop such messages. When this happens, avahi loses its multicast group membership, and hence everything breaks. I will work on sorting this out (I need to investigate where the messages come from, what triggers them, etc.) and will release a fix ASAP.
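To illustrate the failure mode described above, here is a hedged C sketch contrasting a filter on the nlmsg_pid header field with a filter on the sender credentials the kernel attaches when SO_PASSCRED is enabled on the socket. It assumes, as the patch name avahi-0.6.15-checkcred.patch suggests but does not confirm, that the fix checks credentials; the function names are hypothetical and the real patch may well differ.

```c
#define _GNU_SOURCE            /* for struct ucred with glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Pid-based filter, as described in this ticket: accept only messages whose
 * header nlmsg_pid is 0 or our own PID. A notification the kernel broadcasts
 * in response to another process's request (here NetworkManager's) typically
 * carries that process's port ID in nlmsg_pid, so it gets dropped. */
static bool accept_by_nlmsg_pid(const struct nlmsghdr *h, pid_t own_pid) {
    return h->nlmsg_pid == 0 || h->nlmsg_pid == (__u32) own_pid;
}

/* Return the SCM_CREDENTIALS ancillary data of a received message, or NULL.
 * Only present if SO_PASSCRED was enabled with setsockopt(). */
static const struct ucred *find_credentials(struct msghdr *msg) {
    assert(msg != NULL);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c))
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS)
            return (const struct ucred *) CMSG_DATA(c);
    return NULL;
}

/* Hypothetical credential-based filter: decide on who sent the datagram,
 * not on the nlmsg_pid field inside it. Datagrams multicast by the kernel
 * itself normally arrive with pid 0 / uid 0 credentials. */
static bool accept_by_credentials(struct msghdr *msg) {
    const struct ucred *cred = find_credentials(msg);
    return cred != NULL && cred->uid == 0 && cred->pid == 0;
}
```

With this approach a kernel notification is accepted even when its nlmsg_pid names NetworkManager, while datagrams sent directly by an ordinary local process are still rejected because their credentials carry a non-zero PID.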

12/11/06 08:58:09 changed by lathiat

I have a patch that resolves this here:

http://lathiat.net/avahi-0.6.15-checkcred.patch

Is anyone able to try this?

12/11/06 15:13:56 changed by pharon

I think this patch has fixed the problem. I can confirm that services get re-published after NetworkManager triggers a reconnect, and that any services withdrawn or published afterwards are handled correctly.

I will use the patched avahi for a while and report any problems.

Thanks for taking the time to fix this. You rock!

12/11/06 15:23:17 changed by lathiat

Thanks for your testing pharon, this is very encouraging.

Please do continue to test (and anyone else, too).

I don't want to make another bogus release that breaks in some side case :(

12/11/06 15:35:57 changed by lathiat

  • milestone set to Avahi 0.6.16.

12/21/06 02:18:09 changed by lathiat

  • status changed from assigned to closed.
  • resolution set to fixed.