This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/11709] New: glibc domain resolution does not obtain IP addresses from truncated UDP DNS responses.
- From: "khanipov at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 16 Jun 2010 13:28:19 -0000
- Subject: [Bug libc/11709] New: glibc domain resolution does not obtain IP addresses from truncated UDP DNS responses.
- Reply-to: sourceware-bugzilla at sourceware dot org
Contents:
1. Problem
2. Investigation
3. Conclusions
1. Problem.
At home I use a D-Link Dir-320 router in DHCP mode to access the Internet. It
has an option to relay incoming DNS queries to the Internet Service Provider's
(ISP) DNS server, so that the router itself acts as a DNS server for my PC.
Otherwise the PC must use the ISP's DNS server directly. The DNS server is
configured via DHCP, so I don't need to change anything on my PC: to make the
router act as a DNS server I just tick a box in its settings. By default the
'relay DNS' option was turned on, and everything had been going well until I
migrated from Windows to Linux...
I noticed that some web pages didn't load properly (pictures and some other
parts were missing). At first I thought that the problem was with the web page,
not with my PC. However, I also noticed that many other pages which I had
visited often before (xyz.livejournal.com, youtube) sometimes took a very long
time to load. In the browser I could see something like 'resolving host...' or
'waiting for pagead2.googlesyndication.com'. I was curious what could cause
that.
2. Investigation.
First of all I tried to open these web pages on Windows. Everything went fine:
all pages opened quickly without any delay. This told me that the problem was
specific to Linux. My next idea was that it was caused by faulty domain name
resolution, and I eventually tried turning off the router's DNS relay mode.
After that everything worked fine on Linux. One might think that the router,
not Linux, caused the problems, but that cannot explain why Windows worked fine
even with DNS relay enabled.
I took some domain names which Linux could not resolve through the router's DNS
relay and which came up often during my web surfing
(pagead2.googlesyndication.com, w.sharethis.com) and tried to ping them. For
example, ping pagead2.googlesyndication.com froze for about 15 seconds and then
informed me that the host was unknown. After I turned off DNS relay, ping worked
fine. On Windows, however, ping worked even with DNS relay: there were still
several seconds of delay, but the name got resolved.
At this point I realised that I needed to dig deeper into the DNS protocol. I
found out that the RFC specifies both TCP and UDP transports for queries. A UDP
reply is marked as truncated if the full response data does not fit within a
single UDP datagram, and an application that needs all the information
contained in the response can set up a TCP connection and repeat its query,
thus avoiding the UDP size limitation.
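
To make the truncation flag concrete, here is a minimal Python sketch (not
taken from glibc) that decodes the fixed 12-byte DNS header and reports whether
the TC bit is set. The function name parse_dns_header and the sample header
values are my own assumptions, chosen only for illustration.

```python
import struct

TC_FLAG = 0x0200  # truncation (TC) bit in the 16-bit flags field of the DNS header


def parse_dns_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header that starts every DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", message[:12]
    )
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & TC_FLAG),   # TC bit
        "rcode": flags & 0x000F,
        "answers": ancount,
    }


# Hypothetical header: id 0x1234, QR=1, TC=1, RD=1, RA=1, 1 question, 4 answers.
sample = struct.pack("!6H", 0x1234, 0x8380, 1, 4, 0, 0)
header = parse_dns_header(sample)
print(header["truncated"], header["answers"])
```

A response can therefore carry answer records and still have the TC bit set;
the two are independent fields of the header.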
I became familiar with the DNS query and response protocol details and fired up
tcpdump:
sudo tcpdump -i eth1 -X udp port domain
It showed me that when I was pinging pagead2.googlesyndication.com, responses
from the router arrived consistently and contained all four IP addresses of the
host. What distinguished them from the DNS responses I observed when pinging
'resolvable' hosts was the truncation flag, meaning that the UDP packet was too
small for the whole response. A theory was born: the Linux domain name
resolution system discards truncated UDP replies and sets up a TCP connection to
get the full response, while my router fails to accept TCP DNS queries. The
theory was confirmed when I ran
nslookup w.sharethis.com
which showed the message ";; Truncated, retrying in TCP mode ;; connection timed
out; no servers could be reached", meaning that the router failed to process the
TCP query. Yet the initial UDP reply from the router contained all the necessary
IP addresses! I guess that Windows did not discard this initial reply because of
its truncation, and used its data after the TCP connection attempt failed (this
explains the delay in name resolution which I observed on Windows).
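
The TCP retry can be reproduced without nslookup. The following is a rough
Python sketch, assuming a plain A-record query: over TCP a DNS message is
preceded by a two-byte length field, and a server that does not listen on TCP
port 53 makes the connection attempt fail or time out. The helper names
(build_query, frame_for_tcp, query_over_tcp) and the timeout value are my own
choices, not part of any resolver.

```python
import socket
import struct


def build_query(name: str, ident: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `name`."""
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)  # RD set, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix the message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(conn: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data


def query_over_tcp(server: str, name: str, timeout: float = 5.0) -> bytes:
    """Send the query over TCP; raises OSError (e.g. a timeout) on failure."""
    with socket.create_connection((server, 53), timeout=timeout) as conn:
        conn.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", _recv_exact(conn, 2))
        return _recv_exact(conn, length)
```

Calling query_over_tcp with the router's address (for instance a hypothetical
192.168.0.1) and "w.sharethis.com" should raise an OSError on a router that
behaves like mine, while the same call against the ISP's server should return
the full response.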
3. Conclusions.
1. The D-Link Dir-320 router does not comply with DNS server standards (which
is outside glibc's control).
2. The Linux domain name resolution system does not obtain IP addresses from
truncated UDP DNS responses. I think this is a bug, because most applications
using the Internet just need a way to translate a domain name into an IP
address and are not interested in the additional information present in a full
response, so there is no need for them to set up a TCP connection to get the
full DNS record data. If the UDP response contains IP addresses, even with the
truncation flag set, it must be used without any further queries.
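
The behaviour proposed in point 2 can be sketched as follows. This is only an
illustration in Python of what "use the addresses even if TC is set" would
mean, not a description of how glibc parses responses; the function names and
the sample packet (built with documentation addresses 192.0.2.1 and 192.0.2.2)
are assumptions of mine.

```python
import socket
import struct


def skip_name(message: bytes, offset: int) -> int:
    """Return the offset just past the (possibly compressed) name at `offset`."""
    while True:
        length = message[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1 + length


def extract_a_records(message: bytes):
    """Return (truncated, [IPv4 strings]) using whatever answers are present."""
    _ident, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", message[:12])
    truncated = bool(flags & 0x0200)
    addresses = []
    try:
        offset = 12
        for _ in range(qdcount):
            offset = skip_name(message, offset) + 4  # skip QTYPE and QCLASS
        for _ in range(ancount):
            offset = skip_name(message, offset)
            rtype, rclass, _ttl, rdlength = struct.unpack(
                "!HHIH", message[offset:offset + 10]
            )
            offset += 10
            rdata = message[offset:offset + rdlength]
            if len(rdata) < rdlength:
                break  # record cut off by truncation
            if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
                addresses.append(socket.inet_ntoa(rdata))
            offset += rdlength
    except (IndexError, struct.error):
        pass  # message ended mid-record; keep the addresses found so far
    return truncated, addresses


def _answer(ip: str) -> bytes:
    """Hypothetical A record whose name points back at the question (0xC00C)."""
    return b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, 60, 4) + socket.inet_aton(ip)


question = b"\x01a\x07example\x00" + struct.pack("!2H", 1, 1)
sample = (
    struct.pack("!6H", 0x1234, 0x8380, 1, 2, 0, 0)  # response with TC set
    + question + _answer("192.0.2.1") + _answer("192.0.2.2")
)
print(extract_a_records(sample))
```

Here the sample response has the truncation flag set, yet both addresses are
recovered from it; if the packet is cut off in the middle of a record, the
complete records before the cut are still returned.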
I would also like to note that the described problem can make many people (at
least those using a D-Link Dir-320) stop using Linux. The problem irritated me
for a long time until I finally took the time to find the cause. I guess that
many would just go back to Windows, thinking that Linux is to blame for web
pages not loading.
As far as I know, name resolution is performed via the getaddrinfo function
which, if I am correct, resides inside glibc; that is why I am posting the
report here.
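
For anyone who wants to check the delay on their own machine, a small Python
sketch follows. It relies on Python's socket.getaddrinfo, which on Linux goes
through the C library's getaddrinfo; the helper name timed_lookup is my own,
and the host names to try are the ones mentioned above.

```python
import socket
import time


def timed_lookup(host: str):
    """Resolve `host` through the system resolver and measure how long it takes."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(
            host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        addresses = []
        error = str(exc)
    return addresses, error, time.monotonic() - start
```

Running timed_lookup("pagead2.googlesyndication.com") with DNS relay turned on
and then off should show the difference described in this report: a long wait
followed by an error in the first case, and a quick list of addresses in the
second.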
--
Summary: glibc domain resolution does not obtain IP addresses
from truncated UDP DNS responses.
Product: glibc
Version: 2.10
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: drepper at redhat dot com
ReportedBy: khanipov at gmail dot com
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=11709