lookupd: DNSAgent: timeout -- let's figure this out

Is your timing-out DNS server on your subnet?

  • Yes - it's on my computer, so its IP is 127.0.0.1

  • Yes - private IP addresses as well (such as 10.x.x.x)

  • Yes - public IP addresses

  • No - it's got a real IP and isn't on my subnet, but I still get the errors



kilowatt

mach-o mach-o man
The problem I'm referring to is one I frequently hear about on IRC.

Surf the web for 20 mins (using sites you don't normally use, just to rule out the DNSAgent's cache) in OS X with your favorite browser. Watch the console (/Applications/Utilities/Console.app).

You may see stuff like this:
Code:
Feb  5 17:07:10 mach4 lookupd[202]: DNSAgent: dns_send_query_server - timeout for 192.168.1.140

Right before these errors pop up, you'll be at a site, and your web browser will say something like 'looking up site...'. The browser will pause for a sec, then you get the message, and then it finds it.
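
For anyone who wants to compare notes, here is a minimal Python sketch for counting these timeouts per DNS server in a saved copy of the log. The regular expression is modelled only on the single sample line quoted above, so treat the exact message format as an assumption; the function name count_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one sample log line quoted in this post.
# Other lookupd builds may word the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"lookupd\[\d+\]: DNSAgent: dns_send_query_server - timeout for (\S+)"
)


def count_timeouts(lines):
    """Return a Counter mapping DNS server address -> number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb  5 17:07:10 mach4 lookupd[202]: DNSAgent: "
        "dns_send_query_server - timeout for 192.168.1.140",
    ]
    for server, hits in count_timeouts(sample).items():
        print(server, hits)
```

Feeding it the lines of a console log dump should show at a glance whether the timeouts cluster on one server.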


Well, I've had enough of this, so I am going to figure it out. Last night I read 18 pages on lookupd (which was very interesting for a man page ;-).

Here's what I know: timeout means that the specified timeout was reached when contacting the host. Well, duh. But anyway, what does not make sense is that I only have one DNS server listed in my preferences, and yet after it 'times out' it finds the host (it doesn't go to the next host, obviously, because there isn't one).

Lots of people have reported this to me, and I think it's time we figure it out. Don't say it's because the DNS server isn't replying - some people who have 4 redundant BIND servers on a fiber network have reported this :D.

To test your DNS server from the command line, don't use host or nslookup - they rely on lookupd, which is what we want to prove is the problem. If your DNS server is dns.server.domain and the host you wish to look up is host.i.dont.know, use the following syntax from the terminal:
Code:
dig @dns.server.domain host.i.dont.know
Example: dig @124.183.12.42 macosx.com
Oh, always use IPs when you are referring to DNS servers, btw :p
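
If dig isn't handy, roughly the same check can be sketched in Python: build a DNS query by hand and send it straight to the server over UDP, so nothing goes through lookupd. The function names (build_query, time_query), the 5-second default, and the example address (borrowed from the log line earlier in this post) are assumptions for illustration. It only times the first reply and does not validate or parse it.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question,
    # 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server_ip, hostname, timeout=5.0):
    """Send one query over UDP; return seconds elapsed, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_query(hostname), (server_ip, 53))
        try:
            sock.recvfrom(512)  # any reply counts; contents are not checked
        except socket.timeout:
            return None
        return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical server address; substitute your own DNS server's IP.
    elapsed = time_query("192.168.1.140", "macosx.com")
    print("timed out" if elapsed is None else f"answered in {elapsed:.3f}s")
```

If this answers in a fraction of a second while the browser still stalls, that points at the resolver on the client rather than the server.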

The problem only seems to happen to me when I specify a DNS server on my subnet (I don't get the errors when I use my ISP's crappy DNS server).

Please, unix and NetInfo gurus (I know you're out there!) help us out if you can. I'd like to start by asking everyone to post if they are getting these DNSAgent timeouts in their logs. There must be some common trend between us.
 
I get these errors but very infrequently compared to earlier versions of X.
Following your instructions, it took approximately 15 hits to different sites to get my first DNSAgent error message for the primary DNS I have listed in my Network settings. I've never seen the agent error on the secondary DNS with this version of OS X, yet.
The browser (OmniWeb 4.1b1) reacted as you have described in your post.
-I'm running X 10.1.2 with an aDSL connection (no firewall).
-I have a static ip address and use my isp's primary and secondary dns servers.
-The DNS servers are on a different net than my static address! There's a hop when I run a trace route to the DNS.
-I have both dns servers listed in my network settings in the numerical format.
-If it matters, the machine is a 400MHz TiBook, and I call it shmenge. It likes that.
 
I wonder if this is something to worry about at all, really.

Given that DNS uses UDP (lightweight, unreliable transfer) for lookup requests and such, an overloaded DNS server, or heavy network traffic, or static electricity somewhere, or the price of trout in Hong Kong, are all liable to cause a DNS lookup request to get lost along the way. There presumably wouldn't be a timeout mechanism at all if it weren't expected to happen from time to time. The question is almost, why bother logging them at all?

That said - check out the lookupd manpage. It seems you can somehow set the timeout for the various lookupd agents (in particular the dns agent). Either shorter, to maybe jump the gun on some queries that haven't been dropped, but not wait as long for ones that have; or longer, to not have so many 'false alarms'. I guess one might experiment about a bit with that...
 
I've always ignored this myself; from what I've seen, lookupd is simply impatient. From log entries in /var/log/netinfo.log, the timeout seems to be five seconds. That timeout depends on much more than just your local DNS server (mine are on my own machines, so there should never be a problem there). Mostly it's hit when your local server has to wait for another DNS server to respond. Five seconds isn't really that long to wait on what could amount to several DNS lookups spanning the world.

According to the lookupd manpage, it seems we can create a /locations/lookupd/agents/DNSAgent directory in NetInfo, and add a property of Timeout, to control it there. I just added it, we'll see if /var/log/netinfo.log goes quiet.
 
Thanks for your postings guys, I think I'll mess with the timeout variable as well.

What I wonder, though, is if OS X sends out malformed DNS queries. For example, the guy with 4 redundant BIND servers on his network shouldn't see timeout messages all the time.

And, when I use my 127.0.0.1 caching DNS, I don't understand how that times out.

I think a packet sniffer is going to become very useful... looks like I'll be using 'snoop' on my Solaris box as soon as I get it to start booting up again :)

Blb, keep us posted as to what you set the timeout to if you don't mind, and how it affected overall performance.
 
What I gots so far...

I found a pattern for the DNS Agent returning timeouts to Console. The difficult part was finding a domain name that would repeatedly fail. www.morecrap.com has worked for the last 18 hrs. Using this domain, trigger a DNS query (via a browser load, or your preferred method) and watch it return the timeouts: 1 timeout message, message repeats 2 times, a final timeout message, and then a load failure from the browser. This is when the DNS Agent will switch from the primary DNS entry in your Network settings to the secondary DNS entry. DNSA will continue to use the secondary DNS till another timeout succession occurs. And round and round she goes.
So this mechanism seems to work as it should.
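
The retry-then-switch pattern can be modelled with a small Python sketch. This is not lookupd's actual code, just a toy model of the behaviour described in the man page snippet quoted later in this post (retry one server a set number of times, then move to the next). The names resolve_with_failover and send are hypothetical, and the sketch does not model the "keeps using the secondary afterwards" part observed above.

```python
def resolve_with_failover(servers, tries, send):
    """Try each server `tries` times in order.

    `send` is any callable taking a server and returning an answer,
    or None to signal a timeout. Returns (server, answer, log), where
    log lists the timeouts seen along the way.
    """
    log = []
    for server in servers:
        for attempt in range(1, tries + 1):
            answer = send(server)
            if answer is not None:
                return server, answer, log
            log.append(f"timeout for {server} (try {attempt})")
    # Every server timed out on every try.
    return None, None, log


if __name__ == "__main__":
    # Fake transport: the primary never answers, the secondary always does.
    def fake_send(server):
        return "10.0.0.5" if server == "secondary" else None

    server, answer, log = resolve_with_failover(
        ["primary", "secondary"], 3, fake_send
    )
    print(server, answer)
    print("\n".join(log))
```

With three tries per server, a dead primary produces three timeout entries before the secondary is consulted, which is close to the run of messages seen in Console.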

Interesting, yet strange...
If I'm monitoring my internet connection (en0) with tcpdump on an idle machine (nothing open but Terminal and TextEdit), there are DNS messages occurring at sporadic intervals. I found no repeatable sequence. At times it can storm as if making a full DNS query. Could be DNSA reacting to the IGRP and ARP messages flying around my ISP's networks, but this is where I draw the big ?

Helpful lookupd commands
-To avoid the DNS Agent hitting the cache (actually calling the cache agent to parse cache), you can manually flush it with the command: lookupd -flushcache
-You can also view lookupd stats with command: lookupd -statistics
-If you wanted to go crazy you can launch a 2nd and separate instance of lookupd (apart from the one that is currently running) and manually enter commands for each agent for debugging purposes.

I assume there are configuration files held in the OS by lookupd, but where? The man says it could be in the NetInfo directories (/locations/lookupd/) or a set of flat files (/etc/lookupd/). Did anyone find where they actually reside?

I'm slanting towards the way scruffy sees this, it doesn't really seem to be a problem. With me at least.


Pertinent man page snippet:

DNSAgent
DNSAgent is the DNS client. Since DNS does not have a fast mechanism
that would allow for validation of cached entries, the agent does not
support cache validation. DNSAgent is only used for host name/address
and network name/address lookups.

The DNSAgent supports the Timeout option. This sets the total timeout
for a query. Note that a query is sent to a server, and if no reply is
received, the query is re-tried a certain number of times, set by the
value of the Retries option. If no reply is received from that server,
then the query is sent to the next server in the list of known servers.
The DNSAgent computes a per-server timeout from the total timeout,
dividing by the number of servers and the number of tries per server.

Normally, DNSAgent determines the DNS domain name by reading the file
/etc/resolv.conf. If that file does not exist, DNSAgent searches NetInfo
for a directory named /locations/resolver, which should have the domain
name specified as the value of the ``domain'' property. IP addresses of
name servers should be specified as values of the ``nameserver''
property. Additional properties that may be specified in /etc/resolv.conf may
also be specified in /locations/resolver, using the same keys and values
as those used in the file.

If the Domain option is given in DNSAgent's configuration, its value will
be used for the DNS domain name. Use of this option should be limited to
special situations, as this mechanism is not supported by other DNS
utilities such as the nslookup command.

By default, DNSAgent does not support queries that fetch a list of all
entries, i.e. the ``allHosts'' query that supports the ``gethostent()''
API in the System framework. Setting the AllHostsEnabled option to
``YES'' will enable support for fetching a list of all hosts from DNS.
The DNSAgent implements this using a zone transfer call. Use of this
setting should be limited to special situations, since the call is very
time consuming for the DNS server, and the resulting list will use large
amounts of memory.

DNSAgent supports a domain name as an optional startup argument. For
example:

DNS:example.net
or
DNS:local

This allows you to configure lookupd to be a client of several different
DNS domains. Each domain named as an optional startup argument to
DNSAgent must be defined in NetInfo. In the case of the example
``example.net'' there must be a NetInfo directory with the path
``/domains/example.net.'' The directory should have the same format as
the ``/locations/resolver'' directory, as described above.
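
To make the timeout arithmetic from the snippet concrete, here is a minimal Python sketch of the division it describes. The snippet does not say what unit the Timeout value is in or how the result is rounded, so both are left open here; the function name per_try_timeout and the example numbers are made up for illustration.

```python
def per_try_timeout(total_timeout, servers, tries):
    """Split a total timeout evenly across every try on every server.

    Follows the man page wording: total timeout divided by the number
    of servers and the number of tries per server. Units are whatever
    the Timeout option uses (not stated in the snippet).
    """
    if servers < 1 or tries < 1:
        raise ValueError("servers and tries must be at least 1")
    return total_timeout / (servers * tries)


if __name__ == "__main__":
    # Example: a total of 30, two servers, three tries each.
    print(per_try_timeout(30, 2, 3))
```

One consequence: listing more servers or raising the number of tries shrinks the wait on each individual try unless the total Timeout is raised to match.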
 
Well, I set the timeout for DNSAgent to 15, thinking it was in seconds; then I saw it time out three times (the default for retries, then it switches to another server) within four seconds...

Okay, so then I figured it must be in tenths of a second, so I tried 150, and now it appears to time out after about 20 seconds...

But there are definitely fewer log entries (a ping to a host which doesn't resolve properly only shows one timeout).
 
Not much help... but aren't these things measured in "ticks" instead of secs?

And... does anyone run BIND on their machine? Would that make any difference?

well... my personal feeling is that this "error" is rather irrelevant... but I don't mind this thing going away either... I mean, it seems like this is about 80% of the log... Yea, sure I can remove it from the log file, but I'd rather just be able to look at other stuffs from console...
 
I am on a pure switched network with a local DNS server, and although that DNS server sometimes screams holy terror, it always responds to my local machine. My local machine has no issues whatsoever with DNSAgent. Which surprises me: I have 3 NICs in this machine, and it's doing NAT, routing, serving DNS as a backup (although I don't look up against it) and is generally a busy machine.

I'm wondering, are you on a hub? Do you have errors / collisions on your network? I have a happy network, a DNS server within 50 feet, and no errors or timeouts getting logged with any frequency.
 
Sorry about my delay :) I've been rather busy lately.

Well, here's what I set up:

1) Installed a DNS server on Mac OS X. This is on my G4.
2) Set the G4, my G3 running OS X, one OS 9 Power Mac, and a Performa running Linux to all get DNS information from my G4 (it's the ONLY source of DNS for these computers).

All computers are on a 192.168.1.0 subnet, and are connected to two hubs.

The only computers that take way too long to look up hosts are the Mac OS X computers. They generate errors, and take too long to look up uncached information.

My network is plenty fast, and I don't get collisions unless I'm doing a ping flood test on one of my computers. My hubs are made by BayNetworks and XSense. I'm pretty certain it's not my network that is causing these errors - like I said, some dude on IRC said that with his fiber network and four redundant BIND servers, he still gets errors. And btw, it's not just the errors that have me ticked :) it's also the fact that looking up hosts takes WAY too long. Like 30+ secs. And when I do it manually (dig @mydnsserver host) it takes about half a sec.

So my conclusion is that the lookupd for mac os x is badly configured. I also conclude that, while some errors should be expected, an error on each new host lookup should not occur, and waiting 30secs for dns lookup is too long.

Once again, only my Macs with OS X showed this. My Performa with Linux, a 6100 Power Mac with OS 9, an Intel computer with FreeBSD 4.5, and my Red Hat-running 486 all look up hosts with blinding speed. And they all connect to my G4 no less, which is a pain to surf the net with because of this.

wtf is going on? Apple, you listening?
 
I config'd the hell out of lookupd on MacOSXServer_v1.2 but it's been a long time, and that stuff was so cryptic I can't remember any of it. Anyway, I'm looking up against a MacDS server, I'm wondering if that's more mac happy than bind. That would frighten / annoy me.

Anyway, in 14 days the only DNSAgent error I have is when I restarted the DNS server. It really was an error. The vast majority of my log is filled with malloc errors like this:

Feb 18 19:49:58 Chimera check_afp[16961]: *** _NSAutoreleaseNoPool(): Object 0x6ada0 of class NSCFString autoreleased with no pool in place - just leaking

Well, I have reaffirmed to myself that garbage collection is good no matter what a contrary C programmer says. But I'll try and look into my settings and such and get back to you. See if we can compare something meaningful. In the meantime I need to do laundry or I may not be wearing underwear tomorrow. :-O
 