#1  Guppy, June 24th, 2009, 05:39 PM
Awk performs malloc() when accessing arrays?

I've written an awk script that shouldn't, in theory, take up much memory, but in practice it crashes after running out of virtual memory on my server (like 800GB). It stores about 3200 strings in an associative array, but that part doesn't seem to take much memory. It's the subsequent accessing of the array elements that causes the memory footprint to soar. The following script illustrates this behaviour:

BEGIN { idx = 1 }

# Store the first 800 words
NR < 801 { word[idx++] = $1 }

# Now test whether accessing a stored array value increases
# the memory load
NR > 800 {
    for (i=1; i<800; i++)
        printf("%d\t%d\t%s\n", NR, i, word[i]);
}

If you call this 'test.awk' and run it on the builtin dictionary (awk -f test.awk /usr/share/dict/words), all it does is store the first 800 words of the dictionary file in a simple array, and this doesn't take much memory. But the second part just keeps printing the damn things over and over again, and this starts seriously running up the memory requirements. I watch the process grow in memory using Activity Monitor and don't get it. In contrast, if you replace "word[i]" in the printf() statement with "duh" (a constant string), the memory profile is perfectly flat over time. So it is something about accessing the array element that is costing memory (using malloc()s).
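For anyone who wants to repeat the comparison, here is a minimal sketch that runs both variants from one helper. The function name `run_variant`, the `mode` variable, and the small `seq`-generated input are assumptions made for illustration; the post itself used /usr/share/dict/words. The sketch only confirms that both variants do the same amount of printing. Memory growth still has to be watched externally, for example in Activity Monitor as described above, and with a much larger input.

```shell
# Hypothetical helper: run the test loop over a small generated input.
# Pass "array" to print word[i], or "const" to print the constant "duh".
run_variant() {
    seq 1 805 | awk -v mode="$1" '
    NR < 801 { word[NR] = $1 }      # store the first 800 lines
    NR > 800 {                      # then print 799 stored values per line
        for (i = 1; i < 800; i++)
            printf("%d\t%d\t%s\n", NR, i, (mode == "array" ? word[i] : "duh"))
    }'
}

# Count the output lines of each variant; the counts should match.
run_variant array | awk 'END { print NR }'
run_variant const | awk 'END { print NR }'
```

Because the only difference between the two runs is the third printf() argument, any difference in memory behaviour can be attributed to the array access rather than to the loop or the output volume.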

Can anyone explain this to me? It's ruining an otherwise reasonable script and doing my head in.

Thanks, in advance.

#2  macbri, June 25th, 2009, 07:13 AM
Two options: GNU awk, or an unsorted array

You've definitely found something -- running valgrind on awk with your example and again on GNU awk (from MacPorts) shows that sure enough awk leaks memory like a sieve, while GNU awk doesn't.

It might be worth filing a Bug Report with Apple (not that you'll ever hear back but it might at least call their attention to it and get it fixed in a future release).

Consider installing GNU awk, which is not only not afflicted by this apparent bug, but also more powerful. As a quick fix, if you don't want to install GNU awk right now, you can modify your code as shown below, and even the Apple-provided awk doesn't leak memory with this. There's one major caveat -- you're not guaranteed to have your array items in any particular order, so it may not be suitable for what you want to accomplish:

Code:
# Store the first 800 words 
NR < 801 { word[NR] = $1 }

# Now access a stored array value without a memory leak
NR > 800 {
    for (i in word)
        printf("%d\t%d\t%s\n", NR, i, word[i]);
}
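
If order does matter, one possible workaround is to keep the for-in loop and restore the order afterwards by sorting on the printed index. This is only a sketch under assumptions: the helper name `ordered_dump` and the three-word sample input are made up for illustration, and this pipeline was not checked against the leak discussed in this thread.

```shell
# Hypothetical helper: dump an awk array with for-in, then sort by index.
ordered_dump() {
    printf 'apple\nbanana\ncherry\n' |
    awk '{ word[NR] = $1 }                       # index each word by line number
         END { for (i in word)                   # traversal order is unspecified
                   printf("%d\t%s\n", i, word[i]) }' |
    sort -n                                      # numeric sort on the leading index
}

ordered_dump
```

The external sort costs an extra process, but it keeps the awk side free of the counted for loop that triggered the problem in the original script.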
Last edited by macbri; June 25th, 2009 at 07:26 AM.

#3  Guppy, June 25th, 2009, 01:52 PM
Thanks, that's extremely helpful. Both fixed the memory bug, and order wasn't important. Interestingly, the Mac's native awk ran 4x faster than my newly compiled gawk on the same code and the same data.