Script for Parsing text: PLEASE HELP

Zeus · Dec 19, 2004

Hi all,
I have to convert many text files. The content of each file needs to be imported into a database.

The file contents look like this:

# 03 Bonnie and Clyde (by ?) - Jay-Z feat. Knowles, Beyonce [#11]
# 2+2=5 (by ?) - Radiohead [#31, 2003/04]
# All about lovin' you (by ?) - Bon Jovi [#28]
# All I have (by ?) - Jennifer Lopez & LL Cool J. [#15]
# All in my head (by ?) - Kosheen [#34]
# All my life (by Foo Fighters) - Foo Fighters [#44, 2002/03]

where
# marks a new record to be created,
containing five fields in this order:
TrackName (from the # at the start of each line to the '(' character)
Written By (from '(by' to ')' ... if the value is '?' it should be left blank/empty)
Artist (from ' - ' to ' [' or the end of the line)
Featuring (if the Artist field contains 'feat. someone', that part should be moved from Artist to the Featuring field)
Info (from '[' to ']', if present)

Now ... this is the really big question.
Is there a tool that can do this from the command line?
And if so, which one?

All I need is for the tool to parse the input text file and write the result to standard output as tab-delimited text.

Please help ...
I have millions of files to process, with thousands of entries... I don't want to do this manually .... I really need help ... if someone has a tip ... I'll be very grateful!

Thanks in advance

ElDiabloConCaca · Dec 19, 2004

I could probably write this in a simple C program or Perl script, but it might take a few days since time is tight... I can't promise anything, but I don't think it'd be that hard -- a matter of reading in a file, parsing it with grep or simple string compares, stripping the field of any unwanted characters, then writing each field to stdout separated by tabs.

It'd be a good practice program for me, but if you're under a strict time constraint, you may find better help at:
http://www.codeguru.com/forum
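
For what it's worth, the approach described above (read each line, split it at the known markers, strip the leftovers, print tab-separated fields) fits in a short awk script. This is only a sketch: it assumes every record starts with "# " and follows the sample layout exactly, and the function name parse_tracks is made up for illustration.

```shell
# Hypothetical helper: turns lines shaped like
#   "# Track (by Writer) - Artist feat. Guest [Info]"
# into five tab-separated fields: track, writer, artist, featuring, info.
parse_tracks() {
  awk '
    BEGIN { OFS = "\t" }
    /^# / {
      line = substr($0, 3)                 # drop the leading "# "
      by = index(line, " (by ")
      if (by == 0) next                    # skip lines that do not fit the layout
      track = substr(line, 1, by - 1)
      rest = substr(line, by + 5)          # text after " (by "
      sep = index(rest, ") - ")
      if (sep == 0) next
      writer = substr(rest, 1, sep - 1)
      if (writer == "?") writer = ""       # unknown writer becomes an empty field
      artist = substr(rest, sep + 4)
      info = ""
      br = index(artist, " [")
      if (br > 0) {                        # optional trailing "[...]" block
        info = substr(artist, br + 2)
        sub(/\]$/, "", info)
        artist = substr(artist, 1, br - 1)
      }
      feat = ""
      f = index(tolower(artist), " feat. ")
      if (f > 0) {                         # move the guest out of the artist field
        feat = substr(artist, f + 7)
        artist = substr(artist, 1, f - 1)
      }
      print track, writer, artist, feat, info
    }
  ' "$@"
}
```

Called as `parse_tracks filetoprocess.txt > out.tsv`, it should print one tab-delimited line per record. Lines that do not fit the layout are skipped silently, so comparing line counts against the input is a sensible sanity check.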

Zeus · Dec 19, 2004

Hi,
first of all, thanks for the fast reply.

I've found a way out with AWK,

using the command

% cat filetoprocess.txt | grep "-" | awk -F'di' '{ print $1 }'

This reads the text file and passes it to grep, which filters out the blank lines; awk then splits the grep output using the word 'di' (the 'by' of the sample above, as it appears in my real files) as the field separator.

I'm fighting with awk because it won't accept the ' (di' regex as a field separator.
In fact:

% cat file.txt | grep "-" | awk -F' (di' '{ print $1 }'

awk: syntax error in regular expression (di at
input record number 1, file
source line number 1

Any tips???

P.S. Unfortunately I can't write C or C++ code :-(

ElDiabloConCaca · Dec 19, 2004

You may need to escape the "(" character with a preceding "\" character, since I believe it's a reserved character... like this:

% cat file.txt | grep "-" | awk -F' \(di' '{ print $1 }'
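
Whether the backslash reaches the regex intact can depend on the awk implementation, so a bracket expression is a more portable way to make "(" literal. A small sketch, using the "(by" form from the sample lines rather than "(di"; the function name first_field is just an illustrative choice.

```shell
# Put "(" inside a bracket expression so it is matched literally;
# no backslash is needed in the field separator.
first_field() {
  awk -F' [(]by ' '{ print $1 }' "$@"
}
```

For example, `first_field file.txt` should print everything before " (by " on each line.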

Zeus · Dec 19, 2004

I've solved it using sed.

This is the pipe of death!!!

sed '/^$/d' filetoprocess.txt | sed 's/ (di /%%%/' | sed 's/) - /%%%/' | sed 's/(19/%%%19/' | sed 's/ \[#/%%%/' | awk -F%%% '{ print $1 "---" $2 "===" $3 "---"$4 $5}'
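
The same placeholder idea can probably be collapsed into a single sed call that writes tabs directly, which skips the final awk step. This is a sketch under assumptions, not a drop-in replacement: it uses the "(by" form from the sample lines, leaves out the "(19" step because the sample contains no such text, and adds one extra expression to strip the closing "]". The name to_tsv is made up.

```shell
# Hypothetical one-process version of the pipe: each marker is replaced by a tab.
to_tsv() {
  tab=$(printf '\t')        # literal tab; "\t" in a sed replacement is not portable
  sed -e '/^$/d' \
      -e "s/ (by /$tab/" \
      -e "s/) - /$tab/" \
      -e "s/ \[#/$tab/" \
      -e 's/]$//' \
      "$@"
}
```

Like the original pipe, it keeps the leading "# " and drops the "#" inside the brackets.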

You won't believe me .... but I've finished my second pack of cigarettes!!!! ;-)

andehlu · Dec 20, 2004

Nice!!!
