help with searching through a file

nim009 · Oct 29, 2006

Hi, I'm kind of a noob at Unix stuff.
I have a text file with a huge amount of text,
and I need to locate and extract certain lines from it.
The stuff I care about starts with: /translation=
Basically I want a shell script that prints everything inside the quotes to the terminal.
It would also be great to be able to select which occurrence gets printed, like the 3rd one or something.

Example:
...(random stuff)...
/translation="MKQYIVLACMCLAAAAMPASLQQSSSSSSSCTEEENKHHMGIDV IIKVTKQDQTPTNDKICQSVTEITESESDPDPEVESEDDSTSVEDVDPPTTYYSIIGG GLRMNFGFTKCPQIKSISESADGNTVNARLSSVSPGQGKDSPAITHEEALAMIKDCEV SIDIRCSEEEKDSDIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEFSVRIGDM
CKESSELEVKDGFKYVDGSASKGATDDTSLIDSTKLKACV"
...(random stuff)...

*becomes*

MKQYIVLACMCLAAAAMPASLQQSSSSSSSCTEEENKHHMGIDVIIKVTKQDQTPTNDKICQSVTEITESESDPDPEVESEDDSTSVEDVDPPTTYYSIIGGGLRMNFGFTKCPQIKSISESADGNTVNARLSSVSPGQGKDSPAITHEEALAMIKDCEVSIDIRCSEEEKDSDIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEFSVRIGDMCKESSELEVKDGFKYVDGSASKGATDDTSLIDSTKLKACV

I'd like to end up with the above as a single string that can be used elsewhere.

THANKS to anyone who helps

macbri · Oct 30, 2006

Here's a Perl script to do what you want. Each line of output represents one complete matched "translation" section. Save the script as "extract.pl" and make it executable with "chmod +x extract.pl".

Code:

#!/usr/bin/perl -w
#
# extract.pl
# Written by B. Sheehan, www.bgstech.com
# Oct. 30 2006
#

# Initialize
use strict;
my $start  = '^\/translation="';
my $end    = '"$';
my $match  = 0;
my $string = "";

# Loop through lines in input file
while (<STDIN>) {
   $match = 1 if (/$start/);
   if ($match) {
      chomp;
      s/$start//;
      $string .= $_;
   }
   if (/$end/) {
      $match = 0;
      $string =~ s/$end//;
      print $string . "\n";
      $string = "";
   }
}

To get, say, the fifth "translation" block from your input file, you can combine the script with the "head" and "tail" commands:

Code:

./extract.pl < input.txt | head -n 5 | tail -n 1
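As a rough sketch of the selection step on its own, and of keeping the result as a string for later use: the sample lines below are made-up stand-ins for the script's output (one block per line), and the variable names are only for illustration. It also shows "sed" as an alternative way to pick line n.

```shell
# Stand-in for the output of extract.pl: one extracted block per line.
# (Hypothetical sample data, not taken from the real input file.)
blocks=$(printf '%s\n' 'MKQYIV' 'LACMCL' 'AAAAMP' 'ASLQQS' 'SSSSSC')

n=5   # which block to select (1-based)

# Same selection as the head/tail pipeline above, captured in a variable.
via_head_tail=$(printf '%s\n' "$blocks" | head -n "$n" | tail -n 1)

# Alternative: sed prints only line n and then quits.
via_sed=$(printf '%s\n' "$blocks" | sed -n "${n}{p;q;}")

echo "$via_head_tail"
echo "$via_sed"
```

One difference worth knowing: if there are fewer than n blocks, the head/tail version returns the last block that exists, while the sed version returns nothing.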

This comes with no warranty, etc. etc. use at your own risk, etc. etc. all that jazz. Enjoy!

nim009 · Oct 30, 2006

Thanks,

only I can't seem to get it to work.
I did what you said and got one blank line as the result. I tried many different syntaxes and still could not get anything more than blank lines.

If you want to look at the data, it's here: http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AY484669
(I just copy-pasted it into a blank text document.)

I really appreciate this, Thanks

macbri · Oct 30, 2006

Ahah, now I see. So when you said the line "starts with /translation=" in your original example, that wasn't exactly true -- it starts with a bunch of white-space and *then* /translation=... So one extra line in the original script removes all leading white-space from each line and voila!
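To see the effect in isolation, here is a small shell check. The sample line is hypothetical, just indented the way the real file indents it, and "grep" and "sed" stand in for the Perl pattern match and the white-space stripping.

```shell
# Hypothetical line, indented the way the real file indents it.
line='                     /translation="MKQYIV"'

# Anchored pattern against the raw line: no match, so grep -c prints 0.
# ("|| true" keeps the non-zero exit status of a failed grep from
# stopping a script that runs under "set -e".)
raw_hits=$(printf '%s\n' "$line" | grep -c '^/translation="' || true)

# Strip leading white-space first, then try the same pattern again.
stripped_hits=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//' | grep -c '^/translation="' || true)

echo "raw: $raw_hits, stripped: $stripped_hits"
```

The "^" anchor only matches at the very start of the line, which is why the indented line is missed until the leading white-space is removed.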

Code:

#!/usr/bin/perl -w
#
# extract.pl
# Written by B. Sheehan, www.bgstech.com
# Oct. 30 2006
# Revision 1.1
#

# Initialize
use strict;
my $start  = '^\/translation="';
my $end    = '"$';
my $match  = 0;
my $string = "";

# Loop through lines in input file
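while (<STDIN>) {
   s/^\s+//;                      # strip leading white-space (new in 1.1)
   $match = 1 if (/$start/);      # a "translation" block begins here
   if ($match) {
      chomp;
      s/$start//;                 # drop the /translation=" prefix
      $string .= $_;              # join wrapped lines into one string
   }
   if (/$end/) {                  # line ends with a closing quote
      $match = 0;
      $string =~ s/$end//;        # drop the closing quote
      print $string . "\n" if ($string ne "");   # skip empty results
      $string = "";
   }
}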

But re-visiting the script with a sample input file (which I should have asked for to begin with) showed another bug, as you noticed: blank lines. In this version I've modified the "print" line to avoid outputting blank lines. Having the original input file is always a plus.

Anyway give this version a go and see if it does what you want.
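For anyone who wants the result as a shell string without the Perl script, here is a rough sketch of the same idea in awk. This is a swapped-in alternative, not the script above: the function name "extract_translations" and the sample data are made up for illustration, and the end-of-block check runs only while inside a block, so no blank-line guard is needed.

```shell
# Hypothetical sample in the same layout as the real file: indented
# qualifiers, with one translation wrapped over several lines.
sample='     CDS             1..30
                     /product="example protein"
                     /translation="MKQYIV
                     LACMCL
                     AAAAMP"
     CDS             40..70
                     /translation="SSSSSC"'

# extract_translations: print each /translation="..." block on one line.
extract_translations() {
    awk '
        { sub(/^[ \t]+/, "") }                  # strip leading white-space
        /^\/translation="/ { inside = 1; sub(/^\/translation="/, "") }
        inside {
            buf = buf $0                        # join wrapped lines
            if ($0 ~ /"$/) {                    # closing quote ends the block
                sub(/"$/, "", buf)
                print buf
                buf = ""; inside = 0
            }
        }
    '
}

# Capture the second block in a shell variable for later use.
second=$(printf '%s\n' "$sample" | extract_translations | sed -n '2{p;q;}')
echo "$second"
```

With a real file, the "printf" part would be replaced by reading the file, for example "extract_translations < input.txt".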

nim009 · Oct 30, 2006

AWESOME!
It works perfectly,

Thank You!
