help with searching through a file

nim009

Hi, I'm kind of a noob at Unix stuff.
I have a text file containing a huge amount of text,
and I need to locate and extract certain lines from it.
The part I care about starts with: /translation=
Basically I want a shell script that prints everything inside the quotes to the terminal.
It would also be great to be able to select which occurrence gets printed, like the 3rd one or something.

Example:
...(random stuff)...
/translation="MKQYIVLACMCLAAAAMPASLQQSSSSSSSCTEEENKHHMGIDV IIKVTKQDQTPTNDKICQSVTEITESESDPDPEVESEDDSTSVEDVDPPTTYYSIIGG GLRMNFGFTKCPQIKSISESADGNTVNARLSSVSPGQGKDSPAITHEEALAMIKDCEV SIDIRCSEEEKDSDIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEFSVRIGDM
CKESSELEVKDGFKYVDGSASKGATDDTSLIDSTKLKACV"
...(random stuff)...

*becomes*

MKQYIVLACMCLAAAAMPASLQQSSSSSSSCTEEENKHHMGIDVIIKVTKQDQTPTNDKICQSVTEITESESDPDPEVESEDDSTSVEDVDPPTTYYSIIGGGLRMNFGFTKCPQIKSISESADGNTVNARLSSVSPGQGKDSPAITHEEALAMIKDCEVSIDIRCSEEEKDSDIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEFSVRIGDMCKESSELEVKDGFKYVDGSASKGATDDTSLIDSTKLKACV

I want to store the result above as a string to be used elsewhere.

THANKS to anyone who helps
 
Here's a Perl script to do what you want. It reads the file on standard input, and each line of output represents one complete matched "translation" section. Save the script as "extract.pl" and make it executable with the "chmod" command (for example, chmod +x extract.pl).

Code:
#!/usr/bin/perl -w
#
# extract.pl
# Written by B. Sheehan, www.bgstech.com
# Oct. 30 2006
#

# Initialize
use strict;
my $start  = '^\/translation="';
my $end    = '"$';
my $match  = 0;
my $string = "";

# Loop through lines in input file
while (<STDIN>) {
   $match = 1 if (/$start/);
   if ($match) {
      chomp;
      s/$start//;
      $string .= $_;
   }
   if (/$end/) {
      $match = 0;
      $string =~ s/$end//;
      print $string . "\n";
      $string = "";
   }
}
You can combine this with the "head" and "tail" commands to pick out a single block. For example, to get the fifth "translation" block from your input file, you could do:

Code:
./extract.pl < input.txt | head -n 5 | tail -n 1
This comes with no warranty, etc. etc. use at your own risk, etc. etc. all that jazz. Enjoy!
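
Since the original question asked for a way to pick, say, the 3rd occurrence, the selection could also be done inside Perl rather than through "head" and "tail". What follows is only a minimal sketch and not part of extract.pl: the helper name nth_translation and the in-memory sample text are made up for illustration, and it assumes each block begins with /translation=" (optionally after leading whitespace) and that the closing quote is the last character on its line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of extract.pl): return the Nth (1-based)
# /translation="..." block read from filehandle $fh, or nothing if the
# input contains fewer than N blocks.
sub nth_translation {
    my ($fh, $n) = @_;
    my ($in_block, $count, $string) = (0, 0, "");
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+//;                       # drop leading whitespace
        if (!$in_block && $line =~ s/^\/translation="//) {
            $in_block = 1;                       # a new block starts here
        }
        next unless $in_block;                   # ignore unrelated lines
        $string .= $line;
        if ($string =~ s/"$//) {                 # closing quote ends the block
            return $string if ++$count == $n;
            ($in_block, $string) = (0, "");      # reset for the next block
        }
    }
    return;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = <<'END';
     /note="not wanted"
     /translation="MKQY
     IVLA"
     /translation="ACDE"
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $second = nth_translation($fh, 2);
close $fh;
print defined $second ? "$second\n" : "no such block\n";
```

To run it against a real file, pass \*STDIN as the filehandle and take N from $ARGV[0]. One difference from the "head | tail" pipeline: when the file has fewer than N blocks this returns nothing, whereas the pipeline would print the last block it did find.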
 
Aha, now I see. So when you said the line "starts with /translation=" in your original example, that wasn't exactly true: it starts with a bunch of whitespace and *then* /translation=... So one extra line in the original script removes all leading whitespace from each line, and voila!

Code:
#!/usr/bin/perl -w
#
# extract.pl
# Written by B. Sheehan, www.bgstech.com
# Oct. 30 2006
# Revision 1.1
#

# Initialize
use strict;
my $start  = '^\/translation="';
my $end    = '"$';
my $match  = 0;
my $string = "";

# Loop through lines in input file
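while (<STDIN>) {
   s/^\s+//;                       # strip leading whitespace
   $match = 1 if (/$start/);       # a block begins on this line
   if ($match) {
      chomp;
      s/$start//;                  # drop the /translation=" prefix
      $string .= $_;               # append this line's piece of the block
   }
   if (/$end/) {                   # a closing quote ends the block
      $match = 0;
      $string =~ s/$end//;         # drop the closing quote
      print $string . "\n" if ($string ne "");   # skip blank output
      $string = "";
   }
}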

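Revisiting the script with a sample input file (which I should have asked for to begin with :)) also showed another bug, as you noticed: blank lines in the output. The closing-quote check fires on any line that ends with a quote, even outside a "translation" block, and in that case the first version printed an empty string. In this version I've modified the "print" line to avoid outputting blank lines. Having the original input file is always a plus :)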

Anyway give this version a go and see if it does what you want.
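
If the goal is to keep the result as a string for use elsewhere, one alternative to the line-by-line script above is to read the whole input into memory and pull out every block with a single regular expression. This is only a sketch under a few assumptions: the helper name all_translations and the sample text are made up for illustration, a sequence never contains a double quote, and the file is small enough to hold in memory. Unlike extract.pl, it also removes any spaces inside the sequence, which matches the expected output in the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every /translation="..." value found in
# $text, with all whitespace (including line breaks) removed from each.
sub all_translations {
    my ($text) = @_;
    my @found;
    while ($text =~ m{/translation="([^"]*)"}g) {
        (my $seq = $1) =~ s/\s+//g;   # copy the capture, then strip whitespace
        push @found, $seq;
    }
    return @found;
}

# Made-up sample input in the same shape as the example in the question.
my $sample = <<'END';
...(random stuff)...
     /translation="MKQYIVLACM
     CLAAAAMPAS LQQ"
...(random stuff)...
     /translation="ACDE"
END

my @translations = all_translations($sample);
my $first = $translations[0];   # an ordinary Perl string from here on
print scalar(@translations), " block(s); first is $first\n";
```

For a real file, the text could be slurped from standard input with my $text = do { local $/; <STDIN> }; and the 3rd block would then be $translations[2].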
 