Remove duplicate lines/records?

James Bond

I have a text file that contains several thousand lines/records (tab separated text), some of which are duplicates.

What is the easiest way to strip them out, leaving only unique lines?
 
One way is to use the sort command in a Terminal:

sort -u textfile > textfile.out

This sorts the file and removes all duplicate lines (the -u switch), writing the output to textfile.out.
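One side effect worth knowing: sort -u reorders the lines. If the original order matters, a common awk idiom (a different technique from sort, shown here only as an alternative) keeps the first occurrence of each line in place. A small sketch follows; the file names sample.txt, sorted_unique.txt and ordered_unique.txt are made up for the demo.

```shell
# Create a small tab-separated sample with one duplicate line.
# All file names here are hypothetical, chosen only for illustration.
printf 'pear\t1\napple\t2\npear\t1\n' > sample.txt

# sort -u: removes duplicates, but the output is in sorted order.
sort -u sample.txt > sorted_unique.txt

# awk alternative: print a line only the first time it is seen,
# so the original order of first occurrences is preserved.
awk '!seen[$0]++' sample.txt > ordered_unique.txt

cat sorted_unique.txt
cat ordered_unique.txt
```

Both outputs contain the same two unique lines; only the order differs.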
 
My text file comes from a Mac application, and whilst it appears to have line separators in GUI applications, these turn out to be ^M (carriage return) characters.

So sort does not see them as separate lines!

I guess I need to use sed first, but how do I replace control characters? Or is there a better way to do this?
 
How about tr, which here translates each carriage return (octal 015) into a line feed (octal 012):

tr '\015' '\012' < mac_text_file > unix_text_file

Reverse the two numbers to go in the other direction:

tr '\012' '\015' < unix_text_file > mac_text_file
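The two steps from this thread can also be chained in one pipeline, so no intermediate file is needed. A minimal sketch, assuming the file uses bare carriage returns as line endings; the names mac_sample.txt and unique_lines.txt are invented for the demo.

```shell
# Build a small sample with CR-only line endings and one duplicate record.
# File names here are hypothetical, chosen only for illustration.
printf 'b\tx\ra\ty\rb\tx\r' > mac_sample.txt

# Convert CR (octal 015) to LF (octal 012), then sort and drop duplicates.
tr '\015' '\012' < mac_sample.txt | sort -u > unique_lines.txt

cat unique_lines.txt
```

If the result has to go back into the Mac application, the reverse tr command above can be run on unique_lines.txt afterwards.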
 