Remove duplicate lines/records?

James Bond · May 9, 2001

I have a text file that contains several thousand lines/records (tab-separated text), some of which are duplicates.

What is the easiest way to strip them out, leaving only unique lines?

blb · May 9, 2001

One way is to use the sort command in a Terminal:

sort -u textfile > textfile.out

This sorts the file and removes all duplicate lines (the -u switch), writing the result to textfile.out.
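Note that sort -u returns the unique lines in sorted order, so the original line order is lost. The sketch below, assuming a POSIX shell and a made-up sample file, compares it with the common awk idiom `awk '!seen[$0]++'`, which is a different technique from the answer above: it keeps the first copy of each line in its original position. Both compare whole lines, so two records that differ only by a trailing space or tab count as distinct.

```shell
#!/bin/sh
# Made-up tab-separated sample file with duplicate records.
printf 'b\t2\na\t1\nb\t2\nc\t3\na\t1\n' > textfile

# Unique lines, sorted (the approach from the answer above).
sort -u textfile > textfile.sorted

# Alternative: keep the first occurrence of each line in its original order.
# awk prints a line only when its count in the seen[] array was still zero.
awk '!seen[$0]++' textfile > textfile.ordered
```

Don't redirect the output back to the input file (sort -u textfile > textfile): the shell truncates the file before sort reads it. `sort -u -o textfile textfile` is the safe way to rewrite the file in place.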

James Bond · May 9, 2001

My text file comes from a Mac application, and whilst it appears to have line breaks in GUI applications, these turn out to be ^M (carriage return) characters.

So sort does not see them as separate lines!

I guess I need to run sed first, but how do I replace control characters? Or is there a better way to do this?

blb · May 10, 2001

How about tr:

tr '\015' '\012' < mac_text_file > unix_text_file

and swap the two octal codes to convert in the other direction:

tr '\012' '\015' < unix_text_file > mac_text_file
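The tr conversion and the sort -u de-duplication can also run as a single pipeline, so no intermediate file is needed. A minimal sketch, assuming a POSIX shell; the sample data and the output name mac_text_file.dedup are made up for illustration. Octal 015 is the carriage return (^M) and 012 is the line feed.

```shell
#!/bin/sh
# Made-up "Mac" file: records separated by carriage returns only, no LF.
printf 'b\t2\ra\t1\rb\t2\r' > mac_text_file

# Turn CR into LF so sort sees separate lines, then sort and drop duplicates.
tr '\015' '\012' < mac_text_file | sort -u > unix_text_file

# Optionally convert back to CR line endings for the Mac application.
tr '\012' '\015' < unix_text_file > mac_text_file.dedup
```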

James Bond · May 10, 2001

Thank you!
