Compare files, extract the differences

wicky

play thing
I did an HTML mail out for a client against a supplied .csv file. The client has now asked me to send a second mail out from an updated .csv file.

The problem is that both files are VERY large & in a completely different order.

I'm looking for the best/quickest/easiest way to resolve this, because I don't fancy picking out the differences by hand.

Does anybody know of software or a utility that would do this job, or alternatively a relevant Terminal command-line approach?

Cheers
 
Open both files with Excel (or a similar app), sort them and do a merge. It has got to be possible that way.
 
I guess that would do it, yes. In TextWrangler, "Process Duplicate Lines..." in the "Text" menu does just that, regardless of whether the lines are in order. So you just append the second file to the first and run that command. It only works well if the lines are _exactly_ the same, of course.
 
It's in csv format, so it all appears as one line.

Is there a command in TextWrangler's "find & replace" that I can use to split it onto multiple lines (i.e. replace ", " with a carriage return)?

Thanks
 
You may already be embarking on a different route for accomplishing your objective, but I thought I'd make a quick note that there is a useful UNIX command for similar tasks. diff lets you compare two files and outputs the differences between them. It's been very useful for me in the past, although you will of course want to convert the files from the csv format to one entry per line first.
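
If the conversion from csv is the sticking point, it can also be scripted. Here is a minimal Python sketch, assuming the addresses are ordinary comma-separated fields; the function name, the sample data and the loose email pattern are made up for illustration and are not taken from the client's files:

```python
import csv
import io
import re

# Loose pattern for illustration only; it is not a full email validator.
EMAIL_RE = re.compile(r"[^@\s,;]+@[^@\s,;]+\.[^@\s,;]+")


def extract_addresses(csv_text):
    """Return every email-like field in csv_text, lowercased, in file order."""
    addresses = []
    # skipinitialspace copes with ", " (comma plus space) separators.
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        for field in row:
            field = field.strip()
            if EMAIL_RE.fullmatch(field):
                addresses.append(field.lower())
    return addresses


# Hypothetical sample standing in for the real csv content.
sample = "Name, ann@example.com, bob@example.com, Carol@Example.com"
one_per_line = "\n".join(extract_addresses(sample))
print(one_per_line)
```

Writing one_per_line out to a text file gives one address per line, which is the shape that diff or a duplicate-line tool needs.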
 
I'm feeling a bit dumb here...

I've processed the two csv files, so now I have two txt files, each with one email address per line. If I add the two sets of content together and remove duplicate lines, I will end up with a replica of the second (newer) file. The newer file has exactly the same content as the older file, but with some additions.

What I'm trying to achieve is just finding the differences, which should amount to about 135 email addresses.

Am I missing something obvious?
 
I tried "diff -ib" in the Terminal; however, it output everything.

Is there a way to get just the differences (i.e. the addresses that only appear in one of the files)?

Thanks for your help, patience, etc.
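
For reference: diff is order-sensitive, so two files holding the same lines in a different order will be reported as almost entirely different. A set difference ignores order. Below is a minimal Python sketch, assuming each file has already been reduced to one address per line; the function name and sample addresses are hypothetical, and the lowercasing and stripping only roughly mirror what the -i and -b flags do:

```python
def only_in_new(old_lines, new_lines):
    """Return addresses found in new_lines but not in old_lines.

    Comparison ignores case and surrounding whitespace. The result keeps
    the order of new_lines and lists each address once.
    """
    old = {line.strip().lower() for line in old_lines if line.strip()}
    seen = set()
    result = []
    for line in new_lines:
        address = line.strip().lower()
        if address and address not in old and address not in seen:
            seen.add(address)
            result.append(address)
    return result


# Hypothetical sample data: the newer list is the older one, reordered,
# plus two additions.
old_list = ["ann@example.com", "bob@example.com"]
new_list = ["dave@example.com", "Bob@example.com ", "ann@example.com", "cat@example.com"]
additions = only_in_new(old_list, new_list)
print("\n".join(additions))
```

With real files, the two arguments could simply be open file objects, since iterating over a file yields its lines.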
 
Ah, I misunderstood what you wanted. Nevertheless, I think TextWrangler can do it. In the Process Duplicate Lines dialog, change the top option from "leaving one" to "matching all", then check the "delete duplicate lines" box.
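
For anyone without TextWrangler, the same idea can be approximated in a script: append one list to the other and keep only the lines that occur exactly once. This is a rough Python sketch of what those settings appear to do, going by the description above rather than by TextWrangler's documentation; the function name and sample addresses are made up:

```python
from collections import Counter


def lines_appearing_once(*line_lists):
    """Return the lines that occur exactly once across all the given lists."""
    counts = Counter(
        line.strip()
        for lines in line_lists
        for line in lines
        if line.strip()
    )
    # Counter keeps first-seen order, so the result follows the input order.
    return [line for line, count in counts.items() if count == 1]


# Hypothetical sample data: the newer list contains the older one plus extras.
old_list = ["ann@example.com", "bob@example.com"]
new_list = ["bob@example.com", "cat@example.com", "ann@example.com", "dave@example.com"]
unique = lines_appearing_once(old_list, new_list)
print("\n".join(unique))
```

Two caveats with this approach: an address that was removed from the newer file will also show up in the result, and an address repeated within a single file will be dropped even if the other file lacks it.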
 
Worked a treat.... I think.
At least I've ended up with a handful of email addresses rather than **many**.

Thanks!!
 