Finding Non-Duplicate files and synchronising to a master directory

radii

I have about 6TB of files across a number of networked
computers running on Mac OS 10.6 that I want to do two things with.

1. Find every duplicate of each file. From my investigations so far,
about 50% of the 10 million or so files probably have one or more
duplicates, and the number of duplicates per file varies from 1–50.

2. Specify one of the directories as the Master Directory, and then
copy all non-duplicates to the appropriate place in that Master
Directory. Which folder a file is copied to will depend on:

(a) whether the file name is similar to a file name in a folder in
the Master Directory

(b) the names of the folders that the file is nested within; in the
majority of cases, this will provide sufficient information about
where the file should be copied to.
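
For point 1, here is a minimal sketch of content-based duplicate detection, assuming Python 3 is available on one of the machines and the other computers' volumes are mounted locally. It groups files by size first and only hashes files that share a size, which should avoid reading most unique files. The function names are made up for illustration, and only the data fork is hashed, so resource forks and other Mac metadata are ignored.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return groups (sorted lists of paths) of files with identical contents."""
    # Pass 1: bucket every regular file by size; no file contents are read yet.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates(["/Volumes/ShareA", "/Volumes/ShareB"])` (hypothetical mount points) would return one list per set of identical files, whatever their names. The sketch only reports; it deletes nothing.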

In most cases the process will require something like the following:

A. Locate all folders with the same name and nested structure.

B. Present the user with any partial matches of the nested structure
for manual intervention, e.g. A>B>C>D>filenameXX.doc and
A>B>D>filenameXX.doc

C. Test for duplicates, and discard all duplicates of files that
already reside in the relevant folder in the Master Directory.

D. Copy all non-duplicates to the appropriate folder in the Master
Directory.
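
Steps A to D could be sketched roughly as below, again assuming Python 3 and locally mounted volumes. The sketch only builds a plan: files whose folder path exists in the Master Directory are either marked as duplicates (same contents already in that folder) or queued for copying, and everything else is set aside for manual review. The definition of a "partial match" used here (same innermost folder name, with one path being an ordered subset of the other, as in A>B>C>D versus A>B>D) is an assumption, as are all function names. Criterion (a), similar file names, is not covered.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_subsequence(short, long):
    """True if every element of short appears in long, in the same order."""
    it = iter(long)
    return all(part in it for part in short)


def rel_parts(dirpath, root):
    """Folder path relative to root, as a tuple of folder names."""
    rel = os.path.relpath(dirpath, root)
    return () if rel == "." else tuple(rel.split(os.sep))


def plan_merge(master, source):
    """Return (to_copy, duplicates, needs_review) without touching any file."""
    master_dirs = {rel_parts(d, master) for d, _, _ in os.walk(master)}
    to_copy, duplicates, needs_review = [], [], []
    seen = {}  # master folder -> digests already present or already planned
    for dirpath, _dirnames, filenames in os.walk(source):
        parts = rel_parts(dirpath, source)
        if parts not in master_dirs:
            # Step B: offer partial matches of the folder structure for review.
            candidates = sorted(
                d for d in master_dirs
                if d and d[-1] == parts[-1]
                and (is_subsequence(d, parts) or is_subsequence(parts, d))
            )
            for name in filenames:
                needs_review.append((os.path.join(dirpath, name), candidates))
            continue
        # Step A: the same nested structure exists in the master.
        target = os.path.join(master, *parts)
        if parts not in seen:
            existing = (os.path.join(target, n) for n in os.listdir(target))
            seen[parts] = {sha256_of(p) for p in existing if os.path.isfile(p)}
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target, name)
            digest = sha256_of(src)
            if digest in seen[parts]:
                duplicates.append(src)  # Step C: contents already in the master
            elif os.path.exists(dst):
                needs_review.append((src, [parts]))  # same name, other contents
            else:
                to_copy.append((src, dst))  # Step D: a genuine non-duplicate
                seen[parts].add(digest)
    return to_copy, duplicates, needs_review


def apply_plan(to_copy):
    """Copy the planned files; shutil.copy2 also tries to keep timestamps."""
    for src, dst in to_copy:
        shutil.copy2(src, dst)
```

Keeping the planning separate from `apply_plan` means the three lists can be inspected before anything is copied, and nothing is ever deleted from the source.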

I look forward to your suggestions of apps that can be used together to achieve this task.

Cheers
Peter
 
Even if what you ask is possible, I believe it is dangerous and unnecessary. If you want to treat files as the same even though they have different names, and also want to delete the files you identify as duplicates, you could end up with a god-awful mess, and I don't mean that in a good way.

A few duplicates among 6 TB of files is a small price to pay to avoid the regret of deleting "duplicates" that turn out not to be duplicates after all.
 
One recommendation would be to completely exclude at least /System, /Library, and each user's ~/Library and ~/Preferences from your search for duplicates.
Also, any .../preferences, .../kext and .../library folder probably shouldn't be touched, no matter where it is.
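
If the search ends up being scripted, the exclusions above could be applied while walking the disk, roughly as in this sketch (Python 3 assumed). Matching folder names case-insensitively, and treating anything ending in ".kext" as off-limits, is one interpretation of the advice rather than an established rule; the names `EXCLUDED_ROOTS`, `EXCLUDED_NAMES`, `is_excluded` and `walk_files` are made up for illustration.

```python
import os

# Top-level system locations to skip entirely (absolute paths).
EXCLUDED_ROOTS = ("/System", "/Library")
# Folder names to skip wherever they occur (assumption: case-insensitive match).
EXCLUDED_NAMES = {"library", "preferences", "kext"}


def is_excluded(path, excluded_roots=EXCLUDED_ROOTS):
    """True if the folder at path should not be searched for duplicates."""
    name = os.path.basename(path).lower()
    if name in EXCLUDED_NAMES or name.endswith(".kext"):
        return True
    return any(path == r or path.startswith(r + os.sep) for r in excluded_roots)


def walk_files(root, excluded_roots=EXCLUDED_ROOTS):
    """Yield file paths under root, never descending into excluded folders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [
            d for d in dirnames
            if not is_excluded(os.path.join(dirpath, d), excluded_roots)
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Note that the exclusion is applied to subfolders only, so the starting folder passed to `walk_files` is always searched; pass an absolute path so the comparison with `EXCLUDED_ROOTS` works.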
 
Plus, clear your font cache by doing a Safe Boot (hold down the Shift key while booting) and then restarting normally.
 