Need an AppleScript for a bazillion text files?

tmaximus95

Registered
Hi,
I need a script that will go through a folder full of text files, open each one, edit it according to some parameters (in TextWrangler), save it, and then move on to the next file in the directory, repeating the same operation until the last file has been processed.

Is there such a solution out there?

Walt.
 
Mikuro

Crotchety UI Nitpicker
Can you give a general idea of what kind of operations you need to perform? TextWrangler has a built-in multi-file find and replace feature (at the bottom of the Find window), and it also supports powerful regular expressions, so that might be all you need.

If your needs are more advanced, you can certainly use AppleScript. Fortunately, TextWrangler has a decent scripting library.

This is just a quick snippet, but maybe it'll give you something to start on:
Code:
-- Collect every .txt file in the folder (edit the path first)
tell application "Finder"
	set the_folder to folder POSIX file "/path/to/folder/"
	set the_files to every file of the_folder whose name ends with ".txt"
end tell
-- Run the same replacement on each file in turn
repeat with f in the_files
	tell application "TextWrangler"
		replace "old" using "new" searching in file (f as text) without saving
	end tell
end repeat
(Change "without saving" to "with saving" if you trust the script enough to let it overwrite your files.)
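
For anyone who would rather not go through TextWrangler at all, roughly the same loop can be sketched in plain Python. This is a different technique from the AppleScript above, not a translation of TextWrangler's own behavior. The function name `replace_in_folder`, the `save` flag (meant to mirror "without saving" / "with saving"), and the UTF-8 default are all assumptions made for illustration.

```python
from pathlib import Path


def replace_in_folder(folder, old, new, save=False, encoding="utf-8"):
    """Replace `old` with `new` in every .txt file directly inside `folder`.

    Returns a dict mapping file name -> number of matches found.
    Files are only rewritten when save=True; otherwise this is a dry run.
    """
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # newline="" leaves the file's original line endings untouched
        with open(path, "r", encoding=encoding, newline="") as fh:
            text = fh.read()
        hits = text.count(old)
        if hits and save:
            with open(path, "w", encoding=encoding, newline="") as fh:
                fh.write(text.replace(old, new))
        counts[path.name] = hits
    return counts
```

Calling `replace_in_folder("/path/to/folder/", "old", "new")` only reports how many matches each file contains. Pass `save=True` once the counts look right, ideally on a copy of the folder.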
 

tmaximus95

Registered
I made an Automator workflow using TextEdit, but TextEdit can't do the operations I need done.

Thanks for your help, I'll let you know how it worked out.

Walt
 

earthsaver

Ben R.
TextEdit doesn't make Automator workflows or AppleScripts. Use AppleScript Editor to create and run your script. It sounds like TextWrangler is better suited to your needs than TextEdit, as you suggested in the beginning.
 

tmaximus95

Registered
That worked fine. Thank you.

The only problem I have now is finding a way to delete the blank lines in over 2,000 files full of text. Each file contains 100 articles being formatted for import into MySQL for further processing.

I can't seem to find anything stock that would do that job for me on so many files. Even in TextEdit or TextWrangler, I can't find a way to do it.

I haven't checked Dreamweaver yet, but I'm hoping I can find something to do the job. There's so much work to do on those files to make them clean text articles.

What I've done so far is create an Automator workflow that first launches TextEdit, then gets the folder contents, combines the text files, filters the paragraphs, sets the contents of the TextEdit document, and then quits all applications.

Automator's Filter Paragraphs function is severely limited and needs improvement. If only it had the capacity to remove blank lines, I'd be in heaven. lol.

I'm sure I'll figure something out though.

Thanks for your help, and happy holidays.

Walt.
 

simbalala

Registered
Post it to the BBEdit talk group on Google and I bet you'll have an answer in less than an hour. You'll most likely end up with several solutions.

http://groups.google.com/group/bbedit

edit - I'm assuming you tried the multi-file find and replace in TextWrangler and found that 2,000 files was too many or something. (Just search for two carriage returns in a row and replace them with a single carriage return. Runs of several blank lines may need more than one pass.)

If that doesn't work in your case the guys over at the talk group will have something for you, they love this kind of stuff.
 
earthsaver

Ben R.
Whoa! You just used a couple of keywords that finally told us exactly what you're trying to do. That opens up a whole lot of new options among text-cleaning applications. TextSoap and Clean Text are two of them.

Okay, actually, those are probably your best choices and there may be few others to find. Hope one of them is worth your while.
 

Mikuro

Crotchety UI Nitpicker
You can also delete blank lines easily with TextWrangler using regular expressions (AKA grep). In the Find window, check the "Use Grep" box, enter something like "^\s*\r" in the Find box, and leave the Replace box blank. Then click Replace All. Bam. This will also delete lines that contain spaces or tabs but no actual text.
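
If the multi-file dialog chokes on 2,000 files, the same cleanup can be approximated outside TextWrangler. Below is a minimal Python sketch, assuming the files are UTF-8 `.txt` files sitting directly in one folder. The function names are made up for illustration, and it filters lines rather than running the grep pattern above, so treat it as an equivalent in spirit only.

```python
from pathlib import Path


def strip_blank_lines(text):
    """Drop lines that are empty or contain only whitespace."""
    # splitlines(keepends=True) splits on \n, \r and \r\n (plus a few rarer
    # separators such as form feed) and keeps each line's own ending.
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if line.strip())


def strip_blank_lines_in_folder(folder, encoding="utf-8"):
    """Rewrite each .txt file in `folder` that had blank lines.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(folder).glob("*.txt")):
        # newline="" so existing line endings are read and written as-is
        with open(path, "r", encoding=encoding, newline="") as fh:
            original = fh.read()
        cleaned = strip_blank_lines(original)
        if cleaned != original:
            with open(path, "w", encoding=encoding, newline="") as fh:
                fh.write(cleaned)
            changed.append(path.name)
    return changed
```

This overwrites files in place, so it is worth running it on a copy of the folder first and spot-checking a few articles before the MySQL import.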
 