2013-10-26

1)

Before setting up text expansion, it's a good idea to run a word frequency counter on typical texts first, to identify

- the real, relative frequency of all those short words we all use by the thousands

- the typical words in such texts, i.e. those 10, 20 or 40 "special" words it would make sense to enter in abbreviated form. That said, depending on your work, you might not need to identify several such terminology groups, since you perhaps use the same "special" terminology in all of your texts... or both may be true: there will be a base group of words of yours, and then some more specific terms for different text sub-groups. As said, good text expanders (and AHK especially) permit combinations of several such abbreviation groups, i.e. several abbreviation files running concurrently if you wish.
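For those who would rather not install anything, the basic idea of such a frequency list can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions (the function name, the letter-only definition of a "word" and the sample sentence are all illustrative), not what any of the tools below actually do:

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters; digits and underscores are excluded.
# For str patterns in Python 3 this is Unicode-aware, so umlauts stay inside words.
WORD_RE = re.compile(r"[^\W\d_]+")

def word_frequencies(text, min_length=1):
    """Count case-folded words that have at least min_length letters."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return Counter(w for w in words if len(w) >= min_length)

# Illustrative sample; in practice you would read your exported plain-text file.
sample = "Der Hund und die Katze und der Vogel und die Maus."
freq = word_frequencies(sample)

# Print the most frequent words first, as a frequency list tool would.
for word, count in freq.most_common(3):
    print(f"{count:>5}  {word}")
```

Raising min_length (say, to 4) hides the very short words, which helps if you only care about words long enough to be worth abbreviating.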

Also, it's important to know that word frequency list tools are totally different from word count tools, which don't build lists but help translators bill their customers per translated word. In that different software category, there are also many offerings, some showing clemency to the customers, while others do not (but tend to inflate the "word" count).

Also, all prices are plus VAT here in Europe, so a price of 40$ comes to about 40€, for example, which is a real nuisance. In earlier times, many vendors sold directly, without bothering about these awful taxes (then directly sent to people who live on us, e.g. around the Mediterranean basin and in Brussels); today, almost all of them make you pay through payment services that take those taxes from you, then (perhaps) send them to your respective tax collector - this very often makes me refrain from buying software, even if it's "only" 20$, since as far as feeling robbed goes, 20$ equal 200$.

2)

So you need a word frequency counter, or a concordancer (most of which, or all?, also let you create just a word frequency list). I have to add that my text, exported from my outliner, is plain text of some 2.1 million bytes and some 340,000 words (this latter number heavily depends on the tool used), so I didn't test the tools with some tiny texts, but with a real-world example.

From Google, you'll get some real crap first if you leave out the "frequency" part of your search term.

There is "Word List Expert", 15$, one of the worst programs I have ever installed on my system. Sometimes, the creation of the word list takes just 10 minutes, and other times, even after 30 minutes, the progress bar shows just about 15 p.c. "progress"... and if you do get the word list, it will only show the very first 780 words or so (on my system), and there are big differences from what I get from other tools. Except for viruses, I've never been so happy as when I de-installed this piece of software.

There is "Word List Creator" and / or "Word List Maker", both from the same developer, I suppose, but the available screenshots show some differences. There are several "buy" sites for them (and one even with 30 p.c. off), but there is not a single download site for either of them (perhaps there is, but I searched for about 30 minutes or so), and all their "original" sites are unavailable: www.wordlistcreator.com and www.wordlistmaker.com and www.mysoftwarefactory.net - There's also "Word Sorter", 10$, from the same developer, which cannot be found either. I did not try to buy, anticipating big problems with the second part, "immediately after payment, you'll get the direct download link", when all those trial download links were 404's...

There is "Word List", from "www.i496.net", cannot be found.

There is WordMetry (29$ or 25$), from http://guoshesen.51.net/ - another 404.

I did not try "Crunch Wordlist Generator" (from sourceforge), "WordList 1.0" (free), "Free Wordlist Generator 1.9" (free), "WordList Generator" (free, from sourceforge), "Word List Compiler" (free, from LastBit Corp).

I did not try "Translator's Abacus", I did not try "WordCounter for MS Word" (20$, from Editorium), but both seem to be serious offerings.

3)

There is "Word Frequency Count Software 7.0" from Sobolsoft - their specialty is to have about 1,000 applications, since they systematically cut up their software into minimum scope: e.g. a word frequency tool for Word texts, one for .txt texts, one for .pdf texts, and so on, so you buy 10 times what should be the same software. (For anything else, they do the same slicing-up.) I have never bought anything from them, and I never will.

There is www.seasite.niu.edu/trans/wordfrequency.htm - did not try.

There is MyWordCount, from mywritertools.com (15$) - does not seem to be bad, but there is no trial, and how would I know whether the software chokes on real-world examples like the above if no trial is permitted?

There is "Word Frequency Counter" (free or 20$) from wordfrequencycounter.com - another 404, also a beta from 2009 under wordfrequency.codeplex.com - did not touch it.

There is WordStat (see below), from provalisresearch.com - well, this is 3,000$, and they also offer other qualitative analysis software and such, QDA Miner and more (academic prices are about 600$).

4)

There are some free concordancers from universities, which also do plain word frequency lists:

There is "Abundantia Verborum", from the university of Louvain / Leuven, Belgium. If you ask for access codes, they will perhaps send them to you; I didn't bother to do so. ( wwwling.arts.kuleuven.ac.be/genling/abundant )

There is "AntConc", seemingly from a British scientist at a Japanese university - http://www.antlab.sci.waseda.ac.jp/software.html -, and it's quite well known, but I didn't have real success with it: from its word list tab, it builds the word list in just a few seconds, but then takes 100 p.c. of my processor for hours (!), and whilst other applications crawl (but aren't entirely dead, as AntConc is), I'm unable to scroll the word list, so I only get the very first 39 hits (with XP and 2 gigabytes of memory, most of it free). Perhaps it works correctly on your system, and then it would be a very good choice for a free program, since it does a lot of things (if you can get it to work).

You'll have "Kwic Concordance", from another Japanese university, www.chs.nihon-u.ac.jp/... - not tried but seems to be a serious offering; 5.0 is for XP, 5.1 being for more "modern" operating systems.

There is TextStat2 (not to be confused with WordStat above), from Freie Universität Berlin, and I ended up using this program since it's similar to AntConc (but with some early "difficulty"), and it didn't choke on 2.1 million characters but put out the word list after just several seconds. The "difficulty" lies in its GUI: there is no separate pane for the file to be analyzed; you put it into the main pane, which will then contain your list once you trigger the list building command. So this is not very intuitive, but it works perfectly. On the other hand, I got some false positives with it, e.g. "tten" with 140 occurrences, when in fact, in the text, this "tten" is not a word, but just a part of words like "hatten", "hätten", etc. - perhaps my settings weren't correct. Then again, my first 39 hits were more or less identical to those from AntConc, with only minor differences in frequency, so for creating better text expansions than just from my "guesses", as before, I'll now use TextStat2... AND also one of the programs below!
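As for the "tten" fragments mentioned above, one plausible cause (a guess on my part, not anything I verified in TextStat2) is a word pattern or encoding setting that does not treat umlauts as letters, so that "hätten" gets split at the "ä". This would not explain fragments coming from "hatten", though. A minimal Python sketch of the effect, with an invented sample sentence:

```python
import re
from collections import Counter

# Invented sample sentence, for illustration only.
text = "Sie hätten es gern, wir hätten es auch."

# ASCII-only word pattern: "ä" acts as a separator,
# so "hätten" breaks into the fragments "h" and "tten".
ascii_tokens = re.findall(r"[A-Za-z]+", text)

# Unicode-aware letter pattern: "ä" counts as a letter,
# so "hätten" survives as one word.
unicode_tokens = re.findall(r"[^\W\d_]+", text)

print(Counter(ascii_tokens).most_common(3))
print(Counter(unicode_tokens).most_common(3))
```

If a tool lets you define which characters belong to a word, or which encoding the file has, that is where I would look first.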

5)

Now for some commercial offerings:

"Hermetic Word Frequency Counter", from Switzerland, 40$ or, as "Advanced Version", 60$. If you often need such a tool, get the advanced version, not the regular one: those 20$ more (plus tax, of course...) are really a good investment. You'll have lots of options, and it's in continuous development... but this might be considered a problem, since there will then be frequent, paid updates. For both versions, there is "PayPal 5 p.c. off", which is very original, since PayPal is rather expensive for both sides (for the customer, too, because of their bad exchange rates), so in many cases, PayPal payments bear a surcharge, not a rebate. (They also offer 1-year and 3-month licenses, another rare thing but devoid of sense, since either you need such a thing "every day", or you'll make do with free offerings like I do here.)

6)

Textanz (40$) from www.textanz.com - not tried. Looks like a serious offering, but then, those 40$ end up as 31.02€ plus VAT (with the € currently at 1.38$!!! So it always comes to "dollar to euro 1:1", and I HATE THIS TO THE POINT OF NOT BUYING ANYMORE EXCEPT IN CASE OF ABSOLUTE NECESSITY), and Franz Grieser wrote here:

http://www.outlinersoftware.com/topics/viewt/1560/15

Posted by Franz Grieser

Aug 27, 2010 at 12:48 PM

Hugh

>Curiously the single piece of pure desktop

>text-analysis software that I’d previously heard of, Textanz, isn’t mentioned in

>the list on the second link. Textanz is aimed at writers, not corporate or

>governmental data-miners, and is on the PC platform. It was last updated in 2009; a

>note on its website placed in 2010 says that a cross-platform Java version is being

>developed, but as the note spells its own software “Textans”, I don’t have high

>hopes!

I wouldn’t count Textanz as a text-analysis tool. What it does

- create a concordance (= a list of words used in a text)

- create a list of frequently used words and phrases

- show how often each word is used

- show where in the text a selected word is used

That is pretty useful for writers: I use the tool for checking my texts for repetitions.

But it’s not really what I consider text ANALYSIS.

Just my 2 cents.

Franz

7)

Which brings us to SmartEdit, also commented on here. Somebody (Delyannis) said it no longer had a free version, which is not true, but both a good screenshot and the free version are well hidden. Here's a comparison: http://www.smart-edit.com/compare-smartedit.html and at the bottom of the page, there's the download link for the lite version. Install it, then press (in the ribbon, unfortunately!) "Help", which again brings you to the developer's site, and there you'll finally have a good screenshot of the program: http://www.smart-edit.com/help.html?free (Of course, this link works from here, too.)

Now this lite version is really crippled: There is no way to export your word frequency list, except by screenshot, scroll, another screenshot...

But for what I need it for, I'm happy with what I get, and I can sort (as in most other programs) both by frequency and alphabetically (the list is called "Repeated Words"). Here again, it's not too intuitive: first, click on "Options", then, in "Select Checks to Run", de-select everything except "Run Word Frequency Counter", click "OK", and only then click on "Run Checks". The program then takes just a few seconds, and you'll get your word frequency list, and I cannot tell you HOW INSTRUCTIVE such a list will be: 8,000 "die", 7,000 "der", 6,500 "und", 3,000 "das" / "nicht" / "ist", 2,700 "zu" / "den", 2,500 "sie" (attention: TextStat2 says, 1,500 "Sie" and 1,000 "sie"!), 2,300 "auch", 2,100 "mit" / "es" / "oder" / "sich", 2,000 "ein", 1,900 "auf", 1,700 "für" / "im", 1,500 "des" / "eine" / "dass", 1,400 "aber" / "dem" / "als" / "wenn", 1,200 "bei" / "nur" / "sind", 1,100 "er" / "werden" / "dann" / "man" / "wird", 1,000 "wie" / "vor", etc., etc. - you'll understand that such a list is TOTALLY different from what I had expected my most frequent words to be, and this explains why I have always been so unhappy with my abbreviations: as soon as you get the "real numbers", you can tweak them accordingly, so that they finally become really useful to you!
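The "Sie" / "sie" difference between the two tools above looks like a simple matter of case handling: one list apparently folds upper and lower case together, the other keeps them apart. A small Python sketch of the two ways of counting, with an invented token list (I am only assuming that this is what the two programs do):

```python
from collections import Counter

# Invented tokens, for illustration only.
tokens = ["Sie", "sie", "Sie", "sie", "sie", "Die", "die"]

# Case-sensitive counting keeps "Sie" and "sie" as separate entries.
case_sensitive = Counter(tokens)

# Case-folded counting merges them into one entry.
case_folded = Counter(t.lower() for t in tokens)

print(case_sensitive["Sie"], case_sensitive["sie"])
print(case_folded["sie"])
```

For abbreviation purposes, the folded count is probably the more useful one, since a sentence-initial capital does not make it a different word to type; in German, though, "Sie" and "sie" can be genuinely different words, so the separate counts have their use, too.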

But back to SmartEdit, the paid version: It's 50$ for normal people, and about 50€ for us EU-subjects/-slaves, and this passage from the license, "SmartEdit is new software, first released in December, 2012. Though minor upgrades are free to licensed users, the software is sold as is, for what it does today, not for what it might do in the future." has given rise to more than one comment here and elsewhere. This being said, it has a lot of fine, potentially useful features, but I'm not sure at all it's a concordancer, i.e. that it also shows the hits, from the list, within their respective context!

8)

As does "Concordance", from http://www.concordancesoftware.co.uk/ (87$ plus VAT), so perhaps you'll need both in the end if you really need them to look into your texts before publishing. On the other hand, the main purpose of SmartEdit seems to be to VARY your terminology, when in fact, it doesn't help you with CONSOLIDATING, so its appeal to non-fiction writers is greatly reduced by this choice of scope, at this moment in time at least - development in just this very first year of its lifetime has been impressive, though.
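For readers wondering what "showing the hits within their context" amounts to, here is a minimal keyword-in-context sketch in Python. The function name, the bracket display and the sample text are my own illustrative assumptions; real concordancers do far more (sorting by neighbouring words, collocations, and so on):

```python
import re

def kwic(text, keyword, width=20):
    """Return every whole-word hit of keyword with width characters of context on each side."""
    lines = []
    # \b restricts hits to whole words; matching ignores case.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the hits line up in one column.
        lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines

# Invented sample text, for illustration only.
sample = "Der Hund bellt. Die Katze sieht den Hund und der Hund sieht die Katze."
for line in kwic(sample, "Hund", width=15):
    print(line)
```

The point of such a view is that repetitions become visible together with their surroundings, which a bare frequency list cannot show.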

9)

There are other concordancers, sometimes with 2-year licences only, from Athel/Athelstan, or then, they have got a word list and a collocation list, but not in the same frame (e.g. tlCorpus Concordance 6.0, 55$: so you'll have to switch back and forth between these two views).

And then, there are aligners, but that's another story: Aligners have always fascinated me since we also discussed text atomization - which they automate, for translation purposes -, and I'm wondering if some of their concepts could be transposed to resolve similar problems - cross-referencing between and/or gathering of paragraphs all over the tree - in outliners (especially the semi-AI found in expensive aligners: not the free/200$, but the 1,000$ variety).

As it is, I highly recommend SmartEdit Lite for building up frequency lists for your autocompleter files, and if you need to export those lists, check out the university offerings mentioned above, especially TextStat2 (or AntConc if it doesn't choke on your system, too).
