A brief study of letter frequency in English

OK, American English, but that's what I'm used to. This is a little project that I did just for fun, based on a conversation with my dad (he of the dozens of dictionaries). I've done many other silly but amusing projects, but this is one of the few that I bothered to document. I thought that some of you might find it interesting or amusing. I also thought that it should put to rest any unwritten speculation that you've heard the last of me. However, my contributions here are going to occur, for the most part, when business is slow. Enjoy, Bob

Letter Frequency

The "dictionary" (actually just a long word list) from which these counts are drawn comes from Red Hat Linux and contains 93,397 words. All proper names (such as November) have been converted to lower case to simplify the counting process. There are (to the best of my knowledge) no "commercial" names in this list. The actual counts read left to right, not downward.

Descending Frequency Count for first letters of words in the English Language:

s, c, p, a, d, m, b, r, t, i, e, f, u, h, g, l, o, w, v, n, k, j, q, z, y, x.

11465 s; 7490 c; 6624 p; 6103 a; 5786 d;
5090 m; 5085 b; 5000 r; 4756 t; 4324 i;
4229 e; 3945 f; 3757 u; 3159 h; 2954 g;
2860 l; 2824 o; 1868 w; 1780 v; 1751 n;
791 k; 676 j; 594 q; 220 z; 209 y;
57 x.

Descending Frequency Count for second letters of words in the English Language:

e, a, o, i, n, r, u, l, h, t, p, c, m, y, x, v, s, b, d, w, q, f, g, k, z, j.

14479 e; 12364 a; 11282 o; 8827 i; 8533 n;
7354 r; 7313 u; 4206 l; 3269 h; 2251 t;
2169 p; 1817 c; 1735 m; 1414 y; 1118 x;
954 v; 924 s; 775 b; 738 d; 543 w;
362 q; 333 f; 312 g; 229 k; 43 z;
27 j.

Descending Frequency Count for third letters of words in the English Language:

r, a, n, t, s, e, l, i, o, c, p, m, u, d, b, g, f, v, h, w, y, x, k, z, j, q.

8984 r; 8081 a; 7170 n; 6794 t; 6551 s;
6179 e; 5755 l; 5719 i; 5509 o; 5164 c;
4175 p; 3862 m; 3722 u; 3136 d; 2524 b;
2488 g; 1654 f; 1516 v; 985 h; 860 w;
807 y; 443 x; 373 k; 320 z; 286 j;
264 q.

Note the wide discrepancy from “etaoin” in first letters, the moderate similarity to “etaoin” in second letters, and the “in between” result for third letters. Also note the relative clustering of frequencies in third letters, versus the sharp drop-offs in frequency (at different points) for first and second letters. Some day this might make an interesting graph. If I do it, I'll post it (or a URL to it).
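
For anyone who wants to try the same kind of count, here is a minimal Python sketch. It is not the script used for the numbers above. The function names are made up for illustration, and it assumes that only the 26 letters a to z are counted, so an apostrophe or digit at the chosen position is skipped. The word list itself is not included; on many Linux systems one lives at /usr/share/dict/words, but that path is an assumption and is not stated in the post.

```python
from collections import Counter


def position_counts(words, position):
    """Count the letter found at a 0-based position across a list of words.

    Words are lower-cased first, matching the post's treatment of proper
    names. Words that are too short, or that have a non-letter at that
    position, are skipped (an assumption of this sketch).
    """
    counts = Counter()
    for word in words:
        word = word.strip().lower()
        if len(word) > position and "a" <= word[position] <= "z":
            counts[word[position]] += 1
    return counts


def descending(counts):
    """Return (letter, count) pairs, highest count first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Tiny made-up sample; a real run would read one word per line from a file.
sample = ["November", "apple", "sand", "cat", "sea"]
for letter, count in descending(position_counts(sample, 0)):
    print(count, letter)
```

Passing 0, 1 or 2 as the position gives first, second or third letters respectively.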

Posts: 56 | Location: Santa Clara County, CA

Seanahan

I would think a better judge would be a corpus of English text. The letter "e" may not be the most frequent amongst the words, but is the most commonly used letter in text.
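
A rough Python sketch of the difference, with made-up function names and a made-up sample sentence: counting letters in running text weights each word by how often it is used, while counting over a word list sees each distinct word only once.

```python
import re
from collections import Counter


def text_letter_counts(text):
    """Count every letter a-z in running text, repeated words included."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")


def wordlist_letter_counts(text):
    """Count letters over the distinct words only, as a word list would."""
    distinct_words = set(re.findall(r"[a-z]+", text.lower()))
    return Counter(ch for word in distinct_words for ch in word)


sample = "the cat and the dog and the bird"
print(text_letter_counts(sample).most_common(3))
print(wordlist_letter_counts(sample).most_common(3))
```

In the running-text count, common words such as "the" are counted every time they appear, which is where a corpus and a dictionary list would be expected to part ways.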

Posts: 886 | Location: Illinois

zmježd

This sort of thing is fun. For a data structures class I used to teach, I'd set a homework assignment to build a concordance that measured the frequency of words. No morphological analysis was possible for so short a project. The students had to choose a longish text from the Gutenberg library project and compare results. Most chose to analyze English-language texts, even though over half of each class tended to be non-native speakers. I kept waiting for one of the Chinese students to run their program on Chinese texts. Another fun thing to do would be to measure phoneme frequency by using a dictionary that had phonological representations (i.e., pronunciation guides) to determine what a typical word (or better yet, syllable) looks like in English.
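
The word-frequency part of such an assignment might look something like the Python sketch below. This is only an illustration, not the actual assignment: the function name is made up, and it assumes a word is a run of the letters a to z, optionally with internal apostrophes, with no morphological analysis (so "cat" and "cats" count separately).

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in a text, ignoring case.

    A "word" here is a run of letters a-z that may contain internal
    apostrophes (e.g. "don't"); this definition is an assumption.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Made-up sample; a real run would read a whole book from a text file.
sample = "The cat saw the other cat. Don't stop; don't."
for word, count in word_frequencies(sample).most_common():
    print(count, word)
```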

—Ceci n'est pas un seing.

Posts: 5148 | Location: R'lyeh

BobKberg

Ooh! zmjezd! very cool idea!!!

I'll have to play with that idea a little bit.
Although it occurs to me that the period in which the original text was written would doubtless play a role in the selections.

Bob

Posts: 56 | Location: Santa Clara County, CA

BobKberg

I suspect, Seanahan, that you are referring to the repetition of articles, prepositions and such in regular usage.

If so, I don't doubt you for a moment.

I am simply having a little fun with the language, and the odd/interesting patterns one can encounter.

Bob

Posts: 56 | Location: Santa Clara County, CA
