Least used letter?
 
Member
Kalleh
posted
In our local coffee house they have a trivia question every day where you can get a dime off on your coffee (woo hoo! [wink]). Shu always gets it right, which is annoying.

However, he wasn't with me today. The question was, what English letter is used the least? I guessed "x," but the guy said the answer is "q." I argued with him, reminding him of the common "q" words, like "quit" or "quiet," and then asked him how often he hears "x-ray." He countered with "exit," but still. I think "x" is used less than "q," don't you? He said that "someone" counted all the letters in the dictionary, and "q" lost.

Has anyone here heard of that before?
 
Posts: 24735 | Location: Chicago, USA
Member
aput
posted
Z.

Letter frequencies were counted in the nineteenth century so that typesetters would have suitable numbers of each. This is the origin of the filler text ETAOIN SHRDLU that used to be seen in newspapers: those are the most common in order. Lists vary slightly but Z is always at the bottom. The list I memorized many years ago is ETAON RISHDLU CMFGY PWBV KXJQZ.

Because of its importance in cryptography, there's been a large amount of text analysed by computer, so the figures in this list are probably robust. That gives Z (0.1%) distinctly down from J (0.2%), Q and K (0.3%), and X (0.5%).

A large corpus like that would presumably be composed largely of texts that use Z in words like 'realize'. In '-ise' varieties such as Australian and journalistic British the frequency of Z would be even lower.
 
Posts: 502 | Location: London
Member
arnie
posted
Well, I don't know about going through all the words in a dictionary, but this site shows the results of the analysis of 18584 common base words, and of 45406 common words.

Interestingly, "j" comes last in the common base words, and is second to last in the common words.


Build a man a fire and he's warm for a day. Set a man on fire and he's warm for the rest of his life.
 
Posts: 10940 | Location: London
Member
aput
posted
A live experiment on a mini-corpus: the first thousand words of Pride and Prejudice

fixed, next, extraordinary, vexing, mixture, experience

acknowledged, known, Park, know, taken, know, taken, take, week, know, thinking, talk, likely, like, thinking, think, think, know, like, quickness, take, mistake, know, quick, make, knowledge, like, know, likes

quickness, quick

just, objection, Jane

Lizzy, Lizzy, Lizzy, Lizzy, Elizabeth

And to avoid conversational words, the first thousand words of The Origin of Species:

exposed, excess, exposed, experiments, exception, exceptions, exotic, exact, extremely, exposed, exactly

look, strikes, think, think, Knight, make, remarkable, weak, sickly, taken, like, kept, remarked

frequent, quite, quite

subject, just

organization
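
A rough sketch of how lists like the ones above could be pulled out automatically. The function name `words_with_letter` and the tokenizing rule (runs of letters and apostrophes count as words) are assumptions made for illustration, not the method used in this post; the sample is just the opening sentence of Pride and Prejudice.

```python
import re

def words_with_letter(text, letter, limit=1000):
    """Return the words containing `letter` among the first `limit` words.

    Repeats are kept and order is preserved, as in the lists above.
    Assumption: a "word" is any run of ASCII letters or apostrophes.
    """
    words = re.findall(r"[A-Za-z']+", text)[:limit]
    return [w for w in words if letter in w.lower()]

# Opening sentence of Pride and Prejudice, used as a tiny stand-in corpus.
sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")

for letter in "xkqjz":
    print(letter, words_with_letter(sample, letter))
```

To repeat the experiment on a whole book, read the file into a string and pass it in place of `sample`.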
 
Posts: 502 | Location: London
Member
jheem
posted
I ran Pride and Prejudice through a simple histogram program. Q was the rarest with 627, followed by X at 839, J with 873, and Z with 936.

Here are the results:

a: 41684
b: 9086
c: 13457
d: 22295
e: 69346
f: 11994
g: 10029
h: 34055
i: 37809
j: 873
k: 3207
l: 21583
m: 14755
n: 37670
o: 40020
p: 8225
q: 627
r: 32289
s: 33101
t: 46621
u: 14975
v: 5723
w: 12296
x: 839
y: 12697
z: 936
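
The actual program isn't shown in the thread, so the following is only a minimal sketch of what a letter counter of this kind might look like. The names `letter_counts` and `rarest` are made up for illustration, and the choice to ignore case and skip non-letters is an assumption. The `posted` numbers are copied from the table above.

```python
from collections import Counter
import string

def letter_counts(text):
    """Count occurrences of a-z, ignoring case and anything that is not a letter."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Report every letter, including those that never occur (count 0).
    return {letter: counts[letter] for letter in string.ascii_lowercase}

def rarest(counts, n=4):
    """Return the n least frequent letters, rarest first (ties broken alphabetically)."""
    return sorted(counts, key=lambda letter: (counts[letter], letter))[:n]

# Counts copied from the table above, in a-z order.
posted = dict(zip(string.ascii_lowercase, [
    41684, 9086, 13457, 22295, 69346, 11994, 10029, 34055, 37809,
    873, 3207, 21583, 14755, 37670, 40020, 8225, 627, 32289, 33101,
    46621, 14975, 5723, 12296, 839, 12697, 936,
]))

total = sum(posted.values())
for letter in rarest(posted):
    share = 100 * posted[letter] / total
    print(f"{letter}: {posted[letter]} ({share:.2f}% of all letters)")
```

Running `letter_counts` on the text of a book gives a dictionary in the same shape as `posted`, so `rarest` works on either.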
 
Posts: 1218 | Location: California
<wordnerd>
posted
jheem: I ran Pride and Prejudice through a simple histogram program.

OK, I'll bite. What's a histogram?

Is 'simple histogram' an oxymoron?
 
Member
jheem
posted
OK, I'll bite. What's a histogram?

A-H: "A bar graph of a frequency distribution in which the widths of the bars are proportional to the classes into which the variable has been divided and the heights of the bars are proportional to the class frequencies."

More information here

I usually assign my intro programming students to implement a histogram, and tabulate and chart the frequencies of letters in different public domain books (usually from Gutenberg). Next assignment is to count the occurrences of words in a text.
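
For anyone curious what the two assignments might look like, here is one possible shape for them, not the course's actual solution. The names `word_counts` and `text_histogram`, the bar width, and the rule for what counts as a word are all assumptions for illustration.

```python
from collections import Counter
import re

def word_counts(text):
    """Count occurrences of each word, ignoring case.

    Assumption: a word is a run of letters or apostrophes.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))

def text_histogram(counts, width=40):
    """Render counts as rows of '#' bars; the largest count spans `width` characters."""
    if not counts:
        return ""
    biggest = max(counts.values()) or 1  # avoid dividing by zero if all counts are 0
    lines = []
    # Most frequent first; ties in alphabetical order.
    for key, value in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * round(width * value / biggest)
        lines.append(f"{key:>10} {bar} {value}")
    return "\n".join(lines)

sample = "the cat and the hat and the bat"
print(text_histogram(word_counts(sample), width=20))
```

The same `text_histogram` can chart letter counts, since it accepts any mapping from a label to a number.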
 
Posts: 1218 | Location: California
Member
aput
posted
I make it 947 Z's, in the edition on the 'Republic of Pemberley', but close enough. But Z has such an unfair advantage here. Now take out the 633 mentions of Elizabeth and the 96 of Lizzy and 24 of Eliza and we're down to 98. Take out 34 mentions of Colonel Fitzwilliam, and 3 of Fitzwilliam Darcy, and we're down to 61.

Of these, 11 are forms of 'teaze', so no longer current: down to only 50 present-day dictionary words containing Z in the whole book, of which by the way 19 are forms of 'amaze'.
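
The subtraction above can be checked step by step. One assumption is needed to make the figures add up: each mention of 'Lizzy' removes two Z's, while every other name removes one. The helper name `remaining_after` is made up for this sketch; the numbers are the ones quoted in the post.

```python
def remaining_after(total, removals):
    """Subtract mentions * z_per_mention for each (mentions, z_per_mention) pair."""
    for mentions, z_per_mention in removals:
        total -= mentions * z_per_mention
    return total

total_z = 947

# (mentions, Z's per mention); 'Lizzy' has two Z's, which is the assumption noted above.
first_names = [(633, 1),   # Elizabeth
               (96, 2),    # Lizzy
               (24, 1)]    # Eliza
fitzwilliam = [(34, 1),    # Colonel Fitzwilliam
               (3, 1)]     # Fitzwilliam Darcy

after_first_names = remaining_after(total_z, first_names)
after_all_names = remaining_after(after_first_names, fitzwilliam)
current_words = after_all_names - 11   # drop the forms of 'teaze'

print(after_first_names, after_all_names, current_words)
```

This reproduces the three running totals in the post: 98, then 61, then 50.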
 
Posts: 502 | Location: London
Member
jheem
posted
You're right, aput. I thought of proper nouns skewing the stats after I'd posted. The interesting thing about doing this is finding out stylistic tidbits, like the author's use of the "amaze" forms. Did you filter the X, Q, and J words, too?
 
Posts: 1218 | Location: California
Member
aput
posted
Well Jane accounts for 284 of the J's, but you expect rather a lot of J names. It's only the Z's that are totally skewed here.
 
Posts: 502 | Location: London
Member
jheem
posted
Yes, I looked through the Q words, and there are quite a few, all with one or two occurrences. I think the Zs have it.
 
Posts: 1218 | Location: California
Member
Kalleh
posted
Well, either way, my "x" lost. I am surprised about "j" and would just love to see all those "x" words!

jheem, I am dying to know what your avatar means. [confused]
 
Posts: 24735 | Location: Chicago, USA
Member
jheem
posted
It's my screenname, jheem, written in Devanagari, the syllabary used to write Sanskrit and Hindi. I couldn't resist having a picture of a (non-)word. I guess I should've written avatar in Devanagari, since it is a Sanskrit word.
 
Posts: 1218 | Location: California
Member
Kalleh
posted
jheem, what a creative avatar; it is especially appropriate on a word board!

Now, aput says that etaoin are the most common letters in that order, though he acknowledges some variation. Yet on arnie's site "t" comes in 5th or 7th. Is that because the 18584 Common Base Words or the 45406 Common Words just don't have that many "t's"? The most common letters there are "eisar" or "aeirt."
 
Posts: 24735 | Location: Chicago, USA
Member
aput
posted
That other site is a list counted by different words: that is, ignoring the fact that 'the' and other common words are repeated constantly. So the frequencies are not as they appear in text, and it's a rather odd measure.

Compare initial letters: the most common in text are TAOSW in that order; but in a dictionary, counting each word only once, they're... well, whichever bits of the dictionary are thickest, but with a heavy bias towards little-used words beginning with 'pre-' or 'un-'.
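
The difference between the two ways of counting is easy to see on a toy sentence. This is only an illustrative sketch: the function names and the sample sentence are invented here, and a "word" is assumed to be a run of letters.

```python
from collections import Counter
import re

def tokens(text):
    """Every word as it occurs in running text, lowercased, repeats included."""
    return re.findall(r"[a-z]+", text.lower())

def letters_in_running_text(text):
    """Letter counts as they appear in text: 'the' counts every time it is used."""
    return Counter("".join(tokens(text)))

def letters_in_word_list(text):
    """Letter counts over distinct words only, as in a dictionary-style list."""
    return Counter("".join(set(tokens(text))))

def initials(words):
    """Counts of first letters for whatever collection of words is passed in."""
    return Counter(w[0] for w in words)

sample = "the cat saw the dog and the bird"

print(letters_in_running_text(sample)["t"])   # 't' counted in every 'the'
print(letters_in_word_list(sample)["t"])      # 'the' counted only once
print(initials(tokens(sample)).most_common(1))
print(initials(set(tokens(sample)))["t"])
```

In the toy sentence, T and H look common in running text only because 'the' is repeated; counted over distinct words they drop back, which is the effect described above.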
 
Posts: 502 | Location: London
Member
Kalleh
posted
Yes, I am beginning to see the importance of what document you look at in order to see frequency of letters. The "Pride and Prejudice" example was another good one with the "Lizzys."

BTW, aput, in our wordplay thread, under "The Bluffing Game" I have nominated you to be next up. All you have to do is post a word that you think no one will know. Then people will send you private topics with fake definitions. You then post all the answers and people guess. If you fool everyone, you get 3 points. If someone picks the right answer, he gets 2 points, and people get 1 point every time someone picks their fake answers. We'd love to have you play! [big grin]
 
Posts: 24735 | Location: Chicago, USA
Copyright © 2002-12