Member |
Does the OneLook dictionary site have an obsession? Enter the ordinary word "come", and see what's first listed in the Quick Definitions, in the box on the right.
|
Member |
I don't know whether to take this as a serious question, or if it's just meant to point out a curiosity. I do know that the 'quick definitions' are taken from the indexed public domain dictionaries (for obvious reasons) and at one time I think they used dictionary.com (until that source started using copyrighted material). I don't know how the current algorithm(s) work.
|
Member |
Seems a rather odd algorithm that puts that particular definition top and
at thirteenth.
"No man but a blockhead ever wrote except for money." Samuel Johnson.
|
Member |
obviously it's not making value judgments.
|
Member |
It just doesn't make sense. They can't be so stupid as to not be able to figure out that some people just look at that top definition. Or...maybe they can be...
|
Member |
this is why I suggested that there may just be an algorithm that's rotating through a list of their unprotected sources.
edit: FWIW, this statement appears on OneLook’s acknowledgment page:
The content that appears in the "Quick Definitions" section of our results pages derives primarily from WordNet, a project of Princeton University [license info], and data from the U.S. Census Bureau. It also derives from the hundreds of user-submitted additions and corrections we've received over the years.
I get the impression that they aren't running a very tight ship* over at OneLook since Bob Ware left the captaincy -- perhaps they've just got a loose cannon.
*to wit, it often takes 6 months to get them to do a db update.
This message has been edited. Last edited by: tsuwm
|
Member |
here's a little more information on OneLook for those wots innerested.. OneLook is just part of a larger operation called Datamuse (YCLIU). Currently, Datamuse is managed on a day-to-day basis by a gent name of Harvey Beeferman. I used to have somewhat of a working relationship with Harvey's son, Doug. At that time, Doug had a real interest in analyzing OneLook's DBs, and he was sending me a list of "chokes", i.e., words not indexed. This list was ordered by frequency, which eliminated a lot of the spelling errors (but for the common ones), and I could cull the list for obscure words, add them to wwftd, improving OL, and Viola's your aunt!
In late 2006, I got this note from Harvey:
Yes, I’m Doug’s Dad. He works full time in the computer industry and I manage Datamuse on a day-to-day basis. He does technical work for Datamuse when he has time. We’re using Bob Ware, the originator of OneLook on a consulting basis to maintain the database.
<shrugs>
|
Member |
I don't know how other dictionaries order their definitions - but there must obviously be some amount of personal judgement. Even the obvious system of frequency of use would only be possible for the written word. And I would imagine that the definition selected here would be more frequently used vocally than in writing.
Richard English
|
Member |
Remember that OneLook is not a dictionary.
|
Member |
Whatever it is, it's a reference source, and this challenge must be common to all reference sources. How do you decide priorities?
Richard English
|
Member |
I can't agree. Consider our sample word: if you take OL's ack'ment at face value, the WordNet source says "The verb come has 21 senses," of which the offending one comes (no pun) 20th. the current word count of WordNet, per OL, is 119160. if you allow, say, 5 senses per average word (conservative), that gives you more than 500,000 senses to put in some order. I'm not going to do an extensive study, but pick a random word. I'm going to try 'average': WordNet lists 6 senses for the adj., 'statistical norm' being no. 1; quick def'ns gives this second. So what's obvious is that some reordering is going on. One might reasonably ask, why? WordNet is "freely and publicly available for download"; but perhaps there are issues having to do with copying large amounts of data, such as OL is doing. in any event, personal judgment or randomizing, for up to a million data points..
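For anyone who wants to check the back-of-envelope figure above, here is a minimal Python sketch. The word count is the number quoted in the post; the 5-senses-per-word average is the post's own guess, not a measured WordNet statistic, and the function name is made up for illustration.

```python
# Rough estimate of how many senses would need ordering.
WORD_COUNT = 119_160   # WordNet word count as quoted in the post (per OneLook)
SENSES_PER_WORD = 5    # assumed average from the post, not a measured value


def estimate_total_senses(word_count: int, senses_per_word: int) -> int:
    """Multiply the word count by the assumed average number of senses."""
    return word_count * senses_per_word


total_senses = estimate_total_senses(WORD_COUNT, SENSES_PER_WORD)
print(f"{total_senses:,}")  # prints 595,800
```

That is comfortably "more than 500,000", so the estimate holds as long as the assumed average does.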
|
Member |
I find it hard to understand why you don't agree with my comment that there must be some personal judgement when it comes to ordering. From the statistics you quote (although I confess I find some of them confusing) it seems clear that it would be impossible to use any strict numerical system or algorithm, since there are just too many variables. In your own example of 5000 senses - how is one to decide how such senses are to be ordered? Does the 4,555th sense come before or after the 3,585th sense? And why? Personal judgement must inevitably be used.
Richard English
|
Member |
let me try this from another angle. I know from the situation at OL that they don't have the resources to make value judgments on 500,000 (not 5000) senses. I assume, from this, that they must have had their part-timer who "does technical work" gin up a randomizer to sequence all these senses, in a one-shot effort. (any value-judgments would then, perforce, be programmed into his algorithm -- maybe he was the loose cannon.) does this make any sense?
edit: I suppose, if one were really curious about this, one could imagine some other words with salacious shadings and see if they are consistently emPHAsized.
edit²: I tried 'head'; 'oral-genital stimulation' comes *way down the list.
This message has been edited. Last edited by: tsuwm
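To make the "one-shot randomizer" guess concrete, here is a minimal Python sketch of what such a thing might look like. This is purely hypothetical: nothing in the thread confirms that OneLook works this way, and the function name, the seed, and the placeholder sense labels are all invented for illustration. The 21 matches the sense count quoted earlier for the verb "come".

```python
import random


def one_shot_order(senses, seed=2006):
    """Return the senses in a fixed pseudo-random order.

    A seeded generator gives the same sequence on every run, which is
    what a "one-shot" reordering stored in a database would look like.
    The seed value is arbitrary.
    """
    rng = random.Random(seed)
    shuffled = list(senses)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled


# Placeholder labels standing in for the 21 senses of a word.
senses = [f"sense {n}" for n in range(1, 22)]

first_run = one_shot_order(senses)
second_run = one_shot_order(senses)
print(first_run == second_run)  # True: same seed, same order
print(first_run[:3])            # whichever senses happened to land on top
```

With an order like this, whatever lands first is an accident of the shuffle, not a value judgment.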
|
Member |
In WordNet, the senses are typically ordered by the number of annotations of that sense. There is no specific "ordering" of senses, although the most annotated synsets tend to be the most common. Also, the data is somewhat quirky. One will notice a preponderance of baseball terms ranking higher than they should.
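As a rough illustration of ordering by annotation count, here is a small Python sketch. It does not read WordNet's actual files; the Sense type, the placeholder glosses, and the counts are all invented for the example.

```python
from typing import NamedTuple


class Sense(NamedTuple):
    gloss: str
    tag_count: int  # hypothetical number of corpus annotations for this sense


def order_by_annotations(senses):
    """Most-annotated sense first; sorted() is stable, so ties keep their order."""
    return sorted(senses, key=lambda s: s.tag_count, reverse=True)


# Invented sample data, not taken from WordNet.
SAMPLE = [
    Sense("sense A", 120),
    Sense("sense B", 275),
    Sense("sense C", 8),
]

ranked = order_by_annotations(SAMPLE)
for position, sense in enumerate(ranked, start=1):
    print(position, sense.gloss, sense.tag_count)
```

Under a scheme like this, a sense drawn from a heavily annotated domain floats upward regardless of how common it is in everyday use, which would fit the baseball quirk mentioned above.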
|