Wordcraft Home Page    Wordcraft Community Home Page    Forums    The Written Word    Spam, Bayesian filters and word-salad
Spam, Bayesian filters and word-salad
 
shufitz (Member) posted:
Today I received an odd bit of junk mail in which the typical offensive portion was preceded and followed by these two bits of strange text:
quote:
down strange print road dependent cushion slope till goat group that skin town circle thin glass trousers building second fat

porter hanging responsible light cough cat between disease lord dependent account rail because produce happy round event opinion pin position waste make idea sweet be far daughter work in guide experience linen separate horse grass voice where system part where meeting pen bad operation reaction journey fall weight judge food owner hat stage
Another junk-mail ended with small-print text that seemed to be a passage from a book – innocuous, meaningful, but irrelevant. Why?

Spammers are using a new technique to circumvent the kind of spam filter called a Bayesian filter (named after an analysis by Thomas Bayes, an 18th-century cleric). That filter computes a total score after assigning each word to one of four categories: negative, benign, positive and uncategorized esoteric words; examples would be "sex," "and," "non-profit" and "epicaricacy."

So the spammers, to dilute or counter the negative rating of the guts of their messages, pad them with odd words that have nothing to do with their pitch. These are called "word salad" e-mails.
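The dilution effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular filter is implemented: the per-word probabilities in `WORD_SPAM_PROB` are invented, and `spam_score` is a hypothetical helper that combines them naive-Bayes style by summing log-odds. Unknown words are treated as neutral (0.5), which is also an assumption.

```python
import math

# Hypothetical per-word spam probabilities, as if learned from one user's mail.
# The numbers are made up for illustration.
WORD_SPAM_PROB = {
    "sex": 0.95,         # "negative" word: strongly suggests spam
    "and": 0.50,         # "benign" word: says nothing either way
    "non-profit": 0.10,  # "positive" word: suggests legitimate mail
    "daughter": 0.20,
    "meeting": 0.15,
}
UNKNOWN_PROB = 0.5  # assumption: never-seen words such as "epicaricacy" are neutral


def spam_score(words):
    """Combine per-word probabilities into one score between 0 and 1."""
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word.lower(), UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))


pitch = ["sex", "sex", "and"]
salad = ["daughter", "meeting", "non-profit", "epicaricacy"]

print(round(spam_score(pitch), 3))          # the bare pitch scores high
print(round(spam_score(pitch + salad), 3))  # the padded message scores lower
```

In this sketch only words the filter already regards as "positive" pull the score down; a neutral or unknown word leaves it unchanged, which may be why the padding tends to consist of everyday words rather than pure nonsense.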
 
Posts: 2666 | Location: Chicago, IL USA
Chris J. Strolin (Member) posted:
Thanks much, Shu. I also have received emails of this sort and am ashamed to admit that I've spent considerable time trying to puzzle out coherence from the meaningless yet high-sounding gibberish they sent.

I thought it might be some sort of code, as if I don't have enough worthless endeavors with which to waste my time!
 
Posts: 681
arnie (Member) posted:
Yes, sadly, the spammers do tend to keep one step ahead of the methods used to detect their spam.

Often these spam e-mails are sent in HTML format, with the padding words hidden from sight. I have my e-mail preferences set to display all HTML messages as plain text, so I can see the words, and I avoid getting the images and other possible nasties that can be associated with HTML e-mails.
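As a rough sketch of why a plain-text view reveals the padding: reducing an HTML message to its text ignores all styling, so words hidden with a tiny or white font show up alongside the visible pitch, and images are never fetched. The sample message and the `TextExtractor` and `html_to_plain_text` names below are made up for illustration; real mail clients do this conversion in their own ways.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects every run of text in the document, ignoring tags and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_plain_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical spam message: the second paragraph is styled to be invisible.
sample = (
    "<html><body>"
    "<p>Buy now!</p>"
    '<p style="color:#ffffff;font-size:1px">goat cushion trousers porter linen</p>'
    '<img src="http://example.invalid/track.gif">'
    "</body></html>"
)

print(html_to_plain_text(sample))  # Buy now! goat cushion trousers porter linen
```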


Build a man a fire and he's warm for a day. Set a man on fire and he's warm for the rest of his life.
 
Posts: 10940 | Location: London
Richard English (Member) posted:
Does anyone know why I keep getting messages (the latest being from kdmogzkdmkcd@hotmail.com ) which are always sent to undisclosed recipients and which have no content at all? They aren't viruses either, since there is no attachment.

I have tried replying but never get a reply back, so these days I just delete them. But somebody must have a reason for sending them out, and if it's not spam or a virus then what is it?


Richard English
 
Posts: 8038 | Location: Partridge Green, West Sussex, UK
arnie (Member) posted:
Hmmmm...

These are just guesses:

a) The message body is not, in fact, empty. It may contain a script to open your browser and take you to a Web site, or do something even nastier to your machine. The script can't be seen if you have viewing messages in HTML enabled. Thankfully these messages have become rare lately because modern e-mail software has the automatic running of scripts disabled by default.

b) I know that with at least one common bulk e-mailer, it is necessary to separate the body text from the headers with a special switch. If the spammer forgot this, you might find the text with the headers (the headers are usually hidden). It is also possible that somehow the script failed to attach the body text at all.

c) An error in the script used to write an HTML message meant that your e-mail program couldn't parse the file, so it displays nothing.

d) The sender is "harvesting" e-mail addresses for onward sale to another spammer. He sends out thousands of messages to a particular mail server using random user names. Those that are not returned by the mail server as "undeliverable" can therefore be assumed to be valid. Replies that say "Huh?" or similar are a bonus, as they verify that the account is actively checked; that address therefore commands a premium.
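Guesses (a) to (c) can be checked by looking at the raw message rather than at what the mail program chooses to display. The sketch below uses Python's standard `email` package to report, for each part of a message, its content type and how many non-whitespace bytes it holds. The two sample messages and the `body_summary` helper are invented for illustration, and the addresses are placeholders.

```python
from email import message_from_string


def body_summary(raw_message):
    """Return (content type, size in bytes after stripping whitespace) per part."""
    msg = message_from_string(raw_message)
    summary = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts hold no text of their own
        payload = part.get_payload(decode=True) or b""
        summary.append((part.get_content_type(), len(payload.strip())))
    return summary


# Hypothetical message with headers only: nothing follows the blank line.
EMPTY = (
    "From: someone@example.invalid\n"
    "To: undisclosed-recipients:;\n"
    "Subject: hello\n"
    "\n"
)

# Hypothetical message that looks empty in an HTML view but carries a script.
SCRIPTED = (
    "From: someone@example.invalid\n"
    "To: undisclosed-recipients:;\n"
    "Subject: hello\n"
    "Content-Type: text/html\n"
    "\n"
    "<script>window.open('http://example.invalid/')</script>\n"
)

print(body_summary(EMPTY))     # [('text/plain', 0)]
print(body_summary(SCRIPTED))  # one text/html part with a non-zero size
```

A size of zero for every part would point toward guess (d), or toward a bulk mailer that failed to attach the body at all.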


 
Posts: 10940 | Location: London
Richard English (Member) posted:
I suspect the final paragraph is the real answer. I don't know how to turn the HTML on and off for reading (though I can do it for sending) but I am sure that, even then, there is nothing to see.

If I click on select all I should get an indication that there's something there - even if it's white on white.


Richard English
 
Posts: 8038 | Location: Partridge Green, West Sussex, UK
wordcrafter (Member) posted:
Word-salad is on the net, too. Thanks, shufitz, for explaining something that had puzzled me.

As you can imagine, I often google up a word to find example quotations. Some words, particularly very rare ones, are found almost exclusively on sites amid long lists of wildly unrelated words.

Many of those sites are sex sites; I see now that they used the word-salad to defeat the filters of Google and the other search engines. The rest are mostly nothing more than the word-salad itself; probably they are posted by spammers to share with each other.

As an example, you can google up dimication, which means "a fight or contest". With filter set to 'moderate' you'll get almost 4,000 hits -- and after the top nine hits, the word-salad begins.
 
Posts: 2701
aput (Member) posted:
The way to avoid the word-salad on the Web is (for now) surprisingly simple. All those spam pages are harvested from the public-domain Webster 1913. (They also reproduce all the erroneous headwords.) But the genuine dictionary pages cite Webster as their source, whereas the spam just uses the headwords.

So {dimication} gets 4830 hits but {dimication Webster} knocks it down to 12 (plus repeats), and they're the real ones.
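The same trick can be sketched as a filter over page text: keep only pages that contain every search term. The two pages below are invented stand-ins (one dictionary-style entry that cites its source, one headword-only salad), and `pages_matching` is a hypothetical helper, not how any search engine works internally.

```python
import re


def pages_matching(pages, terms):
    """Return the names of pages whose text contains every term (case-insensitive)."""
    wanted = {term.lower() for term in terms}
    hits = []
    for name, text in pages.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if wanted <= words:
            hits.append(name)
    return hits


# Made-up page contents for illustration.
pages = {
    "dictionary-page": "dimication: a fight or contest. Source: Webster 1913",
    "salad-page": "dimication goat cushion trousers porter linen",
}

print(pages_matching(pages, ["dimication"]))             # both pages
print(pages_matching(pages, ["dimication", "Webster"]))  # only the dictionary page
```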
 
Posts: 502 | Location: London

Copyright © 2002-12