the spamhuntress
Tag: spam
Spam filter evasion self-defeating?
might the decreasing comprehensibility of spam finally have an impact on spam click-through rates? hope never dies.
the pill-peddlers and scam operators are getting ripped off, too. They think their products or scams will be advertised in a comprehensible manner, in readable emails; but instead, odd, opaque 3-word messages with “cut and paste this” lines, hidden inside filter-evasion text and bits of Project Gutenberg, are what gets delivered to the victims.
Spam Farms
Certain topics are especially well suited for baiting the technology-oriented crowds of social news and bookmarking sites. Stories focused on Apple, Firefox, Google, Nintendo, history of computers, top X lists, or the target social site itself are common baiting practices used to attract attention and place a new content node on the map. Opportunists will continue to jump into new networks of influence and promote their own sites, gathering search engine juice even when the brief blip of attention has passed and the crowd moves on to another story of the moment.
how to scam fat geeks in their basements
Pinging Service API
The Google Blog Search Pinging Service API allows users who frequently update their blog to programmatically inform Google Blog Search about changes to their blogs. Blogging provider admins can also use this API to notify Google of changes to blogs on their platform(s). To set up automated pinging of Google Blog Search, create either an XML-RPC Client or a REST Client which sends requests as noted below. It doesn’t matter which method you choose for notification; both are handled in the same way.
this would be useful for other web content too (news items, say). unfortunately this will be filled with splogs
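A minimal sketch of what such an XML-RPC ping client could look like. The endpoint below is a placeholder, not the real service URL, and `weblogUpdates.ping` is the method name used by the common weblog ping convention; both are assumptions, since the post does not quote the request details. `build_ping_payload` and `send_ping` are hypothetical helper names.

```python
import xmlrpc.client

# Placeholder endpoint (assumption): substitute the URL from the service's docs.
PING_ENDPOINT = "https://example.com/ping/RPC2"


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Send the ping over the network and return the server's response."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(blog_name, blog_url)
```

`build_ping_payload` is handy for inspecting the request without touching the network; `send_ping` does the actual call and will raise if the endpoint is unreachable.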
Windows is spammy
using os fingerprinting to discover spam. this is gonna hurt windows mail servers
This breakdown shows what percentage of the mail coming in from a given OS is spam or ham, i.e. 84.6% of all mail received from Windows-2000 is spam, 14.9% is ham (the rest is viruses).
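A rough sketch of how a breakdown like this could be computed, assuming each message has already been tagged with a sender OS (by some passive fingerprinting step, not shown) and a verdict. The `breakdown` function and the sample counts are illustrative; the counts are simply chosen to reproduce the Windows-2000 percentages quoted above.

```python
from collections import Counter, defaultdict


def breakdown(messages):
    """Map (os_name, verdict) pairs to {os_name: {verdict: percent}}."""
    counts = defaultdict(Counter)
    for os_name, verdict in messages:
        counts[os_name][verdict] += 1
    result = {}
    for os_name, verdicts in counts.items():
        total = sum(verdicts.values())
        result[os_name] = {
            verdict: round(100.0 * n / total, 1)
            for verdict, n in verdicts.items()
        }
    return result


# Illustrative sample only: 1000 messages matching the quoted percentages.
sample = (
    [("Windows-2000", "spam")] * 846
    + [("Windows-2000", "ham")] * 149
    + [("Windows-2000", "virus")] * 5
)
print(breakdown(sample)["Windows-2000"])
```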
Spam filtering
Spam filtering as a proxy for search market share
Why is it that the most basic spams and 419 scams make it past Yahoo’s spam filters into my Yahoo inbox? I was willing to give them a fair shot with their new UI, but their spam filtering is beyond bad, and makes their new mail beta just as unusable as the old one. Almost makes me wonder if they have commercial reasons for letting a lot of spam through to their user base. Not having a capable contextual advertising platform must put them in some tight spots when revenue maximization time inevitably rolls around, and spam thresholds are early victims, I suppose. Even more so at MSN, whose Hotmail is even worse. It is the rare event, however, maybe once a month, that spam makes it to my Gmail inbox.
The less search market share your email provider has, the more spam you can expect in your inbox.
The New York Times reports that Gmail spam filtering is getting even better. Meanwhile, the obvious spam in my Yahoo inbox continues.
Looks like Yahoo can’t even afford an SSL certificate for their mail domain.. oy. Plus they insist on showing you a spammy ‘start page’ instead of your inbox. Someone getting desperate in the monetization department?

Bot classes
After a couple of days of robots.txt love, I now have much less crap in my logs. A good opportunity to see which bots are well-written. Based on what I am seeing with /robots.txt, I am sure glad I blocked most of these festering piles of dung from my site.
not using conditional get while requesting /robots.txt
Only kinjabot, OnetSzukaj/5.0 and Seekbot/1.0 get this right. All other bots, including Google and Yahoo, do not. Lame.
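For reference, a conditional GET just means sending back the validators from the previous response (`ETag`, `Last-Modified`) so the server can answer `304 Not Modified` with no body. A minimal sketch using the standard library; `conditional_headers`, `fetch_robots` and the shape of the `cached` dict are hypothetical names chosen for illustration.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_robots(url, cached=None):
    """Fetch robots.txt, reusing the cached copy when the server says 304."""
    cached = cached or {}
    request = urllib.request.Request(
        url,
        headers=conditional_headers(cached.get("etag"),
                                    cached.get("last_modified")),
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return {
                "body": response.read(),
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged: keep what we already have
            return cached
        raise
```

Passing the result of one `fetch_robots` call as `cached` to the next is all a crawler needs to avoid re-downloading an unchanged file.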
requesting /robots.txt too often
The biggest offender is VoilaBot, checking /robots.txt every 5 minutes, every day. You gotta be kidding me. Google and Yahoo are not much better. You’d think they’d have figured out a way by now to communicate the state of /robots.txt across their different crawlers. Other bots fare better by virtue of being less desperate.
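One way a crawler fleet could avoid hammering /robots.txt is a shared cache with a time-to-live, so a host's file is fetched at most once per interval no matter how many crawlers ask. This is only a sketch: `RobotsCache` is a hypothetical class, the one-day default TTL is an arbitrary assumption, and a real fleet would put the cache in a shared store rather than in process memory.

```python
import time


class RobotsCache:
    """Cache robots.txt bodies per host and refetch only after the TTL."""

    def __init__(self, fetch, ttl_seconds=86400, clock=time.monotonic):
        self._fetch = fetch        # callable: host -> robots.txt body
        self._ttl = ttl_seconds    # assumed default: one day
        self._clock = clock        # injectable for testing
        self._entries = {}         # host -> (fetched_at, body)

    def get(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: no request sent
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

With a one-hour TTL, a bot polling every 5 minutes would send one request per hour instead of twelve.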
Problems like this are economic opportunities.
Crawler blight
i went ahead and blocked most crawlers in my robots.txt. there are too many of them, and for most, my ROI is negative anyway. if you had any doubts how far search still has to go, or how many moronic copycat companies there are in this space, spend some time with your log files.
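A sketch of what such a policy might look like and how to sanity-check it with Python's `urllib.robotparser`. The rules below are an assumption for illustration (allow one named crawler, block everyone else); the post does not say which bots were kept, so `Googlebot` here is just a stand-in.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy (assumption): one allowed crawler, all others blocked.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent, path="/"):
    """Return True if the given user agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


print(allowed("Googlebot"), allowed("VoilaBot"))
```

Of course this only helps with bots that bother to read and honor robots.txt in the first place.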
BA Spam
first, i get this email to my british airways account:
-------- Original Message --------
Subject: test email on TCRM
Date: Fri, 1 Jul 2005 11:34:46 +0100 (BST)
From: British Airways Executive Club
Dear Mr Rothfuss,
It's warm in here but I'm having fun
test email
Yours Sincerely
Afzal
inevitably followed by:
-------- Original Message --------
Subject: Email sent in error by British Airways
Date: Fri, 1 Jul 2005 17:17:50 +0100 (BST)
From: British Airways Executive Club
Dear Mr Rothfuss,
You may have received an email titled "Test email on TCRM" this afternoon, please accept my sincere apologies as this email was sent in error whilst we were undertaking routine testing.
I would like to reassure you that we have now rectified this error.
Yours sincerely
Sarah Keyes
Loyalty Programmes Manager Europe
you gotta be more careful, afzal, even when it’s hot.
New hope for web search?
i have spent some time recently trying to do deep searches (the kind that give fewer than 10 results), and i noticed that the link farming/spamming has become so bad that search engines are falling back to circa-1995 levels of accuracy and duplicate removal. as ben hammersley notes, yahoo is the new google, what with openly publishing research into better search algorithms.