Ipstenu.Org

Not like I get enough comments to make this matter…

The spam on the JorjaWiki has been getting worse. This is a stupid circular argument about the net, since the object of all websites is to become used and ‘popular’ among the correct circles.

With that as a given, you have to face facts that the more popular a site is, the more people will try and spam it.

There is a reason for this.

Google (and most search engines) work via ‘rank’ which Google calls PageRank and explains more here. The short answer is that the more links to your site, the more ‘important’ you are to a search engine. Which is why spammers want to, well, spam and link to their sites. And this is why sometimes when you search for dog food, you get penis enlargement sites.

In addition to spammers making links hellish and fsking up the search results, there’s a practice known as Googlebombing. This is ‘old science’ as the technique has been around forever, and was named in 2001. The simple explanation is that a bunch of people get together and link to a site. Instead of linking the normal way (Slayer: The Series), which is a helpful and descriptive link, they would do it in a silly way (adjustment vedit).

And yes, both those links go to the same place.

What this does is it has Google associate the term adjustment vedit with Slayer. And trust me, it has nothing to do with Slayer.

Run a Google search on ‘Miserable Failure’ and you’ll see what I mean. Seriously. It’s funny.

Googlebombing is a perfect example of why spammers spam. They want attention, and that’s a cheap way to get it.

Part of the concept is deflated, though. MediaWiki puts a rel="nofollow" attribute on all its external links, which Google (and Yahoo! and many other search engines) have agreed to honor. The nofollow tells search engines ‘Don’t follow this link or count it toward page rank.’

So if someone posts on your wiki (or blog!): Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.

That comment would be transformed to: Visit my <a href="http://www.example.com/" rel="nofollow">discount pharmaceuticals</a> site.
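
For the curious, here is a rough Python sketch of that kind of rewrite. It is not MediaWiki's or WordPress's actual code, just an illustration: the function name add_nofollow is made up, and it uses a simple regular expression rather than a real HTML parser, so odd markup could trip it up.

```python
import re

# Matches an opening <a ...> tag. A rough sketch, not a full HTML parser.
A_TAG = re.compile(r'<a\b([^>]*)>', re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        # Leave tags alone if they already carry some rel attribute.
        if re.search(r'\brel\s*=', attrs, re.IGNORECASE):
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, html)


comment = 'Visit my <a href="http://www.example.com/">discount pharmaceuticals</a> site.'
print(add_nofollow(comment))
```

Run on the comment above, this prints the same link with rel="nofollow" added after the href.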

WordPress embraces this technology, as does MediaWiki and some other software I use to ‘automate’ links. This all means that someone who spams my wikis or blogs gets nothing.

But this won’t stop them because they’re not human!

No, seriously, they’re bots.

90% of the spam I get comes from bots: computer programs written to utilize the automated (aka dynamic) sites and insert their spam. My wikispam was all botted, as I learned when I traced the IP addresses back.

Now, it’s one thing for someone to dislike my site and vandalize. It’s another to write a program just for perceived advertising. And personally, I’d had it. After implementing a couple ‘catch spammers by what they post’ tools, I was still angry. Having code that says ‘Aha! Your post includes a link to bigdick.com! You’re spam!’ is great, but it works as a reaction. It’s a defensive play.
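
The ‘catch spammers by what they post’ approach boils down to something like the Python sketch below. It is an illustration only, not the code of any real anti-spam tool: BLOCKED_DOMAINS and looks_like_spam are made-up names, and a real blacklist would be far longer and updated constantly.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist, using the example domain from the post.
BLOCKED_DOMAINS = {"bigdick.com"}

# Rough pattern for http(s) URLs embedded in a post.
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+', re.IGNORECASE)


def looks_like_spam(post: str) -> bool:
    """Return True if the post links to a blacklisted domain or a subdomain of one."""
    for url in URL_PATTERN.findall(post):
        try:
            host = (urlparse(url).hostname or "").lower()
        except ValueError:
            continue  # malformed URL; skip it rather than crash
        if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
            return True
    return False
```

Note that this can only react to spam it already knows about, which is exactly the limitation described above.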

I wanted to go on the offensive. Enter Bad Behavior. This checks the headers of incoming requests (be it read or write) and if they ping as a bot or badly configured browser, they get punted.
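
To give a feel for the idea, here is a small Python sketch of header-based screening. It is emphatically not Bad Behavior's actual rule set, which is different and much more thorough. The function should_block, the user-agent fragments, and the whitelisted address (taken from the range reserved for documentation) are all assumptions made up for this example.

```python
# Illustrative user-agent fragments only; not taken from Bad Behavior.
SUSPICIOUS_AGENT_FRAGMENTS = ("libwww-perl", "emailsiphon")

# Hypothetical whitelist for users who were blocked by mistake.
WHITELISTED_IPS = {"192.0.2.10"}


def should_block(ip: str, headers: dict) -> bool:
    """Decide whether to reject a request based only on its IP and headers."""
    if ip in WHITELISTED_IPS:
        return False
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    agent = h.get("user-agent", "").strip().lower()
    if not agent:
        return True  # real browsers always send a User-Agent
    if any(fragment in agent for fragment in SUSPICIOUS_AGENT_FRAGMENTS):
        return True
    if "accept" not in h:
        return True  # browsers normally say what content types they accept
    return False
```

Checks this blunt will also catch badly configured proxies and browsers, which is why a whitelist ends up being necessary.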

I slapped this on my wiki and caught … well possibly innocent users now that I read the log. Two AOL users, a Comcast user from Texas, Amgen Inc and someone from RIPE. Reading the raw log, I saw that the Amgen was a user trying to log in (it saved the ID name from the header!) and I whitelisted that IP address. I’m thinking Amgen-User was browsing from work, and the work proxy sucks donkey balls.

I mean, hell, my office has a hella stern proxy/firewall and I can get in.

I put Bad Behavior on my blog as well (and this is why the title is what it is), and I’ll keep an eye out for what I get. Having swapped to WordPress and its nice, easily readable URLs (which I love), Google will probably re-index my entire site and then come spammers.

As for JFO’s 5 possible non-spammers, I’m working on it.