snork.ca ... making kittens cry since 2001

Update: SURBL of URL Shorteners For Spamassassin 2016-08-09


NOTICE! The .cu.cc registry deleted my account tonight (possibly earlier) with no notice. No email, no warning, no opportunity to appeal, nothing. So I have bought shorturlbl.ca and migrated the list of URL shorteners to that domain. I have modified my examples to use the new domain name in case anyone is copying and pasting them into their own configs. Sorry folks, anyone using this URIBL will have to update their configs to point to the new domain name.

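I also noticed that I had mixed up urirhsbl and uridnsbl in the rule definitions in my example rules below. What's the difference? A uriRHSbl rule strips the host name from each URL and looks up only the domain (the Right Hand Side) in the list, so http://bit.ly/123 and http://www.bit.ly/123 are both checked as bit.ly. A uriDNSbl rule does something different: it looks up the IP addresses of the domain's name servers, which is not what a list of domain names like this one needs, so links to shorteners could slip through. This has been repaired in my examples below. Note that the body line still calls eval:check_uridnsbl, because that eval function is used for both kinds of rule.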
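To make the host name stripping concrete, here is a rough Python sketch of the DNS name a urirhsbl rule ends up asking about. The function name rhsbl_query_name is made up for illustration, and keeping only the last two labels of the host name is a simplifying assumption: Spamassassin uses its own list of TLDs to find the domain, so this sketch would get something like example.co.uk wrong.

```python
from urllib.parse import urlsplit


def rhsbl_query_name(url, zone="shorturlbl.ca"):
    """Return the DNS name a RHSBL-style lookup would query for this URL.

    Assumption: the domain is simply the last two labels of the host name.
    Spamassassin itself uses a proper TLD list for this step.
    """
    host = urlsplit(url).hostname or ""
    labels = [label for label in host.lower().split(".") if label]
    domain = ".".join(labels[-2:])  # www.bit.ly -> bit.ly
    return f"{domain}.{zone}"


print(rhsbl_query_name("http://bit.ly/123"))      # bit.ly.shorturlbl.ca
print(rhsbl_query_name("http://www.bit.ly/123"))  # bit.ly.shorturlbl.ca
```

Both URLs collapse to the same query name, which is why the www. prefix no longer matters once the rule is a urirhsbl.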


SURBL of URL Shorteners For Spamassassin 2016-06-24


I don't like spam. I mean, I really don't like spam. I get all bent out of shape about spam, and I rant about it, and I set up my own mail server partially because of it, and I constantly fiddle with Spamassassin in an attempt to kill it. I know that almost nobody actually cares, but it seems like the right thing to do. Recently I was going through my logs and spam collection when I noticed that a number of recent spams were basically telling me that "someone" thought I would be very interested in some link... and the links were always to some URL shortener. Now for those who don't know, a URL shortener takes a long link like this one and cuts it down to something short like http://snork.ca/9. The reason this kind of thing is popular with most people is that some messaging systems (like SMS or Twitter) only allow a limited number of characters in a message. The reason it is popular with spammers is that when their www.dickpillz.com web site gets flagged as a spam link, the URL shortener hides the actual final destination and sometimes gets their spam through.

[Image: Superpooch really hates spam.]

So, as I was looking through these spam messages, I also noticed that the spammers were using a number of different URL shortener services. This is when I decided that a simple Spamassassin rule was not going to be enough and I was going to have to make a list of these services. I toyed with the idea of writing one big long regex in my Spamassassin rules but that would be pretty long, and as far as I can tell, there is no way to write a multi-line Spamassassin rule. Once I started searching for lists of URL shorteners I also realized that there really are a lot of them. So I created a SURBL list to track them.

I basically searched for any list of URL shorteners I could find, and combined them all into one big list. Then I used an online utility to remove all the duplicate lines and alphabetize the list. When that was done I had around 700 domains in my list. I then surfed to each one manually to see if it really was a URL shortener. A lot of the domains were dead or for sale, and some were bought by marketing goofs who set them up to redirect to their scummy (aka scammy) marketing sites. I also did not want to include sites that are limited to certain domains. For example, t.co is owned by Twitter and [as far as I can tell] only lets you create shortcuts to Twitter itself... which makes it mostly useless to spammers. In the end, after hours of checking sites, my list of publicly available free URL shorteners is just a little over 200 sites.
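I used an online utility for the merge, but the same cleanup could be scripted. The following is only a sketch of that step in Python: the function name merge_domain_lists and the sample domains are made up for illustration and are not taken from my actual list.

```python
def merge_domain_lists(*lists):
    """Combine several lists of domains, drop duplicates, and sort them."""
    seen = set()
    for lines in lists:
        for line in lines:
            # Normalize case, surrounding whitespace, and any trailing dot
            # so that "TinyURL.com" and "tinyurl.com " count as one entry.
            domain = line.strip().lower().rstrip(".")
            if domain:
                seen.add(domain)
    return sorted(seen)


# Hypothetical sample input standing in for lists found around the web.
list_a = ["bit.ly", "TinyURL.com", "goo.gl"]
list_b = ["tinyurl.com", "", "bit.ly ", "is.gd"]

print(merge_domain_lists(list_a, list_b))
# ['bit.ly', 'goo.gl', 'is.gd', 'tinyurl.com']
```

This only handles the mechanical part. Checking whether each domain is really a public URL shortener still has to be done by hand.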

So, how do you use this list with your Spamassassin? Well, the SURBL domain is shorturlbl.ca and you can set it up like any other SURBL... but I would suggest making it part of a meta rule rather than just penalizing anyone who uses a shortener service. If you really want to be harsh to all shorteners you could do this:

urirhsbl   FU_SNORKCC   shorturlbl.ca A
body       FU_SNORKCC   eval:check_uridnsbl('FU_SNORKCC')
tflags     FU_SNORKCC   net
describe   FU_SNORKCC   This email contains a link to a URL Shortener site.
score      FU_SNORKCC   2.0
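If you want to see roughly what that rule asks DNS, the check boils down to "does domain.shorturlbl.ca have an A record?". Here is a minimal Python sketch of that idea. The names is_listed and fake_resolver are made up for illustration, and the 127.0.0.2 answer in the fake resolver is an assumption, not something the list is known to return. The resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket


def is_listed(domain, zone="shorturlbl.ca", resolve=socket.gethostbyname):
    """Return True if <domain>.<zone> resolves to an A record.

    `resolve` defaults to a real DNS lookup but can be swapped for a
    stand-in, as in the example below.
    """
    try:
        resolve(f"{domain}.{zone}")
    except socket.gaierror:
        return False  # the name did not resolve, so treat it as not listed
    return True


def fake_resolver(name):
    # Stand-in for DNS: pretend that only bit.ly is on the list.
    if name == "bit.ly.shorturlbl.ca":
        return "127.0.0.2"
    raise socket.gaierror("name not found")


print(is_listed("bit.ly", resolve=fake_resolver))       # True
print(is_listed("example.com", resolve=fake_resolver))  # False
```

Calling is_listed("bit.ly") without the resolve argument would do a real lookup. Note that this sketch cannot tell "not listed" apart from a DNS failure, which Spamassassin handles more carefully.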

Clearly you can provide whatever scoring you would like, but do remember that these services are still used for legitimate emails too. As mentioned, a better way to set this up might be to use a meta rule, which is for using combinations of other rules. For example, I noticed that a number of these spams would say something like:

This email is not marketing or scammy blah blah blah...

Well, if I receive an email with the word marketing and a link to a URL shortener I am pretty confident it is junkmail. So I could create a meta rule (or rather set of rules) like this:

urirhsbl   FU_SNORKCC   shorturlbl.ca A
body       FU_SNORKCC   eval:check_uridnsbl('FU_SNORKCC')
tflags     FU_SNORKCC   net
describe   FU_SNORKCC   This email contains a link to a URL Shortener site.
score      FU_SNORKCC   0.01

rawbody    __FU_MARKETING  /marketing/i
meta       FU_SHORTENER1   (__FU_MARKETING && FU_SNORKCC)
score      FU_SHORTENER1   6.5
describe   FU_SHORTENER1   The term "marketing" and a URL shortener site.

Because the "__FU_MARKETING" rule starts with two underscores, it is evaluated but not actually scored. The meta rule will only apply the 6.5 points if both rules are true. I set my "FU_SNORKCC" rule to 0.01 points because I still want to be able to check email headers to see if it is being set off. When I am satisfied it works the way I want then I will change the name to double underscores to hide it.
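For anyone who finds the scoring easier to follow as code, here is a toy Python model of how those rules add up. It is only an illustration of the logic described above and not how Spamassassin is implemented. The function name score_message is made up, and treating rules without a score entry as 0 is a simplification.

```python
def score_message(hits, scores, metas):
    """Add up the score for one message.

    hits   -- set of plain rule names that matched the message
    scores -- dict mapping rule name to its score
    metas  -- dict mapping meta rule name to the rule names it requires
    """
    fired = set(hits)
    for name, required in metas.items():
        # A meta rule fires only when every rule it depends on has fired.
        if all(rule in fired for rule in required):
            fired.add(name)
    # Rules whose names start with "__" are evaluated but never scored.
    # Rules without a score entry count as 0 here (a simplification).
    total = sum(
        (scores.get(name, 0.0) for name in fired if not name.startswith("__")),
        0.0,
    )
    return round(total, 2)


scores = {"FU_SNORKCC": 0.01, "FU_SHORTENER1": 6.5}
metas = {"FU_SHORTENER1": ["__FU_MARKETING", "FU_SNORKCC"]}

print(score_message({"FU_SNORKCC"}, scores, metas))                    # 0.01
print(score_message({"__FU_MARKETING"}, scores, metas))                # 0.0
print(score_message({"FU_SNORKCC", "__FU_MARKETING"}, scores, metas))  # 6.51
```

A shortener link on its own costs almost nothing, the word "marketing" on its own costs nothing, and the two together get the full 6.5 points plus the 0.01 marker.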

I should note that I got the domain name for this project at the cu.cc registry, and so far I have no complaints about them. I have previously had .tk and .ga domains from freenom but they have a really bad reputation and their customer service is entirely non-existent. I have the DNS hosted at GeoScaling and was happy to see that they let me paste an entire zone in to populate my A records. This service is free for anyone who wants to use it for any non-commercial purpose. I can't really stop anyone from using it for commercial purposes but I can at least express my preference. If you have any suggestions for the list, additions, updates, removals, or just want to rant about spam then please feel free to contact me... I guess I should fix that contact link up there eh?

View the complete list of domains here and the list of changes here.

PS: Thanks to Jim for pointing out my error in the rule definitions. It is important to note the difference between urirhsbl and uridnsbl in the code lines above!
