Multiplying Goalposts


Some years ago I stopped following the USENET group news.admin.net-abuse.email. However, like many people who do what I do, I find it somewhat necessary to follow for work-related reasons. So, I’m reading it again.

There is an interesting discussion going on now that is rather telling for people in the industry. It centers around what words mean. This isn’t a case of “I say ‘tomato’ and you say ‘tahmato’”, but rather that some people are calling apples oranges and others are calling apples cherries.

The root cause of this is a big fight currently going on between Al Iverson and Matthew Sullivan. Al is compiling stats of DNSBL accuracy rates. One of the lists that didn’t fare as well as its operator thought it should have was Matthew’s SORBS list.

Let’s start with the definition of “spam.” NANAE is currently sporting at least four different definitions of spam.

We start, of course, with Al’s definition of spam. Al seems to define spam as “mail that comes to my spamtrap addresses which was unasked for”. We can generalize this to “unsolicited bulk email”. Under this definition, mail sent to the spamtrap address is, by definition, unsolicited and bulk (since the address is not in use for 1-to-1 email) and is counted as spam with some minimal processing to remove backscatter.

Matthew Sullivan, on the other hand, defines spam as “anything that is considered spam under the [Australian] Spam Act 2003 is spam” (given that Matthew is not an attorney, barrister, or solicitor, we should hasten to add “in his opinion”). We can generalize this to “mail which violates some legal standard”. This quickly becomes sticky when the definition drives a blocking mechanism that is open for use across countries. What violates the legal standard of the Spam Act 2003 does not necessarily also violate the legal standard of the CAN-SPAM Act here in the United States.

Then we have another definition of spam stated by Laurence F. Sheldon, Jr., as all mail which comes from “a source most blacklist users would identify as spam-source under the ‘Boulder Pledge’ or a similar notion.” We’ll call that the Justice Potter Stewart “I know it when I see it” standard. Please note that Mr. Sheldon also states elsewhere that spam is “unsolicited bulk email” so we need to combine these two into “the group knows it when the group sees it”. This has the benefit of possibly allowing for some form of “bulk” to be recognized, but has the drawback of one person within the group making a claim which is then recognized as authoritative by the entire group even with no one else’s concurring experience. And then there is the problem of defining who is in “the group”.

Finally, we see this (inverse) definition of spam given by Chris Lewis in response to this quote: “It’s mail that he signed up for, making it solicited and thereby not spam.” Chris says: “Which is not representative of the email that users want.” So, we can call this definition of spam “mail my users (as a group) do not want.” For the end user, this transforms into “mail that I do not want”.

Now, is one of these definitions any better than the others? That’s not really for me to say, although I cut my teeth with “unsolicited, bulk email”. But, as professionals in the space, it’s important that we take time to understand what people are talking about when they use certain terms. When someone clicks a “This is spam” button, are they saying that this is “unsolicited, bulk email” or are they saying that this is “mail which I do not want (anymore)”? For me, this is a critical consideration. If it means “unsolicited, bulk email” then I have a client with a real problem on their hands. If it means “mail which I do not want” then all that’s called for is unsubscribing the user.
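To make the disagreement concrete, here is a rough Python sketch that runs one message through all four definitions. Everything in it is my own simplification for illustration: the `Message` fields, the rule names, and the idea that each definition reduces to a yes/no flag are assumptions, not anything the people quoted above have published.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical, oversimplified view of one piece of mail."""
    solicited: bool       # the recipient signed up for it at some point
    bulk: bool            # sent to many recipients, not 1-to-1
    violates_law: bool    # breaks whichever statute the list operator applies
    group_flags_it: bool  # the blacklist's users would call the source a spam source
    wanted: bool          # the recipient wants it today


# One rule per definition discussed above (assumed reductions).
DEFINITIONS = {
    "unsolicited bulk email": lambda m: m.bulk and not m.solicited,
    "violates a legal standard": lambda m: m.violates_law,
    "the group knows it when it sees it": lambda m: m.group_flags_it,
    "mail my users do not want": lambda m: not m.wanted,
}


def classify(message):
    """Return a verdict (True means spam) under each definition."""
    return {name: rule(message) for name, rule in DEFINITIONS.items()}


# A newsletter the user signed up for and no longer wants.
stale_newsletter = Message(
    solicited=True,
    bulk=True,
    violates_law=False,
    group_flags_it=False,
    wanted=False,
)

verdicts = classify(stale_newsletter)
for name, is_spam in verdicts.items():
    print(f"{name}: {'spam' if is_spam else 'not spam'}")
```

The same newsletter is spam under exactly one of the four rules, which is the situation behind the “This is spam” button: the click tells you a verdict, but not which definition produced it.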

This fundamental problem of using different dictionaries means that we will never find a solution to “the spam problem” as long as we can’t decide on what “spam” really is. It’s a problem that my friend Laura once called “moving goalposts”. It’s an apt description, but I think that we may need to change that to “multiplying goalposts.” I’ve pointed out above that in a single thread in a single newsgroup we can identify at least four different, contemporary definitions for the same word. The goalposts haven’t really moved at all. It’s just that there is now a new set out there in addition to the old ones.

We can also look at the definition of “false positive” in the same thread. “False positive” is a medical term generally defined as “A result that is erroneously positive when a situation is normal.”

There are two definitions for false positive given in this post: “That which is listed, but doesn’t meet the list’s criteria” and “mail that I wanted which got blocked”. Al Iverson’s DNSBL stats site describes a false positive, for his purposes, as “something I likely did sign up for and then forgot about”. Huey Callison, on the other hand, gives “a nonspam mail blocked by a spam filter” as the definition.

Again, we have four contemporaneous definitions within the same thread. They’re all in use. So what, exactly, is a “false positive”? It’s going to depend on who you are talking to and perhaps the context of the discussion.

Ultimately, if the “spam problem” is going to get fixed, we’re going to have to decide on a single set of definitions (goalposts) for whatever it is that we are talking about.

MickC @ November 8, 2007
