Sunday, December 11, 2005
An ounce of prevention
So how are these spammers getting your e-mail address? There are many correct answers to that question. Some buy lists. Some collect e-mail addresses via worms, viruses, or spyware. You may have given your e-mail address to an unscrupulous site that collects registration information. There is, however, one method that seems to be more common than the others: web spiders.
Software programs run by spammers traverse the web, reading page after page by following link after link, much the way Google crawls pages to index them for searching. However, instead of indexing your page, this program is only looking for e-mail addresses. When it finds an e-mail address, it logs it in a database.
Is there anything we, as web developers, can do to truly prevent spammers from getting our users' e-mail addresses? Nope. Is there anything we can do to slow the spammers down a little? Maybe.
Obviously, we can ask our users to please refrain from posting their e-mail addresses to websites, but that's not sufficient. First, we're likely to be ignored. Second, there are times when e-mail addresses need to appear on a website. For example, look at virtually any website's "Contact Us" page.
So, what can we try? First, we have to realize that a spider is not a person or a web browser. It doesn't "see" an e-mail address on a page the way our eyes do. An example of where this could be important is images. An e-mail address inside of a .gif or .jpg probably won't be picked up. How does a spider recognize and record an e-mail address? It surely varies from spider to spider but we can probably make a few assumptions.
1) The e-mail address has to appear in the source of the web page.
2) The spider probably looks for one or more characters followed by an @, followed by one or more characters, followed by a ".", followed by "com", "net", or "org".
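To make assumption 2 concrete, here is a rough sketch of the kind of pattern a harvester might apply to a page's source. The names EMAIL_PATTERN and findAddresses are made up for illustration, the address is a placeholder, and real spiders surely use their own (probably broader) patterns.

```javascript
// Hypothetical harvester pattern based on assumption 2:
// one or more characters, "@", one or more characters, ".", then com/net/org.
var EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?:com|net|org)\b/g;

// Returns every address-like string found in the raw page source
// (an empty array when nothing matches).
function findAddresses(pageSource) {
  return pageSource.match(EMAIL_PATTERN) || [];
}

// Placeholder markup with a placeholder address.
var sample = '<a href="mailto:somebody@somewhere.com">somebody@somewhere.com</a>';
console.log(findAddresses(sample));
```

The sample contains the address twice, once in the href and once as the link text, so a spider like this gets two chances to pick it up.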
Let's look at the HTML required to generate an e-mail link:
<a href="mailto:somebody@somewhere.com">somebody@somewhere.com</a>
It's pretty easy to see how a piece of software could grab the e-mail address from this sample. Anybody who's taken an intro-level programming class could write the code to do it.
Let's compare it to the following:
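// Split the address into fragments so it never appears whole in the page source.
var sd = ".";
var sp1 = "body";
var sp2 = "some";
var sa = "@";
var ss1 = "where";
var st1 = "com";
var st2 = "org"; // not used below
// Assemble somebody@somewhere.com at run time.
var se = sp2 + sp1 + sa + sp2 + ss1 + sd + st1;
// Write the mailto link into the page.
document.write("<a href=\"mailto:" + se + "\">" + se + "</a>");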
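As a quick sanity check, we can run a simple address pattern over the text of that script, the way a spider reading the raw source would. This is only a sketch: EMAIL_PATTERN, scriptSource, and assembled are names invented here, and the pattern is a guess at what a spider might use, following the two assumptions above.

```javascript
// Hypothetical spider pattern: characters, "@", characters, ".", then com/net/org.
var EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.(?:com|net|org)\b/;

// The script as it appears in the page source, line by line.
var scriptSource = [
  'sd = ".";',
  'sp1 = "body";',
  'sp2 = "some";',
  'sa = "@";',
  'ss1 = "where";',
  'st1 = "com";',
  'st2 = "org";',
  'se = sp2 + sp1 + sa + sp2 + ss1 + sd + st1;',
  'document.write("<a href=\\"mailto:" + se + "\\">" + se + "</a>");'
].join("\n");

// The address only exists once the fragments are joined at run time.
var assembled = "some" + "body" + "@" + "some" + "where" + "." + "com";

console.log(EMAIL_PATTERN.test(scriptSource)); // false: no whole address in the source
console.log(EMAIL_PATTERN.test(assembled));    // true: the browser still builds a real address
```

This only slows down a spider that reads the source as plain text. One that actually executes the page's JavaScript would see the assembled address just like a browser does.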