Adapted from an article in http://www.wikipedia.org
Spam by e-mail is one type of spamming that involves sending identical or nearly identical messages to thousands (or millions) of recipients. Addresses of recipients are often harvested from Usenet postings or web pages, obtained from databases, or simply guessed by using common names and domains. By definition, spam is sent without the permission of the recipients.
The terms unsolicited commercial email (UCE) and unsolicited bulk email (UBE) are sometimes used as more precise or less slang-like expressions for spam. Most legislative efforts against spam are tailored to address UCE. A small but noticeable proportion of unsolicited bulk email is not, in fact, also commercial; examples include political advocacy spam and chain letters.
Overview
Sending spam is a violation of the Acceptable Use Policy (AUP) of almost all ISPs, and can lead to the termination of the sender’s account. In many jurisdictions, spamming is a crime or an actionable tort; in the United States, for example, it is regulated by the CAN-SPAM Act of 2003. In Singapore, spamming can be prosecuted under the Computer Misuse Act.
Spammers engage in deliberate fraud to send out their messages. Spammers frequently use false names, addresses, phone numbers, and other contact information to set up “disposable” accounts at various Internet service providers. They also often use falsified or stolen credit card numbers to pay for these accounts. This allows them to quickly move from one account to the next as each one is discovered and shut down by the host ISPs.
Spammers go to great lengths to conceal the origin of their messages. They do this by spoofing email addresses (similar to IP address spoofing): the spammer forges the message headers so that the message appears to come from another email address. Some ISPs and domains require the use of SMTP authentication, which allows the specific account from which an email originates to be positively identified.
It is not possible to completely spoof an email, since the actual connection from the last mail server’s IP address is recorded by your own mail server; however, the rest of the history of the mail servers the email passed through can be forged by spammers. Even so, tracing an email message’s route is usually fruitless, since many ISPs have thousands of customers and identifying just one spammer is tedious.
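The point about the recorded connection can be illustrated with a small sketch using Python's standard email package. Each mail server prepends its own Received header, so the first one in the message is the one written by the recipient's own server; anything below it may have been forged. The sample message, host names, and addresses here are invented for illustration only.

```python
from email import message_from_string

# Invented sample message: the top Received header is the one our own
# server would have added; the one below it could have been forged.
RAW_MESSAGE = """\
Received: from mail.example.net (mail.example.net [192.0.2.25])
\tby mx.recipient.example with ESMTP; Mon, 5 Jan 2004 10:00:02 +0000
Received: from forged.example.com (unknown [198.51.100.7])
\tby mail.example.net with SMTP; Mon, 5 Jan 2004 10:00:01 +0000
From: someone@example.com
To: user@recipient.example
Subject: hello

body
"""


def received_chain(raw_message: str) -> list[str]:
    """Return the Received headers, newest first, one string per hop."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("Received") or []
    # Collapse folded (multi-line) header values into single lines.
    return [" ".join(value.split()) for value in headers]


chain = received_chain(RAW_MESSAGE)
trusted_hop = chain[0]  # written by the recipient's own mail server
```

Only `trusted_hop` reflects a connection the recipient's server actually observed; the remaining entries in `chain` are claims made by earlier hops.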
Spammers frequently seek out and make use of vulnerable third-party systems such as open mail relays and open proxy servers. The SMTP system, used to send email across the Internet, forwards mail from one server to another; mail servers that ISPs run commonly require some form of authentication that the user is a customer of that ISP. Open relays, however, do not properly check who is using the mail server and pass all mail to the destination address, making it quite a bit harder to track down spammers.
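The difference between a properly restricted mail server and an open relay comes down to one decision: whether to forward a message for a given client. The following is a minimal sketch of that decision, not the configuration of any particular mail server; the function names and domains are hypothetical.

```python
def should_relay(client_authenticated: bool, recipient: str,
                 local_domains: set[str]) -> bool:
    """Relay policy of a restricted server (illustrative only)."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in local_domains:
        # Final delivery to one of our own users: accept from anyone.
        return True
    # Forwarding to another site: only for authenticated customers.
    return client_authenticated


def open_relay_policy(client_authenticated: bool, recipient: str,
                      local_domains: set[str]) -> bool:
    """An open relay performs no check at all."""
    return True
```

With `local_domains = {"isp.example"}`, an unauthenticated client can still deliver to `user@isp.example`, but a message for `victim@elsewhere.example` is refused unless the client has authenticated.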
Spoofing can have serious consequences for legitimate email users. Not only can their email inboxes get clogged up with “undeliverable” emails in addition to volumes of spam, they can mistakenly be identified as a spammer. Not only may they receive irate email from spam victims, but (if spam victims report the email address owner to the ISP, for example) their ISP may terminate their service for spamming.
Gathering of addresses
In order to send spam, spammers need to obtain the email addresses of the intended recipients. Toward this end, both spammers themselves and list merchants gather huge lists of potential email addresses. Since spam is, by definition, unsolicited, this address harvesting is done without the consent (and frequently against the expressed will) of the address owners. As a consequence, spammers’ address lists are of remarkably poor accuracy. A single spam run may target tens of millions of possible addresses — many of which are invalid, malformed, or undeliverable.
Spam differs from legitimate direct marketing in many ways, one of them being that it costs no more to send to a larger number of recipients than a smaller number. For this reason, there is little pressure upon spammers to limit the number of addresses targeted in a spam run, or to restrict it to persons likely to be interested. One consequence of this fact is that many people receive spam written in languages they cannot read; a good deal of spam sent to English-speaking recipients is in Chinese or Korean, for instance. Likewise, lists of addresses sold for use in spam frequently contain malformed addresses, duplicate addresses, and addresses of role accounts such as postmaster.
Email addresses may be harvested from a number of sources. A popular method has been to use email addresses which their owners have published for other purposes. Usenet posts, especially those in archives such as Google Groups, are a frequent target. Simply searching the Web for pages with addresses — such as corporate staff directories — can yield thousands of addresses, most of them deliverable. Spammers have also subscribed to discussion mailing lists for the purpose of gathering the addresses of posters. The DNS and WHOIS systems require the publication of technical contact information for all Internet domains; spammers have illegally trawled these resources for email addresses.
Because spammers offload the bulk of their costs onto others, however, they can use even more computationally expensive means to generate addresses. A dictionary attack is an exhaustive attempt to gain access to a resource by trying all possible credentials — usually, usernames and passwords. Spammers have applied this principle to guessing email addresses — as by taking common names and generating likely email addresses for them at each of thousands of domain names.
A recent, controversial tactic is called “e-pending” — for the appending of email addresses to direct-marketing databases. Direct marketers normally obtain lists of prospects from sources such as magazine subscriptions and customer lists. By searching the Web and other resources for email addresses corresponding to the names and street addresses in their records, direct marketers can send targeted spam email. However, as with most spammer “targeting”, this is imprecise: Users have reported, for instance, receiving solicitations to mortgage their house at a specific street address — with the address being clearly a business address including mail stop and office number!
Spammers sometimes use various means to confirm addresses as deliverable. For instance, including a Web bug in a spam message written in HTML may cause the recipient’s mail client to transmit the recipient’s address, or any other unique key, to the spammer’s Web site. Likewise, spammers sometimes operate Web pages which purport to remove submitted addresses from spam lists. In several cases, these have been found to subscribe the entered addresses to receive more spam.
Delivering spam messages
Internet users and system administrators have deployed a vast array of techniques to block, filter, or otherwise banish spam from users’ mailboxes. Almost all Internet service providers forbid the use of their services to send spam or to operate spam-support services. Both commercial firms and volunteers run subscriber services dedicated to blocking or filtering spam, such as Brightmail, Postini, and the various DNSBLs. How, then, do spammers still manage to deliver messages which users wish not to receive and network owners wish not to carry?
Using other people’s computers
Early on, spammers discovered that if they sent large quantities of spam directly from their ISP accounts, recipients would complain and ISPs would shut their accounts down. Thus, one of the basic techniques of sending spam has been to send it from someone else’s computer and network connection. By doing this, spammers protect themselves in several ways: they hide their tracks, get others’ systems to do most of the work of delivering messages, and direct the efforts of investigators towards the other systems rather than the spammers themselves.
In the 1990s, the most common way spammers did this was to use open mail relays. An open relay is an MTA, or mail server, which is configured to pass along messages sent to it from any location, to any recipient. In the original SMTP mail architecture, this was the default behavior: a user could send mail to practically any mail server, which would pass it along towards the intended recipient’s mail server.
While this cooperative, open approach was useful in ensuring that mail was delivered, it was vulnerable to abuse by spammers — and abused it soon was. Spammers could forward batches of spam through open relays, leaving the job of delivering the messages up to the relays. In response, mail system administrators concerned about spam began to demand that other mail operators configure MTAs to cease being open relays. The first DNSBLs, such as MAPS RBL and the now-defunct ORBS, aimed chiefly at allowing mail sites to refuse mail from known open relays.
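A DNSBL is queried through ordinary DNS: the octets of the connecting IPv4 address are reversed and prepended to the list's zone name, and a listed address resolves to an A record while an unlisted one does not resolve. The sketch below shows that convention; the zone `dnsbl.example.org` is a placeholder, not a real blocklist, and real lists publish their own zone names and usage policies.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup resolves, i.e. the address is listed.

    Note that a resolver failure is indistinguishable from "not listed"
    in this simplified version.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


# Placeholder zone name, used only to show the shape of the query.
example_name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

A mail server using such a list would call something like `is_listed` on the address of each connecting client and refuse mail from listed hosts.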
Within a few years, open relays became rare and spammers resorted to other tactics. Chief among these was the use of open proxies. A proxy is a network service for making indirect connections to other network services. The client connects to the proxy and instructs it to connect to a server. The server perceives an incoming connection from the proxy, not the original client. Proxies have many purposes, including Web-page caching, protection of privacy, filtering of Web content, and selectively bypassing firewalls. An open proxy is one which will create connections for any client to any server, without authentication. Like open relays, open proxies were once relatively common, as many administrators did not see a need to restrict access to them.
A spammer can direct an open proxy to connect to a mail server, and send spam through it. The mail server logs a connection from the proxy — not the spammer’s own computer. This provides an even greater degree of concealment for the spammer than an open relay, since most relays log the client address in the headers of messages they pass. Open proxies have also been used to conceal the sources of attacks against other services besides mail, such as Web sites or IRC servers.
Besides relays and proxies, spammers have used other insecure services to send spam. One example is the now-infamous FormMail.pl, a CGI script to allow Web-site users to send email feedback from an HTML form. Several versions of this program, and others like it, allowed the user to redirect email to arbitrary addresses. Spam sent through open FormMail scripts is frequently marked by the program’s characteristic opening line: “Below is the result of your feedback form.”
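The weakness in such form-to-mail scripts is that the destination address comes from the submitted form. One common remedy is to have the form submit only a short key and to look the real address up on the server. The sketch below illustrates that idea in Python; it is not the FormMail code, and the keys and addresses are hypothetical.

```python
# Server-side table of permitted destinations (hypothetical values).
ALLOWED_RECIPIENTS = {
    "feedback": "feedback@example.org",
    "sales": "sales@example.org",
}


def resolve_recipient(form_value: str) -> str:
    """Map a form-supplied key to a fixed address, rejecting anything else."""
    try:
        return ALLOWED_RECIPIENTS[form_value]
    except KeyError:
        raise ValueError(f"unknown recipient key: {form_value!r}") from None
```

Because an arbitrary address such as `someone@elsewhere.example` is not a key in the table, it is rejected instead of being used as the destination.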
As spam from proxies and other “spammable” resources grew, DNSBL operators started targeting these as well as open relays. Blocklists such as Blitzed Open Proxy Monitor (http://opm.blitzed.org/info) and Composite Blocking List (http://cbl.abuseat.org/) chiefly target open proxies.
In 2003, spam investigators saw a radical change in the way spammers sent spam. Rather than searching the global network for exploitable services such as open relays and proxies, spammers began creating “services” of their own. By commissioning computer viruses designed to deploy proxies and other spam-sending tools, spammers could harness hundreds of thousands of end-user computers. Most of the major Windows email viruses of 2003, including the Sobig and Mimail virus families, were spammer viruses: viruses designed expressly to make infected computers available as spamming tools.
Besides sending spam, spammer viruses serve spammers in other ways. Beginning in July 2003, spammers started using some of these same viruses to perpetrate distributed denial-of-service (DDoS) attacks upon DNSBLs and other anti-spam resources. Although this was by no means the first time that illegal attacks have been used against anti-spam sites, it was perhaps the first wave of effective attacks. In August of that year, engineering company Osirusoft ceased providing DNSBL mirrors of the SPEWS and other blocklists, after several days of unceasing attack from virus-infected hosts. The very next month, DNSBL operator Monkeys.com succumbed to the attacks as well. Other DNSBL operators, such as Spamhaus, have deployed global mirroring and other anti-DDoS methods to resist these attacks.
As of early 2004, virus-infected hosts remain a major source of spam.
Legality
Accessing privately owned computer resources without the owner’s permission is illegal under computer crime statutes in most nations. Deliberate spreading of computer viruses is also illegal in the United States and elsewhere. In Singapore, it is a crime under the Computer Misuse Act. Thus, some of spammers’ most common behaviors are criminal quite independently of the legal status of spamming per se. Even before the advent of laws specifically banning or regulating spamming, spammers have been successfully prosecuted under computer fraud and abuse laws for wrongfully using others’ computers.
Obfuscating message content
Many spam-filtering techniques work by searching for patterns in the headers or bodies of messages. For instance, a user may decide that all email she receives with the word “Viagra” in the subject line is spam, and instruct her mail program to automatically delete all such messages. To defeat such filters, the spammer may misspell commonly filtered words, or insert other characters, as in the following examples:
V1agra, Via'gra, V I A G R A, Vaigra, \/iagra
The principle of this method is to leave the word readable to a human, but not recognizable to a literal-minded computer program. This is effective up to a point. Eventually, filter patterns become generic enough to recognize the word “Viagra” no matter how misspelled, or else they target the obfuscation methods themselves, such as insertion of punctuation into unusual places in a word.
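One way a filter pattern can become "generic enough" is to normalize each token before comparing it, and then to tolerate small spelling differences. The following is a rough sketch of that idea, not the method of any particular filter: the character substitutions, the 0.8 threshold, and the function names are all assumptions chosen for illustration.

```python
import difflib
import re

# Assumed look-alike substitutions; a real filter would use a larger table.
LOOKALIKES = str.maketrans({"1": "i", "0": "o", "3": "e", "4": "a",
                            "5": "s", "@": "a", "$": "s"})


def normalize(token: str) -> str:
    """Lower-case, undo look-alike characters, and drop everything
    that is not a letter (spaces, apostrophes, slashes, and so on)."""
    lowered = token.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]", "", lowered)


def looks_like(token: str, target: str, threshold: float = 0.8) -> bool:
    """True if the normalized token is close to the target word."""
    candidate = normalize(token)
    ratio = difflib.SequenceMatcher(None, candidate, target).ratio()
    return ratio >= threshold
```

Here `normalize("V I A G R A")` and `normalize("Via'gra")` both yield `viagra`, while a transposition such as `Vaigra` is caught by the similarity threshold instead. A loose threshold also raises the risk of matching innocent words, which is the trade-off such filters have to manage.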
HTML-based email gives the spammer more tools to obfuscate text. Inserting HTML comments between letters can foil some filters, as can including text made invisible by setting the font color to white on a white background, or shrinking the font size to the smallest fine print.
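A filter can counter the comment trick by removing HTML comments before it examines the text. The snippet below is a minimal sketch using a regular expression; it handles only comments, not invisible or tiny text, and a production filter would more likely use a real HTML parser.

```python
import re

# Non-greedy match so that each comment is removed separately;
# DOTALL lets a comment span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(html: str) -> str:
    """Remove HTML comments so that split-up words are rejoined."""
    return HTML_COMMENT.sub("", html)
```

For example, `strip_html_comments("Vi<!-- x -->ag<!--y-->ra")` returns `Viagra`, which an ordinary word filter can then recognize.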
As Bayesian filtering has become popular as a spam-filtering technique, spammers have started using methods to weaken it. To a rough approximation, Bayesian filters rely on word probabilities. If a message contains many words which are only used in spam, and few which are never used in spam, it is likely to be spam. To weaken Bayesian filters, some spammers now include lines of irrelevant, random words alongside the sales pitch. A variant on this tactic may be borrowed from the Usenet abuser known as “Hipcrime” — to include passages from books taken from Project Gutenberg, or nonsense sentences generated with “dissociated press” algorithms.
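The word-probability idea, and the way irrelevant words dilute it, can be shown with a toy naive Bayes scorer. This is a simplified sketch under stated assumptions (equal prior odds, add-one smoothing, whitespace tokenization, a four-message training set invented for the example); real Bayesian filters differ in many details.

```python
import math
from collections import Counter


def train(messages: list[str]) -> Counter:
    """Count how often each word appears in a set of messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_probability(text: str, spam_counts: Counter,
                     ham_counts: Counter) -> float:
    """Naive Bayes estimate that a message is spam (equal priors assumed)."""
    vocab_size = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_odds = 0.0
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from zeroing the product.
        p_spam = (spam_counts[word] + 1) / (spam_total + vocab_size)
        p_ham = (ham_counts[word] + 1) / (ham_total + vocab_size)
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Invented training data for illustration.
spam_counts = train(["buy cheap pills now", "cheap pills online"])
ham_counts = train(["meeting agenda for monday", "lunch on monday"])

pitch = spam_probability("cheap pills", spam_counts, ham_counts)
padded = spam_probability("cheap pills meeting agenda monday lunch",
                          spam_counts, ham_counts)
```

With this training data, `pitch` comes out at roughly 0.90, while `padded`, the same pitch surrounded by innocuous words, drops to roughly 0.27. That drop is the effect spammers aim for when they pad messages with unrelated text.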
Avoiding Spam
Several tools have been released, both for end users and systems administrators, which automate spam removal by scanning through all emails in search of traits typical of spam.
Tools for end users range in capabilities from tracing and reporting spam to hiding email addresses from spammers to removing and/or blocking spam. These tools include SpamCop, NoSpam, SpamGuard, and even mail clients, such as the one built into Mozilla.
Tools for systems administrators allow them to block incoming email from particular spamming IPs, block Usenet spam, block formmail spam, and determine if mail is spam. One of the most popular amongst systems administrators is SpamAssassin. One of the statistically most accurate on the spam corpus is CRM114, which can be integrated into SpamAssassin.
Spamgourmet, little known but very powerful, takes a completely different approach: it offers free disposable e-mail addresses. The project was “created by folks who’ve been driven rabid by spam since 1993 or so” (quote from their FAQ). All the code they’ve written is open source.
Spam-support services
A number of other online activities and business practices are considered by anti-spam activists to be connected to spamming. These are sometimes termed spam-support services: business services, other than the actual sending of spam itself, which permit the spammer to continue operating. Spam-support services can include processing orders for goods advertised in spam, hosting Web sites or DNS records referenced in spam messages, or a number of specific services as follows:
Some Internet hosting firms advertise bulk-friendly or bulletproof hosting. This means that, unlike most ISPs, they will not terminate a customer for spamming. These hosting firms are clients of larger ISPs, and many have eventually been taken offline by these larger ISPs as a result of complaints regarding spam activity. Thus, while a firm may advertise bulletproof hosting, it is ultimately unable to deliver without the connivance of its upstream ISP.
A few companies produce spamware, or software designed for spammers. Spamware varies widely, but may include the ability to import thousands of addresses, to generate random addresses, to insert fraudulent headers into messages, to use dozens or hundreds of mail servers simultaneously, and to make use of open relays. The sale of spamware is illegal in eight U.S. states.
So-called millions CDs are commonly advertised in spam. These are CD-ROMs purportedly containing lists of email addresses, for use in sending spam to these addresses. Such lists are also sold directly online, frequently with the false claim that the owners of the listed addresses have requested (or “opted in”) to be included. Such lists often contain invalid addresses.
A number of DNSBLs, including the MAPS RBL, Spamhaus SBL, and SPEWS, target the providers of spam-support services as well as spammers.
Miscellaneous facts about spam email
Larger ISPs such as America Online report that anywhere from one-third to two-thirds of their email server capacity is consumed by spam. Because this cost is imposed without the consent of either the site owners or the authorized users, many argue that email spamming is a form of theft of services.
In May 2003, it was reported that more than half of all emails sent were spam. Steve Linford of the spam-fighting project Spamhaus warned that at current rates of increase, the entire email system could “melt down” within six months.
According to an article by James Gleick in The Observer, 2 March 2003:
- 10 billion spam emails are sent every day;
- 30 billion are expected by 2005;
- 150 spammers send 90% of all spam email;
- a new email account set up to experiment received spam within 540 seconds;
- 37% of US email is spam; 1 in 12 of UK emails;
- EU businesses spend €10 billion each year to deal with spam.
The U.S. Federal Trade Commission estimates that as much as 2/3 of all spam email contains fraudulent offers, forged headers, or other false claims suggestive of criminal activity.
AOL documented an “unscientific” list of the subjects of the spam most widely sent to its members during 2003. In alphabetical order, they are:
- As seen on Oprah
- Get bigger (also: satisfy your partner, improve your sex life)
- Get out of debt (also: special offer)
- Hot XXX action (also: teens, porn)
- Lowest insurance rates (also: lower your insurance now)
- Lowest mortgage rates (also: lower your mortgage rates, refinance, refi)
- Online degree (also: online diploma)
- Online pharmacy (also: online prescriptions, meds online)
- Viagra online (also: Xanax, Valium, Xenical, phentermine, Soma, Celebrex, Valtrex, Zyban, Fioricet, Adipex, etc.)
- Work from home (also: be your own boss)
Current events
As at 11 July 2003, the U.S. Federal Trade Commission (“FTC”) was expected to ask the U.S. Congress for new powers that would let it cooperate closely with other governments and more easily prosecute American and overseas spammers. A 13-page proposal drafted by the FTC to implement legislation entitled the International Consumer Protection Enforcement Act (ICPEA) would render the agency’s investigators “spam cops”, granting them the power to serve secret requests for subscriber information on Internet service providers, peruse FBI criminal databases and swap sensitive information with foreign law enforcement agencies. The proposed legislation is a result of a push by American legislators to enact strong laws targeting the most extreme spammers. Civil libertarians are alarmed at the ICPEA draft bill, on the basis that it does not contain sufficient checks and balances, and would adversely impact the Freedom of Information Act.
On June 29, 2003, The New York Times reported that Ferris Research estimated that for 2003, the cost of spam is $10 billion in the United States. The estimate factors in the waste in computing resources and work time.
On October 22, 2003, the U.S. Senate voted to outlaw spam e-mails and to set up a “do not spam” registry similar to the recently introduced “do not call” registry. Such a registry might actually cause more spam if it gives spammers a list of confirmed “live” addresses, though the final version of the CAN-SPAM Act of 2003, which was sent to the President for his signature on December 8, prohibits the sale or other transfer of an e-mail address obtained through an opt-out request.
On October 24, 2003, a Santa Clara, California Superior Court judge ordered two spammers to pay $2 million for illegally sending unsolicited e-mails.
On December 11, 2003, new UK legislation was passed making it an offence for UK organisations to send unsolicited e-mails. Many experts have expressed doubts over the effectiveness of the new law given that most spam originates outside the UK and the process to convict a spammer would take up to two years to complete.
On December 12, 2003, the state of Virginia arrested two men on felony spamming charges.
In 2004, a court in Denmark fined a company 400,000 DKK (€54,780) for illegally sending 1,500 unsolicited e-mails.
In 2004, Bill Gates proposed at the World Economic Forum in Davos, Switzerland, to charge the sender instead of the recipient of the mail. Such proposals, called “email postage” or “sender pays”, have been proposed before and have failed on technical and economic grounds.
In March 2004, with spam email traffic at about 60 percent of all e-mail, America Online Inc. adopted a new anti-spam policy that includes blocking AOL members from access to websites that bulk e-mailers promote.