Spammers use a variety of techniques to harvest email addresses, but the two main ones are (a) automated spiders and (b) directory harvesting. Spiders are software agents known under a variety of names: spiders, crawlers, robots and bots. They are the seekers of content on the internet, and they form the basis of how search engines such as Google and Yahoo! work.
Search engine spiders trawl the internet unceasingly, looking for content. Their searches are based on important words known as keywords. The engines keep an index of the words they find and the websites where they find them. Users of the search engines can then find these sites by keying in the search words. A major search engine will index hundreds of millions of pages and respond to tens of millions of queries every day.
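The word index described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine is built: the two pages, their URLs and the function names build_index and search are made up for the example, and a real engine would add ranking, stemming and far more besides.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?()\"'")  # drop surrounding punctuation
            if word:
                index[word].add(url)
    return index

def search(index, query):
    """Return, in sorted order, the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical pages standing in for content a spider has fetched.
pages = {
    "http://example.com/a": "Cheap flights to Paris.",
    "http://example.com/b": "Paris hotels and cheap food",
}
index = build_index(pages)
print(search(index, "cheap paris"))
```

Keying in both words returns only the pages that contain both, which is the lookup the paragraph above describes.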
A spammer collects email addresses in a similar way, by sending an automated spider across the internet to look for addresses that appear on web pages or in links used to send emails. The spider sends them back to the person who is compiling the spam list. The spammer’s spider will trawl a variety of websites looking for addresses. Targets of an email spider include dating sites, chat rooms, message boards and Usenet newsgroups; in fact, any type of web page that might conceivably contain an address.
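To see why an address printed on a page or placed in an email link is so easy to pick up, consider the sketch below. It works on a single string held in memory and does no crawling. The pattern is deliberately simple and is an assumption made for this illustration (valid address syntax is much broader), and the sample page and the name find_addresses are invented.

```python
import re

# Deliberately simple pattern for illustration; real address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(html):
    """Return the unique addresses in a page's source, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(html):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen

# Hypothetical page: one address in an email link, one in plain text.
page = '<p>Write to <a href="mailto:jane@example.com">Jane</a> or bob@example.org.</p>'
print(find_addresses(page))
```

Both the address in the link and the one in the visible text are found by the same pattern, with no understanding of the page needed.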
If you have ever sent your address to anyone on the internet, have inserted it in a form or have your own web page with your address on it, you can be sure that your email address has been harvested by numerous spiders working for compilers of spam lists.
A directory harvesting attack, also known as a dictionary attack, is another common technique for creating lists of addresses. It is used to collect addresses from internet service providers (ISPs), mail services such as Yahoo!, Hotmail and AOL, and large companies with their own mail servers.
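In a directory harvesting attack, the harvesting software generates a huge number of likely addresses for the target domain, for example by combining common names and words, and tries to deliver mail to each one. Nearly all these addresses will be invalid, in which case the server will respond with an SMTP 550 error message. The harvesting software will ignore these addresses. But every now and then the software will get lucky and the server will respond with a message that an email address is valid. The software will compile all the valid addresses into a list for spamming.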
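The accept-or-reject signal that directory harvesting relies on can be shown with a toy simulation. Nothing here touches a network: the mail server is just a set of made-up mailboxes, and the names VALID_MAILBOXES, rcpt_reply and confirmed are assumptions for the example. The reply codes themselves are standard SMTP: 250 means the recipient was accepted, 550 that the mailbox is unavailable.

```python
# Hypothetical mailboxes on a simulated server; no network traffic is involved.
VALID_MAILBOXES = {"alice@example.com", "sales@example.com"}

def rcpt_reply(address):
    """Return the SMTP reply code the simulated server gives for a recipient."""
    return 250 if address.lower() in VALID_MAILBOXES else 550

def confirmed(candidates):
    """Keep only the candidates the simulated server accepts (reply 250)."""
    return [a for a in candidates if rcpt_reply(a) == 250]

# Guessed names combined with the domain, as described above.
guesses = [name + "@example.com" for name in ["alice", "bob", "carol", "sales"]]
print(confirmed(guesses))
```

The guesses that draw a 550 are simply dropped, and what remains is the list of confirmed addresses. The server gives this away only because it answers differently for valid and invalid recipients.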