Search Engines vs. Spammers in China
The following article appeared in Nanfang Weekend on July 14, 2005.
Vast economic interests lie behind the search engines, since being found by a search engine is an important way of increasing the number of hits to a website. Search engines are also important indicators in the "economy of attention" on the Internet. The conflict between the search engines and the spammers is not just a technical battle; it is also a serious problem of product valuation.
You go to Google and you enter "Elder Sister Hibiscus" (芙蓉姐姐). You select one of the top-ranked pages for "Elder Sister Hibiscus" and then one of two things will happen. First, within seconds, a page such as "Asia's largest online broadband movie site" pops up. Second, there is a page with many listings of "Elder Sister Hibiscus." You select one of them and it could be a page just like the first kind -- the content is totally unrelated to "Elder Sister Hibiscus" and all you see are semi-nude women or naked female breasts.
That was a sampler of what the spammers have done.
Without exception, whenever a term becomes 'hot' on the Internet, it will be used by the spammers to cheat the search engines. Right now, it is the moment for "Elder Sister Hibiscus."
In the realm of the Internet search engines, spammers have become a deluge.
According to a Google search on July 12, there are 1,030,000 pages related to the keyword "Elder Sister Hibiscus." Of the top 40 pages, 15 (or 37.5%) were actually about her and 25 (or 62.5%) were unrelated to her.
The top-ranked page is Sina.com. The second-ranked page Mblogger and the third-ranked page dyo.zj.com are unknowns. Of the top 10 ranked pages, apart from Sina.com, all the others are blogs that came through BSPs (Blog Service Providers). When these pages are examined, there are plenty of links that say "Elder Sister Hibiscus" but they turn out to lead to sexy movie downloads or mobile telephone ring tone downloads.
At Baidu, there are 1,340,000 pages that are found with "Elder Sister Hibiscus" as the keyword. Among the top 40, the principal pages are Neteasy, TOM, eladies.sina.com, QQ and other portals. In Google, these pages were beaten out by the blog spammers.
According to an anonymous anti-spam expert, Baidu's results do not mean that it favors news-related web sites. Rather, Baidu has automatically removed the spam pages that show up in the Google results, and therefore those news-related web sites were able to rise in the ranking.
What is really cute is that when Baidu suppressed those spammers, it was attacked by them. If you enter the keywords "Anti-Baidu Alliance" into Baidu, you will find numerous mentions of the alliance. Since the anti-Baidu alliance website came online on June 1, more than 2,000 people have signed up. When the website first came online, there was plenty of attention, but it seems to be quiet now.
The Anti-Baidu alliance was founded by a person whose website was suppressed by Baidu at the end of last year. According to the alliance's declaration, the organization was founded to oppose the injustice of Baidu towards the broad masses of webmasters and netizens, and the website was established to "collect evidence of how unfair Baidu has been towards webmasters and netizens in order to make sure that Baidu moves towards fairness." But so far, there has been no indication of just how much evidence this alliance has collected against Baidu.
Baidu's attitude towards the alliance can be said to be 'tolerant.' One can use Baidu to obtain all the information about the alliance. Allegedly, Baidu took this alliance seriously and its conclusion based upon the composition of the members of the alliance was that they were without fail websites which have violated Baidu's anti-spam website policies.
Industry experts believe that the formation of the alliance was an extraordinary event, like the "Alliance For The Right To Enjoy Pirated/Imitated Products", "The Alliance For Spam Mailers" or other socially objectionable groups.
At the moment, there is a deluge of search engine spammers. They have developed like the spam mailers of yesteryear. Like locusts, they endanger the search engines and even the Internet as a whole.
According to Baidu general manager Yu Jun, the struggle between search engines and spammers has been a long process. The chess game between the two sides is very much like the global anti-viral war in terms of techniques and styles.
From one angle, according to Yu Jun, the search engine spammers serve to make the search engines better.
In the early stage of search engine development, the search engine professionals were thinking about how to let the searchers find the most valuable and relevant search results as quickly as possible. Thus, they invented the meta-tag -- this meta-language served to describe the unique characteristics of the web page or web site.
At first, the meta-tag was very useful. At the time, honest webmasters would accurately describe the website and list the most relevant keywords. The search engines could then grab and quote these meta-tags as an important step in ranking search results. In order to improve the search results, there came web specialists who studied the art of search engine optimization (SEO) to improve websites in order to be ranked higher by the search engines.
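To make the meta-tag step concrete, here is a minimal sketch in Python of how an early engine might have read the keywords a page declares about itself. The names MetaKeywordParser and meta_keyword_score are hypothetical, and the all-or-nothing score is an assumption made for illustration; the article does not describe any engine's actual formula.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [
                term.strip().lower() for term in content.split(",") if term.strip()
            ]


def meta_keyword_score(html, query):
    """Hypothetical score: 1 if the query is a declared keyword, else 0."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return 1 if query.lower() in parser.keywords else 0


# The score trusts whatever the webmaster declares about the page.
page = '<head><meta name="keywords" content="gardening, roses"></head>'
score = meta_keyword_score(page, "roses")
```

Because the score depends only on what the page says about itself, any page can declare any keyword, whatever its real content.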
Then people increasingly discovered that the ranking in the search engine results represented an immense space for gaining benefits -- whichever website or web page shows up at the top of the search results stands to gain public exposure and therefore monetary benefits. Many websites "woke up" and began to find ways to get more page views and hits, since these are the important indicators in the "economy of attention."
By this stage, the SEO professionals who once served to improve search techniques had metamorphosed into "SEO spammers."
The initial counterattack by the search engines was to depend more on the contents of the web page and less on the meta-tags. But the spammers responded by stuffing the page with keywords in the same color as the page background, which made them invisible to the human eye. In addition, they also added keywords into the descriptions for the photographs.
The second round of counterattack by the search engines was to find ways to filter out all those invisible keywords.
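One way such a filter could work is sketched below in Python. It assumes old-style markup in which the page background is set with a bgcolor attribute on the body tag and the text color with font tags, and it only compares color strings literally. HiddenTextFinder and find_hidden_text are hypothetical names, and a real filter would also have to handle stylesheets, named colors and near-identical shades.

```python
from html.parser import HTMLParser


class HiddenTextFinder(HTMLParser):
    """Records text whose <font color> equals the <body bgcolor>."""

    def __init__(self):
        super().__init__()
        self.background = "#ffffff"  # assumed default when <body> sets none
        self.color_stack = []        # colors of the currently open <font> tags
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "body" and attr.get("bgcolor"):
            self.background = attr["bgcolor"].lower()
        elif tag == "font":
            # A <font> without a color inherits the enclosing one.
            inherited = self.color_stack[-1] if self.color_stack else None
            color = attr.get("color")
            self.color_stack.append(color.lower() if color else inherited)

    def handle_endtag(self, tag):
        if tag == "font" and self.color_stack:
            self.color_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.color_stack and self.color_stack[-1] == self.background:
            self.hidden.append(text)


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.hidden


page = (
    '<body bgcolor="#FFFFFF"><p>Broadband movies</p>'
    '<font color="#ffffff">hibiscus hibiscus hibiscus</font></body>'
)
hidden = find_hidden_text(page)
```

Text found this way can be dropped before the page is indexed, so that the invisible keywords no longer count towards the ranking.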
When a search engine spider reaches a web site, it announces itself and then checks any website-specific restrictions and follows those procedures. So the search engine optimizers have gone ahead to establish two websites -- one for the regular users and another one specifically for the search engines. "To each its own." This method is known as "cloaking."
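Cloaking can be illustrated, and in principle detected, by requesting the same address twice under different identities and comparing the answers. The Python sketch below is an assumption-laden illustration: cloaking_site and honest_site are stand-ins for real web servers, the user-agent strings are made up, and the word-overlap threshold of 0.5 is arbitrary.

```python
def cloaking_site(url, user_agent):
    """Stand-in for a cloaked server: one page for spiders, another for people."""
    if "bot" in user_agent.lower():
        return "elder sister hibiscus photos news biography"
    return "asia largest online broadband movie site"


def honest_site(url, user_agent):
    """Stand-in for a server that sends everyone the same page."""
    return "elder sister hibiscus photos news biography"


def word_overlap(a, b):
    """Jaccard similarity between the sets of words in two texts."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def looks_cloaked(fetch, url, threshold=0.5):
    """Fetch once as a spider and once as a browser, then compare the pages."""
    as_spider = fetch(url, "ExampleBot/1.0")
    as_browser = fetch(url, "Mozilla/5.0")
    return word_overlap(as_spider, as_browser) < threshold


cloaked = looks_cloaked(cloaking_site, "http://example.com/")
honest = looks_cloaked(honest_site, "http://example.com/")
```

The check only works if the second request does not give itself away as coming from the search engine.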
Outside China, SEO spamming was already popular in the last century. A 1998 study of the hot search engine keyword "Monica Lewinsky" showed that 40% of the search engine results were spam.
This forced the search engines to go outside of the page itself to look for ways of ranking pages fairly that the spammers cannot tamper with. Thus, there came PageRank. The idea is to evaluate a page based upon the links from other pages, while considering the importance of those linking pages. For example, links from .gov and .edu are rated as more important.
But the spammers went ahead and constructed a large number of websites that link to each other. If a customer is willing to pay, all these websites will provide links. This SEO technique is known as a 'link farm.' In response, the search engines blacklist any 'link farm' that they find.
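The link-based idea can be sketched as a short power iteration in Python. This follows the commonly published formulation of PageRank with a damping factor of 0.85; the tiny link graph is made up, and the sketch gives no special weight to any kind of domain, so it should not be read as Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: three pages link to "c", and "c" links back to "a".
example = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
ranks = pagerank(example)
top_page = max(ranks, key=ranks.get)
```

In this graph "c" comes out on top because three pages link to it, and "a" ranks second because its only incoming link is from the important page "c".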
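One signal a search engine could use to find such a farm is that its members link to each other far more consistently than ordinary sites do. The Python sketch below flags sites whose outgoing links are mostly returned. It is a heuristic made up for illustration, with arbitrary thresholds and hypothetical site names, and the article does not say how any search engine actually identifies link farms.

```python
def reciprocity(links, site):
    """Fraction of a site's outgoing links that are linked back."""
    outgoing = set(links.get(site, ())) - {site}
    if not outgoing:
        return 0.0
    returned = {t for t in outgoing if site in links.get(t, ())}
    return len(returned) / len(outgoing)


def suspected_farm(links, min_links=3, min_reciprocity=0.7):
    """Sites with several outgoing links, most of which are reciprocated."""
    return sorted(
        site
        for site in links
        if len(set(links[site]) - {site}) >= min_links
        and reciprocity(links, site) >= min_reciprocity
    )


# Hypothetical web: four farm sites link to each other, and one sells a link
# to a paying customer. The portal links out without being linked back much.
web = {
    "farm1": ["farm2", "farm3", "farm4"],
    "farm2": ["farm1", "farm3", "farm4"],
    "farm3": ["farm1", "farm2", "farm4"],
    "farm4": ["farm1", "farm2", "farm3", "customer"],
    "portal": ["blog", "news", "wiki"],
    "blog": ["portal"],
    "news": [],
    "wiki": [],
    "customer": [],
}
flagged = suspected_farm(web)
```

Here the four farm sites are flagged while the portal is not, since only one of its three links is returned.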
According to a webmaster, there is a huge market for SEO spammers in China. One estimate is that several hundred thousand people make a living out of SEO spamming, with link farms that total several million websites. There are many SEO websites that 'teach' people how to cheat and let them exchange ideas and experiences.
These SEO spammers earn their living in one of two ways. First, they earn advertising revenues through the traffic that is guided to them by the search engines. Second, they sell the highly ranked keywords that they obtained by fooling the search engines.
Website traffic is usually stated in terms of the number of unique visitors. According to one webmaster, his 5,000 unique IP visits per day earn him a monthly income of 1,300 yuan from his three broadband movie website ads. A certain website that got 2,000,000 page views from Baidu can make as much as 40,000 yuan per day.
The destructiveness of SEO spamming is tremendous. It directly ruins the experience of the hundreds of millions of searches made by Internet users every day, and that experience is the key by which search engines make a living. At the same time, it also ruins the principal source of income for the search engines -- their rankings and associated advertisements.
According to an audit, spamming web pages account for 10% of all web pages right now. Among the hottest keywords at any time, the percentage of spamming web pages among the top 50 results can be as high as 80%.
In China, Google has also acted as the gentleman, but there are times when even Google got mad. At 4 a.m. on March 26, 2005, Google suddenly blacklisted a whole bunch of spamming websites. After that, Google results became a lot cleaner.
Less than 4 months later, the spamming websites have rushed back like the tide. According to inside sources at Google, the company has always tried to stop spam. But building a complete anti-spam system is a long time-consuming road. In China, anti-spam experts are a rarity. When our reporter spoke to Baidu, we were asked not to reveal the name of their anti-spam expert "because this is one of the most valuable talents at Baidu."
In the long war between the search engines and the spammers, the former have always been in a defensive posture. This is not just because there are only a few people at the search engines thinking about the search results and page rankings, whereas there are hundreds of thousands of others thinking about how to cheat the search engines all the time. The more important fact is that it is easy to create dozens or even hundreds of spamming websites with just a few mouse clicks, whereas it takes the search engines a lot more time to identify and eliminate spamming websites. For the search engines, large investments in human resources and capital are necessary to fight spamming.
At this moment, the spamming websites have set their eyes on the blogs. The bloggers in China have been plagued by spammers who come down on them like locusts, to the point where their websites are congested and inaccessible.
This struggle is like the war between the virus and anti-virus sides on the Internet. The difference is that creating a virus probably costs more effort and requires greater technical skills, whereas spamming websites have lower entry barriers and are easy to set up.
Therefore, the search engines must treat the spamming websites with the same serious stance -- permanent deletion if discovered.
But during any search, there will always be spamming websites since they continue to be born every second. Due to the vast differential between the revenues and the costs, SEO spammers will continue to rush ahead. As of now, the government and other organizations have not legislated against these activities, but we have to believe that someday they will.
Follow-up post: Search Engines vs. Search Engines in China