Google Analytics Referrer Spam

November 29, 2017

#Bot Traffic #Ghost Spam #Google Analytics #Referrer Spam

“The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. Sites that publish their access logs, including referrer statistics, will then inadvertently link back to the spammer’s site. These links will be indexed by search engines as they crawl the access logs, improving the spammer’s search engine ranking. Except for polluting their statistics, the technique does not harm the affected sites. At least since 2014, a new variation of this form of spam occurs on Google Analytics. Spammers send fake visits to Google Analytics, often without ever accessing the affected site. The technique is used to have the spammers’ URLs appear in the site statistics, inducing the site owner to visit the spam URLs. When the spammer never visited the affected site, the fake visits are also called Ghost Spam.”

Referrer Spam – Wikipedia

One of our clients is asking us about strange results in the Google Analytics report we send her every week. There are lots of referrals which seem to make no sense. Can we sort it out for her?

It’s a low-traffic site, and it turns out that something like 50% of apparent visits are actually ghost spam: fake traffic which is visible only in Google Analytics and which is created not by actually visiting the site but by sending data directly to the account via the Measurement Protocol. Because these are not real site visits, there is no point in creating a server rule (e.g. in .htaccess) to block this traffic. If we quickly create a report including all referrals and excluding all valid hostnames, we end up seeing this kind of thing:

Ghost Spam
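The report logic can be sketched in a few lines of Python. This is only an illustration: the rows, hostnames and session counts are invented, and we are assuming the referral data has been exported as (hostname, source, sessions) tuples. The names `ghost_rows` and `ghost_share` are our own.

```python
# Hypothetical export of a referrals report: (hostname, referral source, sessions).
# Every value here is invented for illustration.
VALID_HOSTNAMES = {"www.example.com", "example.com"}

rows = [
    ("www.example.com", "news.example.org", 40),
    ("example.com", "blog.example.net", 10),
    ("(not set)", "spam-one.invalid", 30),
    ("unrelated-host.invalid", "spam-two.invalid", 20),
]


def ghost_rows(rows, valid_hostnames):
    """Return the rows whose hostname is not one of ours."""
    return [row for row in rows if row[0].lower() not in valid_hostnames]


def ghost_share(rows, valid_hostnames):
    """Fraction of all sessions attributed to hostnames we do not serve."""
    total = sum(sessions for _, _, sessions in rows)
    ghost = sum(sessions for _, _, sessions in ghost_rows(rows, valid_hostnames))
    return ghost / total if total else 0.0


for hostname, source, sessions in ghost_rows(rows, VALID_HOSTNAMES):
    print(f"{hostname:25} {source:20} {sessions}")
print(f"ghost share: {ghost_share(rows, VALID_HOSTNAMES):.0%}")
```

Anything left after excluding our own hostnames is traffic our server never handled, which is what makes the hostname such a convenient test.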

These are fake referrer URLs, typically designed to encourage us to visit those sites. Sometimes the domains have been created to look like well-known internet sites by using homographs (similar-looking Unicode characters). Other times the domain will be real. Sometimes messages are included in the data. Motherboard has an article about this issue, which relates to their own domain being used in referrer spam: We’re not spamming your Google Analytics – Motherboard.
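To see how a homograph works, here is a minimal Python sketch. The look-alike domain is made up for illustration, and the helper `non_ascii_characters` is our own name. It simply lists any characters outside the ASCII range together with their Unicode names.

```python
import unicodedata


def non_ascii_characters(domain):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in domain
        if ord(ch) > 127
    ]


# Hypothetical look-alike: the "a" here is CYRILLIC SMALL LETTER A (U+0430),
# written as an escape so that the difference is visible in the source.
lookalike = "ex\u0430mple.com"
genuine = "example.com"

print(lookalike == genuine)             # the two strings are not equal
print(non_ascii_characters(lookalike))  # lists the Cyrillic character
print(non_ascii_characters(genuine))    # nothing to report
```

This is a rough check rather than a detector. Legitimate internationalised domains also contain non-ASCII characters, and look-alikes built purely from ASCII (such as "rn" standing in for "m") would pass unnoticed.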

On sites where traffic matters this can be an issue, as it distorts the statistics which clients and advertisers care about (though not, typically, rankings). The obvious discrepancy we can see from the report is that the hostnames are not ours: our website is not serving these apparent hits. So that’s easy enough to filter from the client report. We start by creating a duplicate view and then create a filter to include only validated hostnames, expressed as a simple regex, e.g. www\.example\.com|example\.com

Valid Hosts Only
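Before saving a filter like this it can be worth trying the pattern against a few sample hostnames. The sketch below uses Python’s re module as a stand-in and assumes the filter applies the pattern as a partial match, which is worth confirming in your own account. The anchored variant and the sample hostnames are our additions, not part of the original filter.

```python
import re

# The pattern from the text, plus an anchored variant (the anchoring is an
# addition for comparison, not part of the original filter).
UNANCHORED = re.compile(r"www\.example\.com|example\.com")
ANCHORED = re.compile(r"^(www\.)?example\.com$")

# Hypothetical hostname values as they might appear in a report.
hostnames = [
    "www.example.com",
    "example.com",
    "(not set)",
    "example.com.spam.invalid",
]


def kept(pattern, hostnames):
    """Hostnames an include filter would keep, assuming partial matching."""
    return [h for h in hostnames if pattern.search(h)]


print(kept(UNANCHORED, hostnames))
print(kept(ANCHORED, hostnames))
```

Under that assumption the unanchored pattern also keeps any hostname that merely contains example.com, so anchoring with ^ and $ is the more cautious choice.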

While we are at it, we can go into View Settings and make sure that “Exclude all hits from known bots and spiders” is checked. It is also worth having a look at Campaign Source.
