Saturday, July 28, 2007

How good is Google Analytics?

“In God we trust, everyone else must bring data.”
I read this interesting quote some time back and have tried to follow it in my professional life, i.e. make decisions based on hard data. But the question I’m going to explore in this post (and the next one) is: can we trust the data that Google Analytics shows us?

Before you turn back thinking that this is just one more of those Google-bashing blogs, I must quickly point out that most of the issues discussed here apply to any analytics tool that uses similar technology. And yeah, there will be a point when I turn my attention to a possible “Google is evil” theory; but I’ll give you some advance warning before we reach that point.

For a while now I’ve been noticing a lot of discrepancies between GA’s data and the internal tracking data (server logs) that we maintain for our site. I turned to my LinkedIn network to ask whether anyone else had noticed such issues, and got a few responses that confirmed the problem with GA.

Google Analytics gives us the following important (albeit inaccurate) pieces of information:
• Site Usage (number of hits, page views, bounce rate, etc.)
• Visitor overview (new vs. returning, visitor location, etc.)
• Traffic sources (search engines, referring sites)
• Content performance (top content, top landing/exit pages)
• AdWords integration

Let’s look at these points one by one.

Site Usage: Many people have observed that GA under-reports a site’s traffic. It’s not clear to what extent, but I’ve read about cases where GA reports only 40% of the actual traffic.

One of the responses I got on LinkedIn was from David Kutcher, President of Confluent Forms, who very nicely explained the issue with relying on JavaScript (as GA does):

“Google Analytics is good, but it will never replace server log analytics, for the main reason that it relies on Javascript (the Urchin code).
Javascript's main drawbacks for this are:
1. it can be cached by a proxy server
2. it can be disabled on the client
These two points alone can wildly skew your results….”
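
Just to make those two drawbacks concrete, here is a rough back-of-the-envelope sketch in Python. The 5% and 10% rates in it are purely hypothetical placeholders, not measurements from our site or anywhere else; the point is only that any visit that never executes the tracking script simply vanishes from GA’s numbers.

# Hypothetical illustration of how JavaScript-based counting loses visits.
# None of these rates are real measurements.
actual_visits = 10_000           # what the web server actually served
js_disabled_rate = 0.05          # assumed share of visitors with JavaScript turned off
cached_rate = 0.10               # assumed share served by proxies/caches that never run the tag

missed = actual_visits * (js_disabled_rate + cached_rate)
tracked = actual_visits - missed

print(f"Visits a JS-based tracker would report: {tracked:.0f} of {actual_visits}")
print(f"Under-count: {missed / actual_visits:.0%}")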


Combine David’s explanation with the fact that our server logs typically over-report traffic because, most of the time, they fail to filter out spiders and bots (which GA quite appropriately excludes from its reports), and you can see where the huge discrepancies come from.
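
For what it’s worth, here is a minimal sketch (again in Python) of the kind of bot filtering a log-based count would need before it can be compared fairly with GA. The log file name, the log format and the list of bot signatures are assumptions for the sake of illustration, not our actual setup.

import re

# Crude list of user-agent fragments that identify well-known spiders/bots (illustrative only)
BOT_SIGNATURES = ("googlebot", "slurp", "msnbot", "crawler", "spider")

# In the common/combined log format the user agent is the last double-quoted field
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

human_hits = 0
bot_hits = 0

with open("access.log") as log:  # hypothetical log file
    for line in log:
        match = UA_PATTERN.search(line)
        user_agent = match.group(1).lower() if match else ""
        if any(sig in user_agent for sig in BOT_SIGNATURES):
            bot_hits += 1
        else:
            human_hits += 1

print(f"Human hits: {human_hits}, bot hits excluded: {bot_hits}")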

Perhaps we could have lived with this serious issue if we knew that GA under-reports traffic by some roughly fixed percentage; but as Michael Martinez points out in this article (again, thanks to David for the link), he has noticed GA diverging from the server data “in a consistently downward spiral” over a few months. What that means is that GA’s traffic reports are not only inaccurate but, to make matters worse, inconsistent.
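
To see why a shrinking ratio is worse than a fixed under-count, here is one last small Python sketch with entirely made-up monthly numbers. A roughly constant ratio would mean GA is merely low by a predictable factor; a ratio that keeps falling is the inconsistency being complained about.

# Hypothetical monthly figures; plug in your own GA exports and (bot-filtered) log counts.
monthly_counts = {
    # month: (server_log_visits, ga_visits)
    "2007-03": (100_000, 82_000),
    "2007-04": (102_000, 79_000),
    "2007-05": (98_000, 71_000),
    "2007-06": (101_000, 66_000),
}

for month, (server_visits, ga_visits) in monthly_counts.items():
    ratio = ga_visits / server_visits
    print(f"{month}: GA reports {ratio:.0%} of server-log traffic")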

We’ll look at the other GA reports in the next posting.
