Ever since returning from Superweek in beautiful Galyateto, Hungary, I’ve been thinking a lot about data and the utility of Google Analytics as a tool. Yes, I know, I spend a lot of time thinking about those things, but the conference was particularly inspiring in that regard. Google Analytics is no different from any other digital analytics tool in that it is critical to understand what the reported values actually mean and how they got there in the first place. But that’s not enough. When we analyze data, we need it to be presented in a meaningful way. Data visualization is tremendously important in this regard, and I believe that one of the reasons Google Analytics has such great adoption and market penetration (besides the enticing $0.00 entry price point) is that the UI is crisp, FAST, and easy to use.

This post is partly a response to a post entitled “Are You Being Misled by Google Analytics?” While I am about to critique that post, I do want to point out that one of Tien Nguyen’s ideas (Chris credits Tien in his article as the source of this idea) is indeed insightful: namely, that without configuration, Google Analytics may not provide as much visibility into traffic sources as one needs. While I urge you to take a look at the article, I’ll briefly summarize the main idea here.

Currently, when traffic is not tagged with campaign tracking parameters, Google Analytics by default sets its campaign cookie according to the document referrer. In other words, GA looks at the source of traffic (which website the user was on before they clicked through to your site) and then uses a set of rules to classify the traffic. If the source of traffic is one of GA’s predefined search engines, the traffic will be listed as Organic. If it is not from a predefined search engine, it will be listed as a referral. When a user arrives on your site via a URL with campaign tracking parameters appended, the source of traffic will be reported in accordance with those parameters.
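The default rules above can be sketched in a few lines of Python. This is a simplified model, not GA’s actual code. The search-engine set is a hypothetical, abbreviated stand-in for GA’s predefined list, and `classify_source` is an invented name. Real GA also checks each engine’s keyword query parameter, which this sketch ignores. The direct branch reflects GA’s standard (direct)/(none) label for visits with no referrer.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical, abbreviated stand-in for GA's predefined search-engine list.
SEARCH_ENGINES = {"google", "bing", "yahoo"}


def classify_source(landing_url: str, referrer: str) -> tuple[str, str]:
    """Return a (source, medium) pair following the default rules described above."""
    params = parse_qs(urlparse(landing_url).query)

    # 1. Campaign tracking parameters take precedence over the referrer.
    if "utm_source" in params:
        medium = params.get("utm_medium", ["(not set)"])[0]
        return params["utm_source"][0], medium

    # 2. No referrer at all: the visit is reported as direct.
    if not referrer:
        return "(direct)", "(none)"

    # 3. Referrer host matches a predefined search engine: organic.
    #    (Real GA also checks the engine's keyword parameter; skipped here.)
    host = urlparse(referrer).hostname or ""
    for engine in SEARCH_ENGINES:
        if engine in host.split("."):
            return engine, "organic"

    # 4. Anything else is a referral from that host
    #    (GA's own hostname normalization may differ).
    return host, "referral"


print(classify_source("https://shop.example/?utm_source=adcenter&utm_medium=cpc",
                      "https://www.bing.com/search?q=shoes"))  # ('adcenter', 'cpc')
print(classify_source("https://shop.example/",
                      "https://www.google.com/search?q=shoes"))  # ('google', 'organic')
```

Note that the first call is a Bing click, yet it is reported by its campaign tags. Tagged traffic never reaches the referrer rules at all.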

I will start with the case of one of my own clients and then move on to Chris’s post. One of my clients is a retailer who advertised heavily on Google but decided that they should also try advertising on Bing Ads. As most people know, Microsoft and Yahoo created what they call the “Search Alliance,” through which Microsoft adCenter (now Bing Ads) powers the ads displayed on Microsoft, Yahoo, and all of their partner websites.

[Screenshot: regular campaign tagging]

My client was interested in knowing which traffic was coming from Yahoo and which traffic was coming from Bing. It was easy enough to get this data; all we needed to do was create a filter:

[Screenshot: filter for referrals from Yahoo]

Making this change gives us a Traffic Sources report that looks like this:

[Screenshot: Traffic Sources report broken out by partner network]

The thing is, who cares? Seriously, I did my very best to explain to my client that this simply doesn’t matter. I was told that the “type of person” who searches on Yahoo may be very different from the type of person who searches on Bing. Ok… Let’s accept that assumption for now. So why does it not matter? Well, there is no way to change ad targeting in Bing Ads based on ad network. In other words, there is no way to optimize the traffic one is getting based on this information. As a result, instead of providing insightful data, we are creating fragmented data.

The blog post that I referenced above goes to great lengths to explain what a CSE Co-brand is and why there is an “issue” with normal link tracking vs. their solution for “advanced” link tracking. Their suggestion for understanding the true source of Pricegrabber traffic:


AHHHH!!!  So many rows.

Reporting [utm_source], [real referrer] in GA.  Geeky, yes. Cool, yes (especially for us geeks).
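A toy Python illustration shows why the concatenated source explodes into so many rows. The function name and the co-brand hostnames are invented placeholders, not real PriceGrabber partners. The function only mimics the output of a filter that writes “[utm_source], [real referrer]” into the source field.

```python
from collections import Counter

# Hypothetical visits as (utm_source, referrer host) pairs.
# The co-brand hostnames are invented placeholders.
VISITS = [
    ("pricegrabber", "www.pricegrabber.com"),
    ("pricegrabber", "shopping.partner-a.example"),
    ("pricegrabber", "deals.partner-b.example"),
    ("pricegrabber", "shopping.partner-a.example"),
]


def filtered_source(utm_source: str, referrer: str) -> str:
    """Mimic a filter whose output is '[utm_source], [real referrer]'."""
    return f"{utm_source}, {referrer}"


before = Counter(source for source, _ in VISITS)
after = Counter(filtered_source(s, r) for s, r in VISITS)

print(len(before), "row(s) before the filter")  # 1 row(s) before the filter
print(len(after), "row(s) after the filter")    # 3 row(s) after the filter
```

Every new co-brand site adds another row for what is, from an optimization standpoint, the same traffic source.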

But WHY? Optimization as it relates to Pricegrabber or any other comparison shopping engine comes down, for the most part, to product suppression. That is to say, if certain products are getting lots of clicks but are not converting, lower their bids or remove them from your feed entirely. Knowing the true referral source for this Pricegrabber traffic is not actionable (an important word that we web analysts like to throw around a lot).

Much more useful advice would have been to suggest passing the product category in utm_campaign and the product SKU in utm_content. Usable data… novel…
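Here is a minimal sketch of that suggestion, assuming your feed URLs are generated programmatically. The function appends category and SKU as campaign parameters. The function name, the utm_source value, the utm_medium label `cse`, and the sample product are hypothetical placeholders, so follow whatever naming conventions your account already uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_product_url(product_url: str, category: str, sku: str) -> str:
    """Append campaign parameters: product category -> utm_campaign, SKU -> utm_content."""
    parts = urlsplit(product_url)
    query = parse_qsl(parts.query)  # keep any parameters the URL already has
    query += [
        ("utm_source", "pricegrabber"),
        ("utm_medium", "cse"),          # hypothetical medium label
        ("utm_campaign", category),     # one campaign row per product category
        ("utm_content", sku),           # one content row per SKU
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


url = tag_product_url("https://shop.example/p/123", "Running Shoes", "SKU-123")
print(url)
# https://shop.example/p/123?utm_source=pricegrabber&utm_medium=cse&utm_campaign=Running+Shoes&utm_content=SKU-123
```

With this tagging, the campaign and ad content reports line up with the decisions you can actually make in the feed: which categories and which SKUs to bid down or suppress.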

Using New Profiles

The real kicker for me, however, was that the author left out a critical piece of advice: anybody who is interested in using this “clever” filter should ONLY do so in a new profile. Profiles cannot be cleaned up retroactively. Once poor data is in there, it’s there to stay. Since many readers may not know about the best practice of using test profiles, unfiltered profiles, and new profiles for these sorts of changes, I feel it is borderline negligent to leave out that critical advice.

Where are my ecommerce sales?

[Screenshot: ecommerce data]

Another important “detail” worth mentioning is that applying a filter like this will yield some very “wonky” data. In the image above, we see that traffic specifically from Bing and Yahoo isn’t registering any sales. Why would that be? Let’s start by applying a Visits with Transactions advanced segment.

[Screenshot: Visits with Transactions advanced segment]

Where did these transactions go???  Ah, here they are, under the adcenter tagged traffic.

[Screenshot: hidden transactions]

But wait another darn minute!!  Something doesn’t look right here.

I’ll give you a moment to figure it out…  

Answer:  Filters function on the “hit” level.

Filters allow for the processing of data that is pumped into Google Analytics before it goes into profiles. Data gets processed in Google Analytics at the “hit” level. A “hit” is any request to the __utm.gif file. The parameters appended to the file location contain the data that GA uses to build all reports. In the example below, I did a search for [analytics ninja] in Google. As you can see, I was signed in, which is why the keyword reported is (not provided).

[Screenshot: landing page]

When I visit another page on the same site, you will notice that the utmr parameter is set to zero.

[Screenshot: 2nd page viewed]
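You can check this difference between the two hits yourself by pulling the utmr value out of each __utm.gif request. Here is a minimal Python sketch. The two request URLs are hypothetical and heavily abbreviated, since real hits carry many more parameters. Only utmp (page path) and utmr (referrer) are shown.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical, heavily abbreviated __utm.gif requests for two pageviews in one visit.
LANDING_HIT = ("https://www.google-analytics.com/__utm.gif"
               "?utmp=%2F&utmr=https%3A%2F%2Fwww.google.com%2F")
SECOND_HIT = ("https://www.google-analytics.com/__utm.gif"
              "?utmp=%2Fabout%2F&utmr=0")


def utmr_of(hit_url: str) -> str:
    """Return the decoded utmr (referrer) value of a hit, or '' if it is absent."""
    params = parse_qs(urlsplit(hit_url).query)
    return params.get("utmr", [""])[0]


print(utmr_of(LANDING_HIT))  # https://www.google.com/
print(utmr_of(SECOND_HIT))   # 0
```

Any filter that matches on the referrer can only match the first of these two hits.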

The very first hit of the session determines much of the “visit information” about the session. This is why the visits show up in the Traffic Sources reports with all of the different PriceGrabber information neatly shoved in there. The Co-Brand is visible to the “visit” because that data from utmr existed during the first hit of the session. However, on all subsequent hits, the utmr parameter did not contain the information stipulated in the filter. That “Co-Brand” data (i.e. document.referrer) is no longer available for GA to process. The e-commerce hits are perforce not tied back to the way the visit is displayed in Google Analytics, and so they show up as ghost sessions.
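The hit-level behavior is easier to see in a simplified Python model of the Yahoo/Bing case. Everything in it is hypothetical: the `Hit` class, the `yahoo_filter` rule, and the `report` function. The filter rewrites the campaign source to yahoo when that hit’s utmr mentions Yahoo. The model also assumes two things: the visit row takes its source from the first hit, and each transaction is credited to whatever source its own hit carries after filtering.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass
class Hit:
    kind: str           # "page" or "transaction"
    utmr: str           # referrer sent with this hit ("0" after the landing page)
    source: str         # campaign source carried by the hit, e.g. "adcenter"
    revenue: float = 0.0


def yahoo_filter(hit: Hit) -> Hit:
    """Hypothetical hit-level filter: relabel the source when THIS hit's utmr mentions Yahoo."""
    if "yahoo" in hit.utmr:
        return replace(hit, source="yahoo")
    return hit


def report(session: list[Hit]) -> dict[str, dict[str, float]]:
    """Aggregate one session into per-source rows after filtering each hit."""
    rows = defaultdict(lambda: {"visits": 0, "transactions": 0, "revenue": 0.0})
    filtered = [yahoo_filter(hit) for hit in session]  # the filter sees one hit at a time
    rows[filtered[0].source]["visits"] += 1            # visit info comes from the first hit
    for hit in filtered:
        if hit.kind == "transaction":                  # each sale keeps its own hit's source
            rows[hit.source]["transactions"] += 1
            rows[hit.source]["revenue"] += hit.revenue
    return dict(rows)


session = [
    Hit("page", "http://search.yahoo.com/search?p=shoes", "adcenter"),
    Hit("page", "0", "adcenter"),
    Hit("transaction", "0", "adcenter", revenue=59.90),
]
print(report(session))
```

In this model the yahoo row gets the visit but no sales. The adcenter row gets the transaction with zero visits, which is the same pattern as the hidden transactions in the screenshots.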

Bottom line: Please be very careful, folks, with all of the snazzy Google Analytics advice out there. You might wind up with a bunch of fragmented data that is totally wonky, not actionable, and, if you didn’t create a new profile, a permanent stain on your GA data.