Ever since returning from Superweek in beautiful Galyateto, Hungary, I’ve been thinking a lot about data and the utility of Google Analytics as a tool. Yes, I know, I spend a lot of time thinking about those things, but the conference was particularly inspiring in that regard. Google Analytics is no different from any other digital analytics tool insofar as it is critical to understand what the reported values actually mean and how they get there in the first place. But that’s not enough. When we analyze data, we need it to be presented in a meaningful way. Data visualization is tremendously important in this regard, and I believe that one of the reasons Google Analytics has such great adoption and market penetration (besides the enticing $0.00 entry price point) is that the UI is crisp, FAST, and easy to use.
One catalyst for this post is another post, entitled “Are You Being Misled by Google Analytics?” While I am about to critique that post, I do want to point out that one of the ideas from Tien Nguyen (whom Chris mentions in his article as the source of the idea) is indeed insightful: namely, that without configuration Google Analytics may not provide as much visibility into traffic sources as one needs. While I urge you to take a look at the article, I’ll briefly summarize the main idea here.
Currently, when traffic is not tagged with campaign tracking parameters, Google Analytics by default sets its campaign cookie according to the document referrer. In other words, GA looks at the source of traffic (which website the user was on before they clicked through to your site) and then uses a set of rules to determine how to classify the traffic. If the source of traffic is one of GA’s predefined search engines, then the traffic will be listed as Organic. If the traffic is not from a predefined search engine, it will be listed as a referral. When a user arrives on your site via a URL with campaign tracking parameters appended, the source of traffic will be reported in accordance with the tracking parameters.
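To make those rules concrete, here is a minimal Python sketch of the decision order described above. It is only an illustration, not GA's actual implementation: the function name classify_visit and the three-entry engine list are hypothetical, GA's real list of predefined search engines is much longer, and the sketch ignores any campaign cookie left over from an earlier visit.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical, heavily abbreviated stand-in for GA's predefined search engines.
SEARCH_ENGINES = ("google", "bing", "yahoo")


def classify_visit(landing_url, referrer):
    """Classify a visit roughly the way the paragraph above describes."""
    params = parse_qs(urlparse(landing_url).query)

    # 1. Campaign tracking parameters on the landing URL take priority.
    if "utm_source" in params:
        medium = params.get("utm_medium", ["(not set)"])[0]
        return {"source": params["utm_source"][0], "medium": medium}

    # 2. No referrer at all is reported as direct traffic.
    if not referrer:
        return {"source": "(direct)", "medium": "(none)"}

    # 3. Otherwise classify by the referring host name.
    host = urlparse(referrer).hostname or ""
    labels = host.split(".")
    for engine in SEARCH_ENGINES:
        if engine in labels:
            return {"source": engine, "medium": "organic"}
    return {"source": host, "medium": "referral"}


print(classify_visit("http://www.example.com/", "http://www.google.com/search?q=tents"))
print(classify_visit("http://www.example.com/", "http://blog.example.org/review"))
print(classify_visit("http://www.example.com/?utm_source=pricegrabber&utm_medium=cse",
                     "http://cobrand.example.net/shop"))
```

Note that in the third call the tagged source wins and the co-brand referrer is never consulted, which is exactly the visibility gap that the post I am responding to is concerned with.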
I will start with the case from one of my own clients, and then move on to Chris’s post. One of my clients is a retailer who advertised heavily on Google, but decided that they should also try advertising on Bing Ads. As most people know, Microsoft and Yahoo created what they call the “Search Alliance” through which Microsoft adCenter (now Bing Ads) powers the ad displays on Microsoft, Yahoo, and all of their partner websites.
My client was interested in knowing which traffic was coming from Yahoo and which traffic was coming from Bing. It was easy enough to get this data; all we needed to do was create a filter:
Making this change gives us a Traffic Sources report that looks like this:
The thing is, who cares? Seriously, I did my very best to explain to my client that this simply doesn’t matter. I was told that the “type of person” who searches on Yahoo may be very different from the type who searches on Bing. Ok… Let’s accept that assumption for now. So why does it not matter? Well, there is no way to change ad targeting in Bing Ads based upon ad network. In other words, there is no way to optimize the traffic that one is getting based upon this information. As a result, instead of providing insightful data, we are creating fragmented data.
The blog post that I referenced above goes to great lengths to explain what a CSE Co-brand is and why there is an “issue” with normal link tracking vs. their solution for “advanced” link tracking. Their suggestion for understanding the true source of Pricegrabber traffic:
AHHHH!!! So many rows.
Reporting [utm_source], [real referrer] in GA. Geeky, yes. Cool, yes (especially for us geeks).
But WHY? Optimization as it relates to Pricegrabber or any other comparison shopping engine is, for the most part, a matter of product suppression. That is to say, if certain products are getting lots of clicks but are not converting, then lower their bids or remove them completely from your feed. Knowing the true referral source for this Pricegrabber traffic is not actionable (an important word that we web analysts like to throw around a lot).
Much more useful advice would have been to suggest concatenating the product category to utm_campaign and the product SKU to utm_content. Usable data… novel…
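As a rough illustration of that suggestion, here is a small Python sketch that tags a product feed URL. Everything specific in it is an assumption for the sake of the example: the function name tag_feed_url, the "feed" campaign prefix, the "cse" medium, and the underscore used to join the campaign name and the category.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qsl


def tag_feed_url(product_url, category, sku,
                 source="pricegrabber", medium="cse", campaign="feed"):
    """Append campaign tracking parameters to a product URL.

    The product category is concatenated onto utm_campaign and the
    SKU is sent as utm_content, so both show up in campaign reports.
    """
    parts = urlparse(product_url)
    # Keep whatever query string the product URL already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", f"{campaign}_{category}"),
        ("utm_content", sku),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


print(tag_feed_url("http://www.example.com/product?id=7", "tents", "SKU-123"))
```

With URLs tagged this way, clicks and conversions can be broken down by category and by SKU, which is the data one actually needs when deciding which products to suppress.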
Using New Profiles
The real kicker for me, however, was that the author left out a critical piece of advice. Namely, that anybody who is interested in using this “clever” filter should ONLY do so in a new profile. Profiles cannot be cleaned up retroactively. Once poor data is in there, it is there to stay. Since many readers out there may not know about the best practice of using test profiles, unfiltered profiles, new profiles for these sorts of changes, etc., I feel that it is borderline negligent to leave out that critical advice.
Where are my ecommerce sales?
Another important “detail” that is worth mentioning is that applying a filter like this will yield some very “wonky” data. In the image above we see that traffic specifically from Bing and Yahoo isn’t registering any sales. Why would that be? Let’s start by applying a Visits with Transactions advanced segment.
Where did these transactions go??? Ah, here they are, under the adcenter tagged traffic.
But wait another darn minute!! Something doesn’t look right here.
I’ll give you a moment to figure it out…
Answer: Filters function on the “hit” level.
Filters allow for the processing of data that is pumped into Google Analytics before it goes into profiles. Data gets processed in Google Analytics on the “hit” level. A “hit” is any time that there is a request to the __utm.gif file. The parameters that are appended to the file location contain the data that GA uses to build all reports.
In the example below, I did a search for [analytics ninja] in Google. As you can see, I was signed in, which is why the keyword reported is (not provided).
When I visit another page on the same site, you will notice that the utmr parameter gets set to zero.
The very first hit of the session determines much of the “visit information” about the session. This is why there are visits with all of the different PriceGrabber information neatly shoved in there in the traffic sources reports. The Co-Brand is visible to the “visit” because that data from utmr existed during the first hit of the session. However, on all subsequent hits the utmr parameter did not contain the information stipulated in the filter. That “Co-Brand” data (i.e. document.referrer) is no longer available for GA to process. Any e-commerce hits are perforce not tied back to the way the visit is displayed in Google Analytics, and therefore come up as ghost sessions.
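The effect is easier to see when a filter is simulated one hit at a time. The Python sketch below is a simplified illustration under several assumptions: the three hit URLs are made up, only the utmp, utmr and utmt parameters are shown, and the rewritten "source / host" format is a stand-in for whatever the filter's constructor actually outputs.

```python
from urllib.parse import urlparse, parse_qs


def referrer_host(hit_url):
    """Return the host found in a hit's utmr parameter, or None if there is none."""
    utmr = parse_qs(urlparse(hit_url).query).get("utmr", ["-"])[0]
    # Values such as "0" or "-" are not URLs, so they have no hostname.
    return urlparse(utmr).hostname


def apply_source_filter(hit_url, campaign_source):
    """Rewrite the campaign source for ONE hit, the way a hit-level filter would."""
    host = referrer_host(hit_url)
    if host is None:
        return campaign_source  # nothing to match, so the filter does nothing
    return f"{campaign_source} / {host}"


# Hypothetical hits from a single session: landing page, cart, transaction.
session_hits = [
    "http://www.google-analytics.com/__utm.gif?utmp=%2Flanding"
    "&utmr=http%3A%2F%2Fcobrand.example.com%2Fshop",
    "http://www.google-analytics.com/__utm.gif?utmp=%2Fcart&utmr=0",
    "http://www.google-analytics.com/__utm.gif?utmp=%2Freceipt&utmt=tran",
]

for hit in session_hits:
    print(apply_source_filter(hit, "pricegrabber"))
```

Only the first hit picks up the co-brand host. The cart and transaction hits keep the plain campaign source, which is why the sales appear under a different row than the visit that produced them.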
Bottom line: Please be very careful, folks, with all of the snazzy Google Analytics advice out there. You might wind up with a bunch of fragmented data that is totally wonky, not actionable, and probably a permanent stain on your GA data since you didn’t create a new profile.
Tien V Nguyen says
Love this detailed post! I replied on our blog to the points you made, I’ll post them here for your readers too:
Here are a few points I’d like to address:
“Knowing the true referral source for this Pricegrabber traffic is not actionable”
In some cases this may be true, but there are instances where knowing the “true” source can be extremely important. For instance, if the traffic we’re getting is from, say, an international domain (e.g. South America, England, India, etc.), then we know that that traffic is completely useless to the merchant since they don’t ship there.
Or, if we observe that 40% of traffic is from a source that doesn’t convert, or is from a “questionable” site, the CSE that sends that traffic can shut off those sources. So we’ll tell the CSE, “we noticed that we’re getting traffic from a cobrand based in South Africa, can you shut it off?” and the merchant will no longer be paying for clicks that are completely useless to them.
You’re 100% right that product suppression (as well as brand or category suppression) is a major factor in optimization, but the actual source, if it’s of very low quality, can be just as important.
“Namely, that anybody who is interested in using this “clever” filter should ONLY do so in a new profile.”
Another good point. I think the screenshots taken were a bit dated and we had one minor update that we do in-house, and that’s instead of using the “output to -> constructor” set to Campaign Source, we’ll use “User Defined” so that it doesn’t interfere with any data moving forward.
You’re right in that if we did set it to campaign source, that would really screw up how the old and new data work with each other, but if the output goes to a field in analytics that is not being populated by any data, using the current profile shouldn’t mess anything up.
Yehoshua Coren says
Thank you very much for taking the time to comment on this post. There are a number of important things that you bring up, and I’ll go ahead and make an update to my post noting them as well.
Re: Being able to tell CSEs to turn off certain co-brands.
This indeed is actionable! I was unaware that CSEs honored such requests (it certainly isn’t in their user interfaces). The point you raise changes everything. Now we can certainly make decisions based upon properly configured GA data.
Re: User Defined Value
This is key. I chose not to mention this in my post, because I was unaware that CSEs will manually turn off co-brands upon request. So pushing data to user defined didn’t make much of a difference. BUT, now that knowing the true traffic source can impact our profitability, using user defined is very important.
As you can see from my blog post, the referral source is filtered on the hit level; the first hit is the time when utmr sends the value we need. As such, sending referral values to user defined for the first hit will indeed get matched to the visit, and we won’t get those ghost sessions that I mentioned in my post. In other words, it won’t “interfere” with data as you mentioned. (Please feel free to confirm this based upon your own data).
You referred to passing values to user defined as a “minor update.” I respectfully disagree with you and would like to suggest that this is a “major update.” 🙂 Without doing this, we wouldn’t have the conversion data necessary to tell the CSEs to turn off one of the co-brands.
Rob Kingston says
Great post, Yehoshua (I only just managed time to sit down and read it all).
I find most of the filter hacks people share can be teased out with a handful of broad filters and then analysed down the track using advanced segments or the API to drill into the data. The only filters I regularly use are:
– Full referrer filter
– Internal address filter
– Include domain name filter (or exclude development environment filter(s))
– Excluding hits on a cross domain proxy. e.g. http://www.citricle.com/blog/how-to-integrate-netsuite-with-google-analytics/
Sure there’s some funky stuff you can do with filters, but for the most part, they don’t offer a lot of value.