Google Analytics Updates How Visits Are Calculated
In a recent blog post, the Google Analytics team announced that they are changing the way that visits (sessions) are calculated. Interestingly, they said that, “Based on our research, most users will see less than a 1% change.” Unfortunately (imho), they didn’t cover their bases with that statement, as the comment section of the above-cited blog post shows that lots of people are going pretty crazy about these changes.
**Update** On August 16th, the Google Analytics Team announced that there was a bug in the way visits were recorded after they launched the change. Now, people should be going less crazy as numbers are making a bit more sense. Nevertheless….
Bottom line, this is really a significant change and it seems that people aren’t understanding what is going on.
The main things that people seem to be complaining about are:
- Increase in visits
- Increase in bounce rate
- Decreased average time on site
- Decreased pages per visit
Surprisingly (maybe), I didn’t see a lot of people complaining about a decrease in conversion rate. Hmm… In any case, one comment that I saw rise above the negative spew in the GA blog comment section was by Peter at L3 Analytics. He linked to his blog post which does a nice job discussing some of the implications of the change to the way sessions are calculated. I decided to add to the discussion with this post. Some of what I’ll be saying has already been formulated by Peter. Other things will hopefully be new, including a number of questions I have about some data in GA that I still don’t understand given my current knowledge of the change.
**Side Note** It upsets me that people can get so negative in the comments they make in forums and blog posts, especially since most of their complaints stem from a lack of understanding. Simple questions in the comment section such as “I don’t understand why 123xyz is happening….” would be nicer to address than “this data is useless, (sarcastic) thanks a lot!!”
Understanding the change.
Google Analytics receives hit level data and then calculates all metrics based upon that hit level data. Every time there is a pageview, event, or transaction, a gif request is sent to the Google Analytics servers with information about that hit. Part of that gif request includes session information, and other parts of that gif request include visitor level information. I’m not going to go into the UTM gif requests in depth here, but if you really want to know what is going on check out the RUGA (Really Understanding Google Analytics) series of posts from Cardinal Path. (Kent – it would be great if you could add inner linking between posts on the blog, it’s a great series).
Here is a graphic I quickly put together. (As you can tell, I’m not much of a graphic designer).
The idea that this is trying to illustrate is that a visit (session) is made up of hits. A visitor can visit the site multiple times. When a “visitor” has two or more visits, they change from being counted as a “New Visit” to a “Returning Visitor.”
**Side Note** We use the term “visitor,” but technically this means “__utma Cookie.” Cookies are browser specific. So if I, Yehoshua Coren, visit example.com in a 5 minute span from 3 different browsers, GA reports that 3 “unique visitors” came to the site. Similarly, if 3 different people in my household visit example.com from the same browser at different times throughout the day, this is 1 “unique visitor.” Lastly, if I visit a website repeatedly using Private Browsing (Firefox) or Incognito Mode (Chrome), etc., my cookies are cleared on browser close so I’ll be an additional “unique visitor” (with a ‘new visit’) on every subsequent visit.
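To make the cookie point concrete, here is a minimal sketch of the two household scenarios above. The log format and the cookie names are hypothetical, invented purely for illustration; GA never receives the "person" column, only the cookie ID.

```python
# Hypothetical pageview logs: (person, cookie_id).
# GA cannot see the person; it only sees the __utma-style cookie ID.
one_person_three_browsers = [
    ("Yehoshua", "cookie-firefox"),
    ("Yehoshua", "cookie-chrome"),
    ("Yehoshua", "cookie-safari"),
]
three_people_one_browser = [
    ("person-a", "cookie-family-pc"),
    ("person-b", "cookie-family-pc"),
    ("person-c", "cookie-family-pc"),
]


def unique_visitors(log):
    """Count distinct cookie IDs, which is all an analytics tool can distinguish."""
    return len({cookie_id for _person, cookie_id in log})


print(unique_visitors(one_person_three_browsers))  # 3
print(unique_visitors(three_people_one_browser))   # 1
```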
So how does Google Analytics calculate visitors and visits?
Let’s take a quick look at these important cookies and some of the values in the gif request which define a visit:
The Random Unique ID is what allows GA to determine a “visitor.” This number will be the same in the __utmz cookie as well.
Also note the session counter. This tells GA which number of visit to the site it was for this visitor (first, second, third etc).
According to Google, “the session number increments every time the campaign cookie is overwritten. The campaign cookie increments for every time that the campaign cookie is overwritten, even if it is in the same session.”
I might be missing something, but as far as I can tell, this wasn’t really true “then” and is less true “now.” Specifically, it used to be that the session counter would not update if a visitor came to the site via two separate campaigns. That is why in the above graphic the campaign number can be bigger than the session number.
In any case, these are the two cookie values that are passed on to Google Analytics via every gif request (i.e. a hit, such as a pageview or event). A standard gif request is a whole bunch of data that looks like this:
I like using Chrome to see a parsed version which makes it easier to read.
A visit is simply a “Unique ID + Session Number.” Google Analytics records another visit every time that the Session Counter advances. The number of pageviews in a visit is counted by the number of pageview hits (_trackPageview) in a session. Similarly, the number of events per visit is determined by the number of event hits (_trackEvent) in a session. (Yep, there are different types of hits, reported by utmt in the gif request).
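As a rough illustration of "Unique ID + Session Number," the sketch below groups a handful of hits into visits and counts pageviews and events per visit. The tuple layout and the hit-type labels are simplified assumptions of mine, not the actual gif request format, and this is not how GA's servers are implemented.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified hits: (unique_id, session_number, hit_type).
# Real gif requests carry many more fields than this.
hits = [
    ("1033107639", 1, "page"),
    ("1033107639", 1, "page"),
    ("1033107639", 1, "event"),
    ("1033107639", 2, "page"),
    ("555", 1, "page"),
]


def summarize(hits):
    """Group hits into visits keyed by (unique_id, session_number)."""
    visits = defaultdict(Counter)
    for unique_id, session_number, hit_type in hits:
        visits[(unique_id, session_number)][hit_type] += 1
    return dict(visits)


summary = summarize(hits)
visit_count = len(summary)                              # distinct ID + session pairs
unique_visitor_count = len({uid for uid, _ in summary})  # distinct IDs only

print(visit_count, unique_visitor_count)
print(summary[("1033107639", 1)]["page"])  # pageviews in the first visit
```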
So what causes the session calculator to advance? Glad you asked. 🙂
Calculating Sessions – the old way.
Google Analytics used to use two cookies to determine a session: UTMB and UTMC. In addition to having a pageview counter, the UTMB cookie also has a time stamp. By default, exactly 30 minutes after the time stamp in the cookie, the cookie expires and is deleted from the browser. (Poof! Goodbye.) So, if a user came to my site looking to learn about visitor level tracking in GA, got distracted on Twitter for 30 minutes in another tab, and then went back to my site because they realized it was time for an Analytics Audit, the UTMB cookie would not be there and the session counter in UTMA and UTMZ would advance on the next hit. If, however, after every 25 minutes of Twitter distraction, they continued what they were doing on my site, their session would be extended. Sessions also used to end when a user closed their browser. This was due to UTMC. UTMC existed as long as their browser was open. Closed browser, visit over.
So, GA used to look for 2 different cookies. If either was missing, the session counter advanced on the next hit (i.e. pageview or event) and it was a new visit.
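The old rule can be sketched in a few lines. This is a simplified model under my own assumptions: each hit is a hypothetical (minute, browser_reopened) pair, and the function name and constant are illustrative rather than anything from GA's code.

```python
SESSION_TIMEOUT_MIN = 30  # default UTMB lifetime described above


def count_visits_old(hits):
    """Count visits under the old rule.

    hits: list of (minute, browser_reopened) tuples sorted by time, where
    browser_reopened is True if the browser was closed since the previous hit.
    """
    visits = 0
    last_minute = None
    for minute, browser_reopened in hits:
        first_hit = last_minute is None
        # UTMB is gone once 30 minutes pass without a hit.
        utmb_missing = first_hit or minute - last_minute >= SESSION_TIMEOUT_MIN
        # UTMC is gone once the browser has been closed.
        utmc_missing = first_hit or browser_reopened
        if utmb_missing or utmc_missing:
            visits += 1  # session counter advances on this hit
        last_minute = minute
    return visits


# Back on the site every 25 minutes: the session keeps being extended.
print(count_visits_old([(0, False), (25, False), (50, False)]))  # 1
# A full 30 minutes of Twitter: UTMB has expired.
print(count_visits_old([(0, False), (30, False)]))  # 2
# Browser closed and reopened after 5 minutes: UTMC is gone.
print(count_visits_old([(0, False), (5, True)]))  # 2
```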
So, let’s take the following scenario. I’m searching on Google for arts and crafts paper.
- Google Search: “metallic craft paper” >
- Land on example.com home page. View 5 pages | 3 minutes on site >
- Back to Google.com | 1 minute > Search for “american made metallic paper” >
- Return via SERP to example.com, this time a more relevant internal page >
- 19 pageviews later I purchase a nice roll of hard to find paper.
In the old way of tracking visits, GA would report:
1 New Visit with 25 pageviews. The visit would be attributed to “metallic craft paper” while the sale would have been attributed to “american made metallic paper,” as the UTMZ campaign cookie was updated. This, of course, is called ‘last click attribution.’ So if the visit is attributed to “metallic craft paper”, what about the “american made metallic paper” keyword? It would be there, but the number of visits would be listed as 0. (**Everyone should check out Michael Whitaker’s blog on this topic).
So, did I visit the site once or twice? Was the purchase made by a “New Visitor” or a “Returning Visitor”? It really depends on how you look at it, but in my humble opinion, I actually visited the site twice. There is really nothing to say that had my 2nd organic search brought up a different site on the SERPs that I was more interested in, I wouldn’t have clicked there. I believe that this is a significant part of the reason why GA changed how they deal with sessions.
Calculating Sessions – the new way.
1). There is no more UTMC cookie. Gone, deprecated, goodbye. That means that if I accidentally close my browser (oops!) or my computer crashes (d’oh!) and then reopen to the same site, GA will only look for UTMB. If I haven’t been idle for 30 minutes, the session continues.
2). Every time UTMZ gets updated, the session counter advances. The session is no longer solely dependent upon the utmb & utmc “session cookies.” Remember how above it was possible for the campaign counter to advance but the session counter to stay the same? That is no longer possible now. The GA session counter will always advance when the UTMZ cookie is updated.
Let’s take a quick look at a real example that happened the day that Google Analytics made the change:
In this example, I’m using visitor level tracking via custom variables. This visitor, let’s call him or her 1033107639
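Here is a minimal sketch that applies both rules to the arts and crafts paper scenario from earlier. The timestamps, the campaign strings, and the function itself are assumptions made up for illustration; the only parts taken from the post are the 30 minute timeout and the rule that a changed campaign now starts a new visit. The browser is assumed to stay open throughout, so UTMC plays no part.

```python
SESSION_TIMEOUT_MIN = 30


def count_visits(hits, campaign_change_starts_visit):
    """Count visits for hits given as (minute, campaign) tuples sorted by time.

    With campaign_change_starts_visit=False only the inactivity timeout
    applies (old behaviour, browser left open). With True, a changed
    campaign value also starts a new visit (new behaviour).
    """
    visits = 0
    previous = None
    for minute, campaign in hits:
        if previous is None:
            is_new_visit = True
        else:
            timed_out = minute - previous[0] >= SESSION_TIMEOUT_MIN
            campaign_changed = campaign != previous[1]
            is_new_visit = timed_out or (
                campaign_change_starts_visit and campaign_changed
            )
        if is_new_visit:
            visits += 1
        previous = (minute, campaign)
    return visits


# 5 pageviews in roughly 3 minutes from the first search (hypothetical timing).
first_search = [(i * 0.6, "organic: metallic craft paper") for i in range(5)]
# About a minute on Google, then 20 pageviews from the second search.
second_search = [
    (4 + i * 0.5, "organic: american made metallic paper") for i in range(20)
]
hits = first_search + second_search

print(len(hits))                                               # 25 pageviews
print(count_visits(hits, campaign_change_starts_visit=False))  # 1 (old way)
print(count_visits(hits, campaign_change_starts_visit=True))   # 2 (new way)
```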
- Very first visit to the site was via a click on a Google Product Ad.
- 2nd visit is from a Google Product Ad. Either 30 min of inactivity happened, or user went back to Google and clicked on another listing.
- Visits 3-6 were via branded keywords. Might mean that 30 minutes expired between visits 3-4, 4-5, 5-6.
- 7th visit, back to Google… Clicked on product ad.
- Visits 8-11, category type keyword search. Might mean that 30 minutes expired between visits 8-9, 9-10, 10-11.
- Visit 12, product type keyword search
- Visits 13 -14, product specific search.
One thing that I didn’t take into account while trying to recreate this particular visitor’s buying process was tabbed browsing. This makes it a lot easier to have a more fragmented looking data set. The user interaction with the site will look like they are jumping all over the place between different channels and keywords, when in fact, they might be using Google Product Search in one tab and Google Web Search in another. Each time they enter the site from one of these different sources, it will update GA’s cookies and another visit will ensue.
**UPDATE** The multiple visits via the same campaign source (UTMZ values) in points 3 & 5 above were caused by the GA bug that was fixed August 16th. Using the API, I pulled down hour of day data for this particular user ID and discovered that there were not 30 minutes between visits. A takeaway from this is to always question the accuracy of data, even if you think you have a rock-solid implementation. In my case, the implementation was right on but there was a bug. But oftentimes I find implementation problems specifically because the data is fishy. In the use case above, 4 visits in a row for aluminum coat rack on the same day was “fishy.” With that said, the process of a user going back and forth between search engines and the site is accurate. After the bug fix, the above visitor would have made 6 visits to the site, instead of 14. Before the August 11th session change, this would have been counted as 2 visits.
I’ve been examining the impact of this change on GA’s reported data on a lot of different sites (with unique visitors per month of 1K, 3K, 50K, 150K, 350K, 1.2M, 4.8M, 5.5M and 7.5M respectively). The biggest changes I have seen are on ecommerce sites. Perhaps in a future post I’ll do some statistical analysis by site type. For many out there commenting on the forums, common shifts look like this:
The first and most obvious change is to see the number of visits increase while unique visitors remain the same.
This is happening because whereas previously a visitor could go back and forth between a site and search engines, comparison shopping engines and the like all within one visit, now each time they return to the site with a different source, medium, campaign, ad content, keyword or gclid, it is a new visit. (That was a mouthful).
**UPDATE** After GA fixed the bug, the number of increased visits for a number of sites should not be as inflated. However, for the same site I analyzed above, we are still seeing a significant increase in visits compared to unique visitors. Albeit, not a 30% plus increase, but there is definitely a change.
We see lots of visitors to the site returning via a different source within what used to be counted as the same session.
After the bug fix, we still see a significant number of visitors returning to the site via a different source within what used to be counted as the same visit. However, the numbers are not nearly as inflated.
From the visitor loyalty distribution, we see that many of the previous “1 time” visitors are indeed coming back to the site many times as a part of their shopping process.
While it is true that many of the previous “1 time” visitors are indeed coming back to the site multiple times, it is not nearly as many as previously indicated when there was a bug.
Average Time on Site, Pages per Visit and other “visit” related metrics.
These metrics will tend to look “worse” since sessions are being updated more frequently.
Since a fundamental way of looking at one’s site via GA has changed, the average person (and even some analysts) needs to be very careful about drawing conclusions from data. For example, “If I’m seeing a large decrease in average time on site it means that users are finding my content less meaningful than they used to, right?” Well…yes and no. All things being equal, yes. But now that the way sessions are calculated has changed, the sudden drop means that users are going back and forth to your site more than you thought they were. Their average time per visit is less, but the aggregate amount of time a user spends on the site is the same.
The same goes for pages per visit. Visits are starting anew after a visitor has viewed fewer pages since they are updating their campaign cookies.
The same goes for bounce rate. A visitor who sees one page, leaves, and then comes back to the site is now more likely to be counted as a bounce than before, when their return to the site would simply extend the session and result in the visit not being a bounce.
Last but not least, conversion rate. Especially for ecommerce sites, a visitor who goes back and forth between the search engines, CSEs, and the ecomm site will have many more visits until conversion. This is a great boon of information for those who have access to multiple touch attribution models, including the Multi Channel Funnels in Google Analytics which is in Beta. For everybody else, it will look like there is a drop in conversion rate.
For all of the above, I recommend that people use custom reports and/or data directly from the API to create some calculated “per visitor” metrics. These will be most helpful for measuring any year over year or month over month changes.
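As a sketch of what such calculated "per visitor" metrics might look like, the example below compares per-visit and per-visitor rates for one site before and after the change. Every number here is hypothetical and chosen only to keep the arithmetic easy; the point is that when only the visit count moves, the per-visit metrics drop while the per-visitor metrics hold steady.

```python
# Hypothetical monthly totals for one site. Only the visit count differs;
# the underlying visitor behaviour is assumed to be identical.
before = {"visitors": 1000, "visits": 1200, "pageviews": 6000, "transactions": 30}
after = {"visitors": 1000, "visits": 1500, "pageviews": 6000, "transactions": 30}


def rates(totals):
    """Compute per-visit and per-visitor versions of the same metrics."""
    return {
        "pages_per_visit": totals["pageviews"] / totals["visits"],
        "pages_per_visitor": totals["pageviews"] / totals["visitors"],
        "conversion_per_visit": totals["transactions"] / totals["visits"],
        "conversion_per_visitor": totals["transactions"] / totals["visitors"],
    }


before_rates = rates(before)
after_rates = rates(after)

for name in before_rates:
    print(f"{name}: {before_rates[name]:.3f} -> {after_rates[name]:.3f}")
```

With these made-up totals, pages per visit falls from 5 to 4 and per-visit conversion rate falls from 2.5% to 2%, while pages per visitor (6) and per-visitor conversion rate (3%) do not move.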
**UPDATE** I thought I would throw in this chart of Percent of New Visits over time. This was definitely one of the data points that was most impacted by the bug. That said, we definitely see that the % New Visits metric has been impacted pre and post session change. The question, of course, remains for the analyst to figure out how to interpret this change. I often view % New Visits as a metric that informs me about acquisition potential (or success). The change in the way sessions are calculated will provide a more accurate portrayal as to where the “first touch” visitors to the site are really coming from.
In the previous paradigm, a user’s first visit (“new visit”) to a site might be reported as the 3rd or 4th click they made from an external source, all within the same session. Now, through segmentation and analysis of the % New Visits metric across multiple channels, we’ll be able to see more accurately where that first visit really came from, and where the greatest acquisition potential lies.
Of course, until Multi Channel Funnels become available for all in Google Analytics we won’t really have a super deep understanding of attribution (and even then, it will still be lacking). For now, I think it is great that we’ll be able to more accurately see where those initial visits to our website are truly sourcing from. Where they go afterwards is still anyone’s guess.
So, there are still some things which are really eluding me. I’d love to hear some answers from colleagues about the following two issues.
1). Cookies aren’t updating as expected. Verified in gif requests as well….
So, how in the world are sessions actually tracked now?
2). Big increase in Visits to Adwords in absence of new clicks.
I would expect that visits to other channels would increase. But in this case, there is an increase in visits relative to the increase in clicks. Clicks have always been calculated in much the same way that visits are now calculated. Namely, if I clicked on an ad, went back to Google and clicked on an Organic listing,
- In the old paradigm: 1 visit for the Google / Organic keyword, and 0 visits and 1 click for the Google / Adwords keyword
- In the new paradigm: 1 visit for the Google / Organic keyword, and 1 visit and 1 click for the Google / Adwords keyword
In both cases there was always a click recorded in Adwords. This is the biggest stumper I’ve found to date.
**UPDATE** New analysis reveals that neither the cookie question nor the Adwords click question has changed post bug fix.