A better way to measure content engagement with Google Analytics

This post is inspired by a conversation that I had with my friend and colleague Simo Ahava at Superweek as well as a recent work request from a well-established Italian publisher. In short, the publisher was quite challenged by the fact that they had an 85% bounce rate, and that their time on site was so low. Their articles tend to get many hundreds, if not thousands, of Facebook likes, so “how could it be that users were spending so little time on site?!” Their average time on page was around the three-minute mark, so how could it be that average session duration was significantly lower?

bounce rate, time on page

  • Challenge 1:  Google Analytics tracks time on page / on site by measuring the difference between the time stamps of hits.  If the page is a bounce, there is no second hit to compare against, so no time will be recorded.
  • Challenge 2:  Even if the page viewed is not the bounce/exit page (and thereby has a time greater than zero), GA doesn’t distinguish between time spent in a visible browser tab and time spent in a hidden one.
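
To make Challenge 1 concrete, here is a minimal sketch of the timestamp arithmetic. It is a simplified model, not GA's actual processing code: the `timeOnPage` function, the hit shape, and the numbers are all made up for illustration.

```javascript
// Simplified model of how time on page falls out of hit timestamps.
// Each hit is { page, timestamp }, with timestamp in seconds (made-up values).
function timeOnPage(hits) {
  return hits.map((hit, i) => {
    const next = hits[i + 1];
    // The last hit in a session has no later hit to subtract from,
    // so no time is recorded for it.
    return {
      page: hit.page,
      seconds: next ? next.timestamp - hit.timestamp : 0,
    };
  });
}

// A bounce: one pageview, nothing to subtract from.
const bounce = timeOnPage([{ page: "/article-a", timestamp: 0 }]);

// A two-page session: only the first page gets a duration.
const twoPages = timeOnPage([
  { page: "/article-a", timestamp: 0 },
  { page: "/article-b", timestamp: 180 },
]);

console.log(bounce);   // /article-a is recorded with 0 seconds
console.log(twoPages); // /article-a gets 180 seconds, /article-b gets 0
```

With an 85% bounce rate, most sessions look like the first case, which is why session duration can sit far below the time on page reported for the pages that did get a second hit.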

After a lengthy explanation to the client of the way Google Analytics tracks time on page (and by extension, time on site), they were still stuck without a way to accurately measure content engagement.  First of all, there are a number of different ways to measure engagement besides time on page / site.  Many posts have been written about this and I urge readers to seek those out, since time metrics get too much focus as it is.  As things stand, since this publisher’s site was not configured with any event tracking (a scroll tracking module would be great), they were seeing many users come to their site, view one page, and then leave. Unfortunately for them, “out of the box GA” does not provide very good insights into how users are interacting with their content. “Are they even reading the content?”

In my time out there on the interwebs, I have heard many folks voice concerns, complain, groan, or otherwise kvetch about Google Analytics not providing them with accurate time on page. Of course, even the out of the box time on page metrics for non-bounce visits still skew the picture we want to paint of user behavior.  In particular, I think about how often it is that I right click on a link on Twitter or elsewhere, and open that link in a new tab.  From the moment the tab is open, the clock is running.  In an even more common scenario, I have multiple tabs open and forget about those pages for some time only to go back to them later. From a time on page perspective in Google Analytics, the clock is running (until session timeout at 30 min default).  In most cases, the total time on page isn’t even recorded because users close their browser window or the session times out.  So when I heard Simo speak about the Page Visibility API at Superweek, I got to thinking about how we could model true time on page in Google Analytics.  

There are a few major considerations that I have when trying to understand “real” time on page.  I’m defining “real” time on page as the amount of time that the window has been in focus.  So, the first question to answer is: “is the window in focus?”  (i.e., is it the current tab in the browser?)  Next, we’ll need to get a timestamp for when the user navigates away from the page or closes their browser window.  Thanks again to Simo for shooting me over a code sample for the beforeunload event.  (Simo, if you haven’t figured it out already, I think you rule!)

 timing category

The first place I am modeling this data within Google Analytics is the User Timings API.  Much like event tracking, user timings use categories, variables, and labels.  The value is time in milliseconds (converted to seconds in reports).  I really like using the User Timings API because it is GA’s native way to track anything that has to do with time.  Strangely, I find it to be a highly underused feature of Google Analytics.

hidden time in articles
The logic works like this. When the page loads, GTM sets a time stamp and we record the page’s Visibility State.  Using an event listener, when the visibility state changes we fire a timing tag.  The Category is “Page Visibility”, the Variable (think GA Event Action) is Visible or Hidden, and the value is calculated by subtracting the previous time stamp from the current time stamp.  The value pushed into the data layer needs to be the opposite of the current visibility state.  That is because the hit has to describe the previous state.  For example, if my tab is Hidden and then becomes Visible, I’ll need to push a data layer value into my tag at the moment that it becomes Visible, informing GA of the amount of time the tab was Hidden.
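
The logic above can be sketched roughly as follows. This is a minimal illustration and not the exact code running in the GTM container: the `createVisibilityTimer` helper, the `visibilityTiming` event name, and the data layer keys are all hypothetical. The clock is passed in as a parameter so the logic can be exercised outside a browser.

```javascript
// Returns a function to call on every visibility change. Each call pushes
// a timing entry that describes the state that just ENDED, not the new one.
function createVisibilityTimer(initialState, dataLayer, now = Date.now) {
  let lastState = initialState; // "visible" or "hidden" at page load
  let lastStamp = now();        // time stamp set at page load

  return function onVisibilityChange(currentState) {
    const stamp = now();
    dataLayer.push({
      event: "visibilityTiming",          // hypothetical GTM custom event name
      timingCategory: "Page Visibility",
      // Label the elapsed time with the previous state.
      timingVar: lastState === "visible" ? "Visible" : "Hidden",
      timingValue: stamp - lastStamp,     // milliseconds spent in that state
    });
    lastState = currentState;
    lastStamp = stamp;
  };
}

// Browser wiring, guarded so the file also loads where there is no DOM.
if (typeof document !== "undefined" && typeof window !== "undefined") {
  window.dataLayer = window.dataLayer || [];
  const record = createVisibilityTimer(document.visibilityState, window.dataLayer);
  document.addEventListener("visibilitychange", () => {
    record(document.visibilityState);
  });
}
```

For example, a tab that loads hidden and becomes visible three seconds later produces one entry with a `timingVar` of "Hidden" and a `timingValue` of 3000.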

One of the things I like most about using the User Timings API is the ability to get a histogram of the timing samples. Below we see the distribution of the amount of time that browser tabs were hidden for articles on this site.  Every timing hit sent to Google Analytics via the User Timings API (in our case, the amount of time the window was not in focus) finds its place within the distribution.  This shows me approximately how long users are not looking at my content, even though the “clock is running” in terms of the standard time on page metric (or the time is not captured at all, in the case of a bounce).

hidden browser tab distribution

In this implementation, we fire a user timing hit for every change in browser visibility state. Critically, we also make sure to send a timing hit immediately before the browser window closes. To help make sure that the data gets to Google before the browser window closes or navigates to the next page, we used the useBeacon feature of the analytics.js API.
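
As a rough sketch of that final hit, the snippet below builds a timing hit with the standard analytics.js timing fields and the `useBeacon` field mentioned above. The `buildTimingHit` helper and the wiring are my own illustration, not the exact GTM setup. Later analytics.js documentation lists `transport: 'beacon'` as the preferred replacement for `useBeacon`, so check the current field reference before copying this.

```javascript
// Builds the fields object for an analytics.js timing hit.
function buildTimingHit(variable, milliseconds) {
  return {
    hitType: "timing",
    timingCategory: "Page Visibility",
    timingVar: variable,                    // "Visible" or "Hidden"
    timingValue: Math.round(milliseconds),  // timing values must be integers
    useBeacon: true,                        // ask the browser to deliver the hit during unload
  };
}

// Browser wiring, guarded so it only runs where analytics.js is loaded.
// It covers only the last stretch of time before the page goes away;
// hits for earlier visibility changes are assumed to be sent elsewhere.
if (
  typeof window !== "undefined" &&
  typeof document !== "undefined" &&
  typeof window.ga === "function"
) {
  let lastChange = Date.now();
  document.addEventListener("visibilitychange", () => {
    lastChange = Date.now();
  });
  window.addEventListener("beforeunload", () => {
    // At unload, the current state is the one that has been in effect
    // since the last change, so it labels the elapsed time.
    const state = document.visibilityState === "visible" ? "Visible" : "Hidden";
    window.ga("send", buildTimingHit(state, Date.now() - lastChange));
  });
}
```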

The User Timings API measures the duration of every hit, but I also want to collect raw aggregate timing measures. To calculate the amount of time that a page was in focus (on average), I decided to leverage Custom Metrics.  When custom metrics were first released, I admit that I did not see them as having much utility. Although I still wish that those metrics could be scoped to session and user levels (pretty please?), I have found more and more use cases where custom metrics are useful.  Pro tip: Think about meaningful ways to use Custom Metrics in your implementations.

custom metrics admin

One thing you should know about custom metrics is that they increment per hit.  They are counters.  So, every time value that I send to Google for my page being in focus is added up at the page level.  There are three types of custom metrics: integers, currency, and time.  The time metric also needs to be sent as an integer (no decimals allowed!).

custom metrics reports
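
Since the time metric has to arrive as a whole number of seconds, the millisecond values need converting before they are attached to a hit. A minimal sketch, assuming the custom metric lives at index 1 (your index will depend on your property settings; both helper names are made up):

```javascript
// Converts a millisecond duration to the whole seconds a time metric expects.
function toMetricSeconds(milliseconds) {
  return Math.round(milliseconds / 1000);
}

// Returns a copy of the hit fields with the custom metric attached.
// analytics.js names custom metric fields metric1, metric2, and so on.
function withVisibleTimeMetric(fields, milliseconds, metricIndex = 1) {
  return {
    ...fields,
    ["metric" + metricIndex]: toMetricSeconds(milliseconds),
  };
}

const fields = withVisibleTimeMetric({ hitType: "timing" }, 12345);
console.log(fields); // hitType plus metric1 set to 12 (seconds)
```

Because the metric is a counter, two hits carrying 12 and 5 for the same page show up in reports as 17 seconds of total visible time.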

The data I’m looking for is the average amount of time the page was visible or hidden. Within the standard GA reports, the custom metric will return TOTAL visible time or hidden time. I needed to export the data to Excel and do some ETL in order to calculate the average visible time per page (Total Time / Pageviews).
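
The Excel step boils down to one division per page. Here is a sketch of the same calculation in code; the row shape and the numbers are made up for illustration and do not come from the publisher's data.

```javascript
// Average visible time per page = total visible time / pageviews.
function averageVisibleTime(rows) {
  return rows.map(({ page, totalVisibleSeconds, pageviews }) => ({
    page,
    // Guard against pages with no pageviews to avoid dividing by zero.
    avgVisibleSeconds: pageviews > 0 ? totalVisibleSeconds / pageviews : 0,
  }));
}

// Hypothetical rows exported from a custom report.
const exported = [
  { page: "/article-a", totalVisibleSeconds: 5400, pageviews: 120 },
  { page: "/article-b", totalVisibleSeconds: 900, pageviews: 60 },
];

const averages = averageVisibleTime(exported);
console.log(averages); // /article-a averages 45 seconds, /article-b averages 15
```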

average page time in focus

Depending on your implementation needs, you may want to set an upper limit within the Google Analytics backend for the value of the custom metric. Just think about the number of times you’ve gone to sleep and left your browser window open.  🙂  For starters, I suggest trying a limit of 1,800 (seconds) for the metric, as this maps to the standard 30-minute session timeout in Google Analytics.  As you evaluate your own needs, you may want to experiment with this (would love to hear more in the comments below).  The above image did not have the custom metric upper limit applied.
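
The limit described here is configured in the GA admin. A different option, which I am only sketching as an assumption and which is not part of the setup described in this post, is to cap the value in the browser before it is sent. The `capSeconds` helper is made up.

```javascript
// 1,800 seconds mirrors the default 30-minute session timeout.
const MAX_SECONDS = 1800;

// Rounds to whole seconds and keeps the result between 0 and the cap.
function capSeconds(seconds, max = MAX_SECONDS) {
  return Math.min(Math.max(0, Math.round(seconds)), max);
}

console.log(capSeconds(28800)); // an overnight tab is capped at 1800
console.log(capSeconds(42.4));  // ordinary values pass through as 42
```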

UPDATE #1:  A number of people have been asking for some data that shows bounced vs. non-bounced sessions as they relate to the Total Time Visible metric.

bounce vs. non-bounced

UPDATE #2:  Because averages suck, I’ve also started sending a “Total Time Visible” value to User Timings on the beforeunload.  This value is calculated by looping through the array of data layer values for when the page was visible and summing them.  (Go Math!).  
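
The summing step can be sketched like this. The data layer keys (`timingCategory`, `timingVar`, `timingValue`) and the `totalVisibleMs` helper are hypothetical names for illustration; use whatever keys your own container pushes.

```javascript
// Adds up every "Visible" timing entry found in the data layer.
function totalVisibleMs(dataLayer) {
  return dataLayer
    .filter(
      (entry) =>
        entry &&
        entry.timingCategory === "Page Visibility" &&
        entry.timingVar === "Visible"
    )
    .reduce((sum, entry) => sum + entry.timingValue, 0);
}

// Made-up data layer contents: unrelated entries and Hidden time are skipped.
const sampleLayer = [
  { event: "gtm.js" },
  { timingCategory: "Page Visibility", timingVar: "Visible", timingValue: 4000 },
  { timingCategory: "Page Visibility", timingVar: "Hidden", timingValue: 60000 },
  { timingCategory: "Page Visibility", timingVar: "Visible", timingValue: 11000 },
];

console.log(totalVisibleMs(sampleLayer)); // 15000
```

On beforeunload, the time since the last visibility change still has to be added to this sum if the tab is visible at that moment, since that final stretch has not been pushed yet.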

Personally, I love the histogram feature within GA that is available in far too few reports.

before unload time on page google analytics

Summary:  A better understanding of true user engagement with their content can help the publisher mentioned at the beginning of the article understand which types of content build user loyalty (and by extension pageviews, and by extension advertising revenue). We started with a real business use case where a publisher was feeling the pain of not having visibility into user engagement with their site.  A technical solution (which gratefully relied on the expertise of some really smart people) came into existence in response to the problem.  That’s my general approach to digital analytics.  Let your strategic business objectives and questions drive the data collection solution, which can then be used to make smart decisions.