Universal Tagging Conventions: Towards Web Data Structure Standards

On from Web Analytics

I was reading Ian Thomas’s latest post this morning, and his ideas prompted me to share some of my own that I’ve had lately. Nothing really definitive yet, but worth sharing with you, I believe, especially since these are the kind of ideas that would need input from a lot of people.

I would like to propose that our field, Web Analytics, should establish standards for how web sites are tagged, so that 1) data could be analyzed by any application from complying vendors, and 2) the data would be easy to integrate with other enterprise data, and even loadable into a data warehouse somehow.

Let us first examine the benefits of #1. Having standards for how a web site is tagged would really reinforce ownership of the data by its true owners, i.e. the site owners themselves. Data could easily be transferred from one vendor to another, eliminating dependency on proprietary data structures, so that vendors would compete solely on functionality. Of course, the data (“logs”) would have to be readily available, which is unfortunately not the case now (and to me that is Google Analytics’ biggest drawback, for example). It would also be possible for site owners to tag and collect the data themselves, as with WebTrends SDC, and then decide where to get it analyzed: either using a locally installed version of a product, or dumping their data on the vendor’s servers.
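
To make the portability argument a bit less abstract, here is a minimal sketch in Python of what a vendor-neutral event record and its translation to vendor formats might look like. Nothing in it comes from an existing standard or product: the field names, the `translate` helper, and the `vendor_a` / `vendor_b` parameter names are all hypothetical and only illustrate the idea.

```python
# Hypothetical sketch: one vendor-neutral event record, plus a per-vendor
# field mapping. Every field and parameter name here is invented.

NEUTRAL_EVENT = {
    "visitor_id": "v-123",
    "timestamp": "2008-01-01T12:00:00Z",
    "event_type": "page_view",
    "page_url": "/products/widget",
    "referrer": "https://example.com/",
}

# Assumption: each complying vendor publishes how its own parameter names
# map onto the standard fields. "vendor_a" and "vendor_b" are placeholders.
VENDOR_MAPPINGS = {
    "vendor_a": {
        "visitor_id": "vid",
        "timestamp": "ts",
        "event_type": "ev",
        "page_url": "url",
        "referrer": "ref",
    },
    "vendor_b": {
        "visitor_id": "userKey",
        "timestamp": "hitTime",
        "event_type": "hitType",
        "page_url": "pageName",
        "referrer": "referringUrl",
    },
}


def translate(event, vendor):
    """Rename the standard fields to the names a given vendor expects."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[field]: value for field, value in event.items()}


for_vendor_a = translate(NEUTRAL_EVENT, "vendor_a")
for_vendor_b = translate(NEUTRAL_EVENT, "vendor_b")
```

In a setup like this, switching vendors would mean swapping a mapping rather than re-tagging the site, and the collected records would stay readable by any tool that understands the standard fields.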

Another big advantage would be that measures (visitors, visits, campaigns, what have you) would basically mean the same thing to everybody. Gone would be the days when analyzing the same logs with different products meant getting different numbers all over (well, at least when doing this was even possible using web server logs).

As for #2, defining the data structure with enterprise integration in mind would certainly force us to revisit what we really need to measure on a web site, due to the practical limitations of data warehouses. Would everything have the same value? Should every site collect every bit, or only the most important actions/events? Well, I don’t know, but we would definitely need to rethink the value of collecting all page views, for example, at least for those not in publishing (and thus not relying on an advertising-based business model). I am aware that structuring web data so that it is readily available to BI products would also raise the question of what is Web data versus data coming from the Web, i.e. where Web Analytics stops and BI starts. But that’s a different debate.
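
As a rough illustration of the "collect only what matters" option, the sketch below loads just a chosen set of event types into a warehouse-style table. It uses Python's built-in `sqlite3` module as a stand-in for a real data warehouse; the `web_events` table, the event fields, and the `KEY_EVENT_TYPES` selection are assumptions made up for this example, not a proposed standard.

```python
import sqlite3

# Hypothetical vendor-neutral events; all field names are assumptions.
EVENTS = [
    {"visitor_id": "v-1", "timestamp": "2008-01-01T12:00:00Z",
     "event_type": "page_view", "detail": "/home"},
    {"visitor_id": "v-1", "timestamp": "2008-01-01T12:05:00Z",
     "event_type": "purchase", "detail": "order-1001"},
    {"visitor_id": "v-2", "timestamp": "2008-01-01T13:00:00Z",
     "event_type": "page_view", "detail": "/about"},
    {"visitor_id": "v-2", "timestamp": "2008-01-01T13:02:00Z",
     "event_type": "lead_form", "detail": "contact"},
]

# Assumption: the site owner decides which event types deserve a place
# in the warehouse; plain page views are left out here.
KEY_EVENT_TYPES = {"purchase", "lead_form"}


def load_key_events(conn, events, key_types=KEY_EVENT_TYPES):
    """Insert only the selected event types and return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS web_events ("
        "visitor_id TEXT, timestamp TEXT, event_type TEXT, detail TEXT)"
    )
    rows = [
        (e["visitor_id"], e["timestamp"], e["event_type"], e["detail"])
        for e in events
        if e["event_type"] in key_types
    ]
    conn.executemany("INSERT INTO web_events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database stands in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
loaded = load_key_events(conn, EVENTS)
```

Once the rows sit in a flat table like this, a BI tool could join them with sales or CRM data on whatever key the business shares with the web site, which is the kind of integration I have in mind.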

Ian’s idea of making Google the universal collector of data is interesting. However, I think we can go a step further with what I am saying here by making even the collection entirely a matter of choice. Also, Google would have to make the data/logs available to anyone who needs their data, instead of relying only on the brand for result validation, as is currently the case with GA.

Am I crazy? Is this even feasible? Would vendors see any benefit in this? Well, you are more than welcome to add your own, more intelligent ideas here.

Tags: Universal+Tagging+Conventions, Web+Data+Uniformity, Web+Data+Structure+Standards

13 responses to “Universal Tagging Conventions: Towards Web Data Structure Standards”

  1. I think I remember reading something about a universal tagging system that could convert the tagging based on the vendor. So if you change vendors, you only have to modify your configuration file, and the tagging on your site will be translated to the native tagging of your new vendor.

    I just don’t remember where I read about it.

  2. Hey Jacques,

    To quote Radiohead, “nice dream” 🙂

    I think this is ideal, but unlikely except perhaps in the long-term.

    1.) Some people don’t trust Google. Let’s be honest, Google wants your data to understand the true value of keywords and understand user experience.

    2.) Competitive Advantage – Much of what often prevents people from switching vendors is the inertia of tagging. What’s good for the client is not always what’s good for the vendor.

    3.) Data Collection and Processing – Some vendors can (rightfully) argue that the way they calculate certain metrics is better. Check out this issue:

    We can all hope, though…


  3. I think that you and I have similar Crazy Ideas, but mine does not involve what happens at collection time. Sure, we could benefit from some tool-independent tag standardization, but for the most part I think data collection works pretty well. I believe we need to have better control over the way we get data *out* of our analytics systems *after* it has been collected and transformed; i.e., we need the ability to define meaningful and useful visitor-level feeds. At collection time you get an event-level snapshot of a visitor’s behavior. At extraction time you have access to that visitor’s behavior over all history. See the difference?

    Yes this will be the topic of our impromptu “hallway huddle” in August. 🙂

  4. Hi Jacques,

    Very interesting discussion. You share this vision with Tom Hochstatter, a VP of biz dev at Yahoo! (not sure if he still is, whether he is still there, or whether his work is accomplished now that they acquired IndexTools). Tom made the same recommendation at an eMetrics Summit back in 2006 or 2007.

    First let me say something about Ian’s vision of Google collecting data and 3rd parties developing web analytics tools. It is possible, but would have limitations. There is a big distinction between the architectures of web analytics tools. Some throw away all detail data after processing the latest sessions and keep only aggregate data. This makes sense for limiting storage requirements. But if you are going to offer ad hoc slicing & dicing of reports, this aggregate-level data doesn’t suffice. You have to have a data warehouse with detail data. Hence the big guys, including my employer Unica, all offer data warehouse options. In the case of Unica’s NetInsight web analytics solution this happens to be the default.

    What this means for Ian’s idea, however, I think, is that the 3rd-party tools plugging into Google could only offer repackaged versions of the analytics that GA already provides. They could not make new connections between data elements because that would require detail data. So web analytics tools that offer slicing & dicing could not make do with this limited data.

    As for the universal tag idea, there seem to be two concepts. One is more like Ian’s, i.e. one company’s data center collects all the data and provides it to third parties. That approach is difficult to swallow. Data is gold, after all.

    In contrast, Tom (and you, I believe) were suggesting that all w.a. vendors could standardize on a common JavaScript page tag. At least they could make sure that there is some basic compatibility (similar to standard SQL vs. RDBMS-specific extensions). I agree that customers would love the latter.

    It is my understanding, though, that non-standard web analytics tools, for example MVTs, have very different requirements. So the standard would likely not go beyond pure web analytics solutions.

    I guess the question is: will customers value this over other improvements into which w.a. vendors could invest. There is only so much that can be developed given finite resources. Incentives are another matter.

    Good discussion! I don’t remember whether the WAA’s standards committee ever discussed this. We have our hands full standardizing just metrics under the leadership of the fearless Angie Brown.

  5. Alex: Supertramp, “Dreamer, you know you are a dreamer!” (this gives out my age by the way).

    Well, I know it’s far-fetched (see Akin’s comment too), but, hey, we’re still at the fantasy stage!

    1) Yes, that is why this idea of UTC is not linked to Google doing the collection. They could, but I could do it myself as well if I wanted.

    2) Again, do we need to accept that indefinitely??

    3) I don’t argue that; in fact, that would be essentially where they would compete. Do they need to have the data in a proprietary format, though?

    Thanks for your input!!

    June: So, that’ll be two fools at X Change! Oh yes! The post-process export capabilities still suck! I certainly see the difference in what you are suggesting. One of my secondary ideas is that the event data could be in a format that would make it easily “dumpable” in a data warehouse. A BI tool could then take care of it. Anyway, see you in the hallway !

    Akin: Speaking of Yahoo VPs, I would definitely LOVE to have Bill Schmarzo’s take on this:


    I hope I’ll be able to attend his keynote at the TDWI San Diego!

    Well, see my response to June. I had in mind to have the Web data (some of it only? all?) structured in a way that would make it possible to dump it in the DW if that is what I want to do. This begs the question of what is web data vs. data coming from the web. I mean, when I measure a KPI that’s Web Sales/Total Sales, not one number comes from the web analytics application! Anyway, that’s another discussion.

    Yes, the idea is to make the data structure independent from vendors, with total ownership from site owners, and total (or partial in a good proportion) compatibility. I don’t think being “data independent” has anything to do with application development. Actually, vendors would have to be very persuasive in convincing companies that their application would be the one to extract maximum juice from the data.

    Thanks again for your great input!

    Well, I guess we have cause for a “hallway huddle” at the next X Change !! (http://www.semphonic.com/conf/index.asp)

  6. Hi Jacques,

    I concede to you that your solution is the best one. But I don’t think this is going to happen. Being from IT and having experience with Google Analytics, Coremetrics and a quick glance at the Omniture tags, I’m not sure a system like the one I described earlier is even possible.

    Vendors share some common characteristics, but when you start digging deeper, it becomes more and more difficult to do so.

    Heck, the industry is having trouble agreeing on definitions of basic KPIs. The WAA produced an official guideline for the 20 most common KPIs.

    But in a perfect world, I think the idea of a universal tagging system is a great idea 🙂

  7. Nice to see that my post has at least sparked some debate, if not on my blog itself (where wind whistles through the empty comments box). To respond to a couple of the points:

    Jacques: Don’t hold your breath for standardized tagging. As Alex says, it’s not really in the vendors’ interests to make it easier for customers to switch; plus, whilst it might be possible to settle on standardized data collection for core elements like page URL and referrer, a typical Omniture implementation contains a huge number of custom variables being captured, and I can’t see a standard for this emerging any time soon.

    Alex: Although my post was about Google, I wouldn’t expect it to be the only game in town offering this kind of data collection/warehousing service. Another company that I know quite well (since their name appears on my paycheck) would possibly be equally interested in providing such a service. And you could imagine Yahoo wanting to be in this game too. My point was that the infrastructure for data collection, processing and storage is one of the main expense lines for a web analytics vendor, and only a few companies can drive the economies of scale to make it work.

    Akin: In my post I was careful to be explicit about the kind of API Google could expose. A simple “front-end” API into GA’s functionality would be moderately useful but not earth-shattering; what I’m talking about in the post is an API into Google’s data warehouse itself, enabling a 3rd-party analytics app to pull novel kinds of reports and create its own custom segmentation, approaching more closely the kind of functionality in the enterprise tools.


  8. Hi Ian,

    Thanks for clarifying the API into Google’s data warehouse. I wonder whether Jacques can get the friends from GA to comment. But I’d be willing to bet that there is no such data warehouse underlying Google Analytics (or formerly Urchin). That is by design though. Rather than emphasizing warehouse + segmentation, the design of their solution always emphasized scalability of reporting to masses of users. You can’t be everything to everyone after all.

  9. Pingback: The Big Integration » tbi alert
