...and the last confession of a Turncoat.
What? ADD as Turncoat Analyst? Gamekeeper turned poacher? Or is it the other way round?! I’d like to think of it more in terms of "If you can't beat them, join them." Or else, it's a unique opportunity to be a mole on the inside… Whatever you call it, I'm not about to temper my outspoken approach, that's for sure!
It’ll be interesting, however it pans out!
I'll admit that I'm really excited by the prospect of becoming a Research Director at Gartner. I'm hoping that this new role gives me a unique opportunity to further explore many of the issues that we experience within the Information Management and Analytics sector, and influence the way we think and act with data. It should also mean that I will be able to delve into a lot more detail than my blogging enables me to. Even my "discussion paper" series doesn't really provide the channel to go in-depth into issues and compile the supporting evidence that I would ideally wish.
The down-side is that I will probably have to curtail my self-published content as I start putting most of my material out through the Gartner channels. (Though there's always the chance that a particular "too hot for TV" moment will need to come out under my own auspices!) I'll also continue to Tweet on a regular basis.
But before I turn my cloak altogether, I thought I'd follow up my previous comments on data quality in "Big Data" environments. More precisely, here's why I think our industry is currently missing a trick when it comes to "Big Data" (or as it should more properly be called, "data".)
Now, the first challenge is that there is still much disagreement about what constitutes “Big Data”. The original suggestion that it is “any data that can’t be processed by traditional methods” is hugely unhelpful, as would be any attempt to define anything as being “not another thing”. (Would we be comfortable in defining a “dog” as “not a cat”?)
In the past few years, the technology sector has generally settled upon defining “Big Data” by identifying certain characteristics of the data set, with those characteristics all beginning with the letter ‘V’. Gartner analyst Doug Laney originally proposed three ‘V’ characteristics: Volume, Velocity, and Variety.
These three ‘V’s help to establish characteristics and bound the problem of what “Big Data” might look like from a technical perspective. The new breed of data tools certainly enables the engineering of new and innovative methods of processing data that were previously out of reach to all but the most well-funded of organisations.
On their own, though, there is no “so what?” factor that jumps out at us to make the problems of “Big Data” meaningful in a business context. To supply that, I'd add three further ‘V’ characteristics:
- Variability: Within any given data set, is the structure of that data regular and dependable, or is it subject to unpredictable change? If it does change, how can we understand the nature of the “unstructured” content (text, sound, or video) and interpret it in a way that is meaningful for the analytics-ready output that the business requires?
- Veracity: How do we know that the data is actually correct and fit for purpose? Can we test the data against a set of defined criteria that establish the degree of confidence and trustworthiness? What are the business rules that enable the data to be tested and profiled? If there are issues with the data, what actions can be taken to clean and correct the data before any analysis is carried out?
- Value: What is the business purpose or outcome that we are trying to meet? What questions are we seeking to answer, and what actions do we expect to take as a result? What benefits do we expect to achieve from collecting and analysing the data? Has the data been aligned with the desired outcome?
All three of these additional characteristics require a clear understanding of the business context, which is then used to frame the meaning and purpose of the data content. “Variability”, “Veracity” and “Value” all express different aspects of the fitness-for-purpose of the data sets in question, all of which need to be addressed in order to solve a business problem in business terms.
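To make the "Veracity" questions a little more concrete, here is a minimal sketch of what testing data against defined business rules might look like. Everything in it is an assumption made up for illustration: the field names (`customer_id`, `age`, `email`), the three rules, and the sample records are hypothetical, and real rules would have to come from the business context rather than from the data team.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Rule = Callable[[Record], bool]

# Hypothetical business rules: each one names a criterion and tests one record.
RULES: Dict[str, Rule] = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "age is between 0 and 120": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
    "email contains '@'": lambda r: isinstance(r.get("email"), str)
    and "@" in r["email"],
}

# Hypothetical sample data, invented for this example only.
SAMPLE: List[Record] = [
    {"customer_id": "C001", "age": 34, "email": "a@example.com"},
    {"customer_id": "", "age": 29, "email": "b@example.com"},
    {"customer_id": "C003", "age": 150, "email": "c.example.com"},
    {"customer_id": "C004", "age": 51, "email": "d@example.com"},
]


def profile(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that pass it."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in rules}
    return {
        name: sum(1 for r in records if rule(r)) / total
        for name, rule in rules.items()
    }


def failing_records(records: List[Record], rules: Dict[str, Rule]) -> List[Record]:
    """Return the records that break at least one rule, for cleaning or review."""
    return [r for r in records if not all(rule(r) for rule in rules.values())]


if __name__ == "__main__":
    for name, pass_rate in profile(SAMPLE, RULES).items():
        print(f"{name}: {pass_rate:.0%} of records pass")
    print(f"{len(failing_records(SAMPLE, RULES))} record(s) need attention")
```

The pass rate per rule is one crude way of expressing a "degree of confidence", and the list of failing records is the starting point for cleaning and correcting before analysis. What counts as an acceptable pass rate is itself a business decision, not a technical one.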
If expanding the "Big Data" lexicon to a "Six 'V's Model" becomes my first contribution as a Gartner Analyst, then it's probably not a bad place to start.

 
Great to hear about your change. Gartner needs you
Thanks for your kind words!
As part of my work at Gartner, I've followed up on this idea with a more in-depth research piece on the "Three Business 'V's of Big Data Analytics" (so I guess I succeeded in expanding the model!). This explores the actionable opportunities and implications of putting the emphasis on business issues, rather than the data processing itself:
http://www.gartner.com/document/2921417 (requires Gartner subscription to access)
I really like the more pragmatic application of your 3 V's; Variability, Veracity and Value are much more outcome-centric.
The size (Volume), speed (Velocity) and model (Variety) of a car do not help you determine whether you will arrive on time at the right place, whereas Variability, Veracity and Value are more aligned with ensuring that the car is fit for the journey, that you have a map, and that you know why you are going where you are going.
Nice analogy - thanks Martin. I might just have to borrow that...!
Hello,
The article on the "Three Business 'V's" of Big Data is nice. It gives detailed information about it. Thanks for sharing the information about Big Data.