
Wednesday, 30 July 2014

Business Glossaries – the pointy end of metadata management

A recent thread on LinkedIn raised the issue of implementing a business glossary (in particular, in relation to IBM’s Business Glossary tool).

I generally try to avoid commenting on any particular vendor’s products in this blog. As regular readers will know, I’m much more concerned with all the joys and frustrations of the human aspects of Information Management! However, the LinkedIn discussion raised some interesting questions on the topic of “business glossaries” and metadata management more generally, and I think these are worth exploring and summarising.

The first thing to note is that the key issues in successful business glossary implementation aren't related to the technical deployment of the tool anyway. As with all Information Management capabilities, they are cultural, societal, behavioural and process challenges, i.e. human challenges (and that is true whether you are dealing with data quality, master data management, Information Asset Management etc.). The overall approach to Data Management and Governance needs to be given due consideration: who is responsible for data definitions and business rules, who has the decision rights, and how the definitions get maintained, managed and published.

The key purpose of a “business glossary” is human communication and collaboration: to exchange business-level understanding and interpretation of informational terms. A business glossary (sometimes referred to as “business dictionary”, “business metadata”, “business vocabulary” or “business lexicon”) collects terminology that expresses business concepts in the language of the end user, with the aim of collating one consistent set of terms that are commonly understood by the user community. That’s hard enough!

Getting people to agree on words & definitions is difficult. Unfortunately, life isn't that clean-cut: even just collecting as many words, terms, phrases and acronyms as you can grab, and examining their different uses, definitions and conflicts, can be time-consuming.

When I was at UNSW we ended up collecting over 1500 terms, which one way or another resolved to about 400 groupable items (e.g. there were six different ways of establishing whether or not we counted someone as a "student").
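The first, mechanical pass of that kind of harvest can be partly automated. The Python sketch below is a hypothetical illustration, not the process we used at UNSW: it assumes each collected term has been recorded as a (term, source, definition) triple, groups spelling variants under a crude normalised key, and flags any term used with more than one definition. The sample data and the `normalise`, `group_terms` and `contested` helpers are invented for the example; deciding which variants really are the same concept still needs people.

```python
from collections import defaultdict

# Hypothetical raw harvest: (term as written, where it was found, definition used there).
harvest = [
    ("Student", "Registration", "Person enrolled in at least one course"),
    ("student", "Finance", "Person with an outstanding fee liability"),
    ("STUDENT ", "Library", "Person enrolled in at least one course"),
    ("Staff member", "HR", "Person with a current appointment"),
]


def normalise(term):
    # Crude matching key (case and whitespace only); synonyms and acronyms
    # still need human judgement.
    return " ".join(term.lower().split())


def group_terms(records):
    # key -> definition -> list of sources that use that definition
    groups = defaultdict(lambda: defaultdict(list))
    for term, source, definition in records:
        groups[normalise(term)][definition].append(source)
    return groups


def contested(groups):
    # Terms used with more than one definition need resolving by stakeholders.
    return sorted(key for key, defs in groups.items() if len(defs) > 1)


groups = group_terms(harvest)
print(len(groups))        # 2 groupable items from 4 collected terms
print(contested(groups))  # ['student']
```

The output is only a worklist for the stakeholder group; it surfaces the conflicts, it does not resolve them.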

Resolving all of those contentions, discrepancies and ambiguities in one go was way too hard for most people to get their heads around, let alone show any interest in! So we focussed on one subject area, in this case Staff/HR data definitions, which was a high priority as we were in the process of re-implementing our HR admin system.

One question that was asked very often was “How many staff do we have?” This invariably led to much wailing and gnashing of teeth as people scurried around frantically trying to answer it, only to come up with multiple answers, none of which matched any other.

Of course, the challenge was that there was no agreed understanding of what anyone meant by “How many”, “staff” and “have”!
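Each of those three words hides a choice. The Python sketch below is purely illustrative: the records and the names `Appointment`, `active_on` and `active_during` are invented, and it assumes a simple model where each row is an appointment rather than a person. Counting rows, counting distinct people, summing full-time equivalents (FTE), or counting everyone employed at any point in a year all give different, but defensible, answers from the same data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Appointment:
    # Hypothetical HR record: one row per appointment, not per person.
    person_id: str
    fte: float            # fraction of a full-time load
    start: date
    end: Optional[date]   # None means open-ended


def active_on(a: Appointment, day: date) -> bool:
    # "Have" read as "employed on a given day".
    return a.start <= day and (a.end is None or day <= a.end)


def active_during(a: Appointment, first: date, last: date) -> bool:
    # "Have" read as "employed at any point in the period [first, last]".
    return a.start <= last and (a.end is None or a.end >= first)


appointments = [
    Appointment("p1", 1.0, date(2010, 1, 1), None),
    Appointment("p2", 0.5, date(2014, 1, 1), date(2014, 6, 30)),
    Appointment("p3", 0.5, date(2014, 3, 1), None),
    Appointment("p3", 0.25, date(2014, 3, 1), None),  # second, concurrent appointment
]
as_at = date(2014, 7, 30)

# Four different answers to "How many staff do we have?"
rows = sum(1 for a in appointments if active_on(a, as_at))                   # 3 appointments
headcount = len({a.person_id for a in appointments if active_on(a, as_at)})  # 2 people
fte = sum(a.fte for a in appointments if active_on(a, as_at))                # 1.75 FTE
people_in_2014 = len({a.person_id for a in appointments
                      if active_during(a, date(2014, 1, 1), date(2014, 12, 31))})  # 3 people
```

None of these is the single "right" answer; the point of the glossary is to make the chosen definition explicit and shared.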

At UNSW, we had a working party of 5 full-time team members, supported by a part-time stakeholder group of approximately 30 nominated business representatives. (For some comments about the "consultation culture" at the university, see my interview for DataQualityPro.) 

We settled upon approximately 80 agreed terms, which in some cases also involved significant re-thinking of business concepts. For example, we had to split the generic idea of someone having a "contract" into the concepts of "employment status" (permanent vs temporary), "employment type" (fully employed vs fixed term vs contract vs casual), "payment arrangement" (paid vs emeritus vs conjoint/volunteer) etc.
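To make the split concrete, here is a minimal Python sketch. The class names, member names, person IDs and the `is_paid_staff` rule are hypothetical, not the UNSW schema; the value sets simply mirror the examples above. Giving each concept its own attribute means a combination such as "temporary, casual, paid" is represented directly, instead of being squeezed into one overloaded "contract" picklist.

```python
from dataclasses import dataclass
from enum import Enum


class EmploymentStatus(Enum):
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


class EmploymentType(Enum):
    FULLY_EMPLOYED = "fully employed"
    FIXED_TERM = "fixed term"
    CONTRACT = "contract"
    CASUAL = "casual"


class PaymentArrangement(Enum):
    PAID = "paid"
    EMERITUS = "emeritus"
    CONJOINT_VOLUNTEER = "conjoint/volunteer"


@dataclass(frozen=True)
class StaffRecord:
    # Three independent attributes replace a single "contract" field.
    person_id: str
    status: EmploymentStatus
    employment_type: EmploymentType
    payment: PaymentArrangement


def is_paid_staff(record: StaffRecord) -> bool:
    # One possible business rule, built on top of the separated attributes.
    return record.payment is PaymentArrangement.PAID


tutor = StaffRecord("p42", EmploymentStatus.TEMPORARY,
                    EmploymentType.CASUAL, PaymentArrangement.PAID)
visitor = StaffRecord("p7", EmploymentStatus.TEMPORARY,
                      EmploymentType.FIXED_TERM, PaymentArrangement.CONJOINT_VOLUNTEER)
```

A rule like "who counts as staff?" then becomes an explicit, reviewable expression over agreed attributes rather than an interpretation of a free-text contract label.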

Just for the "Staff/HR" subject area, it took over six months to get the definitions of terms resolved. We then had to start the process of cleaning the actual data, ready for migration to the new system... Whatever you're doing, patience is a virtue!

It’s also worth noting that the issues of identifying, validating and communicating this common business language are very different from (though related to) the more detailed questions of data modelling, integration, traceability, integrity and auditability enforcement, which might be considered the realm of “technical metadata” (and I’m using the word “technical” very advisedly here, to mean any aspect of metadata management that isn’t immediately end-user facing!)

By the way, I’m not suggesting that “business metadata” and “technical metadata” are separate; indeed, it’s vital that they integrate and correlate top-down and bottom-up. Together, these form the core body of knowledge that defines the existence of the organisation. However, to make things manageable, it is useful to think of them as different views of the same thing, depending upon role and purpose.

In my experience, if you want to be successful in implementing an Information Management environment, it is absolutely vital to address the human, cultural and societal factors that make for a successful outcome. Build organisational capability and resilience for Information Management as a set of foundational disciplines. Think about the accountabilities, responsibilities and process controls that are required.

Hint: enter into a project naively, and you will fail. Don't buy the tools unless you are prepared to deal with the human factors.


  1. Great post, Alan, on a topic around which I have built my 'new' information management architecture. I say 'new' because I went through several data management exercises exactly like the one you describe at a university college in Calgary. I hope my comments here will build on your post.

    In the course of building a data dictionary, we discovered four things that helped enormously:

    1. Decouple the common definition of a term (like "Student") from the applications that used it.
    2. Force people to be very explicit about the qualifiers they might use with the term. For "Student" these would include many status values like: foreign, part-time, credit free, etc.
    3. Get consensus from the group on exactly what we were counting or referring to. In the case of "Student" we were counting individual people.
    4. Ask the group where the set we were interested in might overlap with other sets. For example, the set of "People in the role of Student" might overlap with, say, the set of "People in the role of Staff".

    Someone in another discussion group said they use a dictionary to establish the decoupled definition and I often use the same technique. It's amazing how many fights stop at that border. When you define a term, in this case a Role, as universally as possible, modifiers will not change that definition.

    Likewise, discovering what people assume are the acceptable modifiers is always interesting and almost always rewarding. "Student" at Mount Royal University and "Student" at UC Berkeley are likely two different things, just as "Student" in the Finance system is different from "Student" in the Registration System (they shouldn't be, but there you are). I call this the dialects challenge. Once you know that one is speaking money while the other is speaking bums in seats, it's much easier to have that conversation.

    Speaking of bums, you might assume (that word again) that, since everyone who is a student has a bum, counting bums (as opposed to counting people) is acceptable. For your readers now imagining 'bumless' people, I apologize, but stranger assumptions have been made when it comes to statistics: for example, counting rows instead of individuals. Surface the assumptions and again things become clearer.

    Finally, the overlap question points to the answer to #3. If we are counting Persons in different Roles, and it's OK for a person to occupy more than one Role, then you will have overlap.

    Just a few thoughts on a Saturday morning. I'm going out to cut the lawn now. Until next time!

    John O'Gorman

  2. Thanks John - great comments & your techniques mirror mine in a lot of ways. Cheers ADD

  3. It's a well-written post, one with which I agree. However, please don't overlook the value to be gained by trying to arrange an entity's terminology in a structured manner, i.e. - a glossary, one which may be published to a wide audience. Creating structured definitions of terms forces stakeholders to explain their rationale more carefully and in the context of other terms rather than in isolation. The latter is when mistakes are made.

    During discussions about terminology, we should not lose sight of the business (or technical) rationale that drives each term's existence. For example, there are many geographical attributes at any company, so which one is most appropriate for responding to your particular business question? In the HR world, there are many attributes which seem to overlap, e.g. - worker life cycle status (employment status), worker contract type (employment type), FT-PT classification, FTE, management level, exemption status, pay grade. I have repeatedly seen business people mix them up by awkwardly combining their value sets into one picklist. A glossary helps show "lay people" why there should be two attributes, not one.


    Jim Burtt

  4. This is an excellent post! I'm working on a series of "best practices" posts for a user group I'm involved with, and if they're well received I'll broaden the audience. When it comes to glossaries, some of the points you make are part of what I'll be saying. I think you imply the need for collaboration and governance, which are critical. Your point about "naivety" is also excellent: I've seen something of a tendency to believe that glossaries can be centrally mandated on a "one and done" basis... which certainly ain't so!
    Both John and Jim make some great points about the relationship between terminology, organization, and technology. One of the efforts I'm engaged in is an attempt to articulate a model that embodies some of that thinking. We need to move these practices in more "standard" directions!

  5. Thanks Jim & Ian. We get into all sorts of other areas quite quickly - Data Modelling, Requirements Gathering, Governance & decision rights etc. The skill of the information practitioner is to help the business group (and IT) navigate these - ideally, without getting into too much detail of exactly what/how we're doing it!

    I've also got some useful techniques (available in my coaching/training packs but not part of my blog, as yet) which explore the different layers within the overall information model (business/logical/physical) as well as illustrating the metadata management processes & "grey area" relationships between glossaries, taxonomies, hierarchies, business classification schemes, data models, reference data sets etc. All vital stuff, but not necessarily the type of thing you'd expose to a user group! (as I've previously found to my cost...)

    Elsewhere on the blog are some of my thoughts on various related topics - and see my "Tube Map" page for some ideas of how each Information Management discipline relates.


  6. It's really relevant information regarding data management. Data and information must be properly managed for business growth.
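John's first comment above makes two related points: agree exactly what is being counted (individual people, not rows), and ask where one set may overlap another. Both can be shown in a minimal Python sketch, assuming one row per (person, role, source system) record; the person IDs, system names and the `count_rows` and `people_in_role` helpers are invented for illustration.

```python
# Hypothetical role records: one row per (person, role, source system).
role_rows = [
    ("p1", "Student", "Registration"),
    ("p1", "Student", "Finance"),   # the same person, seen by a second system
    ("p2", "Student", "Registration"),
    ("p2", "Staff", "HR"),          # p2 occupies two roles
    ("p3", "Staff", "HR"),
]


def count_rows(rows, role):
    # Counts records, so duplicates across systems inflate the total.
    return sum(1 for _, r, _ in rows if r == role)


def people_in_role(rows, role):
    # The set of distinct people occupying the role.
    return {person for person, r, _ in rows if r == role}


students = people_in_role(role_rows, "Student")
staff = people_in_role(role_rows, "Staff")

print(count_rows(role_rows, "Student"))  # 3 rows
print(len(students))                     # 2 people
print(sorted(students & staff))          # ['p2'] is in both roles
print(len(students | staff))             # 3 people, not 2 + 2 = 4
```

Adding the two role counts together double-counts anyone in the overlap, which is why the unit of counting has to be agreed before any number is published.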