I really enjoyed Leena Rao's recent article on TechCrunch, "Why We Need to Kill 'Big Data'", not least because it reflected pretty much how I've been feeling for a long time about the lazy, misleading use of jargon and the never-ending hype cycle as the tech vendors jump on the latest bandwagon. (Pick one, any one, because they're all at it. Yes, I'm looking at you, IBM, Oracle, Microsoft, Teradata, SAP...)
Leena's perspective also provides a healthy counterpoint to the views currently being positioned by IDC (as reported in the Data Informed article "Now is the time to buy into Big Data"), which seems to offer a very tech-centric point of view and plays right into the hype cycle without actually saying anything.
We've been here before - Decision Support, OLAP, Management Information Systems, Performance Measurement, Business Intelligence, Master Data Management - all are catch-all terms that actually had little if any real meaning in and of themselves. "Big Data" is just the latest entry in the Buzzword Bingo Lexicon.
In the twenty-odd years I've been involved in business solutions delivery and management consulting, it seems that *insert tool of choice* is always promoted as the "next big thing to solve all your problems", without any thought whatsoever being given to what the problem, scenario or challenge actually might be. Technology is almost always the wrong entry point for the conversation, because technology only becomes relevant when it is applied to solving a particular problem - and depending on the problem at hand, some technologies are more equal than others.
In the Information Management space, almost all problems relate to capturing, exchanging, disseminating, sharing, interpreting and acting upon one or more data sets. (Data Governance then sets out to ensure that these tasks can happen in a repeatable, consistent, efficient and effective manner.) Which got me thinking. Are there generic categories of "Information Use Case" that we can use to describe various business problem scenarios, so that we can then start to make more informed choices about what sorts of technologies might be appropriate?
Or to put it another way: "what's your point, caller?!"
Anyway, here are some information-related problem situations that I could identify (you might think of others, or take issue with some of mine. I'd be delighted to hear from you if you do, because it means you've been thinking about the business problem, and not about the technology!):
Note that there is likely to be interaction
(or even significant overlap) between these classes of use case. Each
individual class of Use Case should be considered a necessary, but not
sufficient, element of the organisation's Data Governance and Information Management
capability.
Data is of little value if all it does is sit
in the data warehouse. As a result, the presentation layer is of very high
importance.
Most On-Line Analytic Processing (OLAP)
vendors have a front-end presentation layer that allows users to call up
pre-defined reports or create ad hoc reports. The aim is to synthesise large
quantities of raw data into meaningful views that can be acted upon in context.
As such, reporting against structured data
can be viewed as a specific type of authoring process; any reporting output is
likely to be produced and submitted to the more general publishing process.
A number of key considerations need to be
taken into account as part of the reporting capability:
- Number of reports: The higher the number of reports, the more likely it is that purchasing a pre-built vendor solution is the right approach. Reporting tools typically make creating new reports easier (by offering re-usable components) and also provide report management systems to make maintenance and support functions easier.
- Desired Report Distribution Mode(s): will reports be distributed in a single mode only (for example, email only, or over the browser only), or will users access the reports through a variety of different channels?
- Ad Hoc Report Creation: in most environments, it is expected that end users will be able to create their own ad hoc reports. Ad hoc report creation necessarily relies on a strong metadata layer and a shared understanding of what the information presented in the report is communicating.
- Data source connection capabilities: in most modern environments, users will need to access data sources using both relational database and OLAP multidimensional data technologies.
- Scheduling and distribution capabilities: in a realistic data usage scenario, senior executives will only have time to come in on Monday morning and look at the most important information from the previous week. To meet this need, the reporting tool must have scheduling and distribution capabilities. Weekly reports are scheduled to run on Monday morning, and the resulting reports are distributed to the senior executives either by email or web publishing.
- Security Features: reporting tools are geared towards a number of users in different Business Units and teams, with different priorities and responsibilities. Therefore, ensuring that people see only what they are supposed to see is important. Most reporting tools have capabilities to manage security at different levels, including at the report level, folder level, column level, row level, or even individual cell level. Furthermore, they have a security layer that can interact with the common corporate login protocols and "single sign-on" policies.
- Export capabilities: data export is commonly required for Excel, flat file, and PDF formats. It may also be desirable and time-saving to export the reporting format as well as the data itself.
- Integration with the Microsoft Office environment: It is likely that reporting information will need to be incorporated into documents created with Microsoft Office products, especially Excel, both for manipulating data and for publishing. Some reporting tools now offer a Microsoft Office-like editing environment for users, so all formatting can be done within the reporting tool itself, with no need to export the report into Excel.
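To make the security consideration above a little more concrete, here is a minimal sketch of row-level and column-level filtering. It is purely illustrative and not how any particular reporting tool does it: the `ReportUser` class, the `secure_view` function and the sample `sales` data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportUser:
    """Hypothetical user profile; real tools would read this from a directory service."""
    name: str
    business_unit: str
    hidden_columns: frozenset = frozenset()


def secure_view(rows, user):
    """Return only the rows and columns this user is entitled to see."""
    visible = []
    for row in rows:
        # Row-level security: keep only rows owned by the user's Business Unit.
        if row["business_unit"] != user.business_unit:
            continue
        # Column-level security: drop any column hidden from this user.
        visible.append(
            {key: value for key, value in row.items() if key not in user.hidden_columns}
        )
    return visible


# Made-up sample data, for illustration only.
sales = [
    {"business_unit": "Retail", "region": "North", "revenue": 120, "margin": 0.31},
    {"business_unit": "Retail", "region": "South", "revenue": 95, "margin": 0.27},
    {"business_unit": "Wholesale", "region": "North", "revenue": 210, "margin": 0.18},
]

analyst = ReportUser("analyst", "Retail", frozenset({"margin"}))
analyst_view = secure_view(sales, analyst)
```

The point of the sketch is that the same report definition yields a different view per user: here the Retail analyst would see the two Retail rows, without the margin column.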
Strategic Intelligence and Data Mining
Data mining is the process of discovering new
patterns and inferences in large data sets, involving a range of methods and
techniques such as artificial intelligence, machine learning, statistics and
database systems.
The goal of data mining is to extract
knowledge from a data set in a human-understandable structure and may involve a
complex process of database and data management, data pre-processing, model and
inference considerations, interestingness metrics, complexity considerations,
post-processing of found structure, visualisation and online updating.
It is likely that a risk-based approach will
need to incorporate information processing and data analytic features
including:
- Anomaly detection: identification of outliers, changes and deviations in the data records that might be either interesting findings or data errors, and that require further investigation.
- Association rule learning: searching for relationships and dependencies between variables.
- Clustering: discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
- Classification: identifying and applying a known, generalised structure or categorisation to new data. (For example, an email program might attempt to classify an email as legitimate or spam.)
- Regression: discovery of an approximation function that models the data with the least error.
- Summarisation: providing a more compact representation of a data set, including visualisation and report generation.
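Two of the techniques listed above can be sketched in a few lines of Python. This is a deliberately simple illustration using only the standard library, not a recommendation of method: the z-score threshold of 2.0, the function names `find_anomalies` and `fit_line`, and the sample figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` sample
    standard deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up sample data, for illustration only.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = find_anomalies(daily_orders)

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Whether a flagged value is an interesting event or simply a data entry error is exactly the kind of question that needs business context, which is why the output of a step like this is a prompt for investigation rather than an answer.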
At a minimum, content has to be written and
it has to be posted. Between those two steps, it is usually checked for its
writing quality and correctness. Legal and compliance may need to review it. Ideally,
any publishable material will be reviewed by a high-level editor or editorial
board to make sure that it is consistent in style and fact with other
information already in the published domain.
As with the searching process, the authoring
process will need to be context-aware to ensure that information is defined and
used appropriately, within both the context within which it was created, and
the context of any intended (or unintended) usage.
The capacity to provide timely, compelling
and concise advice to inform senior decision makers and executives is a vital
capability for any organisation.
The Executive Briefing process therefore
requires departments and business units to be able to locate,
collate, and interpret the available information, such that the context and
rationale for any decision can be supported and substantiated.
In simple terms, education provides a
knowledge base that underpins any other activities the individual may engage
in at a later stage. Training is not as general and tends to
concentrate on skills development for the purposes of a specific skill or task.
Learning tends to be associated with the self-development of the individual.
Capability for education, training and
learning of staff is a key aspect of service improvement in the University. In
support of this, organisations need to provide an information sharing
capability that enables all staff to access the process, policy and knowledge
resources pertinent to their role.
Many information users will be interested in
finding material that has been authored by someone else in the organisation.
Assuming this content has been made available
for others to access, the capability for finding, retrieving and accessing the
required material may be many and varied, depending on a number of factors
including: the nature of the content medium; the physical locations of the
originator and consumer; the mechanisms available; other content that the
consumer may wish to combine.
The nature of information content will also
be dependent upon both the context within which it was created, and the context
of the intended usage. Any search and retrieval process will need to be
context-aware to ensure that information is used appropriately.
A technology-enabled approach to content
search and retrieval will become increasingly important. However, it is also
important to give due consideration to the governance authorities and control
processes that define what content is to be made available.
It should also be noted that any content
search capability does not stand alone and needs to be fully integrated with
content authoring and publication processes and systems. As such, the search
process is likely to be implemented as part of integrating Records Management,
Document Management and Knowledge Management solutions.
Records Management is the practice of
maintaining the records of an organisation from the time they are created up to
their eventual disposal. This may include the classification, storage, security, and
destruction (or in some cases, archival preservation) of records. A record can
be either a tangible object or digital information, such as office
documents, databases, application data, and e-mail.
The ISO 15489-1:2001 Standard defines Records Management as "[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records". The standard defines "records" as "information created, received, and maintained as evidence and information by an organisation or person, in pursuance of legal obligations or in the transaction of business".
Records Management is primarily concerned
with the evidence of an organisation's activities, and is usually applied
according to the value of the records rather than their physical format. While
there are many purposes of and benefits to records management, as both these
definitions highlight, a key feature of records is their ability to serve as
evidence of an event. Proper records management can help preserve this feature
of records.
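As a rough illustration of the record lifecycle described above (creation through to eventual disposal), the sketch below decides what should happen to a record at a given date. It rests on assumptions that are mine rather than anything taken from the standard: that each record carries a fixed retention period in years and a flag marking archival value. The names `Record` and `disposition` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    title: str
    created: date
    retention_years: int   # assumed: set by the organisation's retention schedule
    archival_value: bool   # assumed: marks records to preserve rather than destroy


def disposition(record, today):
    """Return 'retain', 'archive' or 'destroy' for a record as of `today`."""
    # Whole years elapsed since creation (anniversary-based).
    age = today.year - record.created.year - (
        (today.month, today.day) < (record.created.month, record.created.day)
    )
    if age < record.retention_years:
        return "retain"
    # Retention period has expired: preserve or dispose.
    return "archive" if record.archival_value else "destroy"


# Made-up sample records, for illustration only.
today = date(2020, 1, 1)
minutes = Record("Board minutes", date(2010, 1, 15), 7, True)
old_claim = Record("Expense claim", date(2010, 6, 1), 7, False)
new_claim = Record("Expense claim", date(2019, 6, 1), 7, False)

decisions = [disposition(r, today) for r in (minutes, old_claim, new_claim)]
```

Note that the decision depends on the value and age of the record, not on whether it happens to be a paper file, an email or a database row.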
Many jurisdictions now make legislative provision based on the principle that government information is a public
resource to be managed in the public interest. Such instruments give citizens the
right to make requests to access Government documents. Similarly, where personal information is retained by an Agency, the individual has the right to request access to those records.
If your point is that use-case awareness needs to precede choice of technology, then it's a point worth making. A key factor you point out that industry will continue to stumble over is the "context awareness" problem. Nice overview article, Alan.
http://datareality.blogspot.com
Thanks for your kind words Mitch.
I was discussing the issues of Information Governance with a colleague earlier this week (who has come to be responsible for her organisation's Information Management from a legal and risk background), and she was bemoaning the generally poor level of engagement from IT. Understanding of the business context and purpose must always be the starting point.