Why estimating Data Quality profiling isn't just guess-work
Data Management lore would have us believe
that estimating the amount of work involved in Data Quality analysis is a bit
of a “Dark Art,” and to get a close enough approximation for quoting purposes
requires much scrying, haruspicy and
wet-finger-waving, as well as plenty of general wailing and gnashing of teeth.
(Those of you with a background in Project Management could probably argue that
any type of work estimation is just as problematic, and that in any event work
will expand to more than fill the time available…).
However, you may no
longer need to call on the services of Severus Snape or Mystic Meg to get a workable
estimate for data quality profiling. My colleague from QFire Software, Neil Currie, recently put me onto a post by David Loshin on SearchDataManagement.com, which proposes a more
structured and rational approach to estimating data quality work effort.
At first glance, the
overall methodology that David proposes is reasonable in terms of estimating
effort for a pure profiling exercise - at least in principle. (It's analogous
to similar "bottom-up" calculations that I've used in the past to
estimate ETL development on a job-by-job basis, or creation of standard
Business Intelligence reports on a report-by-report basis).
I would observe that David’s approach is predicated on the (big and probably optimistic) assumption that we're only doing the profiling step. The follow-on stages of analysis, remediation and prevention are excluded – and in my experience, that's where the real work most often lies! There is also the assumption that a checklist of assessment criteria already exists – and developing that library of quality-check criteria can be a significant exercise in its own right. With those caveats in mind, per-item timings along the following lines seem workable for the profiling step itself (there's a rough worked calculation after the list):
- 10 mins: for each "Simple" item (standard format, no applied business rules, fewer than 100 member records)
- 30 mins: for each "Medium" complexity item (unusual formats, some embedded business logic, data sets up to 1000 member records)
- 60 mins: for any "Hard" high-complexity items (significant, complex business logic, data sets over 1000 member records)
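To make the arithmetic concrete, here's a minimal sketch in Python of how those per-item timings roll up into a profiling estimate. The `profiling_minutes` helper and the item counts are purely illustrative assumptions of mine; the only real inputs are your own tallies of simple/medium/hard items.

```python
# Rough profiling-effort calculator based on the per-item timings above.
# The item counts below are illustrative only - substitute your own tallies.
MINUTES_PER_ITEM = {"simple": 10, "medium": 30, "hard": 60}

def profiling_minutes(item_counts):
    """Total profiling effort in minutes for counts like {'simple': 40, ...}."""
    return sum(MINUTES_PER_ITEM[complexity] * count
               for complexity, count in item_counts.items())

counts = {"simple": 40, "medium": 15, "hard": 5}   # hypothetical tallies from your checklist
total = profiling_minutes(counts)
print(f"{total} minutes, or about {total / 60:.1f} hours, of profiling")  # 1150 minutes, ~19.2 hours
```

So forty simple, fifteen medium and five hard items comes out at a little under twenty hours of profiling time – before any socialisation of the results.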
How much socialisation? That depends on the number of stakeholders, and their nature. As a rule-of-thumb, I'd suggest the following (with another short worked sketch after the list):
- Two hours of preparation per workshop (if the stakeholder group is "tame"; double it if there are participants who are negatively inclined)
- One hour of face-time per workshop (again, double it for "negatives")
- One hour post-workshop write-up time per workshop
- One workshop per 10 stakeholders.
- Two days to prepare any final papers and recommendations, and present to the Steering Group/Project Board.
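Purely as an illustrative sketch again (Python; `socialisation_hours` is my own made-up helper, as is the assumption of an 8-hour day for the two days of final papers), here's how those workshop rules-of-thumb translate into numbers:

```python
# Rough socialisation-effort calculator based on the workshop rules-of-thumb above.
import math

def socialisation_hours(stakeholders, has_negatives=False):
    """Workshop effort in hours: prep + face-time + write-up, one workshop per 10 stakeholders,
    plus two days (assumed here to be 8-hour days) for final papers and the Steering Group."""
    workshops = math.ceil(stakeholders / 10)
    prep = 2 * (2 if has_negatives else 1)    # 2 hours' prep, doubled for a hostile group
    face = 1 * (2 if has_negatives else 1)    # 1 hour face-time, doubled likewise
    write_up = 1                              # 1 hour write-up per workshop
    return workshops * (prep + face + write_up) + 2 * 8

print(socialisation_hours(25, has_negatives=True))  # 3 workshops x 7 hours + 16 hours = 37
```

Twenty-five stakeholders with a few negatives in the room works out at three workshops and roughly 37 hours of socialisation effort, on top of the profiling itself.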
Detailed root-cause analysis (Validate), remediation (Protect) and ongoing evaluation (Monitor) stages are a whole other ball-game.
Alternatively, just stick with the crystal balls and goats - you might not even need to kill the goat anymore…