Re-inventing Data Management
This is a (slightly updated) version of a paper which has been presented in various forms since the late 1990s. In particular, it was the subject of keynote addresses at the North American, European and Australian Data Management Association Conferences.
A Well-Recognised Problem
It has never been particularly difficult to sell the case for better data management.
Virtually every medium and large organisation - and quite a few small ones - can identify examples of:
Multiple copies of data with attendant storage, capture, maintenance and translation costs
Poor data quality, often as a result of holding uncoordinated copies
Difficulty in pulling data together for operational purposes or to meet management and executive information requirements
Re-work, as the same database structures and associated maintenance routines are designed over and over again
In many cases, individual instances of these problems are recognised as significant sources of costs, inefficiencies or embarrassment to the organisation.
As a result, data management has consistently appeared near the top of surveys of key issues for MIS managers, and has attracted the attention of senior business managers. Telstra's chief executive publicly identified data integration as a key issue shortly after taking up his position. Nor have the academics been silent. In his much cited paper on stages of IT maturity, Nolan identified “Data Administration” as the fifth and penultimate stage.
In short, data management is an important issue for most organisations, and is recognised as such.
Data Administration - A History of Failure
Attempts to improve the management of data have largely followed a well-established pattern - what could be called the data administration approach, although the terms data management, information management and information resource management are also widely used. It is characterised by:
A strong policy statement, generally from the IT department, perhaps signed off by general management, to the effect that “data is one of our most important resources”.
Establishment of a team, again usually within the IT department, with a grand title and objectives: “to manage our data as a valuable corporate resource”. The most common skill within the team is likely to be data modeling.
Development of a corporate data model.
Establishment of a corporate data dictionary.
Implementation of standards and procedures which require that new applications conform to the corporate data model and are documented in the dictionary.
There is nothing wrong with adopting a common approach - as long as it works. Unfortunately, the evidence overwhelmingly indicates that the data administration approach does not work (see, for example, Goodhue et al). Shanks' detailed study of data administration in Australian banks was consistent with these findings.
My own experiences working with and talking to literally hundreds of data administration staff in Australia, North America, Asia and Europe, over a 15 year period, support these findings. While some can cite enough achievements to justify their existence, far fewer can say that they have made real progress towards delivering on the promises which encouraged the original investment.
Interestingly, few data administration groups seem to have put in place performance measures related to their stated goals, and surprisingly little has been written on the subject (but see Moody & Simsion).
One of the few success stories does more to illustrate the difficulties than to encourage optimism. A medium sized organisation succeeded in having all of its data - some 20,000 items in 2,000 tables across 20 - conform to the corporate data model. There was negligible duplication, none of it uncontrolled, and everything was properly documented in the corporate dictionary. But they had a few things in their favour:
They started with a “clean slate”, with a mandate to completely replace legacy systems
The data administration approach had the support of top management, the IT manager, and the two key project managers, both of whom had backgrounds as data management specialists
There were clear and obvious benefits in integrating the core data in the new applications
In-house development was chosen over packaged solutions
The database administration team reported to a strong and Data Administrator who had right of veto over any non-conforming or inadequately documented designs
The development environment required that database schemas be generated from the data dictionary.
Few organisations provide such an accommodating set of circumstances. And there is a footnote to the story. The organisation decided to outsource their IT function, and executive management was initially not convinced that the Data Administration group was worth retaining. In the end they were persuaded to keep the dictionary and the model - but it was hardly an overwhelming victory.
The most commonly cited reason for the failure of data administration is lack of management support. But this lack of support is of a particular kind: an unwillingness to impose the burden of conformity with data management standards on recalcitrant projects. I call this obstacle the unholy alliance (fig 1) - unholy in the sense that it does not recognise the holistic perspective.
Fig 1: The Unholy Alliance - User and Project Manager Conspire to Avoid Outside “Interference”
It is in neither the project manager's nor the user-client's immediate interest to cooperate for the longer term benefit of others - for example by designing their data structures to be readily useable by others or by providing documentation beyond that needed locally. Chances are that neither of them will be rewarded for doing so, and that both will be supported by their managers in their narrow view. The unholy alliance is equally an obstacle for applications planners, and is such a part of the way that organisations manage IT investments that we cannot expect it to go quietly.
Packaged software, with its pre-defined data structures, independent coding schemes and self-contained documentation is seldom well catered for by traditional data administration approaches, and data administrators frequently find themselves arguing for in-house development. The reality is that organisations are increasingly adopting a “buy not build” philosophy and assigning more weight to the advantages of packaged software than to the data management problems which it may generate.
What is surprising is the continued persistence with the data administration approach in the face of such strong evidence that it does not work. It is not uncommon to see organisations making their second or third attempt at data administration, having seemingly learnt little or nothing from past failures. One government department claimed to be making their seventh try. It is hard not to be reminded of those Marxists who argued for decades that “it just hasn't been given a proper chance”.
2003 Note: Two important developments have occurred since this paper was originally written about five years ago.
The “persistence with the data management approach” has wavered - many organisations have given up on formal data administration groups. Nevertheless, data management professionals continue to push the conventional paradigm, not without some success. We are now seeing more emphasis on data quality issues as a focus for “selling” data management.
The IT community has found other solutions to some of the problems traditionally seen as the province of data management groups. In particular, ERP offers a different approach to data integration (buy one big integrated package); data warehouses and search engines offer solutions to the problems of locating and assembling data for management information purposes; CRM products focus on data integration in a key area; data integration (and the associated need for standardisation) has been facilitated by the growing use of XML.
Back to Basics
The failure of one approach to managing data is no reason to reject the problem as unsolvable. Yet few organisations seem to have exercised their imagination and tried to break out of the data administration paradigm.
To do so, it is worth going back to basics, and revisiting the objectives of data management. Fig 2 shows a reasonably typical (and characteristically ambitious) set of objectives.
Data must be:
Figure 2: Typical Data Management Objectives
The fundamental question we need to ask is: What is the best way of achieving these goals? We could add: Who is best placed to do the work? (And what skills are required?).
The second question is an interesting one. Often the most important progress towards the sorts of goals described in Fig 2 comes from outside the data administration area. A project to develop a common customer database, for example, may represent a data management initiative of more value to the organisation than anything that the data administration team delivers.
A Fresh Approach
In the last few years, frustrated by the limited success of the traditional data administration approach, we have helped several clients to go down a quite different path.
The alternative approach rests on several key ideas:
The Pareto Principle (80-20 rule). Rather than try to tackle everything, the data management team focuses on the areas which are causing the most pain or which offer the most benefit for least investment. Typically, the starting point is a meeting of business managers in which they identify the priorities and quantify the benefits, possibly prompted by a list such as that in Figure 2.
Progress through Projects. This is in direct contrast to the usual “progress through policy / procedures / policing” and involves establishing discrete projects with clear objectives and timetables. The culture of most organisations is far more conducive to projects with a limited time-frame than to new standards and procedures which offer pay-backs only in the long term.
Hard Measures. We aim to tie every initiative to its value to the organisation - measurable value wherever possible - and to have that value assessed and owned by the business - not the data management group. It is not enough to announce that “we will reduce data duplication”; we need to tie it back to the business costs of re-keying, interfaces, maintenance and decisions based on incorrect versions.
Opportunistic Approach to Infrastructure. Data documentation and models are developed as they are needed to support specific projects, rather than as a precursor to being able to do anything useful.
In essence this is a tactical or guerilla approach to data management, directed at tackling today's most pressing problems using whatever techniques will achieve the objective. It is in direct contrast to the more traditional approach which seeks long term benefits across the board.
Data Management teams established under these principles behave and are regarded quite differently from their more traditional counterparts. Results come more quickly, and the emphasis on the big issues and on measurement means that the team spends less time defending its right to exist.
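The first and third ideas above (focus on the areas with the best payoff, and use values assessed by the business) lend themselves to a simple worked illustration. The Python sketch below is only one possible way of doing this, not a method prescribed by this paper. The initiative names, benefit and cost figures, and the `prioritise` function are all hypothetical, invented for illustration. It ranks candidate initiatives by benefit per unit of cost and keeps adding them until the chosen share of the total estimated benefit is covered.

```python
def prioritise(initiatives, share=0.8):
    """Pick the best benefit-for-cost initiatives until `share` of total benefit is covered.

    `initiatives` is a list of dicts with hypothetical keys:
    name, benefit (business-estimated annual value) and cost (estimated investment).
    """
    ranked = sorted(initiatives, key=lambda i: i["benefit"] / i["cost"], reverse=True)
    target = share * sum(i["benefit"] for i in ranked)
    chosen, covered = [], 0
    for item in ranked:
        if covered >= target:
            break  # enough of the total benefit is already covered
        chosen.append(item["name"])
        covered += item["benefit"]
    return chosen


# Hypothetical figures (in thousands of dollars), as supplied by business managers.
candidates = [
    {"name": "customer address clean-up", "benefit": 400, "cost": 50},
    {"name": "shared product codes", "benefit": 250, "cost": 25},
    {"name": "billing interface consolidation", "benefit": 200, "cost": 100},
    {"name": "full corporate data model", "benefit": 100, "cost": 200},
    {"name": "dictionary of archived systems", "benefit": 50, "cost": 80},
]

shortlist = prioritise(candidates)
print(shortlist)
```

With these made-up figures the shortlist contains three of the five candidates. The point is not the arithmetic but the discipline: each initiative carries a value that the business has put its name to.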
Elements of a Tactical Approach
The tactical approach to data management is typified by the current interest in data warehouses. A data warehouse project is a pragmatic short-to-medium term approach to a classic data management objective of making data more readily available to support management information needs. It is likely to have a high level of visibility in the organisation, deliver well-defined results with clear business benefits, and be manageable as a discrete project. It may necessitate the development of data management infrastructure, such as a data dictionary covering the source systems, and a corporate data model as a specification of the data to be held in the warehouse. And it may or may not be initiated and managed by the data management team.
Here are some other examples of traditional data management areas in which the tactical approach has delivered results:
Management Information and Data Mining
Not all management information requirements require the establishment of a data warehouse. In many organisations, there are opportunities to deliver “quick wins” by providing information which may be technically relatively easy to source, but which has not been formally requested in the past. All that is required is to move from the general (“do you have difficulty sourcing management information?”) to the specific (“what exactly do you need?”). The astute data manager will look for issues which are critical to the organisation's goals, management or image and actively seek opportunities to provide supporting information. For example, a data mining project provided hard data which enabled a government agency to respond effectively to criticism of its performance.
Data quality management is emerging as a discipline in its own right, in response to the business costs of incorrect data. Mission statements notwithstanding, data administrators have traditionally focussed more on data structure than content, with responsibility for data quality often falling between systems designers and users.
One of the attractions of data quality as a target for beleaguered data administrators is that its impact is frequently quantifiable and measurable in terms of rework (incorrect addresses), customer complaint handling (incorrect billing), risk (legal action for incorrect advice) or other metrics of clear relevance to the business.
Data quality initiatives usually address either stocks (existing data) or flows (incoming data). Cleaning up data stocks may involve comparison of the accuracy of different databases on an item by item basis, with a view to selecting “best of breed”, whilst problems with data flows may be solved using process re-design techniques. Measurement and statistical skills are frequently required to assess quality and to determine the effect of initiatives.
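As an illustration of the item-by-item comparison described above, the following Python sketch measures the accuracy of each field in two databases against a small verified sample, then names the "best of breed" source for each field. It is a minimal sketch under stated assumptions: the database names (`billing`, `crm`), the sample records and the functions `field_accuracy` and `best_of_breed` are hypothetical, and a real exercise would need a properly drawn sample and matching rules more forgiving than exact equality.

```python
from collections import defaultdict


def field_accuracy(source, verified):
    """Fraction of verified values that `source` holds correctly, per field."""
    hits, totals = defaultdict(int), defaultdict(int)
    for key, truth in verified.items():
        record = source.get(key, {})  # a missing record counts as wrong for every field
        for field, value in truth.items():
            totals[field] += 1
            if record.get(field) == value:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


def best_of_breed(sources, verified):
    """For each field, return the name of the source with the highest accuracy."""
    scores = {name: field_accuracy(data, verified) for name, data in sources.items()}
    fields = sorted({f for s in scores.values() for f in s})
    return {f: max(scores, key=lambda name: scores[name].get(f, 0.0)) for f in fields}


# Hypothetical verified sample, keyed by customer number.
verified = {
    1: {"address": "1 High St", "phone": "555-0101"},
    2: {"address": "2 Low Rd", "phone": "555-0102"},
}
# Two hypothetical databases holding uncoordinated copies of the same items.
billing = {
    1: {"address": "1 High St", "phone": "555-0000"},
    2: {"address": "2 Low Rd", "phone": "555-0102"},
}
crm = {
    1: {"address": "1 High Street", "phone": "555-0101"},
    2: {"address": "2 Low Rd", "phone": "555-0102"},
}

print(field_accuracy(billing, verified))
print(best_of_breed({"billing": billing, "crm": crm}, verified))
```

In this made-up sample the billing database is the better source for addresses and the CRM database for phone numbers, which is the kind of per-item finding that a clean-up of data stocks would act on.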
In many organisations the storage and linking of certain types of data may be a sensitive issue. For example, in the health sector, linking of patient data within hospital, area, state or country or to external data such as police records is likely to be constrained by law, policy and practice - and by assumptions about these! Frequently, policy regulations reflect practices which originated in a manual environment, and they may be unsound, ambiguous, contradictory, or just too difficult to come to grips with. In the absence of clear guidance, systems developers may fail to meet legal requirements, or, in erring on the side of caution, miss opportunities to provide better functionality.
The job of clarifying policy, and even recommending and driving changes, can be (and has been) taken up by data management teams willing to raise their sights high enough.
The grand data management visions of the 1970s and 1980s frequently called for the progressive implementation of “subject area databases” as components of an ultimate corporate database. Packaged software, short term requirements and the unholy alliance contributed to the failure of most such ambitious plans. Nevertheless, there is scope for data management teams to manage or promote the development of “reference databases” containing relatively small volumes of widely used data - for example, common codes, the organisation chart, product list, delivery outlets, policy and procedures. Such databases can provide a relatively high pay-off in data sharing, reduced design costs, and standardisation (especially of data widely used in reporting) for a small investment and minimal perceived interference in the unholy alliance.
Electronic commerce and electronic data interchange in general have greatly increased the importance of data standards. The argument is no longer about consistency across internal systems; it is now about being able to communicate with trading partners, using standards external to the organisation. The data management team is ideally placed to promote and advise on the use of standards, and to offer strong reasons why they should be adhered to.
Data Modelling Consultancy
One area in which data administration teams have usually made a positive impact is in the provision of expertise in data modelling to applications development teams. Although such consultancy may be disguised as “checking adherence to the corporate data model”, it is often the only expert input which the project team receives in this critical area. Even package implementation teams may be in a position to define coding schemes and keys - and without informed data modelling input, they may well make decisions which seriously restrict the ultimate flexibility and integration of the package. It is worth explicitly recognising this service role, independent of the less popular role of policing conformity with the corporate data model.
Putting it into Practice
This paper has presented what might seem an obvious argument: that those charged with better managing data should clarify their objectives then creatively seek ways of achieving them in a cost-effective manner, balancing short and long-term initiatives. Unfortunately, much data management practice is no more than variations on a fundamentally rigid theme. Those seeking to move forward from such a paradigm could begin by asking some fundamental questions and challenging a few sacred cows:
What are the most pressing and important data management issues in our organisation?
What are the most cost-effective ways of addressing them?
Is there adequate return and support to justify proceeding?
What skills do we need to achieve our goals?
Do we need a corporate data model at all? If so, when? If so, what scope and depth are needed for our purposes?
Where should we start?
Experiences with the tactical approach have so far been positive. By concentrating on key issues, visibility and management buy-in are significantly improved. The short-to-medium term focus encourages stakeholders to remain interested and engaged, and visible, measurable results mean that the team is not constantly trying to justify its existence.
Teams operating under this model do need a wider range of skills than are traditionally sought - in particular, data modelling is only one of a broader range of analysis and problem solving tools required. And the emphasis on delivering clear business benefit requires not only an understanding of the business, but also support from line management (often the IS manager) to venture out into the world beyond information systems management.
Goodhue, D.L., Kirsch, L.J., and Wybo, M.D.: The Impact of Data Integration on the Costs and Benefits of Information Systems. MIS Quarterly, Sept 1992.
Moody, D.L. and Simsion, G.C.: Justifying Investment in Information Resource Management. Australian Journal of Information Systems, Sept 1995.
Shanks, G.: Building and Using Corporate Data Models - An Empirical Study. 1st Annual Australasian DAMA Conference, Melbourne, December 1996.