How Metadata Management Can Accelerate Data Migration
Steve Winkler | October 5, 2016
Some of the biggest challenges facing an organization that seeks to migrate its data are the large volumes of data residing in the legacy system, and the data’s inconsistent structure and quality. Typically, the data must be assessed, cleaned, or otherwise transformed before it can be moved. Even more challenging, the older data repositories may have been created using obsolete or proprietary applications. Documentation is incomplete, insufficient or non-existent. To make it even more difficult, the staff member who designed and built the database retired 12 years ago and is living like a monarch in Provence. Anyone who’s worked on more than a couple of modernization programs has been there.
What does my data really look like?
Data migration efforts are inherently risky, frequently due to assumptions that are easy to make. For example, you might assume that your stakeholders and users know and understand all the data in the current platform, and that you can wait until the project’s end to completely familiarize yourself with the data. Neither is a prudent assumption. It’s going to take time to understand what your data looks like, so you’d better start early. Don’t ask your stakeholders to tell you about the data; instead, use the metadata you find to acquire the understanding you need, and then ask your stakeholders how to deal with the exceptions. It’s a far better use of their valuable time and effort. Maximize the information you acquire from metadata to automate data analysis, identify valid information patterns, and isolate outliers.
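As a rough illustration of identifying valid patterns and isolating outliers, the Python sketch below reduces each value in a column to a "shape" (letters become A, digits become 9), counts the shapes, and flags values whose shape is rare. The function names, the sample ZIP code column, and the 20 percent threshold are all assumptions made for this example, not part of any particular tool.

```python
import re
from collections import Counter


def shape(value):
    """Reduce a value to a pattern: letters become 'A', digits become '9'."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", value)


def profile(values, min_share=0.2):
    """Count value shapes and return (shape counts, values with rare shapes).

    min_share is a hypothetical threshold: a shape seen in fewer than this
    fraction of the values is treated as an outlier worth reviewing.
    """
    values = list(values)
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    common = {s for s, n in counts.items() if n / total >= min_share}
    outliers = [v for v in values if shape(v) not in common]
    return counts, outliers


# Made-up sample column: nine well-formed ZIP codes and one placeholder.
zip_codes = ["90210", "10001", "60614", "73301", "30301",
             "02134", "98101", "33101", "80202", "N/A"]
counts, outliers = profile(zip_codes)
print(counts.most_common())
print(outliers)
```

The dominant shapes describe what the data really looks like; the short list of outliers is what you bring to your stakeholders.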
Metadata force multipliers
A good development team is always looking for ways to deliver products more effectively. Development tools frequently provide wizards, templates, and scaffolding capability to speed the production of common solutions. These approaches help each individual developer become more productive by automating repetitive tasks. Data migration problems lend themselves well to such standardized procedures, since data sets are frequently processed in very similar ways during the migration.
The rules for handling some data types may change, as may the specific fields and data items represented in each data set, but the work still follows a pattern. Once you’ve identified that pattern, a comprehensive evaluation of the metadata in the source environment lets you start leveraging existing tools and building new ones to greatly expand developer productivity.
Innovative tools can be created to generate migration code and associated functional tests using available sources such as COBOL copybooks and Oracle metadata. Such tools can empower developers to work much more quickly and shorten the overall development time. In addition, by leveraging generated code for repetitive tasks, one can greatly reduce the risk of programmer error.
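To make the idea concrete, here is a minimal Python sketch of a generator driven by a COBOL copybook. It assumes a deliberately simplified copybook that contains only PIC X(n) and PIC 9(n) elementary items; real copybooks also use clauses such as OCCURS, REDEFINES, and COMP-3, which this sketch does not handle. The record layout, table name, and function names are hypothetical.

```python
import re

# Matches simple elementary items such as "05  CUST-ID  PIC 9(6)."
# Assumption: only PIC X(n) and PIC 9(n) appear. Any other clause is
# skipped, which would make the computed offsets wrong on a real copybook.
FIELD_RE = re.compile(r"^\s*\d{2}\s+([A-Z0-9-]+)\s+PIC\s+([X9])\((\d+)\)\.\s*$")


def parse_copybook(text):
    """Return a list of field descriptions with byte offsets and lengths."""
    fields, offset = [], 0
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # group items, blank lines, unsupported clauses
        name, kind, length = match.group(1), match.group(2), int(match.group(3))
        fields.append({"name": name.replace("-", "_").lower(),
                       "kind": kind, "offset": offset, "length": length})
        offset += length
    return fields


def generate_ddl(table, fields):
    """Generate a CREATE TABLE statement for the target platform."""
    columns = []
    for f in fields:
        sql_type = "NUMERIC" if f["kind"] == "9" else "VARCHAR"
        columns.append(f"    {f['name']} {sql_type}({f['length']})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"


def extract_record(record, fields):
    """Slice one fixed-width record into a dict keyed by column name."""
    row = {}
    for f in fields:
        raw = record[f["offset"]:f["offset"] + f["length"]]
        row[f["name"]] = int(raw) if f["kind"] == "9" else raw.rstrip()
    return row


COPYBOOK = """
       01  CUSTOMER-RECORD.
           05  CUST-ID      PIC 9(6).
           05  CUST-NAME    PIC X(20).
           05  CUST-ZIP     PIC X(5).
"""

fields = parse_copybook(COPYBOOK)
ddl = generate_ddl("customer", fields)
row = extract_record("000042" + "Ada Lovelace".ljust(20) + "90210", fields)
print(ddl)
print(row)
```

Because the same field list drives both the target table definition and the record extraction, a functional test for each data set can be generated from that list as well, rather than written by hand.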
A better solution
Everyone in this situation is presented with a fundamental choice: work smart or work hard. These may sound like equivalent options, but unfortunately we’re talking about internal resources, systems integrators, contractors, consultants, and time and materials task orders (don’t even get me started on change orders). Let’s face it: “work hard” is really expensive. The type of work we are talking about here lends itself to the “work smart” option. By collecting, analyzing, and leveraging metadata (that is, data about the data), project teams can create tools to automate the process of analyzing, extracting, transforming, and loading legacy data into the new platform.
The solution begins with metadata management: determining how the data is organized and structured in the legacy system. This metadata can exist in many formats, and the trick is to find the metadata that will be most effective in streamlining the process. Legacy systems almost always include certain types of metadata that describe the way data is organized and stored. In most cases, this provides sufficient insight (and repeatable patterns) for developers to automate repetitive migration activities. The potential sources for metadata are myriad: database schemas, COBOL copybooks, XML schemas, and the list goes on.
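As one example of harvesting metadata from a database schema, the Python sketch below builds a column inventory from a database catalog. It uses an in-memory SQLite database purely as a stand-in for a legacy system so that the example is self-contained; other databases expose similar information through their own catalogs (Oracle, for instance, through data dictionary views such as ALL_TAB_COLUMNS). The table and the function name are hypothetical.

```python
import sqlite3


def describe_tables(conn):
    """Build a {table: [column metadata]} inventory from the SQLite catalog."""
    inventory = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        inventory[table] = [
            {"name": c[1], "type": c[2],
             "nullable": not c[3], "primary_key": bool(c[5])}
            for c in columns
        ]
    return inventory


# Stand-in "legacy" schema, created here only so the sketch can run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, cust_name TEXT NOT NULL, cust_zip TEXT)"
)
inventory = describe_tables(conn)
for column in inventory["customer"]:
    print(column)
```

An inventory like this is the raw material for the repeatable patterns described above: once every table is described in the same structure, the same generator can be applied to all of them.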
For optimal results, it’s essential to have experience and insight working in the environment in which the data was created and stored, whether it’s a mainframe environment, mid-range systems, UNIX/Linux, or Windows. In addition, it’s vital to have a solid understanding of the requirements and constraints of the target platform.
Early due diligence for long-term impact
Gathering and analyzing available metadata at the outset of a data migration project can require some additional time and effort. However, once you’ve collected and leveraged metadata through effective metadata management, each developer or analyst can do the work of five or ten. As a result, you can significantly accelerate the migration process, particularly when numerous data sets/tables are being migrated. Using the metadata at hand, automation can be applied to bring far greater consistency to the data during migration to the modernized platform.