Data Migration Best Practices for Reducing Infrastructure Costs
Steve Winkler | September 29, 2016
The potential costs of data migration generally fall into several categories, including commercial off-the-shelf (COTS) tools, server infrastructure, software development, and consultation with technology partners. And all of that comes on top of the cost of acquiring and deploying the new operating platform that you’re migrating to.
This can be a frustrating and expensive realization. You’ve probably already spent a significant amount of your budget on the new platform itself and its dependent infrastructure. The last thing you want to do is spend additional resources on Extract, Transform, and Load (ETL) infrastructure for the data migration process, unless you absolutely have to.
What’s more, some of these infrastructure products could cost you hundreds of thousands of dollars (or more), depending on how many servers you need to deploy them on, and yet you may only need to use them in production for a few days (or even hours) during the migration process. So it makes sense to consider using data migration best practices to control your related infrastructure costs.
Consider your options — and their costs
Fortunately, there are a variety of low-cost (or even no-cost) tools that you can use to conduct the ETL process, or its alternative, Extract, Load, and Transform (ELT).
When considering tools, start by reviewing the ones that you have already purchased as part of your new data platform. Many database operating environments — whether Microsoft, Oracle, IBM, or others — include embedded tools that help facilitate the process of querying, bulk loading, importing, or exporting data. More often than you might think, these embedded tools will be sufficient for your needs.
In other cases, you might need to acquire expanded capabilities. One option is to pay developers to write code that handles any specialized ETL/ELT needs during the migration process, and to combine that code with the embedded database tools. I’ve found that common scripting tools (Unix Shell, Perl, Python, and Microsoft PowerShell) can be enormously useful for this and are very easy to develop with. Another option is to consider emerging open source products, which can be leveraged for the most common ETL/ELT patterns required for data migration. And of course, there are multiple industry-proven COTS options for helping organizations meet their data migration needs (although they come at a cost).
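To make the scripting option concrete, here is a minimal sketch of a small ETL step written in Python using only its standard library. Everything specific in it is an assumption for illustration: the customers table, its three columns, the CSV input, and the use of SQLite as a stand-in for whatever target database you are actually migrating to. A real migration would more likely hand the load step to the bulk loader that ships with your platform.

```python
import csv
import io
import sqlite3


def migrate_rows(source_csv, conn, batch_size=1000):
    """Extract rows from CSV text, clean them up, and load them in batches.

    The 'customers' table and its columns are hypothetical; SQLite stands in
    for the real target platform.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    reader = csv.DictReader(io.StringIO(source_csv))  # Extract
    batch = []
    loaded = 0
    for row in reader:
        # Transform: trim stray whitespace and normalize email casing.
        batch.append(
            (int(row["id"]), row["name"].strip(), row["email"].strip().lower())
        )
        if len(batch) >= batch_size:
            # Load: insert a full batch in one call instead of row by row.
            conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", batch)
            loaded += len(batch)
            batch = []
    if batch:
        # Load whatever is left over after the last full batch.
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", batch)
        loaded += len(batch)
    conn.commit()
    return loaded


if __name__ == "__main__":
    sample = "id,name,email\n1, Ada ,ADA@Example.com\n2,Bob,bob@example.com\n"
    target = sqlite3.connect(":memory:")
    print(migrate_rows(sample, target), "rows loaded")
```

The point of the sketch is that the extract, transform, and load steps are each visible in a few lines, which makes it easier to see exactly how data is pulled from the source and pushed into the target.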
The underlying strategy
Data migration best practices suggest that you should first determine whether your embedded infrastructure tools are sufficient for the job. If so, that’s great — and you can move ahead.
If not, you’ll want to do a cost/benefit analysis of your other options: paying for custom development, buying COTS products, or developing with open source solutions. Expect that some code development is going to be required, no matter which option you use. It’s rarely the “drag-and-drop” process that some tools promise, and when it is, you can almost always do the same thing with embedded tools.
Be prepared to spend time optimizing your ETL approaches. ETL tools (both commercial and open source) can distract data migration developers from the critical need to understand the details of how information is being pulled from data sources and pushed into targets. Remember that every efficiency realized saves your organization critical migration time or server costs. Tool vendors have effective solutions for splitting your migration up into parallel parts to save time, but each of those parts can incur additional server and/or software licensing costs.
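The trade-off between migration time and parallel server cost can be sketched with some simple arithmetic. The model below is a deliberately rough assumption, not a vendor pricing formula: it assumes the work divides evenly across servers with no coordination overhead, and that each server carries both an hourly running cost and a flat per-server license fee. The function name, parameters, and all dollar figures are hypothetical.

```python
def parallel_plan(single_server_hours, servers, hourly_server_cost, license_per_server):
    """Estimate elapsed time and total cost when a migration is split evenly.

    Assumes ideal linear scaling, which real migrations rarely achieve.
    """
    elapsed_hours = single_server_hours / servers
    # Total server-hours stay constant under ideal scaling; the per-server
    # license fee is what grows as more parallel parts are added.
    total_cost = servers * (elapsed_hours * hourly_server_cost + license_per_server)
    return elapsed_hours, total_cost


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        hours, cost = parallel_plan(48, n, 2.0, 500.0)
        print(f"{n} server(s): {hours:.1f} h elapsed, ${cost:,.2f} total")
```

Under these assumptions, going from one server to four cuts a 48-hour run to 12 hours, but the total rises from $596 to $2,096 because the license fee is paid four times. Whether that is worth it depends on how much the shorter migration window is worth to you.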
Look to the cloud?
Another possible solution is the use of cloud infrastructure, but there are pros and cons to doing so. If you’re only looking for server infrastructure to use while you migrate for a specific modernization effort, acquiring and leveraging cloud infrastructure could provide you with a measure of speed and flexibility that could be very useful. Cloud infrastructure is also often well suited to development and test environments, since it can be implemented very quickly and decommissioned when development and testing are complete.
The downside of moving large volumes of data into the cloud, however, is that available network bandwidth can still be a significant limitation. If you have to move significant amounts of information over your network in a reasonable amount of time, using cloud infrastructure may not be optimal.
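A quick back-of-the-envelope calculation shows why bandwidth matters. The sketch below estimates transfer time from data volume and link speed; the function name and the utilization parameter (the share of the link you can realistically dedicate to the migration) are assumptions for illustration, and the figures are examples rather than measurements. It uses decimal units (1 TB = 10^12 bytes) and ignores protocol overhead and compression.

```python
def transfer_hours(data_terabytes, link_megabits_per_sec, utilization=1.0):
    """Estimate hours needed to move a data volume over a network link.

    utilization is the fraction of the link actually available (0 to 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be greater than 0 and at most 1")
    total_bits = data_terabytes * 1e12 * 8            # terabytes -> bits
    effective_bits_per_sec = link_megabits_per_sec * 1e6 * utilization
    return total_bits / effective_bits_per_sec / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical scenario: 10 TB over a 1 Gbps link, half of it available.
    print(f"{transfer_hours(10, 1000, 0.5):.1f} hours")
```

With these example numbers, 10 TB over a fully available 1 Gbps link takes a little over 22 hours, and twice that if only half the link is free. Running the estimate with your own volumes and link speeds is a cheap way to check whether a cloud-based migration fits your window.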
Infrastructure as a resource
At the end of the day, infrastructure is just another resource to be managed in the modernization program. Data migration best practices emphasize leveraging whatever you have, and optimizing wherever you can. Then, and only then, buy more of what you need to get the job done.