Tips & Tricks: VP Engineering Frank Orozco on the Five Best Practices for Building a Data Warehouse

Edgecast
May 18, 2017

Ever tried to cook in the kitchen of a vacation rental? The spatulas are over there, the knives are somewhere else, and the cheese grater apparently doesn’t exist. No one can get a complete picture of what is available. That was us at Verizon Digital Media Services before our Master Data Management (MDM) system came along. We had a solution for everything, but the problem was that we had too many solutions.

We had NetSuite for customers’ billing, invoice, and usage details. We had a proprietary system that managed customer configurations. We had systems for customer complaint cases, and we had systems that only Sales knew how to use.

What we needed was a master system — one that would show every bit of information about every customer, all in one place. So we decided to build a Data Management Strategy, aggregating our best practices as we went along, helping us move toward our ultimate goal: functional, coherent data, all in one place.

1. Give yourself a time limit
Before we started out, we gave ourselves an ambitious requirement: to deliver value to our stakeholders as soon as possible. This was tough because we didn’t have anyone working full-time on the product. But we told the stakeholders that we’d have something for them in three months: a system that could aggregate some of our Transactional Data by customer name and date.

This approach fit nicely with Edwin Locke’s revered Goal-Setting Theory: that specific and challenging goals lead to far better results than easy, vague goals. Instead of giving ourselves a Sisyphean task, or simply lobbing a softball at our stakeholders (“We’ll try to get you something generally useful within the next 18 months!”), we kept things small and precise.

2. Opt for immediate value
To provide our stakeholders with value and to buy ourselves time to further implement our data strategy down the line, we built a system that would be immediately useful for everyone: a Federated Data Warehouse that gave everyone access to descriptive analytics only.

We had some existing APIs and borrowed a few excellent engineers to build others. With these, we were able to put together a straightforward interface that let the enterprise see “what happened,” so to speak. This approach enabled us to adhere to the agile best practice of delivering working software regularly. Any user of this warehouse could access basic aggregations and historical information just by plugging in the appropriate customer name. The value here was immediate and benefited everyone.
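To make the idea concrete, here is a minimal sketch of a federated query: nothing is copied into a central store, and each source is asked for its records at request time before the results are aggregated by customer name and date. The `Transaction` type, the `fetch_billing` and `fetch_usage` adapters, and the sample rows are all hypothetical stand-ins, not our actual APIs or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Transaction:
    customer_name: str
    day: date
    amount: float


# Hypothetical stand-ins for source-system APIs. Real adapters would call
# each system over the network instead of filtering fixed rows.
def fetch_billing(customer_name: str) -> list[Transaction]:
    rows = [
        Transaction("Company Z", date(2017, 5, 1), 120.0),
        Transaction("Company Z", date(2017, 5, 1), 30.0),
        Transaction("Company Y", date(2017, 5, 1), 75.0),
    ]
    return [r for r in rows if r.customer_name == customer_name]


def fetch_usage(customer_name: str) -> list[Transaction]:
    rows = [Transaction("Company Z", date(2017, 5, 2), 50.0)]
    return [r for r in rows if r.customer_name == customer_name]


SOURCES: list[Callable[[str], list[Transaction]]] = [fetch_billing, fetch_usage]


def daily_totals(customer_name: str) -> dict[date, float]:
    """Query every source at request time and sum amounts per day."""
    totals: dict[date, float] = defaultdict(float)
    for fetch in SOURCES:
        for tx in fetch(customer_name):
            totals[tx.day] += tx.amount
    # Return a plain dict ordered by date for predictable display.
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for day, total in daily_totals("Company Z").items():
        print(day.isoformat(), total)
```

Adding another source in this sketch means appending one more adapter to `SOURCES`, which is roughly why a federated design can deliver value before a full warehouse exists.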

3. Start with the customer
If you’re familiar with the world of Data Management, you know that a standard “best practice” in MDM efforts is to not start with the customer. In fact, we’ve seen a lot of MDM efforts die off because starting with the customer proved too complex.

Before we started working on this project, our customer data was spread out over several different systems, and each system could potentially have a different name for the same customer. In other words, “Company Z” could appear as “Company Z Corp.” in one system and “Company Z LLC” in another. It was madness. If we had started with another form of Reference Data, like a Country Code, we wouldn’t have added value immediately.

So we stuck with one attribute, “Customer Name”, and registered its value in the MDM Hub. We wrote a tool called One Customer View, a simple dashboard that gave each customer an ID and then collected and displayed that customer’s data from all of our disparate systems. Once you found Company Z in the new system, you could see all of Company Z’s sub-accounts from our previous systems. Again, this was immediately valuable to our company, and we would never have accomplished it if we hadn’t started with the customer.
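As an illustration only, the sketch below shows one simple way to map name variants such as “Company Z Corp.” and “Company Z LLC” to a single master ID. The suffix list, the `normalize_name` rule, and the `CustomerHub` class are assumptions made for this example; they are not the actual MDM Hub or One Customer View, and real name matching usually needs human review as well.

```python
import re

# Legal-entity suffixes ignored when comparing names (an assumed list).
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def normalize_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, strip trailing suffixes."""
    words = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    # Keep at least one word so a name is never reduced to nothing.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


class CustomerHub:
    """Toy registry that maps each normalized name to one master ID."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._accounts: dict[int, list[tuple[str, str]]] = {}

    def register(self, system: str, raw_name: str) -> int:
        """Record a (system, name) pair and return its master ID."""
        key = normalize_name(raw_name)
        # Reuse the existing ID for a known name, otherwise issue the next one.
        master_id = self._ids.setdefault(key, len(self._ids) + 1)
        self._accounts.setdefault(master_id, []).append((system, raw_name))
        return master_id

    def accounts(self, master_id: int) -> list[tuple[str, str]]:
        """List every source-system account linked to a master ID."""
        return list(self._accounts.get(master_id, []))


if __name__ == "__main__":
    hub = CustomerHub()
    z_id = hub.register("billing", "Company Z Corp.")
    hub.register("support", "Company Z LLC")
    hub.register("sales", "Company Y")
    print(z_id, hub.accounts(z_id))
```

Looking up the master ID for Company Z then returns both the billing and the support records, which is the kind of single view the dashboard provided.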

4. Sometimes, it’s okay to be really inflexible
Everyone knows that the ability to pivot is a key trait for accomplishing just about everything in life. We certainly made sure to be as flexible as possible for most of our processes, but sometimes flexibility was actually the enemy.

While we were building the Federated Data Warehouse, we were also putting together the physical, structural foundations of our end goal: a Data Lake and an MDM Repository. This meant that we needed to open various ports and obtain certain permissions to access data outside of standard APIs. And guess what? This process took forever. Verizon puts a premium on the security of its data; in fact, nothing is more valuable. So understandably, getting access to data comes with a lot of procedures and checkpoints. We understood that this would be a cost we had to pay on this project, and we budgeted time for it.

5. Think people and processes, not tables and columns
Data may seem impersonal, but one of the greatest mistakes that can be made in an effort like this is to ignore the very real people who are working with the data on a daily basis. Our stakeholders had been working with this data and the associated business processes for years. The technical leaders of this effort knew that coming in with a superior attitude would have doomed the project.

We appointed a data governance committee, which met bi-weekly. Data stewards were responsible for overseeing and reporting on the data in their groups. While data might look like a bunch of robotic numbers, there’s rarely a quick technological fix for data problems. You need a human approach. And without an empowered team and regular communication between people, that approach is hard to achieve.

Want to find out more about our MDM system? Get in touch with us today.

Frank Orozco, VP of Engineering

Edgecast

Formerly Verizon Media Platform, Edgecast enables companies to deliver high performance, secure digital experiences at scale worldwide. https://edgecast.com/