Data storage options

Categories
Data Management
Author
Tanja Kiellisch
Reading time
5 minutes

Data warehouse vs. data lake vs. data mart

If you want to store large amounts of data, you need to consider not only the location (on-premises or off-premises) but also the form. Data warehouse, data lake and data mart are the three most common data repositories for combining a variety of different sources on one platform. In an interview with our data expert Frederic Bauerfeind, we discuss which storage variant is the right choice in which situation and how each can be integrated into the modern data stack.


"If the worm is in here, all analyses will turn rotten."

A data warehouse is a relational database system for analytical queries. It brings together several, mostly heterogeneous, sources in one database, where the data is stored in a structured manner and can be retrieved at any time for further processing. A data warehouse can collect and combine data on a very large scale. While they used to be hosted on-premises, data warehouses are now predominantly based on cloud technologies. Because the volume of data keeps growing, cloud-based modern data warehouses have the advantage that storage can be scaled on demand without depending on a company's own servers.
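To make the idea of "bringing heterogeneous sources together" concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a warehouse. The two source systems (a CRM export as CSV and a shop export as JSON), the table name sales and all column names are assumptions made up for this illustration, not part of any specific product.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical source systems delivering similar data in different formats.
CRM_CSV = "customer_id,amount\n1,120.50\n2,80.00\n"
SHOP_JSON = '[{"customer": 1, "total": 40.0}, {"customer": 3, "total": 15.5}]'


def load_warehouse(conn):
    """Bring both sources into one structured, relational table."""
    conn.execute(
        "CREATE TABLE sales (customer_id INTEGER, amount REAL, source TEXT)"
    )
    crm_rows = [
        (int(r["customer_id"]), float(r["amount"]), "crm")
        for r in csv.DictReader(io.StringIO(CRM_CSV))
    ]
    shop_rows = [
        (int(r["customer"]), float(r["total"]), "shop")
        for r in json.loads(SHOP_JSON)
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", crm_rows + shop_rows)


def revenue_per_customer(conn):
    """An analytical query across all integrated sources."""
    return conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
print(revenue_per_customer(conn))  # [(1, 160.5), (2, 80.0), (3, 15.5)]
```

The point of the sketch is only the pattern: different formats are normalized on the way in, so that one query can answer a question across all sources.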

taod: Frederic, what else characterizes modern data warehouses?

Frederic Bauerfeind: They are incredibly flexible and can be managed by any member of the data team using no-code components. Important for analysts: Business Intelligence Platforms are very easy to integrate. This provides direct access to data in order to create reports and dashboards. What's more, they are ideal for managing users and rights. The topic of data governance cannot really be managed properly without a data warehouse.

The data warehouse is already part of the standard equipment of many companies. How do they know that their chosen solution works?

I know three typical pain points of companies that are dissatisfied with their data warehouse. Firstly, too little computing power. Data volumes and analysis requirements have changed rapidly in recent years. Technologies that were aimed at simple Excel spreadsheets can no longer cope with the requirements of new analysis tools. Secondly: complexity. The more data and source systems are acquired and stored, the more confusing the warehouse can become. Thirdly, a lack of data quality. The data sources are not properly integrated and the entire process chain has become so complex that data quality cannot be checked holistically. If one of these problems arises, companies should consider whether their current warehouse is still up to date.
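As an illustration of the third pain point, the following sketch shows what a very basic data quality check could look like. It runs against an in-memory SQLite database. The table customers, the column customer_id and the helper quality_report are hypothetical names chosen for this example; a real check would cover far more rules.

```python
import sqlite3


def quality_report(conn, table, key):
    """Count rows, missing keys and duplicated keys for one table."""
    # table and key are interpolated into the SQL, so pass trusted identifiers only.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    missing = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()[0]
    duplicates = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
        f"WHERE {key} IS NOT NULL GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {"rows": total, "missing_keys": missing, "duplicate_keys": duplicates}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (2, "Grace H."), (None, "Unknown")],
)
report = quality_report(conn, "customers", "customer_id")
print(report)  # {'rows': 4, 'missing_keys': 1, 'duplicate_keys': 1}
```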

No Modern Data Stack without a Modern Data Warehouse?

Correct. The data warehouse is the single source of truth for all analysts. If the worm is in here, all analyses turn rotten. All downstream processes are no longer valid and are called into question.

How can companies modernize their warehouse?

Off to the cloud. And then, as always, it depends on the company's specific requirements as to which warehouse is suitable. But it's easy to find out and test.

Is the Data Lake something for people who don't like to tidy up?

Yes, also. But of course it is first and foremost a very practical and fast method of collecting and storing particularly extensive data. Analysts sometimes have much better evaluation options with this raw data than with pre-structured data in a warehouse, as they can choose and combine freely.
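The "collect first, structure later" principle can be sketched in a few lines of Python. In this example a temporary directory plays the role of the lake, and records are written unchanged as JSON lines. The helpers land_raw and read_field and all field names are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir, name, records):
    """Write records unchanged as JSON lines; no schema is enforced on write."""
    path = Path(lake_dir) / f"{name}.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in records), encoding="utf-8")
    return path


def read_field(lake_dir, field):
    """Apply structure only at read time: pick one field from every record that has it."""
    values = []
    for path in sorted(Path(lake_dir).glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            record = json.loads(line)
            if field in record:
                values.append(record[field])
    return values


with tempfile.TemporaryDirectory() as tmp:
    land_raw(tmp, "clicks", [
        {"user": "a", "page": "/home"},
        {"user": "b", "page": "/pricing", "ms": 420},
    ])
    land_raw(tmp, "orders", [{"user": "a", "total": 99.0}])
    users = read_field(tmp, "user")
    print(users)  # ['a', 'b', 'a']
```

Records with different shapes sit side by side, and the analyst decides at read time which fields to combine. That freedom is the advantage described above, and also what makes tidying up necessary later.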

What is the so-called data swamp all about?

The masses of data can quickly become so large and confusing that the lake mutates into a swamp and users sink into it. This is the data swamp. This is why a data lake is always a good interim solution, especially for huge amounts of data, but a data warehouse should definitely be connected for further structuring and transformation.

This clearly defines the role of the data lake within the modern data stack.

Anyone who collects an enormous amount of data needs a data lake. Collecting is a good start, and there are several application scenarios for this raw data. So far, though, I don't know of any company that doesn't also need structured data and whose analysts don't prefer to work with BI tools. There is therefore no question that data lakes and data warehouses need to work together in a modern technology stack.

Then there is also the term Data Lakehouse. What exactly is that?

Counter-question: I will give you five animal names. Which of them hides a technology: Elk, Ant, Python, Impala, GNU?

As far as I know, behind every name?

Exactly. Behind each of the animal names is a technology. Many names and terms are simply marketing. And back to our Data Lakehouse: it's old wine in new bottles. New software makes it possible to aggregate data directly in the data lake without first having to copy it into a warehouse. This can make sense in some scenarios. However, the basic principle of the data lake remains the same.
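The principle of aggregating directly on the lake can be sketched as follows. This is a deliberately simplified illustration in plain Python, not how any particular lakehouse product works: a temporary directory stands in for the lake, and the function total_by_key, the file names and the columns region and revenue are assumptions for this example.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def total_by_key(lake_dir, key, value):
    """Aggregate straight over the raw files, without loading them into a warehouse table."""
    totals = defaultdict(float)
    for path in sorted(Path(lake_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                totals[row[key]] += float(row[value])
    return dict(totals)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2024-01.csv").write_text(
        "region,revenue\nnorth,10\nsouth,5\n", encoding="utf-8"
    )
    Path(tmp, "2024-02.csv").write_text(
        "region,revenue\nnorth,2.5\n", encoding="utf-8"
    )
    totals = total_by_key(tmp, "region", "revenue")
    print(totals)  # {'north': 12.5, 'south': 5.0}
```

The files stay where they are and only the result of the aggregation is materialized, which is the scenario in which skipping the copy step can make sense.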

Where are data marts located within the Modern Data Stack?

Data marts are modeled and provided within the data warehouse. Architecturally, the data mart sits directly upstream of the business intelligence tools.

Must-have?

Definitely. Data marts allow data to be grouped and documented for specific user groups, with each mart covering a specific subject area. You could even link them with data from other source systems and enrich them, creating hybrid data marts. So the design options are really extensive.
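One simple way to picture a data mart modeled inside the warehouse is a database view that exposes only one subject area to one user group. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The table sales, the view mart_sales_by_region and the columns are hypothetical, and a view is only one of several ways a mart can be built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL, cost REAL);
    INSERT INTO sales VALUES
        ('north', 'widget', 100.0, 60.0),
        ('north', 'gadget', 50.0, 20.0),
        ('south', 'widget', 80.0, 50.0);

    -- Hypothetical mart for the sales team: revenue per region only.
    -- Cost data is not part of their subject area and stays out.
    CREATE VIEW mart_sales_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM sales
        GROUP BY region;
    """
)

# A BI tool would query the mart, not the underlying warehouse table.
mart_rows = conn.execute(
    "SELECT region, revenue FROM mart_sales_by_region ORDER BY region"
).fetchall()
print(mart_rows)  # [('north', 150.0), ('south', 80.0)]
```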

Data warehouse, data lake and data mart are essential for the modern data stack. How do companies manage the configuration of these stack elements?

These three elements are the linchpin of the modern data stack, and I say that plainly and without exaggeration. If you already have an existing infrastructure and want or need to modernize it, you can do so just as well as someone starting without a solid basic structure: with a detailed analysis of the current situation, the selection and evaluation of possible tools, openness to cloud technologies and motivation.

Rome wasn't built in a day either?

Not Rome. But the Modern Data Stack certainly can be.
