
Data Lake

Definition updated April 2026

What is a data lake?

A data lake is a storage repository that holds large volumes of raw data in its native format (structured, semi-structured, and unstructured) until it is needed for analysis. Unlike a data warehouse, which requires a schema to be defined before data is loaded (schema-on-write), a data lake stores data as-is and applies a schema when the data is read (schema-on-read).
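To make schema-on-read concrete, here is a minimal sketch in Python. It assumes the raw data is stored as JSON Lines; the sample records, the `ORDER_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical names chosen for illustration, not part of any particular data lake product.

```python
import json
from datetime import datetime

# Raw records kept exactly as they arrived; fields vary between records.
RAW_LINES = [
    '{"id": "1", "amount": "19.99", "ts": "2026-04-01T10:00:00"}',
    '{"id": "2", "amount": "5", "note": "no timestamp"}',
]

# Hypothetical schema chosen by the reader at query time: field -> converter.
ORDER_SCHEMA = {
    "id": int,
    "amount": float,
    "ts": datetime.fromisoformat,
}


def read_with_schema(lines, schema):
    """Parse raw JSON lines, keeping only schema fields and casting them."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the whole read.
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, ORDER_SCHEMA)
```

The stored lines are never modified. A different reader could apply a different schema to the same raw data, for example one that also keeps the `note` field.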

Data lakes are often the first landing zone for data from APIs and other sources before it is transformed into a warehouse-ready format. Because the raw data is kept, nothing is discarded at ingestion, and new structures can be derived later as requirements evolve.
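A landing zone can be sketched as a function that writes each payload unmodified into a partitioned directory. The example below uses a temporary local directory in place of real lake storage; the `land_raw` helper, the `source=` / `date=` layout, and the `example_api` source name are assumptions made for illustration.

```python
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root, source, payload, ingest_date):
    """Write one raw payload, byte for byte, under source=/date= partitions."""
    partition = (
        Path(lake_root)
        / f"source={source}"
        / f"date={ingest_date.isoformat()}"
    )
    partition.mkdir(parents=True, exist_ok=True)
    # Number files by how many already exist in the partition.
    index = len(list(partition.iterdir()))
    path = partition / f"part-{index:05d}.json"
    path.write_text(payload, encoding="utf-8")
    return path


# A temporary directory stands in for the lake's storage in this sketch.
lake_root = tempfile.mkdtemp()
raw = '{"user": {"id": 7, "tags": ["a", "b"]}, "extra": null}'
landed = land_raw(lake_root, "example_api", raw, date(2026, 4, 1))
```

Nothing is parsed or filtered on the way in, so fields that no current consumer needs (such as `extra` here) remain available if requirements change.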

The practical challenge with data lakes is governance. Without good cataloging and metadata management, a data lake becomes a 'data swamp': the data is present, but nobody knows what is in it, where it came from, or whether it is trustworthy. Pairing a data lake with a data catalog that records each dataset's contents and origin is best practice.
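As a rough illustration of what a catalog records, the sketch below builds a metadata entry answering the three questions above: what is in the data, where it came from, and whether it can be checked for integrity. The `catalog_entry` and `register` helpers, the in-memory `catalog` dictionary, and all field names are hypothetical; a real catalog would be a dedicated service rather than a Python dict.

```python
import hashlib
from datetime import datetime, timezone


def catalog_entry(dataset, source, owner, content, fields):
    """Describe one dataset: contents, origin, and an integrity checksum."""
    return {
        "dataset": dataset,
        "source": source,          # where the data came from
        "owner": owner,            # who to ask about it
        "fields": sorted(fields),  # what is in it
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def register(catalog, entry):
    """Store the entry in the catalog, keyed by dataset name."""
    catalog[entry["dataset"]] = entry
    return entry


catalog = {}
content = b'{"id": 1, "amount": 19.99}\n'
entry = register(
    catalog,
    catalog_entry("orders_raw", "example_api", "data-team", content, ["id", "amount"]),
)
```

Recomputing the SHA-256 of the stored file and comparing it with the catalog value is one simple way to confirm that the data has not changed since it was registered.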

Related Terms

Data Warehouse, ETL, Big Data
