Data Lakes on the Cloud
- Principle of Least Privilege
As previously noted, the principle of least privilege is the most important practice to maintain in a Data Lake. Under this principle, each user or service can access only the data it needs to do its job. This prevents everyone in an organization from having access to all of the information in a Data Lake, including sensitive data such as Personally Identifiable Information (PII).
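As a minimal sketch of least privilege at the column level, the Python below gives each principal an explicit set of readable columns and denies everything else by default, so PII columns stay hidden unless someone is specifically granted them. The `Principal` class, the column names, and the `can_read`/`read_row` helpers are hypothetical illustrations, not part of any particular Data Lake product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """A user or service identity with an explicit set of readable columns (hypothetical)."""
    name: str
    granted_columns: frozenset = frozenset()


def can_read(principal: Principal, column: str) -> bool:
    # Deny by default: a column is readable only if it was explicitly granted.
    return column in principal.granted_columns


def read_row(principal: Principal, row: dict) -> dict:
    # Drop every column the principal was not granted, PII included.
    return {col: val for col, val in row.items() if can_read(principal, col)}


# Usage: an analyst needs order data but not the customer's PII.
analyst = Principal("analyst", frozenset({"order_id", "amount"}))
row = {"order_id": 1, "amount": 9.5, "email": "a@example.com", "ssn": "000-00-0000"}
print(read_row(analyst, row))  # {'order_id': 1, 'amount': 9.5}
```

Starting from an empty grant set means a new principal sees nothing until access is deliberately added, which is the behavior the principle calls for.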
Many organizations divide their Data Lake into zones, which makes granting access and permissions much easier. A common layout uses four zones: temporal, raw, trusted, and refined. The temporal zone holds temporary data that does not require long-term storage. The raw zone holds data that has not yet been processed and secured, so it may be sensitive and unencrypted. The trusted zone holds data that has been processed, deemed secure, and is ready to be used in applications; anyone who needs processed data, such as end users, will find it here. The final zone, the refined zone, holds data that has been run through other applications and returned as final output.
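The zone layout can be sketched as a deny-by-default permission table. This assumes, hypothetically, that each zone is stored under its own top-level path prefix such as `raw/`, and that roles are granted specific actions per zone. The role names, grants, and helper functions are illustrative; a real deployment would express the same idea in its cloud provider's access policies.

```python
from enum import Enum


class Zone(Enum):
    TEMPORAL = "temporal"
    RAW = "raw"
    TRUSTED = "trusted"
    REFINED = "refined"


# Hypothetical roles, each granted only the (zone, action) pairs it needs.
ZONE_GRANTS = {
    "ingestion_service": {(Zone.TEMPORAL, "write"), (Zone.RAW, "write")},
    "data_engineer": {(Zone.RAW, "read"), (Zone.TRUSTED, "write")},
    "analytics_app": {(Zone.TRUSTED, "read"), (Zone.REFINED, "write")},
    "end_user": {(Zone.TRUSTED, "read"), (Zone.REFINED, "read")},
}


def is_allowed(role: str, zone: Zone, action: str) -> bool:
    # Unknown roles get an empty grant set, so they are denied everything.
    return (zone, action) in ZONE_GRANTS.get(role, set())


def can_access_path(role: str, path: str, action: str) -> bool:
    # Assumes each zone lives under its own top-level prefix, e.g. "raw/orders/...".
    prefix = path.split("/", 1)[0]
    try:
        zone = Zone(prefix)
    except ValueError:
        return False  # objects outside a known zone are denied
    return is_allowed(role, zone, action)


print(can_access_path("end_user", "trusted/orders/2024.parquet", "read"))  # True
print(can_access_path("end_user", "raw/orders/2024.parquet", "read"))      # False
```

Keying permissions to whole zones means a new dataset landing in `trusted/` picks up the right access automatically instead of needing its own grants.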
- Data Encryption
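One important step in securing a Data Lake is encrypting its data. Following compliance guidelines such as the Federal Information Processing Standards (FIPS) helps ensure that the encryption algorithms selected for your Data Lake are strong and vetted; for example, FIPS 197 specifies the Advanced Encryption Standard (AES), and FIPS 140 sets security requirements for cryptographic modules.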
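One way such guidelines can feed into day-to-day operations is an automated configuration check. The Python below is a small sketch that audits hypothetical per-zone encryption settings against an allowlist of ciphers. The allowlist contents, the `ZoneEncryptionConfig` fields, and the cipher naming scheme are assumptions for illustration; the authoritative set of approved algorithms comes from current NIST publications and your cryptographic module's FIPS 140 validation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Illustrative allowlist only: AES is specified in FIPS 197 and GCM mode in
# NIST SP 800-38D. Confirm against current NIST guidance before relying on it.
APPROVED_CIPHERS = frozenset({"AES-128-GCM", "AES-256-GCM"})


@dataclass(frozen=True)
class ZoneEncryptionConfig:
    """Hypothetical encryption-at-rest settings for one Data Lake zone."""
    zone: str
    encrypted_at_rest: bool
    cipher: Optional[str] = None


def check_zone(config: ZoneEncryptionConfig) -> Optional[str]:
    """Return a problem description, or None if the zone passes."""
    if not config.encrypted_at_rest:
        return f"{config.zone}: encryption at rest is disabled"
    if config.cipher not in APPROVED_CIPHERS:
        return f"{config.zone}: cipher {config.cipher!r} is not on the approved list"
    return None


def audit(configs: list[ZoneEncryptionConfig]) -> list[str]:
    # Collect every failing zone so all problems are reported at once.
    return [p for p in (check_zone(c) for c in configs) if p is not None]


print(audit([
    ZoneEncryptionConfig("trusted", True, "AES-256-GCM"),
    ZoneEncryptionConfig("refined", True, "DES-CBC"),
    ZoneEncryptionConfig("temporal", False),
]))
```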
- SIEM Tool Use
Security Information and Event Management (SIEM) tools collect and correlate logs and security events in order to detect threats, support compliance, and manage other security issues in an organization’s Data Lake. By finding threats within an IT infrastructure before those threats can compromise data, these tools help companies strengthen the protection of their Data Lake.
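To make threat detection more concrete, here is a toy version of a typical SIEM correlation rule: raise an alert when a single principal triggers several access-denied events within a short sliding window, which can indicate credential misuse or probing of restricted zones. The `Event` fields, the threshold, and the window length are hypothetical; commercial SIEM tools express rules like this in their own query or rule languages rather than in Python.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical access-log record from the Data Lake."""
    timestamp: float  # seconds since the epoch
    principal: str
    allowed: bool


def detect_repeated_denials(events, threshold=5, window_seconds=60.0):
    """Return (principal, timestamp, count) alerts for bursts of denied requests.

    Events must be sorted by timestamp.
    """
    recent = defaultdict(deque)  # principal -> timestamps of recent denials
    alerts = []
    for ev in events:
        if ev.allowed:
            continue
        window = recent[ev.principal]
        window.append(ev.timestamp)
        # Slide the window: discard denials older than window_seconds.
        while ev.timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ev.principal, ev.timestamp, len(window)))
            window.clear()  # reset so one burst yields one alert
    return alerts


log = [Event(float(t), "svc-x", False) for t in (0, 10, 20, 30, 40)]
log += [Event(float(t), "alice", False) for t in (0, 100, 200)]
log.sort(key=lambda e: e.timestamp)
print(detect_repeated_denials(log))  # [('svc-x', 40.0, 5)]
```

Here `alice` is denied three times but far apart, so only the tight burst from `svc-x` is flagged; tuning the threshold and window trades missed detections against alert noise.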