Data Lakes on the Cloud
Cloud Service Providers (CSPs) such as GCP, AWS, and Microsoft Azure provide an easy and inexpensive way to build a Data Lake for an organization’s data. When IT infrastructure, such as databases, is migrated from on-premises to the cloud, that data can be consolidated into a Data Lake. Cloud Data Lakes are becoming increasingly common because CSPs provide a variety of helpful tools to analyze and secure data. Encryption management can be left to the CSP, or the organization can control it itself using Hardware Security Modules (HSMs), its own encryption key management, and tools such as Google Cloud Functions.
- Principle of Least Privilege
As previously noted, the principle of least privilege is the most important practice to maintain in a Data Lake. It ensures that data can be accessed only by those who need it. This prevents every member of an organization from being able to reach all the information in a Data Lake, including sensitive data such as Personally Identifiable Information (PII).
Many organizations divide their Data Lake into zones to make granting access and permissions easier. Organizations usually form four zones: the temporal, raw, trusted, and refined zones. The temporal zone holds temporary data that does not require long-term storage. The raw zone holds sensitive, unencrypted data before it has been processed and secured. The trusted zone holds data that has been deemed secure and is ready to be used in applications; anyone needing processed data, such as end users, will find it here. The final zone, the refined zone, holds data that has been run through other applications and returned as final output.
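Zones make least privilege easier to enforce because permissions can be granted per zone instead of per file. The sketch below is a minimal, hypothetical illustration of that idea: the role names and the role-to-zone grants are assumptions chosen for the example, not a recommended mapping, and in practice these grants would live in the CSP's identity and access management service rather than in application code. The key property it shows is deny-by-default: a role can read a zone only if it has been explicitly granted that zone.

```python
from enum import Enum


class Zone(Enum):
    TEMPORAL = "temporal"
    RAW = "raw"
    TRUSTED = "trusted"
    REFINED = "refined"


# Hypothetical role-to-zone grants. Any role or zone not listed here is denied.
ZONE_GRANTS: dict[str, frozenset[Zone]] = {
    "ingestion_service": frozenset({Zone.TEMPORAL, Zone.RAW}),
    "data_engineer": frozenset({Zone.RAW, Zone.TRUSTED}),
    "analyst": frozenset({Zone.TRUSTED, Zone.REFINED}),
    "end_user": frozenset({Zone.TRUSTED}),
}


def can_read(role: str, zone: Zone) -> bool:
    """Deny by default: allow access only if the role was explicitly granted the zone."""
    return zone in ZONE_GRANTS.get(role, frozenset())


def roles_with_access(zone: Zone) -> list[str]:
    """List, in sorted order, every role that can read the given zone (useful for audits)."""
    return sorted(role for role, zones in ZONE_GRANTS.items() if zone in zones)


if __name__ == "__main__":
    print(can_read("end_user", Zone.TRUSTED))  # True
    print(can_read("end_user", Zone.RAW))      # False: end users never see raw data
    print(roles_with_access(Zone.RAW))         # ['data_engineer', 'ingestion_service']
```

Keeping the raw zone's grant list short, and reviewing it with something like `roles_with_access`, is what keeps unencrypted PII away from end users.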
- Data Encryption
One important step in securing a Data Lake is encrypting its data. By following compliance guidelines such as the Federal Information Processing Standards (FIPS), you can select approved, strong encryption algorithms for your Data Lake, for example AES, which is specified in FIPS 197.
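One way to keep encryption choices aligned with a standard is to check each storage location's configuration against an allow-list. The sketch below assumes a hypothetical `EncryptionConfig` record describing how a bucket is encrypted. The allowed AES key sizes (128, 192, and 256 bits) are the ones defined in FIPS 197; the 256-bit minimum is an illustrative organizational policy layered on top, not a FIPS requirement.

```python
from dataclasses import dataclass

# AES key sizes defined in FIPS 197.
FIPS_197_AES_KEY_BITS = {128, 192, 256}

# Hypothetical organizational policy, stricter than FIPS itself: require 256-bit keys.
MIN_KEY_BITS = 256


@dataclass(frozen=True)
class EncryptionConfig:
    """Hypothetical description of how one Data Lake bucket is encrypted."""
    bucket: str
    algorithm: str  # e.g. "AES"
    key_bits: int


def policy_violations(cfg: EncryptionConfig) -> list[str]:
    """Return reasons why cfg fails the policy; an empty list means it is compliant."""
    problems = []
    if cfg.algorithm.upper() != "AES":
        problems.append(f"{cfg.bucket}: algorithm {cfg.algorithm!r} is not on the allow-list")
    elif cfg.key_bits not in FIPS_197_AES_KEY_BITS:
        problems.append(f"{cfg.bucket}: {cfg.key_bits}-bit AES is not a FIPS 197 key size")
    elif cfg.key_bits < MIN_KEY_BITS:
        problems.append(
            f"{cfg.bucket}: {cfg.key_bits}-bit key is below the {MIN_KEY_BITS}-bit policy minimum"
        )
    return problems


if __name__ == "__main__":
    configs = [
        EncryptionConfig("trusted-zone", "AES", 256),
        EncryptionConfig("raw-zone", "AES", 128),
        EncryptionConfig("legacy-export", "DES", 56),
    ]
    for cfg in configs:
        print(cfg.bucket, policy_violations(cfg) or "compliant")
```

A check like this could run whenever a bucket is created, so a weak or non-approved configuration is caught before any data lands in it.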
- SIEM Tool Use
Security Information and Event Management (SIEM) tools collect and correlate security events to detect threats, ensure compliance, and manage other security issues in an organization’s Data Lake. They help companies protect their Data Lake by finding threats within the IT infrastructure before those threats can compromise data.
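SIEM tools find threats largely through correlation rules that run over normalized log events. As a rough illustration, the sketch below implements one common style of rule: flag any principal that is denied access at least `threshold` times within a sliding time window, which can indicate someone probing zones they should not reach. The `AccessEvent` format, the threshold of five, and the ten-minute window are all hypothetical choices for the example; a real SIEM would define events and rules in its own format.

```python
from collections import defaultdict, deque
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical normalized audit-log record, as a SIEM might ingest it."""
    timestamp: datetime
    principal: str
    zone: str
    allowed: bool


def detect_denied_bursts(
    events: Iterable[AccessEvent],
    threshold: int = 5,
    window: timedelta = timedelta(minutes=10),
) -> set[str]:
    """Return principals with at least `threshold` denied requests inside any `window`."""
    recent: defaultdict[str, deque] = defaultdict(deque)  # principal -> recent denial times
    flagged: set[str] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.allowed:
            continue
        times = recent[event.principal]
        times.append(event.timestamp)
        # Discard denials that are now older than the sliding window.
        while event.timestamp - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(event.principal)
    return flagged


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    log = [AccessEvent(start + timedelta(minutes=i), "svc-a", "raw", False) for i in range(5)]
    log += [AccessEvent(start + timedelta(minutes=15 * i), "svc-b", "raw", False) for i in range(5)]
    print(detect_denied_bursts(log))  # {'svc-a'}: five denials in five minutes
```

Here `svc-b` is also denied five times, but its denials are spread over an hour, so no single window reaches the threshold.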
EC Data Lakes
A great way to begin protecting your organization’s Data Lake is with Encryption Consulting’s training sessions. At Encryption Consulting, we offer a variety of training services, including how to use AWS’ Data Protection Service, GCP’s Key Management Service, and Microsoft Azure’s Key Vault. We can also help install and configure Hardware Security Modules to protect your data encryption keys. Our Cloud Utility Functions, Cloud Data Protector and Bucket Protector, were created specifically with Cloud Data Lake protection in mind. Cloud Data Protector encrypts on-premises data before it is sent to Google Cloud Platform. Bucket Protector works within Google Cloud Platform itself to encrypt data as it is uploaded into buckets. We also offer encryption advisory services and enterprise encryption platform implementation services.