As businesses continue to move their services to cloud platforms such as Google Cloud Platform (GCP) and Amazon Web Services (AWS), the need to protect the data stored there grows as well. Securing that data is referred to as Data Lake Protection. A Data Lake is a single repository where all of an organization’s data is stored. Storing all information in one place has many advantages. Data from different teams can be accessed in one place and analyzed together to find correlations and build better strategies. IT infrastructure also becomes much simpler to manage, as there is only one location where all data is stored. Processes from analysis to auditing become more streamlined as well.
Data Lakes make the majority of processes and tasks much simpler, but because all data sits in one place, Data Lake Protection becomes the number one priority for organizations using them. The main goal of a Data Lake Protection plan is to limit access to data to only those who need it. This is called the principle of Least Privilege. Under Least Privilege, access to each portion of the organization’s data is disabled for anyone who does not need it. For example, if Gary in sales tries to access the human resources records on Roger, he cannot, because the policies in place only allow him access to sales and sales-related data. One way many companies create a Data Lake is by migrating their data to the cloud.
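The Gary example can be sketched as a deny-by-default access check. This is a minimal illustration only: the role names, the user directory, and the `can_read` helper are hypothetical and do not correspond to any particular CSP's access-control API.

```python
# Hypothetical policy table: each role maps to the only data domains it may read.
ROLE_DOMAINS = {
    "sales": {"sales"},
    "hr": {"hr"},
}

# Hypothetical user directory, for illustration only.
USER_ROLES = {
    "gary": "sales",
    "helen": "hr",
}


def can_read(user: str, domain: str) -> bool:
    """Deny by default: allow only if the user's role explicitly lists the domain."""
    role = USER_ROLES.get(user)
    return domain in ROLE_DOMAINS.get(role, set())


print(can_read("gary", "sales"))  # True
print(can_read("gary", "hr"))     # False
```

The point of the design is that nothing is readable unless a policy grants it, so an unknown user or an unlisted domain is refused without any extra rule.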
Data Lakes on the Cloud
Cloud Service Providers (CSPs) like GCP, AWS, and Microsoft Azure provide an easy and inexpensive way of creating a Data Lake for any organization’s data. Migrating IT infrastructure, like databases, from on-premises to the cloud forms a Data Lake. Cloud Data Lakes are becoming more and more common, as CSPs provide a variety of helpful tools to analyze and secure data. Encryption management can be left to the CSP, or the user can control it with Hardware Security Modules, encryption key management, and Google Cloud Functions.
To begin protecting an organization’s Data Lake, there are best practices one should follow:
- Principle of Least Privilege: As previously noted, the principle of least privilege is the most important practice to maintain in a Data Lake. This principle ensures that data can only be accessed by those who need access to it. It prevents the entire organization from having access to everything in the Data Lake, including sensitive information such as Personally Identifiable Information, or PII.
- Zoning: Many organizations divide their Data Lake information into different zones to make granting access and permissions much easier. Organizations will usually form four zones, called the temporal, raw, trusted, and refined zones. The temporal zone holds temporary data that does not require long-term storage. The raw zone holds sensitive, unencrypted data that has not yet been processed and secured. The trusted zone holds data that has been deemed secure and is ready to be used in applications. Anyone needing processed data, such as end users, will find it in the trusted zone of the Data Lake. The final zone, the refined zone, holds data that has been run through other applications and returned as a final output.
- Data Encryption: One important step in securing Data Lakes is the use of data encryption. Following compliance guidelines, such as the Federal Information Processing Standards (FIPS), helps you select strong, approved encryption algorithms for your Data Lake.
- SIEM Tool Use: Security Information and Event Management (SIEM) tools and software work to detect threats, ensure compliance, and manage any other security issues in an organization’s Data Lake. These tools help companies provide the highest level of Data Lake Protection possible by finding threats within an IT infrastructure before those threats can compromise data.
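The zoning practice described above can be sketched in a few lines. This is only an illustration under stated assumptions: the role names, the `ZONE_READERS` mapping, the `secured` flag, and the `promote_to_trusted` helper are hypothetical, and which roles may read each zone is a policy choice for each organization rather than something the four-zone model prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    TEMPORAL = "temporal"
    RAW = "raw"
    TRUSTED = "trusted"
    REFINED = "refined"


# Hypothetical mapping of zones to the roles allowed to read them.
ZONE_READERS = {
    Zone.TEMPORAL: {"pipeline"},
    Zone.RAW: {"pipeline"},
    Zone.TRUSTED: {"pipeline", "analyst", "end_user"},
    Zone.REFINED: {"pipeline", "analyst"},
}


@dataclass
class Dataset:
    name: str
    zone: Zone
    secured: bool = False  # set once the data has been processed and secured


def promote_to_trusted(dataset: Dataset) -> Dataset:
    """Move a dataset from raw to trusted only after it has been secured."""
    if dataset.zone is not Zone.RAW:
        raise ValueError("only raw datasets can be promoted to trusted")
    if not dataset.secured:
        raise ValueError("dataset must be secured before promotion")
    return Dataset(dataset.name, Zone.TRUSTED, secured=True)


def role_can_read(role: str, dataset: Dataset) -> bool:
    """Grant access per zone instead of per dataset."""
    return role in ZONE_READERS[dataset.zone]


orders = Dataset("orders", Zone.RAW)
print(role_can_read("end_user", orders))   # False
orders.secured = True
trusted = promote_to_trusted(orders)
print(role_can_read("end_user", trusted))  # True
```

Because permissions attach to zones rather than to individual datasets, granting a new end user access means adding one role to one zone, which is the simplification zoning is meant to provide.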
EC Data Lakes
A great way to begin protecting your organization’s Data Lake is by utilizing Encryption Consulting’s training sessions. At Encryption Consulting, we offer a variety of training services, including learning to use AWS’ Data Protection Service, GCP’s Key Management Services, and Microsoft Azure’s Key Vault. We can also help install and configure Hardware Security Modules to protect your data encryption keys. Our Cloud Utility Functions, Cloud Data Protector and Bucket Protector, were created specifically with Cloud Data Lake Protection in mind. Cloud Data Protector encrypts on-premises data before it is sent to Google Cloud Platform. Bucket Protector works within Google Cloud Platform itself, to encrypt data as it is uploaded into buckets. Encryption advisory services and enterprise encryption platform implementation services are also offered.
As you can see, Data Lakes provide organizations with a multitude of benefits, from making processes simpler to cutting back on infrastructure costs. By creating zones, encrypting data, using SIEM tools and software, and following the principle of least privilege, you can keep your Data Lake well protected against attempted compromise. To learn more about the services Encryption Consulting can offer you, visit our website: www.encryptionconsulting.com.