Big Data refers to collecting and analyzing very large volumes of data, which gives organizations deeper insight that can drive better business decisions and greater customer satisfaction. Securing Big Data is difficult not only because of the sheer amount of data involved, but also because data streams in continuously, arrives in many different types, and is often kept in cloud-based storage.
Some of the major challenges in securing Big Data are:
- Secure Computations: Big Data technologies use distributed programming frameworks to process large amounts of data, and frameworks such as MapReduce provide few built-in security protections. In MapReduce, the input data is split into chunks, each chunk is processed by a mapper, and the mapper's output is written to storage for the next stage. Because mappers have no additional security layer, anyone who can change a mapper's settings can manipulate the data being processed, and such untrusted mappers are very difficult to detect. Securing the computations handled by these distributed frameworks is therefore essential to maintaining the integrity of the data.
- Protecting Data and Transaction Logs: Because of their size, data and transaction logs are stored in multi-tiered storage environments with auto-tiering functionality. Auto-tiering moves data between tiers automatically but does not keep track of where the data ends up, which can expose new vulnerabilities: data may sit in unknown physical locations or on untrusted storage devices, and the organization can lose control over it. Data transmission between tiers can also reveal information about user activities and data properties that attackers can exploit. Data and transaction logs therefore need to be protected to maintain the confidentiality, integrity, and availability of the data.
- Validation of Inputs from Endpoints: Big Data systems collect data from a wide variety of input devices, including endpoints, and may ingest logs from a large number of devices and applications. Some of that input may be rogue data sent by an untrusted endpoint, which can distort the organization's analytical outputs. The challenge is to validate every input the system receives to ensure that it came from a trusted source.
- Secure Non-Relational Data Stores: Non-relational (NoSQL) data stores are increasingly used in Big Data technologies, but at the time of writing many of them are not yet mature or secure enough. Common weaknesses include no encryption support for data files, so data at rest remains unencrypted, and weak authentication between client and server, both of which can lead to privacy threats.
- Privacy-Preserving Data Analytics: Privacy is an important issue when applying Big Data technologies to analytics. As more and more data is collected, aggregating and analyzing it can result in violations of user privacy. If the analytics is outsourced, an untrusted employee of the third party could infer users' personal information. Organizations want to use Big Data analytics tools to improve customer satisfaction, but they must protect user privacy while doing so.
- Access Control: Big Data systems handle many kinds of data, including sensitive data such as users' Personally Identifiable Information (PII), and many legal and compliance requirements govern how that data must be protected. Granular access control policies should be implemented so that only authorized users have access to sensitive user data and to the analytics performed on those data sets. This is needed to ensure the confidentiality of the data.
- Real-Time Security Monitoring: Big Data infrastructure and the analytics it runs need real-time security monitoring. This has always been difficult because of the large number of alerts generated by devices, many of which are false positives. As a result, companies often struggle to monitor their data in real time.
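For the Secure Computations challenge, one commonly discussed way to catch an untrusted mapper is to run the same map task on more than one worker and compare the results. The sketch below is a minimal, single-process illustration of that replication idea under our own assumptions; it is not a built-in MapReduce or Hadoop feature, and `word_count_mapper`, `tampered_mapper`, and `verify_replicated_map` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Iterable

Pair = tuple[str, int]
Mapper = Callable[[str], list[Pair]]

def word_count_mapper(chunk: str) -> list[Pair]:
    """Honest mapper: emits (word, 1) for every word in the chunk."""
    return [(word, 1) for word in chunk.lower().split()]

def tampered_mapper(chunk: str) -> list[Pair]:
    """Hypothetical compromised mapper that silently drops the word 'refund'."""
    return [(word, 1) for word in chunk.lower().split() if word != "refund"]

def output_digest(pairs: list[Pair]) -> str:
    """Order-independent SHA-256 digest of a mapper's output."""
    canonical = json.dumps(sorted(pairs)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def verify_replicated_map(chunk: str, replicas: Iterable[Mapper]) -> bool:
    """Run the same chunk through every replica; True only if all output digests agree."""
    digests = {output_digest(mapper(chunk)) for mapper in replicas}
    return len(digests) == 1

chunk = "issue refund to customer"
print(verify_replicated_map(chunk, [word_count_mapper, word_count_mapper]))
print(verify_replicated_map(chunk, [word_count_mapper, tampered_mapper]))
```

Replication only exposes tampering when the replicas disagree: a mapper compromised on every worker in the same way, or tampering that does not affect a given chunk, goes unnoticed, and every replica multiplies the compute cost.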
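For the transaction-log challenge, one way to make logs tamper-evident as they move between storage tiers is to hash-chain the entries, so that each record commits to the one before it. This minimal sketch assumes a simple in-memory list of records; `append_entry` and `verify_chain` are hypothetical helpers, and the technique covers integrity only, not confidentiality or tracking where data is physically stored.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous entry's hash plus this entry's canonical payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()

def append_entry(log: list[dict], payload: dict) -> None:
    """Append a payload, chaining it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; an edited, reordered, or removed-from-the-middle entry breaks the chain.

    Truncating entries from the end is NOT detected unless the latest hash is stored elsewhere.
    """
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, {"txn": 1, "amount": 100})
append_entry(log, {"txn": 2, "amount": 250})
print(verify_chain(log))
log[0]["payload"]["amount"] = 999  # simulate tampering on an untrusted tier
print(verify_chain(log))
```

A plain hash chain is only useful if the latest hash is kept somewhere an attacker cannot rewrite (or the entries are keyed with an HMAC); otherwise someone with write access could simply recompute the whole chain.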
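For validating inputs from endpoints, a common approach is to have each device sign its messages with a per-device secret and verify that signature when the data is ingested. The sketch below uses Python's standard `hmac` module; the `DEVICE_KEYS` registry, the device IDs, and the message format are hypothetical, and key provisioning and rotation are out of scope.

```python
import hashlib
import hmac
import json

# Hypothetical registry of per-device secrets shared with the ingestion tier.
DEVICE_KEYS = {"sensor-17": b"k3y-for-sensor-17", "gateway-02": b"k3y-for-gateway-02"}

def _canonical(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(payload, sort_keys=True).encode("utf-8")

def sign(device_id: str, payload: dict) -> dict:
    """Endpoint side: attach an HMAC-SHA256 tag over the canonical payload."""
    tag = hmac.new(DEVICE_KEYS[device_id], _canonical(payload), hashlib.sha256).hexdigest()
    return {"device_id": device_id, "payload": payload, "tag": tag}

def is_trusted(message: dict) -> bool:
    """Ingestion side: accept only messages from known devices with a valid tag."""
    key = DEVICE_KEYS.get(message.get("device_id"))
    if key is None:
        return False  # unknown endpoint
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message.get("tag", ""))  # constant-time comparison

msg = sign("sensor-17", {"temp_c": 21.5})
print(is_trusted(msg))
```

A valid tag only shows that the message came from a holder of that device's key and was not altered in transit. It cannot tell whether a compromised device is sending rogue values, and this sketch has no replay protection (a timestamp or nonce in the signed payload would be needed).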
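For privacy-preserving analytics, differential privacy is one well-known technique (the challenge above does not prescribe a specific one). The sketch below applies the Laplace mechanism to a count query: noise with scale sensitivity/epsilon is added, so any single user's presence changes the output distribution only slightly. `private_count`, the sample records, and the epsilon value are illustrative assumptions; a production system would use a vetted differential-privacy library and track the overall privacy budget.

```python
import random
from typing import Callable

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(
    records: list[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Epsilon-DP count of matching records via the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon: smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Example: how many customers are over 60, released with epsilon = 0.5.
# `random` is not a cryptographically secure source; it is used here only for illustration.
customers = [{"age": a} for a in (23, 35, 41, 58, 62, 70)]
noisy = private_count(customers, lambda r: r["age"] > 60, epsilon=0.5, rng=random.Random(7))
print(f"noisy count: {noisy:.2f}")  # true count is 2; the released value depends on the noise
```

The analyst only ever sees the noisy value, so an untrusted third party running many such queries learns aggregate trends while any one customer's record has a bounded effect on what is released.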
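Granular access control is often expressed as policies that map user roles to data classifications. The sketch below is a hypothetical, deny-by-default check at the column level; the roles, the `pii` and `internal` tags, and the `can_read` and `visible_columns` helpers are assumptions for illustration, not any particular product's access model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]

# Hypothetical column classifications for a customer dataset.
COLUMN_TAGS = {"customer_id": "internal", "email": "pii", "ssn": "pii", "total_spend": "internal"}

# Hypothetical policy: which roles may read each classification.
POLICY = {"internal": {"analyst", "privacy_officer"}, "pii": {"privacy_officer"}}

def can_read(user: User, column: str) -> bool:
    """Deny by default: unclassified columns or unlisted roles get no access."""
    tag = COLUMN_TAGS.get(column)
    if tag is None:
        return False
    return bool(user.roles & POLICY.get(tag, set()))

def visible_columns(user: User, columns: list[str]) -> list[str]:
    """Filter a query's column list down to what the user is authorized to read."""
    return [c for c in columns if can_read(user, c)]

analyst = User("dana", frozenset({"analyst"}))
print(visible_columns(analyst, ["customer_id", "email", "total_spend"]))
```

Denying by default means a newly added, unclassified column stays hidden until someone tags it, which is usually the safer failure mode when PII is involved.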
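For real-time monitoring, one simple way to reduce alert volume is to collapse repeated alerts from the same source and rule within a time window, so analysts see one grouped alert with a count instead of many near-duplicates. This is a minimal sketch under our own assumptions (the `Alert` fields and the `deduplicate` helper are hypothetical); it reduces noise but does not by itself remove false positives, which still require tuning the detection rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch
    source: str       # device or host that raised the alert
    rule: str         # detection rule that fired

def deduplicate(alerts: list[Alert], window_s: float) -> list[tuple[Alert, int]]:
    """Collapse repeats of the same (source, rule) that arrive within `window_s` of a group's first alert.

    Returns (first alert, number of alerts it represents) ordered by first-alert time.
    """
    groups: list[list] = []                       # each group is [first_alert, count]
    open_groups: dict[tuple[str, str], list] = {}  # latest group per (source, rule)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.source, alert.rule)
        group = open_groups.get(key)
        if group is not None and alert.timestamp - group[0].timestamp <= window_s:
            group[1] += 1  # same burst: just count it
        else:
            group = [alert, 1]  # start a new group for this (source, rule)
            open_groups[key] = group
            groups.append(group)
    return [(first, count) for first, count in groups]

raw = [
    Alert(0, "host-a", "brute-force"),
    Alert(10, "host-a", "brute-force"),
    Alert(5, "host-b", "port-scan"),
    Alert(400, "host-a", "brute-force"),
]
for first, count in deduplicate(raw, window_s=300):
    print(first.timestamp, first.source, first.rule, count)
```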
Encryption, applied at multiple stages, can help protect data in Big Data technologies and maintain its confidentiality, integrity, and availability. If you want to know more, watch out for our next blog on Data Protection in Big Data using encryption.