Big Data is characterized by high volumes of data, high velocity of information streamed in real time, and a wide variety of data sources and formats being stored and processed. Because of these characteristics, securing and protecting such large-scale, dynamic data is more critical than for small-scale, static data, and the consequences of any data breach are far more dramatic.
These challenges are twofold. The first is protecting users’ anonymity and each individual’s privacy. This issue is most visible when researchers gather data directly from citizens, since those citizens’ privacy rights are then directly affected. However, even after identities have been masked or removed, it is still possible to reveal a user’s actual identity by cross-referencing existing datasets. Large-scale datasets therefore remain sensitive in the event of attacks and security breaches, regardless of how the data has been filtered, cleaned and masked.
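To illustrate how cross-referencing can undo masking, here is a minimal Python sketch of a classic linkage attack. The two datasets, the field names and the choice of quasi-identifiers (postcode, birth year, gender) are hypothetical toy data, not Multivac data; the point is only that a "masked" record whose combination of quasi-identifiers is unique in a public dataset can be tied back to a name.

```python
from typing import Dict, List, Tuple

# Hypothetical "masked" dataset: names removed, but quasi-identifiers kept.
masked_records = [
    {"zip": "75005", "birth_year": 1984, "gender": "F", "topic": "oncology"},
    {"zip": "75013", "birth_year": 1990, "gender": "M", "topic": "cardiology"},
    {"zip": "75005", "birth_year": 1971, "gender": "M", "topic": "psychiatry"},
]

# Hypothetical public dataset (e.g. scraped profiles) that still carries names.
public_profiles = [
    {"name": "Alice", "zip": "75005", "birth_year": 1984, "gender": "F"},
    {"name": "Bob", "zip": "75013", "birth_year": 1990, "gender": "M"},
    {"name": "Carol", "zip": "75013", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def qi_key(record: dict) -> Tuple:
    """Combination of quasi-identifier values used to join the two datasets."""
    return tuple(record[name] for name in QUASI_IDENTIFIERS)


def link(masked: List[dict], public: List[dict]) -> Dict[str, str]:
    """Map a name to a masked record's sensitive attribute when its quasi-identifiers match exactly one public profile."""
    index: Dict[Tuple, List[str]] = {}
    for profile in public:
        index.setdefault(qi_key(profile), []).append(profile["name"])
    reidentified = {}
    for record in masked:
        candidates = index.get(qi_key(record), [])
        if len(candidates) == 1:  # a unique match reveals the identity
            reidentified[candidates[0]] = record["topic"]
    return reidentified


print(link(masked_records, public_profiles))
# prints {'Alice': 'oncology', 'Bob': 'cardiology'}
```

A unique match is what makes re-identification possible; defenses such as generalizing or suppressing quasi-identifiers aim to ensure that every combination matches several people.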
Beyond data privacy and the protection of individual identities, the issue goes further than anonymity. The second critical challenge is protecting the large-scale datasets, as well as the computing infrastructure (the Big Data ecosystem), from people with malicious motives. The danger posed by the aggregated facts and results produced by processing large amounts of data should not be underestimated: intruders may use these results for commercial gain or to target large groups of people for unethical purposes. Additionally, individuals or underground groups, such as hackers acting illegally, can make the results of their attacks available to people who represent a threat to national security and public safety.
While such large-scale datasets and Big Data architectures help scientists achieve more in less time and uncover facts that could not be identified in smaller amounts of data, the risks and potentially devastating consequences of data breaches should not be taken lightly. Any piece of information is worth something to somebody, and in the wrong hands it can be catastrophic.
In this poster session, I will present the “Multivac Platform” defense and security system, also known as the “6 layers of hell”, and explain how each layer addresses some of the main concerns about Big Data security and privacy. The Multivac Platform is one of the largest academic repositories, with over 6 billion documents hosted across 40 servers. The servers hold metadata from published scientific papers and social networks, covering a wide range of topics. The Multivac Platform is meant as an interface between researchers and Big Data, especially in the domains of NLP and text mining. It offers services such as comprehensive dashboards that let scientists explore large-scale data through visualizations and discover facts from a wider overview. It also offers API access that allows researchers to exploit this large architecture and its computing power without any prior technical knowledge.
The Multivac Platform aims to reach over 20 billion records by the end of 2017 and to accelerate research by offering services such as text-mining algorithms, machine learning, and the detection of emerging topics in areas such as climate change, risk, transportation, health, politics, and news and media.
With the platform growing in real time at a rate of more than 15 million new records every day, the privacy and security of the data are a critical mission. The key questions to address are:
- How do you secure a platform holding billions of scientific and personal records?
- How do you protect every user’s privacy and personal information?
- How do you protect the Big Data infrastructure and its large-scale datasets from unauthorized intruders with extreme ideologies, and prevent them from misusing the results against the public?
The “Six Layers of Hell” implemented in the Multivac Platform:
- Infrastructure security (controlled physical access to equipment, security cameras, motion and sound detection, restricted network access at the switches, and proper disposal of used devices and disks)
- Hardware disk encryption and software data encryption
- SSL/TLS communications (HTTPS to the internet, SSL between devices on the network, point-to-point IPSec)
- Dual firewall (a DMZ, a global firewall, and local firewalls with stricter rules)
- Authentication system (secure API, IP-based protection, token-based authentication with time-limited access)
- Monitoring and alert system (real-time security monitoring and alerting integrated with IDS/IPS systems, plus agents that monitor the monitoring system itself to ensure it stays healthy and functions correctly)
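The token-based, time-limited access and IP-based protection of the authentication layer could look roughly like the following Python sketch. It is not Multivac’s implementation: the token format (user, IP and expiry plus an HMAC-SHA256 signature, base64-encoded), the `SECRET_KEY` and the `ALLOWED_IPS` allow-list are all assumptions chosen for illustration, using only the standard library.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-server-side-secret"  # hypothetical; load from a secret store in practice
ALLOWED_IPS = {"192.0.2.10", "192.0.2.11"}          # hypothetical allow-list (documentation addresses)


def _sign(payload: str) -> str:
    # HMAC-SHA256 over the payload, so the token cannot be altered without the key.
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str, client_ip: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Issue a token bound to a user, a client IP and an expiry time (user_id must not contain '|')."""
    issued = time.time() if now is None else now
    payload = f"{user_id}|{client_ip}|{int(issued) + ttl_seconds}"
    return base64.urlsafe_b64encode(f"{payload}|{_sign(payload)}".encode()).decode()


def verify_token(token: str, client_ip: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is authentic, unexpired and presented from its bound, allowed IP; else None."""
    if client_ip not in ALLOWED_IPS:            # IP-based protection
        return None
    try:
        user_id, bound_ip, expires, signature = (
            base64.urlsafe_b64decode(token.encode()).decode().split("|")
        )
    except ValueError:                          # bad base64, bad UTF-8 or wrong field count
        return None
    payload = f"{user_id}|{bound_ip}|{expires}"
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None                             # forged or tampered token
    current = time.time() if now is None else now
    if bound_ip != client_ip or current >= int(expires):
        return None                             # wrong client or access window elapsed
    return user_id


token = issue_token("researcher42", "192.0.2.10", ttl_seconds=60, now=1_000_000)
print(verify_token(token, "192.0.2.10", now=1_000_030))    # researcher42
print(verify_token(token, "192.0.2.10", now=1_000_061))    # None: expired
print(verify_token(token, "198.51.100.7", now=1_000_030))  # None: IP not allowed
```

Binding the token to the client IP and an expiry means a stolen token is only useful from the same address and for a short window; `hmac.compare_digest` is used so that the signature comparison does not leak timing information.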
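The idea of agents that monitor the monitoring system can be sketched as a heartbeat watchdog: each monitoring component periodically reports that it is alive, and an independent checker flags any component that has been silent for too long. This is a minimal sketch under stated assumptions: the component names and the `max_silence` threshold are hypothetical, and a real watchdog would run on a separate host and raise alerts through a channel independent of the system it watches.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class HeartbeatRegistry:
    """Records the last time each monitoring component reported that it was alive."""
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, component: str, now: Optional[float] = None) -> None:
        # Called periodically by each monitored component (hypothetical names).
        self.last_seen[component] = time.time() if now is None else now


def stale_components(
    registry: HeartbeatRegistry,
    expected: Iterable[str],
    max_silence: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return expected components that never reported or have been silent longer than max_silence seconds."""
    current = time.time() if now is None else now
    stale = []
    for name in expected:
        seen = registry.last_seen.get(name)
        if seen is None or current - seen > max_silence:
            stale.append(name)
    return stale


if __name__ == "__main__":
    registry = HeartbeatRegistry()
    registry.beat("ids-sensor", now=100.0)
    registry.beat("alert-dispatcher", now=70.0)
    components = ["ids-sensor", "alert-dispatcher", "log-collector"]
    # The watchdog would escalate these through an independent alert channel.
    print(stale_components(registry, components, max_silence=20.0, now=110.0))
    # prints ['alert-dispatcher', 'log-collector']
```

Treating a component that has never reported the same as one that has gone quiet ensures that a monitor which fails at startup is caught as well.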