Privacy engineering and data security have become top architecture challenges for enterprises. Privacy legislation has multiplied around the world in the last few years, becoming both more extensive and more complex.
Compliance with local, regional, and international data privacy laws is now a vital business concern, and there’s immense pressure on engineering teams to effectively implement privacy engineering in ways that bake in compliance. They have to consider national and international standards such as NIST SP 800-53 Rev. 5, FedRAMP® Security Controls Baseline, ISO 31700, ISO 27701, and more.
However, privacy engineering is often easier said than done. Engineering teams, particularly at medium and large enterprises, face serious obstacles that they must overcome to meet business compliance requirements and privacy standards.
What is Privacy Engineering?
Privacy engineering encompasses the technical aspects of privacy within the broader field of privacy management. It represents the convergence of technical expertise, legal compliance, and user experience, with privacy engineering teams playing a crucial role in ensuring that privacy considerations are seamlessly integrated into the design and development of products and services. Privacy engineering addresses the complex challenge of creating products and services that not only meet regulations, but also offer intuitive, privacy-preserving user controls.
The primary goal is to embed privacy protections into systems and products from the outset. This aligns with regulations like the EU’s GDPR, which emphasizes the concepts of “Privacy by Design” and “Privacy by Default.” Privacy engineers help translate legal requirements into technical solutions, ensuring that only necessary personal data is processed and that user experiences are both compliant and privacy-conscious.
Privacy engineers act as intermediaries between various teams within an organization, including product, design, IT, security, legal, and compliance teams, bridging the gap between technical implementation and privacy requirements. They are involved in a range of activities, such as code inspections to assess privacy risks, devising strategies for anonymizing personal data, and designing user-friendly privacy controls.
Why is privacy engineering such a challenge?
In an ideal world, every data-based digital service, program, or algorithm would be developed using Privacy by Design principles, so the entire system would be built in a way that simplifies data management and makes it easy to erase or restrict access to specific datasets.
But in the real world, it’s all too common for developers to build solutions according to the needs of the product, often under time pressure. Privacy by Design protocols frequently fall by the wayside.
As a result, privacy engineering typically involves a large amount of work sifting through existing dataflows, attempting to understand data usage, and reverse-engineering compliance with privacy engineering and data subject request (DSR) legislation.
Here are some of the main obstacles in the path to privacy engineering success.
The never-ending task of data mapping for privacy engineers
Privacy engineering typically begins with data mapping, a short phrase that conceals an enormous amount of work. Data mapping involves discovering and assessing all your data flows, then surveying and listing every service, database, and storage system in your data processes, along with their types. You need to understand the relationships between all the moving parts in your possession, and chart them in a way that lets you refer back to them with ease.
Data mapping and inventorying are vital to understanding which systems, tools, and storage units touch your data. Without a comprehensive, clear inventory, you won’t be able to move on to any further privacy engineering tasks. But it is also tedious, time-consuming work that demands a great deal of energy and focus.
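To make the inventory easier to refer back to, many teams record each data flow as a structured entry rather than a free-form spreadsheet row. Below is a minimal sketch of what such a record could look like in Python; the field names and example values (such as the “customer-api” service) are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataFlowRecord:
    """One entry in a data inventory: what personal data lives where, and where it goes."""
    service: str                   # service or application that processes the data
    datastore: str                 # database, bucket, or queue where the data lives
    storage_type: str              # e.g. "PostgreSQL", "S3", "Kafka"
    data_categories: List[str]     # e.g. ["email", "shipping_address"]
    downstream_consumers: List[str] = field(default_factory=list)  # internal services or third parties
    retention_days: Optional[int] = None  # None means no documented retention period

# Hypothetical entry, for illustration only
example = DataFlowRecord(
    service="customer-api",
    datastore="customers_db.orders",
    storage_type="PostgreSQL",
    data_categories=["email", "shipping_address"],
    downstream_consumers=["billing-service", "analytics-vendor"],
    retention_days=365,
)
```

Keeping records in a shape like this also makes the inventory queryable later, for example when running the gap analysis described further down.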
The moving target of live projects
Software never stands still. It’s constantly being improved and developed, which means that more data, more tables, and more connections are added on a continual basis. Data mapping is often carried out manually, with the result that the map is out of date as soon as it’s completed, because of the speed at which live projects expand.
There’s no way that humans can keep up with the pace at which new data and relationships are added to the project. Many of today’s data maps may be “good enough,” but they are not comprehensive. And even if your data map is 100% correct today, it becomes obsolete the day after you’ve created it.
The risk of inaccuracies
So how does this mapping typically take place? It often involves interviewing developers, then generating and presenting vast Excel files listing every entity that could include personal or sensitive data.
The developers need to review this file, but they typically respond from memory rather than going back and checking the code, which can lead to misalignment with the actual state of the software.
So not only has the software changed while the mapping was taking place, but the mapping may not have been fully accurate in the first place.
Complexity of vast code with multiple layers and connections
In medium to large companies, the sheer volume of code that needs to be checked is a significant challenge. Big corporations are constantly rolling out new digital services or updated versions. Today’s cloud-based, user-friendly machine learning (ML) tools make it easy for developers to deploy new data-based algorithms, systems, and workflows.
Many organizations involve dozens of teams with hundreds of developers, each busy with different projects. All these data-touching projects are built in highly complex environments, using infrastructure like Kubernetes or Docker, which is easy to deploy but tough to explore. As a result, it’s extremely difficult to monitor, validate, and enforce data privacy best practices, or even to track the path of data through the system.
Legacy code is a mystery to most modern developers
The bigger the company, the greater the likelihood that there’ll be considerable amounts of legacy code lurking in the depths of the organization’s systems. Very few developers properly understand legacy code, so it’s usually highly opaque.
Some employees might know the connections for some of the lines of code, and some sections might have been replaced more recently, but in general there’s very poor visibility into which services relate to which databases, which services share data with which other services, and other aspects of legacy code.
The enormity of gap analysis
Data mapping can take months or even years to complete, but it’s only the first in a series of privacy engineering challenges. Once you have an inventory of all your data flows, you need to move on to the next tedious and time-consuming job: gap analysis. It’s not unusual for gap analysis to take just as long as data mapping, and it can be every bit as frustrating to undertake.
Gap analysis involves uncovering holes in the data flow and identifying data privacy issues which you then need to resolve. It requires data engineers to look for details like tables and/or storage that are duplicated, redundant, or no longer in use. It also means seeking out third parties who receive information from your developers, and checking that they are only sent information that they are entitled to.
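As a rough illustration, part of gap analysis can be framed as a diff between what the inventory claims and what actually exists in production. The hedged sketch below flags tables missing from the inventory, inventory entries that no longer correspond to live tables, and downstream consumers that aren’t on an approved third-party list; all of the inputs and names are assumptions for the example.

```python
from typing import Dict, List, Set

def find_gaps(
    live_tables: Set[str],                    # tables discovered in production, e.g. via information_schema
    inventoried_tables: Set[str],             # tables listed in the data inventory
    consumers_by_table: Dict[str, List[str]], # observed downstream consumers per table
    approved_third_parties: Set[str],         # recipients with a documented data-sharing basis
) -> Dict[str, object]:
    """Return undocumented tables, stale inventory entries, and unapproved data recipients."""
    undocumented = live_tables - inventoried_tables      # exists in production, not in the map
    possibly_unused = inventoried_tables - live_tables   # in the map, but no longer found live
    unapproved_recipients = {
        table: [c for c in consumers if c not in approved_third_parties]
        for table, consumers in consumers_by_table.items()
        if any(c not in approved_third_parties for c in consumers)
    }
    return {
        "undocumented_tables": sorted(undocumented),
        "possibly_unused_tables": sorted(possibly_unused),
        "unapproved_recipients": unapproved_recipients,
    }
```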
Reviewing and enforcing data regulations
After succeeding at gap analysis, you still need to remove all unused third parties, tables, storage, etc. However, that’s the relatively easy part. The bigger challenge is to review and enforce all the relevant data regulations, including access controls, encryption for all sensitive data, and some method of de-identification whenever relevant.
Access controls apply to internal users who are authorized to work with some customer data, but not to view all the data that you store. You need to provide access to permitted data while restricting it for sensitive data, often within the same table. The process includes building in auditing to check who has access, when, and very often why.
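As a simple illustration of column-level access within a single table, the sketch below filters a record down to the columns a role is allowed to see and writes an audit entry for each access. The roles, column lists, and logging setup are hypothetical, not a recommended policy.

```python
import logging
from typing import Any, Dict

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("access-audit")

# Hypothetical policy: which columns each role may read from the customers table
COLUMN_POLICY = {
    "support_agent": {"customer_id", "name", "email"},
    "billing": {"customer_id", "name", "billing_address", "last4_card"},
    "analyst": {"customer_id"},  # analysts see only a key, no direct identifiers
}

def read_customer(record: Dict[str, Any], role: str, user: str, reason: str) -> Dict[str, Any]:
    """Return only the columns the role is permitted to see, and audit the access."""
    allowed = COLUMN_POLICY.get(role, set())
    filtered = {col: val for col, val in record.items() if col in allowed}
    audit_log.info("user=%s role=%s columns=%s reason=%s", user, role, sorted(filtered), reason)
    return filtered
```

In production, the same idea is usually enforced in the database itself, for example with views or column-level grants, rather than only in application code.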
Encryption goes hand in hand with access controls, because it’s the primary way to protect the data that you are permitted to hold. You need encryption that is strong enough to ensure that these boundaries are respected and that nobody can view or use data without authorization.
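Here is a minimal sketch of field-level encryption, assuming the open-source cryptography package is available. In practice the key would come from a key-management service and be rotated, rather than generated inline, and the field values are purely illustrative.

```python
from cryptography.fernet import Fernet

# In production, load this key from a key-management service; never hard-code or regenerate it.
key = Fernet.generate_key()
fernet = Fernet(key)

def encrypt_field(value: str) -> bytes:
    """Encrypt a single sensitive field before it is written to storage."""
    return fernet.encrypt(value.encode("utf-8"))

def decrypt_field(token: bytes) -> str:
    """Decrypt a field for a caller that has already passed the access-control checks."""
    return fernet.decrypt(token).decode("utf-8")

# Illustrative usage
ciphertext = encrypt_field("alice@example.com")
assert decrypt_field(ciphertext) == "alice@example.com"
```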
Some datasets also need de-identification or anonymization for different users. For example, a hospital holds sensitive patient health data. Doctors need to view full patient details for diagnoses, prescriptions, and effective follow-up. Hospital administrators might need access to personal identifying data to help patients schedule appointments, but not to their health data. Researchers developing cures for health conditions may access anonymized health data that doesn’t link any patient-identifying data to details about reactions to drugs, the onset of disease, or the progression of disorders.
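To make the hospital example concrete, one common approach is to replace direct identifiers with a stable, keyed pseudonym before handing records to researchers. The sketch below assumes a simple record structure and uses HMAC-SHA-256 for the pseudonym; a real de-identification program would also have to handle quasi-identifiers such as dates of birth or postcodes.

```python
import hashlib
import hmac
from typing import Any, Dict

# Secret pseudonymization key; in practice, kept in a secrets manager and rotated carefully.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

DIRECT_IDENTIFIERS = {"name", "national_id", "phone", "address"}

def pseudonymize_for_research(patient: Dict[str, Any]) -> Dict[str, Any]:
    """Strip direct identifiers and replace the patient ID with a stable pseudonym."""
    pseudonym = hmac.new(
        PSEUDONYM_KEY, str(patient["patient_id"]).encode(), hashlib.sha256
    ).hexdigest()[:16]
    deidentified = {k: v for k, v in patient.items() if k not in DIRECT_IDENTIFIERS}
    deidentified["patient_id"] = pseudonym  # the same patient always maps to the same pseudonym
    return deidentified
```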
Bear in mind that these review and enforcement activities need to be carried out continuously. It’s not enough to enforce access controls, encryption, and pseudonymization once on a database. It’s crucial to build an automated service that will take the raw data, send it to the right database, and apply the right levels of access, encryption, and anonymization on a continuous basis.
Compliance with DSR and other regulations
Privacy engineering requires compliance with complex, extensive, and frequently changing regulations around data privacy protections and data subject requests (DSRs). It’s not enough to store data securely; you also need to be able to locate specific data within the haystack of datasets, and respond to requests from data subjects to opt out of data storage and/or ML profiling, have their data deleted, or receive a copy of their data.
To comply with these regulations, privacy engineering teams need to consider 4 fundamental issues.
1. Data minimization
A live program needs to verify that processes keep data only for the limited periods it’s actually needed, such as validation data that can be held for a few hours. For example, a car rental company might collect a customer’s driver’s license details for verification purposes, but it needs to remove those details from the database once verification is complete.
This should be built into the solution during development, but it often isn’t. With so many live projects and so many developers, it’s hard to identify all the places where user data is collected and to verify whether there’s a built-in opt-out or expiry for each table and process.
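One way teams often handle this, sketched below under assumed table and column names, is a scheduled cleanup job that deletes verification data past its allowed age. The license_checks table and the 24-hour window are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 24  # assumed retention window for verification data

def purge_expired_verifications(conn: sqlite3.Connection) -> int:
    """Delete driver's-license verification records older than the allowed window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=MAX_AGE_HOURS)
    cur = conn.execute(
        "DELETE FROM license_checks WHERE verified_at < ?",
        (cutoff.isoformat(),),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed, useful for audit logging

# Typically wired into a scheduler (cron, Airflow, etc.) so the rule is enforced continuously.
```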
2. Accuracy
All the data that you collect and store, even for limited periods of time, has to be accurate. That often presents more problems for long-term data storage than for data that’s kept for only a short amount of time, because details can change. For example, people might change their address every few months, or get a new credit card or contact phone number after a year or two.
To update stored data for each individual data subject whenever necessary, you need to know exactly what data you hold and about whom, as well as being able to trace its location across all your databases. You also need a workflow set up to regularly check that the data is still accurate, and gather updated information if needed.
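As a hedged sketch of such a workflow, the example below flags contact records that haven’t been confirmed within an assumed review interval so they can be queued for re-verification; the table, columns, and one-year interval are illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

REVIEW_INTERVAL_DAYS = 365  # assumed: contact details re-confirmed at least once a year

def find_stale_contact_records(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Return (customer_id, email) pairs whose details haven't been confirmed recently."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=REVIEW_INTERVAL_DAYS)
    rows = conn.execute(
        "SELECT customer_id, email FROM contacts WHERE last_confirmed_at < ?",
        (cutoff.isoformat(),),
    ).fetchall()
    return rows  # feed these into a re-verification campaign or support workflow
```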
3. Purpose limitation
According to most data privacy regulations, organizations may only store, process, and/or share data that is necessary to operate their products and services effectively. In the event of an audit or a challenge from a data subject, you need to be able to validate the purpose for every piece of data you collect and hold. This can be highly challenging, and once again requires you to know the location of every data set.
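One common pattern, sketched here with hypothetical purpose tags, is to record a declared purpose for each data category and reject any processing request that falls outside it.

```python
from typing import Dict, Set

# Hypothetical registry: declared purposes per data category
PURPOSE_REGISTRY: Dict[str, Set[str]] = {
    "email": {"account_management", "transactional_messages"},
    "shipping_address": {"order_fulfilment"},
    "browsing_history": {"fraud_detection"},
}

class PurposeViolation(Exception):
    """Raised when data is requested for a purpose it was not collected for."""

def check_purpose(data_category: str, requested_purpose: str) -> None:
    allowed = PURPOSE_REGISTRY.get(data_category, set())
    if requested_purpose not in allowed:
        raise PurposeViolation(
            f"{data_category} may not be processed for '{requested_purpose}'"
        )

# Illustrative usage: marketing use of shipping addresses would be rejected
check_purpose("shipping_address", "order_fulfilment")   # passes
# check_purpose("shipping_address", "marketing")        # would raise PurposeViolation
```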
4. Data retention
Last but not least, every organization needs a reliable way to delete data on request and according to the principles mentioned above. You need an automated dataflow that deletes data on an appropriate schedule.
Depending on how a software program was built, deleting data can be difficult. If a digital service uses columnar storage, enforcing adequate data retention and minimization policies can be a long and complex process.
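As a simplified sketch of what an automated erasure flow might look like, the example below deletes a data subject’s rows from every table listed in a registry derived from the data inventory. The table names and key columns are hypothetical, and a real implementation would also have to cover backups, caches, and third-party processors.

```python
import sqlite3
from typing import Dict

# Hypothetical registry derived from the data inventory: tables holding personal data
# and the column that links each table back to the data subject.
SUBJECT_TABLES: Dict[str, str] = {
    "customers": "customer_id",
    "orders": "customer_id",
    "support_tickets": "requester_id",
}

def erase_data_subject(conn: sqlite3.Connection, subject_id: str) -> Dict[str, int]:
    """Delete a data subject's rows from every registered table and report what was removed."""
    deleted: Dict[str, int] = {}
    for table, key_column in SUBJECT_TABLES.items():
        # Table and column names come from the trusted registry above, never from user input.
        cur = conn.execute(f"DELETE FROM {table} WHERE {key_column} = ?", (subject_id,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted  # keep this as evidence for the DSR audit trail
```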
Related Blog: It’s Time for Some Machine Unlearning
Privacy engineering: can you find the gap?
Privacy engineering is a growing concern for virtually all companies globally. To meet this dynamic set of challenges, business leaders must run a gap analysis specifically around privacy engineering issues in their organizations, and grade themselves accordingly.
Based on those findings, they will be able to understand and lay out the precise steps that will protect the privacy of their employees and customers, and ensure compliance with ever-changing privacy regulations. Every company is different, and so are the actions and policies that must be implemented to ensure privacy engineering success.
Privya empowers privacy engineering to overcome these challenges
Privya uses artificial intelligence (AI) to help organizations comply with privacy regulations, standards, and frameworks: GDPR, CPRA, HIPAA, the GLB Act, ISO, NIST, FedRAMP, and more.
The AI-powered scanner speeds up data mapping while increasing confidence in data discovery, tracking data flows, and verifying and enforcing the controls in place. With Privya, it’s possible to automate and streamline privacy engineering tasks and DSR compliance.