Privacy is often cited as a concern in cloud computing. The EU Data Protection Directive (DPD), while not strictly a privacy law, may affect not just those whose personal data are processed in the cloud (data subjects), but also those using cloud computing to process others' personal data, and indeed those providing cloud services used to process that data.
The DPD's scope is broad. Even cloud computing service providers based outside Europe may become subject to EU data protection laws through choices made by their customers, of which they may have no knowledge and over which they have no control.
The key foundational issues are:
- What information in the cloud is regulated under EU data protection laws?
- Who is responsible for personal data in the cloud?
- Whose laws apply in a dispute?
- Where is personal data processed? (restrictions on international data transfers).
This is the first of several articles on the DPD and cloud computing. We start with "personal data" in the cloud.
What information is regulated? "Personal data"
The DPD's "personal data" definition is crucial. It's the trigger for EU data protection law requirements to kick in.
Information qualifying as "personal data" is regulated under the DPD, and all DPD rules on "personal data" apply to it.
Information which is not "personal data" or which stops being such, e.g. through anonymisation, can be handled without any DPD issues. For instance, anonymous data may be transferred freely, to the USA or elsewhere, regardless of the DPD's restrictions on exporting personal data outside the EEA. That data might be subject to other laws, such as on confidentiality, but not the DPD.
Conversely, if information is "personal data", the person who "determines the purposes and means" of processing that data is the "controller" of the data. Controllers who have certain connections with the European Economic Area are subject to various obligations.
These include having to register with authorities, handle personal data in certain ways, etc. There are also requirements regarding any "processor" processing personal data "on behalf of" a controller. "Processing" includes storing, holding, operating on, transmitting, disclosing or accessing data.
The DPD takes a very binary, "all or nothing" approach to data. If information is "personal data", all data protection rules apply to it (not just some); if it's not, none do.
Personal data in cloud computing
Now consider three types of data often encountered in cloud computing:
- anonymised data
- encrypted data
- fragmented data (shards or chunks)
Are these kinds of data "personal data"? It depends.
Cloud users may wish to anonymise data to process in non-EEA data centres. Cloud services providers may wish to anonymise data stored with them, to use, sell or share the resulting data without having to obtain consent.
But it's not straightforward to anonymise "personal data" for the purposes of the DPD definition, so that it no longer relates to an "identified or identifiable" living individual, taking into account all "means likely reasonably to be used" by the controller or by any other person to identify individuals.
Key-coded data (changing names to code numbers, where someone holds a "key" of numbers and corresponding names) may still be "personal data". Even deleting names or other "identifying" details such as addresses may not suffice.
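Key-coding can be sketched in a few lines of Python. This is a minimal illustration only, with invented names and records; it shows why key-coded data stays re-identifiable for anyone who holds the key table, even though the coded records themselves carry no names:

```python
import secrets

# Illustrative records only; the names and details are invented for this example.
records = [
    {"name": "Alice Brown", "diagnosis": "asthma"},
    {"name": "Bob Clark", "diagnosis": "diabetes"},
]

key_table = {}      # the "key": code -> name, held separately from the data
pseudonymised = []
for record in records:
    code = secrets.token_hex(4)          # a random code replaces the name
    key_table[code] = record["name"]
    pseudonymised.append({"code": code, "diagnosis": record["diagnosis"]})

# The coded records contain no names...
# ...but anyone holding the key table can re-identify every individual:
reidentified = [key_table[r["code"]] for r in pseudonymised]
print(reidentified)   # ['Alice Brown', 'Bob Clark']
```

Whether such data is "anonymous" in a given pair of hands thus turns on who can obtain the key, not on the coded records alone.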
Aggregating data (perhaps by years, or towns) might render data "anonymous". But it might not, particularly as advances in re-identification science increasingly enable identification indirectly through combining data from different sources. Much depends on the circumstances.
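The re-identification risk from combining sources can be made concrete with a toy linkage example. All the data below is invented; the point is only that records stripped of names can be matched to a separate, public dataset on shared "quasi-identifiers" such as postcode and birth year:

```python
# "Anonymised" records: names removed, but quasi-identifiers retained.
anonymised = [
    {"postcode": "EC1A", "birth_year": 1970, "diagnosis": "asthma"},
    {"postcode": "SW1A", "birth_year": 1985, "diagnosis": "diabetes"},
]

# A separate, publicly available dataset (e.g. an electoral roll)
# carrying the same quasi-identifiers alongside names.
public = [
    {"name": "Alice Brown", "postcode": "EC1A", "birth_year": 1970},
    {"name": "Bob Clark", "postcode": "SW1A", "birth_year": 1985},
]

# Joining the two on the shared fields re-attaches names to diagnoses.
reidentified = {}
for med in anonymised:
    for person in public:
        if (person["postcode"], person["birth_year"]) == (
            med["postcode"], med["birth_year"]
        ):
            reidentified[person["name"]] = med["diagnosis"]

print(reidentified)  # {'Alice Brown': 'asthma', 'Bob Clark': 'diabetes'}
```

In real datasets the match is rarely this clean, but the mechanism is the same: the more auxiliary data exists, the more "means likely reasonably to be used" there are to identify individuals.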
However, EU data protection regulators seem to consider that even "retraceably pseudonymised" data, e.g. key-coded data, may be "anonymous" when processed by another person operating "within a scheme where re-identification has been explicitly excluded" - provided appropriate measures were taken to stop them re-identifying individuals, even if, theoretically, third parties could re-identify individuals through "accidental matching". The focus is thus on preventing identification.
So, for anonymised or pseudonymised data, including "anonymised" data in the cloud, the strength of "anti-identification" measures affects the data's status as personal or anonymous.
This status matters to cloud services providers too. If supposedly "anonymised" information in the cloud is actually "personal data", the provider could find itself being a "processor" of "personal data" for its customer, or even a "controller" of that data with greater obligations.
Similarly, encrypted personal data would be "personal data" for someone with the decryption key; but it may not be, in the hands of a cloud storage provider with no access to the key.
Again, much depends on the "anti-identification" measures of the cloud customer uploading encrypted data: strength of encryption algorithm used, key length, effectiveness of key management, etc. The better the encryption, the less likely that others can decrypt data to identify individuals from it.
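The position of a provider holding ciphertext without the key can be sketched as follows. This is a toy example using a random XOR pad as a stand-in for a real cipher such as AES (chosen only because it needs nothing beyond the standard library); it is not production cryptography, and the example data is invented:

```python
import secrets

def xor(data: bytes, pad: bytes) -> bytes:
    # Toy one-time-pad XOR, standing in for a real encryption algorithm.
    return bytes(a ^ b for a, b in zip(data, pad))

personal_data = b"Alice Brown, 1 High Street"    # invented example record
key = secrets.token_bytes(len(personal_data))    # held only by the customer

# What the cloud provider stores: unintelligible bytes without the key.
ciphertext = xor(personal_data, key)

# The key holder, and only the key holder, can recover the data
# and so identify the individual:
recovered = xor(ciphertext, key)
assert recovered == personal_data
```

On this view the same stored bytes may be "personal data" in the customer's hands yet arguably not in the provider's, so long as the key genuinely never reaches the provider.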
Research progresses on how to operate on encrypted data without decrypting it (homomorphic encryption), but currently, even if information is stored encrypted, it must first be decrypted before it can be worked on at practical speeds. If decryption occurs on the cloud provider's servers, theoretically it could access the decrypted data and identify data subjects.
So data, even if not "personal data" while in encrypted form in the cloud, could become "personal data" when decrypted for use in a cloud application.
In cloud computing, data may be broken up automatically into fragments for storage or processing.
Does a fragment contain "personal data"? Again, it depends. Even a tiny fragment could contain personal data. Providers' sharding policies are likely to be based on performance considerations, not the DPD.
More to the point, if the provider can log in to the customer's account and re-unify the shards to access the full data, it can access "personal data".
Access to data
Recall the importance EU data protection regulators place on excluding re-identification. It seems reasonable that if you can't view or read data, you can't identify data subjects, so identification may be excluded by excluding others from accessing data.
The nature of the cloud service matters. With social networking sites, photo sharing sites etc, much stored personal data is meant to be publicly accessible, and obviously re-identification of people from data can't be excluded; indeed, identification's often the point. Such cloud services will therefore be holding "personal data".
Strongly encrypted personal data ought to be considered non-personal data in the cloud, as discussed above. Where cloud services store unencrypted personal data not meant to be publicly accessible, can re-identification be excluded? Re-identification is more likely with weak access controls where unauthorised persons could access data, so the weaker a provider's access controls, the more likely that stored unencrypted personal data will be "personal data".
More generally, many cloud service providers can themselves monitor and access customers' data and operations anyway, and reserve rights to do so. With unencrypted personal data, this means the scheme doesn't exclude re-identification. Providers could theoretically identify individuals even if, in practice, they wouldn't access the data except to handle support requests or troubleshoot service problems.
Currently there's no possibility of allowing limited re-identification where it's incidental to a cloud provider accessing data to deal with service issues, for instance, rather than to identify data subjects. There's a blanket requirement that re-identification must be excluded altogether, before the data may be considered non-"personal data" in the hands of the provider.
So, excepting cloud services with total end-to-end strong encryption where the provider can't ever access stored encrypted data, it seems unavoidable that most cloud providers are processing "personal data" under the DPD. This seems the inevitable result of focusing on preventing identification, rather than on assessing the risks to individuals' privacy in context.
The cloud of unknowing
Unlike, say, social networking sites, many cloud service providers provide only IT infrastructure: IaaS or PaaS, where customers process data using the provider's infrastructure, or "storage as a service", where the service is limited to data storage and tools to upload, download and manage the stored data.
Such providers generally can't control the form in which their customers choose to upload the data. Nor would they necessarily know the nature of data their customers intend to process using their infrastructure. Hence, the "cloud of unknowing".
Yet the status of "anonymised" or encrypted data stored or operated on using cloud services, which affects the status of the provider as "processor" (or not) of the data, may vary with the customer's decisions and actions. This may differ for different customers, or for the same customer storing different kinds of data or the same data at different times.
It seems unsatisfactory that the DPD status of cloud providers should depend on how well their customers anonymise or encrypt data.
The DPD is being reviewed, with a draft revised DPD due out this year. It would be helpful if the revised DPD clarified the status of encrypted and anonymised or pseudonymised data. Even the procedure of encrypting or anonymising data, which might be considered privacy-enhancing, may be a regulated "processing", requiring consent to encrypt data, etc. Similarly with sharding or chunking personal data. Again, the new DPD provides an opportunity to clarify this.
More generally, as more data becomes available and de-anonymisation techniques improve, under the current "personal data" definition everything could potentially be "personal data". This makes the definition unhelpful as a trigger for data protection obligations.
Instead of a binary approach and "all or nothing" application of data protection rules, it may be better to concentrate on protecting privacy rather than protecting data, and focus instead on the risk of identification and risk of harm. The definition of "personal data" should be based on realistic likelihood of identification. And rather than applying all data protection requirements to information determined to be "personal data", each situation should be considered to decide which rules should apply, and to what extent, based on the realistic risk of harm and its likely severity. How the information should be processed should then be tailored accordingly, with measures appropriate to the risks.
This would require an end-to-end accountability-based approach proportionate to the circumstances, including the situation of the controller, data subject and any processor. More sensitive situations, with greater risk of resulting harm and/or greater severity of the likely harm, would require more precautions than less sensitive ones.
Such an approach might result in fewer requirements being applied to infrastructure or passive storage providers than to SaaS providers who actively encourage processing of personal data through their services, and also seems more appropriate in today's environment of complex relationships where there may be multiple layers of cloud services providers.
The full paper discussing the above in detail, along with other related issues, "'Personal Data' in Cloud Computing - What Information is Regulated? The Cloud of Unknowing, Part 1", is available for free download.
Future articles will address other DPD issues relating to cloud computing.