Should you encrypt everything?
Encryption is the mechanism by which normal data, like this text, is made to look like random gibberish – arguably also like this text. The basic idea of encryption has been around ever since the first secret was confidentially told to someone within earshot of someone else: ever since the first “Alice” had a message for the first “Bob” that she didn’t want the first “Eve” to understand, even if Eve could overhear it.
One of the most famous encryption mechanisms in history is the German Enigma machine, used by the Nazis during the Second World War to encrypt messages between different parts of the German army, and eventually cracked by the British under the leadership and guidance of Alan Turing, one of the fathers of modern computing. Those days, while less than a century ago and still part of living memory, are ancient history in the annals of computing. Since then, encryption has made great strides with the development of various “block” and “stream” ciphers (i.e. encryption algorithms) and various “modes” of operation, each with its own advantages and drawbacks. All of these algorithms essentially do the same thing, though: using some mathematical trickery, they obfuscate the true meaning, and sometimes the true size, of a message or of some data at rest, providing confidentiality for those who have the key to unlock that true meaning, against those who don’t.
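To make the stream-cipher idea concrete, here is a deliberately insecure toy: it derives a pseudo-random keystream from a key and a nonce, then XORs it with the data. The construction (SHA-256 in counter mode) is illustrative only – real ciphers like AES-GCM or ChaCha20 should always be used in practice – but it shows the essential property that the same operation, with the same key, both hides and reveals the message.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream by hashing key+nonce+counter blocks.
    Toy construction for illustration only -- NOT a secure cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR-ing with the same keystream is its own inverse."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

plaintext = b"Alice's message for Bob"
ciphertext = xor_cipher(b"shared-secret", b"nonce-1", plaintext)
recovered = xor_cipher(b"shared-secret", b"nonce-1", ciphertext)
assert recovered == plaintext and ciphertext != plaintext
```

Note that without the key (and nonce), Eve sees only gibberish – which is exactly the confidentiality property, and, as we’ll see, exactly the availability problem when the key is lost.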
But when is it worth doing that?
In information security (a.k.a. InfoSec), we learned a long time ago (again, in computer years – it may only have been a decade or so) that the mantra “I have nothing to hide” is false on its face: you identify yourself to your bank with a four- or five-digit PIN and a physical card, and you expect your bank to believe you when you contest a fraudulent charge made with that same card and PIN. Thanks to Artificial Intelligence and Machine Learning, they probably will, but keeping your PIN secret is also part of that equation. Similarly, your IT department will tell you to change your password often, and keep it a secret. Passwords date back to the 1960s and are due for an overhaul, but they’re still secrets you need to keep. So, it’s probably fair to say everyone has secrets to keep. The answer to keeping secrets – maintaining their confidentiality when stored or in transit (e.g. while being shared between you and the bank) – has been the same since before computer science was a thing: encryption.
The IT, InfoSec, and more generally cybersecurity folks might just go ahead and encrypt everything, then. And why not? Today’s iPhone and Android smartphones have more computing power than a supercomputer from the 1970s by a factor of more than a thousand. The computational overhead of encrypting and decrypting everything is negligible. But does that mean you should just encrypt everything?
Let’s take a look at the encryption you’re using right now: the Applied Paranoia blog is public. There are no secrets on this website: any secrets that are part of the CI/CD pipeline are not on the site itself – I’m paranoid enough to have checked. Yet, the website uses TLS for the connection between the Amazon-hosted server and your browser. TLS encrypts everything, but that’s not its purpose in this case: it’s there to make it clear to your browser that what it’s showing you really came from this blog, and not from some “man in the middle”. TLS isn’t there to encrypt, it’s there to authenticate. The encryption is there just because it’s part of the default settings.
That is likely true for most encryption today: the data on my laptop, my phone, my tablets, my S3 buckets, etc. is all encrypted. That means that to access any of it, I need the key – or at least I need access to the key. Without that, I don’t have access to any of it, and neither will you.
This fact is what the “bad guys” use when they attack you with ransomware: they encrypt your data, on your computer, so it’s no longer available to you until you pay the ransom. The fact that your laptop out-performs a vintage supercomputer helps them more than it helps you in this case. But the real pain comes from you no longer having the key to your data.
And that’s the drawback: sometimes you need to be able to access data – you need the data to be available to you, even if you don’t have the keys. You need to be able to read it, but not change it. You don’t need the confidentiality that comes from encryption, even if you do need the integrity you can validate with authentication. Sometimes, availability trumps confidentiality.
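The distinction between confidentiality and authenticity can be made concrete with a message authentication code: the message travels in the clear, readable by anyone, yet any tampering is detectable by whoever holds the key. This sketch uses Python’s standard-library `hmac` module; the signing key name is a hypothetical placeholder.

```python
import hashlib
import hmac

key = b"server-side signing key"  # hypothetical shared key
message = b"blog post content, sent in the clear"

# Sender attaches a MAC; the message itself is not hidden at all.
tag = hmac.new(key, message, hashlib.sha256).digest()

# Receiver recomputes the MAC and compares in constant time.
assert hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())

# A tampered message fails verification, even though nothing was encrypted.
forged = b"blog post content, sent in the c1ear"
assert not hmac.compare_digest(tag, hmac.new(key, forged, hashlib.sha256).digest())
```

This is the property TLS is really providing for a public site like this one: if the MAC (or, in TLS, the certificate chain and record authentication) checks out, your browser knows the content is genuine – and the content remains available to everyone, key or no key.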
In Operational Technology, where physics meets software and cybersecurity takes on a meaning very different from Information Security, that “sometimes” becomes “almost always”. That is why encryption is still optional in DNP3 Secure Authentication version 6¹. OT aside, the non-availability argument applies to anything that is encrypted: encryption comes at the cost not just of some compute cycles, but of availability as well. Sometimes, we ought to ask ourselves whether it’s really worth it.
Sometimes, it isn’t.
So, how does one go about choosing what to encrypt and what to leave in the clear? The answer to that question is a technique called “Design Failure Mode and Effect Analysis”, or DFMEA. A DFMEA allows you to look at your architecture from the perspective of how it might fail, and to plan for such failures. You can do this before you’ve implemented anything, or at any time after that, and you should probably do it on a regular basis. It’s really not that complicated, provided you know the architecture you’re working with.
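A DFMEA can be as lightweight as a scored list of failure modes. A common convention (assumed here, not mandated by any one standard) rates each failure mode for severity, occurrence, and detectability, and multiplies the three into a Risk Priority Number to decide what to address first. The components and scores below are illustrative, not a real assessment:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    failure: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: higher means address it first.
        return self.severity * self.occurrence * self.detection

# Hypothetical entries for a web store's DFMEA worksheet.
modes = [
    FailureMode("front-end bucket", "unavailable (encryption key lost)", 8, 2, 2),
    FailureMode("invoice bucket", "contents leaked (left unencrypted)", 9, 3, 7),
]

for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{m.component}: {m.failure} -> RPN {m.rpn}")
```

The point is not the arithmetic but the discipline: writing the failure modes down forces the availability-versus-confidentiality trade-off into the open for each component.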
Let’s look at an example web application and analyse it. The application, which we’ll call “Crassula”, implements a web store for a florist. It uses a static website generated with Jekyll, with an Angular app embedded in that website for the purchasing workflow. The static front-end and Angular app are served out of an S3 bucket behind a CloudFront proxy; the back-end for the store uses some serverless functions (Lambdas), an S3 bucket to download invoices from, and a NoSQL database. The whole thing is tied together using AWS’ Simple Queue Service and deployed using CloudFormation.
The S3 bucket that contains the front-end (static site and Angular app) is the first point of entry for any customer. The Angular app calls out to the API and implements the workflows, including downloading invoices from the second S3 bucket using pre-signed URLs, and unlike the static site it can handle temporary failures of the API or the S3 bucket with a modicum of grace. That means that, for the front-end to work properly, the S3 bucket containing it needs high availability, but we don’t care as much about its confidentiality: we don’t need its contents to be encrypted, and we can leverage a content delivery network like CloudFront to ensure availability.
On the other hand, access to invoices is a different matter: invoices contain confidential, personally identifiable information that should not be available to anyone except the customer themselves and the store’s personnel. That means our DocumentDB database should almost certainly be encrypted, as should the S3 bucket containing the invoice PDFs for download. Failure to access either of these two resources would have one of two possible effects in this application. Failure to access the S3 bucket would thwart the delivery of an invoice to a customer, possibly delaying payment; in such cases, an alert to the shop’s staff would be warranted to mitigate the issue. Failure to access the database could result either in a transaction not completing, or in an invoice not being generated; in both cases staff should be alerted, and in the former case any charge should also not be authorized. In either case, proper controls for failures of availability can be implemented here. For the front-end, by contrast, detecting and mitigating a failure to download the application may require additional infrastructure and resources – resources that would arguably be needed anyway, but not because of any confidentiality requirement on the deployed front-end.
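The “alert the staff rather than fail silently” control described above can be sketched as a small retry-and-alert wrapper around the invoice fetch. Everything here is hypothetical scaffolding – `alert_staff` stands in for whatever queue or paging mechanism the real shop would use:

```python
import time

def alert_staff(message: str) -> None:
    # Placeholder: in production this might post to SQS or a paging service.
    print(f"ALERT: {message}")

def fetch_with_alert(fetch, retries: int = 3, delay: float = 0.0):
    """Try to fetch a resource; on persistent failure, alert staff
    instead of failing silently, and return None so the caller can
    withhold charge authorization or retry later."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except IOError as exc:
            if attempt == retries:
                alert_staff(f"invoice fetch failed after {retries} attempts: {exc}")
                return None
            time.sleep(delay)

# Simulated flaky download: fails twice, then succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("S3 temporarily unreachable")
    return b"%PDF-1.7 ..."

assert fetch_with_alert(flaky) == b"%PDF-1.7 ..."
```

The same wrapper shape works for the database path: on a `None` result, the calling code declines to authorize the charge, as argued above.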
This is the essential trade-off: failure of an encryption mechanism, or loss of the key, leads to data not being available. In our example, the front-end is essentially guaranteed high availability through the content delivery network, a set-up that cares much less about confidentiality than it does about availability. The back-end, on the other hand, makes the trade the other way around, though mitigations for failures of availability can be identified and implemented.
The way to identify those failures, and the way to work out how to mitigate and (as needed) resolve them, is the DFMEA technique itself, which merits a deeper dive than this post allows. You can read about that on my other blog.
¹ I may have authored the proposal to put it in using an AEAD, but the DNP CSTF can attest that I’ve never argued for it to be the default option.