GC Labs: Mitigating XML Entity Expansion Attacks with Xerces

Here are a few typical questions from my “assessment runbook” that I ask about any app that accepts and parses attacker-controlled XML (think file uploads, web services, stored XML content built from external user input, etc.):

(1) Is there a check that enforces a maximum number of bytes read from the incoming data stream?
(2) Is there a check to ensure the markup is valid XML?
(3) Is there a check to ensure the content of the XML message is part of the application’s protocol and matches expected data types and formats?
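Check (1) is the simplest to picture. Here is a minimal sketch, assuming you wrap whatever InputStream feeds the parser. BoundedInputStream is a hypothetical helper written for illustration, not a Xerces or JDK class. Keep in mind that the billion laughs payload later in this post is well under 2 KB, so a byte cap like this one does nothing to stop it.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: an InputStream that fails once more than maxBytes have been read. */
public class BoundedInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count = 0;

    public BoundedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    // Adds n bytes to the running total and fails if the cap is exceeded.
    private void record(long n) throws IOException {
        if (n > 0) {
            count += n;
            if (count > maxBytes) {
                throw new IOException("input exceeds " + maxBytes + " bytes");
            }
        }
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            record(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both bulk reads are counted.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        record(n);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        record(skipped);
        return skipped;
    }

    // mark/reset would let a caller re-read bytes and skew the count, so refuse it.
    @Override
    public boolean markSupported() {
        return false;
    }
}
```

You would hand the wrapped stream to the parser (for example through an org.xml.sax.InputSource) instead of a bare file path.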

I’ve intentionally left out two questions: one about entity expansion (the focus of this post) and one about external entities (the focus of the next post). It turns out that you can implement checks (1), (2), and (3) and still be vulnerable to a denial of service caused by insecure entity expansion in the parser, because the malicious payload is tiny and perfectly well-formed. In fact, most parsers I’ve come across seem to be vulnerable to entity expansion attacks by default, and Xerces is no exception. The issue is widespread because developers must explicitly configure their XML parser to mitigate it, a step that often isn’t included in company security standards.

Sometimes called an XML bomb or the “billion laughs” attack, an entity expansion attack works by crafting an XML document’s inline DTD so that internal entities expand exponentially (in some cases a billion times over) when the parser loads the document.

This is one of my favorite attacks to pull off because the payload is incredibly small and the consequences (if the attack succeeds) are insanely painful. The parser expands entities until memory is exhausted; in Java, you’ll see OutOfMemoryError all over the place. If your business depends on system availability, you don’t want to be vulnerable to entity expansion attacks.

Here’s an example payload:

<?xml version="1.0"?>
<!DOCTYPE billion [
<!ELEMENT billion (#PCDATA)>
<!ENTITY laugh0 "ha">
<!ENTITY laugh1 "&laugh0;&laugh0;">
<!ENTITY laugh2 "&laugh1;&laugh1;">
<!ENTITY laugh3 "&laugh2;&laugh2;">
<!ENTITY laugh4 "&laugh3;&laugh3;">
<!ENTITY laugh5 "&laugh4;&laugh4;">
<!ENTITY laugh6 "&laugh5;&laugh5;">
<!ENTITY laugh7 "&laugh6;&laugh6;">
<!ENTITY laugh8 "&laugh7;&laugh7;">
<!ENTITY laugh9 "&laugh8;&laugh8;">
<!ENTITY laugh10 "&laugh9;&laugh9;">
<!ENTITY laugh11 "&laugh10;&laugh10;">
<!ENTITY laugh12 "&laugh11;&laugh11;">
<!ENTITY laugh13 "&laugh12;&laugh12;">
<!ENTITY laugh14 "&laugh13;&laugh13;">
<!ENTITY laugh15 "&laugh14;&laugh14;">
<!ENTITY laugh16 "&laugh15;&laugh15;">
<!ENTITY laugh17 "&laugh16;&laugh16;">
<!ENTITY laugh18 "&laugh17;&laugh17;">
<!ENTITY laugh19 "&laugh18;&laugh18;">
<!ENTITY laugh20 "&laugh19;&laugh19;">
<!ENTITY laugh21 "&laugh20;&laugh20;">
<!ENTITY laugh22 "&laugh21;&laugh21;">
<!ENTITY laugh23 "&laugh22;&laugh22;">
<!ENTITY laugh24 "&laugh23;&laugh23;">
<!ENTITY laugh25 "&laugh24;&laugh24;">
<!ENTITY laugh26 "&laugh25;&laugh25;">
<!ENTITY laugh27 "&laugh26;&laugh26;">
<!ENTITY laugh28 "&laugh27;&laugh27;">
<!ENTITY laugh29 "&laugh28;&laugh28;">
<!ENTITY laugh30 "&laugh29;&laugh29;">
]>
<billion>&laugh30;</billion>

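If you’re familiar with XML, you know this causes the parser to expand entities exponentially: each laughN entity references laughN-1 twice, so the single &laugh30; reference in the document body expands to 2^30 (about a billion) copies of “ha”.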
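To put a number on “exponentially,” here is a back-of-the-envelope calculation for the payload above. It only does arithmetic and does not touch a parser; the class and method names are made up for this illustration.

```java
/** Back-of-the-envelope size of the billion laughs payload (illustrative only). */
public class LaughsArithmetic {

    /** Copies of "ha" produced by &laughN;: laugh0 is one copy, and each level doubles it. */
    public static long copies(int n) {
        return 1L << n;
    }

    /** Characters produced by &laughN; ("ha" is two characters). */
    public static long chars(int n) {
        return 2L * copies(n);
    }

    public static void main(String[] args) {
        long chars = chars(30);
        System.out.println("laugh30 copies: " + copies(30));       // 1073741824
        System.out.println("laugh30 chars:  " + chars);            // 2147483648
        // A Java char is 16 bits, so holding all of it as char data takes 2 bytes per char.
        System.out.println("as char data:   " + (chars * 2 >> 30) + " GiB"); // 4 GiB
        // Java arrays (and so String/StringBuilder) hold at most Integer.MAX_VALUE elements.
        System.out.println("fits in one String? " + (chars < Integer.MAX_VALUE)); // false
    }
}
```

In other words, a payload of well under 2 KB asks the parser to produce more character data than a single Java String can even hold.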

Look at any Xerces tutorial on the interwebs, and you’ll find vulnerable code similar to the following that shows how easy it is to load an XML document:

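import java.io.IOException;

import org.apache.xerces.parsers.SAXParser; // Xerces's native parser, not javax.xml.parsers.SAXParser
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;

// Method of a class with a String filePath field.
public boolean load() {
    boolean loaded = true;

    System.out.println("Loading: " + this.filePath);

    try {
        // Default configuration: inline DTDs are processed and no expansion limit is set.
        SAXParser parser = new SAXParser();
        parser.parse(this.filePath);

        // do something useful with the document

    } catch (SAXNotRecognizedException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    } catch (IOException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    } catch (SAXException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    }

    return loaded;
}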

Nothing unusual here: we simply construct a SAXParser and tell it to parse an XML file. The problem is that if we load the payload shown above, the parser keeps consuming memory and CPU while expanding entities until no memory is left.

Most XML APIs have features that defend against malicious XML. There’s usually an option to disable inline DTD or DOCTYPE processing, or a feature that limits the number of entity expansions that can occur. The catch is that the defaults are often missing or far too generous, which is an “insecure by default” problem.

The most secure option is to disable DOCTYPE processing altogether. The more interesting case, however, is mitigating the vulnerability when you must allow a DOCTYPE. In Xerces, we can tell the parser to limit the number of entity expansions using the SecurityManager class (org.apache.xerces.util.SecurityManager).
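For the “just disable DOCTYPE processing” route, Xerces exposes the feature http://apache.org/xml/features/disallow-doctype-decl, which you can set with setFeature on Xerces’s own SAXParser. The sketch below sets the same feature through the standard JAXP SAXParserFactory, assuming the JDK’s bundled Xerces-derived parser (or Apache Xerces) is the JAXP implementation on the classpath. The class name NoDoctypeLoader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NoDoctypeLoader {

    // Xerces feature URI: any document containing a DOCTYPE becomes a fatal error.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Parses xml with DOCTYPEs disallowed; returns false if the parser rejects it. */
    public static boolean load(String xml) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(DISALLOW_DOCTYPE, true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // Fail closed: if the feature is unsupported, refuse to parse insecurely.
            throw new IllegalStateException("could not configure a hardened parser", e);
        }
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXException | IOException rejected) {
            System.out.println("Rejected: " + rejected.getMessage());
            return false;
        }
    }
}
```

The parser stops at the DOCTYPE declaration itself, so no entity is ever defined, let alone expanded. Documents without a DOCTYPE parse normally.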

The following code throws an exception if the parser expands more than 16 entities:

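import java.io.IOException;

import org.apache.xerces.parsers.SAXParser;
// Xerces's SecurityManager. This import shadows java.lang.SecurityManager,
// which has no setEntityExpansionLimit method.
import org.apache.xerces.util.SecurityManager;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Method of a class with a String filePath field.
public boolean loadSecurely() {
    boolean loaded = true;

    System.out.println("Loading: " + this.filePath);

    try {
        SAXParser parser = new SAXParser();

        // Abort parsing once more than 16 entity expansions have occurred.
        SecurityManager securityManager = new SecurityManager();
        securityManager.setEntityExpansionLimit(16);
        parser.setProperty("http://apache.org/xml/properties/security-manager", securityManager);

        parser.parse(this.filePath);

        // do something useful with the document

    } catch (SAXNotSupportedException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    } catch (SAXNotRecognizedException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    } catch (IOException caught) {
        System.out.println(caught.getMessage());
        loaded = false;
    } catch (SAXException caught) {
        // Exceeding the expansion limit is reported as a fatal parse error and lands here.
        System.out.println(caught.getMessage());
        loaded = false;
    }

    return loaded;
}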
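To make the idea behind an expansion limit concrete, here is a toy expander that resolves the laughN entities and counts every expansion, aborting once the count passes a limit. This is a hypothetical illustration of the technique, not how Xerces’s SecurityManager is implemented internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy: expands &name; references recursively, counting each expansion. */
public class ToyEntityExpander {

    /** Thrown when more than `limit` entity expansions have been performed. */
    public static class ExpansionLimitExceeded extends RuntimeException {
        public ExpansionLimitExceeded(int limit) {
            super("more than " + limit + " entity expansions");
        }
    }

    private final Map<String, String> entities;
    private final int limit;
    private int expansions = 0; // running total across all expand() calls on this instance

    public ToyEntityExpander(Map<String, String> entities, int limit) {
        this.entities = entities;
        this.limit = limit;
    }

    public String expand(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '&') {
                out.append(c);
                i++;
                continue;
            }
            int semi = text.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("unterminated entity reference");
            }
            String name = text.substring(i + 1, semi);
            String value = entities.get(name);
            if (value == null) {
                throw new IllegalArgumentException("undeclared entity: " + name);
            }
            // Count the expansion before doing the work, so the check fires early.
            if (++expansions > limit) {
                throw new ExpansionLimitExceeded(limit);
            }
            out.append(expand(value));
            i = semi + 1;
        }
        return out.toString();
    }

    /** Builds the laugh0..laughN entity table from the payload above. */
    public static Map<String, String> laughs(int n) {
        Map<String, String> table = new HashMap<>();
        table.put("laugh0", "ha");
        for (int k = 1; k <= n; k++) {
            String prev = "&laugh" + (k - 1) + ";";
            table.put("laugh" + k, prev + prev);
        }
        return table;
    }
}
```

Because every expanded reference counts as one, &laughN; costs 2^(N+1) - 1 expansions: laugh3 needs 15 and fits under a limit of 16, while laugh4 needs 31 and is rejected. With &laugh30; the toy gives up after the 17th expansion, long before any significant memory is allocated.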

Make sure the following two questions also appear in your runbook:

(4) Does the parser allow inline DTD or DOCTYPE processing?
(5) Does the parser set a limit on the number of entities that can be expanded?

Finally, if you run static analysis tools, make sure you have a rule that flags every place where XML is loaded. Share those findings with your pen testers so they know where to upload malicious XML and try other XML processing attacks.

This is a post from the good code (GC) labs, a library of software security test cases. Check out the source code for this post here.
