Are your regular expressions anchored?

I recently found a SQL injection vulnerability in a J2EE app where a request parameter is used to construct a dynamic JDBC query via string concatenation. When I discussed the issue with the developer, he said something to the effect of “Wait, I’m validating this field against a white list like our standards say, what’s the deal?”. So we reviewed the validation check. Turns out a regex was being applied, but it was flawed: it wasn’t anchored from beginning to end. Let’s roll up our sleeves and figure out what happened.

The validation check should ensure a field consists of exactly 10 digits. Here’s the flawed regex:

Pattern r = Pattern.compile("[0-9]{10}");
Matcher m = r.matcher(data);
if (m.find()) {
	// valid
} else {
	// invalid
}

Because this regex is not anchored using ‘^’ and ‘$’, a malicious payload such as the following will get through:

1234567890'or 1=1

Here’s the better regex:

Pattern r = Pattern.compile("^[0-9]{10}$");
Matcher m = r.matcher(data);
if (m.find()) {
	// valid
} else {
	// invalid
}

How can we identify unintentionally flawed code before the almighty pen testers?

(1) We might be able to write a static analysis rule that identifies all regular expressions and throws a warning if the literal is missing anchors. You don’t need an expensive commercial tool to implement this sort of check. You could write a custom FindBugs detector.

(2) QA should write test cases that specifically try to bypass bounds checking

The possibility of not getting the input validation right is also a good reason why the query should be converted into a parameterized query with proper data type binding on the user supplied data. :-)

Post a Comment

Your email is never published nor shared. Required fields are marked *