See, I’ve got a large list of optimizations that I’d like to implement in Arachni, some more novel and radical than others. A few weeks ago I started playing with one of those ideas and got caught in an optimization groove; next thing you know, a bunch of items from my list were in the system and the results were astonishing.
The explanations are boring, so let’s start by looking at the results:
| v1.2.1 (previous stable) | 28930 | 00:06:30 | 1652 | 00:04:03 |
| v1.3 (latest stable) | 30019 | 00:05:49 | 1544 | 00:03:47 |
| Coming next (WiP) | < 26533 | < 00:03:56 | – | – |
As you can see, performance has increased significantly, and I’m not done yet: scan durations will keep tumbling for the foreseeable future.
arachni http://testhtml5.vulnweb.com --audit-links --audit-forms
arachni http://testhtml5.vulnweb.com --checks -
Rewritten URI parser
Most of the URI parsing in Arachni is handled by a custom written, lightweight and very fast parser, instead of relying on the one provided by Ruby’s standard libs or an external library.
Re-inventing the wheel isn’t the way to go most of the time, but sometimes you can make a wheel that fits your very special car better; this was one of those times. This isn’t new by the way, that code has been in there for a long time.
Arachni was parsing URLs into their components using the custom code (very fast) and then passing them to a wrapper of the standard-library URI class, which provided some nice helpers when you need to handle them later on.
The problem was that involving Ruby’s URI lib at all muzzled the performance gains. The URI class is now entirely custom-written in order to bypass outside dependencies as much as possible, which yields much better performance.
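To give a rough idea of what a hand-rolled parser looks like, here is a minimal sketch. It is not Arachni’s actual code: the fast_parse name and the returned Hash layout are made up for illustration, and it assumes plain http(s) URLs without credentials or IPv6 hosts.

```ruby
# Hypothetical, simplified URL splitter; NOT Arachni's real parser.
# Assumes plain http(s) URLs with no "user:pass@" part and no IPv6 hosts.
def fast_parse(url)
  components = {}

  # The fragment never reaches the server, so split it off first.
  url, fragment = url.to_s.split('#', 2)
  return nil if url.nil?
  components[:fragment] = fragment

  # Everything before "://" is the scheme.
  scheme, rest = url.split('://', 2)
  return nil if rest.nil?
  components[:scheme] = scheme.downcase

  # The authority (host and optional port) ends at the first "/" or "?".
  authority_end = rest.index(/[\/?]/) || rest.size
  authority     = rest[0...authority_end]
  rest          = rest[authority_end..-1]

  host, port = authority.split(':', 2)
  default_port = components[:scheme] == 'https' ? 443 : 80
  components[:host] = host.to_s.downcase
  components[:port] = port ? port.to_i : default_port

  # Whatever is left is the path and, optionally, the query string.
  path, query = rest.split('?', 2)
  components[:path]  = (path.nil? || path.empty?) ? '/' : path
  components[:query] = query

  components
end

parsed = fast_parse('http://Example.com:8080/a/b?x=1&y=2#top')
puts parsed[:host]
puts parsed[:path]
```

The point is that a handful of String#split and String#index calls cover the common case, with no generic grammar or validation work in the hot path.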
Rewritten HTTP proxy server
All the browser connections in Arachni (either Arachni’s browser or a user browser when using the proxy plugin) pass through system controlled HTTP proxy servers. This allows for traffic inspection and manipulation which in turn allows for great control over browser operations.
The HTTP proxy server used up to now was Ruby’s own, called WEBrick, and it is terrible.
Why was I using a terrible library, you may ask; well, there wasn’t really much choice, and there were a lot of complicated issues (like SSL interception) that I had already handled. So when it was time for the v1.0 rewrite (adding a real browser environment) there was so much other work that needed to be done that this became secondary.
Now, though, it became primary. I bit the bullet and decided to write a purpose-built HTTP proxy server just for Arachni.
The new proxy server is much faster than the previous one and supports keep-alive, and its SSL interception is about 10 times faster than before.
In addition to the performance gains, it also enables total control over connections, allowing for websocket and other traffic to be inspected and manipulated, if need be.
Switched WebDriver clients
Arachni’s browser leverages PhantomJS in combination with a custom JS environment in order to monitor the DOM and trace taints. PhantomJS is controlled via WebDriver, which is a very common way to manipulate browsers programmatically.
One of the most popular Ruby libraries for easy browser control is Watir WebDriver (which in turn uses Selenium), and it turned out to be a suboptimal choice for Arachni. Don’t get me wrong, Watir is a fine library when you’re doing integration tests for your web application; less so when you want fine control, low overhead and high performance.
Watir does a lot of housekeeping behind the scenes, which became redundant as Arachni’s browser does its own housekeeping as well.
For a long time I wanted to switch to something more lightweight than Watir internally, but still expose its interface to plugins (like the login_script one) because it’s popular and easy to use. The obvious choice was bypassing Watir and directly using Selenium.
Turns out, Selenium was a much better fit: it just does what you tell it, and each call corresponds to one WebDriver request.
As a result, the overall amount of WebDriver requests necessary for each Arachni browser operation was almost halved, resulting in far lower CPU utilization for both the Arachni and PhantomJS processes.
Directly using Selenium combined with many other smaller optimizations resulted in browser operations becoming many times faster than before while using significantly less system resources.
Optimized signature analysis
A very large part of the security checks (and thus a big part of the overall scan) is processing large strings and looking for signatures, either via substrings or regular expressions.
For example, some of the SQL injection checks look for known errors in HTTP response bodies after injecting certain payloads; if the response matches any of the error signatures, then an issue is logged.
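As a rough sketch of that kind of check (not Arachni’s actual code): the signature list below is a small illustrative sample of well-known database error messages, and the SQL_ERROR_SIGNATURES and matching_signature names are made up for this example.

```ruby
# Illustrative signatures only; NOT Arachni's actual list.
SQL_ERROR_SIGNATURES = [
  'You have an error in your SQL syntax',  # MySQL, plain substring
  'unterminated quoted string at or near', # PostgreSQL, plain substring
  /ORA-\d{5}/                              # Oracle, regular expression
].freeze

# Returns the first signature found in the response body, or nil if none match.
def matching_signature(body)
  SQL_ERROR_SIGNATURES.find do |signature|
    if signature.is_a?(Regexp)
      body.match?(signature)
    else
      body.include?(signature)
    end
  end
end

body = '<b>Warning</b>: You have an error in your SQL syntax near "\'"'
puts 'issue logged' if matching_signature(body)
```

Every response to every injected payload goes through a loop like this, which is why the cost of each individual signature matters so much.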
The performance of this approach can vary wildly depending on the complexity of the signature and the size of the HTTP response body.
Looking for substrings tends to be a little faster than matching simple regular expressions and much faster when dealing with complex ones. Still, even simple regular expressions under-perform significantly when dealing with very large strings.
What changed now is that you can mix and match signature types in order to provide the best signature for any situation, as well as dynamically generate or specify signatures after having a peek at the response.
So, if you need to look for a somewhat complex regular expression which would result in significant resource utilization when dealing with lots of large responses, you can bail out early and not do the match at all if the response does not fulfill some other criteria.
For example, Arachni uses /:.+:\d+:\d+:.+:[0-9a-zA-Z\/]+/ to look for contents of /etc/passwd, and you can bet that contents of that file will always include the substring bin/ somewhere in there. So, you can now say “only match the regular expression if the response contains the substring bin/”.
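In plain Ruby the idea boils down to something like the sketch below. The regular expression and the bin/ substring are the ones mentioned above; the PASSWD_REGEXP and passwd_disclosed? names are hypothetical and this is not how the checks are actually written inside Arachni.

```ruby
# Regular expression quoted in the text for spotting /etc/passwd contents.
PASSWD_REGEXP = /:.+:\d+:\d+:.+:[0-9a-zA-Z\/]+/

# Hypothetical helper: run the cheap substring test first and only pay for
# the regular expression when the response could plausibly match.
def passwd_disclosed?(body)
  return false unless body.include?('bin/')

  body.match?(PASSWD_REGEXP)
end

puts passwd_disclosed?("root:x:0:0:root:/root:/bin/bash\n")
puts passwd_disclosed?('<html><body>Nothing to see here</body></html>')
```

Most responses never contain bin/ at all, so for them the expensive match is skipped entirely.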
That’s all for now; the rest of the optimizations are more extreme (and interesting), so they probably won’t be pushed until the nightlies get stable.
Fresh code means new bugs, so I’ll be taking this one step at a time to ensure that everything works right.
If you come across any issues while testing the nightlies please get in touch.