First of all, I know what you’re thinking: Wasn’t v0.5 supposed to be the one to be released?
Well, it was, but the project is mature now, the only major feature missing to reach completeness was a real browser, and so much (i.e. pretty much everything) has changed that it’s time to go into the v1.0s. It is essentially a new Arachni, Arachni reborn; a chance to fix ancient decisions I had to carry along for years in order to preserve compatibility.
Obviously, this is a big milestone (biggest so far), but I’ll skip the customary retrospection. What to say though…this is kind of a big deal you know? I feel like I’ve got to make a speech of some sort. Let’s try this (although I think I’ve used it before):
In this industry, scanners have always been separated into two groups, open source and commercial, and that’s because that has pretty much always meant “without browser functionality” and “with browser functionality”. Of course, there are other differences too, but from the feedback I’ve received through the years, this is most commonly the decisive feature, and you can see why.
Basically, if you want any sort of coverage that’s not a complete joke, you need to support these technologies. No matter how clever you’re being with regular expressions and static analysis of JS code and whatnot, the only way to make a web application work these days is with real support for the environment it expects. And this has been bugging me for a long time, a long time before I decided to start experimenting with approaches to implement it and a long time until I finished my implementation. That’s because the only way to make this work is with the real thing, a real browser.
This is why this release is a big deal: Arachni is the first open source scanner to cross that line. And as you read through this post, you’ll see that Arachni has not just crossed that line but leaped over it, surpassing even most of the established products out there (WIVETv3 and WAVSEPv1.5 scores will follow) in a few ways.
I hope that by now you’ve realized that this post is going to be quite lengthy, since the changes to Arachni are massive, so let’s get to the juicy parts.
I’d like to make one thing clear from the get go: v1.0 breaks backwards compatibility.
Every way you used to interact with Arachni has changed; don’t bother trying to upgrade, you need to start from scratch. CLI options are different, reports are different, the RPC API is mostly different, the RPC protocol is different and so on and so forth.
Things have changed for the better though, the much better, so don’t start cussing just yet.
This is one of the reasons this release is v1.0, it is a major change.
No more Spider
Aside from options being different, which I’ve already mentioned, the first thing you’ll notice when running a scan will be that there’s no crawl first.
While I was developing the browser analysis, I noticed that when combined with the Framework’s Trainer, the higher level pattern that emerged was that of a highly advanced crawl operation; that rendered the crawl-first approach redundant, thus I gave the old Spider the boot.
So now the scan goes right into the audit, discovering and handling new workload as it goes along, with full browser analysis happening in parallel and feeding it more pages. The result is that the regular audit hides as much of the browser-analysis latency as possible, so you get the best of both without having to wait for a crawl.
Scans can now be suspended to (and restored from) disk.
This was an old feature request, but it couldn’t be implemented without a rewrite that separated system Data from State information. The v1.0 rewrite gave me that chance, so now you’ll be able to suspend running scans and save their data and state as an Arachni Framework Snapshot (.afs) file.
Aside from the obvious, this allows you to do some cool things like:
- If you’re in a distributed environment, you can transfer running scans to other nodes — for runtime load-balancing for example.
- If you’re paranoid, you can set suspend/restore operations to happen at regular intervals — as checkpoints of a sort.
The above functionality isn’t provided by the system though. What is provided is:
- A scan timeout, after which the scan can either be aborted or suspended.
- The ability to suspend/restore via all available interfaces (CLI, RPC, WebUI).
As a heads-up, the suspension operation isn’t instant; you’ll have to wait for the audit of the current page to finish and for all the browser jobs to be processed. How much time that’ll take greatly depends on the web application being audited.
New input vectors
Full-browser support brought about a new range of elements — well, to be more precise, it brought about DOM extensions for existing elements. Furthermore, this part of the system was also rewritten to allow for the easy addition of new element types (of which I’ll be making full use in the near future).
Arachni can now audit links which rely on the client in order to perform their functions. It’s quite common these days that client-side inputs are passed via URL fragments, like: http://example.com/!#/?param=val&param2=val2
Arachni knows to extract and audit those inputs, using the browsers to facilitate the submission of this element.
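To make the idea of fragment-borne inputs concrete, here is a minimal Ruby sketch that pulls key/value pairs out of a URL fragment. The `fragment_inputs` helper is hypothetical and only illustrates the parsing step; it is not Arachni's own code, and the real thing also submits the resulting element through the browser.

```ruby
require 'uri'

# Hypothetical helper (not part of Arachni): extracts inputs from a URL
# fragment that carries a query-string-like section after a '?'.
def fragment_inputs(url)
  fragment = URI.parse(url).fragment.to_s

  # Everything after the first '?' in the fragment is treated as inputs.
  query = fragment.split('?', 2)[1]
  return {} if query.nil? || query.empty?

  URI.decode_www_form(query).to_h
end

inputs = fragment_inputs('http://example.com/!#/?param=val&param2=val2')
puts inputs.inspect
```

A URL without a fragment, or with a fragment that has no `?` section, simply yields an empty hash.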
Forms don’t need anything special in order to have a DOM counterpart attached to them (although I’ll optimize this in the future to look for associated DOM events).
If form audits and the browser have been enabled, all forms will also be submitted via the browser — although only a few XSS checks need to audit form DOMs, so the overhead is rather small.
Cookie DOMs are handled exactly the same as the form DOMs so not much to say here. The browser cookies are set to their fuzzed values and the page is loaded via the browser.
Link templates allow you to specify rules in order to extract arbitrary inputs from paths; the rules are basically regular expressions with named captures.
For example, to extract the “input1” and “input2” inputs from: http://test.com/input1/value1/input2/value2
You can use: /input1\/(?<input1>\w+)\/input2\/(?<input2>\w+)/
You can, of course, specify multiple templates (whose order will be preserved) which allows you to build rules that match any sort of parameter formatting a web application might use.
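As a rough illustration of how such rules behave, the Ruby sketch below applies an ordered list of templates to a URL and returns the named captures of the first one that matches. The `template_inputs` helper and the first-match-wins policy are assumptions made for the example, not a description of Arachni's internals.

```ruby
# Ordered list of templates; the first one is the rule from the example above.
TEMPLATES = [
  /input1\/(?<input1>\w+)\/input2\/(?<input2>\w+)/
].freeze

# Hypothetical helper: returns the inputs captured by the first matching
# template, or an empty hash if none of them match.
def template_inputs(url, templates)
  templates.each do |template|
    match = template.match(url)
    return match.named_captures if match
  end

  {}
end

puts template_inputs('http://test.com/input1/value1/input2/value2', TEMPLATES).inspect
```

Because the templates are tried in order, more specific rules can be listed before more general ones.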
This is the same as the normal LinkTemplate, but it matches the given rules against the URL fragments and submits the elements via the browser.
URL rewrites aren’t really treated as elements, but as scope adjustments, although for the purposes of this announcement, it makes more sense to place them in this section.
The deal is simple: you provide rules (as pattern-substitution pairs) which the system will use to rewrite the action URLs of links, forms and other elements.
For example, to rewrite “http://test.com/articles/some-stuff/23” to “http://test.com/articles.php?id=23”, you’d use “/articles\/[\w-]+\/(\d+)/” as the pattern and “articles.php?id=\1” as the substitution.
If you are aware of the webapp’s URL-rewrite rules, it’s best to provide them, instead of using the LinkTemplates.
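The pattern-substitution pair from the example can be tried out in plain Ruby. The sketch below uses a hypothetical `rewrite` helper that applies the first matching rule; how Arachni stores and applies its rules internally is not shown here and may differ.

```ruby
# Pattern => substitution pairs, using the rule from the example above.
REWRITES = {
  /articles\/[\w-]+\/(\d+)/ => 'articles.php?id=\1'
}.freeze

# Hypothetical helper: applies the first rule whose pattern matches and
# returns the URL untouched if no rule applies.
def rewrite(url, rules)
  rules.each do |pattern, substitution|
    return url.sub(pattern, substitution) if url =~ pattern
  end

  url
end

puts rewrite('http://test.com/articles/some-stuff/23', REWRITES)
# => http://test.com/articles.php?id=23
```

Once the URL is in its rewritten form, the `id` parameter is an ordinary link input and can be audited like any other.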
“Modules” have been renamed to “Checks”, to make their purpose clearer, and their categories have been renamed from “Audit”/”Recon” to “Active”/”Passive”, for the same reason. So let’s see what changed on that front.
All checks have had their information (check descriptions, issue descriptions and issue remedy guidance) overhauled and formatted as Markdown. This allows the system to render them in a much more presentable fashion in the HTML report and the WebUI, and since Markdown is human readable, it looks quite nice when just printed to the console.
- xss_dom — Injects HTML code via DOM-based links, forms and cookies.
- xss_dom_inputs — Injects HTML code via orphan text inputs with associated DOM events.
- no_sql_injection — NoSQL Injection (error-based) .
- no_sql_injection_differential — Blind NoSQL Injection (differential analysis).
- xss — Added support for Browser-based taint-analysis.
- xss_script_context
  - Added support for Browser-based taint-analysis.
  - Renamed from xss_script_tag.
- unvalidated_redirect — Updated to also use full browser evaluation in order to detect JS redirects.
- os_cmd_injection — Added payloads for *BSD and AIX.
- xpath => xpath_injection
- ldapi => ldap_injection
- sqli => sql_injection
- sqli_blind_rdiff => sql_injection_differential
- sqli_blind_timing => sql_injection_timing
- htaccess => htaccess_limit
- backup_directories — Backup directories.
- cookie_set_for_parent_domain — Cookie set for parent domain.
- hsts – Checks HTTPS pages for missing Strict-Transport-Security headers.
- backup_files — Updated filename formats.
x_forwarded_for_access_restriction_bypass renamed to
- Also updated to use more origin headers.
- emails – Updated to handle simple ([at] and [dot]) obfuscation.
- insecure_cookies – Only check HTTPS pages.
Like with the checks, plugin descriptions have also been updated to Markdown, which is especially useful because some plugins have a somewhat lengthy description of their features and example configurations.
- Updated to use HTTP::ProxyServer.
- Forces the proxy to only extract vector information from observed HTTP requests and not analyze responses.
- params option renamed to parameters.
- Changed results to include status (String) and message (String) instead of code (Integer) and msg (String).
- Updated to abort the scan upon login failure.
- Renamed params in logged results to parameters.
- Renamed res in logged results to response.
- Changed results to include status (Symbol) and message (String) instead of code (Integer) and msg (String).
- Changed results to use with_issues and without_issues instead of unsafe and safe.
- resolver — Removed as the report now contains that information in the responses associated with each issue.
A great deal of things have been changed in order to accommodate the new functionality, with lots of those things providing their own context to logged issues; as such, the reports had to be redesigned from scratch.
Of one thing I am certain though: you will love them. All reports include an abundance of context for easy reproduction and verification of identified issues, such as:
- Affected page snapshots, including:
- DOM transitions, allowing for restoration of state.
- DOM capture as HTML code.
- Referring page snapshots, for easy comparison of before and after states.
- Function names.
- Function argument signatures.
- Function locations.
- Function source code.
- Function argument lists.
Here are some examples of generated reports:
- HTML (zip)
- AFR — This is the Arachni Framework Report file, it serves as a reference point and can be converted to any of the above formats.
Also, “Report” components have now been renamed to “Reporters” to avoid name clashes with the new scan “Report” object, previously called “AuditStore”.
Integrated browser environment
This is the pièce de résistance.
Arachni monitors both the data and execution flows of its injected payloads and once the page reaches a vulnerable state, it captures full JS stack data (as you saw from the report examples earlier). This is really cool because you’re not just getting a JS backtrace to work with, you get access to pretty much the same info as if you were using a debugger (like Firebug) and had set a breakpoint at the perfect time.
In addition, extra optimizations have been included for frameworks like jQuery and AngularJS (with more to come) to bring the context of the logged data closer to the developer’s perspective, rather than just logging lower-level JS/DOM calls.
Before I forget, support for more browsers (Firefox and Chrome) is on my TODO list.
This was an unplanned perk that came from having a real browser environment
You can now use Arachni to audit responsive and mobile web applications by configuring its user-agent identification and viewport size and orientation, to make it look like the device of your choice. This way you can gain as much coverage of interface, client-side and server-side code as possible.
Crawl coverage (WIVETv3)
If you’re thinking that this is probably a lowly Open Source project with half-assed support for a browser environment, you’d be quite wrong.
Let’s use a very basic benchmark (WIVETv3), to compare the coverage provided by today’s scanners: http://sectoolmarket.com/wivet-score-unified-list.html
You can see that the highest score in that table is 96%, which is only achieved by HP’s WebInspect, with other established commercial products hovering in the 90s — Arachni’s old score was 19%, quite pitiful.
Arachni v1.0 scores 96%, tying it with WebInspect for the lead, and surpassing pretty much everything. Arachni maxed out at 96% because there’s not (and will never be) support for SWF.
You can verify for yourself with:
./bin/arachni http://192.168.1.7/wivet/ --checks trainer \
--audit-links --audit-forms --scope-exclude-pattern=logout \
You need to:
- Visit WIVET with your browser and pass its PHPSESSID to Arachni, to force it to maintain a single session. That will also allow you to easily inspect its progress via the “Current run” page.
- Load a check that submits forms (in this case it’s the “trainer”), so that Arachni can train from the web app’s behaviour and pass the wizard test.
- Exclude the logout link, so that Arachni won’t nullify its session.
The other options are there to force Arachni to only submit links and forms and skip cookies, which would have been enabled by default, but that doesn’t affect the results; it just makes the process faster and makes the progress output less hectic.
So you see, the era of needing a commercial, closed-source scanner to get decent coverage is gone.
Vulnerability coverage (WAVSEPv1.5)
We saw how good Arachni is at finding web application resources; now let’s see how good it is at identifying vulnerabilities. For that we’ll use the latest WAVSEP benchmark results: http://sectoolmarket.com/price-and-feature-comparison-of-web-application-scanners-unified-list.html
As you can see, IBM’s AppScan is at the top, with some impressive results:
- WIVET: 92%
- SQL injection: 100% (0% false positives)
- Reflected XSS: 100% (0% false positives)
- Local file inclusion: 100% (0% false positives)
- Remote file inclusion: 100% (0% false positives)
- Unvalidated redirect: 36.67% (11% false positives)
- Backup files: 5.43% (66.67% false positives)
And Arachni v1.0 scores:
- WIVET: 96%
- SQL injection: 100% (0% false positives)
- Reflected XSS: 90.62% (0% false positives) — Misses cases which require a browser with VBScript, which is not (and will not be) supported.
- Local file inclusion: 100% (0% false positives)
- Remote file inclusion: 100% (0% false positives)
- Unvalidated redirect: 100% (0% false positives)
- Backup files: 98.91% (0% false positives) — 2 test cases are broken, otherwise it would have been 100%.
If VBScript support is important to you, then the best choice is IBM’s AppScan; otherwise, Arachni is the obvious choice, providing better crawl coverage, better vulnerability coverage and no false-positives.
Honestly, I’m a bit worried that I may have somehow let my bias affect the benchmark results, like having subconsciously set up my benchmarks in a favourable way. However, that shouldn’t be an issue, because we’ll find out how Arachni v1.0 does against other scanners objectively, at the next round of WAVSEP benchmarks.
For posterity’s sake, here’s how the benchmarks were run:
- No “special” optimizations were enabled for these tests, they were performed using the default settings.
- Only applicable checks were loaded for each test category.
- The tests were not run per case, but rather per category.
- Official WAVSEP benchmarks are performed per-case, to help scanners return consistent results. The WAVSEP benchmark’s design is rather limited when it comes to handling concurrent DB connections and, when stressed, it can make scanners miss cases or return false-positives.
- Arachni doesn’t need that sort of pampering; it is designed to handle bad network conditions and broken web applications (up to a point), since that’s what you’ll face in real life.
So, in an effort to be fair, I’ve actually put Arachni at a disadvantage during my tests; although, even Arachni has its limits, so if you try to do the same, YMMV.
The RPC side of Arachni has also seen its fair share of changes.
The biggest change is that Marshal and YAML have been ditched as the serialization formats, and substituted with MessagePack. This required quite a massive clean-up due to the lack of custom deserialization hooks, but it was for the best, as every object that passes over RPC is now placed there explicitly.
The clean-up also provides the system with the ability to switch between pretty much any serialization format, with minimal effort, should that be required in the future. I don’t see that happening though, as MessagePack is pretty much perfect for this use case.
In addition, Arachni’s serializer uses ZLib-compressed MessagePack (once a certain message size is exceeded), which makes a huge difference when transporting heavy objects like issues or reports.
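The "compress only past a certain size" idea can be sketched in a few lines of Ruby. To keep the example dependent only on the standard library, JSON stands in for MessagePack here; the threshold value, the one-byte flag prefix and the `SketchSerializer` module are all assumptions made for illustration and say nothing about Arachni's actual wire format.

```ruby
require 'json'
require 'zlib'

# Illustrative only: JSON replaces MessagePack, and the flag byte and
# threshold are made up for this sketch.
module SketchSerializer
  COMPRESS_THRESHOLD = 1_000 # bytes

  # Serializes the object and deflates the payload only when it is large.
  def self.dump(object)
    payload = JSON.generate(object)

    if payload.bytesize > COMPRESS_THRESHOLD
      'Z' + Zlib::Deflate.deflate(payload) # 'Z' marks a compressed body
    else
      'R' + payload                        # 'R' marks a raw body
    end
  end

  # Reverses #dump, inflating the body first if it was compressed.
  def self.load(message)
    flag = message[0]
    body = message[1..-1]
    body = Zlib::Inflate.inflate(body).force_encoding(Encoding::UTF_8) if flag == 'Z'

    JSON.parse(body)
  end
end

report = { 'issues' => Array.new(200) { { 'name' => 'XSS', 'proof' => '<script>' } } }
packed = SketchSerializer.dump(report)
puts "#{JSON.generate(report).bytesize} bytes raw, #{packed.bytesize} bytes on the wire"
```

Small messages skip compression entirely, so they don't pay the deflate overhead, while repetitive heavy objects such as issue lists shrink considerably.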
The RPC API has been cleaned up as well, with the only object now exposed being the RPC::Server::Instance, accessible via the “service” RPC handler.
The WebUI usually gets its own section in these announcements, separate from those of the Framework, although in this case that won’t be necessary. Its changes are minimal and their purpose was to bring it up to date with the Framework. There haven’t been any mind-blowing updates to it.
Scans can now be suspended and restored, profiles have changed to keep up with the Framework and the Issue pages now show more context than before — that’s pretty much it.
This is a good news, bad news, good news situation.
I have decided to turn Arachni into a self-funded Open Source project (and hopefully an Open Source business) and it is for that reason that the project is now dual-licensed (Apache License v2.0/Commercial).
- Good news is that for 99.9% of you nothing changes, except for having a really good F/OSS scanner to work with.
- Bad news is that people who commercialize Arachni (for SaaS services or distribution as part of a commercial product) will need to acquire a commercial, non-free license to use Arachni v1.0.
- Good news is that the funding will help improve Arachni.
The license page contains more information on the subject and a way to get in touch.
More input vectors
Even though I am very satisfied with Arachni v1.0, I still see room to improve. The only important category in which it does not excel is input vector support. For comparison see: http://sectoolmarket.com/input-vector-support-unified-list.html
The old version scored a 7, v1.0 scores a 9 (or 10, not sure), but there’s still more to be done. Thankfully, support for JSON/XML is now a trivial matter and has been scheduled for v1.1 — actually, implementing any kind of basic input vector type is now as easy as pie so I’ll try adding as many as possible for v1.1.
After that, I will be adding support for nested parameters, which is going to be a bit trickier, but not that hard.
Specialised REST/SOAP scanners
Once my work on supporting as many input vectors as possible is complete (see above), I’m planning on creating specialised web service scanners.
This will allow the system to branch out in a clean fashion and have executables like “arachni_rest” and “arachni_soap”, with UIs specifically designed for those types of services.
And of course, these will be accompanied by the appropriate RPC and WebUI interfaces as well.
Intelligent state monitoring
There’s a lot of talk about complex workflows and such these days (mostly talking about a feature Arachni has had for years, its Trainer) but that has got me thinking, what if you could actually keep track of the state of the web application?
- Page A has a form FA which only appears when condition CA is met.
- Page B has a form FB that, once submitted, updates the state of the web application to CA.
- Page B is audited, form FB submitted.
- Page A, if revisited, will now show FA.
That scenario is troublesome, because if Page A was audited prior to getting to Page B, you’ll miss that form. And I know what you’re thinking: Do a crawl at the end of the scan, it’s very common.
You see, that solves the wrong problem; it solves only the absolutely most basic case, which is the one I just described. You’re trying to only solve that case when the real problem is being aware of state changes. Who’s to say that by the end of the scan CA won’t have been reverted due to another action? Can you re-crawl after every event?
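The ordering problem in the scenario above is easy to reproduce with a toy model. In the Ruby sketch below, `ToyApp` and `scan` are purely hypothetical stand-ins for a stateful web application and a scanner loop; the symbols `:a`, `:b`, `:fa` and `:fb` mirror Page A, Page B, FA and FB.

```ruby
# Toy web application: form FA only shows up on Page A once condition CA holds.
class ToyApp
  def initialize
    @condition_ca = false
  end

  # Forms visible on the given page for the current application state.
  def forms_on(page)
    case page
    when :a then @condition_ca ? [:fa] : []
    when :b then [:fb]
    else []
    end
  end

  # Submitting FB switches the application into state CA.
  def submit(form)
    @condition_ca = true if form == :fb
  end
end

# Toy scanner: visits pages in order and submits every form it sees.
def scan(app, pages)
  seen = []
  pages.each do |page|
    app.forms_on(page).each do |form|
      seen << form
      app.submit(form)
    end
  end
  seen
end

puts scan(ToyApp.new, [:a, :b]).inspect     # Page A first: FA is missed
puts scan(ToyApp.new, [:b, :a]).inspect     # Page B first: FA is found
puts scan(ToyApp.new, [:a, :b, :a]).inspect # re-crawl at the end: FA is found
```

The final re-crawl only helps because nothing in this toy model ever reverts CA; add one action that does and the result once again depends on visiting order.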
I have a few rough ideas on how to tackle this issue, nothing concrete enough to share yet, but I just wanted to let you know that this is also on the TODO list.
There’s also the very good chance that what I’m talking about is impossible, so I’ll have to implement a solution for the most basic of cases too. Either way though, this should be a fun challenge.
One thing I failed to do for this release was make available an MS Windows package.
I haven’t touched on this a lot (mostly because there were a lot more exciting things to talk about), but I spent a decent amount of time making sure v1.0 was portable. I even went so far as to rip EventMachine out of the RPC implementation and replace it with a custom written, lightweight, high-performance, portable, pure-Ruby reactor implementation, which the RPC clients and servers now use.
I wanted v1.0 to work everywhere: MRI (on Linux, OSX and Windows), Rubinius (on Linux and OSX) and JRuby (on Linux, OSX and Windows), and it does, kinda:
- MRI on Linux: Works
- MRI on OSX: Works
- MRI on Windows: Crashes due to memory violation errors.
- Rubinius on Linux: Segfaults when using more than 1 browser (multi-threaded libcurl/FFI issue).
- Rubinius on OSX: Doesn’t segfault but the same issue hinders interaction with the browsers.
- JRuby: RPC stuff doesn’t work due to OpenSSL and non-blocking socket bugs.
Once the above issues are resolved, Arachni will truly be multi-platform, AND TAKE OVER THE WORLD! Or just provide a cool scanner which people can use from the 3 most common OSs without hassle — between you and me, it’ll probably be the latter.
Another cool thing is that thanks to JRuby support (even if it’s not 100% yet), you can now directly integrate Arachni with Java systems.
There is a lot of fresh code in this release, and even though it has been tested extensively, I fully expect it to have a hefty number of bugs, so please do report them accordingly.
At this point, I’d like to thank the people who mercilessly tested Arachni for months, worked with me to resolve bugs and provided valuable feedback:
- Anton Abashkin
- Simon Treadaway – A special thanks to Simon for taking the time to write proper descriptions and remedy guidances for the issues.
- Benoit Chevillot
- Robert Gouin
Thanks a lot guys, really appreciated!
And thank you (the reader) for getting this far, happy scanning and don’t forget to send feedback!