
Arachni::Browser: JS Data flow taint tracing

Hey folks,

I’ve got a big treat for you today. Since I can’t show any code yet, the least I can do is demo my progress, and today I’d like to present the JS taint data-flow tracer. If you recall, my last post showcased the execution-flow tracer, which in essence injected a JS payload and created a detailed stack trace of its execution; that functionality covers cases where a JS payload lands in an executable segment of the page. In that post, I remarked that in order to provide full coverage I would have to investigate data-flow tracing for cases where executable code can’t be introduced, like a taint being injected into, and retrieved from, a source such as document.location.hash.

Desired functionality

Upon introducing a taint, that taint should be tracked throughout the data flows of the JS execution. We need to know how a given token travels through the data structures of a JS application.

The result would be a lot of helpful information to make available to the user, which makes fixing a given issue much easier. Telling a user that the scanner managed to create an HTML element by placing a payload in document.location.hash is alright, but being able to tell the user the entire path that the payload followed would be better still.

Approach

Ideally, access to the interpreter’s memory space would provide full privilege to do just that; alas, that’s practically impossible. The next best thing is to identify the points of data transfer, i.e. how and where data is being exchanged inside a JS application, and treat those as data inspection checkpoints.

So, we’re going back to basics, Programming 101: data is being exchanged via function arguments, so that’s where we need to set up shop.

In essence, each function becomes a sink: if the taint lands in its arguments, that call needs to be logged. As an added bonus, the log entry gets a full execution-flow stack trace as well, since we can.
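To make the idea concrete, here is a minimal sketch of a function being turned into a sink. It is not the tracer's actual code: `TAINT`, `sinkLog`, `containsTaint` and `traceFunction` are hypothetical names invented for this illustration, and the trace is simply the `stack` string of a freshly created `Error` (non-standard, but available in the common engines).

```javascript
// Hypothetical names throughout; this is an illustration, not Arachni's API.
const TAINT = 'taint_d3adb33f';
const sinkLog = [];

// Returns true if the taint appears in a string, or anywhere inside
// (possibly nested) arrays and objects. `seen` guards against cycles.
function containsTaint(value, seen = new Set()) {
  if (typeof value === 'string') return value.includes(TAINT);
  if (value === null || typeof value !== 'object' || seen.has(value)) return false;
  seen.add(value);
  return Object.values(value).some((v) => containsTaint(v, seen));
}

// Wraps `fn` so that every call whose arguments carry the taint is logged,
// together with a stack trace captured at the time of the call.
function traceFunction(name, fn) {
  return function traced(...args) {
    if (containsTaint(args)) {
      sinkLog.push({ function: name, arguments: args, trace: new Error().stack });
    }
    return fn.apply(this, args);
  };
}

const render = traceFunction('render', (html) => `<div>${html}</div>`);
render('harmless');          // not logged
render(`<b>${TAINT}</b>`);   // logged: the taint landed in the arguments

console.log(sinkLog.map((entry) => entry.function)); // [ 'render' ]
```

The wrapper only inspects and records; it always forwards the call to the original function, so the application's behaviour stays the same.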

Implementation

Initially, I figured I’d track everything: go through the namespaces and objects and wrap their functions in magic tracing code. Although functional, that resulted in too much information, so a more focused approach was required, which wasn’t that hard since the necessary components of the tracer system had already been written.

At this point it’s sort of point-and-shoot: you can either provide an object whose functions to monitor or provide a single function to monitor. You can also set when to install the tracers, either upon page initialization or every time new JS code is introduced, so that you won’t miss anything.

(Since this ended up being that simple, I may add a user option to the finished system to allow adding more namespaces and functions to trace, so that traces can become more context-aware and optimized when dealing with custom namespaced code.)
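The "provide an object whose functions to monitor" mode could look roughly like the sketch below. Everything in it is an assumption made for illustration: `traceObject`, `sinkLog` and the `helpers` namespace are made-up names, and the taint check only looks at top-level string arguments to keep things short.

```javascript
// Hypothetical sketch; none of these names come from the real tracer.
const TAINT = 'taint_d3adb33f';
const sinkLog = [];

const hasTaint = (args) =>
  args.some((arg) => typeof arg === 'string' && arg.includes(TAINT));

// Replaces every function-valued own property of `target` with a wrapper
// that logs tainted calls and then delegates to the original function.
function traceObject(label, target) {
  for (const key of Object.keys(target)) {
    const original = target[key];
    if (typeof original !== 'function') continue; // leave plain data alone

    target[key] = function (...args) {
      if (hasTaint(args)) {
        sinkLog.push({
          function: `${label}.${key}`,
          arguments: args,
          trace: new Error().stack,
        });
      }
      return original.apply(this, args);
    };
  }
}

// A stand-in for custom namespaced application code.
const helpers = {
  version: '1.0',
  decorate(text) { return `[${text}]`; },
  show(text) { return this.decorate(text.trim()); },
};

traceObject('helpers', helpers);
helpers.show(TAINT);

// The taint passed through two functions, so two sinks were logged, in call order.
console.log(sinkLog.map((entry) => entry.function));
// [ 'helpers.show', 'helpers.decorate' ]
```

Because `show()` hands the taint on to `decorate()`, both calls end up in the log, which is the kind of path information the tracer is after.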

Then came the fun part, going over mountains of code and documentation and determining which functions to track. And that’s assuming that you’ve got access to the prototype object so that you can patch it with tracing code, which you don’t always have, but more on that later.

Taint analysis code

We’ll use the following code to run the analysis for each example:

The test server doesn’t have any meaningful server-side logic so I’ll skip posting the code, however the examples will include the client-side code, which is what we care about here.

FYI, you’ll see some weird things in the client-side code, like high line numbers (caused by the injected JS monitoring code being prepended to the existing page) and some injected calls to that code now and then to update the trackers. It’s a bit crude but gets the job done.

Supported sinks

I won’t go through every currently supported sink (traced function) in detail, because it’s way too early for a definitive list and because this post would end up being several pages longer, but I’ll present a few examples from each category to give you a rough idea.

Global functions

Right now, the only namespace which is fully traced (i.e. not targeted tracers for specific functions, but for every one that appears under it) is the window one. This is suboptimal but necessary, in order not to miss user-defined global functions, usually helpers in one way or another.

Analysis result

Each entry in the array is a logged sink, i.e. a function with the taint landing in its arguments, accompanied by an execution-flow trace (a fancy stacktrace basically).

There is some duplication between the sinks and their traces, but you never know which sink will be the last, so it’s good to have this. Also, in cases where the taint follows multiple flows, all the available data can be used to create graphs for each flow.

DOM prototypes

These are the native, browser-exposed APIs which are being tracked, so let’s pick HTMLElement.insertAdjacentHTML() as an example.
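As a rough illustration of what patching a prototype involves, the sketch below wraps a method on a prototype so that every instance is traced, including ones created before the patch. It is written to run outside a browser, so `FakeElement` is a stand-in for `HTMLElement` with a heavily simplified `insertAdjacentHTML()`; `traceMethod` and `sinkLog` are likewise hypothetical names.

```javascript
// Hypothetical sketch: FakeElement stands in for HTMLElement so that this
// runs without a browser; in a page the target would be HTMLElement.prototype.
const TAINT = 'taint_d3adb33f';
const sinkLog = [];

class FakeElement {
  constructor() { this.html = ''; }
  // Simplified: the real method honours `position`, this one just appends.
  insertAdjacentHTML(position, text) { this.html += text; }
}

// Swaps `proto[method]` for a wrapper that logs tainted calls.
function traceMethod(proto, label, method) {
  const original = proto[method];
  proto[method] = function (...args) {
    if (args.some((arg) => typeof arg === 'string' && arg.includes(TAINT))) {
      sinkLog.push({
        object: label,
        function: method,
        arguments: args,
        trace: new Error().stack,
      });
    }
    return original.apply(this, args);
  };
}

const before = new FakeElement(); // created before the patch is installed
traceMethod(FakeElement.prototype, 'FakeElement', 'insertAdjacentHTML');
const after = new FakeElement();

before.insertAdjacentHTML('beforeend', `<i>${TAINT}</i>`); // logged
after.insertAdjacentHTML('beforeend', '<i>clean</i>');     // not logged

console.log(sinkLog.length); // 1
```

Patching the prototype rather than individual elements means a single installation covers every element on the page, which is why access to the prototype object matters so much.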

Analysis result

jQuery

Basically, all you need is the ability to track the DOM prototypes, since every DOM manipulation procedure eventually comes down to these. From there, the execution-flow traces accompanying the sink data will let you know how the relevant DOM method got called.

However, this can get a bit tedious if using a JS framework such as jQuery. Presumably, you’re using these frameworks to make your job easier, but they will create huge traces which can be cumbersome to review, especially when using minified versions, where the source code and line numbers are meaningless.

For that reason, jQuery specific trackers have been introduced.

Analysis result

Of course, you still get the down-to-the-native-DOM sink and trace, but you also get the relevant jQuery call by itself first, so that you can tell what’s going on at a glance.

AngularJS

AngularJS is basically an MVC framework written in JavaScript that does a lot of wonderful things to make it easy to create Single-Page Applications. It removes the hassle of DOM manipulation from the development process by quietly handling it in the background, and provides a boatload of other features as well.

In order to do that manipulation, though, it uses either jQuery (if available) or its own library (called JQLite), which implements a compatible subset of the jQuery API.

There are a few issues though. First of all, the entire codebase uses strict mode which imposes a few restrictions with regards to access to things like function caller information and arguments. More importantly, all prototypes are private in scope, so getting access to them in order to install tracers does seem like a bitch (although it may not be, I may be missing something).
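One of the strict mode restrictions is easy to demonstrate in isolation. The sketch below is not tied to AngularJS or to the tracer; it only shows that `arguments.callee`, one of the handles a tracer could use to identify the running function, is readable in sloppy code and throws a `TypeError` in strict code. The `Function` constructor is used so that each body's strictness is explicit, whichever way the snippet itself is loaded.

```javascript
// Each function's mode is decided by its own body, not by the calling code.
const sloppy = new Function('return arguments.callee;');
const strict = new Function('"use strict"; return arguments.callee;');

// Sloppy code can obtain a reference to the function being executed.
console.log(sloppy() === sloppy); // true

// Strict code is not allowed to: the property access itself throws.
let error = null;
try {
  strict();
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true
```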

I’ve managed to add tracing to JQLite, so let’s have a look.

Analysis result

The restrictions aren’t apparent in this case, but if there were a trace within AngularJS code, you’d only have access to the url and line data, with no source or full list of arguments for anything but the sink, and that’s because of strict mode.

AngularJS does a lot of things for you, so I’ll try to tap into its facilities to make the trace data more context-aware when it comes to AngularJS applications. Or that may be unnecessary, I’m not sure yet.

Network data tracing

One thing I didn’t showcase here was that besides DOM manipulation, the XMLHttpRequest API is being traced as well (and its jQuery helper — the AngularJS one is next), so you don’t only get to inspect intra-application communications, but external ones as well. And if you also consider that all Browser communication passes through a custom proxy, it means that all network data can be traced — and that’s a lot of coverage.

Summing up

To sum up, this has been a long post and I’m now hungry, so see ya.


About Tasos Laskos

CEO of Sarosys LLC, founder and lead developer of Arachni.

2 Responses to "Arachni::Browser: JS Data flow taint tracing"

  • Nabil Stendardo
    January 2, 2017 - 8:51 pm

    Hi,
    I know your focus is on “how to secure your web application”, but I think that data flow taint tracing can also be useful for a privacy-concerned browser user (such as if implemented in a browser extension), in that it can notify the user of what variables (such as Window.outerWidth, etc.) had some sort of an impact in Ajax requests that were sent to the server, which can alert the user of browser fingerprinting. We also need to take into account dynamic addition of script tags to the page (and then be able to track the variables that were used in order to generate the script tag).

    • Tasos Laskos
      January 2, 2017 - 9:16 pm

      I guess you could do that but like you said, it’s out of the scope of this project.
