Friday, October 30, 2009

Notes on Content Security Policy

It seems I learn better when I write things down, so I'm taking notes as I read the spec for Content Security Policy.

  • It's opt-in on a per-site basis.
  • It is initially activated in the browser by the presence of an X-Content-Security-Policy header field in an HTTP response. The value of the header field must contain either a policy specification or a policy-uri field which denotes the URI from which the browser should fetch the policy.
  • The header field must not be in the trailer headers (i.e. it must be at the top of the HTTP response). I surmise the purpose of this constraint is that existing browsers may evaluate inline JavaScript as soon as they can, so if the X-Content-Security-Policy field is in the trailer, it's too late.
  • There are two URI types in CSP: policy-uri and report-uri. The former defines a URI from which a security policy must be fetched. The latter defines a URI to which violations of the policy should be reported (using e.g. an HTTP POST).
  • This is interesting. If there's more than one X-Content-Security-Policy in a response, the browser complies with the intersection of the policies.
  • If there's more than one report-uri, the browser reports violations to each unique URI — if there are duplicate URIs, the browser only sends one report to it.
  • A policy-uri or report-uri is only legal if it complies with the conventional same-origin policy — that is, if the URI refers to the same scheme/host/port as the page itself.
  • Inline JavaScript won't execute when CSP is enabled. The presence of inline JavaScript in a page for which CSP is in effect is a violation and causes a report to be sent to each report-uri.
  • Eval and any other mechanism for creating code from data (e.g. new Function("i'm evil code masquerading as data")) are not allowed to execute. They trigger a report to the report-uri, too.
  • CSP has options for stating different sources for different media types (e.g. img-src for images, media-src for audio/video, script-src for JavaScript, object-src for applets and the like, frame-src for frame and iframe elements, font-src for fonts, xhr-src for XMLHttpRequest, style-src for stylesheets)
The spec also contains examples of policy definitions.
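
To make the same-origin and de-duplication rules for report-uri concrete, here is a minimal Python sketch. It is not taken from the spec: the function names (origin, same_origin, report_targets) are hypothetical, and I'm assuming that relative URIs are resolved against the page URL and that "duplicate" means string equality after resolution.

```python
from urllib.parse import urljoin, urlsplit

# Default ports, so that https://example.com and https://example.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple used by the same-origin check."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def same_origin(page_url, other_url):
    """True if other_url has the same scheme, host and port as the page."""
    return origin(page_url) == origin(other_url)


def report_targets(page_url, report_uris):
    """Keep each same-origin report URI once, in first-seen order."""
    targets = []
    for uri in report_uris:
        absolute = urljoin(page_url, uri)  # assumption: relative URIs are allowed
        if same_origin(page_url, absolute) and absolute not in targets:
            targets.append(absolute)
    return targets
```

With a page at https://example.com/page, a relative /csp-report and an absolute https://example.com/csp-report collapse into one target, while a URI on another host is dropped.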

Friday, October 16, 2009

Independent Study: Web Application Security

(This is one of a series of posts about papers I'm reading for an independent study with Prof. Evan Chang at the University of Colorado, Boulder. The format is similar to that of a review of a paper submitted to a computer science conference. These are already-published papers, so I'll be writing with the benefit of hindsight, especially when the paper was published at least several years ago.)

Last week and the week before, I read papers which analyzed and proposed solutions for injection attacks in dynamic languages. This week, instead of reading a paper, I'm digging around for security trends related to JavaScript in the browser and dynamic languages on the server. Not much seems to have changed since the Spectator paper was published. There are still JavaScript worms, and the attackers are still using fancy tricks to subvert the filters of the web site operators. Compare, for instance, these technical descriptions of the original MySpace worm and the quite recent Reddit worm. So all in all, not much is new. However, reading this report from the Web Application Security Consortium, I did run across what is, to me, a new kind of attack, HTTP response splitting, which may warrant further investigation. I suspect that existing taint mode techniques can be applied to HTTP response splitting, but it would be worthwhile to verify.
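
For anyone else who hasn't seen HTTP response splitting before, here is a minimal Python sketch of the idea. The helper names (build_redirect, safe_header_value) and the example values are hypothetical. The point is only that a CR/LF pair in user input ends the current header line, so the attacker gets to write the next one.

```python
def build_redirect(location):
    """Naively format a redirect response around a caller-supplied Location value."""
    return "HTTP/1.1 302 Found\r\nLocation: " + location + "\r\n\r\n"


def safe_header_value(value):
    """Reject values that would terminate the header line early."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value contains CR or LF")
    return value


# User input that smuggles a second header in after the intended value.
malicious = "/home\r\nSet-Cookie: session=attacker"

# The injected text shows up as a header line of its own.
print(build_redirect(malicious).split("\r\n"))
```

Treating the header value as tainted until it has passed something like safe_header_value is the kind of check I'd expect a taint mode to be able to enforce.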


For Dynamic Languages

Douglas Crockford's slides on JavaScript security
Ruby on Rails Security Project
Python Security Advisories

General Internet Security

Web Application Security Consortium
Security Focus
Common Vulnerabilities and Exposures
SANS Storm Center

To get a flavor of US-CERT data, here are 2009 current activity reports:

Web Application Security Consortium feed
Security Focus feeds
Common Vulnerabilities and Exposures
SANS Internet Storm Center
SANS: @RISK: The Consensus Security Vulnerability Alert

File Under Very Useful

The Web Application Security Consortium has statistics (which appear to be actively maintained) on website vulnerabilities. The WASC describes the data as the result of "a collaborative industry wide effort to pool together sanitized website vulnerability data and to gain a better understanding about the web application vulnerability landscape."

For tracking web application (i.e. web app framework and web browser) security vulnerabilities, SANS's @RISK: The Consensus Security Vulnerability Alert seems to be quite useful. It compiles reports from a number of commercial security sources. Here's an example of the web application section of a recent report:
Web Application - Cross-Site Scripting
Web Application - SQL Injection
Web Application


The advice everyone who knows anything gives to anyone who wants to be safer online is to use NoScript. I use it. You should too. But it's only for Firefox, not all the other browsers out there, and discretionary plugins only get adopted so far. That leaves a whole lot of browsers (many of which have unpatched vulnerabilities) running on the desktops and laptops of the world. Further, most Internet users aren't sophisticated enough to know when they need to enable JavaScript, and since there's not a WWW cop to enforce the unobtrusive use of JavaScript, it's just easier for people to allow JavaScript from every site on the web, which gets us back to square one.

A few weeks ago I reviewed a system for detecting and containing JavaScript worms which mentions the MySpace JavaScript worm. Here are some more recent incidents. The Reddit incident was only a few weeks ago.
Somebody's made a javascript worm
source code for the reddit/firefox [sic] exploit

JavaScript worm from late 2007 happily frolicking in 2008
JavaScript worm still spreading, infection origin unknown

More on Orkut worm

JavaScript worm targets Yahoo!

I'm Popular
Technical explanation of the MySpace Worm

Buffer Overflows, Oh My!

Because they manage memory on behalf of the programmer, dynamic languages may be thought of as invulnerable to buffer overflow attacks. However, the runtimes of some dynamic languages are implemented in C, which is itself subject to buffer overflow attacks, so programs executing in such runtimes may themselves be vulnerable. This is illustrated by these Ruby, Perl, Python, and PHP vulnerabilities, all reported in 2008.

The same is true of JavaScript running in Firefox, Internet Explorer, and WebKit/Safari.

Friday, October 09, 2009

Independent Study: Defending against Injection Attacks through Context-Sensitive String Evaluation

(This is one of a series of posts about papers I'm reading for an independent study with Prof. Evan Chang at the University of Colorado, Boulder. The format is similar to that of a review of a paper submitted to a computer science conference. These are already-published papers, so I'll be writing with the benefit of hindsight, especially when the paper was published at least several years ago.)

Submission: Defending against Injection Attacks through Context-Sensitive String Evaluation [PDF]

Please give a brief, 2-3 sentence summary of the main ideas of this paper:

What is the strength of this paper (1-3 sentences):

Its strength is that the authors present a simple but general analysis of all kinds of injection attacks (e.g. SQL, shell, and others), and implement a system for preventing and detecting those attacks. Their system is completely automated, requiring the programmer to make no decisions (and hence make no mistakes), and its generality is extremely appealing.
What is the weakness of this paper (1-3 sentences):
The runtime overhead of CSSE is a bit steep.
Excellent. This paper should be presented at POOPSLA '99!
All told, the approach the authors take to solving the problem of injection attacks is similar to Perl's taint mode, but the context-appropriate escaping mechanism is unique to CSSE.
I'm convinced.
Worth solving:
Definitely. And I love the way they've solved it. It's general and it requires no programmer input.
Detailed comments:

One thing I should note about this paper is that it is beautifully written. Killer paragraph (but you have to ignore the superfluous comma after "prevention method"):
This paper introduces Context-Sensitive String Evaluation (CSSE), which is an intrusion detection and prevention method, for injection attacks. It offers several advantages over existing techniques: it requires no knowledge of the application or application source code modifications and can therefore also be used with legacy applications. It is highly effective against most types of injection attacks, not merely the most common ones. It does not rely on the application developer, which makes it less error-prone. Finally, it is not tied to any programming language and can be implemented on a variety of platforms.
Asking programmers to help validate the security of their application -- as in last week's paper's attempt to disambiguate the purpose of regular expressions by prompting the programmer for input -- is invariably bound to fail.

The authors analyze injection attacks in general (e.g. SQL and shell injection), focusing on how these attacks exploit assumptions about the syntactic content of user input.

More great paragraphs:
A common property of injection vulnerabilities is the use of textual representations of output expressions constructed from user-provided input. Textual representations are representations in a human-readable text form. Output expressions are expressions that are handled by an external component (e.g., database server, shell interpreter).

User input is typically used in the data parts of output expressions, as opposed to developer-provided constants, which are also used in the control parts. Therefore, user input should not carry syntactic content. In the event of an injection attack, specially crafted user input influences the syntax, resulting in a change of the semantics of the output expression. We will refer to this process as mixing of control and data channels.
The authors define a framework for understanding the sundry injection attacks in more general terms, identifying sets of input and output vectors. For most web applications, there's only a single input vector, HTTP operations. The output vectors for SQL injection attacks are the execution of SQL statements against a database, and for command injection attacks, the output vector is a call to execute a command, such as with system() or exec().

They describe existing approaches to this problem as either safe ad-hoc serialization or serialization APIs. Safe ad-hoc serialization includes manual input validation (i.e. the programmer is solely responsible for validating the safeness of the input), automated input validation (e.g. MagicQuotes in PHP), and variable tainting (e.g. Perl's -T flag), and, lastly, the approach of SQLrand, which requires that all SQL commands executed by an application must be encoded as constants in the application. Serialization APIs include DOM APIs for XML and, for SQL, any API which requires prepared statements. Examples of the latter are Java's PreparedStatement and the prepare_statement method of Perl's DBI module.
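
As a small illustration of the serialization-API approach, here is a sketch using Python's built-in sqlite3 module as a stand-in for Java's PreparedStatement or Perl's DBI. The table, the rows, and the function names are made up for the example.

```python
import sqlite3

# In-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_unsafe(name):
    """Ad-hoc serialization: user input is spliced into the query text."""
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(name):
    """Serialization API: the input travels as a bound parameter, never as SQL."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


attack = "x' OR '1'='1"
print(find_unsafe(attack))  # the injected OR clause matches every row
print(find_safe(attack))    # no user has this literal name
```

The catch, as the paper points out, is that this only helps when the programmer remembers to use the API everywhere.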

The authors propose to assign metadata to all strings in a program in order to track their origin. Strings read from a TCP/IP socket are tagged as untrusted. Strings that are constants in the source code are tagged as trusted. Their system, Context-Sensitive String Evaluation (CSSE), tracks the untrusted string fragments at runtime. When an untrusted fragment is included in an expression passed to a function which interacts with external resources (e.g. mysql_query(), exec()), CSSE can escape the untrusted fragment in a context-appropriate way (e.g. escape SQL in the case of mysql_query() and escape shell in the case of exec()), block the request, or raise an alarm.
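
Here is a rough Python sketch of the idea, not the authors' implementation (theirs lives inside the PHP runtime). The Tracked class and the for_sql and for_shell functions are hypothetical names. The SQL escaping assumes the untrusted fragment lands inside a single-quoted literal, which is a big simplification of real context sensitivity.

```python
import shlex


class Tracked:
    """A string kept as a list of (text, trusted) fragments."""

    def __init__(self, fragments):
        self.fragments = list(fragments)

    @classmethod
    def trusted(cls, text):
        """A constant from the program source."""
        return cls([(text, True)])

    @classmethod
    def untrusted(cls, text):
        """Data that arrived from the network."""
        return cls([(text, False)])

    def __add__(self, other):
        # Concatenation keeps the origin of every fragment.
        return Tracked(self.fragments + other.fragments)


def for_sql(expr):
    """Escape untrusted fragments for use inside a single-quoted SQL literal."""
    return "".join(
        text if trusted else text.replace("'", "''")
        for text, trusted in expr.fragments
    )


def for_shell(expr):
    """Quote untrusted fragments so the shell sees each as a single word."""
    return "".join(
        text if trusted else shlex.quote(text)
        for text, trusted in expr.fragments
    )


query = (
    Tracked.trusted("SELECT * FROM users WHERE name = '")
    + Tracked.untrusted("x' OR '1'='1")
    + Tracked.trusted("'")
)
command = Tracked.trusted("ls ") + Tracked.untrusted("docs; rm -rf /")
```

The same tracked value is escaped differently depending on which sink it reaches, which is what the "context-sensitive" in the name refers to.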

This feature can be implemented using Aspect-Oriented Programming (AOP), but the authors note that at the time of their writing the AOP library for PHP did not support the interception of string operations, which is necessary to implement CSSE.

Sunday, October 04, 2009

Independent Study: Static detection of security vulnerabilities in scripting languages

(This is one of a series of posts about papers I'm reading for an independent study with Prof. Evan Chang at the University of Colorado, Boulder. The format is similar to that of a review of a paper submitted to a computer science conference. These are already-published papers, so I'll be writing with the benefit of hindsight, especially when the paper was published at least several years ago.)

Submission: Static detection of security vulnerabilities in scripting languages [PDF]

Please give a brief, 2-3 sentence summary of the main ideas of this paper:

SQL injection and other string-based exploits to which web applications are vulnerable can be detected by performing static analysis on web applications written in dynamic languages. The static analysis is supplemented with information gleaned from the symbolic execution of the source code.
What is the strength of this paper (1-3 sentences):
Techniques for automatically detecting SQL injection attacks in web applications written in dynamic languages are sorely needed.
What is the weakness of this paper (1-3 sentences):
I am skeptical of the usefulness of the interactive mode of the checker -- which is triggered when regular expressions are used to validate unsafe data -- for the average PHP programmer. Also, while the authors refer to Perl's taint mode (man perlsec) as an alternative way of sanitizing data, it would be useful if they were to compare the effectiveness of their approach to Perl's built-in approach.
Symbolic execution has been used before in the DART paper, but there the purpose was to determine what input values would cause a statically-typed program to take certain paths during automated testing; here the purpose is to determine whether any memory locations are untrusted.
The authors describe a checker that is effective at detecting SQL injection vulnerabilities.
Worth Solving
This problem is worth solving. It is all too easy for programmers to fail to untaint input received from the user of a web application, so a reliable, automated way of detecting such exploits is necessary.
Reasonably confident
Detailed Comments
Analysis starts with block-level symbolic execution, which generates a block summary. Intraprocedural analysis takes block summaries (a six-tuple, described below) as input and generates a four-tuple, which is consumed by the interprocedural analysis phase.

The use of symbolic execution here reminds me of the DART paper. Here symbolic execution is used to understand the functioning of a program written in a dynamic language. In the DART paper, it was used to force a statically typed language to take different paths during automated testing.

Block Analysis

At the block level, the code is executed symbolically, and the resulting summary is used to perform analysis at intra- and interprocedural levels. Using a summary at the higher levels expedites the analysis.

The authors define a language to model what they believe is an appropriate subset of PHP for detecting SQL injection attacks with their simulator (i.e. the component of their checker which symbolically executes blocks of PHP code).

They devote particular attention to how they model strings, because strings are such essential types in dynamic languages: "Strings are typically constructed through concatenation. For example, user inputs (via HTTP get and post methods) are often concatenated with a pre-constructed skeleton to form an SQL query.... String values are represented as an unordered concatenation of string segments, which can be one of the following: a string constant, the initial value of a memory location on entry to the current block (l_0), or a string that contains initial values of zero or more elements from a set of memory locations (contains(sigma))." The latter part of the definition of strings in this model allows the checker to track the flow of untainted data through a web application.

The motivation for and definition of untaint (as related to the definition of the Boolean type) in the modelling language is unclear to me.

The untainting of strings "occur[s] via function calls, casting to safe types (e.g. int, etc), regular expression matching (!), and other types."

The result of the block-level analysis is a six-tuple consisting of an error set ("the set of input variables that must be sanitized before entering the current block"), definitions ("the set of memory locations defined in the current block"), value flow ("the set of pairs of [memory] locations (l_1, l_2) where the string value of l_1 on entry becomes a substring of l_2 on exit"), termination predicate (whether the current block causes the program to exit), return value (undefined if and only if the termination predicate is true), and an untaint set (the set of [memory] locations that are sanitized by the current block, for each of the block's successors).

Intraprocedural Analysis

This phase of the analysis uses the six-tuple block summaries generated by the previous phase to generate a four-tuple consisting of an error set ("the set of memory locations ... whose value may flow into a database query, and therefore must be sanitized before invoking the current function"), return set ("the set of parameters or global values that may be a substring of the return value" of the function), sanitized values ("the set of parameters or global variables that are sanitized on function exit"), and program exit ("whether the current function terminates program execution on all paths").
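
To keep the two summaries straight, I find it helps to write them down as records. This is only a sketch in Python: the class and field names are mine, not the paper's, and memory locations are modelled as plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class BlockSummary:
    """The six-tuple produced by block-level symbolic execution."""
    error_set: Set[str] = field(default_factory=set)      # inputs to sanitize before entry
    definitions: Set[str] = field(default_factory=set)    # locations defined in the block
    value_flow: Set[Tuple[str, str]] = field(default_factory=set)  # (l_1, l_2) pairs
    terminates: bool = False                               # termination predicate
    return_value: Optional[str] = None                     # stays None when terminates is True
    untaint: Dict[str, Set[str]] = field(default_factory=dict)  # successor -> sanitized locations


@dataclass
class FunctionSummary:
    """The four-tuple produced by intraprocedural analysis."""
    error_set: Set[str] = field(default_factory=set)   # must be sanitized before the call
    return_set: Set[str] = field(default_factory=set)  # may be a substring of the return value
    sanitized: Set[str] = field(default_factory=set)   # sanitized on function exit
    exits_program: bool = False                        # terminates on all paths


# A hypothetical block that copies $id into $query and then runs the query.
block = BlockSummary(
    error_set={"$id"},
    definitions={"$query"},
    value_flow={("$id", "$query")},
)
```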

Interprocedural Analysis

This phase involves using the previously-generated function-level tuple to substitute actual for formal parameters in the error set and marking memory locations as safe when they are unconditionally untainted. It also involves the use of the Boolean-related notion of untaint that I still don't understand.
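
The substitution step, at least, is easy to sketch. In the Python below the function names and the example locations are hypothetical. Formal parameters in the callee's error set are replaced by the actual arguments at the call site, and anything already known to be sanitized is subtracted. The Boolean-related untaint is left out.

```python
def errors_at_call(callee_error_set, bindings):
    """Rewrite a callee's error set in terms of the caller's locations.

    bindings maps formal parameter names to the actual arguments;
    locations not in bindings (e.g. globals) pass through unchanged.
    """
    return {bindings.get(location, location) for location in callee_error_set}


def still_unsafe(required, sanitized):
    """Locations that reach the call without having been sanitized."""
    return required - sanitized


# Hypothetical call: lookup($_GET['id']) where lookup's error set is {$id, $table}.
required = errors_at_call({"$id", "$table"}, {"$id": "$_GET['id']"})
print(still_unsafe(required, {"$table"}))
```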

In what order are functions analyzed? "Our algorithm analyzes the source codebase in topological order based on the static function call graph." The analysis doesn't compute a fixed point for recursive functions; the system inserts a no-op summary when it encounters recursion.
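
One way to picture that ordering is a depth-first walk of the call graph that emits callees before callers and records the recursive edges where a no-op summary would be used. This is my own sketch, not the paper's algorithm. I'm assuming callees are summarized first, and the function name and the example graph are hypothetical.

```python
def analysis_order(call_graph):
    """Return (order, recursive_edges) for a caller -> callees mapping.

    order lists every function after the functions it calls;
    recursive_edges lists the calls where a no-op summary would be substituted.
    """
    order = []
    state = {}            # function -> "visiting" or "done"
    recursive_edges = []

    def visit(function):
        state[function] = "visiting"
        for callee in call_graph.get(function, ()):
            if state.get(callee) == "visiting":
                # Back edge: the callee is still being analyzed, so this is recursion.
                recursive_edges.append((function, callee))
            elif callee not in state:
                visit(callee)
        state[function] = "done"
        order.append(function)

    for function in call_graph:
        if function not in state:
            visit(function)
    return order, recursive_edges


graph = {
    "main": ["load", "render"],
    "load": ["query"],
    "render": ["query", "render"],  # render calls itself
    "query": [],
}
print(analysis_order(graph))
```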

Since regular expressions are self-contained automata, little computational devices, I was surprised when the authors remarked that strings can be marked as untainted when they are checked by regular expressions. It sounded almost magical. It's not quite that. "Some regular expressions match well-formed input while others detect malformed input; assuming one way or the other results in either false positives or false negatives.... To make it easy for the user to specify the sanitization effects of regular expressions, the checker has an interactive mode where the user is prompted when the analysis encounters a previously unseen regular expression and the user's answers are recorded for future reference."
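
A sketch of how I imagine the interactive mode working, in Python. The class name, the callback, and the patterns are hypothetical. The checker asks once per unseen pattern whether a match means the input is well-formed, remembers the answer, and uses it to decide on which branch the value counts as untainted.

```python
class RegexOracle:
    """Remembers, per pattern, whether a successful match implies safe input."""

    def __init__(self, ask):
        self.ask = ask      # callback standing in for the interactive prompt
        self.answers = {}   # pattern -> recorded answer

    def match_means_safe(self, pattern):
        if pattern not in self.answers:
            self.answers[pattern] = bool(self.ask(pattern))
        return self.answers[pattern]


def untainted_on_branch(oracle, pattern, matched):
    """Is the checked value sanitized on the branch where the regex did (or didn't) match?"""
    # A validating pattern sanitizes on the matching branch;
    # a pattern that detects malformed input sanitizes on the non-matching one.
    return oracle.match_means_safe(pattern) == matched


prompts = []


def fake_user(pattern):
    """Stand-in for the prompt: says that only the digits-only pattern validates input."""
    prompts.append(pattern)
    return pattern == r"^[0-9]+$"


oracle = RegexOracle(fake_user)
print(untainted_on_branch(oracle, r"^[0-9]+$", matched=True))
print(untainted_on_branch(oracle, r"[';]", matched=True))
```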

The authors mention the built-in Perl taint mode (man perlsec). This suggests that the proper way of implementing a checker like the one described here is to integrate it into the language runtime.