Friday, October 09, 2009

Independent Study: Defending against Injection Attacks through Context-Sensitive String Evaluation

(This is one of a series of posts about papers I'm reading for an independent study with Prof. Evan Chang at the University of Colorado, Boulder. The format is similar to that of a review of a paper submitted to a computer science conference. These are already-published papers, so I'll be writing with the benefit of hindsight, especially when the paper was published several years ago or more.)

Submission: Defending against Injection Attacks through Context-Sensitive String Evaluation [PDF]

Please give a brief, 2-3 sentence summary of the main ideas of this paper:

What is the strength of this paper (1-3 sentences):

Its strength is that the authors present a simple but general analysis of all kinds of injection attacks (e.g. SQL, shell, and others), and implement a system for preventing and detecting those attacks. Their system is completely automated, requiring the programmer to make no decisions (and hence make no mistakes), and its generality is extremely appealing.

What is the weakness of this paper (1-3 sentences):

The runtime overhead of CSSE is a bit steep.

Evaluation:

Excellent. This paper should be presented at POOPSLA '99!

Novelty:

All told, the approach the authors take to solving the problem of injection attacks is similar to Perl's taint mode, but the context-appropriate escaping mechanism is unique to CSSE.

Convincing:

I'm convinced.

Worth solving:

Definitely. And I love the way they've solved it. It's general and it requires no programmer input.

Confidence:

High.

Detailed comments:

One thing I should note about this paper is that it is beautifully written. Killer paragraph (but you have to ignore the superfluous comma after "prevention method"):
This paper introduces Context-Sensitive String Evaluation (CSSE), which is an intrusion detection and prevention method, for injection attacks. It offers several advantages over existing techniques: it requires no knowledge of the application or application source code modifications and can therefore also be used with legacy applications. It is highly effective against most types of injection attacks, not merely the most common ones. It does not rely on the application developer, which makes it less error-prone. Finally, it is not tied to any programming language and can be implemented on a variety of platforms.
Asking programmers to help validate the security of their application -- as in last week's paper, which tried to disambiguate the purpose of regular expressions by prompting the programmer for input -- is bound to fail.

The authors analyze injection attacks in general (e.g. SQL and shell injection), focusing on how these attacks exploit assumptions about the syntactic content of user input.

More great paragraphs:
A common property of injection vulnerabilities is the use of textual representations of output expressions constructed from user-provided input. Textual representations are representations in a human-readable text form. Output expressions are expressions that are handled by an external component (e.g., database server, shell interpreter).

User input is typically used in the data parts of output expressions, as opposed to developer-provided constants, which are also used in the control parts. Therefore, user input should not carry syntactic content. In the event of an injection attack, specially crafted user input influences the syntax, resulting in a change of the semantics of the output expression. We will refer to this process as mixing of control and data channels.
The authors define a framework for understanding the sundry injection attacks in more general terms, identifying sets of input and output vectors. For most web applications, there's only a single input vector: HTTP operations. For SQL injection attacks, the output vector is the execution of SQL statements against a database; for command injection attacks, it is a call to execute a command, such as with system() or exec().
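
To make the "mixing of control and data channels" concrete, here is a minimal sketch of my own (not from the paper) using Python's sqlite3 module. The users table and the build_query helper are assumptions made up for illustration. The input vector is the username argument, and the output vector is the call to conn.execute().

```python
import sqlite3


def build_query(username: str) -> str:
    # Deliberately vulnerable (hypothetical helper): user input is pasted
    # straight into the SQL text, so it can carry syntax, not just data.
    return "SELECT name FROM users WHERE name = '" + username + "'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

# Benign input stays in the data part of the expression.
benign = conn.execute(build_query("alice")).fetchall()

# Crafted input closes the string literal and adds an OR clause,
# changing the syntax (and hence the semantics) of the statement.
attack = conn.execute(build_query("x' OR '1'='1")).fetchall()

print(benign)       # one matching row
print(len(attack))  # every row in the table matches
```

The second query is no longer "look up one user": the quote characters in the input have moved part of it from the data channel into the control channel.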

They describe existing approaches to this problem as either safe ad-hoc serialization or serialization APIs. Safe ad-hoc serialization includes manual input validation (i.e. the programmer is solely responsible for validating the safety of the input), automated input validation (e.g. magic quotes in PHP), variable tainting (e.g. Perl's -T flag), and, lastly, the approach of SQLrand, which requires that all SQL commands executed by an application be encoded as constants in the application. Serialization APIs include DOM APIs for XML and, for SQL, any API that requires prepared statements. Examples of the latter are Java's PreparedStatement and the prepare method of Perl's DBI module.
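
As a rough illustration of the serialization-API approach, here is a sketch using parameter binding in Python's sqlite3 module, which plays the same role as the prepared statements mentioned above. The table and the find_user helper are my own assumptions, not something from the paper.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # The SQL text is a constant; the user input travels separately as a
    # bound parameter, so it can never be parsed as SQL syntax.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))         # the one matching row
print(find_user(conn, "x' OR '1'='1"))  # no user has that literal name
```

This works well, but only if the programmer remembers to use it everywhere, which is exactly the reliance on the developer that CSSE tries to remove.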

The authors propose to assign metadata to all strings in a program in order to track their origin. Strings read from a TCP/IP socket are tagged as untrusted. Strings that are constants in the source code are tagged as trusted. Their system, Context-Sensitive String Evaluation (CSSE), tracks the untrusted string fragments at runtime. When an untrusted fragment is included in an expression passed to a function that interacts with external resources (e.g. mysql_query(), exec()), CSSE can escape the untrusted fragment in a context-appropriate way (e.g. escape SQL in the case of mysql_query() and escape shell in the case of exec()), block the request, or raise an alarm.
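
The idea can be sketched in a few lines of Python. This is only a toy under my own assumptions: the names Fragment, TrackedString, to_sql and to_shell are hypothetical, the SQL escaping is simplistic (it just doubles single quotes), and the real CSSE hooks into the PHP runtime rather than asking the programmer to wrap strings by hand.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    text: str
    trusted: bool  # True for source-code constants, False for user input


class TrackedString:
    """A string plus per-fragment origin metadata (hypothetical sketch)."""

    def __init__(self, fragments):
        self.fragments = tuple(fragments)

    @classmethod
    def constant(cls, text: str) -> "TrackedString":
        return cls([Fragment(text, True)])

    @classmethod
    def from_input(cls, text: str) -> "TrackedString":
        return cls([Fragment(text, False)])

    def __add__(self, other: "TrackedString") -> "TrackedString":
        # Concatenation preserves the metadata of both operands.
        return TrackedString(self.fragments + other.fragments)


def escape_sql(text: str) -> str:
    # Simplistic assumption: the fragment sits inside a quoted SQL literal.
    return text.replace("'", "''")


def render(tracked: TrackedString, escape) -> str:
    # Trusted fragments pass through; untrusted ones are escaped for the sink.
    return "".join(
        f.text if f.trusted else escape(f.text) for f in tracked.fragments
    )


def to_sql(tracked: TrackedString) -> str:
    return render(tracked, escape_sql)


def to_shell(tracked: TrackedString) -> str:
    return render(tracked, shlex.quote)


name = TrackedString.from_input("x' OR '1'='1")
query = (
    TrackedString.constant("SELECT * FROM users WHERE name = '")
    + name
    + TrackedString.constant("'")
)
command = TrackedString.constant("ls ") + TrackedString.from_input("docs; rm -rf /")

print(to_sql(query))
print(to_shell(command))
```

The same untrusted fragment gets escaped differently depending on which output vector it reaches, which is the "context-sensitive" part. A real implementation would also have to decide, per sink, whether to escape, block the request, or raise an alarm.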

This feature can be implemented using Aspect-Oriented Programming (AOP), but the authors note that at the time of their writing the AOP library for PHP did not support the interception of string operations, which is necessary to implement CSSE.
