- "For example, we find that two sanitizers in our test application are not commutative: the order of application matters, only one order is safe, yet both orders appear in our empirical study."
- "Instead, we use binary rewriting of server code to embed a browser model that determines the appropriate browser parsing context when HTML is output by the web application."
- "It is well known that script-injection attack vectors are highly context-dependent: a string such as expression: alert(a) is innocuous when placed inside an HTML tag context, but can result in JavaScript execution when embedded in a CSS attribute value context."
- "In particular, two sanitizer functions, EcmaScriptStringEncode and HtmlAttribEncode, are applied for the JavaScript string context and the HTML attribute context, respectively."
- "For instance, EcmaScriptStringEncode simply transforms all characters that can break out of JavaScript string literals (like the " character) to Unicode encoding (\u0022 for "), and HtmlAttribEncode HTML-entity encodes characters (&quot; for ")."
- "The key observation is that applying EcmaScriptStringEncode first encodes the attacker-supplied " character as a Unicode representation \u0022. This Unicode representation is not subsequently transformed by the second HtmlAttribEncode sanitization, because \u0022 is a completely innocuous string in the URI attribute value context."
- "For our purposes, we model a web browser as a parser consisting of sub-parsers for several languages"
- "More precisely, we treat the browser as a collection of parsers for different HTML standard-supported languages"
- "Conceptually, parsers for various languages are invoked in stages. After each sub-parser invocation, if a portion of the input HTML document is recognized to belong to another sub-language, that portion of the input is sent to the appropriate sub-language parser in the next stage."
- "As a result, any portion of the input HTML document may be recognized by one or more sub-grammars. Transitions from one sub-grammar to another are restricted through productions involving special transition symbols defined above as T, which is key for our formalization of context"
- "For instance, data recognized as a JavaScript string is subject to Unicode decoding before being passed to the AST."
- "In addition, HTML 5-compliant browsers subject data recognized as a URI to percent-encoding of certain characters before it is sent to the URI parser"
- "The goal of sanitization is typically to remove special characters that would lead to a sub-grammar transition"
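
The non-commutativity of the two sanitizers can be illustrated with a small Python sketch. It is not the application's actual code: `ecma_script_string_encode` and `html_attrib_encode` are toy stand-ins for EcmaScriptStringEncode and HtmlAttribEncode, and `JS_SPECIAL`, `js_string_decode`, `breaks_out_of_attribute`, and `payload` are hypothetical names introduced here. The sketch also assumes a particular nesting that the excerpts do not spell out: the untrusted value sits in a JavaScript string literal whose contents are later handed to the HTML attribute parser, so the browser undoes the JavaScript string encoding first.

```python
import html
import re

# Characters treated as able to break out of a JavaScript string literal.
# This set is an assumption made for the sketch, not the real sanitizer's list.
JS_SPECIAL = '"\'\\\n\r<>&'


def ecma_script_string_encode(s: str) -> str:
    """Toy stand-in: replace risky characters with \\uXXXX escapes."""
    return "".join(f"\\u{ord(c):04x}" if c in JS_SPECIAL else c for c in s)


def html_attrib_encode(s: str) -> str:
    """Toy stand-in: HTML-entity encode &, <, >, double and single quotes."""
    return html.escape(s, quote=True)


def js_string_decode(s: str) -> str:
    """Model the browser's Unicode decoding of a JavaScript string."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )


def breaks_out_of_attribute(sanitized: str) -> bool:
    """True if a raw double quote would reach the HTML attribute parser."""
    return '"' in js_string_decode(sanitized)


payload = '" onmouseover="alert(1)'

# Order 1: JavaScript encoding first, HTML attribute encoding second.
# The \u0022 escapes contain nothing that html.escape rewrites.
unsafe = html_attrib_encode(ecma_script_string_encode(payload))

# Order 2: HTML attribute encoding first, JavaScript encoding second.
safe = ecma_script_string_encode(html_attrib_encode(payload))

print(unsafe, breaks_out_of_attribute(unsafe))
print(safe, breaks_out_of_attribute(safe))
```

Under these assumptions, only the second order keeps a raw `"` away from the attribute parser: the sanitizer for the innermost context is applied first and the one for the outermost context last, mirroring in reverse the order in which the browser decodes.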