- "To this end, we analyze a set of 1,273 real-world vulnerabilities contained on the Alexa Top 10k domains using a specifically designed architecture, consisting of an infrastructure which allows us to persist and replay vulnerabilities to ensure a sound analysis"
- "In doing so, we find that although a large portion of all vulnerabilities have a low complexity rating, several incur a significant level of complexity and are repeatedly caused by vulnerable third-party scripts."
- "and find that the root causes for Client-Side Cross-Site Scripting range from unaware developers to incompatible first- and third-party code."
- "this class of vulnerability occurs if user-provided input is insecurely processed on the client side, e.g., by using this data for a call to document.write,"
- "Like most security vulnerabilities, client-side XSS is caused by insecure coding"
- "The DOM and the current JavaScript engines offer many different methods to turn arbitrary strings into executable code. Therefore, the use of insecure APIs seems natural to an average developer for solving common problems like the interaction with the page (e.g. using innerHTML) or parsing JSON (e.g. using eval)."
- "The combination of the browser's Document Object Model API (or DOM API), the highly dynamic nature of JavaScript, and the process in which Web content is assembled on the fly within the browser, frequently leads to non-obvious control and data flows that can potentially cause security problems"
- "From a conceptual standpoint, XSS is caused when an unfiltered data flow occurs from an attacker-controlled source to a security-sensitive sink."
- "In the concrete case of client-side XSS, such a source can be, e.g., the URL, whereas an example for a sink is eval or document.write. Both these APIs accept strings as parameters, which are subsequently parsed and executed as code (JavaScript and HTML, respectively). Therefore, passing attacker-controllable input to such functions eventually leads to execution of the attacker-provided code"
- "There are additional sinks, which do not allow for direct code execution (such as cookies or Web Storage)."
- "In order to spot a vulnerability, an analyst has to follow the data flow from source to sink and fully understand all operations that are performed on the data, with several properties increasing the difficulty of this task."
- "An analyst therefore has to decide whether the user-provided data is filtered or encoded properly and, thus, must understand all operations conducted on that data."
- "JavaScript, not unlike any other programming language, employs the concept of functions, which can be used to split up functionality into smaller units. While this is best practice in software engineering, it increases the difficulty a security auditor has to overcome as he has to understand specifically what each of these units does"
- "In order to understand that a certain flow constitutes a vulnerability, an analyst has to inspect all the code between the source and respective sink access."
- "Manual identification of vulnerable data flows in case of NLDFs is significantly harder, as no obvious relationship between the tainted data and at least one of the flow's functions exists"
- "In the context of this paper, we consider a data flow to be linear (LDF), if on the way from the source to the sink, the tainted value is always passed to all involved functions directly, i.e., in the form of a function parameter. In consequence, a non-linear data flow (NLDF) includes at least one instance of transporting the tainted value implicitly, e.g., via a global variable or inside a container object."
- "Furthermore, non-linear control flows (NLCF) are instances of interrupted JavaScript execution: A first JavaScript execution thread accesses the tainted data source and stores it in a semi-persistent location, such as a closure, event handler or global variable, and later on a second JavaScript thread uses the data in a sink access."
- "Instances of NLCFs can occur if the flow's code is distributed over several code contexts, e.g., an inline and an external script, or in case of asynchronous handling of events."
- "In terms of origin of the code involved in a flow, we differentiate between three cases: self-hosted by the Web page, code which is only hosted on third-party pages, and a mixed variant of the previous, where the flow traverses both self-hosted and third-party code."
- "Multiflows: A single sink access may contain more than one piece of user-provided data. This leaves an attacker with a means of splitting up his malicious payload to avoid detection. As we [27] have shown, given the right circumstances, such flows can be used to bypass existing filter solutions such as Chrome's XSSAuditor [1]."
- "Through the use of the eval function, JavaScript code can be dynamically created at runtime and executed in the same context"
- "Modern Web applications with complex client-side code often utilize minification to save bandwidth when delivering JavaScript code to the clients. In this process, space is conserved by removing white spaces as well as using identifier renaming."
- "The browser was enhanced to track data originating from sources, across all processing steps in the SpiderMonkey JavaScript engine as well as the Gecko rendering engine, and into sinks"
- "This includes access to source, calls to both built-in and user-defined functions which operate on a tainted string, as well as stack traces for each operation to allow for execution context analysis."
- "Note, that a linear data flow cannot occur with a non-linear control flow, since this implies no relation between source and sink accessing operations."
- "As outlined in Section 4.3, the automatic encoding behavior of data retrieved from the document.location source varies between browsers:"
- "but rather are either a combination of incompatible first- and third-party code or even caused completely by third-party libraries. This paradigm is enabled by the Web's programming model, which allows for third-party code to be included in a Web page, gaining full access to that page's DOM."
- "In our study, we found that 732 exploitable flows ended in document.write, 495 in innerHTML and remaining 46 in eval and its derivatives."
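
The excerpts above describe a source-to-sink flow, the difference between linear (LDF) and non-linear (NLDF) data flows, and the role of filtering or encoding. The following is a minimal sketch of those three ideas, not code from the paper. It assumes a Node.js-style environment without a DOM, so `fakeLocationHash` (standing in for the URL fragment source), `sinkLog` and `writeToSink` (standing in for `document.write`), and every function name here are hypothetical stand-ins chosen for illustration.

```javascript
// Hypothetical stand-ins so the sketch runs outside a browser:
// `fakeLocationHash` plays the attacker-controlled source, and
// `sinkLog` records what would have reached document.write.
const fakeLocationHash = '#<img src=x onerror=alert(1)>';
const sinkLog = [];

function writeToSink(html) {
  // In a real page this would be document.write(html).
  sinkLog.push(html);
}

// Linear data flow (LDF): the tainted value is handed to every
// involved function directly, as a function parameter.
function stripHash(value) {
  return value.slice(1);
}

function renderGreeting(name) {
  writeToSink('<p>Hello ' + name + '</p>');
}

renderGreeting(stripHash(fakeLocationHash));

// Non-linear data flow (NLDF): the tainted value travels implicitly,
// here inside a container object that a second function reads later.
const appState = {};

function rememberName() {
  appState.name = fakeLocationHash.slice(1);
}

function renderFromState() {
  // Nothing in this signature hints that the data is tainted.
  writeToSink('<p>Hello ' + appState.name + '</p>');
}

rememberName();
renderFromState();

// Filtered variant: HTML metacharacters are encoded before the sink,
// so the payload is written as text rather than as markup.
function encodeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

writeToSink('<p>Hello ' + encodeHtml(stripHash(fakeLocationHash)) + '</p>');
```

The first two entries in `sinkLog` carry the raw payload into the sink, while the third carries an encoded copy. In the NLDF case, `renderFromState` takes no parameters, which illustrates why the excerpts call such flows harder to audit by hand: the link between source and sink is only visible by tracking `appState`.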