********* XSS *********

  1. is a class of code-injection vulnerabilities in the browser" - ScriptProtect
  2. allowing an adversary to execute arbitrary JavaScript code in the context of a vulnerable application" - Don’t Trust The Locals
  3. is a code injection attack, in which an adversary is able to add JavaScript code of her choosing to a vulnerable site." - Don’t Trust The Locals
  4. a type of injection vulnerability in which an attacker can inject arbitrary code into a running web application - Riding out DOMsday
  5. occur when input is improperly sanitized, allowing attackers to inject arbitrary JavaScript code into a victim’s browser." - Detecting DOM XSS Vulnerabilities with ML
  6. the ability to inject attacker-controlled scripts into the context of a web application" - CSP Is Dead, Long Live CSP!
  7. the ability to inject and execute untrusted scripts in a vulnerable application - CSP Is Dead, Long Live CSP!
  8. describes a class of string-based code injection vulnerabilities that let adversaries inject HTML and/or JS into Web content" - Script Gadgets

********* CSP *********

  1. is the most popular example of code-filtering mitigation" - Script Gadgets
  2. is activated by a client’s browser when the X-Content-Security-Policy HTTP header is provided in an HTTP response" - Reining in the web with CSP
  3. means of mitigating injection attacks" - Inconsistent Click-Jacking Protection
  4. was initially designed to grant Web developers more control over the content loaded by their Web sites" - Deployed CSP Analysis
  5. is a security mechanism originally aimed at mitigating the severe dangers of XSS attacks" - Site Policy
  6. is meant to ensure that only resources explicitly allowed by the developer of a page can be included therein" - Site Policy
  7. is a defense-in-depth security mechanism which, deployed on a page, defines the content that is allowed or disallowed to load" - The Remote on the Local
  8. is a web platform mechanism designed to mitigate cross-site scripting (XSS)" - CSP Is Dead, Long Live CSP!
  9. is a browser feature that a web developer can configure to define a policy that allows the browser to whitelist the JS code that belongs to the app." - Script Gadgets
  10. is a list of directives, restricting content inclusion for web pages by means of a white-listing mechanism. - Semantics CSP
  11. mitigate content injection attacks against web applications directly within the browser. - Why is CSP Failing?
  12. is an especially promising browser-based security framework for refining the same-origin policy (SOP), the basis of traditional web security. - Why is CSP Failing?
  13. is a declarative policy mechanism that allows web application developers to define which client-side resources can be loaded and executed by the browser" - CSP Is Dead, Long Live CSP!
  14. is a key response header that provides strong defence mechanisms against XSS and other client-side injection attacks by whitelisting allowed sources and disabling certain insecure JavaScript features." - HTTP security headers analysis of top 1M websites
  15. is a declarative mechanism that allows web authors to specify a number of security restrictions on their applications, to be enforced by supporting user agents." - CSP Is Dead, Long Live CSP!
  16. provides a mechanism to allow websites to explicitly indicate what kind of JS code can be executed within their origins to prevent injection of attacker-controlled scripts" - API hardening
  17. is fundamentally a specification for defining policies to control where content can be loaded from, granting significant power to developers to refine the default SOP - Why is CSP Failing?
  18. is a language for defining restrictions on the functionality of web pages, ideally to limit their capabilities to the least set of privileges they need to work correctly - Semantics CSP
  19. is a client-server defense mechanism: content security policies are specified by web developers using HTTP(S) headers or meta elements in HTML pages, while their enforcement is performed at the browser side on a per-page basis - Semantics CSP
  20. is deployed either through the Content-Security-Policy HTTP response header or via an HTML meta element with the http-equiv attribute. - Data Exfil CSP
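
  For illustration, a minimal sketch of serving a CSP header (Node.js/TypeScript; the policy values, the CDN host, and the port are assumptions, not taken from the papers above):

      import * as http from "http";

      http.createServer((req, res) => {
        // Declarative allowlist: same-origin resources by default, scripts also
        // from one CDN, plugins disabled entirely.
        res.setHeader(
          "Content-Security-Policy",
          "default-src 'self'; script-src 'self' https://cdn.example.com; object-src 'none'"
        );
        res.setHeader("Content-Type", "text/html");
        res.end("<h1>CSP demo</h1>");
      }).listen(8080);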

********* Script Gadgets *********

  1. legitimate JavaScript fragments within an application’s legitimate code base - Script Gadgets
  2. is a piece of JavaScript code which reacts to the presence of specifically formed DOM content in the Web document - Script Gadgets
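
  For illustration, a hedged sketch of the gadget pattern (browser-side TypeScript; the data-html attribute and the library code are hypothetical):

      // Legitimate, benign-looking library code: it reacts to markup of a
      // specific shape by copying an attribute value into innerHTML.
      document.querySelectorAll("[data-html]").forEach((el) => {
        el.innerHTML = el.getAttribute("data-html")!; // attribute flows into a sink
      });
      // An attacker who can inject only passive markup, e.g.
      //   <div data-html="<img src=x onerror=alert(1)>"></div>
      // gains script execution through this legitimate code path.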

========= Nonce =========

  1. allows the policy to specify a one-time value that acts as an authorization token for scripts" - CSP Is Dead, Long Live CSP!
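
  For illustration, a minimal nonce sketch (Node.js/TypeScript; the template and port are assumptions):

      import * as http from "http";
      import { randomBytes } from "crypto";

      http.createServer((req, res) => {
        const nonce = randomBytes(16).toString("base64"); // fresh one-time value
        res.setHeader("Content-Security-Policy", `script-src 'nonce-${nonce}'`);
        res.setHeader("Content-Type", "text/html");
        // Only scripts carrying the matching nonce attribute may execute.
        res.end(`<script nonce="${nonce}">console.log("authorized");</script>`);
      }).listen(8080);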

========= Hash =========

  1. allows the developer to list cryptographic hashes of expected scripts within the page - Script Gadgets
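
  For illustration, a minimal sketch of computing such a hash (TypeScript with Node's crypto; the script text is an assumption):

      import { createHash } from "crypto";

      // The digest must cover the exact inline script text, whitespace included.
      const script = 'console.log("expected script");';
      const digest = createHash("sha256").update(script).digest("base64");
      console.log(`script-src 'sha256-${digest}'`); // policy fragment allowing it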

========= Mixed Content =========

  1. is the inclusion of unencrypted content into HTTPS sessions, which reduces the benefit of encryption. - Why is CSP Failing?

********* DOM XSS *********

  1. is the exploitation of an input validation vulnerability that is caused by the client, not the server." - XSS Exploits
  2. can be seen as an information flow problem where portions of strings (potentially under attacker’s control, e.g., URL or cookie values) are being evaluated as code through JavaScript code evaluation functions like eval() or being used in unsafe dynamic DOM constructions, such as via document.write or innerHTML" - DexterJS1
  3. is a client-side code injection vulnerability that results from unsafe dynamic code generation in JavaScript applications" - DexterJS1
  4. a vulnerability class subsuming all Cross-site Scripting problems that are caused by insecure handling of untrusted data through JavaScript" - taint25M
  5. is a code injection vulnerability in which a web attacker is able to inject malicious JavaScript in a client’s web session." - DexterJS2
  6. client-side code injection" - Systematic Analysis of XSS Sanitization
  7. is a subtype of Cross-site Scripting (XSS) problems that results from unsafe processing of untrusted data from sources controlled by the attacker" - DomXssMicro
  8. occur when client-side code uses untrusted input data in dynamic code evaluation constructs without sufficient validation." - Symbolic Execution
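
  For illustration, a minimal sketch of such a flow (browser-side TypeScript; the "out" element and the query parameter are assumptions):

      const params = new URLSearchParams(location.search);
      const name = params.get("name") ?? "";

      // Vulnerable: ?name=<img src=x onerror=alert(1)> executes in the victim's browser.
      document.getElementById("out")!.innerHTML = "Hello " + name;

      // Safer: treat the untrusted value as text, never as markup.
      document.getElementById("out")!.textContent = "Hello " + name;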

********* HTML Sanitizers *********

  1. These are libraries used by developers to clean untrusted HTML into HTML that is safe to use within the application. This category contains examples such as DOMPurify and the Google Closure HTML sanitizer." - Script Gadgets
  2. automatically transform untrustworthy strings into safe-to-render HTML markup or safe-to-follow URLs by removing parts that may lead to XSS" - Script Gadgets
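
  For illustration, a hedged sketch of sanitizer usage (browser-side TypeScript with the DOMPurify library; the untrusted string and the "out" element are assumptions):

      import DOMPurify from "dompurify";

      const dirty = '<img src=x onerror=alert(1)><b>bold</b>';
      const clean = DOMPurify.sanitize(dirty); // strips the XSS vector, keeps safe markup
      document.getElementById("out")!.innerHTML = clean;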

********* Exploit *********

  1. Exploit = BreakOutSequence + AttackVector + EscapeSequence" - DexterJS1

********* Source *********

  1. the source locations of incoming data" - DomXssMicro
  2. denotes a code fragment which embodies a vulnerable node from the DOM abstraction, implemented on the client side of the web app. It represents the start node of a possible data flow, where untrusted, malicious input data is collected and, with high likelihood, acquired for processing by the web application." - DOM XSS Attacks
  3. are to be considered starting points where untrusted input data is taken by an application." - Google DOM XSS Wiki
  4. the program input that supplies the data for an attack" - Symbolic Execution

********* Sink *********

  1. Sinks are meant to be the points in the flow where data depending from sources is used in a potentially dangerous way resulting in loss of Confidentiality, Integrity or Availability (the CIA triad)." - Google DOM XSS Wiki
  2. potentially vulnerable code evaluation construct" - Symbolic Execution
  3. is a point in the client-side code where data is used with special privilege, such as in a code evaluation construct, or as an application-specific command to a backend logic or as cookie data." - FLAX

********* Propagation *********

  1. the statements which propagate tainted values from the source to the sink without modifying them" - DomXssMicro
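
  For illustration, the three roles in a single flow (browser-side TypeScript sketch; the concrete statements are assumptions):

      const taint = location.hash.slice(1);    // source: attacker-influenced input
      const html = "<div>" + taint + "</div>"; // propagation: taint carried along unchanged
      document.write(html);                    // sink: taint interpreted as markup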

********* Context *********

  1. represents the surrounding context where the tainted content locates - DomXssMicro
  2. reflects the state of the browser at a given point reading a particular piece of input HTML." - ScriptGard
  3. the intuitive notion of where untrusted data appears" - Systematic Analysis of XSS Sanitization

********* Input Validation *********

  1. only allow data into the application if it matches its specification. - Script-templates for the CSP
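
  For illustration, a minimal allowlist-validation sketch (TypeScript; the expected format, a decimal user id, is an assumption):

      function isValidUserId(input: string): boolean {
        return /^[0-9]{1,10}$/.test(input); // admit only data matching the specification
      }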

********* Output encoding *********

  1. encode all potential syntactic content of untrusted data before inserting it into an HTTP response. - Script-templates for the CSP
  2. is changing input so that it cannot be interpreted as code" - XSS for beginners…
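
  For illustration, a minimal encoding sketch (TypeScript; the character set covered is the usual HTML-syntactic one):

      function encodeHtml(s: string): string {
        return s
          .replace(/&/g, "&amp;")
          .replace(/</g, "&lt;")
          .replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;")
          .replace(/'/g, "&#x27;");
      }
      console.log(encodeHtml('<script>alert(1)</script>')); // renders as text, not code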

********* DOM *********

  1. is the standardized application programming interface (API) for scripts running in a browser to interact with the HTML document" - SOP: Evaluation in Modern Browsers

//////// Dynamic Analysis ////////

  1. the ability to monitor code as it executes" - DynamicTaintAnalysis

//////// Dynamic Taint Analysis ////////

  1. runs a program and observes which computations are affected by predefined taint sources such as user input" - DynamicTaintAnalysis

--------- Browser XSS Filters ---------

  1. These filters are implemented as part of the browser navigation and rendering, and they attempt to detect an XSS attack and neuter it. Internet Explorer, Edge, and Chrome implement XSS filters as part of their default configuration. Firefox does not have one, but the popular NoScript add-on implements one." - API hardening

--------- Web Application Firewalls ---------

  1. This is software that runs on the server, and attempts to allow benign requests from web traffic, while detecting and blocking malicious requests. An example of an open-source Web Application Firewall is ModSecurity with the OWASP Core Rule Set" - Script Gadgets
  2. are request filtering mitigations deployed as hardware in front of web servers, as well as software next to the web server itself. - Script Gadgets

--------- Mod Security ---------

  1. is an open-source Web Application Firewall, commonly used with the OWASP Core Rule Set." - Script Gadgets

========= Cross-Origin Resource Sharing - [2009] =========

  1. is an extension of the XMLHttpRequest API to allow cross-origin content in the browser through explicit authorization" - Careful Who You Trust
  2. is proposed to solve the problems of JSON-P, and to provide protocol support for authorized access to cross-origin network resources." - Empirical Study of CORS
  3. is an access control model regulating access to cross-origin network resources (including sending requests and reading responses) between browsers and servers." - Empirical Study of CORS
  4. is the most important access control mechanism to segregate static contents and active scripts from different origins."- Cookies Lack Integrity
  5. is a set of server-side headers, enforced by the client, which allow a server to allow read access from JavaScript" - Careful Who You Trust
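
  For illustration, a minimal server-side sketch (Node.js/TypeScript; the allowed origin and port are assumptions):

      import * as http from "http";

      http.createServer((req, res) => {
        // Opt-in: scripts running on https://app.example may read this response.
        res.setHeader("Access-Control-Allow-Origin", "https://app.example");
        res.setHeader("Content-Type", "application/json");
        res.end(JSON.stringify({ ok: true }));
      }).listen(8081);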

========= Origin =========

  1. is a 3-tuple, consisting of the scheme, the domain and the port number" - Cookies Lack Integrity
  2. for a given URL is defined by a 3-tuple: scheme (or protocol), e.g. HTTP or HTTPS, domain (or host), and port (not supported by IE)" - Cookies Lack Integrity
  3. the triple consisting of scheme, host, and port of the involved resources." - XSSI
  4. The combination of protocol, subdomain (or hostname), and port constitutes an origin as defined by RFC 6454 [3]." - Site Policy
  5. is defined as the triple consisting of scheme, host, and port of the involved resources" - Privacy Breach by Exploiting postMessage
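
  For illustration, the triple extracted with the URL API (TypeScript; the URL is an assumption):

      const u = new URL("https://shop.example.com:8443/cart?item=1");
      console.log(u.protocol, u.hostname, u.port); // "https:" "shop.example.com" "8443"
      console.log(u.origin); // "https://shop.example.com:8443"
      // Two URLs agreeing on all three components share an origin; any difference
      // makes them cross-origin.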

========= Same-Origin Policy =========

  1. only permits to exchange data with other documents sharing the same protocol, host, and port"- Careful Who You Trust
  2. guards web resources from being accessed by scripts from another origin" - Empirical Study of CORS
  3. defines the security boundary of a resource by its origin, the URI scheme/host/port tuple" - Empirical Study of CORS
  4. specifies trust by URI"- RFC The Web Origin Concept
  5. restricts access to resources as soon as the origin (protocol, host, and port) differs from the requesting page’s own values" - Careful Who You Trust
  6. defines how code from mutually untrusted principals is separated." - ZigZag
  7. automatically prevents client-side code from distinct origins from interfering with each others’ code and data" - ZigZag
  8. is used to denote a complex set of rules which governs the interaction of different Web Origins within a web application" - SOP: Evaluation in Modern Browsers
  9. strongly separates mutually distrusting Web content within the Web browser through origin-based compartmentalization" - XSSI
  10. allows a given JavaScript access only to resources that have the same origin" - XSSI
  11. ensures two pages with different origins are not allowed to access each other (e.g., through JavaScript)." - Site Policy
  12. the fundamental isolation strategy for client-side web application security." - Empirical Study of CORS
  13. is a cornerstone of web security, guarding the web content of one domain from access by another domain" - Cookies Lack Integrity
  14. is the principal security policy in Web browsers" - XSSI
  15. is a fundamental security mechanism that provides boundaries between Web sites and prevents unauthorized access to sensitive information - Careful Who You Trust
  16. can effectively separate mutually distrusting Web content within the Web browser through origin-based compartmentalization." - Privacy Breach by Exploiting postMessage
  17. is the baseline defense mechanism implemented in web browsers to provide confidentiality and integrity guarantees for contents provided by unrelated websites. - CSP Semantics Analysis
  18. The Web’s most basic security policy" - Careful Who You Trust
  19. creates a security barrier around an application which is bounded by the origin" - Careful Who You Trust

========= postMessage =========

  1. is a primitive that enables cross-origin communication within the web browser" - Emperor's New API
  2. is a message passing mechanism that can be used for secure communication of primitive strings between browser windows." - Emperor's New API
  3. allows two iframe tags from different origins to communicate." - CORS in Action
  4. is a client-side primitive to enable cross-origin communication at the browser side." - Emperor's New API
  5. aims to provide a simple, purely client-side cross-origin channel for exchanging primitive strings." - Emperor's New API
  6. is an exemption by design to enable cross-origin communication" - pMForce
  7. allows for cross-domain message exchange whenever two sites are rendered in the same browser tab - or popup window]" - Uncovering History We Insecurity
  8. allows to send serialized messages between two documents." - Careful Who You Trust
  9. allows to exchange data across origin and site boundaries" - Careful Who You Trust
  10. is a string-based message passing mechanism proposed for inclusion in HTML 5" - FLAX
  11. enables applications to communicate with each other purely within the browser, and is not subject to the classical same origin policy (SOP)." - ZigZag
  12. enables web content from different origins being exchanged between different service providers." - Privacy Breach by Exploiting postMessage
  13. is a browser API designed for interframe communication" - Securing Frames
  14. allows sending serializable JavaScript objects from one frame to another" - pMForce
  15. enables a script to send a message to a window regardless of their respective origins" - The Postman Always Rings Twice
  16. allowing a script to send a string to any window in the same or different origin" - The Postman Always Rings Twice
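
  For illustration, a minimal sketch with origin checks on both ends (browser-side TypeScript; the frame and the widget origin are assumptions):

      const frame = document.querySelector("iframe")!;
      // Sender: restrict delivery to the intended receiver origin.
      frame.contentWindow?.postMessage("hello", "https://widget.example");

      // Receiver: verify the sender's origin before trusting the data.
      window.addEventListener("message", (event: MessageEvent) => {
        if (event.origin !== "https://widget.example") return;
        console.log("trusted message:", event.data);
      });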

********* Web Storage *********

  1. two persistent storage abstractions" - Emperor's New API
  2. is a mechanism that allows a Web application to store structured data within the user’s Web browser via Javascript." - WebStorage-driven Content Caching
  3. is a mechanism that allows a piece of Javascript to store structured data within the user’s browser" - WebStorage-driven Content Caching
  4. is, thereby, an umbrella term for two related functionalities: SessionStorage and LocalStorage." - WebStorage-driven Content Caching
  5. summarizes a set of browser-based technologies that allow application-level persistent storage of key/values pairs on the client-side" - WebStorage-driven Content Caching
  6. is a specification that allows web applications to create a persistent key-value store in the browser, the content of which is maintained either until the end of a session (Session Storage), or beyond (Local Storage)" - Client-side storage APIs
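
  For illustration, the two stores side by side (browser-side TypeScript; keys and values are assumptions):

      localStorage.setItem("theme", "dark");      // persists across sessions
      sessionStorage.setItem("step", "2");        // discarded when the session ends
      console.log(localStorage.getItem("theme")); // "dark"
      localStorage.removeItem("theme");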

========= Client-side validation vulnerability =========

  1. represent bugs in JavaScript programs that allow for unauthorized actions via untrusted input." - Don’t Trust The Locals
  2. arise from unsafe usage of untrusted data in the client-side code of the web application that is typically written in JavaScript." - FLAX
  3. as one which results from unsafe usage of untrusted data in the client-side code of the web application." - FLAX
  4. a programming bug which results from using untrusted data in a critical sink operation without sufficient validation" - FLAX

********* Local Storage *********

  1. is a key/value store tied to an application’s origin." - Emperor's New API
  2. is only one of many ways to persist data across multiple HTTP requests, as Cookies, WebStorage, or the File API exist nowadays." - Beyond XSS Auditor
  3. is persistent across sessions, while data within SessionStorage is discarded whenever the corresponding session is closed." - WebStorage-driven Content Caching

********* Cookie *********

  1. simple key-value stores used by browsers to persist small pieces of string data, which are sent along in every HTTP request to matching servers" - Don’t Trust The Locals]
  2. is a short piece of data that a website sends to a visiting client, either via HTTP response headers or by using client-side scripting"- client-side storage APIs
  3. are a browser-side assisted state management mechanism that are pervasively used by web applications"- Cookies Lack Integrity
  4. are small pieces of text communicated via an HTTP header or set via JavaScript, which map a key to a value and have optional attributes" - Site Policy
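
  For illustration, setting a cookie with optional attributes from script (browser-side TypeScript; the name, value, and attributes are assumptions):

      document.cookie = "session=abc123; Path=/; Secure; SameSite=Lax";
      console.log(document.cookie); // "session=abc123" (attributes are not echoed back)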

********* Indexed DB *********

  1. It defines a JavaScript-based interface for an embedded transactional database system" - Client-side storage APIs
  2. is an object-oriented database" - Client-side storage APIs
  3. is an asynchronous API" - Learn Typescript 3
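
  For illustration, a minimal sketch of the event-driven, asynchronous API (browser-side TypeScript; database and store names are assumptions):

      const req = indexedDB.open("notes-db", 1);
      req.onupgradeneeded = () => {
        req.result.createObjectStore("notes", { keyPath: "id" }); // schema setup
      };
      req.onsuccess = () => {
        const tx = req.result.transaction("notes", "readwrite");
        tx.objectStore("notes").put({ id: 1, text: "hello" });
        tx.oncomplete = () => req.result.close();
      };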

========= Service Worker =========

  1. is an event-driven and browser-managed process triggered by the registration of a JavaScript code hosted by a web application, and registered to manage all or part of an application" - The Remote on the Local
  2. is a script that can be registered to control one or more pages of your site. Once installed, a service worker sits outside of any single browser window or tab." - The Remote on the Local
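
  For illustration, a minimal registration sketch (browser-side TypeScript; the worker script path and scope are assumptions):

      if ("serviceWorker" in navigator) {
        navigator.serviceWorker
          .register("/sw.js", { scope: "/" }) // the worker then controls pages under "/"
          .then((reg) => console.log("registered, scope:", reg.scope))
          .catch((err) => console.error("registration failed:", err));
      }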

========= robots.txt =========

  1. is called “The Robots Exclusion Protocol”." - A Study of Different Web-Crawler Behaviour

========= Web crawler =========

  1. is software for downloading pages from the Web automatically. It is also called web spider or web robot" - Web Crawler: Extracting the Web Data
  2. are full text search engines which assist users in navigating the web" - Web Crawler: Extracting the Web Data
  3. - also known as a robot or a spider] is a system for the bulk downloading of web pages" - Web Crawling
  4. is a programme or a suite of programmes that is used to retrieve contents of web pages" - Reviews of Web Crawlers
  5. is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion" - WEB CRAWLER - AN OVERVIEW
  6. also known as spider or web robot, is a program that automatically traverses large numbers of web pages by following hyperlinks, indexes them and stores the traversed web page links for prospective use" - A Methodical Study of Web Crawler
  7. is a vital part of the search engine. It is a program that navigates the web and downloads the references of the web pages" - A Methodical Study of Web Crawler

========= Breadth First Crawler =========

  1. starts with a small set of pages and then explores other pages by following links in the breadth-first [6] fashion." - Web Crawler: Extracting the Web Data p.3

========= Breadth-First Algorithm =========

  1. It is the simplest crawling strategy. It was designed in 1994. It uses the frontier as the FIFO queue and crawls the links in the order in which they are encountered. It is basically used as a baseline crawler. The main drawback of this approach is that it traverses the URLs in the order in which they are entered into the frontier. It is good to implement this approach if the number of pages is small. In real life a lot of useless links are produced by useless pages, which results in wastage of time and memory in the frontier. Therefore, a useful page should always be selected from the frontier." - Web Crawler: Extracting the Web Data p.3
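
  For illustration, a minimal sketch of the FIFO frontier described above (TypeScript; fetchLinks is a hypothetical helper returning the URLs found on a page):

      async function crawlBFS(
        seed: string,
        limit: number,
        fetchLinks: (url: string) => Promise<string[]>
      ): Promise<string[]> {
        const frontier: string[] = [seed]; // FIFO queue
        const seen = new Set<string>([seed]);
        while (frontier.length > 0 && seen.size < limit) {
          const url = frontier.shift()!; // dequeue in discovery order
          for (const link of await fetchLinks(url)) {
            if (!seen.has(link)) {
              seen.add(link);
              frontier.push(link);
            }
          }
        }
        return [...seen];
      }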

========= Blind Traversing Approach =========

  1. Firstly a seed URL is decided and the crawling process is applied. This process is called blind as there is no particular method for selecting the next URL from the frontier. Breadth-First search is a very common example of this approach." - Web Crawler: Extracting the Web Data p.3

========= Excluded Content =========

  1. "The site’s robot.txt file needs to be fetched before fetching a page from the site so that it can be determined whether the web master has specified about how much file can be crawled - 9]." - Web Crawler: Extracting the Web Data

========= Focused Web crawlers =========

  1. is a type of web crawler that crawls web pages which are specific to a pre-defined topic or domain." - Reviews of Web Crawlers

========= Hidden Web Crawlers =========

  1. A lot of data on the web actually resides in the database and it can only be retrieved by posting appropriate queries or by filling out forms on the web. Recently interest has been focused on access of this kind of data, called “deep web” or “hidden web”. Current-day crawlers crawl only the publicly indexable web (PIW), i.e., the set of pages which are accessible by following hyperlinks, ignoring search pages and forms which require authorization or prior registration. In reality they may ignore a huge amount of high quality data, which is hidden behind search forms." - Web Crawler: Extracting the Web Data

========= Parallel Crawlers =========

  1. As the size of the Web grows, it becomes more difficult to retrieve the whole or a significant portion of the Web using a single process. Therefore, many search engines often run multiple processes in parallel to perform the above task, so that download rate is maximized. This type of crawler is known as a parallel crawler." - Web Crawler: Extracting the Web Data

========= Distributed Web Crawler =========

  1. This crawler runs on a network of workstations. Indexing the web is a very challenging task due to the growing and dynamic nature of the web. As the size of the web grows it becomes mandatory to parallelize the process of crawling to finish the crawling process in a reasonable amount of time. A single crawling process, even with multithreading, will be insufficient for the situation. In that case the process needs to be distributed to multiple processes to make the process scalable. It scales up to several hundred pages per second. The rate at which the size of the web is growing makes it imperative to parallelize the process of crawling. In a distributed web crawler a URL server distributes individual URLs to multiple crawlers, which download web pages in parallel. The crawlers then send the downloaded pages to a central indexer, on which links are extracted and sent via the URL server to the crawlers. This distributed nature of the crawling process reduces the hardware requirements and increases the overall download speed and reliability [2]. FAST Crawler [20] is a distributed crawler, used by Fast Search & Transfer." - Web Crawler: Extracting the Web Data

========= Politeness policy=========

  1. Crawling algorithms should be designed in such a way that only one request is sent to the server at a time. For this purpose, a politeness delay needs to be inserted between the requests. This will help reduce the risks." - Web Crawler: Extracting the Web Data
  2. It limited the rate of requests to each site, it allowed web sites to exclude themselves from purview through the nascent robots exclusion protocol, and it provided a “black-list” mechanism that allowed the crawl operator to exclude sites" - Web Crawling
  3. states how to avoid overloading Web sites" - Web Crawling
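
  For illustration, a minimal politeness sketch (TypeScript, Node 18+ or browser; the delay value is an assumption):

      const POLITENESS_DELAY_MS = 1000;
      const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

      async function politeFetchAll(urls: string[]): Promise<void> {
        for (const url of urls) {           // sequential: one request at a time
          await fetch(url);
          await sleep(POLITENESS_DELAY_MS); // pause before the next request
        }
      }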

========= Duplicate Content =========

  1. Crawlers should be able to recognize and eliminate duplicate data available on different URLs. Methods like checksum, visitor counter, fingerprinting etc. are needed for this purpose." - Web Crawler: Extracting the Web Data
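
  For illustration, a minimal checksum-based sketch (TypeScript with Node's crypto):

      import { createHash } from "crypto";

      const seenDigests = new Set<string>();
      function isDuplicate(pageBody: string): boolean {
        const digest = createHash("sha256").update(pageBody).digest("hex");
        if (seenDigests.has(digest)) return true; // same content seen at another URL
        seenDigests.add(digest);
        return false;
      }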

========= Continuous Crawling =========

  1. Carrying out full crawling after regular intervals is not a beneficial approach to follow. This results in low-value and static pages" - Web Crawler: Extracting the Web Data

========= Best First Heuristic Approach =========

  1. This was developed in 1998 to overcome the problems of blind traversing approach. The links are selected from the frontier on the basis of some estimation, score or priority. Always the best available link is opened and traversed. Various mathematical formulas are also used." - Web Crawler: Extracting the Web Data

========= Parallelization policy =========

  1. states how to coordinate distributed Web crawlers." - WEB CRAWLER - AN OVERVIEW

========= Selection policy =========

  1. states which pages to download." - WEB CRAWLER - AN OVERVIEW

========= Re-visit policy =========

  1. states when to check for changes to the pages." - WEB CRAWLER - AN OVERVIEW

######### Obfuscation #########

  1. is to make a program unintelligible while preserving its functionality" - Hiding in Plain Site
  2. when the intentional behavior of a script cannot be fully realized until execution." - Hiding in Plain Site
  3. if the interactions of code with its underlying system cannot be deduced from static analysis of its source code." - Hiding in Plain Site

######### Minification #########

  1. rewriting the source code to be more compact without changing its functionality." - Hiding in Plain Site

========= DOM clobbering =========

  1. allows markup to override variables in JavaScript execution environment, making it possible to trigger specific script behavior" - Script Gadgets
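
  For illustration, a minimal clobbering sketch (browser-side TypeScript; the id and the expectation about a global `config` are assumptions):

      // Injected markup: an element whose id shadows an expected global.
      document.body.innerHTML += '<form id="config"><input name="debug"></form>';
      // Script code later reading the global gets the <form> element instead of
      // the application's own object, steering its behavior.
      console.log((window as any).config); // HTMLFormElement, clobbered by markup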

========= URL Redirection =========

  1. is a popular technique that automatically navigates users to an intended destination webpage without user awareness - Redirection Trail
  2. allows a webpage to be accessible via multiple URLs. - Redirection Trail
  3. is an automatic redirection of one URL to another, usually indicated by the 3xx HTTP status code. - Script Gadgets CSP

--------- Access Control ---------

  1. provides a line of defense to prevent exploits by blocking unauthorized access. - Tracking the Provenance of AC Decisions
  2. is a pervasive security mechanism which is used in virtually all systems. - Automated Inference of AC Policies for Web Apps
  3. restricts the subjects (e.g., users and programs) that may perform operations (e.g., read and write) over objects (e.g., files and records). - Automated Inference of AC Policies for Web Apps

--------- Broken access control ---------

  1. is a widely recognised security issue in web applications; it leads to unauthorised accesses to sensitive data and system resources. - Automated Inference of AC Policies for Web Apps

--------- Insecure Direct Object References ---------

  1. refers to the exposure of direct references to internal resources (such as files). - Automated Inference of AC Policies for Web Apps

--------- Broken Authentication and Session Management ---------

  1. relates to the authentication and session management of an access control mechanism. - Automated Inference of AC Policies for Web Apps