Securing Your Website With Subresource Integrity

When you load a file from an external server, you’re trusting that the content you request is what you expect it to be. Since you don’t manage the server yourself, you’re relying on the security of yet another third party and increasing the attack surface. Trusting a third party is not inherently bad, but it should certainly be taken into consideration in the context of your website’s security.

A real-world example

This isn’t a purely theoretical danger. Ignoring potential security issues can and has already resulted in serious consequences. On June 4th, 2019, Malwarebytes announced their discovery of a malicious skimmer on the website NBA.com. Due to a compromised Amazon S3 bucket, attackers were able to alter a JavaScript library to steal credit card information from customers.

It’s not only JavaScript that’s worth worrying about, either. CSS is another resource capable of performing dangerous actions such as password stealing, and all it takes is a single compromised third-party server for disaster to strike. But they can provide invaluable services that we can’t simply go without, such as CDNs that reduce the total bandwidth usage of a site and serve files to the end-user much faster due to location-based caching. So it’s established that we need to sometimes rely on a host that we have no control over, but we also need to ensure that the content we receive from it is safe. What can we do?

Solution: Subresource Integrity (SRI)

SRI is a security policy that prevents the loading of resources that don’t match an expected hash. By doing this, if an attacker were to gain access to a file and modify its contents to contain malicious code, it wouldn’t match the hash we were expecting and not execute at all.

Doesn’t HTTPS do that already?

HTTPS is great for security and a must-have for any website, and while it does prevent similar problems (and much more), it only protects against tampering with data-in-transit. If a file were to be tampered with on the host itself, the malicious file would still be sent over HTTPS, doing nothing to prevent the attack.

How does hashing work?

A hashing function takes data of any size as input and returns data of a fixed size as output. Hashing functions would ideally have a uniform distribution. This means that for any input, x, the probability that the output, y, will be any specific possible value is similar to the probability of it being any other value within the range of outputs.

Here’s a metaphor:

Suppose you have a 6-sided die and a list of names. The names, in this case, would be the hash function’s “input” and the number rolled would be the function’s “output.” For each name in the list, you’ll roll the die and keep track of what name each number number corresponds to, by writing the number next to the name. If a name is used as input more than once, its corresponding output will always be what it was the first time. For the first name, Alice, you roll 4. For the next, John, you roll 6. Then for Bob, Mary, William, Susan, and Joseph, you get 2, 2, 5, 1, and 1, respectively. If you use “John” as input again, the output will once again be 6. This metaphor describes how hash functions work in essence.

Name (input)	Number rolled (output)
Alice	4
John	6
Bob	2
Mary	2
William	5
Susan	1
Joseph	1

You may have noticed that, for example, Bob and Mary have the same output. For hashing functions, this is called a “collision.” For our example scenario, it inevitably happens. Since we have seven names as inputs and only six possible outputs, we’re guaranteed at least one collision.

A notable difference between this example and a hash function in practice is that practical hash functions are typically deterministic, meaning they don’t make use of randomness like our example does. Rather, it predictably maps inputs to outputs so that each input is equally likely to map to any particular output.

SRI uses a family of hashing functions called the secure hash algorithm (SHA). This is a family of cryptographic hash functions that includes 128, 256, 384, and 512-bit variants. A cryptographic hash function is a more specific kind of hash function with the properties being effectively impossible to reverse to find the original input (without already having the corresponding input or brute-forcing), collision-resistant, and designed so a small change in the input alters the entire output. SRI supports the 256, 384, and 512-bit variants of the SHA family.

Here’s an example with SHA-256:

For example. the output for hello is:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

And the output for hell0 (with a zero instead of an O) is:

bdeddd433637173928fe7202b663157c9e1881c3e4da1d45e8fff8fb944a4868

You’ll notice that the slightest change in the input will produce an output that is completely different. This is one of the properties of cryptographic hashes listed earlier.

The format you’ll see most frequently for hashes is hexadecimal, which consists of all the decimal digits (0-9) and the letters A through F. One of the benefits of this format is that every two characters represent a byte, and the evenness can be useful for purposes such as color formatting, where a byte represents each color. This means a color without an alpha channel can be represented with only six characters (e.g., red = ff0000)

This space efficiency is also why we use hashing instead of comparing the entirety of a file to the data we’re expecting each time. While 256 bits cannot represent all of the data in a file that is greater than 256 bits without compression, the collision resistance of SHA-256 (and 384, 512) ensures that it’s virtually impossible to find two hashes for differing inputs that match. And as for SHA-1, it’s no longer secure, as a collision has been found.

Interestingly, the appeal of compactness is likely one of the reasons that SRI hashes don’t use the hexadecimal format, and instead use base64. This may seem like a strange decision at first, but when we take into consideration the fact that these hashes will be included in the code and that base64 is capable of conveying the same amount of data as hexadecimal while being 33% shorter, it makes sense. A single character of base64 can be in 64 different states, which is 6 bits worth of data, whereas hex can only represent 16 states, or 4 bits worth of data. So if, for example, we want to represent 32 bytes of data (256 bits), we would need 64 characters in hex, but only 44 characters in base64. When we using longer hashes, such as sha384/512, base64 saves a great deal of space.

Why does hashing work for SRI?

So let’s imagine there was a JavaScript file hosted on a third-party server that we included in our webpage and we had subresource integrity enabled for it. Now, if an attacker were to modify the file’s data with malicious code, the hash of it would no longer match the expected hash and the file would not execute. Recall that any small change in a file completely changes its corresponding SHA hash, and that hash collisions with SHA-256 and higher are, at the time of this writing, virtually impossible.

Our first SRI hash

So, there are a few methods you can use to compute the SRI hash of a file. One way (and perhaps the simplest) is to use srihash.org, but if you prefer a more programmatic way, you can use:

sha384sum [filename here] | head -c 96 | xxd -r -p | base64

sha384sum Computes the SHA-384 hash of a file
head -c 96 Trims all but the first 96 characters of the string that is piped into it
- -c 96 Indicates to trim all but the first 96 characters. We use 96, as it’s the character length of an SHA-384 hash in hexadecimal format
xxd -r -p Takes hex input piped into it and converts it into binary
- -r Tells xxd to receive hex and convert it to binary
- -p Removes the extra output formatting
base64 Simply converts the binary output from xxd to base64

If you decide to use this method, check the table below to see the lengths of each SHA hash.

Hash algorithm	Bits	Bytes	Hex Characters
SHA-256	256	32	64
SHA-384	384	48	96
SHA-512	512	64	128

For the head -c [x] command, x will be the number of hex characters for the corresponding algorithm.

MDN also mentions a command to compute the SRI hash:

shasum -b -a 384 FILENAME.js | awk '{ print $1 }' | xxd -r -p | base64

awk '{print $1}' Finds the first section of a string (separated by tab or space) and passes it to xxd. $1 represents the first segment of the string passed into it.

And if you’re running Windows:

@echo off
set bits=384
openssl dgst -sha%bits% -binary %1% | openssl base64 -A > tmp
set /p a= < tmp
del tmp
echo sha%bits%-%a%
pause

@echo off prevents the commands that are running from being displayed. This is particularly helpful for ensuring the terminal doesn’t become cluttered.
set bits=384 sets a variable called bits to 384. This will be used a bit later in the script.
openssl dgst -sha%bits% -binary %1% | openssl base64 -A > tmp is more complex, so let’s break it down into parts.
- openssl dgst computes a digest of an input file.
- -sha%bits% uses the variable, bits, and combines it with the rest of the string to be one of the possible flag values, sha256, sha384, or sha512.
- -binary outputs the hash as binary data instead of a string format, such as hexadecimal.
- %1% is the first argument passed to the script when it’s run.
- The first part of the command hashes the file provided as an argument to the script.
- | openssl base64 -A > tmp converts the binary output piping through it into base64 and writes it to a file called tmp. -A outputs the base64 onto a single line.
- set /p a= <tmp stores the contents of the file, tmp, in a variable, a.
- del tmp deletes the tmp file.
- echo sha%bits%-%a% will print out the type of SHA hash type, along with the base64 of the input file.
- pause Prevents the terminal from closing.

SRI in action

Now that we understand how hashing and SRI hashes work, let’s try a concrete example. We’ll create two files:

// file1.js
alert('Hello, world!');

and:

// file2.js
alert('Hi, world!');

Then we’ll compute the SHA-384 SRI hashes for both:

Filename	SHA-384 hash (base64)
`file1.js`	`3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2`
`file2.js`	`htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9`

Then, let’s create a file named index.html:

<!DOCTYPE html>
<html>
  <head>
    <script type="text/javascript" src="./file1.js" integrity="sha384-3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2" crossorigin="anonymous"></script>
    <script type="text/javascript" src="./file2.js" integrity="sha384-htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9" crossorigin="anonymous"></script>
  </head>
</html>

Place all of these files in the same folder and start a server within that folder (for example, run npx http-server inside the folder containing the files and then open one of the addresses provided by http-server or the server of your choice, such as 127.0.0.1:8080). You should get two alert dialog boxes. The first should say “Hello, world!” and the second, “Hi, world!”

If you modify the contents of the scripts, you’ll notice that they no longer execute. This is subresource integrity in effect. The browser notices that the hash of the requested file does not match the expected hash and refuses to run it.

We can also include multiple hashes for a resource and the strongest hash will be chosen, like so:

<!DOCTYPE html>
<html>
  <head>
    <script
      type="text/javascript"
      src="./file1.js"
      integrity="sha384-3frxDlOvLa6GGEUwMh9AowcepHRx/rwFT9VW9yL1wv/OcerR39FEfAUHZRrqaOy2 sha512-cJpKabWnJLEvkNDvnvX+QcR4ucmGlZjCdkAG4b9n+M16Hd/3MWIhFhJ70RNo7cbzSBcLm1MIMItw
9qks2AU+Tg=="
       crossorigin="anonymous"></script>
    <script 
      type="text/javascript"
      src="./file2.js"
      integrity="sha384-htr1LmWx3PQJIPw5bM9kZKq/FL0jMBuJDxhwdsMHULKybAG5dGURvJIXR9bh5xJ9 sha512-+4U2wdug3VfnGpLL9xju90A+kVEaK2bxCxnyZnd2PYskyl/BTpHnao1FrMONThoWxLmguExF7vNV
WR3BRSzb4g=="
      crossorigin="anonymous"></script>
  </head>
</html>

The browser will choose the hash that is considered to be the strongest and check the file’s hash against it.

Why is there a “crossorigin” attribute?

The crossorigin attribute tells the browser when to send the user credentials with the request for the resource. There are two options to choose from:

Value (`crossorigin=`)	Description
`anonymous`	The request will have its credentials mode set to `same-origin` and its mode set to cors.
`use-credentials`	The request will have its credentials mode set to `include` and its mode set to `cors`.

Request credentials modes mentioned

Credentials mode	Description
`same-origin`	Credentials will be sent with requests sent to same-origin domains and credentials that are sent from same-origin domains will be used.
`include`	Credentials will be sent to cross-origin domains as well and credentials sent from cross-origin domains will be used.

Request modes mentioned

Request mode	Description
`cors`	The request will be a CORS request, which will require the server to have a defined CORS policy. If not, the request will throw an error.

Why is the “crossorigin” attribute required with subresource integrity?

By default, scripts and stylesheets can be loaded cross-origin, and since subresource integrity prevents the loading of a file if the hash of the loaded resource doesn’t match the expected hash, an attacker could load cross-origin resources en masse and test if the loading fails with specific hashes, thereby inferring information about a user that they otherwise wouldn’t be able to.

When you include the crossorigin attribute, the cross-origin domain must choose to allow requests from the origin the request is being sent from in order for the request to be successful. This prevents cross-origin attacks with subresource integrity.

Using subresource integrity with webpack

It probably sounds like a lot of work to recalculate the SRI hashes of each file every time they are updated, but luckily, there’s a way to automate it. Let’s walk through an example together. You’ll need a few things before you get started.

Node.js and npm

Node.js is a JavaScript runtime that, along with npm (its package manager), will allow us to use webpack. To install it, visit the Node.js website and choose the download that corresponds to your operating system.

Setting up the project

Create a folder and give it any name with mkdir [name of folder]. Then type cd [name of folder] to navigate into it. Now we need to set up the directory as a Node project, so type npm init. It will ask you a few questions, but you can press Enter to skip them since they’re not relevant to our example.

webpack

webpack is a library that allows you automatically combine your files into one or more bundles. With webpack, we will no longer need to manually update the hashes. Instead, webpack will inject the resources into the HTML with integrity and crossorigin attributes included.

Installing webpack

Yu’ll need to install webpack and webpack-cli:

npm i --save-dev webpack webpack-cli

The difference between the two is that webpack contains the core functionalities whereas webpack-cli is for the command line interface.

We’ll edit our package.json to add a scripts section like so:

{
  //... rest of package.json ...,
  "scripts": {
    "dev": "webpack --mode=development"
  }
  //... rest of package.json ...,
}

This enable us to run npm run dev and build our bundle.

Setting up webpack configuration

Next, let’s set up the webpack configuration. This is necessary to tell webpack what files it needs to deal with and how.

First, we’ll need to install two packages, html-webpack-plugin, and webpack-subresource-integrity:

npm i --save-dev html-webpack-plugin webpack-subresource-integrity style-loader css-loader

Package name	Description
html-webpack-plugin	Creates an HTML file that resources can be injected into
webpack-subresource-integrity	Computes and inserts subresource integrity information into resources such as `<script>` and `<link rel=…>`
style-loader	Applies the CSS styles that we import
css-loader	Enables us to `import` css files into our JavaScript

Setting up the configuration:

const path              = require('path'),
      HTMLWebpackPlugin = require('html-webpack-plugin'),
      SriPlugin         = require('webpack-subresource-integrity');

module.exports = {
  output: {
    // The output file's name
    filename: 'bundle.js',
    // Where the output file will be placed. Resolves to 
    // the "dist" folder in the directory of the project
    path: path.resolve(__dirname, 'dist'),
    // Configures the "crossorigin" attribute for resources 
    // with subresource integrity injected
    crossOriginLoading: 'anonymous'
  },
  // Used for configuring how various modules (files that 
  // are imported) will be treated
  modules: {
    // Configures how specific module types are handled
    rules: [
      {
        // Regular expression to test for the file extension.
        // These loaders will only be activated if they match
        // this expression.
        test: /\.css$/,
        // An array of loaders that will be applied to the file
        use: ['style-loader', 'css-loader'],
        // Prevents the accidental loading of files within the
        // "node_modules" folder
        exclude: /node_modules/
      }
    ]
  },
  // webpack plugins alter the function of webpack itself
  plugins: [
    // Plugin that will inject integrity hashes into index.html
    new SriPlugin({
      // The hash functions used (e.g. 
      // <script integrity="sha256- ... sha384- ..." ...
      hashFuncNames: ['sha384']
    }),
    // Creates an HTML file along with the bundle. We will
    // inject the subresource integrity information into 
    // the resources using webpack-subresource-integrity
    new HTMLWebpackPlugin({
      // The file that will be injected into. We can use 
      // EJS templating within this file, too
      template: path.resolve(__dirname, 'src', 'index.ejs'),
      // Whether or not to insert scripts and other resources
      // into the file dynamically. For our example, we will
      // enable this.
      inject: true
    })
  ]
};

Creating the template

We need to create a template to tell webpack what to inject the bundle and subresource integrity information into. Create a file named index.ejs:

<!DOCTYPE html>
<html>
  <body></body>
</html>

Now, create an index.js in the folder with the following script:

// Imports the CSS stylesheet
import './styles.css'
alert('Hello, world!');

Building the bundle

Type npm run build in the terminal. You’ll notice that a folder, called dist is created, and inside of it, a file called index.html that looks something like this:

<!DOCTYPE HTML>
<html><head><script defer src="bundle.js" integrity="sha384-lb0VJ1IzJzMv+OKd0vumouFgE6NzonQeVbRaTYjum4ql38TdmOYfyJ0czw/X1a9b" crossorigin="anonymous">
</script></head>
  <body>
  </body>
</html>

The CSS will be included as part of the bundle.js file.

This will not work for files loaded from external servers, nor should it, as cross-origin files that need to constantly update would break with subresource integrity enabled.

Thanks for reading!

That’s all for this one. Subresource integrity is a simple and effective addition to ensure you’re loading only what you expect and protecting your users; and remember, security is more than just one solution, so always be on the lookout for more ways to keep your website safe.

Cecile Muller

# June 14, 2021

If you’re using Node, all you need is the built-in module “crypto” for generating the hashes, for example: https://gist.github.com/cecilemuller/8477130a4fbb427fb341f67af128ab5a

Sora2455

Generating an SRI hash is easy with a tool like https://www.srihash.org/

Peter Goes

# June 15, 2021

Thanks for the writeup!

A question though:
From a JAMStack perspective: if the server is compromised so that the attacker can modify a javascript / css file. They will also be able to modify the HTML and updating the hash right?

Joe

# July 9, 2021

Looks great! But way too little documentation to be able to set it up with Webpack unfortunately. Was just generation incorrect digests for me.