{"id":342070,"date":"2021-06-14T06:30:27","date_gmt":"2021-06-14T13:30:27","guid":{"rendered":"https:\/\/css-tricks.com\/?p=342070"},"modified":"2021-06-14T06:31:29","modified_gmt":"2021-06-14T13:31:29","slug":"securing-your-website-with-subresource-integrity","status":"publish","type":"post","link":"https:\/\/css-tricks.com\/securing-your-website-with-subresource-integrity\/","title":{"rendered":"Securing Your Website With Subresource Integrity"},"content":{"rendered":"\n

When you load a file from an external server, you\u2019re trusting that the content you request is what you expect it to be. Since you don\u2019t manage the server yourself, you\u2019re relying on the security of yet another third party and increasing the attack surface. Trusting a third party is not inherently bad, but it should certainly be taken into consideration in the context of your website\u2019s security.<\/p>\n\n\n\n\n\n\n

A real-world example<\/h3>\n\n\n

This isn\u2019t a purely theoretical danger. Ignoring potential security issues can result, and already has resulted, in serious consequences. On June 4th, 2019, Malwarebytes<\/a> announced their discovery of a malicious skimmer on the website NBA.com. Due to a compromised Amazon S3 bucket, attackers were able to alter a JavaScript library to steal credit card information from customers.<\/p>\n\n\n\n

It\u2019s not only JavaScript that\u2019s worth worrying about, either. CSS is another resource capable of performing dangerous actions such as password stealing, and all it takes is a single compromised third-party server for disaster to strike. Yet third-party servers provide invaluable services that we can\u2019t simply go without, such as CDNs, which reduce the total bandwidth usage of a site and serve files to end users much faster thanks to location-based caching. So we sometimes need to rely on a host we have no control over, but we also need to ensure that the content we receive from it is safe. What can we do?<\/p>\n\n\n

Solution: Subresource Integrity (SRI)<\/h3>\n\n\n

SRI is a security feature that prevents the browser from loading resources that don\u2019t match an expected hash. If an attacker were to gain access to a file and modify its contents to contain malicious code, the modified file wouldn\u2019t match the hash we were expecting, so it wouldn\u2019t execute at all.<\/p>\n\n\n

Doesn\u2019t HTTPS do that already?<\/h4>\n\n\n

HTTPS is great for security and a must-have for any website, and while it does prevent similar problems (and much more), it only protects against tampering with data in transit. If a file were tampered with on the host itself, the malicious file would still be delivered over HTTPS, and HTTPS would do nothing to prevent the attack.<\/p>\n\n\n

How does hashing work?<\/h4>\n\n\n

A hashing function takes data of any size as input and returns data of a fixed size as output. Hashing functions would ideally have a uniform distribution. This means that for any input, x<\/code>, the probability that the output, y<\/code>, will be any specific possible value is similar to the probability of it being any other value within the range of outputs.<\/p>\n\n\n\n
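A short sketch using Python's standard `hashlib` module shows both ideas. Python is chosen here only for illustration; nothing about SRI depends on it. Inputs of very different sizes all produce a 32-byte SHA-256 digest. Reducing many digests to one of six buckets spreads them out roughly evenly. The bucket reduction (`int.from_bytes(...) % 6`) is our own illustrative choice, not part of any standard.

```python
import hashlib
from collections import Counter

# Inputs of very different sizes: empty, 5 bytes, and 1 MiB.
inputs = [b"", b"hello", b"x" * (1024 * 1024)]
for data in inputs:
    digest = hashlib.sha256(data).digest()
    # The digest is always 32 bytes (256 bits), whatever the input length.
    print(f"{len(data):>8} bytes in -> {len(digest)} bytes out")

# Rough uniformity check: reduce the digests of 60,000 distinct inputs
# to one of six buckets and count how many land in each.
buckets = Counter(
    int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % 6
    for i in range(60_000)
)
# Each bucket should hold close to 60,000 / 6 = 10,000 inputs.
print(sorted(buckets.items()))
```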

Here\u2019s a metaphor:<\/p>\n\n\n\n

Suppose you have a 6-sided die and a list of names. The names, in this case, would be the hash function\u2019s \u201cinput\u201d and the number rolled would be the function\u2019s \u201coutput.\u201d For each name in the list, you roll the die and write the number next to the name, so you can keep track of which number each name corresponds to. If a name is used as input more than once, its output is always the number rolled the first time. For the first name, Alice, you roll 4. For the next, John, you roll 6. Then for Bob, Mary, William, Susan, and Joseph, you get 2, 2, 5, 1, and 1, respectively. If you use \u201cJohn\u201d as input again, the output will once again be 6. In essence, this metaphor describes how hash functions work.<\/p>\n\n\n\n

<figure><table><thead><tr><th><strong>Name (input)<\/strong><\/th><th><strong>Number rolled (output)<\/strong><\/th><\/tr><\/thead><tbody>
<tr><td>Alice<\/td><td>4<\/td><\/tr>
<tr><td>John<\/td><td>6<\/td><\/tr>
<tr><td>Bob<\/td><td>2<\/td><\/tr>
<tr><td>Mary<\/td><td>2<\/td><\/tr>
<tr><td>William<\/td><td>5<\/td><\/tr>
<tr><td>Susan<\/td><td>1<\/td><\/tr>
<tr><td>Joseph<\/td><td>1<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n
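The die metaphor can be sketched in Python as a small class that remembers its rolls. `DieHash` is a hypothetical name used only for this illustration. The first time it sees a name, it rolls a simulated six-sided die with `random.Random`; after that, it returns the stored roll. Because the rolls are random, the numbers will generally differ from the table above, but a repeated input always gives the same output.

```python
import random


class DieHash:
    """Toy 'hash function' from the die metaphor (illustration only)."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)  # source of the die rolls
        self._rolls = {}                 # name -> first roll for that name

    def __call__(self, name: str) -> int:
        # Roll only the first time a name is seen; afterwards reuse the result.
        if name not in self._rolls:
            self._rolls[name] = self._rng.randint(1, 6)
        return self._rolls[name]


die_hash = DieHash()
names = ["Alice", "John", "Bob", "Mary", "William", "Susan", "Joseph"]
outputs = {name: die_hash(name) for name in names}
print(outputs)

# Using "John" as input again gives the same number as the first time.
print(die_hash("John") == outputs["John"])  # True
```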

You may have noticed that some names share an output: Bob and Mary both got 2, and Susan and Joseph both got 1. For hash functions, this is called a \u201ccollision.\u201d In our example scenario, collisions are inevitable. Since we have seven names as inputs and only six possible outputs, the pigeonhole principle guarantees at least one collision.<\/p>\n\n\n\n

A notable difference between this example and hash functions in practice is that practical hash functions are deterministic, meaning they don\u2019t make use of randomness like our example does. Instead, each one computes its output from the input alone, in a way that spreads inputs evenly across the possible outputs so that no particular output is favored.<\/p>\n\n\n\n
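To make the die example deterministic, the random roll can be replaced with a value computed from the name itself. This sketch derives a number from 1 to 6 from the SHA-256 digest of the name; the helper name `deterministic_die` is hypothetical. No lookup table is needed, and the same name always produces the same number. With seven names and six possible outputs, a collision is still guaranteed.

```python
import hashlib


def deterministic_die(name: str) -> int:
    """Map a name to 1..6 using only its SHA-256 digest (illustration only)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 6 + 1


names = ["Alice", "John", "Bob", "Mary", "William", "Susan", "Joseph"]
outputs = {name: deterministic_die(name) for name in names}
print(outputs)

# Same input, same output, every time, with no stored state.
print(deterministic_die("John") == outputs["John"])  # True
# Seven inputs, six possible outputs: at least one value must repeat.
print(len(set(outputs.values())) < len(names))  # True
```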

SRI uses a family of hash functions called the Secure Hash Algorithm (SHA). This is a family of cryptographic hash functions that includes SHA-1, with a 160-bit output, and the SHA-2 variants, with 224, 256, 384, and 512-bit outputs. A cryptographic hash function is a more specific kind of hash function with three key properties:

- It is effectively impossible to reverse: you cannot find the original input without already having it or brute-forcing it.
- It is collision-resistant.
- It is designed so that a small change in the input alters the entire output.

SRI supports the 256, 384, and 512-bit variants: SHA-256, SHA-384, and SHA-512.<\/p>\n\n\n\n
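The output sizes of the three algorithms SRI accepts can be checked with Python's `hashlib`, used here only as a convenient reference implementation. `digest_size` is reported in bytes, so multiplying by 8 gives the bit length in each algorithm's name.

```python
import hashlib

# The three SHA-2 variants that SRI accepts, with digest sizes in bytes.
sizes = {name: hashlib.new(name).digest_size for name in ("sha256", "sha384", "sha512")}

for name, size in sizes.items():
    # 8 bits per byte gives the variant's bit length.
    print(f"{name}: {size} bytes = {size * 8} bits")
```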

Here\u2019s an example with SHA-256:<\/p>\n\n\n\n

The output for hello<\/code> is:<\/p>\n\n\n\n

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824<\/code><\/pre>\n\n\n\n

And the output for hell0<\/code> (with a zero instead of an O) is:<\/p>\n\n\n\n

bdeddd433637173928fe7202b663157c9e1881c3e4da1d45e8fff8fb944a4868<\/code><\/pre>\n\n\n\n
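These digests can be reproduced with Python's `hashlib`. The sketch below also counts how many of the 256 output bits differ between the two inputs; the bit-counting step is our own illustration. For a good cryptographic hash, roughly half of the bits are expected to flip, even though only one character of the input changed.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hexadecimal SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("hello")
b = sha256_hex("hell0")  # a zero instead of the letter O
print(a)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(b)

# XOR the two digests as integers and count the 1 bits to see how many
# of the 256 output bits differ.
differing_bits = bin(int(a, 16) ^ int(b, 16)).count("1")
print(f"{differing_bits} of 256 bits differ")
```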

You\u2019ll notice that the slightest change in the input will produce an output that is completely different. This is one of the properties of cryptographic hashes listed earlier.<\/p>\n\n\n\n

The format you\u2019ll see most frequently for hashes is hexadecimal, which consists of all the decimal digits (0-9) and the letters A through F. One benefit of this format is that every two characters represent exactly one byte. This even split is useful for purposes such as color formatting, where each color channel is one byte, so a color without an alpha channel can be represented with only six characters (e.g., red = ff0000<\/code>).<\/p>\n\n\n\n
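The two-characters-per-byte relationship can be seen with Python's built-in `bytes.fromhex` and `bytes.hex`. Here they decode the red example into its three one-byte channels and back again.

```python
# Two hex characters encode one byte (00 through ff).
red = "ff0000"
r, g, b = bytes.fromhex(red)  # three bytes, one per color channel
print(r, g, b)                # 255 0 0

# Going the other way: three channel values back to six hex characters.
print(bytes([r, g, b]).hex())  # ff0000
```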

Compactness is also why we compare hashes instead of comparing an entire file to the data we\u2019re expecting each time. A 256-bit hash cannot uniquely represent every file larger than 256 bits, so collisions must exist. However, the collision resistance of SHA-256 (as well as SHA-384 and SHA-512) means it\u2019s virtually impossible to actually find two differing inputs with matching hashes. SHA-1, on the other hand, is no longer secure, as a collision has been found<\/a>.<\/p>\n\n\n\n

Interestingly, the appeal of compactness is likely one of the reasons that SRI hashes don\u2019t<\/em> use the hexadecimal format, and instead use base64. This may seem like a strange decision at first. It makes sense, though, when we consider that these hashes will be included in the code, and that base64 can convey the same amount of data as hexadecimal while being about 33% shorter. A single base64 character can be in 64 different states, which is 6 bits worth of data, whereas a hex character can only represent 16 states, or 4 bits worth of data. So if, for example, we want to represent 32 bytes of data (256 bits), we need 64 characters in hex, but only 44 characters in base64. With longer hashes, such as SHA-384 and SHA-512, base64 saves even more space: 64 characters instead of 96 for SHA-384, and 88 instead of 128 for SHA-512.<\/p>\n\n\n
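The character counts above can be checked by encoding each SRI-supported digest both ways with Python's `hashlib` and `base64` modules. The input bytes are arbitrary, because digest length does not depend on the input.

```python
import base64
import hashlib

lengths = {}
for name in ("sha256", "sha384", "sha512"):
    digest = hashlib.new(name, b"any input").digest()
    hex_len = len(digest.hex())              # 2 characters per byte
    b64_len = len(base64.b64encode(digest))  # 4 characters per 3 bytes, padded with '='
    lengths[name] = (hex_len, b64_len)
    print(f"{name}: {len(digest)} bytes -> {hex_len} hex chars, {b64_len} base64 chars")
```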

Why does hashing work for SRI?<\/h4>\n\n\n

So let\u2019s imagine there was a JavaScript file hosted on a third-party server that we included in our webpage and we had subresource integrity enabled for it. Now, if an attacker were to modify the file\u2019s data with malicious code, the hash of it would no longer match the expected hash and the file would not execute. Recall that any small change in a file completely changes its corresponding SHA hash, and that hash collisions with SHA-256 and higher are, at the time of this writing, virtually impossible.<\/p>\n\n\n
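The browser-side check can be modeled in a few lines of Python. This is a simplified sketch under our own assumptions, not the browser's actual implementation. `sri_value` and `matches_integrity` are hypothetical helper names. The sketch handles only a single `algorithm-base64digest` value, whereas the SRI specification also allows several space-separated values.

```python
import base64
import hashlib

SRI_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an SRI-style value: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, expected: str) -> bool:
    """Simplified check: recompute the hash of the fetched bytes and compare."""
    algorithm, _, _ = expected.partition("-")
    if algorithm not in SRI_ALGORITHMS:
        return False  # simplification: treat unknown algorithms as a failure
    return sri_value(content, algorithm) == expected


original = b"console.log('hello');"
expected = sri_value(original)  # the value the site author pins in advance

tampered = original + b" stealCardNumbers();"  # attacker edits the hosted file
print(matches_integrity(original, expected))   # True: in this model, the file runs
print(matches_integrity(tampered, expected))   # False: in this model, the file is blocked
```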

Our first SRI hash<\/h3>\n\n\n

There are a few ways to compute the SRI hash of a file. One way (and perhaps the simplest) is to use srihash.org<\/a>, but if you prefer a more programmatic approach, you can use:<\/p>\n\n\n\n

sha384sum [filename here] | head -c 96 | xxd -r -p | base64<\/code><\/pre>\n\n\n\n
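If you would rather not depend on `xxd`, the same value can be computed with Python's standard library. `sri_hash_of_file` is a hypothetical helper name for this sketch. Unlike the pipeline above, it also prepends the `sha384-` prefix, because an `integrity` attribute value needs the algorithm name in front of the base64 digest.

```python
import base64
import hashlib
import sys


def sri_hash_of_file(path: str, algorithm: str = "sha384") -> str:
    """Return '<algorithm>-<base64 digest>' for the file at `path`."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return f"{algorithm}-{base64.b64encode(h.digest()).decode('ascii')}"


if __name__ == "__main__":
    # Usage: python sri_hash.py path/to/file.js
    for path in sys.argv[1:]:
        print(sri_hash_of_file(path))
```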