Visual Regression Testing with PhantomCSS

The following is a guest post by Jon Bellah, a Lead Front End Engineer at 10up. Jon reached out to us about writing on the idea of visual regression testing, which is a form of CSS testing (i.e. making sure you don't screw up your site by accident). I thought the use-case was particularly interesting: re-architecting CSS (converting to Sass, splitting up files, etc) and making sure there weren't regressions during that process. Here Jon will go into all that as well as some of the challenges of visual testing (e.g. changing content changes visual result) with clever workarounds.

Inheriting a codebase from a new client is one of the most common, and most difficult, challenges I’ve faced while working at an agency. In some cases, a client is transitioning to a new agency because the previous agency was not producing quality work. In almost every case, the previous agency didn’t do things the way I would have.

I find myself in this situation often. Not every client has the need, desire, or budget to rebuild from the ground up.

Recently, my team inherited a codebase from a new client and was tasked with doing a little bit of cleanup before quickly transitioning into building out new features. As we dug in, we felt we could improve their codebase, and set ourselves up for an easier path forward, by transitioning their stylesheets to Sass.

While we could certainly just rename the files and include them in a single pre-compiled stylesheet (without doing any cleanup), we felt there was much to be gained by re-architecting their styles. Doing so would cost a bit more upfront, but we felt that it would ultimately save them a lot of time and money down the road. Most importantly, it would allow us to iterate more quickly with greater confidence.

In the past, I would consider such an undertaking to be rather high risk. After all, the C in CSS does stand for cascading... order absolutely matters. Restructuring a handful of stylesheets means changing the order in which things appear, which naturally introduces a high risk of breaking things.

As a result, it's always been something that was either tested manually (slowly) or was just deemed to be cost prohibitive.

This time, though, we decided to build a visual regression testing suite.

Visual regression testing has recently started gaining popularity, and for good reason. At its most basic, it's a series of tests that run through your site, take screenshots of various components, compare those screenshots against a baseline, and alert you when things change.

That may sound counter-intuitive to some folks. We change CSS because we want things to look different. Why would we want a build process telling us that we broke something every time we commit a change to our stylesheets?

Whether you're re-architecting a client's styles or just working with a team, it's easy to make changes to CSS that we think affect only one component, only to find out later that we broke that component on an entirely different page.

To truly understand why visual regression testing can be beneficial, I think it’s helpful to understand what makes humans bad for the job.

Man versus Machine

It turns out that we humans are actually pretty terrible at spotting changes in visual stimuli. In fact, our inability to notice changes has become an increasingly studied set of physiological and psychological phenomena.

We've even made games out of it. Do you remember the old "spot the differences" pictures?

Psychologists are keen to understand a number of real world problems, such as how these phenomena affect things like eyewitness testimony or driving ability, but their research also holds a lot of knowledge that can be applied to our work in web development.

One phenomenon that I feel is particularly relevant to the conversation is change blindness.

Change Blindness

Research on the concept of change blindness dates back to the 1970s. In 1996, though, George McConkie and Christopher Currie, a couple of professors at the University of Illinois Urbana-Champaign, conducted a set of studies that is credited with sparking significant interest in the phenomenon of change blindness.

Change blindness is a perceptual deficiency whereby changes in a visual stimulus can occur without the observer noticing them. It's not linked to any visual defect; it's purely psychological.

In the McConkie & Currie study, they found that, in some cases, changes of up to a fifth of the picture area could regularly go unnoticed. This video provides an excellent example of just how much change can be missed if you're not looking for it.

The Tools

When it comes to building your test suite, there is a wide array of tools to choose from. I always recommend looking around, comparing tools, and figuring out what works best for you.

With that in mind, I've chosen PhantomCSS as my go-to tool for visual regression testing. I chose it for a couple of reasons.

First, because it has a relatively active and healthy community on GitHub. When it comes to open source software, I always like to check and make sure that a tool or library is still being actively developed. Relying on abandonware can quickly become a pain.

The second reason I chose PhantomCSS is that it has a handy Grunt plugin that allows it to integrate easily with my existing workflow.

At its core, PhantomCSS is a combination of three key components:

  • PhantomJS or SlimerJS - A headless browser. PhantomJS is built on WebKit, while SlimerJS is built on Gecko, the engine that powers Firefox.
  • CasperJS - Casper is a JavaScript navigation scripting and testing utility. It allows us to define a set of actions to occur inside our headless browser.
  • ResembleJS - Resemble is a JavaScript / HTML5 library for making image comparisons. It will test our new tests against our baseline and alert us of any differences between the two.

And finally, as mentioned, we'll be using Grunt to run our tests.

The Implementation

Now that we know the what’s and the why’s, let’s walk through the steps of setting up and implementing your visual regression testing suite.

Setting up Grunt

First, we need to setup Grunt to run our test suite, so you'll want to make sure you have Grunt installed.

In the command line, $ cd /path/to/your-site and run:

$ npm install @micahgodbolt/grunt-phantomcss --save-dev

Open your project’s `Gruntfile`, load the PhantomCSS task, and define the task in `grunt.initConfig()`, like so:

grunt.loadNpmTasks('@micahgodbolt/grunt-phantomcss');

grunt.initConfig({
  phantomcss: {
    desktop: {
      options: {
        screenshots: 'baselines/desktop',
        results: 'results/desktop',
        viewportSize: [1280, 800]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    }
  }
});

Testing Different Breakpoints

I like using Sass MQ to manage my breakpoints. This approach has the added benefit of giving me a list of all my breakpoints that I can easily use to set up my tests.
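For reference, Sass MQ keeps all of its breakpoints in a single `$mq-breakpoints` map, which is exactly the list you can mirror in your test viewports. A minimal sketch (the names and pixel values here are assumptions, not part of this project's setup):

```scss
// Hypothetical breakpoint map; mirror these widths in your Grunt
// viewportSize settings so tests and styles stay in sync.
$mq-breakpoints: (
  mobile:  320px,
  tablet:  740px,
  desktop: 1024px
);
```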

With PhantomCSS, you are able to manipulate the browser width within your actual test definition, but I prefer to abstract that out of my tests to give a little more flexibility to my visual testing suite, choosing instead to define it in my Grunt task.

With grunt-phantomcss, we can define a set of tests to run at different breakpoints and, as an added bonus, save them to different folders.

To keep things a bit more organized and semantic, I also name each testing subtask to match its corresponding Sass MQ breakpoint.

So, for example:

grunt.initConfig( {
  pkg: grunt.file.readJSON('package.json'),
  phantomcss: {
    desktop: {
      options: {
        screenshots: 'baselines/desktop',
        results: 'results/desktop',
        viewportSize: [1024, 768]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    },
    mobile: {
      options: {
        screenshots: 'baselines/mobile',
        results: 'results/mobile',
        viewportSize: [320, 480]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    }
  }
});

Here we are running through the same set of tests, but running them at different breakpoints and saving them to subfolders within our baselines and results. As a bonus, each subtask can also be run on its own, e.g. $ grunt phantomcss:mobile.

Setting Up Your Test Suite

In our Grunt definition, you can see that we begin by running `tests/phantomcss/start.js`. This file fires up Casper, our scripting tool, which in turn launches our headless browser, and should look like:

phantom.casperTest = true;
casper.start();

Now, back in our Grunt definition, you can see that we then run all files in our tests/phantomcss/ directory that end in `-test.js`. Grunt will loop through each of these files in alphabetical order.

How you organize your test files is entirely up to you. Personally, I like to create a test file for each component in my site.
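As an illustration, a per-component layout might look like this (the file names besides `start.js` are hypothetical):

```
tests/
  phantomcss/
    start.js
    header-test.js
    navigation-test.js
    footer-test.js
```

Since Grunt runs the files alphabetically after `start.js`, component names double as a predictable run order.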

Writing Your First Test

Once you’ve got your `start.js` file set up, it’s time to write your first test. We’ll call this file `header-test.js`.

casper.thenOpen('http://mysite.dev/')

.then(function() {
  phantomcss.screenshot('.site-header', 'site-header');
});

At the top of the file, we tell Casper to open the root URL, and then in our first test we grab a screenshot of the entire .site-header element. The second parameter is the name of our screenshot file. I prefer to name screenshots after the element or component that they're responsible for, as it makes my test suite that much more semantic and easier to share with teammates.

In its simplest form, that’s all you need to write for your first test. However, we can build a much more robust testing suite, covering more of the actual element, page, and application states.

Scripting Interactions

Casper allows us to automate interactions that occur within our headless browser. For example, if we want to test the hover state of a button, we could write that as:

casper.then(function() {
  this.mouse.move('.button');
  phantomcss.screenshot('.button');
});

You can also test logged in and logged out states. In our `start.js` file, we can write a little function that will fill out the WordPress login form as soon as we spin up our Casper instance.

casper.start('http://default.wordpress.dev/wp-admin/', function() {
  this.fill('form#loginform', {
    'log': 'admin',
    'pwd': 'password'
  }, false);

  this.click('#wp-submit');

  console.log('Logging in...');
});

You’ll notice that we’re running this on casper.start() instead of inside its own individual test. Setting up your session on casper.start() in your `start.js` file makes the session available to other files in your test suite, since it will always be run first.

I recommend taking a look at the Casper documentation for more information on scripting interactions.

Running Your Tests

Now that we've built a basic test suite, it's time to run our tests. In the command line, run $ grunt phantomcss.

PhantomCSS will automatically set the screenshots from your first run as the baselines to compare all future tests against.

If a test does fail, PhantomCSS will output three different screenshots to your results folder. It will output the original, a `.diff.png`, and a `.fail.png`.
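For the `site-header` screenshot from our first test, for instance, a failing desktop run would leave roughly the following in the results folder (paths follow the Grunt config above):

```
results/desktop/site-header.png        the latest screenshot
results/desktop/site-header.diff.png   the differences, highlighted
results/desktop/site-header.fail.png   a copy flagging the failure
```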

For example, say we have changed the font size of text on an article page, but inadvertently decreased the font size in an archive view as well. PhantomCSS will give us diffs to compare, highlighting exactly what changed.

The Challenges

Building a visual regression testing suite is certainly not without its challenges. The two biggest challenges that I have encountered are dynamic content and distributing tests amongst a team.

Dynamic Content

The first, and in some ways most difficult, challenge that I have encountered is how exactly to handle dynamic content. The test suite is running through each of these pages, taking screenshots, and comparing them. If content is different, the test is going to fail.

If you're working with a team, odds are everyone will be testing against their own local environment. Testing against a single staging environment doesn't always fix the issue, because content may still change there; e.g., a randomly ordered set of related posts.

To solve this issue, there are two approaches that I’ve come to favor.

The first, and my favorite, approach is to use JavaScript to replace content within the elements you're testing with a set of representative demo content.

Since these tests should not be deployed to your production server, you don't have to worry about XSS vulnerabilities. As such, I like to use `.html()` in my tests to replace the dynamic content, prior to taking the screenshot, with static content from a JSON object that I include in my code repo.
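As a sketch of that idea, the JSON object can simply map selectors to static markup, and a small helper can turn it into a list of replacements for a test to loop over before the screenshot. The selectors, demo strings, and the `buildReplacements` helper here are all hypothetical; the actual swap would happen inside a `casper.then()` using `.html()`:

```javascript
// Static demo content, as you might keep it in a JSON file in your repo.
// Keys are CSS selectors; values are the markup to swap in.
var demoContent = {
  '.related-posts': '<li>Demo post one</li><li>Demo post two</li>',
  '.comment-count': '12 Comments'
};

// Turn the object into a list of { selector, html } pairs that a test
// can iterate over, replacing each element's contents via .html()
// before phantomcss.screenshot() is called.
function buildReplacements(content) {
  return Object.keys(content).map(function (selector) {
    return { selector: selector, html: content[selector] };
  });
}

var replacements = buildReplacements(demoContent);
```

Because the demo content lives in the repo, every teammate's test run renders the same markup, no matter what their local database contains.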

The second approach is to use a tool like Hologram or mdcss, which allows you to use comments in your CSS to create auto-generated style guides, and to run your tests against those. This approach has more overhead, in that it requires a bigger shift in workflow, but has the added benefit of creating excellent documentation for your front-end components.

Distribution

The second major challenge that I encountered with regression testing is in determining the best way to distribute these tests amongst a team of engineers. So far in our tests we’ve hardcoded our testing URL, which will cause issues when working with a team where everyone may not be using the same URL for their local environment.

To get around this, my team and I have registered our $ grunt test task to accept a --url parameter, which is then saved to a file locally, using grunt.file.

// Allow a variable to be passed, e.g. --url=http://test.dev
var localURL = grunt.option( 'url' );

/**
 * Register a custom task to save the local URL, which is then read by the PhantomCSS test file.
 * This file is saved so that "grunt test" can then be run in the future without passing your local URL each time.
 *
 * Note: Make sure test/visual/.local_url is added to your .gitignore
 *
 * Props to Zack Rothauser for this approach.
 */
grunt.registerTask('test', 'Runs PhantomCSS and stores the --url parameter', function() {
  if (localURL) {
    grunt.log.writeln( 'Local URL: ' + localURL );
    grunt.file.write( 'test/visual/.local_url', localURL );
  }

  grunt.task.run(['phantomcss']);
});

Then, at the top of your test file, you’ll use:

var fs = require('fs'), siteURL;

try {
  siteURL = fs.read( 'test/visual/.local_url' );
} catch(err) {
  siteURL = 'http://local.wordpress.dev';
}

casper.thenOpen(siteURL + '/path/to/template');

Your suite will now look for the `.local_url` file whenever it is run, but if the file is not present, it will default to using `http://local.wordpress.dev`.
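The fallback behaves like this small standalone helper (`resolveSiteURL` is a hypothetical name, not part of PhantomCSS or Casper; it just isolates the logic for illustration):

```javascript
// Prefer the URL saved in .local_url, trimming any trailing newline
// left behind when the file was written; otherwise use the default.
function resolveSiteURL(savedURL, defaultURL) {
  if (typeof savedURL === 'string' && savedURL.trim().length > 0) {
    return savedURL.trim();
  }
  return defaultURL;
}
```

So `resolveSiteURL(fileContents, 'http://local.wordpress.dev')` returns the saved URL whenever the file exists and is non-empty, and the default otherwise.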

In Closing

There are a host of benefits that visual regression testing can bring to your projects. Rapid iteration and continuous integration are increasingly the mantra of today’s developers; it only makes sense to build yourself a safety net.

A visual regression testing suite is also great for working with people on open source projects. In fact, the WordPress project is working towards a pattern library with an accompanying regression testing suite. This test suite will provide the groundwork that allows the WordPress project to move forward with plans to restore sanity to their stylesheets.

Alternatives

PhantomCSS is not the only tool available, it’s simply the one that I felt was right for me, my team, and our workflow. If visual regression testing sounds cool to you but PhantomCSS doesn’t sound like your thing, or if you’re just interested in alternatives, I recommend taking a look at: