Visual Regression Testing with PhantomCSS

The following is a guest post by Jon Bellah, a Lead Front End Engineer at 10up. Jon reached out to us about writing on the idea of visual regression testing, which is a form of CSS testing (i.e. making sure you don't screw up your site by accident). I thought the use-case was particularly interesting: re-architecting CSS (converting to Sass, splitting up files, etc) and making sure there weren't regressions during that process. Here Jon will go into all that as well as some of the challenges of visual testing (e.g. changing content changes visual result) with clever workarounds.

Inheriting a codebase from a new client is one of the most common, and most difficult, challenges I’ve faced while working at an agency. In some cases, a client is transitioning to a new agency because the previous agency was not producing quality work. In almost every case, the previous agency didn’t do things the way I would have.

I find myself in this situation often. Not every client has the need, desire, or budget to rebuild from the ground up.

Recently, my team inherited a codebase from a new client and was tasked with doing a little bit of cleanup before quickly transitioning into building out new features. As we dug in, we felt we could improve their codebase, and set ourselves up for an easier path forward, by transitioning their stylesheets to Sass.

While we could certainly just rename the files and include them in a single pre-compiled stylesheet (without doing any cleanup), we felt there was much to be gained by re-architecting their styles. Doing so would cost a bit more upfront, but we felt that it would ultimately save them a lot of time and money down the road. Most importantly, it would allow us to iterate more quickly with greater confidence.

In the past, I would consider such an undertaking to be rather high risk. After all, the C in CSS does stand for cascading... order absolutely matters. Restructuring a handful of stylesheets means changing the order in which things appear, which naturally introduces a high risk of breaking things.

As a result, it's always been something that was either tested manually (slowly) or was just deemed to be cost prohibitive.

This time, though, we decided to build a visual regression testing suite.

Visual regression testing has recently started gaining popularity, and for good reason. At its most basic, it's a series of tests that run through your site, take screenshots of various components, compare those screenshots against a baseline, and alert you when things change.

That may sound counter-intuitive to some folks. We change CSS because we want things to look different. Why would we want a build process telling us that we broke something every time we commit a change to our stylesheets?

Whether you're re-architecting a client's styles or just working with a team, it's easy to make changes to CSS that we think affect only one component, only to find out later that we broke that component on an entirely different page.

To truly understand why visual regression testing can be beneficial, I think it’s helpful to understand what makes humans bad for the job.

Man versus Machine

It turns out that we humans are actually pretty terrible at spotting changes in visual stimuli. In fact, our inability to notice changes has become an increasingly studied set of physiological and psychological phenomena.

We've even made games out of it. Do you remember the old "spot the differences" pictures?

Psychologists are keen to understand a number of real world problems, such as how these phenomena affect things like eyewitness testimony or driving ability, but their research also holds a lot of knowledge that can be applied to our work in web development.

One phenomenon that I feel is particularly relevant to the conversation is change blindness.

Change Blindness

Research on the concept of change blindness dates back to the 1970s. In 1996, though, George McConkie and Christopher Currie, a couple of professors at the University of Illinois Urbana-Champaign, conducted a set of studies that is credited with sparking significant interest in the phenomenon of change blindness.

Change blindness is a perceptual deficiency whereby changes in a visual stimulus can occur without the observer noticing them. It's not linked to any visual defect; it's purely psychological.

In the McConkie & Currie study, they found that, in some cases, changes of up to a fifth of the picture area could regularly go unnoticed. This video provides an excellent example of just how much change can be missed if you're not looking for it.

The Tools

When it comes to building your test suite, there is a wide array of tools to choose from. I always recommend looking around, comparing tools, and figuring out what works best for you.

With that in mind, I've chosen PhantomCSS as my go-to tool for visual regression testing. I chose it for a couple of reasons.

First, because it has a relatively active and healthy community on GitHub. When it comes to open source software, I always like to check and make sure that a tool or library is still being actively developed. Relying on abandonware can quickly become a pain.

The second reason I chose PhantomCSS is that it has a handy Grunt plugin that allows it to integrate easily with my existing workflow.

At its core, PhantomCSS is a combination of three key components:

  • PhantomJS or SlimerJS - A headless browser. PhantomJS is built on WebKit, while SlimerJS is built on Gecko, the engine that powers Firefox.
  • CasperJS - Casper is a JavaScript navigation scripting and testing utility. It allows us to define a set of actions to occur inside our headless browser.
  • ResembleJS - Resemble is a JavaScript / HTML5 library for making image comparisons. It will test our new tests against our baseline and alert us of any differences between the two.

And finally, as mentioned, we'll be using Grunt to run our tests.

The Implementation

Now that we know the what’s and the why’s, let’s walk through the steps of setting up and implementing your visual regression testing suite.

Setting up Grunt

First, we need to setup Grunt to run our test suite, so you'll want to make sure you have Grunt installed.

In the command line, $ cd /path/to/your-site and run:

$ npm install @micahgodbolt/grunt-phantomcss --save-dev

Open your project’s `Gruntfile`, load the PhantomCSS task, and define the task in `grunt.initConfig()`, like so:

grunt.loadNpmTasks('@micahgodbolt/grunt-phantomcss');

grunt.initConfig({
  phantomcss: {
    desktop: {
      options: {
        screenshots: 'baselines/desktop',
        results: 'results/desktop',
        viewportSize: [1280, 800]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    }
  }
});

Testing Different Breakpoints

I like using Sass MQ to manage my breakpoints. This approach has the added benefit of giving me a list of all my breakpoints that I can easily use to set up my tests.
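For reference, Sass MQ keeps all of its breakpoints in a single `$mq-breakpoints` map, which is exactly the list you can mirror in your test viewports. A minimal sketch (the names and pixel values here are assumptions, not part of this project's setup):

```scss
// Hypothetical breakpoint map; mirror these widths in your Grunt
// viewportSize settings so tests and styles stay in sync.
$mq-breakpoints: (
  mobile:  320px,
  tablet:  740px,
  desktop: 1024px
);
```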

With PhantomCSS, you are able to manipulate the browser width within your actual test definition, but I prefer to abstract that out of my tests to give a little more flexibility to my visual testing suite, choosing instead to define it in my Grunt task.

With grunt-phantomcss, we can define a set of tests to run at different breakpoints and, as an added bonus, save them to different folders.

To keep things a bit more organized and semantic, I also name each testing subtask to match its corresponding Sass MQ breakpoint.

So, for example:

grunt.initConfig( {
  pkg: grunt.file.readJSON('package.json'),
  phantomcss: {
    desktop: {
      options: {
        screenshots: 'baselines/desktop',
        results: 'results/desktop',
        viewportSize: [1024, 768]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    },
    mobile: {
      options: {
        screenshots: 'baselines/mobile',
        results: 'results/mobile',
        viewportSize: [320, 480]
      },
      src: [
        'tests/phantomcss/start.js',
        'tests/phantomcss/*-test.js'
      ]
    }
  }
});

Here we are running through the same set of tests, but running them at different breakpoints and saving them to subfolders within our baselines and results. As a bonus, each subtask can also be run on its own, e.g. $ grunt phantomcss:mobile.

Setting Up Your Test Suite

In our Grunt definition, you can see that we begin by running `tests/phantomcss/start.js`. This file fires up Casper, our scripting tool, which in turn launches our headless browser, and should look like:

phantom.casperTest = true;
casper.start();

Now, back in our Grunt definition, you can see that we then run all files in our tests/phantomcss/ directory that end in `-test.js`. Grunt will loop through each of these files in alphabetical order.

How you organize your test files is entirely up to you. Personally, I like to create a test file for each component in my site.
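As an illustration, a per-component layout might look like this (the file names besides `start.js` are hypothetical):

```
tests/
  phantomcss/
    start.js
    header-test.js
    navigation-test.js
    footer-test.js
```

Since Grunt runs the files alphabetically after `start.js`, component names double as a predictable run order.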

Writing Your First Test

Once you’ve got your `start.js` file set up, it’s time to write your first test. We’ll call this file `header-test.js`.

casper.thenOpen('http://mysite.dev/')

.then(function() {
  phantomcss.screenshot('.site-header', 'site-header');
});

At the top of the file, we tell Casper to open the root URL, and then in our first test we grab a screenshot of the entire .site-header element. The second parameter is the name of our screenshot file. I prefer to name screenshots after the element or component that they're responsible for, as it makes my test suite that much more semantic and easier to share with teammates.

In its simplest form, that’s all you need to write for your first test. However, we can build a much more robust testing suite, covering more of the actual element, page, and application states.

Scripting Interactions

Casper allows us to automate interactions that occur within our headless browser. For example, if we want to test the hover state of a button, we could write that as:

casper.then(function() {
  this.mouse.move('.button');
  phantomcss.screenshot('.button');
});

You can also test logged in and logged out states. In our `start.js` file, we can write a little function that will fill out the WordPress login form as soon as we spin up our Casper instance.

casper.start('http://default.wordpress.dev/wp-admin/', function() {
  this.fill('form#loginform', {
    'log': 'admin',
    'pwd': 'password'
  }, false);

  this.click('#wp-submit');

  console.log('Logging in...');
});

You’ll notice that we’re running this on casper.start() instead of inside its own individual test. Setting up your session on casper.start() in your `start.js` file makes the session available to other files in your test suite, since it will always be run first.

I recommend taking a look at the Casper documentation for more information on scripting interactions.

Running Your Tests

Now that we've built a basic test suite, it's time to run our tests. In the command line, run $ grunt phantomcss.

PhantomCSS will automatically set the screenshots from your first run as the baselines to compare all future tests against.

If a test does fail, PhantomCSS will output three different screenshots to your results folder. It will output the original, a `.diff.png`, and a `.fail.png`.
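For the `site-header` screenshot from our first test, for instance, a failing desktop run would leave roughly the following in the results folder (paths follow the Grunt config above):

```
results/desktop/site-header.png        the latest screenshot
results/desktop/site-header.diff.png   the differences, highlighted
results/desktop/site-header.fail.png   a copy flagging the failure
```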

For example, say we have changed the font size of text on an article page, but inadvertently decreased the font size in an archive view as well. PhantomCSS will give us diffs to compare, highlighting exactly what changed.

The Challenges

Building a visual regression testing suite is certainly not without its challenges. The two biggest challenges that I have encountered are dynamic content and distributing tests amongst a team.

Dynamic Content

The first, and in some ways most difficult, challenge that I have encountered is how exactly to handle dynamic content. The test suite is running through each of these pages, taking screenshots, and comparing them. If content is different, the test is going to fail.

If you're working with a team, odds are everyone will be testing against their own local environment. Testing against a single staging environment doesn't always fix the issue, because content may still change there; e.g., a randomly ordered set of related posts.

To solve this issue, there are two approaches that I’ve come to favor.

The first, and my favorite, approach is to use JavaScript to replace content within the elements you're testing with a set of representative demo content.

Since these tests should not be deployed to your production server, you don't have to worry about XSS vulnerabilities. As such, I like to use `.html()` in my tests to replace the dynamic content, prior to taking the screenshot, with static content from a JSON object that I include in my code repo.
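As a sketch of that idea, the JSON object can simply map selectors to static markup, and a small helper can turn it into a list of replacements for a test to loop over before the screenshot. The selectors, demo strings, and the `buildReplacements` helper here are all hypothetical; the actual swap would happen inside a `casper.then()` using `.html()`:

```javascript
// Static demo content, as you might keep it in a JSON file in your repo.
// Keys are CSS selectors; values are the markup to swap in.
var demoContent = {
  '.related-posts': '<li>Demo post one</li><li>Demo post two</li>',
  '.comment-count': '12 Comments'
};

// Turn the object into a list of { selector, html } pairs that a test
// can iterate over, replacing each element's contents via .html()
// before phantomcss.screenshot() is called.
function buildReplacements(content) {
  return Object.keys(content).map(function (selector) {
    return { selector: selector, html: content[selector] };
  });
}

var replacements = buildReplacements(demoContent);
```

Because the demo content lives in the repo, every teammate's test run renders the same markup, no matter what their local database contains.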

The second approach is to use a tool like Hologram or mdcss, which allows you to use comments in your CSS to create auto-generated style guides, and to run your tests against those. This approach has more overhead, in that it requires a bigger shift in workflow, but has the added benefit of creating excellent documentation for your front-end components.

Distribution

The second major challenge that I encountered with regression testing is in determining the best way to distribute these tests amongst a team of engineers. So far in our tests we’ve hardcoded our testing URL, which will cause issues when working with a team where everyone may not be using the same URL for their local environment.

To get around this, my team and I have registered our $ grunt test task to accept a --url parameter, which is then saved to a file locally, using grunt.file.

// Allow a variable to be passed, e.g. --url=http://test.dev
var localURL = grunt.option( 'url' );

/**
 * Register a custom task to save the local URL, which is then read by the PhantomCSS test file.
 * This file is saved so that "grunt test" can then be run in the future without passing your local URL each time.
 *
 * Note: Make sure test/visual/.local_url is added to your .gitignore
 *
 * Props to Zack Rothauser for this approach.
 */
grunt.registerTask('test', 'Runs PhantomCSS and stores the --url parameter', function() {
  if (localURL) {
    grunt.log.writeln( 'Local URL: ' + localURL );
    grunt.file.write( 'test/visual/.local_url', localURL );
  }

  grunt.task.run(['phantomcss']);
});

Then, at the top of your test file, you’ll use:

var fs = require('fs'), siteURL;

try {
  siteURL = fs.read( 'test/visual/.local_url' );
} catch(err) {
  siteURL = 'http://local.wordpress.dev';
}

casper.thenOpen(siteURL + '/path/to/template');

Your suite will now look for the `.local_url` file whenever it is run, but if the file is not present, it will default to using `http://local.wordpress.dev`.
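The fallback behaves like this small standalone helper (`resolveSiteURL` is a hypothetical name, not part of PhantomCSS or Casper; it just isolates the logic for illustration):

```javascript
// Prefer the URL saved in .local_url, trimming any trailing newline
// left behind when the file was written; otherwise use the default.
function resolveSiteURL(savedURL, defaultURL) {
  if (typeof savedURL === 'string' && savedURL.trim().length > 0) {
    return savedURL.trim();
  }
  return defaultURL;
}
```

So `resolveSiteURL(fileContents, 'http://local.wordpress.dev')` returns the saved URL whenever the file exists and is non-empty, and the default otherwise.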

In Closing

There are a host of benefits that visual regression testing can bring to your projects. Rapid iteration and continuous integration are increasingly the mantra of today’s developers; it only makes sense to build yourself a safety net.

A visual regression testing suite is also great for working with people on open source projects. In fact, the WordPress project is working towards a pattern library with an accompanying regression testing suite. This test suite will provide the groundwork that allows the WordPress project to move forward with plans to restore sanity to their stylesheets.

Alternatives

PhantomCSS is not the only tool available, it’s simply the one that I felt was right for me, my team, and our workflow. If visual regression testing sounds cool to you but PhantomCSS doesn’t sound like your thing, or if you’re just interested in alternatives, I recommend taking a look at: