A Little Example of Data Massaging

I'm not sure if "data massaging" is a real thing, but that's how I think of what I'm about to describe.

Dave and I were thinking about a bit of a redesign for ShopTalk Show. Fresh coat of paint kinda thing. Always nice to do that from time to time. But we wanted to start from the inside out this time. It didn't sound very appealing to design around the data that we had. We wanted to work with cleaner data. We needed to massage the data that we had, so that it would open up more design possibilities.

We had fallen into the classic WordPress trap

Which is... just dumping everything into the default content area:

We used Markdown, which I think is smart, but still was a pile of rather unstructured content. An example:

If that content was structured entirely differently every time (like a blog post probably would be), that would be fine. But it wasn't. Each show has that same structure.

It's not WordPress' fault

We just didn't structure the data correctly. You can mess that up in any CMS.

To be fair, it probably took quite a while to fall into a steady structure. It's hard to set up data from day one when you don't know what that structure is going to be. Speaking of which...

The structure we needed

This is what one podcast episode needs as far as structured data:

  • Title of episode
  • Description of episode
  • Featured image of episode
  • MP3
    • URL
    • Running Time
    • Size in Bytes
  • A list of topics in the show with time stamps
  • A list of links
  • Optional: Guest(s)
    • Guest Name
    • Guest URL
    • Guest Twitter
    • Guest Bio
    • Guest Photo
  • Optional: Advertiser(s)
    • Advertiser Name
    • Advertiser URL
    • Advertiser Text
    • Advertiser Timestamp
  • Optional: Job Mention(s)
    • Job Company
    • Job Title
    • Job URL
    • Job Description
  • Optional: Transcript

Even that's not perfect

For example: we hand-number the episodes as part of the title, which means when we need that number individually we're doing string manipulation in the templates, which feels a bit janky.

Another example: guests aren't a programmatic construct to themselves. A guest isn't its own database record with an ID. Which means if a guest appears on multiple shows, that's duplicated data. Plus, it doesn't give us the ability to "display all shows with Rebecca Murphey" very easily, which is something we discussed wanting. There is probably some way to program out way out of this in the future, we're thinking.

Fortunately, that structure is easy to express in Advanced Custom Fields

Once you know what you need, ACF makes it pretty easy to build that out and apply it to whatever kind of page type you need to.

I'm aware that other CMS's encourage this kind of structuring by default. Cool. I think that's smart. You should be very proud of yourself for choosing YourFavoriteCMS.

In ACF, our "Field Group" ended up like this:

We needed "Repeater" fields for data like guests, where there is a structure that needs to repeat any number of times. That's a PRO feature of ACF, which seems like a genius move on their part.

Let the data massaging begin

Unfortunately, now that we had the correct structure, it doesn't mean that all the old data just instantly popped into place. There are a couple of ways we could have gone about this...

We could have split the design of show pages by date. If it was an old show, dump out the content like we always have. If it's a new show, use the nice data format. That feels like an even bigger mess than what we had, though.

We could have tried to program our way out of it. Perhaps some scripts we could run that would parse the old data, make intelligent guesses about what content should be ported to the new structure, and run it. Definitely, a non-trivial thing to write. Even if we could have written it, it may have taken more time than just moving the data by hand.

Or... we could move the data by hand. So that's what we ended up doing. Or rather, we hired someone to move the data for us. Thanks Max! Max Kohler was our data massager.

Hand moving really seemed like the way to go. It's essentially data entry work, but required a little thought and decision making (hence "massaging"), so it's the perfect sort of job to either do yourself or find someone who could use some extra hours.

Design is a lot easier with clean and structured data

With all the data nicely cleaned up, I was able to spit it out in a much more consistent and structured way in the design itself:

This latest design of ShopTalk Show is no masterpiece, but now that all this structural work is done, the next design we should be able to focus more on aesthetics and, perhaps, the more fun parts of visual design.