WordPress Block Transforms

Avatar of Chris Coyier
Chris Coyier on

This has been the year of Gutenberg for us here at CSS-Tricks. In fact, that’s a goal we set at the end of last year. We’re much further along that I thought we’d be, authoring all new content in the block editor¹, enabling the block editor for all content now. That means when we open most old posts, we see all the content in the “Classic” block. It looks like this:

A post written on CSS-Tricks before we were using the block editor.

The entire contents of the post is in a single block, so not exactly making any use of the block editor. It’s still “visual,” like the block editor, but it’s more like the old visual editor using TinyMCE. I never used that as it kinda forcefully mangled HTML in a way I didn’t like.

This is the #1 thing I was worried about

Transforming a Classic block into new blocks is as trivial as selecting the Classic block and selecting the “Convert to Blocks” option.

Select the option and the one block becomes many blocks.

How does the block editor handle block-izing old content, when we tell it to do that from the “Convert to Blocks” option? What if it totally screws up content during the conversion? Will we ever be able to switch?

The answer: it does a pretty darn good job. But… there are still issues. Not “bugs” but situations where we have custom HTML in our old content and it doesn’t know what to do with it — let alone how to convert it into exactly the blocks we wish it would. There is a way!

Basic Block Transforms

That’s where this idea of “Block Transforms” comes in. All (well, most?) native blocks have “to” and “from” transformations. You’re probably already familiar with how it manifests in the UI. Like a paragraph can transform “to” a quote and vice versa. Here’s a super meta screenshot of this very paragraph:

Those transforms aren’t magic; they are very explicitly coded. When you register a block, you specify the transforms. Say you were registering your own custom code block. You’d want to make sure that you could transform it…

  • From and to the default built-in code block, and probably a handful of others that might be useful.
  • Back to the built-in code block.

Which might look like:

registerBlockType("my/code-block", {
  title: __("My Code Block"),
  ...
  transforms: {
    from: [
      {
        type: "block",
        priority: 7,
        blocks: ["core/code", "core/paragraph", "core/preformatted"],
        transform: function (attributes) {
          return createBlock("my/code-block", {
            content: attributes.content,
          });
        },
      },
    ],
    to: [
      {
        type: "block",
        blocks: ["core/code"],
        transform: ({ content }) => createBlock("core/code", { content }),
      },
    ],
   
   ...

Those are transforms to and from other blocks. Fortunately, this is a pretty simple block where we’re just shuffling the content around. More complex blocks might need to pass around more data, but I haven’t had to deal with that yet.

The more magical stuff: Block Transforms from raw code

Here’s the moment of truth for old content:

The “Convert to Blocks” option.

In this situation, blocks are being created not from other blocks, but from raw code. Quite literally, the HTML is being looked at and choices are being made about what blocks to make from chunks of that HTML. This is where it’s amazing the block editor does such a good job with the choices, and also where things can go wrong and it can fail, make wrong block choices, or mangle content.

In our old content, a block of code (a super very important thing) in a post would look like this:

<pre rel="JavaScript"><code class="language-javascript" markup="tt">
  let html = `<div>cool</div>`;
</code></pre>

Sometimes the block conversion would do OK on those, turning it into a native code block. But there were a number of problems:

  1. I don’t want a native code block. I want that to be transformed into our own new code block (blogged about that here).
  2. I need some of the information in those attributes to inform settings on the new block, like what kind of code it is.
  3. The HTML in our old code blocks was not escaped and I need it to not choke on that.

I don’t have all the answers here, as this is an evolving process, but I do have some block transforms in place now that are working pretty well. Here’s what a “raw” transform (as opposed to a “block” transform) looks like:

registerBlockType("my/code-block", {
  title: __("My Code Block"),
  // ...
  transforms: {
    from: [
      {
        type: "block",
        priority: 7,
        // ...
      },
      {
        type: "raw",
        priority: 8,
        isMatch: (node) =>
          node.nodeName === "PRE" &&
          node.children.length === 1 &&
          node.firstChild.nodeName === "CODE",
        transform: function (node) {
          let pre = node;
          let code = node.querySelector("code");

          let codeType = "html";
          if (pre.classList.contains("language-css")) {
            codeType = "css";
          }
          if (pre.getAttribute("rel") === "CSS") {
            codeType = "css";
          }
          if (pre.classList.contains("language-javascript")) {
            codeType = "javascript";
          }
          if (code.classList.contains("language-javascript")) {
            codeType = "javascript";
          }
          // ... other data wrangling...

          return createBlock("csstricks/code-block", {
            content: code.innerHTML,
            codeType: codeType,
          });
        },
      },
    ],
    to: [
      // ... 
    ],
   
   // ...

}

That isMatch function runs on every node in the HTML it finds, so this is the big opportunity to return true from that in the special situations you need to. Note in the code above that I’m specifically looking for HTML that looks like <pre ...><code ...>. When that matches, the transform runs, and I can return a createBlock call that passes in data and content I extract from the node with JavaScript.

Another example: Pasting a URL

“Raw” transforms don’t only happen when you “Convert to Blocks.” They happen when you paste content into the block editor too. You’ve probably experienced this before. Say you have copied some table markup from somewhere and paste it into the block editor -— it will probably paste as a table. A YouTube URL might paste into an embed. This kind of thing is why copy/pasting from Word documents and the like tend to work so well with the block editor.

Say you want some special behavior when a certain type of URL is pasted into the editor. This was the situation I was in with our custom CodePen Embed block. I wanted it so if you pasted a codepen.io URL, it would use this custom block, instead of the default embed.

This is a “from” transform that looks like this:

{
  type: "raw",
  priority: 8, // higher number to beat out default
  isMatch: (node) =>
    node.nodeName === "P" &&
    node.innerText.startsWith("https://codepen.io/"),

  transform: function (node) {
    return createBlock("cp/codepen-gutenberg-embed-block", {
      penURL: node.innerText,
      penID: getPenID(node.innerText), // helper function
    });
  },
}

So…

Is it messy? A little. But it’s as powerful as you need it to be. If you have an old site with lots of bespoke HTML and shortcodes and stuff, then getting into block transforms is the only ticket out.

I’m glad I went to WPBlockTalk and caught K. Adam White’s talk on shortcodes because there was just one slide that clued me into that this was even possible. There is a little bit of documentation on it.

One thing I’d like to figure out is if it’s possible to run these transforms on all old content in the database. Seems a little scary, but also like it might be a good idea in some situations. Once I get my transformations really solid, I could see doing that so any old content ready-to-go in the block editor when opening it up. I just have no idea how to go about it.

I’m glad to be somewhat on top of this though, as I friggin love the block editor right now. It’s a pleasure to write in and build content with it. I like what Justin Tadlock said:

The block system is not going anywhere. WordPress has moved beyond the point where we should consider the block editor as a separate entity. It is an integral part of WordPress and will eventually touch more and more areas outside of the editing screen.

It’s here to stay. Embracing the block editor and bending it to our will is key.

  1. What are we calling it anyway? “Gutenberg” doesn’t seem right anymore. Feels like that will fade away, even though the development of it still happens in the Gutenberg plugin. I think I’ll just call it “the block editor” unless specifically referring to that plugin.