Responsible Markdown in Next.js | CSS-Tricks

Markdown truly is a great format. It’s close enough to plain text that anyone can quickly learn it, and it’s structured enough that it can be parsed and eventually converted to, well, you name it.

That being said: parsing, processing, enhancing, and converting Markdown needs code. Shipping all that code to the client comes at a cost. It’s not huge per se, but it’s still a few dozen kilobytes of code that are used only to deal with Markdown and nothing else.

In this article, I want to explain how to keep Markdown out of the client in a Next.js application, using the Unified/Remark ecosystem (I’m genuinely not sure which name to use; this is all super confusing).

General idea

The idea is to handle Markdown only in Next.js’s getStaticProps functions, so that the work happens during the build (or in a Next.js serverless function when using Vercel’s incremental builds), but never in the client. I guess getServerSideProps would also be fine, but I think getStaticProps is more likely to be the common use case.

getStaticProps would return an AST (Abstract Syntax Tree, which is to say a big nested object describing our content) resulting from parsing and processing the Markdown content, and the client would only be responsible for rendering that AST into React components.

I guess we could even render the Markdown as HTML directly in getStaticProps and return that to render with dangerouslySetInnerHTML, but we’re not that kind of people. Security matters. We also want the flexibility of rendering Markdown with our own components instead of as plain HTML. Seriously folks, do not do that. 😅

export const getStaticProps = async () => {
  // Get the Markdown content from somewhere, like a CMS or whatnot. It doesn’t
  // matter for the sake of this article, really. It could also be read from a
  // file.
  const markdown = await getMarkdownContentFromSomewhere()
  const ast = parseMarkdown(markdown)

  return { props: { ast } }
}

const Page = props => {
  // This would usually have your layout and whatnot as well, but omitted here
  // for sake of simplicity of course.
  return <MarkdownRenderer ast={props.ast} />
}

export default Page

Parsing Markdown

We are going to use the Unified/Remark ecosystem. We need to install unified and remark-parse and that’s about it. Parsing the Markdown itself is relatively straightforward:

import { unified } from 'unified'
import markdown from 'remark-parse'

const parseMarkdown = content => unified().use(markdown).parse(content)

export default parseMarkdown
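For reference, here is roughly the tree that parseMarkdown returns for the string `# Hello *world*`, with the position objects left out. It is written by hand below rather than produced by remark-parse, and toPlainText is a hypothetical helper added only to show how naturally the tree lends itself to recursion. The important property is that it is plain data, so it survives the JSON serialization getStaticProps performs.

```javascript
// Roughly what parseMarkdown('# Hello *world*') returns,
// with the `position` objects omitted for brevity.
const ast = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [
        { type: 'text', value: 'Hello ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'world' }] },
      ],
    },
  ],
}

// Only plain objects, arrays, strings and numbers: the tree comes back
// unchanged from a JSON round trip, which is what getStaticProps needs.
const roundTripped = JSON.parse(JSON.stringify(ast))

// Hypothetical helper: concatenates every text value in document order.
const toPlainText = node =>
  node.type === 'text'
    ? node.value
    : (node.children || []).map(toPlainText).join('')
```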

Now, what took me a long while to understand is why my extra plugins, like remark-prism or remark-slug, had no effect with this setup. The reason is that the .parse(..) method from Unified does not process the AST with plugins. As the name suggests, it only parses the string of Markdown content into a tree.

If we want Unified to apply our plugins, we need Unified to go through what they call the “run” phase. Normally, this is done by using the .process(..) method instead of the .parse(..) method. Unfortunately, .process(..) not only parses Markdown and applies plugins, but also stringifies the AST into another format (like HTML via remark-html, or JSX with remark-react). That is not what we want: we want to keep the AST, but only after the plugins have processed it.

| ........................ process ........................... |
| .......... parse ... | ... run ... | ... stringify ..........|

          +--------+                     +----------+
Input ->- | Parser | ->- Syntax Tree ->- | Compiler | ->- Output
          +--------+          |          +----------+
                              X
                              |
                       +--------------+
                       | Transformers |
                       +--------------+
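To make the three phases in that diagram concrete, here is a dependency-free toy processor. It is not unified’s implementation and its “Markdown” parser only splits on blank lines; the method names merely mirror unified’s vocabulary, and createToyProcessor and shout are hypothetical. What it shows is the point above: parse alone never touches the transformers, process always ends in a string, and parse followed by run gives a transformed tree.

```javascript
// A dependency-free toy with the same three phases as unified.
// This is NOT unified's implementation; names only mirror its vocabulary.
function createToyProcessor() {
  const transformers = []

  return {
    // Register a transformer (in unified, plugins register transformers).
    use(transformer) {
      transformers.push(transformer)
      return this
    },

    // Parse phase: string -> tree. Here, one paragraph per blank-line block.
    parse(source) {
      return {
        type: 'root',
        children: source.split(/\n{2,}/).map(block => ({
          type: 'paragraph',
          children: [{ type: 'text', value: block }],
        })),
      }
    },

    // Run phase: hand the tree to each transformer in order. A transformer
    // may return a new tree, or mutate the one it got and return nothing.
    runSync(tree) {
      return transformers.reduce(
        (current, transform) => transform(current) || current,
        tree
      )
    },

    // Stringify phase: tree -> string. Here, naive HTML paragraphs.
    stringify(tree) {
      return tree.children
        .map(p => `<p>${p.children.map(text => text.value).join('')}</p>`)
        .join('')
    },

    // process = parse + run + stringify, which is why it loses the tree.
    processSync(source) {
      return this.stringify(this.runSync(this.parse(source)))
    },
  }
}

// Example transformer: upper-cases every text node, in place.
function shout(tree) {
  for (const paragraph of tree.children) {
    for (const text of paragraph.children) text.value = text.value.toUpperCase()
  }
}
```

With `const toy = createToyProcessor().use(shout)`, calling `toy.parse('hi')` returns an untouched tree, `toy.processSync('hi')` returns a string, and `toy.runSync(toy.parse('hi'))` returns the transformed tree, which is the combination we are after.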

So what we need to do is run both the parsing and running phases, but not the stringifying phase. Unified does not provide a method that does only these two of the three phases, but it provides individual methods for every phase, so we can do it manually:

import { unified } from 'unified'
import markdown from 'remark-parse'
import prism from 'remark-prism'

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)

  // Unified’s *process* contains 3 distinct phases: parsing, running and
  // stringifying. We do not want to go through the stringifying phase, since we
  // want to preserve an AST, so we cannot call `.process(..)`. Calling
  // `.parse(..)` is not enough though, as plugins (and therefore Prism) are
  // executed during the running phase. So we need to manually call the run
  // phase (synchronously for simplicity).
  // See: https://github.com/unifiedjs/unified#description
  return engine.runSync(ast)
}

Tada! We parsed our Markdown into a syntax tree. Then we ran our plugins on that tree (synchronously here for the sake of simplicity, but you could use .run(..) to do it asynchronously). But we did not convert our tree into some other syntax like HTML or JSX. We can do that ourselves, in the render.

Rendering Markdown

Now that we have our cool tree at the ready, we can render it the way we intend to. Let’s have a MarkdownRenderer component that receives the tree as an ast prop, and renders it all with React components.

import React from 'react'

const getComponent = node => {
  switch (node.type) {
    case 'root':
      return ({ children }) => <>{children}</>

    case 'paragraph':
      return ({ children }) => <p>{children}</p>

    case 'emphasis':
      return ({ children }) => <em>{children}</em>

    case 'heading':
      return ({ children, depth = 2 }) => {
        const Heading = `h${depth}`
        return <Heading>{children}</Heading>
      }

    case 'text':
      return ({ value }) => <>{value}</>

    /* Handle all types here … */

    default:
      console.log('Unhandled node type', node)
      return ({ children }) => <>{children}</>
  }
}

const Node = node => {
  const Component = getComponent(node)
  const { children } = node

  return children ? (
    <Component {...node}>
      {children.map((child, index) => (
        <Node key={index} {...child} />
      ))}
    </Component>
  ) : (
    <Component {...node} />
  )
}

const MarkdownRenderer = props => <Node {...props.ast} />

export default React.memo(MarkdownRenderer)

Most of the logic of our renderer lives in the Node component. It finds out what to render based on the type key of the AST node (this is the job of our getComponent function, which should handle every type of node), and then renders it. If the node has children, it recursively renders them; otherwise it renders the component as a leaf.
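One thing to keep in mind: getComponent creates a brand new component function every time Node renders, so React sees a different element type on each render and remounts that subtree. A variant that avoids this maps node types to element types once, at module level. The sketch below is an alternative, not the approach described above; renderAst, TAGS and propsFor are hypothetical names. It takes a function with React.createElement’s (type, props, ...children) signature as a parameter, so it can be exercised without React, and it also maps the mdast link node’s url to the href attribute an anchor needs.

```javascript
// Node types mapped to element types once, at module level, so the same
// node type always yields the same element type. A function entry derives
// the tag from the node itself.
const TAGS = new Map([
  ['paragraph', 'p'],
  ['emphasis', 'em'],
  ['strong', 'strong'],
  ['link', 'a'],
  ['heading', node => `h${node.depth}`],
])

// Extra props per node type. mdast links keep their target in `url` (and an
// optional `title`), while the rendered <a> element needs `href`.
const propsFor = node =>
  node.type === 'link' ? { href: node.url, title: node.title || undefined } : {}

// `h` is assumed to have React.createElement's (type, props, ...children)
// signature, and `Fragment` is the grouping type (React.Fragment in React).
// Children are passed as separate arguments, so React does not ask for keys.
function renderAst(node, h, Fragment) {
  if (node.type === 'text') return node.value

  const children = (node.children || []).map(child => renderAst(child, h, Fragment))
  const entry = TAGS.get(node.type)
  const tag = typeof entry === 'function' ? entry(node) : entry

  // `root` and any type missing from TAGS: render the children, no wrapper.
  if (!tag) return h(Fragment, {}, ...children)

  return h(tag, propsFor(node), ...children)
}
```

In a page, this could be wired up as `const MarkdownRenderer = ({ ast }) => renderAst(ast, React.createElement, React.Fragment)`. As with getComponent, every node type you care about still needs an entry.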

Cleaning up the tree

Depending on which Remark plugins we use, we might encounter the following problem when trying to render our page:

Error: Error serializing .content[0].content.children[3].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hName returned from getStaticProps in “/”. Reason: undefined cannot be serialized as JSON. Please use null or omit this value.

This happens because our AST contains keys whose values are undefined, which cannot be serialized as JSON. Next.js suggests the fix itself: either omit the value entirely or, if we need the key, replace the value with null.

We’re not going to fix every path by hand though, so we need to walk that AST recursively and clean it up. I found out that this happened when using remark-prism, a plugin to enable syntax highlighting for code blocks. The plugin adds a data object to nodes.

What we can do is walk our AST before returning it to clean up these nodes:

const cleanNode = node => {
  if (node.value === undefined) delete node.value
  if (node.tagName === undefined) delete node.tagName
  if (node.data) {
    delete node.data.hName
    delete node.data.hChildren
    delete node.data.hProperties
  }

  if (node.children) node.children.forEach(cleanNode)

  return node
}

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)
  // Run the plugins on the freshly parsed tree.
  const processedAst = engine.runSync(ast)

  // Remove the keys that getStaticProps cannot serialize.
  cleanNode(processedAst)

  return processedAst
}
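If you would rather not track which plugin adds which key, a more generic option is to drop every undefined value, whatever its path. stripUndefined below is a hypothetical helper, not part of unified or Next.js. Unlike cleanNode, it returns a copy instead of mutating, and it keeps defined data such as hChildren, removing only what getStaticProps refuses to serialize.

```javascript
// Recursively removes every object key whose value is `undefined`. Array
// entries that are `undefined` become `null` instead, so indices stay stable.
// Returns a new value; the input is left untouched.
function stripUndefined(value) {
  if (Array.isArray(value)) {
    return value.map(item => (item === undefined ? null : stripUndefined(item)))
  }
  if (value !== null && typeof value === 'object') {
    const result = {}
    for (const [key, child] of Object.entries(value)) {
      if (child !== undefined) result[key] = stripUndefined(child)
    }
    return result
  }
  return value
}
```

For plain data like an AST, `JSON.parse(JSON.stringify(tree))` has the same effect, since JSON.stringify also omits undefined object keys and writes undefined array entries as null. The explicit walk just makes the intent visible.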

One last thing we can do to ship less data to the client is remove the position object which exists on every single node and holds the original position in the Markdown string. It’s not a big object (it has only two keys), but when the tree gets big, it adds up quickly.

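const cleanNode = node => {
  // Every node carries its source position; the client never needs it.
  delete node.position

  if (node.value === undefined) delete node.value
  if (node.tagName === undefined) delete node.tagName
  if (node.data) {
    delete node.data.hName
    delete node.data.hChildren
    delete node.data.hProperties
  }

  if (node.children) node.children.forEach(cleanNode)

  return node
}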
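A rough way to see what position costs is to compare the serialized size of a tree before and after removing it. The tree below is made up, and makeTree, removePositions and jsonSize are hypothetical helpers, but the position objects follow the start/end shape (line, column, offset) that remark attaches. Because these nodes are deliberately tiny, positions dominate here; in a real document the share will be smaller, though it still grows with every node.

```javascript
// Builds a made-up flat tree of `count` short text nodes, each carrying a
// position object with the same start/end shape remark attaches.
function makeTree(count) {
  const children = []
  for (let i = 0; i < count; i++) {
    children.push({
      type: 'text',
      value: 'word',
      position: {
        start: { line: i + 1, column: 1, offset: i * 5 },
        end: { line: i + 1, column: 5, offset: i * 5 + 4 },
      },
    })
  }
  return { type: 'root', children }
}

// Same idea as `delete node.position` in cleanNode: strip it from every
// node, in place.
function removePositions(node) {
  delete node.position
  if (node.children) node.children.forEach(removePositions)
  return node
}

// Length of the JSON that would be embedded in the page (ASCII here, so
// characters and bytes are the same count).
const jsonSize = value => JSON.stringify(value).length

const tree = makeTree(100)
const before = jsonSize(tree)
const after = jsonSize(removePositions(tree))
console.log({ before, after })
```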

Wrapping up

That’s it folks! We managed to restrict Markdown handling to the build-/server-side code so we don’t ship a Markdown runtime to the browser, which is unnecessarily costly. We pass a tree of data to the client, which we can walk and convert into whatever React components we want.

I hope this helps. :)

Comments

Titus

August 15, 2021

Hi there! remark/unified maintainer here! I saw the question on which term to use for what and thought I’d try to explain it. It makes total sense that it’s confusing, but for anyone who’s interested, here goes:

unified is the name for the thing that sits underneath it all: the parse, run, stringify interface. It’s also the name used for everything (typically as unified collective).

remark is the markdown ecosystem: so if you have plugins working on a markdown AST, that’s remark.

In many cases you’re working on HTML as well, which is called rehype.

There are other AST ecosystems attached too, natural language, javascript, xml, with other names.

So if you start with markdown, you can use remark-parse and other remark plugins.
If you start with HTML, you can use rehype-parse and rehype plugins.
You can stop there, and use remark-stringify/rehype-stringify.
Or you can turn from one to the other, with remark-rehype or rehype-remark, and use the other ecosystem’s plugins!

Example: https://github.com/remarkjs/remark-rehype#use

Damon Blais

August 31, 2021

So… following this guide wasn’t cut and dried, unfortunately. There are a number of things that didn’t quite work in the latest version.

First, in the parseMarkdown function, if I use runSync it doesn’t work. If I convert it to an async function and use run(ast) TypeScript complains very loudly, but the result at least works.

Argument of type 'import("./node_modules/@types/mdast/index").Root' is not assignable to parameter of type 'import("./node_modules/rehype-format/node_modules/@types/hast/index").Root'.
  Types of property 'children' are incompatible.
    Type 'Content[]' is not assignable to type 'RootContent[]'.
      Type 'Content' is not assignable to type 'RootContent'.
        Type 'Paragraph' is not assignable to type 'RootContent'.
          Property 'value' is missing in type 'Paragraph' but required in type 'Text'.ts(2345)
index.d.ts(75, 5): 'value' is declared here.

I’m not even going to talk about how none of the other Types were specified (not everyone uses TypeScript, so that’s not a fault of the guide really.)

For those who want types, here’s what I came up with:

import type { Key } from 'react'

type NodeType = {
  properties: { [key: string]: string }
  tagName?: string
  type: string
  value?: any
}

type keyable = {
  key: Key | null | undefined
}

const getComponent = (node?: NodeType) => {
  if (!node || !node.type) return null

  { ... }
}

// I tried typing node, it's a huge pain, I gave up.
const Node = (node: any) => { ... }

const MarkdownRenderer = ({ ast }: { ast: any }) => <Node {...ast} />

Which reminds me, getComponent needs to clean up the node before returning a Fragment, or things that use React’s strict mode will scream. Basically, replace the return at the end with this (and change each switch case that returns a Fragment to break instead):

  // error: Fragment only accepts 'key' props
  if (node.tagName != undefined) delete node.tagName
  if (node.type != undefined) delete node.type
  if (node.value != undefined) delete node.value
  return Fragment

Now that we’ve gotten all that out of the way, let’s talk about what the expected return from getComponent really is: a React component, that is, a function or class that React can render.

Why does this matter?

Returning a string (such as ‘a’) doesn’t actually do what it should. It instead renders an empty a tag, with no properties. So, we need proper constructors for supported tags.

I had to replace the switch case with this:

  switch (node.type) {
    case '': // the root node is {} on load
    case 'comment':
      return null // don't render comments

    case 'root': // explodes without named root
      // eslint-disable-next-line no-case-declarations
      const root: FC = ({ children }) => <Fragment>{children}</Fragment>
      return root

    case 'text': // all Nodes without a tagName end up being of type 'text' not 'paragraph' -- I wonder what parser you were using that uses 'paragraph' ?
      return function text() {
        // expected text is located in node.value
        return <Fragment>{node.value}</Fragment>
      }

    // and now we come to HTML elements
    case 'element':
      // only render whitelisted elements
      switch (node.tagName) {
        case 'a':      return a
        case 'h1':     return h1
        case 'h2':     return h2
        case 'h3':     return h3
        case 'h4':     return h4
        case 'h5':     return h5
        case 'h6':     return h6
        case 'li':     return li
        case 'ol':     return ol
        case 'ul':     return ul
        case 'code':   return code
        case 'p':      return p
        case 'pre':    return pre
        case 'strong': return strong

        default:
          console.log('unhandled html tag', node)
      }

      console.log('removed unsafe HTML tag', node)
      return null

    default:
      console.log('unhandled node tag', node)
  }

And those components themselves?

  const h1: FC = ({ children }) => {
    return <h1>{children}</h1>
  }
  const h2: FC = ({ children }) => {
    return <h2>{children}</h2>
  }
  const h3: FC = ({ children }) => {
    return <h3>{children}</h3>
  }
  const h4: FC = ({ children }) => {
    return <h4>{children}</h4>
  }
  const h5: FC = ({ children }) => {
    return <h5>{children}</h5>
  }
  const h6: FC = ({ children }) => {
    return <h6>{children}</h6>
  }

  const li: FC<keyable> = ({ key, children }) => {
    return <li key={key}>{children}</li>
  }
  const ol: FC = ({ children }) => {
    return <ol>{children}</ol>
  }
  const ul: FC = ({ children }) => {
    return <ul>{children}</ul>
  }

  const p: FC = ({ children }) => {
    return <p>{children}</p>
  }

  const strong: FC = ({ children }) => {
    return <strong>{children}</strong>
  }

  // note: you need to define `a` inside `getComponent` so it can use the `node` variable from the parent context
  const a: FC = ({ children }) => {
    const classes = []

    // allow attr 'class': string
    if (typeof node.properties?.class === 'string') {
      classes.push(node.properties.class)
    } 

    // allow attr 'className': string
    if (typeof node.properties?.className === 'string') {
      classes.push(node.properties.className)
    }

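    // allow attr 'className': string[] (checked with Array.isArray so a plain
    // string className isn't spread into single characters)
    if (Array.isArray(node.properties?.className)) {
      classes.push(...node.properties.className)
    }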

    // NOTE: You only need to do this for Gatsby, NextJS and other PWA, SSR or SSG frameworks, or React routers that have their own Link component.
    return (
      <Link href={node.properties?.href}>
        <a className={classes.join(' ') || undefined}>{children}</a>
      </Link>
    )
  }

  // likewise, if you're using prism or another syntax highlighting plugin, you'll need to allow the code and pre tags to have a className. You need to define this inside `getComponent` for it to access node.
  const code: FC = ({ children }) => {
    return <code>{children}</code>
  }
  const pre: FC = ({ children }) => {
    return <pre>{children}</pre>
  }

Kitty Giraudel

August 31, 2021
Hello Damon, and thank you for taking the time to leave a comment. There is quite a lot to unpack, so allow me to go through things one by one:
- The article was originally authored for unified v9, and I forgot to mention it—I’m sorry. This is what is causing the failure with runSync. I just tried in v9 and it works fine, so the Unified API must have changed in v10 (which is what a major version is for, so fair enough I guess). I updated the article to mention that Unified should be installed in v9.
- I am personally not using TypeScript and never have, so there is only so much I can do on that front. As you said, not everyone uses TypeScript. Sorry you’re having problems with it though.
- Regarding Fragments, you’re totally right. I updated the code to use ({ children }) => <>{children}</> instead, so no props get passed to the fragment. This way, there is no need to do the cleanup before rendering as you suggested, and it also becomes impervious to new fields being added to the AST.
- Returning strings such as p or em as part of getComponent works fine (just tested it). However, it’s not going to cut it for links, as they receive a url key from the AST and need to render the href attribute instead. I guess using a proper component definition is slightly safer, so I updated the article accordingly. And as mentioned, all types need to be implemented, as the code snippet only shows a few. I also added handling for the text type for clarity.
Once again, thanks for your feedback! I hope the article is a little clearer now. :)
Kitty Giraudel

September 5, 2021

Coming back to add something I recently noticed: it seems everything works fine with unified v10 as well as long as the import is updated to use a named import instead of the default one (import { unified } from 'unified').
Charlie

August 12, 2022
TypeScript tip: The node types are in mdast (a dependency of remark).
```
import {
  Content as ContentAST,
  Root as RootAST,
  Heading as HeadingAST,
  Text as TextAST,
  List as ListAST,
} from 'mdast';

type NodeAST = RootAST | ContentAST;
```
I alias all the types so that they don’t conflict with the UI library I’m using.

Will

September 6, 2021

I like the article, and I do think the approach is interesting to author in Markdown but send the AST to the client.

I have been playing with this recently myself. I first tried rendering to HTML to see what the size was like. I am rendering math as well, though, and in two tests the output was 18x and 35x the size of the Markdown file. I can only guess the AST is similar, since those trees can be very large. I think it’s a tradeoff of size over the network vs. runtime parsing cost. For the moment I have chosen runtime parsing, since I think it reduces the need for me to break up the pipeline.

Mosaad

May 13, 2022

This was a wonderful detailed piece!

Helped me implement something similar where I receive HTML instead of Markdown, but I still want to swap some elements with custom components such as Next.js’ Image and Link.