Parsing Contentful's Rich Text Editor Response

Rob • Last Updated Dec 28th, 2022 • 12:02am UTC

The purpose of this post is to describe how I implemented the conversion of Contentful's rich text editor (RTE) response into the HTML markup that lets you read this very text. An official package already exists, and I am sure it is more robust, but I wanted a fun programming exercise to work through. We also do something similar at CNBC to convert some JSON data into the components that make up our Articles.

Why I Chose Contentful

Has a GraphQL API: The original reason I went looking for a headless CMS was to create some content that could be returned through a GraphQL API. I needed to explore one of the most popular GraphQL implementations, Apollo.
Best of Breed Headless CMS: After a quick search for "best headless cms", Contentful was the first hit. In truth, I'm not even sure Contentful is one of the best options available, because I have never used any of the others. I just assumed Google knew. SEO matters.
Free: At the time, I was not planning to build anything on top of whichever CMS I went with. I didn't want to pay just to get some test data returned.

Getting the Data

The context of this post is very meta: parsing Contentful's Rich Text Editor response to display the content of a blog post on a website made with Gatsby. I bring this up because the API is slightly different when querying for data with Gatsby than with Contentful's built-in GraphQL API. The query from Contentful might look something like this:

    query GetBlogPost($id: String!) {
      contentfulBlogPost(contentful_id: {eq: $id}) {
        blogBody {
          raw
        }
      }
    }

While the part of the actual query that is run at build time whenever I deploy my code is this:

    {
      allContentfulBlogPost {
        nodes {
          blogBody {
            raw
          }
        }
      }
    }

There are more fields that I request, but those fields are metadata about each blog post and are not referenced any further. This query is made in the "gatsby-node.js" file as part of the build and deployment process of my website. It is used in the createPages function. I recommend reading this and this if you are interested. You can also check out the source code for this website in one of the footer links, labeled src.

Response:

    {
      "data": {
        "contentfulBlogPost": {
          "blogBody": {
            "raw": "{\"nodeType\":\"document\",\"data\":{},\"content\":[{\"nodeType\":\"heading-1\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H 1\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"heading-2\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H2\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"heading-3\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H3\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"heading-4\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H4\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"heading-5\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H5\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"heading-6\",\"content\":[{\"nodeType\":\"text\",\"value\":\"H 6\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"This is the first blog post I have ever written. The purpose of this post is to just test to see if I can get data back when I send a graphQL request.\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"HELLO WORLD!\",\"marks\":[{\"type\":\"underline\"},{\"type\":\"italic\"},{\"type\":\"bold\"}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"const sum = (x, y) => {\\n return x + y\\n}\",\"marks\":[{\"type\":\"code\"}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"\",\"marks\":[],\"data\":{}},{\"nodeType\":\"entry-hyperlink\",\"content\":[{\"nodeType\":\"text\",\"value\":\"Second blog post\",\"marks\":[],\"data\":{}}],\"data\":{\"target\":{\"sys\":{\"id\":\"254rCr6d04lGYDZKLhMSSG\",\"type\":\"Link\",\"linkType\":\"Entry\"}}}},{\"nodeType\":\"text\",\"value\":\"\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"\",\"marks\":[],\"data\":{}},{\"nodeType\":\"hyperlink\",\"content\":[{\"nodeType\":\"text\",\"value\":\"GOOGLE\",\"marks\":[],\"data\":{}}],\"data\":{\"uri\":\"https://google.com\"}},{\"nodeType\":\"text\",\"value\":\"\",\"marks\":[],\"data\":{}}],\"data\":{}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"B\",\"marks\":[{\"type\":\"bold\"}],\"data\":{}},{\"nodeType\":\"text\",\"value\":\"OL\",\"marks\":[],\"data\":{}},{\"nodeType\":\"text\",\"value\":\"D T\",\"marks\":[{\"type\":\"bold\"}],\"data\":{}},{\"nodeType\":\"text\",\"value\":\"E\",\"marks\":[{\"type\":\"bold\"},{\"type\":\"italic\"}],\"data\":{}},{\"nodeType\":\"text\",\"value\":\"S\",\"marks\":[{\"type\":\"bold\"},{\"type\":\"italic\"},{\"type\":\"underline\"}],\"data\":{}},{\"nodeType\":\"text\",\"value\":\"T\",\"marks\":[{\"type\":\"bold\"}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"unordered-list\",\"content\":[{\"nodeType\":\"list-item\",\"content\":[{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"ul 1\",\"marks\":[],\"data\":{}}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"list-item\",\"content\":[{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"ul 2\",\"marks\":[],\"data\":{}}],\"data\":{}}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"ordered-list\",\"content\":[{\"nodeType\":\"list-item\",\"content\":[{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"ol 1\",\"marks\":[],\"data\":{}}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"list-item\",\"content\":[{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"ol 2\",\"marks\":[],\"data\":{}}],\"data\":{}}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"blockquote\",\"content\":[{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"You miss 100% of the shots you don't take -Wayne Gretsky -Michael Scott\",\"marks\":[],\"data\":{}}],\"data\":{}}],\"data\":{}},{\"nodeType\":\"hr\",\"content\":[],\"data\":{}},{\"nodeType\":\"embedded-asset-block\",\"content\":[],\"data\":{\"target\":{\"sys\":{\"id\":\"62EpHlg35oVPcljgEKE1kX\",\"type\":\"Link\",\"linkType\":\"Asset\"}}}},{\"nodeType\":\"paragraph\",\"content\":[{\"nodeType\":\"text\",\"value\":\"\",\"marks\":[],\"data\":{}}],\"data\":{}}]}"
          }
        }
      },
      "extensions": {}
    }

What makes GraphQL so cool is how closely the response resembles the query itself. We define the shape of the data we want, including only the fields we need. As you can see, the majority of the response is in the raw field, which represents all the data in the RTE (headings, paragraphs, assets etc.). It is JSON serialized as a string. After parsing the raw string, we get an object that looks like this:

The object has the properties content, data and nodeType. content is an array, and in the image it is expanded in DevTools to reveal what it contains: more objects. Each of these objects also contains the properties content, data and nodeType. We are dealing with a tree! Each object (which I will refer to from here on as a "node") may or may not contain child nodes within its own content array.
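As a rough sketch of that parsing step, the snippet below uses a trimmed-down stand-in for the raw string (only one heading and one paragraph are kept). The variable names are illustrative assumptions, not taken from the site's source.

```javascript
// Trimmed-down stand-in for the `raw` field in the response above.
const raw = JSON.stringify({
  nodeType: 'document',
  data: {},
  content: [
    {
      nodeType: 'heading-1',
      data: {},
      content: [{ nodeType: 'text', value: 'H 1', marks: [], data: {} }],
    },
    {
      nodeType: 'paragraph',
      data: {},
      content: [{ nodeType: 'text', value: 'HELLO WORLD!', marks: [{ type: 'bold' }], data: {} }],
    },
  ],
})

// `raw` is a string, so it has to be parsed before the tree can be walked.
const documentNode = JSON.parse(raw)

console.log(typeof raw) // string
console.log(documentNode.nodeType) // document
console.log(documentNode.content.map((child) => child.nodeType)) // [ 'heading-1', 'paragraph' ]
```

The root object and each of its children share the same three properties, which is what makes a single recursive function enough to handle every level.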

I quickly sketched out what I mean for all you visual learners:

The drawing shows some of the nodes in this post's response. Any child node attached to a parent node can be thought of as one of the objects in the parent's content array. Each node that does not have any children is a "leaf" node. In the drawing, all of the leaves have a nodeType of "text", although nodes such as "hr" and "embedded-asset-block" in the response above also have an empty content array.
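To make the idea of a leaf concrete, here is a small sketch that collects every node without children from a hand-written tree in the same shape as the parsed response. The helper name collectLeaves is hypothetical. Note that it treats an empty content array the same as a missing one, so an "hr" node counts as a leaf alongside the "text" nodes.

```javascript
// Collect every node that has no children (a missing or empty content array).
const collectLeaves = (node) => {
  const children = node.content || []
  if (children.length === 0) return [node]
  return children.flatMap(collectLeaves)
}

// Small hand-written tree in the same shape as the parsed response.
const tree = {
  nodeType: 'document',
  content: [
    {
      nodeType: 'paragraph',
      content: [
        { nodeType: 'text', value: 'B', marks: [{ type: 'bold' }] },
        { nodeType: 'text', value: 'OL', marks: [] },
      ],
    },
    { nodeType: 'hr', content: [] },
  ],
}

const leafTypes = collectLeaves(tree).map((leaf) => leaf.nodeType)
console.log(leafTypes) // [ 'text', 'text', 'hr' ]
```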

This data structure is similar to what we ultimately want our output to be: HTML. HTML is structured in the same way, with elements inside of other elements, which may themselves contain more elements. I can't do all the work for you, so open your browser's DevTools and go to the "Elements" tab to see what I mean.

Because this website uses Gatsby, I ultimately want to convert nodes into React Elements. React will handle the heavy lifting of converting JavaScript into HTML DOM nodes in a manageable way. So how do we get from the JSON response to React Elements?

The Algorithm

At its core we want a function that will take a node and convert it into a React Element. Since we are working with a tree data structure, a recursive implementation seems very appropriate. I named this recursive function parseNode. It takes a node and destructures the properties out of it:

    const parseNode = (node) => {
      const { nodeType, content, data, value, marks } = node
    }

Once again, we know that content is an array of all the child nodes. Well, we want to convert all of those nodes into React Elements too! We will convert all of a node's children before we convert the node itself.

const mappedContent = content && content.map(parseNode)

That one line of code checks whether this node has a content array (text nodes, the "leaves" in the sketch, do not) and invokes the same function parseNode on each node in the array, returning a new array of React Elements. But some of those child nodes inevitably have children of their own, so those children are converted first. As a result, "leaf" nodes are always converted before their ancestors.

If you follow the arrows, you will see the order in which parseNode is invoked on each node. The arrows going down are new function invocations being added to the call stack, and the arrows going up are invocations returning and being popped off it. This algorithm uses Depth First Search to traverse the tree.
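The traversal order can be checked with a small sketch. The visit function and visitOrder array below are illustrative stand-ins for parseNode: instead of building elements, they record each node at the moment its conversion would finish, which happens only after all of its children are done.

```javascript
// Records the order in which nodes would finish being converted.
// A node is pushed only after all of its children have been visited,
// i.e. a post-order, depth-first traversal.
const visitOrder = []

const visit = (node) => {
  const { nodeType, content, value } = node
  if (content) content.forEach((child) => visit(child))
  visitOrder.push(value !== undefined ? `text(${value})` : nodeType)
}

visit({
  nodeType: 'document',
  content: [
    { nodeType: 'heading-1', content: [{ nodeType: 'text', value: 'H 1' }] },
    {
      nodeType: 'unordered-list',
      content: [
        {
          nodeType: 'list-item',
          content: [
            { nodeType: 'paragraph', content: [{ nodeType: 'text', value: 'ul 1' }] },
          ],
        },
      ],
    },
  ],
})

console.log(visitOrder)
// [ 'text(H 1)', 'heading-1', 'text(ul 1)', 'paragraph', 'list-item', 'unordered-list', 'document' ]
```

The root "document" node comes last because its invocation sits at the bottom of the call stack for the whole run.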

If you noticed in the sketch of the data above, each node has a nodeType property. This property indicates what type of React Element the node should map to. For example, a nodeType of "paragraph" should become a React Element of type "p". Most of the nodeTypes that can be returned from Contentful's RTE are not special in any way, and we simply use the React.createElement function.

    const basicNodeTypeToElementMap = new Map([
      ['document', React.Fragment],
      ['paragraph', 'p'],
      ['heading-1', 'h1'],
      ['heading-2', 'h2'],
      ['heading-3', 'h3'],
      ['heading-4', 'h4'],
      ['heading-5', 'h5'],
      ['heading-6', 'h6'],
      ['unordered-list', 'ul'],
      ['ordered-list', 'ol'],
      ['list-item', 'li'],
      ['blockquote', 'blockquote']
    ])

    let element = null
    if (basicNodeTypeToElementMap.has(nodeType)) {
      element = React.createElement(
        basicNodeTypeToElementMap.get(nodeType),
        {},
        mappedContent
      )
    }

We created a map of nodeType to React Element type. The nodeType of "document" belongs to the root node, which maps to the root React Element of type React.Fragment. This root element's props.children will have the same length as the content array.

Notice how similar the structure of the initial data response in the image above is to the structure of the React Elements in this image:

There are some nodeTypes that need special cases to handle the conversions. For example, nodes with nodeType of "hyperlink" should map to an "a" tag. These nodes use the data property, which is an object containing a uri property. The uri is what we use as the href in the "a" tag.

    else if (nodeType === 'hyperlink') {
      element = <a href={data.uri}>{mappedContent}</a>
    }

We used JSX instead of calling React.createElement directly in the above example. Another case is the "hr" tag. This horizontal line cannot have any children, so the implementation for this one is simple:

    else if (nodeType === 'hr') {
      element = <hr />
    }
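Putting the pieces from this section together, a minimal end-to-end sketch might look like the following. It is not the site's actual code: h is a hypothetical stand-in for React.createElement so the snippet runs without React, the map is shortened, and text nodes are assumed to become plain strings with their marks ignored.

```javascript
// Hypothetical stand-in for React.createElement so the sketch runs without React.
// It returns a plain object shaped loosely like a React element.
const h = (type, props, children) => ({ type, props: { ...props, children } })

const basicNodeTypeToElementMap = new Map([
  ['document', 'fragment'], // React.Fragment in the real code
  ['paragraph', 'p'],
  ['heading-1', 'h1'],
  ['unordered-list', 'ul'],
  ['list-item', 'li'],
])

const parseNode = (node) => {
  const { nodeType, content, data, value } = node
  // Convert children first; text nodes have no content array.
  const mappedContent = content && content.map((child) => parseNode(child))

  if (basicNodeTypeToElementMap.has(nodeType)) {
    return h(basicNodeTypeToElementMap.get(nodeType), {}, mappedContent)
  }
  if (nodeType === 'hyperlink') return h('a', { href: data.uri }, mappedContent)
  if (nodeType === 'hr') return h('hr', {})
  // Assumption: text nodes become plain strings; marks (bold, italic, ...) are ignored here.
  if (nodeType === 'text') return value
  return null // nodeTypes this sketch does not handle
}

const root = parseNode({
  nodeType: 'document',
  content: [
    {
      nodeType: 'paragraph',
      content: [
        { nodeType: 'text', value: '', marks: [] },
        {
          nodeType: 'hyperlink',
          data: { uri: 'https://google.com' },
          content: [{ nodeType: 'text', value: 'GOOGLE', marks: [] }],
        },
      ],
    },
    { nodeType: 'hr', content: [] },
  ],
})

const link = root.props.children[0].props.children[1]
console.log(link.type, link.props.href, link.props.children) // a https://google.com [ 'GOOGLE' ]
console.log(root.props.children.length) // 2
```

Early returns are used here instead of the if / else if chain shown above purely to keep the sketch short; the mapping logic is the same.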

There are a few more nodeTypes that require a special implementation. I am not going to break down all of them, as the code is publicly available. Hopefully this has been a good intro to parsing Contentful's RTE response, or any RTE for that matter!
