Thoughts On Markdown

Markdown is second nature for many of us. Looking back, I remember starting to write in Markdown not long after John Gruber released his first Perl-based parser in 2004, after collaborating on the language with Aaron Swartz.

Markdown’s syntax is intended for one purpose: to be used as a format for writing for the web.

John Gruber

That’s almost 20 years ago — yikes! What started as a more writer- and reader-friendly syntax for HTML has become a darling for how to write and store technical prose for programmers and tech-savvy people.

Markdown is a signifier for the developer and text-tinkerer culture. But since its introduction, the world of digital content has also changed. While Markdown is still fine for some things, I don’t believe it should be the go-to for content anymore.

There are two main reasons for this:

  1. Markdown wasn’t designed to meet today’s content needs.
  2. Markdown holds editorial experience back.

Of course, this stance is influenced by working for a platform for structured content. At Sanity.io, we spend most of our days thinking about how content as data unlocks a lot of value, and about editor experiences: how to save people time and make working with digital content delightful. So, there’s skin in the game, but I hope I’m able to portray that even though I’ll argue against Markdown as the go-to format for content, I still have a deep appreciation for its significance, application, and legacy.

Before my current gig, I worked as a technology consultant at an agency where we had to literally fight CMSes that locked our clients’ content down by embedding it in presentation and complex data models (yes, even the open-source ones). I have seen people struggle with Markdown syntax and become demotivated in their jobs as editors and content creators. We have spent hours (and clients’ money) on building custom tag-renderers that were never used because people don’t have the time or motivation to use the syntax. Even I, when highly motivated, have given up contributing to open-source documentation because the component-based Markdown implementation introduced too much friction.

But I also see the other side of the coin. Markdown comes with an impressive ecosystem and from a developer’s standpoint, there is an elegant simplicity to plain-text files and easy-to-parse syntax for people who are used to reading code. I once spent days building an impressive MultiMarkdown->LaTeX->real-time-PDF-preview-pipeline in Sublime Text for my academic writing. And it makes sense that a README.md file can be opened and edited in a code editor and rendered nicely on GitHub. There’s little doubt that Markdown brings convenience for developers in some use cases.

That is also why I want to build my advice against Markdown by looking back on why it was introduced in the first place, and by going through some of the major developments of content on the web. For many of us, I suspect Markdown is something we just take for granted as a “thing that exists.” But all technology has a history and is a product of human interaction. This is important to remember when you, the reader, develop technology for others to use.

Flavors And Specifications

Markdown was designed to make it easier for web writers to work with articles in an age where web publishing required writing HTML. So, the intent was to make it simpler to interface with text formatting in HTML. It wasn’t the first simplified syntax on the planet, but it was the one that gained the most traction over the years. Today, the usage of Markdown has grown far beyond its design intent as a simpler way to read and write HTML, to become an approach to marking up plain text in a lot of different contexts. Sure, technologies and ideas can evolve beyond their intent, but the tension in today’s use of Markdown can be traced to this origin and the constraints put into its design.

For those who aren’t familiar with the syntax, take the following HTML content:

<p>The <a href="https://daringfireball.net/projects/markdown/syntax#philosophy">Markdown syntax</a> is designed to be <em>easy-to-read</em> and <em>easy-to-write</em>.</p>

With Markdown, you can express the same formatting as:

The [Markdown syntax](https://daringfireball.net/projects/markdown/syntax#philosophy) is designed to be _easy-to-read_ and _easy-to-write_.

It’s like a law of nature that technology adoption comes with the pressure to evolve and add features to it. Markdown’s increasing popularity meant that people wanted to adapt it for their use cases. They wanted more features like support for footnotes and tables. The original implementation came with an opinionated stance, which at the time was reasonable for what the design intent was:

For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.

John Gruber

In other words, if you want a table, then use <table></table>. You’ll find that this is still the case for the original implementation. One of Markdown’s spiritual successors, MDX, has taken the same principle but extended it to JSX, a JS-based templating language.

From Markdown To Markdown?

It can look like Markdown’s appeal for many wasn’t so much its tie-in to HTML, but the ergonomics of plaintext and simple syntax for formatting. Some content creators wanted to use Markdown for other use cases than simple articles on the web. Implementations like MultiMarkdown introduced affordances for academic writers who wanted to use plain text files but needed more features. Soon you would have a range of writing apps that accepted Markdown syntax, without necessarily turning it into HTML or even using the markdown syntax as a storage format.

In a lot of apps, you’ll find editors that give you a limited set of formatting options, and some of them are more “inspired” by the original syntax. In fact, one piece of feedback I got on a draft of this article was that by now, “Markdown” should be lower-cased, since it has become so common, and to make it distinct from the original implementation. Because what we recognize as markdown has also become very diverse.

CommonMark: An Attempt To Tame Markdown

Like ice cream, Markdown comes in a lot of flavors, some more popular than others. When people started to fork the original implementation and add features to it, two things happened:

  1. It became more unpredictable what you as a writer could and couldn’t do with Markdown.
  2. Software developers had to decide which implementation to adopt for their software. The original implementation also contained some inconsistencies that added friction for people who wanted to use it programmatically.

This started conversations about formalizing Markdown into a proper specification, something that Gruber resisted, and still does. Interestingly, he resisted because he recognized that people wanted to use Markdown for different purposes and “No one syntax would make all happy.” It’s an interesting stance considering that Markdown translates to HTML, which is a specification that evolves to accommodate different needs.

Even though the original implementation of Markdown is covered by a “BSD-like” license, it also reads “Neither the name Markdown nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.” We can safely assume that most products that use “Markdown” as part of their marketing materials haven’t acquired this written permission.

The most successful attempt to bring Markdown into a shared specification is what is today known as CommonMark. It was headed by Jeff Atwood (known for co-founding Stack Overflow and Discourse) and John MacFarlane (a professor of philosophy at Berkeley who’s behind Babelmark and pandoc). They initially launched it as “Standard Markdown,” but changed it to “CommonMark” after receiving criticism from Gruber, whose stance was consistent: the intent of Markdown is to be a simple authoring syntax that translates to HTML:

@davewiner And that’s what’s flawed with CommonMark. They want to make things easier for programmers as a primary goal. They miss the point.

— John Gruber (@gruber) September 8, 2014

I think this also marked the point where Markdown had entered the public domain. Even though CommonMark isn’t branded as “Markdown” (as per licensing), this specification is recognized and referred to as “markdown.” Today, you’ll find CommonMark as the underlying implementation for software like Discourse, GitHub, GitLab, Reddit, Qt, Stack Overflow, and Swift. Projects like unified.js, which bridge syntaxes by translating them into abstract syntax trees, also rely on CommonMark for their markdown support.

CommonMark has brought a lot of unification around how markdown is implemented, and in a lot of ways has made it simpler for programmers to integrate markdown support in software. But it hasn’t brought the same unification to how markdown is written and used. Take GitHub Flavored Markdown (GFM). It’s based on CommonMark but extends it with more features (like tables, task lists, and strikethrough). Reddit describes its “Reddit Flavored Markdown” as “a variation of GFM,” and introduces features like syntax for marking up spoilers. I think we can safely conclude that both the group behind CommonMark and Gruber were right: it certainly helps with shared specifications, but yes, people want to use Markdown for different specific things.

Markdown As A Formatting Shortcut

Gruber resisted formalizing Markdown into a shared specification because he assumed it would make it less a tool for writers and more a tool for programmers. We have already seen that even with the broad adoption of a specification, we don’t automatically get a syntax that predictably works the same across different contexts. And specifications like CommonMark, popular as they are, also have limited success. An obvious example is Slack’s markdown implementation (called mrkdwn) that translates *this* to strong/bold, and not emphasis/italic, and doesn’t support the [link](https://slack.com) syntax, but uses <https://slack.com|link> instead.
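
To make that divergence concrete, here is a minimal sketch of what it takes to move just two constructs from CommonMark-style text to Slack’s mrkdwn, where bold uses single asterisks and links take the form <url|text>. The function name toSlackMrkdwn is hypothetical, and the regular expressions only cover simple, non-nested cases, so treat it as an illustration rather than a real converter.

```typescript
// Hypothetical helper: converts a small subset of CommonMark to Slack's mrkdwn.
// Assumption: only simple, non-nested **bold** and [text](url) links are handled.
function toSlackMrkdwn(markdown: string): string {
  return markdown
    // [text](url) becomes <url|text>
    .replace(/\[([^\]]+)\]\(([^)\s]+)\)/g, "<$2|$1>")
    // **bold** becomes *bold*
    .replace(/\*\*([^*]+)\*\*/g, "*$1*");
}

const converted = toSlackMrkdwn("Read the **docs** at [Slack](https://slack.com).");
console.log(converted);
```

Even this tiny subset needs its own translation layer, which is the kind of friction writers feel when their muscle memory stops working.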

You’ll also find that you can use Markdown-like syntax to initialize formatting in rich text editors in software like Notion, Dropbox Paper, Craft, and to a degree, Google Docs (e.g. asterisk + space on a new line will transform to a bulleted list). What’s supported and what’s translated to what varies. So, you can’t necessarily take your muscle memory with you across these applications. For some people, this is fine, and they can adapt. For others, this is a papercut and it keeps them from using these features. Which raises the question: who was Markdown designed for, and who are its users today?

Who Are The Users Of Markdown Supposed To Be?

We have seen markdown exist in a tension between different use cases, audiences, and notions of who its users are. What started as a markup language specifically for HTML-proficient web writers became a darling for developer types.

In 2014, web writers started to move away from moving files through parsers in Perl and FTP. Content Management Systems (CMSs) like WordPress, Drupal, and Movable Type (which I believe Gruber still uses) had steadily grown to become the go-to tools for web publishing. They offered affordances like rich text editors that web writers could use in their browsers.

These rich text editors still assumed HTML and Markdown as the underlying rich text syntax, but they took away some of the cognitive overhead by adding buttons to insert this syntax in the editor. And increasingly, writers weren’t and didn’t have to be versed in HTML. I bet if you did web development with CMSs in the 2010s, you probably had to deal with “junk HTML” that came through these editors when people pasted directly from Word.

Today, I will argue that Markdown’s primary users are developers and people who are interested in code. It’s not a coincidence that Slack made WYSIWYG the default input mode once their software was used by more people outside of technical departments. And the fact that this was a controversial decision, so much so that they had to bring markdown input back as an option, shows how deep the love for markdown is in the developer community. There wasn’t much celebration of Slack trying to make it easier and more accessible for everyone. And this is the crux of the matter.

The Ideology Of Markdown

The fact that markdown has become the lingua franca writing style, and what most website frameworks cater to, is also the main reason I’ve been a bit skittish about publishing this. It’s often talked about as an inherent and undeniable good. Markdown has become a hallmark of being developer-friendly. Smart and skilled people have sunk a lot of collective hours in enabling markdown in all sorts of contexts. So, challenging its hegemony will surely annoy some. But hopefully, it can spawn some fruitful discussion about a thing that’s often taken for granted.

My impression is that the developer friendliness that people relate to Markdown has mostly to do with 3 factors:

  1. The comfortable abstraction of a plain text file.
  2. There is an ecosystem of tooling.
  3. You can keep your content close to your development workflow.

I’m not saying that these stances are wrong, but I’ll suggest that they come with trade-offs and some unreasonable assumptions.

The Simple Mental Model Of A Plain Text File

Databases are amazing things. But they have also earned a reputation for being hard and inaccessible for frontend developers. I’ve known a lot of great developers who shy away from backend code and databases, because they represent complexity they don’t want to spend time on. Even with WordPress, which does a lot out of the box to keep you from having to deal with its database after setup, there was the overhead of getting it up and running.

Plain text files, however, are more tangible and are fairly simple to reason about (as long as you’re used to file management that is). Especially compared to a system that will break your content into multiple tables in a relational database with some proprietary structure. For limited use cases, like blog posts of simple rich text with images and links, markdown will get the job done. You can copy the file and stick it in a folder or check it into git. The content feels yours because of the tangibility of files. Even if they’re hosted on GitHub, which is a for-profit Software as a Service owned by Microsoft, and thus covered by their terms of service.

In the era where you actually had to spin up a local database to get your local development going and deal with syncing it with remote, the appeal of plain text files is understandable. But that era is pretty much gone with the emergence of backends as a service. Services and tools like Fauna, Firestore, Hasura, Prisma, PlanetScale, and Sanity’s Content Lake, invest heavily in developer experience. Even operating traditional databases on local development has become less of a hassle compared to just 10 years ago.

If you think about it, do you own your content less if it’s hosted in a database? And hasn’t the developer experience of dealing with databases become significantly simpler with the advent of SaaS tools? And is it fair to say that proprietary database technology impinges on the portability of your content? Today you can launch what’s essentially a Postgres database with no sysadmin skills, make your tables and columns, put your content inside of it, and at any time export it as a .sql dump.

The portability of content has much more to do with how you structure that content in the first place. Take WordPress: it’s fully open-source, and you can host your own DB. It even has a standardized export format in XML. But anyone who has tried to move out of a mature WordPress install knows how little this helps if you’re trying to get away from WordPress.

A Vast Ecosystem… For Developers

We already touched on the vast markdown ecosystem. If you look at contemporary website frameworks, most of them assume markdown as a primary content format, some of them, the only format. For example, Hugo, the static site generator used by Smashing Magazine, still requires markdown files for paginated publishing. Meaning that if Smashing Magazine wants to use a CMS to store articles, it has to interact with markdown files, or convert all the content to markdown files. If you look in the documentation for Next.js, Nuxt.js, VuePress, Gatsby.js, and so on, markdown will figure prominently. It’s also the default syntax for README-files on GitHub, which also uses it for formatting in Pull Request notes and comments.

There are some honorable mentions of initiatives to bring the ergonomics of markdown to the masses. Netlify CMS and TinaCMS (the spiritual descendant of Forestry) will give you user interfaces where the markdown syntax is mostly abstracted away for editors. You will commonly find that markdown-based editors in CMSes give you preview functionality for the formatting. Some editors, like Notion’s, will let you paste markdown syntax, and they will translate it to their native formatting. But I think it’s safe to say that the energy that has gone into innovating for markdown hasn’t favored people who aren’t into writing its syntax. It hasn’t trickled up the stack, as it were.

Content Workflows Or Developer Workflows?

For a developer who makes their blog, using markdown files reduces some of the overhead of getting it up and running, since frameworks often come with built-in parsing or commonly offer it as part of starter code. And there is nothing extra to sign up for. You can use git to commit these files alongside your code. If you are comfortable with git diffs, you’ll even have revision control like you’re used to with programming. In other words, since markdown files are in plain text, they can be integrated with your developer workflow.

But beyond this, the developer experience soon gets more complex. You end up compromising on your team’s user experience as content creators, and on your own developer experience, by being stuck with markdown to solve problems that are way beyond its design intent.

Yes, it might be cool if you get your content team to use git and check in their changes, but at the same time, is this the best use of their time? Do you really want your editors to bump against merge conflicts or how to rebase branches? Git is hard enough for developers who use it every day. And does this setup really represent the best workflow for people who are primarily working with content? Isn’t this a case where developer experience has trumped editor experience, and isn’t the cost the time and effort that could have gone into making something better for users?

Because the expectations and needs from content and editing environments have evolved, I don’t think markdown will do it for us. I don’t see how some of the developer ergonomics end up favoring non-developers, and I think even for developers, markdown is holding our own content creation and needs back. Because content on the web has significantly changed since the early 2000s.

From Paragraphs To Blocks

Markdown has always had the option of opting out to HTML if you wanted more complex things. This worked well when the author was also the webmaster, or at least knew HTML. It also worked well because websites usually were mostly HTML and CSS. The way you designed websites was mostly by creating whole page layouts. You could transform Markdown to the HTML markup and put it up alongside your style.css file. Of course, we had CMSes and static site generators in the 2000s too, but they mostly worked the same, by inserting the HTML content inside of templates without any passing of “props” between the components.

But most of us don’t really author HTML like in the old days anymore. Content on the web has evolved from mostly being articles with simple rich text formatting to composed multimedia and specialized components often with user interactivity (which is a fancy way of saying “newsletter signup calls to action”).

From Articles To Apps

In the early 2010s, Web 2.0 was in its heyday, and Software as a Service-companies began to use the web for data-heavy applications. HTML, CSS, and JavaScript were increasingly used to drive interactive UIs. Twitter open-sourced Bootstrap, their framework for building more consistent and resilient user interfaces. This drove what we can call the “componentization” of web design. It shifted the way we build for the web in a fundamental way.

The various CSS frameworks that emerged in this era (e.g. Bootstrap and Foundation) tended to use standardized class names and assumed specific HTML structures to make it less hard to make resilient and responsive user interfaces. With the web design philosophy of Atomic Design and class-name conventions like Block-Element-Modifier (BEM), the default shifted from thinking page-layout first to seeing pages as a collection of repeatable and compatible design elements.

Whatever content you have inside of markdown is not compatible with this, unless you go down the rabbit hole of intercepting the markdown parser and tweaking it to output the syntax you want (more on this later). No wonder: Markdown was designed for simple rich text articles made of native HTML elements that you would target with a stylesheet.

This is still an issue for people who use Markdown to drive content for their sites.

The Embeddable Web

But something also happened to our content. Not only could we start finding it outside of the semantic <article> HTML tags, but it started to contain more… stuff. A lot of our content moved out from our LiveJournals and blogs and into social media: Facebook, Twitter, Tumblr, YouTube. To get the snippets of content back into our articles, we needed to be able to embed them. The convention became to use the <iframe> tag to channel the video player from YouTube or even insert a tweet box in between your paragraphs of text. Some systems started abstracting this into “short-codes,” most often brackets containing some keyword to identify what block of content it should represent, and some key-value attributes. For example, dev.to has enabled syntax from the templating language Liquid to be inserted into their Markdown editor:

{% youtube dQw4w9WgXcQ %}

Of course, this requires you to use a customized Markdown parser and to have special logic that makes sure the right HTML is inserted when the syntax is turned into HTML. Your content creators will have to remember these codes (unless there is some kind of toolbar to insert them automatically). And if a bracket gets deleted or messed up, that might break the site.
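
As a rough illustration of that special logic, the sketch below expands a shortcode into an iframe before the text reaches a Markdown parser. It is not dev.to’s actual implementation: the names expandShortcodes and renderers are hypothetical, and the only shortcode it knows is the youtube one from the example above.

```typescript
// Hypothetical pre-processing step (not dev.to's actual implementation):
// expands {% name argument %} shortcodes before the Markdown parser runs.
type ShortcodeRenderer = (arg: string) => string;

const renderers: Record<string, ShortcodeRenderer> = {
  // Assumption: a single-argument shortcode that takes a YouTube video ID.
  youtube: (id) =>
    `<iframe src="https://www.youtube.com/embed/${encodeURIComponent(id)}" allowfullscreen></iframe>`,
};

function expandShortcodes(source: string): string {
  // Matches {% name argument %} with flexible whitespace around the parts.
  return source.replace(
    /\{%\s*(\w+)\s+([^%\s]+)\s*%\}/g,
    (match: string, name: string, arg: string) => {
      const render = renderers[name];
      // Unknown shortcodes are left as-is instead of being dropped.
      return render ? render(arg) : match;
    }
  );
}

const expanded = expandShortcodes("Watch this:\n\n{% youtube dQw4w9WgXcQ %}");
console.log(expanded);
```

Note what happens when the closing brace goes missing: the pattern no longer matches, so the raw shortcode text would show up on the published page. That failure mode lands on the editor, not the developer.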

But What About MDX?

An attempt to solve the need for block content is MDX, presented with the tagline “Markdown for the component era.” MDX lets you use the JSX templating language, as well as JavaScript, interlaced in markdown syntax. There is a lot of impressive engineering in the community around MDX, including Unified.js, which specializes in parsing various syntaxes into Abstract Syntax Trees (ASTs), so that they are more accessible to be used programmatically. Note that the standardization of markdown would make the work for the folks behind Unified.js and its users simpler, because there would be fewer edge cases to cater for.

MDX certainly brings better developer experience in integrating components into Markdown. But it doesn’t bring better editor experience, because it adds a lot of cognitive overhead to content production and editing:

import {Chart} from './snowfall.js'
export const year = 2018

# Last year’s snowfall

In {year}, the snowfall was above average.
It was followed by a warm spring which caused
flood conditions in many of the nearby rivers.

<Chart year={year} color="#fcb32c" />

The amount of assumed knowledge just for this simple example is substantial. You need to know about ES6 modules, JavaScript variables, JSX templating syntax, and how to use props, hex codes, and data types, and you need to be familiar with what components you can use, and how to use them. And you need to type it correctly and in an environment that gives you some kind of feedback. I have no doubt that there will be more accessible authoring tools on top of MDX, but it feels like solving for something that doesn’t need to be a problem in the first place.

Unless you are extremely diligent in how you compose and name your MDX components, it also ties your content to a specific presentation. Just take the example above, taken from the MDX front page. You’ll find a hard-coded color hex for the chart. When you redesign your site, that color might not be compatible with your new design system. Of course, there’s nothing keeping you from abstracting this and using the prop color="primary", but there’s also nothing in the tool that nudges you to make wise decisions like this.
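
A small sketch of that abstraction, assuming a hypothetical set of design tokens: the content only stores the token name, and the presentation layer decides what it means at render time. The token names and hex values are made up for illustration.

```typescript
// Hypothetical design tokens; names and hex values are illustrative only.
type ColorToken = "primary" | "secondary";

const currentTheme: Record<ColorToken, string> = {
  primary: "#fcb32c",
  secondary: "#1f2937",
};

// After a redesign, only the theme changes. Content that says "primary" stays valid.
const redesignedTheme: Record<ColorToken, string> = {
  primary: "#0055ff",
  secondary: "#111111",
};

// The presentation layer resolves the token when rendering.
function resolveColor(token: ColorToken, theme: Record<ColorToken, string>): string {
  return theme[token];
}

console.log(resolveColor("primary", currentTheme));
console.log(resolveColor("primary", redesignedTheme));
```

The point is less the lookup itself than where the decision lives: in the design system, not in every document that happens to contain a chart.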

Embedding specific presentation concerns in your content has increasingly become a liability and something that will get in the way of adapting, iterating, and moving quickly with your content. It locks it down in ways that are much more subtle than having content in a database. You risk ending up in the same place as moving out of a mature WordPress install with plugins. It is cumbersome to unmix structure and presentation.

The Demand For Structured Content

With more complex sites and user journeys, we also see the need to present the same pieces of content throughout a website. If you’re running an e-commerce site, you want to embed product information in many places outside a single product page. If you run a modern marketing site, you want to be able to share the same copy across multiple personalized views.

To do this efficiently and reliably, you will need to adopt structured content. That means your content needs to be embedded with metadata and chunked up in ways that make it possible to parse for intent. If a developer just sees “page” with “content,” that makes it very difficult to include the right things in the right places. If they can get to all “product descriptions” with an API or a query, that makes everything easier.

With markdown, you’re limited to expressing taxonomies and structured content either through some sort of folder organization (making it hard to put the same piece of content in multiple taxonomies) or by augmenting the syntax with something else.

Jekyll, an early Static Site Generator (SSG) built for markdown files, introduced “Front Matter” as a way to add metadata to posts using YAML (a simple key-value format that uses spaces to create scope) between three dashes at the top of the file. So, now you’ll have two syntaxes to deal with. YAML also has a reputation for being mischievous (especially if you’re from Norway). Nevertheless, other SSGs have adopted this convention, as well as git-based CMSes that use markdown as their content format.
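
To show what the second syntax means in practice, here is a minimal sketch of the first step any front matter tool has to take: separating the metadata from the Markdown body. The function splitFrontMatter is hypothetical and deliberately not a YAML parser; it only reads flat key: value lines and keeps every value as a raw string. A YAML 1.1 parser would typically read the unquoted NO below as the boolean false, which is the Norway joke.

```typescript
// Minimal sketch: splits a Jekyll-style file into front matter and body.
// Assumption: this is NOT a YAML parser. It only reads flat `key: value`
// lines and keeps every value as a raw string.
interface ParsedFile {
  data: Record<string, string>;
  body: string;
}

function splitFrontMatter(source: string): ParsedFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    // No front matter block: the whole file is body.
    return { data: {}, body: source };
  }
  const data: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines that are not key-value pairs
    data[line.slice(0, separator).trim()] = line.slice(separator + 1).trim();
  }
  return { data, body: match[2] };
}

const post = splitFrontMatter(
  "---\ntitle: Thoughts On Markdown\ncountry: NO\n---\n# Hello\n"
);
console.log(post.data["title"]);
console.log(post.data["country"]); // stays the string "NO" in this sketch
console.log(post.body);
```

Anything beyond flat strings (lists, nested objects, dates) needs a real YAML library, and with it come YAML’s typing rules that authors now also have to keep in mind.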

When you have to add additional syntax to your plain files to get some of the affordances of structured content, you may start to wonder if it’s really worth it. And who the format is for and who it excludes.

If you think about it, a lot of what we do on the web is not only consuming content, we’re creating it! I’m currently writing this lengthy article in an advanced word processor in my browser.

There’s a growing expectation that you should also be able to author block content in modern content applications. People have started to get used to delightful user experiences that work and look nice, and where you aren’t expected to learn specialized syntax. Medium popularized the notion that you could have delightful and intuitive content creation on the web. And speaking of “notion,” the popular note app has gone all in on block content, and lets users mix and match from a wide range of different types. Most of these blocks go beyond markdown and the native elements of HTML.

It’s notable that Notion, describing the process of making their content accessible through their highly anticipated API, makes a point about their choice of content format:

Documents from one Markdown editor will often parse and render differently in another application. The inconsistency tends to be manageable for simple documents, but it's a big problem for Notion's rich library of blocks and inline formatting options, many of which are simply not supported in any widely-used Markdown implementation.

Notion went with a JSON-based format that lets them express block content as structured data. Their argument is that it makes it easier and more predictable to interact with for developers who want to build their own presentation of the block content that comes out of Notion’s APIs.

If Not Markdown, Then What?

I suspect that the prominence of Markdown has held back innovation and progress for digital content. So, when I argue that we should stop choosing it as a primary way to store content, it’s hard to give a straight answer to what should replace it. What we do know, however, is what we should expect from modern content formats and authoring tools.

Let’s Invest In Accessible Authoring Experiences

Using markdown requires you to learn syntax, and often multiple syntaxes and bespoke tags to be practical with modern expectations. Today, that feels like a completely unnecessary expectation to put on most people. I wish we could direct more energy into making accessible and delightful editorial experiences that produce modern portable content formats.

Even though it’s notoriously difficult to build great block content editors, there are a couple of viable options out there that can be extended and customized for your use case (for example Slate.js, Quill.js, or ProseMirror). Then again, investing in the communities around these tools might also help their development further.

Increasingly, people will expect authoring tools to be accessible, real-time, and collaborative. Why should one have to push a save button on the web in 2021? Why shouldn’t it be possible to make a change in a document without risking a race condition, because your colleague happened to have the document open in a tab? Should we expect authors to have to deal with merge conflicts? And shouldn’t we make it easy for content creators to work with structured content with visual affordances that make sense?

To be a bit polemical: the last decade’s innovations in reactive JavaScript frameworks and UI components are perfect for creating awesome authoring tools. Instead, we use them to transpile Markdown to HTML and into an abstract syntax tree, only to integrate it into a JavaScript template language that outputs HTML.

Block Content Should Follow A Specification

I haven’t mentioned WYSIWYG editors for HTML, because they are the wrong thing. Modern block content editors should preferably interoperate with a specified format. The aforementioned editors do at least have a sensible internal document model that can be transformed into something more portable. If you look at the content management system landscape, you start to see various JSON-based block content formats emerge. Some of them are still tied to HTML assumptions or overly concerned with character positions. And none of them are really offered as a generic specification.

At Sanity.io, we decided early that the block content format should never assume HTML as either input or output, and that we could use algorithms to synchronize text strings. More importantly, block content and rich text should be deeply typed and queryable. The result was the open specification Portable Text. Its structure not only makes it flexible enough to accommodate custom data structures as blocks and inline spans; it’s also fully queryable with open-source query languages like GROQ.

Portable Text isn’t designed to be written or be easily readable in its raw form; it’s designed to be produced by a user interface, manipulated by code, and serialized and rendered wherever it needs to go. For example, you can use it to express content for voice assistants.

{
  "style": "normal",
  "_type": "block",
  "children": [
    {
      "_type": "span",
      "marks": ["a-key", "emphasis"],
      "text": "some text"
    }
  ],
  "markDefs": [
    {
      "_key": "a-key",
      "_type": "markType",
      "extraData": "some data"
    }
  ]
}

An interesting side-effect of turning block content into structured data is exactly that: It becomes data! And data can be queried and processed. That can be highly useful and practical, and it lets you ask your content repository questions that would otherwise be harder and more error-prone in formats like Markdown.

For example, if I for some reason wanted to know what programming languages we’ve covered in examples on Sanity’s blog, that’s within reach with a short query. You can imagine how trivial it is to build specialized tools and views on top of this that can be helpful for content editors:

distinct(
  *["code" in body[]._type]
      .body[_type == "code"]
      .language
)
// output
[
  "text",
  "javascript",
  "json",
  "html",
  "markdown",
  "sh",
  "groq",
  "jsx",
  "bash",
  "css",
  "typescript",
  "tsx",
  "scss"
]

Example: Get a distinct list of all programming languages that you have code blocks of.
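
To make it clearer what the query does, here is a rough plain-JavaScript equivalent. It is only a sketch: the sample documents and the distinctCodeLanguages function name are made up for illustration, and only the document shape (a body array containing blocks with a _type and a language) mirrors the query above.

```javascript
// Hypothetical sample documents; only their shape mirrors the query above
const documents = [
  {
    _type: 'post',
    body: [
      { _type: 'block', children: [{ _type: 'span', text: 'Hello' }] },
      { _type: 'code', language: 'javascript', code: 'console.log(1)' },
      { _type: 'code', language: 'groq', code: '*[_type == "post"]' }
    ]
  },
  {
    _type: 'post',
    body: [{ _type: 'code', language: 'javascript', code: 'let a = 1' }]
  },
  // Documents without a body are simply skipped
  { _type: 'author', name: 'No body field here' }
]

// Collect the language of every code block, then drop the duplicates
function distinctCodeLanguages (docs) {
  const languages = docs
    .flatMap(doc => doc.body || [])
    .filter(block => block._type === 'code')
    .map(block => block.language)
  return [...new Set(languages)]
}

console.log(distinctCodeLanguages(documents))
```

The point is not that this is hard to write, but that it only works because code blocks are typed objects rather than fenced text that has to be parsed first.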

Portable Text is also serializable, meaning that you can recursively loop through it, and make an API that exposes its nodes in callback functions mapped to block types, marked-up spans, and so on. We have spent the last years learning a lot about how it works and how it can be improved, and plan to take it to 1.0 in the near future. The next step is to offer an editor experience outside of Sanity Studio. As we have learned from Markdown, the design intent is important.
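
The callback idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not the API of any official Portable Text library: the serializers object, serializeSpan, and serializeBlock are hypothetical names, HTML is used as the output only because it is familiar, and the sketch does no escaping.

```javascript
// The example block from earlier in this article
const exampleBlock = {
  style: 'normal',
  _type: 'block',
  children: [{ _type: 'span', marks: ['a-key', 'emphasis'], text: 'some text' }],
  markDefs: [{ _key: 'a-key', _type: 'markType', extraData: 'some data' }]
}

// Hypothetical serializers: one callback per block style and per mark
const serializers = {
  styles: {
    normal: children => `<p>${children}</p>`
  },
  marks: {
    // Decorators are plain strings in the span's marks array
    emphasis: children => `<em>${children}</em>`,
    // Annotations are keys that point to an entry in markDefs
    markType: (children, def) =>
      `<span data-extra="${def.extraData}">${children}</span>`
  }
}

function serializeSpan (span, markDefs) {
  return (span.marks || []).reduce((output, mark) => {
    const def = markDefs.find(item => item._key === mark)
    const name = def ? def._type : mark
    const render = serializers.marks[name]
    // Unknown marks fall through and leave the output untouched
    return render ? render(output, def) : output
  }, span.text)
}

function serializeBlock (block) {
  const children = block.children
    .map(span => serializeSpan(span, block.markDefs || []))
    .join('')
  const render = serializers.styles[block.style] || serializers.styles.normal
  return render(children)
}

console.log(serializeBlock(exampleBlock))
```

Swapping the callbacks for ones that return SSML, plain text, or React elements is what makes the same content reusable across targets.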

Of course, whatever the alternative to markdown is, it doesn’t need to be Portable Text, but it needs to be portable text. And it needs to share a lot of its characteristics. There have been a couple of other JSON-based block content formats popping up over the last few years, but a lot of them seem to bring with them a lot of “HTMLism.” The convenience is understandable, since a lot of content still ends up on the web serialized into HTML, but the convenience limits the portability and the potential for reuse.

You can disregard my short pitch for something we made at Sanity, as long as you embrace the idea of structured content and formats that let you move between systems in a fundamental manner. For example, a goal for Portable Text will be improved compatibility with Unified.js, so it’s easier to travel between formats.

Embracing The Legacy Of Markdown

Markdown in all its flavors, interpretations, and forks won’t go away. I suspect that plain text files will always have a place in developers’ note apps, blogs, docs, and digital gardens. As a writer who has used markdown for almost two decades, I’ve become accustomed to “markdown shortcuts” that are available in many rich text editors and am frequently stumped by Google Docs’ lack of markdownisms. But I’m not sure if the next generation of content creators and even developers will be as bought in on markdown, nor should they have to be.

I also think that markdown captured a culture of savvy tinkerers who love text, markup, and automation. I’d love to see that creative energy expand and move into collectively figuring out how we can make better and more accessible block content editors, and building out an ecosystem around specifications that can express block content that’s agnostic to HTML. Structured data formats for block content might not have the same plain text ergonomics, but they are highly “tinkerable” and open for a lot of creativity of expression and authoring.

If you are a developer, product owner, or a decision-maker, I really want you to be circumspect about how you want to store and format your content going forward. If you’re going for markdown, at least consider the following trade-offs:

Markdown is not great for the developer experience in modern stacks:

  • It can be a hassle to parse and validate, even with great tooling.
  • Even if you adopt CommonMark, you aren’t guaranteed compatibility with tooling or people’s expectations.
  • It’s not great for structured content; YAML frontmatter only takes you so far.

Markdown is not great for editorial experience:

  • Most content creators don’t want to learn syntax; their time is better spent on other things.
  • Most markdown systems are brittle, especially when people get syntax wrong (which they will).
  • It’s hard to accommodate great collaborative user experiences for block content on top of markdown.

Markdown is not great in the age of block content, and shouldn’t be forced into it. Block content needs to:

  • Be untangled from HTMLisms and presentation agnostic.
  • Accommodate structured content, so it can be easily used wherever it needs to be used.
  • Have stable specification(s), so it’s possible to build on.
  • Support real-time collaborative systems.

What’s common for people like me who challenge the prevalence of markdown, and those who are really into the simple way of expressing text formatting, is an appreciation of how we transcribe intent into code. That’s where I think we can all meet. But I do think it’s time to look at the landscape and the emerging content formats that try to encompass modern needs, and ask how we can make sure that we build something that truly caters to editorial experience, and that can speak to developer experience as well.

I want to express my gratitude to Titus Wormer (@wooorm) for his insightful feedback on my first draft of this post, and for the great work he and the Unified.js team have done for the web community.

How to Make Taxonomy Pages With Gatsby and Sanity.io

In this tutorial, we’ll cover how to make taxonomy pages in Gatsby with structured content from Sanity.io. You will learn how to use Gatsby’s Node creation APIs to add fields to your content types in Gatsby’s GraphQL API. Specifically, we’re going to create category pages for Sanity’s blog starter.

That being said, there is nothing Sanity-specific about what we’re covering here. You’re able to do this regardless of which content source you may have. We’re just reaching for Sanity.io for the sake of demonstration.

Get up and running with the blog

If you want to follow this tutorial with your own Gatsby project, go ahead and skip to the section for creating a new page template in Gatsby. If not, head over to sanity.io/create and launch the Gatsby blog starter. It will put the code for Sanity Studio and the Gatsby front-end in your GitHub account and set up the deployment for both on Netlify. All the configuration, including example content, will be in place so that you can dive right into learning how to create taxonomy pages.

Once the project is initiated, make sure to clone the new repository from GitHub to your local machine, and install the dependencies:

git clone git@github.com:username/your-repository-name.git
cd your-repository-name
npm i

If you want to run both Sanity Studio (the CMS) and the Gatsby front-end locally, you can do so by running the command npm run dev in a terminal from the project root. You can also cd into the web folder and just run Gatsby with the same command.

You should also install the Sanity CLI and log in to your account from the terminal: npm i -g @sanity/cli && sanity login. This will give you tooling and useful commands to interact with Sanity projects. You can add the --help flag to get more information on its functionality and commands.

We will be doing some customization to the gatsby-node.js file. To see the result of the changes, restart Gatsby’s development server. This is done in most systems by hitting CTRL + C in the terminal and running npm run dev again.

Getting familiar with the content model

Look into the /studio/schemas/documents folder. There are schema files for our main content types: author, category, site settings, and posts. Each of the files exports a JavaScript object that defines the fields and properties of these content types. Inside of post.js is the field definition for categories:

{
  name: 'categories',
  type: 'array',
  title: 'Categories',
  of: [
    {
      type: 'reference',
      to: {
        type: 'category'
      }
    }
  ]
},

This will create an array field with reference objects to category documents. Inside of the blog’s studio it will look like this:

An array field with references to category documents in the blog studio

Adding slugs to the category type

Head over to /studio/schemas/documents/category.js. There is a simple content model for a category that consists of a title and a description. Now that we’re creating dedicated pages for categories, it would be handy to have a slug field as well. We can define that in the schema like this:

// studio/schemas/documents/category.js
export default {
  name: 'category',
  type: 'document',
  title: 'Category',
  fields: [
    {
      name: 'title',
      type: 'string',
      title: 'Title'
    },
    {
      name: 'slug',
      type: 'slug',
      title: 'Slug',
      options: {
        // add a button to generate slug from the title field
        source: 'title'
      }
    },
    {
      name: 'description',
      type: 'text',
      title: 'Description'
    }
  ]
}

Now that we have changed the content model, we need to update the GraphQL schema definition as well. Do this by executing npm run graphql-deploy (alternatively: sanity graphql deploy) in the studio folder. You will get warnings about breaking changes, but since we are only adding a field, you can proceed without worry. If you want the field to be accessible in your studio on Netlify, check the changes into git (with git add . && git commit -m"add slug field") and push it to your GitHub repository (git push origin master).

Now we should go through the categories and generate slugs for them. Remember to hit the publish button to make the changes accessible for Gatsby! And if you were running Gatsby’s development server, you’ll need to restart that too.

Quick sidenote on how the Sanity source plugin works

When starting Gatsby in development or building a website, the source plugin will first fetch the GraphQL Schema Definitions from Sanity’s deployed GraphQL API. The source plugin uses this to tell Gatsby which fields should be available to prevent it from breaking if the content for certain fields happens to disappear. Then it will hit the project’s export endpoint, which streams all the accessible documents to Gatsby’s in-memory datastore.

In other words, the whole site is built with two requests. Running the development server will also set up a listener that pushes whatever changes come from Sanity to Gatsby in real-time, without doing additional API queries. If we give the source plugin a token with permission to read drafts, we’ll see the changes instantly. This can also be experienced with Gatsby Preview.

Adding a category page template in Gatsby

Now that we have the GraphQL schema definition and some content ready, we can dive into creating category page templates in Gatsby. We need to do two things:

  • Tell Gatsby to create pages for the category nodes (that is Gatsby’s term for “documents”).
  • Give Gatsby a template file to generate the HTML with the page data.

Begin by opening the /web/gatsby-node.js file. Code will already be here that can be used to create the blog post pages. We’ll largely leverage this exact code, but for categories. Let’s take it step-by-step:

Between the createBlogPostPages function and the line that starts with exports.createPages, we can add the following code. I’ve put in comments here to explain what’s going on:

// web/gatsby-node.js

// ...

async function createCategoryPages (graphql, actions) {
  // Get Gatsby's method for creating new pages
  const {createPage} = actions
  // Query Gatsby's GraphQL API for all the categories that come from Sanity
  // You can query this API on http://localhost:8000/___graphql
  const result = await graphql(`{
    allSanityCategory {
      nodes {
        slug {
          current
        }
        id
      }
    }
  }
  `)
  // If there are any errors in the query, cancel the build and tell us
  if (result.errors) throw result.errors

  // Let's gracefully handle if allSanityCategory is null
  const categoryNodes = (result.data.allSanityCategory || {}).nodes || []

  categoryNodes
    // Loop through the category nodes, but don't return anything
    .forEach((node) => {
      // Destructure the id and slug fields for each category
      const {id, slug} = node
      // If there isn't a slug (or it's empty), we want to do nothing
      if (!slug || !slug.current) return

      // Make the URL with the current slug
      const path = `/categories/${slug.current}`

      // Create the page using the URL path and the template file, and pass down the id
      // that we can use to query for the right category in the template file
      createPage({
        path,
        component: require.resolve('./src/templates/category.js'),
        context: {id}
      })
    })
}

Last, call this function from exports.createPages at the bottom of the file:

// /web/gatsby-node.js

// ...

exports.createPages = async ({graphql, actions}) => {
  await createBlogPostPages(graphql, actions)
  await createCategoryPages(graphql, actions) // <= add the function here
}

Now that we have the machinery to create the category page node in place, we need to add a template for how it actually should look in the browser. We’ll base it on the existing blog post template to get some consistent styling, but keep it fairly simple in the process.

// /web/src/templates/category.js
import React from 'react'
import {graphql} from 'gatsby'
import Container from '../components/container'
import GraphQLErrorList from '../components/graphql-error-list'
import SEO from '../components/seo'
import Layout from '../containers/layout'

export const query = graphql`
  query CategoryTemplateQuery($id: String!) {
    category: sanityCategory(id: {eq: $id}) {
      title
      description
    }
  }
`
const CategoryPostTemplate = props => {
  const {data = {}, errors} = props
  const {title, description} = data.category || {}

  return (
    <Layout>
      <Container>
        {errors && <GraphQLErrorList errors={errors} />}
        {!data.category && <p>No category data</p>}
        <SEO title={title} description={description} />
        <article>
          <h1>Category: {title}</h1>
          <p>{description}</p>
        </article>
      </Container>
    </Layout>
  )
}

export default CategoryPostTemplate

We are using the ID that was passed into the context in gatsby-node.js to query for the right category, and then select the title and description fields that are on the category type. Make sure to restart with npm run dev after saving these changes, and head over to localhost:8000/categories/structured-content in the browser. The page should look something like this:

A barebones category page with a site title, Archive link, page title, dummy content and a copyright in the footer.
A barebones category page

Cool stuff! But it would be even cooler if we could actually see which posts belong to this category, because, well, that’s kinda the point of having categories in the first place, right? Ideally, we should be able to query for a “posts” field on the category object.

Before we learn how to do that, we need to take a step back to understand how Sanity’s references work.

Querying Sanity’s references

Even though we’re only defining the references in one type, Sanity’s datastore will index them “bi-directionally.” That means creating a reference to the “Structured content” category document from a post lets Sanity know that the category has these incoming references and will keep you from deleting it as long as the reference exists (references can be set as “weak” to override this behavior). If we use GROQ, we can query categories and join posts that have them like this (see the query and result in action on groq.dev):

*[_type == "category"]{
  _id,
  _type,
  title,
  "posts": *[_type == "post" && references(^._id)]{
    title,
    slug
  }
}
// alternative: *[_type == "post" && ^._id in categories[]._ref]{

This outputs a data structure that lets us make a simple category post template:

[
  {
    "_id": "39d2ca7f-4862-4ab2-b902-0bf10f1d4c34",
    "_type": "category",
    "title": "Structured content",
    "posts": [
      {
        "title": "Exploration powered by structured content",
        "slug": {
          "_type": "slug",
          "current": "exploration-powered-by-structured-content"
        }
      },
      {
        "title": "My brand new blog powered by Sanity.io",
        "slug": {
          "_type": "slug",
          "current": "my-brand-new-blog-powered-by-sanity-io"
        }
      }
    ]
  },
  // ... more entries
]
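
If the join in the query feels abstract, it may help to see the same lookup written by hand. The sketch below uses made-up documents and a hypothetical joinPosts function; the only thing taken from the content model is that posts hold a categories array of references with a _ref pointing at a category’s _id.

```javascript
// Hypothetical documents; posts point to categories through _ref
const categories = [
  { _id: 'cat-structured', _type: 'category', title: 'Structured content' },
  { _id: 'cat-unused', _type: 'category', title: 'Unused category' }
]

const posts = [
  {
    _id: 'post-1',
    _type: 'post',
    title: 'Exploration powered by structured content',
    categories: [{ _type: 'reference', _ref: 'cat-structured' }]
  },
  // A post without any categories
  { _id: 'post-2', _type: 'post', title: 'Uncategorized post' }
]

// For each category, find the posts that hold a reference to it
function joinPosts (categoryDocs, postDocs) {
  return categoryDocs.map(category => ({
    _id: category._id,
    title: category.title,
    posts: postDocs
      .filter(post =>
        (post.categories || []).some(ref => ref._ref === category._id)
      )
      .map(post => ({ title: post.title }))
  }))
}

console.log(JSON.stringify(joinPosts(categories, posts), null, 2))
```

The reference is only stored on the post, yet the category can still find its posts. That reverse lookup is what we will need to recreate in Gatsby.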

That’s fine for GROQ, what about GraphQL?

Here’s the kicker: As of yet, this kind of query isn’t possible with Gatsby’s GraphQL API out of the box. But fear not! Gatsby has a powerful API for changing its GraphQL schema that lets us add fields.

Using createResolvers to edit Gatsby’s GraphQL API

Gatsby holds all the content in memory when it builds your site and exposes some APIs that let us tap into how it processes this information. Among these are the Node APIs. It’s probably good to clarify that a “node” in Gatsby is not to be confused with Node.js. The creators of Gatsby have borrowed “edges and nodes” from graph theory, where “edges” are the connections between the “nodes,” which are the “points” where the actual content is located. Since an edge is a connection between nodes, it can have a “next” and “previous” property.

The edges with next and previous, and the node with fields in GraphQL’s API explorer

The Node APIs are used by plugins first and foremost, but they can be used to customize how our GraphQL API should work as well. One of these APIs is called createResolvers. It’s fairly new and it lets us tap into how a type’s nodes are created so we can make queries that add data to them.

Let’s use it to add the following logic:

  • Check for nodes with the SanityCategory type when creating the nodes.
  • If a node matches this type, create a new field called posts and set it to a list of the SanityPost type.
  • Then run a query that filters for all posts that list a category matching the current category’s ID.
  • If there are matching IDs, add the content of the post nodes to this field.

Add the following code to the /web/gatsby-node.js file, either below or above the code that’s already in there:

// /web/gatsby-node.js
// Notice the capitalized type names
exports.createResolvers = ({createResolvers}) => {
  const resolvers = {
    SanityCategory: {
      posts: {
        type: ['SanityPost'],
        resolve (source, args, context, info) {
          return context.nodeModel.runQuery({
            type: 'SanityPost',
            query: {
              filter: {
                categories: {
                  elemMatch: {
                    _id: {
                      eq: source._id
                    }
                  }
                }
              }
            }
          })
        }
      }
    }
  }
  createResolvers(resolvers)
}

Now, let’s restart Gatsby’s development server. We should be able to find a new field for posts inside of the sanityCategory and allSanityCategory types.

A GraphQL query for categories with the category title and the titles of the belonging posts

Adding the list of posts to the category template

Now that we have the data we need, we can return to our category page template (/web/src/templates/category.js) and add a list with links to the posts belonging to the category.

// /web/src/templates/category.js
import React from 'react'
import {graphql, Link} from 'gatsby'
import Container from '../components/container'
import GraphQLErrorList from '../components/graphql-error-list'
import SEO from '../components/seo'
import Layout from '../containers/layout'
// Import a function to build the blog URL
import {getBlogUrl} from '../lib/helpers'

// Add “posts” to the GraphQL query
export const query = graphql`
  query CategoryTemplateQuery($id: String!) {
    category: sanityCategory(id: {eq: $id}) {
      title
      description
      posts {
        _id
        title
        publishedAt
        slug {
          current
        }
      }
    }
  }
`
const CategoryPostTemplate = props => {
  const {data = {}, errors} = props
  // Destructure the new posts property from props
  const {title, description, posts} = data.category || {}

  return (
    <Layout>
      <Container>
        {errors && <GraphQLErrorList errors={errors} />}
        {!data.category && <p>No category data</p>}
        <SEO title={title} description={description} />
        <article>
          <h1>Category: {title}</h1>
          <p>{description}</p>
          {/*
            If there are any posts, add the heading,
            with the list of links to the posts
          */}
          {posts && (
            <React.Fragment>
              <h2>Posts</h2>
              <ul>
                { posts.map(post => (
                  <li key={post._id}>
                    <Link to={getBlogUrl(post.publishedAt, post.slug)}>{post.title}</Link>
                  </li>))
                }
              </ul>
            </React.Fragment>)
          }
        </article>
      </Container>
    </Layout>
  )
}

export default CategoryPostTemplate

This code will produce this simple category page with a list of linked posts – just like we wanted!

The category page with the category title and description, as well as a list of its posts

Go make taxonomy pages!

We just completed the process of creating new page types with custom page templates in Gatsby. We covered one of Gatsby’s Node APIs called createResolvers and used it to add a new posts field to the category nodes.

This should give you what you need to make other types of taxonomy pages! Do you have multiple authors on your blog? Well, you can use the same logic to create author pages. The interesting thing with the GraphQL filter is that you can use it to go beyond the explicit relationship made with references. It can also be used to match other fields using regular expressions or string comparisons. It’s fairly flexible!

The post How to Make Taxonomy Pages With Gatsby and Sanity.io appeared first on CSS-Tricks.

How To Make A Speech Synthesis Editor


Knut Melvær

When Steve Jobs unveiled the Macintosh in 1984, it said “Hello” to us from the stage. Even at that point, speech synthesis wasn’t really a new technology: Bell Labs developed the vocoder as early as the late 1930s, and the concept of a voice assistant computer made it into people’s awareness when Stanley Kubrick made the vocoder the voice of HAL 9000 in 2001: A Space Odyssey (1968).

It wasn’t until the introduction of Apple’s Siri, Amazon Echo, and Google Assistant in the mid-2010s that voice interfaces actually found their way into a broader public’s homes, wrists, and pockets. We’re still in an adoption phase, but it seems that these voice assistants are here to stay.

In other words, the web isn’t just passive text on a screen anymore. Web editors and UX designers have to get accustomed to making content and services that should be spoken out loud.

We’re already moving fast towards using content management systems that let us work with our content headlessly and through APIs. The final piece is to make editorial interfaces that make it easier to tailor content for voice. So let’s do just that!

What Is SSML

While web browsers use W3C’s specification for HyperText Markup Language (HTML) to visually render documents, most voice assistants use Speech Synthesis Markup Language (SSML) when generating speech.

A minimal example using the root element <speak>, and the paragraph (<p>) and sentence (<s>) tags:

<speak>
  <p>
    <s>This is the first sentence of the paragraph.</s>
    <s>Here’s another sentence.</s>
  </p>
</speak>

Where SSML gets exciting is when we introduce tags for <emphasis> and <prosody> (pitch):

<speak>
  <p>
    <s>Put some <emphasis level="strong">extra weight on these words</emphasis></s>
    <s>And say <prosody pitch="high" rate="fast">this a bit higher and faster</prosody>!</s>
  </p>
</speak>

SSML has more features, but this is enough to get a feel for the basics. Now, let’s take a closer look at the editor that we will use to make the speech synthesis editing interface.
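
Since SSML is XML, generating it from code is mostly a matter of wrapping strings in the right tags and escaping special characters. As a small sketch of that idea (the escapeXml and toSSML names are made up for this example, and this is not the serializer we end up building), the minimal example from above could be assembled like this:

```javascript
// Escape characters that have a special meaning in XML
function escapeXml (text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Takes an array of paragraphs, where each paragraph is an array of sentences.
// Wraps each sentence in <s>, each paragraph in <p>, and everything in <speak>.
function toSSML (paragraphs) {
  const body = paragraphs
    .map(sentences => {
      const inner = sentences
        .map(sentence => `<s>${escapeXml(sentence)}</s>`)
        .join('')
      return `<p>${inner}</p>`
    })
    .join('')
  return `<speak>${body}</speak>`
}

console.log(toSSML([['This is the first sentence.', 'Here is another & a final one.']]))
// <speak><p><s>This is the first sentence.</s><s>Here is another &amp; a final one.</s></p></speak>
```

Hand-writing these strings gets tedious quickly, which is a good reason to let an editor produce structured data and leave the markup to code.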

The Editor For Portable Text

To make this editor, we’ll use the editor for Portable Text that features in Sanity.io. Portable Text is a JSON specification for rich text editing, that can be serialized into any markup language, such as SSML. This means you can easily use the same text snippet in multiple places using different markup languages.

Sanity.io’s default editor for Portable Text

Installing Sanity

Sanity.io is a platform for structured content that comes with an open-source editing environment built with React.js. It takes two minutes to get it all up and running.

Type npm i -g @sanity/cli && sanity init into your terminal, and follow the instructions. Choose “empty”, when you’re prompted for a project template.

If you don’t want to follow this tutorial and make this editor from scratch, you can also clone this tutorial’s code and follow the instructions in README.md.

When the editor is downloaded, run sanity start in the project folder to start it up. It will start a development server that uses Hot Module Reloading to update changes as you edit its files.

How To Configure Schemas In Sanity Studio

Creating The Editor Files

We’ll start by making a folder called ssml-editor in the /schemas folder. In that folder, we’ll put some empty files:

/ssml-tutorial/schemas/ssml-editor
                        ├── alias.js
                        ├── emphasis.js
                        ├── annotations.js
                        ├── preview.js
                        ├── prosody.js
                        ├── sayAs.js
                        ├── blocksToSSML.js
                        ├── speech.js
                        ├── SSMLeditor.css
                        └── SSMLeditor.js

Now we can add content schemas in these files. Content schemas define the data structure for the rich text, and are what Sanity Studio uses to generate the editorial interface. They are simple JavaScript objects that mostly require just a name and a type.

We can also add a title and a description to make it a bit nicer for editors. For example, this is a schema for a simple text field for a title:

export default {
  name: 'title',
  type: 'string',
  title: 'Title',
  description: 'Titles should be short and descriptive'
}
Sanity Studio with a title field and an editor for Portable Text
The studio with our title field and the default editor

Portable Text is built on the idea of rich text as data. This is powerful because it lets you query your rich text, and convert it into pretty much any markup you want.

It is an array of objects called “blocks” which you can think of as the “paragraphs”. In a block, there is an array of children spans. Each block can have a style and a set of mark definitions, which describe data structures distributed on the children spans.
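
To get a feel for that structure, here is a short sketch that walks an array of blocks and pulls out the plain text. The sample blocks and the toPlainText function are made up for illustration; only the shape (blocks with a style, markDefs, and children spans) follows the description above.

```javascript
// Hypothetical sample content with two "paragraphs"
const blocks = [
  {
    _type: 'block',
    style: 'normal',
    markDefs: [],
    children: [
      { _type: 'span', marks: [], text: 'Rich text ' },
      { _type: 'span', marks: ['strong'], text: 'as data' }
    ]
  },
  {
    _type: 'block',
    style: 'normal',
    markDefs: [],
    children: [{ _type: 'span', marks: [], text: 'Second paragraph.' }]
  }
]

// Join the text of each block's spans, and separate blocks with a blank line
function toPlainText (content) {
  return content
    // Skip anything that isn't a text block, like images or other custom types
    .filter(block => block._type === 'block' && Array.isArray(block.children))
    .map(block => block.children.map(span => span.text).join(''))
    .join('\n\n')
}

console.log(toPlainText(blocks))
```

Converting to SSML follows the same pattern, except that the marks on each span decide which tags get wrapped around its text.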

Sanity.io comes with an editor that can read and write to Portable Text, and is activated by placing the block type inside an array field, like this:

// speech.js
export default {
  name: 'speech',
  type: 'array',
  title: 'SSML Editor',
  of: [
    { type: 'block' }
  ]
}

An array can be of multiple types. For an SSML-editor, those could be blocks for audio files, but that falls outside of the scope of this tutorial.

The last thing we want to do is to add a content type where this editor can be used. Most assistants use a simple content model of “intents” and “fulfillments”:

  • Intents: Usually a list of strings used by the AI model to delineate what the user wants to get done.
  • Fulfillments: This happens when an “intent” is identified. A fulfillment often is, or at least comes with, some sort of response.

So let’s make a simple content type called fulfillment that uses the speech synthesis editor. Make a new file called fulfillment.js and save it in the /schemas folder:

// fulfillment.js
export default {
  name: 'fulfillment',
  type: 'document',
  title: 'Fulfillment',
  fields: [
    {
      name: 'title',
      type: 'string',
      title: 'Title',
      description: 'Titles should be short and descriptive'
    },
    {
      name: 'response',
      type: 'speech'
    }
  ]
}

Save the file, and open schema.js. Add it to your studio like this:

// schema.js
import createSchema from 'part:@sanity/base/schema-creator'
import schemaTypes from 'all:part:@sanity/base/schema-type'
import fulfillment from './fulfillment'
// speech.js lives in the ssml-editor folder we created earlier
import speech from './ssml-editor/speech'

export default createSchema({
  name: 'default',
  types: schemaTypes.concat([
    fulfillment,
    speech,
  ])
})

If you now run sanity start in your command line interface within the project’s root folder, the studio will start up locally, and you’ll be able to add entries for fulfillments. You can keep the studio running while we go on, as it will auto-reload with new changes when you save the files.

Adding SSML To The Editor

By default, the block type will give you a standard editor for visually oriented rich text with heading styles, decorator styles for emphasis and strong, annotations for links, and lists. Now we want to override those with the audial concepts found in SSML.

We begin with defining the different content structures, with helpful descriptions for the editors, that we will add to the block in SSMLeditorSchema.js as configurations for annotations. Those are “emphasis”, “alias”, “prosody”, and “say as”.

Emphasis

We begin with “emphasis”, which controls how much weight is put on the marked text. We define it as a string with a list of predefined values that the user can choose from:

// emphasis.js
export default {
  name: 'emphasis',
  type: 'object',
  title: 'Emphasis',
  description:
    'The strength of the emphasis put on the contained text',
  fields: [
    {
      name: 'level',
      type: 'string',
      options: {
        list: [
          { value: 'strong', title: 'Strong' },
          { value: 'moderate', title: 'Moderate' },
          { value: 'none', title: 'None' },
          { value: 'reduced', title: 'Reduced' }
        ]
      }
    }
  ]
}

Alias

Sometimes the written and the spoken term differ. For instance, you want to use the abbreviation of a phrase in a written text, but have the whole phrase read aloud. For example:

<s>This is a <sub alias="Speech Synthesis Markup Language">SSML</sub> tutorial</s>

The input field for the alias is a simple string:

// alias.js
export default {
  name: 'alias',
  type: 'object',
  title: 'Alias (sub)',
  description:
    'Replaces the contained text for pronunciation. This allows a document to contain both a spoken and written form.',
  fields: [
    {
      name: 'text',
      type: 'string',
      title: 'Replacement text',
    }
  ]
}

Prosody

With the prosody element we can control different aspects of how text should be spoken, like pitch, rate, and volume. The markup for this can look like this:

<s>Say this with an <prosody pitch="x-low">extra low pitch</prosody>, and this <prosody rate="fast" volume="loud">loudly with a fast rate</prosody></s>

This input will have three fields with predefined string options:

// prosody.js
export default {
  name: 'prosody',
  type: 'object',
  title: 'Prosody',
  description: 'Control of the pitch, speaking rate, and volume',
  fields: [
    {
      name: 'pitch',
      type: 'string',
      title: 'Pitch',
      description: 'The baseline pitch for the contained text',
      options: {
        list: [
          { value: 'x-low', title: 'Extra low' },
          { value: 'low', title: 'Low' },
          { value: 'medium', title: 'Medium' },
          { value: 'high', title: 'High' },
          { value: 'x-high', title: 'Extra high' },
          { value: 'default', title: 'Default' }
        ]
      }
    },
    {
      name: 'rate',
      type: 'string',
      title: 'Rate',
      description:
        'A change in the speaking rate for the contained text',
      options: {
        list: [
          { value: 'x-slow', title: 'Extra slow' },
          { value: 'slow', title: 'Slow' },
          { value: 'medium', title: 'Medium' },
          { value: 'fast', title: 'Fast' },
          { value: 'x-fast', title: 'Extra fast' },
          { value: 'default', title: 'Default' }
        ]
      }
    },
    {
      name: 'volume',
      type: 'string',
      title: 'Volume',
      description: 'The volume for the contained text.',
      options: {
        list: [
          { value: 'silent', title: 'Silent' },
          { value: 'x-soft', title: 'Extra soft' },
          { value: 'medium', title: 'Medium' },
          { value: 'loud', title: 'Loud' },
          { value: 'x-loud', title: 'Extra loud' },
          { value: 'default', title: 'Default' }
        ]
      }
    }
  ]
}
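The three prosody fields share the same shape, so the repetition could be reduced with a small helper. This is only a sketch: `optionField` is a hypothetical name, not part of Sanity, and it simply returns the same kind of plain object that is written out by hand above.

```javascript
// Hypothetical helper (not a Sanity API): builds a string field whose
// options are a predefined list of { value, title } pairs.
const optionField = (name, title, description, options) => ({
  name,
  type: 'string',
  title,
  description,
  options: {
    // Object.entries keeps the insertion order of these string keys
    list: Object.entries(options).map(([value, label]) => ({
      value,
      title: label
    }))
  }
})

const pitch = optionField(
  'pitch',
  'Pitch',
  'The baseline pitch for the contained text',
  {
    'x-low': 'Extra low',
    low: 'Low',
    medium: 'Medium',
    high: 'High',
    'x-high': 'Extra high',
    default: 'Default'
  }
)

console.log(pitch.options.list.length) // 6
console.log(pitch.options.list[0].value) // x-low
```

Whether this is worth it is a matter of taste; the explicit version is longer, but easier to scan for someone new to the schema.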

Say As

The last one we want to include is <say-as>. This tag lets us exercise a bit more control over how certain information is pronounced. We can even use it to bleep out words if you need to redact something in voice interfaces. That’s @!%&© useful!

<s>Do I have to <say-as interpret-as="expletive">frakking</say-as> <say-as interpret-as="verbatim">spell</say-as> it out for you!?</s>

The schema for this annotation offers a list of interpretations, plus a format and a level of detail for dates:

// sayAs.js
export default {
  name: 'sayAs',
  type: 'object',
  title: 'Say as...',
  description: 'Lets you indicate information about the type of text construct that is contained within the element. It also helps specify the level of detail for rendering the contained text.',
  fields: [
    {
      name: 'interpretAs',
      type: 'string',
      title: 'Interpret as...',
      options: {
        list: [
          { value: 'cardinal', title: 'Cardinal numbers' },
          {
            value: 'ordinal',
            title: 'Ordinal numbers (1st, 2nd, 3rd...)'
          },
          { value: 'characters', title: 'Spell out characters' },
          { value: 'fraction', title: 'Say numbers as fractions' },
          { value: 'expletive', title: 'Bleep out this word' },
          {
            value: 'unit',
            title: 'Adapt unit to singular or plural'
          },
          {
            value: 'verbatim',
            title: 'Spell out letter by letter (verbatim)'
          },
          { value: 'date', title: 'Say as a date' },
          { value: 'telephone', title: 'Say as a telephone number' }
        ]
      }
    },
    {
      name: 'date',
      type: 'object',
      title: 'Date',
      fields: [
        {
          name: 'format',
          type: 'string',
          description: 'The format attribute is a sequence of date field character codes. Supported field character codes in format are {y, m, d} for year, month, and day (of the month) respectively. If the field code appears once for year, month, or day then the number of digits expected are 4, 2, and 2 respectively. If the field code is repeated then the number of expected digits is the number of times the code is repeated. Fields in the date text may be separated by punctuation and/or spaces.'
        },
        {
          name: 'detail',
          type: 'number',
          validation: Rule =>
            Rule.required()
              .min(0)
              .max(2),
          description: "The detail attribute controls the spoken form of the date. For detail='1' only the day fields and one of month or year fields are required, although both may be supplied"
        }
      ]
    }
  ]
}

Now we can re-export these from an annotations.js file, which makes things a bit tidier.

// annotations.js
export {default as alias} from './alias'
export {default as emphasis} from './emphasis'
export {default as prosody} from './prosody'
export {default as sayAs} from './sayAs'

Now we can import these annotation types into our main schemas:

// schema.js
import createSchema from "part:@sanity/base/schema-creator"
import schemaTypes from "all:part:@sanity/base/schema-type"
import fulfillment from './fulfillment'
import speech from './ssml-editor/speech'
import {
  alias,
  emphasis,
  prosody,
  sayAs
} from './annotations'

export default createSchema({
  name: "default",
  types: schemaTypes.concat([
    fulfillment,
    speech,
    alias,
    emphasis,
    prosody,
    sayAs
  ])
})

Finally, we can now add these to the editor like this:

// speech.js
export default {
  name: 'speech',
  type: 'array',
  title: 'SSML Editor',
  of: [
    {
      type: 'block',
      styles: [],
      lists: [],
      marks: {
        decorators: [],
        annotations: [
          {type: 'alias'},
          {type: 'emphasis'},
          {type: 'prosody'},
          {type: 'sayAs'}
        ]
      }
    }
  ]
}

Notice that we also set styles, lists, and decorators to empty arrays. This disables the defaults (like bold and emphasis) since they don’t make much sense in this specific case.

Customizing The Look And Feel

Now we have the functionality in place, but since we haven’t specified any icons, each annotation will use the default icon, which makes the editor hard to actually use for authors. So let’s fix that!

With the editor for Portable Text it’s possible to inject React components both for the icons and for how the marked text should be rendered. Here, we’ll just let some emoji do the work for us, but you could obviously go far with this, making them dynamic and so on. For prosody we’ll even make the icon change depending on the volume selected. Note that I omitted the fields in these snippets for brevity; you shouldn’t remove them in your local files.

// alias.js
import React from 'react'

export default {
  name: 'alias',
  type: 'object',
  title: 'Alias (sub)',
  description: 'Replaces the contained text for pronunciation. This allows a document to contain both a spoken and written form.',
  fields: [
    /* all the fields */
  ],
  blockEditor: {
    icon: () => '🔤',
    render: ({ children }) => <span>{children} 🔤</span>,
  },
};

// emphasis.js
import React from 'react'

export default {
  name: 'emphasis',
  type: 'object',
  title: 'Emphasis',
  description: 'The strength of the emphasis put on the contained text',
  fields: [
    /* all the fields */
  ],
  blockEditor: {
    icon: () => '🗯',
    render: ({ children }) => <span>{children} 🗯</span>,
  },
};

// prosody.js
import React from 'react'

export default {
  name: 'prosody',
  type: 'object',
  title: 'Prosody',
  description: 'Control of the pitch, speaking rate, and volume',
  fields: [
    /* all the fields */
  ],
  blockEditor: {
    icon: () => '🔊',
    render: ({ children, volume }) => (
      <span>
        {children} {['x-loud', 'loud'].includes(volume) ? '🔊' : '🔈'}
      </span>
    ),
  },
};

// sayAs.js
import React from 'react'

export default {
  name: 'sayAs',
  type: 'object',
  title: 'Say as...',
  description: 'Lets you indicate information about the type of text construct that is contained within the element. It also helps specify the level of detail for rendering the contained text.',
  fields: [
    /* all the fields */
  ],
  blockEditor: {
    icon: () => '🗣',
    render: props => <span>{props.children} 🗣</span>,
  },
};
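The volume-dependent icon in prosody.js boils down to one small decision, which can be pulled out and checked on its own. In this sketch, `volumeIcon` and `LOUD_VOLUMES` are hypothetical names used for illustration only:

```javascript
// Hypothetical helper: picks the speaker emoji from the selected volume.
const LOUD_VOLUMES = ['x-loud', 'loud']

const volumeIcon = volume => (LOUD_VOLUMES.includes(volume) ? '🔊' : '🔈')

console.log(volumeIcon('loud')) // 🔊
console.log(volumeIcon('x-soft')) // 🔈
console.log(volumeIcon(undefined)) // 🔈 (no volume chosen yet)
```

Every value that isn’t explicitly loud, including a missing one, falls back to the quieter icon, which matches the ternary in the render function above.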

The editor with our custom SSML marks

Now you have an editor for editing text that can be used by voice assistants. But wouldn’t it be kinda useful if editors could also preview what the text will actually sound like?

Adding A Preview Button Using Google’s Text-to-Speech

Native speech synthesis support is actually on its way for browsers. But in this tutorial, we’ll use Google’s Text-to-Speech API which supports SSML. Building this preview functionality will also be a demonstration of how you serialize Portable Text into SSML in whatever service you want to use this for.

Wrapping The Editor In A React Component

We begin by opening the SSMLeditor.js file and adding the following code:

// SSMLeditor.js
import React, { Fragment } from 'react';
import { BlockEditor } from 'part:@sanity/form-builder';

export default function SSMLeditor(props) {
  return (
    <Fragment>
      <BlockEditor {...props} />
    </Fragment>
  );
}

We have now wrapped the editor in our own React component. All the props it needs, including the data it contains, are passed down in real-time. To actually use this component, you have to import it into your speech.js file:

// speech.js
import React from 'react'
import SSMLeditor from './SSMLeditor.js'

export default {
  name: 'speech',
  type: 'array',
  title: 'SSML Editor',
  inputComponent: SSMLeditor,
  of: [
    {
      type: 'block',
      styles: [],
      lists: [],
      marks: {
        decorators: [],
        annotations: [
          { type: 'alias' },
          { type: 'emphasis' },
          { type: 'prosody' },
          { type: 'sayAs' },
        ],
      },
    },
  ],
}

When you save this and the studio reloads, it should look pretty much exactly the same, but that’s because we haven’t started tweaking the editor yet.

Convert Portable Text To SSML

The editor will save the content as Portable Text, an array of objects in JSON that makes it easy to convert rich text into whatever format you need it to be. When you convert Portable Text into another syntax or format, we call that “serialization”. Hence, “serializers” are the recipes for how the rich text should be converted. In this section, we will add serializers for speech synthesis.

You have already made the blocksToSSML.js file. Now we’ll need to add our first dependency. Begin by running the terminal command npm init -y inside the ssml-editor folder. This will add a package.json where the editor’s dependencies will be listed.

Once that’s done, you can run npm install @sanity/block-content-to-html to get a library that makes it easier to serialize Portable Text. We’re using the HTML-library because SSML has the same XML syntax with tags and attributes.

This is a bunch of code, so do feel free to copy-paste it. I’ll explain the pattern right below the snippet:

// blocksToSSML.js
import blocksToHTML, { h } from '@sanity/block-content-to-html'

const serializers = {
  marks: {
    prosody: ({ children, mark: { rate, pitch, volume } }) =>
      h('prosody', { attrs: { rate, pitch, volume } }, children),
    alias: ({ children, mark: { text } }) =>
      h('sub', { attrs: { alias: text } }, children),
    sayAs: ({ children, mark: { interpretAs } }) =>
      h('say-as', { attrs: { 'interpret-as': interpretAs } }, children),
    break: ({ children, mark: { time, strength } }) =>
      h('break', { attrs: { time: `${time}ms`, strength } }, children),
    emphasis: ({ children, mark: { level } }) =>
      h('emphasis', { attrs: { level } }, children)
  }
}

export const blocksToSSML = blocks => blocksToHTML({ blocks, serializers })

This code exports a function that takes the array of blocks and loops through them. Whenever a block contains a mark, it looks for a serializer for that type. If you have marked some text to have emphasis, it will use this function from the serializers object:

emphasis: ({ children, mark: { level } }) =>
      h('emphasis', { attrs: { level } }, children)

Maybe you recognize the parameter from where we defined the schema? The h() function lets us define an HTML element; here we “cheat” and make it return an SSML element called <emphasis>. We also give it the attribute level if that is defined, and place the children elements within it, which in most cases will be the text you have marked up with emphasis.

{
    "_type": "block",
    "_key": "f2c4cf1ab4e0",
    "style": "normal",
    "markDefs": [
        {
            "_type": "emphasis",
            "_key": "99b28ed3fa58",
            "level": "strong"
        }
    ],
    "children": [
        {
            "_type": "span",
            "_key": "f2c4cf1ab4e01",
            "text": "Say this strongly!",
            "marks": [
                "99b28ed3fa58"
            ]
        }
    ]
}

That is how the above structure in Portable Text gets serialized to this SSML:

<emphasis level="strong">Say this strongly!</emphasis>
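To make the lookup from a span’s marks to its markDefs entry more concrete, here is a dependency-free sketch of the same idea. It is not how @sanity/block-content-to-html is implemented; `tags`, `spanToSSML`, and `blockToSSML` are hypothetical names, and only annotation marks are handled.

```javascript
// Simplified stand-in for what a serializer library does with marks.
// Assumption: only annotations (entries in markDefs) are handled.
const tags = {
  emphasis: ({ level }) => ({ name: 'emphasis', attrs: { level } }),
  alias: ({ text }) => ({ name: 'sub', attrs: { alias: text } })
}

const escapeXml = str =>
  String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')

// Skip attributes the editor left undefined
const renderAttrs = attrs =>
  Object.entries(attrs)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => ` ${key}="${escapeXml(value)}"`)
    .join('')

// Wrap the span text in one element per mark key found in markDefs
const spanToSSML = (span, markDefs) =>
  (span.marks || []).reduce((inner, markKey) => {
    const def = markDefs.find(d => d._key === markKey)
    const toTag = def && tags[def._type]
    if (!toTag) return inner // unknown mark: leave the text untouched
    const { name, attrs } = toTag(def)
    return `<${name}${renderAttrs(attrs)}>${inner}</${name}>`
  }, escapeXml(span.text))

const blockToSSML = block =>
  block.children
    .map(span => spanToSSML(span, block.markDefs || []))
    .join('')

const block = {
  _type: 'block',
  markDefs: [{ _type: 'emphasis', _key: '99b28ed3fa58', level: 'strong' }],
  children: [
    { _type: 'span', text: 'Say this strongly!', marks: ['99b28ed3fa58'] }
  ]
}

console.log(blockToSSML(block))
// <emphasis level="strong">Say this strongly!</emphasis>
```

The key in the span’s marks array is what ties the text to the annotation data, which is why the same annotation type can appear several times in one block with different values.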

If you want support for more SSML tags, you can add more annotations in the schema, and add the annotation types to the marks section in the serializers.
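The same goes for attributes. The sayAs serializer above only forwards interpret-as, while the schema also collects a format and a detail level for dates. A sketch of how those could be mapped, assuming the nested date object from sayAs.js and a hypothetical helper named `sayAsAttrs`:

```javascript
// Hypothetical helper: maps a sayAs annotation to <say-as> attributes.
// Assumption: the nested `date` object only matters when interpretAs is 'date'.
const sayAsAttrs = ({ interpretAs, date }) => {
  const { format, detail } = date || {}
  const attrs = { 'interpret-as': interpretAs }
  if (interpretAs === 'date') {
    if (format) attrs.format = format
    if (detail !== undefined) attrs.detail = String(detail)
  }
  return attrs
}

console.log(
  sayAsAttrs({ interpretAs: 'date', date: { format: 'yyyymmdd', detail: 1 } })
)
// { 'interpret-as': 'date', format: 'yyyymmdd', detail: '1' }

console.log(sayAsAttrs({ interpretAs: 'expletive' }))
// { 'interpret-as': 'expletive' }
```

In the serializers object, this could then be used as h('say-as', { attrs: sayAsAttrs(mark) }, children).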

Now we have a function that returns SSML markup from our marked up rich text. The last part is to make a button that lets us send this markup to a text-to-speech service.

Adding A Preview Button That Speaks Back To You

Ideally, we should have used the browser’s speech synthesis capabilities in the Web API. That way, we would have gotten away with less code and dependencies.

As of early 2019, however, native browser support for speech synthesis is still in its early stages. It looks like support for SSML is on the way, and there are proofs of concept of client-side JavaScript implementations for it.

Chances are that you are going to use this content with a voice assistant anyway. Both Google Assistant and Amazon Echo (Alexa) support SSML as responses in a fulfillment. In this tutorial, we will use Google’s text-to-speech API, which also sounds good and supports several languages.

Start by obtaining an API key by signing up for Google Cloud Platform (it will be free for the first 1 million characters you process). Once you’re signed up, you can make a new API key on this page.

Now you can open your PreviewButton.js file, and add this code to it:

// PreviewButton.js
import React from 'react'
import Button from 'part:@sanity/components/buttons/default'
import { blocksToSSML } from './blocksToSSML'

// You should be careful with sharing this key
// I put it here to keep the code simple
const API_KEY = '<yourAPIkey>'
const GOOGLE_TEXT_TO_SPEECH_URL = 'https://texttospeech.googleapis.com/v1beta1/text:synthesize?key=' + API_KEY

const speak = async blocks => {
  // Serialize blocks to SSML
  const ssml = blocksToSSML(blocks)
  // Prepare the Google Text-to-Speech configuration
  const body = JSON.stringify({
    input: { ssml },
    // Select the language code and voice name (A-F)
    voice: { languageCode: 'en-US', name: 'en-US-Wavenet-A' },
    // Use MP3 in order to play in browser
    audioConfig: { audioEncoding: 'MP3' }
  })
  // Send the SSML string to the API
  const res = await fetch(GOOGLE_TEXT_TO_SPEECH_URL, {
    method: 'POST',
    body
  }).then(res => res.json())
  // Play the returned audio with the browser’s Audio API
  const audio = new Audio('data:audio/mpeg;base64,' + res.audioContent)
  audio.play()
}

export default function PreviewButton (props) {
  return <Button style={{ marginTop: '1em' }} onClick={() => speak(props.blocks)}>Speak text</Button>
}

I’ve kept this preview button code to a minimum to make it easier to follow this tutorial. Of course, you could build it out by adding state to show if the preview is processing or make it possible to preview with the different voices that Google’s API supports.
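As a starting point for previewing with different voices, the request body could be built by a separate function that takes the voice as an option. This is a sketch under assumptions: `buildSynthesisBody` is a hypothetical name, and the voice names you pass in have to be ones the API actually offers.

```javascript
// Hypothetical helper: builds the JSON body for the synthesize request,
// so the voice can be swapped without touching the fetch call.
const buildSynthesisBody = (
  ssml,
  { languageCode = 'en-US', name = 'en-US-Wavenet-A' } = {}
) =>
  JSON.stringify({
    input: { ssml },
    voice: { languageCode, name },
    // MP3 so the result can be played in the browser
    audioConfig: { audioEncoding: 'MP3' }
  })

const body = buildSynthesisBody('<emphasis level="strong">Hi</emphasis>', {
  name: 'en-US-Wavenet-D'
})

console.log(JSON.parse(body).voice.name) // en-US-Wavenet-D
console.log(JSON.parse(body).voice.languageCode) // en-US
```

The speak function could then accept a voice argument and pass it along, with the button rendering a select for the voices you want to offer.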

Add the button to SSMLeditor.js:

// SSMLeditor.js
import React, { Fragment } from 'react';
import { BlockEditor } from 'part:@sanity/form-builder';
import PreviewButton from './PreviewButton';

export default function SSMLeditor(props) {
  return (
    <Fragment>
      <BlockEditor {...props} />
      <PreviewButton blocks={props.value} />
    </Fragment>
  );
}

Now you should be able to mark up your text with the different annotations, and hear the result when pushing “Speak text”. Cool, isn’t it?

You’ve Created A Speech Synthesis Editor, And Now What?

If you have followed this tutorial, you have been through how you can use the editor for Portable Text in Sanity Studio to make custom annotations and customize the editor. You can use these skills for all sorts of things, not only to make a speech synthesis editor. You have also been through how to serialize Portable Text into the syntax you need. Obviously, this is also handy if you’re building frontends in React or Vue. You can even use these skills to generate Markdown from Portable Text.

We haven’t covered how you actually use this together with a voice assistant. If you want to try, you can use much of the same logic as with the preview button in a serverless function, and set it as the API endpoint for a fulfillment using webhooks, e.g. with Dialogflow.

If you’d like me to write a tutorial on how to use the speech synthesis editor with a voice assistant, feel free to give me a hint on Twitter or share in the comments section below.


Smashing Editorial (dm, ra, yk, il)