Many sites with lots of written content employ specially crafted print style sheets. That way, a user can print out the relevant content without wasting paper on navigation, ads, or anything else not germane.
I thought I would share how I went about creating Articulate.js, in the hope that it gives you ideas for applying this technology in different ways.
Articulate.js uses the Speech Synthesis interface of the Web Speech API. It is currently supported in all the major browsers, including the latest versions of Edge, Safari, Chrome, Opera, Firefox, iOS Safari, and Chrome for Android.
There are two window objects of the Speech Synthesis interface that are used to enable the browser to speak: SpeechSynthesisUtterance and SpeechSynthesis.
The first step is to create an instance of the SpeechSynthesisUtterance object and designate the text you would like spoken. If desired, you can set additional properties such as the rate, pitch, volume, and voice.
To begin speaking, this object is passed as a parameter to the SpeechSynthesis.speak() method. Other playback functions, such as pausing, resuming, and cancelling, are all methods of the SpeechSynthesis object. A useful demo from Microsoft lets you play around with these features.
At the end of this article, there are many resources listed that walk you through the intricacies of this functionality and provide additional examples.
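As a minimal sketch of how the two objects work together, the helper below builds an utterance, applies any optional settings, and hands it to speak(). The name speakText and its options parameter are hypothetical; the synth and Utterance arguments exist only so the sketch can be exercised where no speech engine is present, and in a page they fall back to the browser's own objects.

```javascript
// Hypothetical helper: speaks `text` using the Web Speech API.
// `synth` and `Utterance` default to the browser globals, but stand-ins
// can be passed when no speech engine is available.
function speakText(text, options = {}, synth, Utterance) {
  synth = synth ||
    (typeof speechSynthesis !== 'undefined' ? speechSynthesis : null);
  Utterance = Utterance ||
    (typeof SpeechSynthesisUtterance !== 'undefined' ? SpeechSynthesisUtterance : null);

  // Bail out quietly where speech synthesis is not supported.
  if (!synth || !Utterance) {
    return null;
  }

  const utterance = new Utterance(text);
  if (options.rate !== undefined) utterance.rate = options.rate;
  if (options.pitch !== undefined) utterance.pitch = options.pitch;
  if (options.volume !== undefined) utterance.volume = options.volume;

  synth.speak(utterance);
  return utterance;
}

// In a browser:
//   speakText('Hello, world.', { rate: 1.1 });
//   speechSynthesis.pause();   // pause playback
//   speechSynthesis.resume();  // continue from where it paused
//   speechSynthesis.cancel();  // stop and clear anything queued
```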
The lightweight Articulate.js plugin allows you to leverage the powerful selector options of jQuery to specify which parts of the website to speak. For example, depending on how the page is organized, a single line of code, like the following, can direct the browser to speak the entire contents of an article or blog post:
Here’s an example that targets only the primary headers and paragraphs:
$('h1, h2, p').articulate('speak');
Internally, Articulate.js clones the matched set of elements and all their descendant elements and text nodes. It then parses this clone using a default set of rules, deciding what should be spoken and ignored, then adding the appropriate pauses to make everything sound more like a narrative.
These are the basic methods along with a CodePen example:
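| Function call | Description |
|---|---|
| $(selector).articulate('speak'); | Speaks aloud the specified DOM element(s) and their descendants |
| $().articulate('pause'); | Pauses the speaking |
| $().articulate('resume'); | Resumes the speaking after it has been paused |
| $().articulate('stop'); | Stops the speaking permanently |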
You can only have one SpeechSynthesisUtterance instance active at a time, which is why a jQuery selector is not needed for pausing, resuming, or stopping. As mentioned before, these methods act upon the SpeechSynthesis object.
Also, the browser will only stop speaking when there’s no more text to be read or when a “stop” call is executed. If the speaking is paused, it must be resumed or stopped before anything else can be spoken.
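To illustrate the rule about paused speech, here is a rough sketch of a single play/pause button built on the paused and speaking flags of the SpeechSynthesis object. The function name togglePlayback and the startSpeaking callback are hypothetical and not part of Articulate.js.

```javascript
// Hypothetical helper: decides what one play/pause button should do.
// `synth` is a SpeechSynthesis object (window.speechSynthesis in a page);
// `startSpeaking` is a callback that kicks off speech from scratch.
function togglePlayback(synth, startSpeaking) {
  // Check `paused` first: paused speech has to be resumed (or stopped)
  // before anything else can be spoken.
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  // Nothing is in progress, so begin speaking.
  startSpeaking();
  return 'started';
}

// In a browser, assuming the content lives in an <article> element:
//   togglePlayback(window.speechSynthesis, function () {
//     $('article').articulate('speak');
//   });
```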
The rate, pitch, and volume can also be adjusted. Tying these settings to input sliders gives the user some added control. While the system default rate is 1, after much testing, I bumped it up slightly to 1.1 as that seemed to provide a more natural speaking speed. That’s subjective, of course, and can be overridden.
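| Function call | Description | Default | Range |
|---|---|---|---|
| $().articulate('rate', num); | Sets the rate of the speaking voice | 1.1 | [0.1 – 10] |
| $().articulate('pitch', num); | Sets the pitch of the speaking voice | 1.0 | [0 – 2] |
| $().articulate('volume', num); | Sets the volume of the speaking voice | 1.0 | [0 – 1] |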
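If you do wire these settings to sliders, it helps to keep user input inside the documented ranges. The sketch below is one way to do that; the SETTINGS table and the normalizeSetting name are hypothetical, with the defaults and ranges copied from the values listed above.

```javascript
// Defaults and ranges as listed above.
const SETTINGS = {
  rate:   { min: 0.1, max: 10, fallback: 1.1 },
  pitch:  { min: 0,   max: 2,  fallback: 1.0 },
  volume: { min: 0,   max: 1,  fallback: 1.0 }
};

// Hypothetical helper: turns raw slider input (usually a string) into a
// number that stays inside the allowed range for the given setting.
function normalizeSetting(name, rawValue) {
  const range = SETTINGS[name];
  if (!range) {
    throw new Error('Unknown setting: ' + name);
  }
  const value = parseFloat(rawValue);
  if (Number.isNaN(value)) {
    return range.fallback; // unusable input falls back to the default
  }
  return Math.min(range.max, Math.max(range.min, value));
}

// In a page, assuming a slider with the (made-up) id "rate" and a setter
// of the form articulate('rate', value):
//   $('#rate').on('input', function () {
//     $().articulate('rate', normalizeSetting('rate', this.value));
//   });
```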
There are many more options available to the developer, but let’s talk for a moment about what happens under the hood when Articulate.js is asked to speak something on the page.
The Speech Synthesis interface that Articulate.js leverages will read aloud, in a most literal fashion, any string of text it’s provided. Some symbols it will enunciate (e.g. it will say “percent” when it encounters “%”); others, it will ignore (e.g. the quote symbol is left unspoken). Its cadence is dictated primarily by commas, which elicit a small pause, and periods, whose pause is slightly longer.
With that in mind, quite a bit of manipulation is needed to prepare a web page for speaking. Unfortunately, one simply can’t concatenate all the selected text nodes in the DOM as that would result in a lot of run-on text (e.g. lists), include content that isn’t appropriate for reading aloud in a coherent fashion (e.g. tables), and ignore items that should be described (e.g. images).
Articulate.js handles this by applying, among others, the following rules to the DOM elements specified in the jQuery selector:
- Delete HTML tags that may contain text nodes but shouldn’t be spoken, such as <s>. A list of 21 tags is designated to be ignored by default.
- Find instances of <br> tags and append either a period or a comma to each. This ensures that a pause occurs when spoken, since these elements are often visually represented without punctuation.
- Insert descriptive text gathered from the alt attributes of images, <caption> tags from tables, and <figcaption> tags from figures.
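The rules above can be approximated in a few lines. Articulate.js itself works on a cloned set of DOM elements through jQuery; the sketch below instead applies regular expressions to an HTML string so that it can run anywhere, which is a simplification and not the plugin's actual code. The function name applyElementRules and the short IGNORED_TAGS list are assumptions made for illustration.

```javascript
// A small, made-up subset of tags to drop. Articulate.js ignores 21 tags
// by default; this list is only an illustration.
const IGNORED_TAGS = ['s', 'script', 'style'];

function applyElementRules(html) {
  let out = html;

  // 1. Remove ignored elements together with their contents.
  for (const tag of IGNORED_TAGS) {
    const pattern = new RegExp(
      '<' + tag + '\\b[^>]*>[\\s\\S]*?</' + tag + '>', 'gi');
    out = out.replace(pattern, '');
  }

  // 2. Follow every <br> with a period so the voice pauses there.
  out = out.replace(/<br\s*\/?>/gi, '$&. ');

  // 3. Swap each image for the text of its alt attribute, if it has one.
  out = out.replace(/<img\b[^>]*>/gi, function (tag) {
    const alt = /\balt="([^"]*)"/i.exec(tag);
    return alt && alt[1] ? ' ' + alt[1] + '. ' : '';
  });

  return out;
}
```

Captions from tables and figures would be handled along the same lines as the image step.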
After this is completed, what’s left is converted to a long text string that now requires further manipulation, including:
- Find <q> tags and pairs of smart quotes and insert the text “quote” at the start and “unquote” at the end to distinguish them when spoken.
- Add starting and ending text to designate lists and block quotes.
- Find em dashes and replace each with a comma to elicit a short pause.
- Remove remaining HTML tags and comments.
- Remove remaining line breaks and carriage returns as well as lingering HTML special characters.
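Several of these string-level steps map naturally onto a chain of replacements. The following is a simplified sketch under that assumption, not the plugin's own implementation; the name prepareForSpeech is hypothetical, and list and block quote markers are left out for brevity.

```javascript
// Hypothetical helper: cleans up a chunk of markup so that it can be
// handed to the speech engine as plain text.
function prepareForSpeech(text) {
  return text
    // <q> elements and pairs of smart quotes get spoken markers.
    .replace(/<q\b[^>]*>([\s\S]*?)<\/q>/gi, ' quote, $1, unquote ')
    .replace(/\u201C([^\u201D]*)\u201D/g, ' quote, $1, unquote ')
    // Em dashes (U+2014) become commas to produce a short pause.
    .replace(/\u2014/g, ', ')
    // Drop HTML comments first, then any remaining tags.
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<[^>]+>/g, ' ')
    // Line breaks and carriage returns become spaces.
    .replace(/[\r\n]+/g, ' ')
    // Remove lingering HTML entities such as &nbsp; or &#160;.
    .replace(/&[a-z]+;|&#\d+;/gi, ' ')
    // Collapse runs of whitespace.
    .replace(/\s+/g, ' ')
    .trim();
}
```

Stray punctuation such as " , " can remain in the result, which is harmless given how the speech engine treats grouped commas and periods.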
At this point, the string of text is ready to be sent to the Speech Synthesis interface to be spoken by the browser. If you were to look at this string, you would see instances of multiple periods, commas, and spaces. That’s OK, as they won’t affect how it sounds: one or more commas or periods grouped together won’t create even longer pauses.
Beyond these default rules, Articulate.js lets developers customize what gets spoken. For example, you can:
- Specify HTML tags to be spoken that would otherwise be ignored, and vice versa.
- Perform a search and replace within the text, which is helpful for abbreviations. For example, you can specify that all instances of “i.e.” be spoken as “that is”.
- Specify blocks of text to be ignored. For example, a sentence that reads “click here for more information” does not need to be spoken.
- Specify words to be spelled out.
- Specify copy in specially crafted comment tags to be spoken that is otherwise hidden on the screen.
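To give a feel for two of these customizations, here is a small sketch of a search-and-replace pass and a spell-out pass over the text. These are stand-alone illustrations with hypothetical names (replaceAbbreviations, spellOut), not the functions Articulate.js exposes; see its documentation for the actual options.

```javascript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replaces every occurrence of each key with its spoken alternative,
// e.g. { 'i.e.': 'that is' }.
function replaceAbbreviations(text, replacements) {
  let out = text;
  for (const [find, speakAs] of Object.entries(replacements)) {
    out = out.replace(new RegExp(escapeRegExp(find), 'gi'), speakAs);
  }
  return out;
}

// Spells a word out character by character, using commas to force a
// short pause between the characters.
function spellOut(text, word) {
  const spelled = word.split('').join(', ');
  const pattern = new RegExp('\\b' + escapeRegExp(word) + '\\b', 'g');
  return text.replace(pattern, spelled);
}
```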
You will notice that the Speech Synthesis interface is subtly different across browsers and operating systems. For example, the default rate of speech will sound somewhat faster on an iPhone than in desktop implementations. Developers can provide input sliders or radio buttons for users to fine-tune their experience.
In addition, depending on the operating system and device, browsers expose different voices to the Speech Synthesis interface. As seen in the demo from Microsoft mentioned earlier, these voices can be selected to override the default “native” voice. But, for simplicity’s sake, Articulate.js only uses the default voice; later versions will allow that parameter to be modified as well.
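For readers who want to experiment with voices directly through the Speech Synthesis interface, a rough sketch follows. It works with the raw browser API, not with Articulate.js, and the helper name pickVoice is hypothetical.

```javascript
// Hypothetical helper: returns the first voice whose language tag starts
// with the requested prefix (for example 'en' or 'fr-CA'), or null when
// none of the available voices match.
function pickVoice(voices, langPrefix) {
  const wanted = langPrefix.toLowerCase();
  const match = voices.find(function (voice) {
    return voice.lang.toLowerCase().startsWith(wanted);
  });
  return match || null;
}

// In a browser (the voice list may load asynchronously, so it can be
// empty until the `voiceschanged` event has fired):
//   const utterance = new SpeechSynthesisUtterance('Bonjour');
//   const voice = pickVoice(speechSynthesis.getVoices(), 'fr');
//   if (voice) utterance.voice = voice;
//   speechSynthesis.speak(utterance);
```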
The inspiration for Articulate.js came from the idea that with a simple click, I can enjoy having articles read to me when it’s not convenient or desirable to be staring at a screen — particularly when using my phone. Maybe when lying in the park with my eyes closed or while I’m preoccupied with preparing dinner. The goal was to allow developers to make the appropriate customizations so that it sounds less like a screen reader and more like a friend is reading the web page to you.
Articulate.js can be used as a voice option for anything on the web page, from enunciating a single word to conveying content not displayed on the screen. If you’re interested, download the source code and experiment. And, most importantly, have fun with it!
The commented source code and minified versions of Articulate.js can be downloaded from its GitHub home. Full documentation can be found there as well.