{"id":298334,"date":"2019-11-12T07:47:14","date_gmt":"2019-11-12T14:47:14","guid":{"rendered":"https:\/\/css-tricks.com\/?p=298334"},"modified":"2021-01-27T07:06:11","modified_gmt":"2021-01-27T15:06:11","slug":"making-an-audio-waveform-visualizer-with-vanilla-javascript","status":"publish","type":"post","link":"https:\/\/css-tricks.com\/making-an-audio-waveform-visualizer-with-vanilla-javascript\/","title":{"rendered":"Making an Audio Waveform Visualizer with Vanilla JavaScript"},"content":{"rendered":"\n

As a UI designer, I'm constantly reminded of the value of knowing how to code. I pride myself on thinking of the developers on my team while designing user interfaces. But sometimes, I step on a technical landmine.

A few years ago, as the design director of wsj.com, I was helping to redesign the Wall Street Journal's podcast directory. One of the designers on the project was working on the podcast player, and I came upon Megaphone's embedded player.

\"\"<\/figure>\n\n\n\n

I previously worked at SoundCloud and knew that these kinds of visualizations were useful for users who skip through audio. I wondered if we could achieve a similar look for the player on the Wall Street Journal's site.

The answer from engineering: definitely not. Given the timelines and constraints, it wasn't a possibility for that project. We eventually shipped the redesigned pages with a much simpler podcast player.

\"\"<\/figure>\n\n\n\n

But I was hooked on the problem. Over nights and weekends, I hacked away trying to achieve this effect. I learned a lot about how audio works on the web, and was ultimately able to achieve the look with fewer than 100 lines of JavaScript!

It turns out that this example is a perfect way to get acquainted with the Web Audio API, and how to visualize audio data using the Canvas API.

### But first, a lesson in how digital audio works

In the real, analog world, sound is a wave. As sound travels from a source (like a speaker) to your ears, it compresses and decompresses air in a pattern that your ears and brain hear as music, speech, a dog's bark, and so on.

\"\"
An analog sound wave is a smooth, continuous function.<\/figcaption><\/figure>\n\n\n\n

But in a computer's world of electronic signals, sound isn't a wave. To turn a smooth, continuous wave into data it can store, a computer does something called *sampling*. Sampling means measuring the sound waves hitting a microphone thousands of times every second, then storing those data points. When playing back audio, your computer reverses the process: it recreates the sound, one tiny split-second of audio at a time.

\"\"
A digital sound file is made up of tiny slices of the original audio, roughly re-creating the smooth continuous wave.<\/figcaption><\/figure>\n\n\n\n

The number of data points in a sound file depends on its *sample rate*. You might have seen this number before; the typical sample rate for mp3 files is 44.1 kHz. This means that, for every second of audio, there are 44,100 individual data points. For stereo files, there are 88,200 every second: 44,100 for the left channel and 44,100 for the right. That means a 30-minute podcast has 158,760,000 individual data points describing the audio!
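That last figure is easy to sanity-check with a line of arithmetic (the constants below simply restate the numbers above):

```js
// 44,100 samples per second, times 2 channels, times 30 minutes of audio
const dataPoints = 44100 * 2 * 30 * 60;
console.log(dataPoints); // 158760000
```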

### How can a web page read an mp3?

Over the past nine years, the W3C (the folks who help maintain web standards) has developed the Web Audio API to help web developers work with audio. The Web Audio API is a very deep topic; we'll hardly crack the surface in this essay. But it all starts with something called the AudioContext.

Think of the AudioContext like a sandbox for working with audio. We can initialize it with a few lines of JavaScript:

```js
// Set up audio context
window.AudioContext = window.AudioContext || window.webkitAudioContext;
const audioContext = new AudioContext();
let currentBuffer = null;
```

The first line after the comment is necessary because Safari has implemented AudioContext as `webkitAudioContext`.

Next, we need to give our new `audioContext` the mp3 file we'd like to visualize. Let's fetch it using… `fetch()`!

```js
const visualizeAudio = url => {
  fetch(url)
    .then(response => response.arrayBuffer())
    .then(arrayBuffer => audioContext.decodeAudioData(arrayBuffer))
    .then(audioBuffer => visualize(audioBuffer));
};
```

This function takes a URL, fetches it, then transforms the `Response` object a few times. (An async/await sketch of the same pipeline follows the list below.)

- First, it calls the `arrayBuffer()` method, which returns — you guessed it — an `ArrayBuffer`! An `ArrayBuffer` is just a container for binary data; it's an efficient way to move lots of data around in JavaScript.
- We then send the `ArrayBuffer` to our `audioContext` via the `decodeAudioData()` method. `decodeAudioData()` takes an `ArrayBuffer` and returns an `AudioBuffer`, which is a specialized `ArrayBuffer` for reading audio data. Did you know that browsers came with all these convenient objects? I definitely did not when I started this project.
- Finally, we send our `AudioBuffer` off to be visualized.
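If you prefer async/await, the same fetch-and-decode pipeline could look roughly like this. This is a sketch, not from the original article; it assumes the same `audioContext` and `visualize()` as above:

```js
const visualizeAudio = async url => {
  const response = await fetch(url);                                    // fetch the mp3
  const arrayBuffer = await response.arrayBuffer();                     // read it as binary data
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);  // decode it into an AudioBuffer
  visualize(audioBuffer);                                               // hand it off to be visualized
};
```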

### Filtering the data

To visualize our `AudioBuffer`, we need to reduce the amount of data we're working with. Like I mentioned before, we started off with millions of data points, but we'll have far fewer in our final visualization.

First, let's limit the *channels* we are working with. A channel represents the audio sent to an individual speaker. In stereo sound, there are two channels; in 5.1 surround sound, there are six. `AudioBuffer` has a built-in method to do this: `getChannelData()`. Call `audioBuffer.getChannelData(0)`, and we'll be left with one channel's worth of data.
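As a quick illustration (assuming an `audioBuffer` decoded as above), the returned channel data is just a `Float32Array` of raw sample values:

```js
// Inspect the decoded audio (illustrative only)
const rawData = audioBuffer.getChannelData(0); // Float32Array of samples, nominally between -1 and 1
console.log(audioBuffer.numberOfChannels, audioBuffer.sampleRate, rawData.length);
```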

Next, the hard part: loop through the channel's data and select a smaller set of data points. There are a few ways we could go about this. Let's say I want my final visualization to have 70 bars; I can divide up the audio data into 70 equal parts and look at a data point from each one.

```js
const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // Number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    filteredData.push(rawData[i * blockSize]);
  }
  return filteredData;
};
```
    \"\"<\/figure>\n\n\n\n
    \"\"
    This was the first approach I took. To get an idea of what the filtered data looks like, I put the result into a spreadsheet and charted it.<\/figcaption><\/figure>\n\n\n\n

The output caught me off guard! It doesn't look like the visualization we're emulating at all. There are lots of data points that are close to, or at, zero. But that makes a lot of sense: in a podcast, there is a lot of silence between words and sentences. By only looking at the first sample in each of our blocks, it's highly likely that we'll catch a very quiet moment.

Let's modify the algorithm to find the *average* of the samples. And while we're at it, we should take the absolute value of our data so that it's all positive.

```js
const filterData = audioBuffer => {
  const rawData = audioBuffer.getChannelData(0); // We only need to work with one channel of data
  const samples = 70; // Number of samples we want to have in our final data set
  const blockSize = Math.floor(rawData.length / samples); // Number of samples in each subdivision
  const filteredData = [];
  for (let i = 0; i < samples; i++) {
    let blockStart = blockSize * i; // the location of the first sample in the block
    let sum = 0;
    for (let j = 0; j < blockSize; j++) {
      sum = sum + Math.abs(rawData[blockStart + j]); // find the sum of all the samples in the block
    }
    filteredData.push(sum / blockSize); // divide the sum by the block size to get the average
  }
  return filteredData;
};
```

Let's see what that data looks like.

    \"\"<\/figure>\n\n\n\n

This is great. There's only one thing left to do: because we have so much silence in the audio file, the resulting averages of the data points are very small. To make sure this visualization works for all audio files, we need to *normalize* the data; that is, change the scale of the data so that the loudest samples measure as 1.

```js
const normalizeData = filteredData => {
  const multiplier = Math.pow(Math.max(...filteredData), -1);
  return filteredData.map(n => n * multiplier);
};
```

This function finds the largest data point in the array with `Math.max()`, takes its inverse with `Math.pow(n, -1)`, and multiplies each value in the array by that number. This guarantees that the largest data point will be set to 1, and the rest of the data will scale proportionally.
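For example (a quick check, not from the article): if the filtered data were [0.1, 0.25, 0.5], the multiplier would be 1 / 0.5 = 2, so:

```js
normalizeData([0.1, 0.25, 0.5]); // → [0.2, 0.5, 1]
```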

Now that we have the right data, let's write the function that will visualize it.
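For orientation, here's a minimal sketch of how the `visualize()` function referenced back in the fetch snippet could chain these pieces together; the complete script at the end of the article wires everything up.

```js
// A minimal sketch of how the pieces could fit together
const visualize = audioBuffer => {
  const filteredData = filterData(audioBuffer);        // reduce to 70 averaged samples
  const normalizedData = normalizeData(filteredData);  // scale so the loudest sample is 1
  draw(normalizedData);                                // render to the canvas (defined next)
};
```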

### Visualizing the data

To create the visualization, we'll be using the JavaScript Canvas API. This API draws graphics into an HTML `<canvas>` element. The first step to using the Canvas API is similar to the Web Audio API.

```js
const draw = normalizedData => {
  // Set up the canvas
  const canvas = document.querySelector("canvas");
  const dpr = window.devicePixelRatio || 1;
  const padding = 20;
  canvas.width = canvas.offsetWidth * dpr;
  canvas.height = (canvas.offsetHeight + padding * 2) * dpr;
  const ctx = canvas.getContext("2d");
  ctx.scale(dpr, dpr);
  ctx.translate(0, canvas.offsetHeight / 2 + padding); // Set Y = 0 to be in the middle of the canvas
};
```

This code finds the `<canvas>` element on the page and checks the browser's pixel ratio (essentially the screen's resolution) to make sure our graphic will be drawn at the right size. We calculate the pixel dimensions of the canvas, factoring in the pixel ratio and adding in some padding, then get the context of the canvas (its individual set of methods and values). Lastly, we change the coordinate system of the `<canvas>`; by default, (0, 0) is in the top-left corner of the box, but we can save ourselves a lot of math by setting (0, 0) to be in the middle of the left edge.

    \"\"<\/figure>\n\n\n\n

Now let's draw some lines! First, we'll create a function that will draw an individual segment.

```js
const drawLineSegment = (ctx, x, y, width, isEven) => {
  ctx.lineWidth = 1; // how thick the line is
  ctx.strokeStyle = "#fff"; // what color our line is
  ctx.beginPath();
  y = isEven ? y : -y;
  ctx.moveTo(x, 0);
  ctx.lineTo(x, y);
  ctx.arc(x + width / 2, y, width / 2, Math.PI, 0, isEven);
  ctx.lineTo(x + width, 0);
  ctx.stroke();
};
```

The Canvas API uses a concept called "turtle graphics." Imagine that the code is a set of instructions being given to a turtle with a marker. In basic terms, the `drawLineSegment()` function works as follows:

1. Start at the center line, `y = 0`.
2. Draw a vertical line. Make the height of the line relative to the data.
3. Draw a half-circle the width of the segment.
4. Draw a vertical line back to the center line.

Most of the commands are straightforward: `ctx.moveTo()` and `ctx.lineTo()` move the turtle to the specified coordinate, without drawing or while drawing, respectively.

Line 5, `y = isEven ? y : -y`, tells our turtle whether to draw down or up from the center line. The segments alternate between being above and below the center line so that they form a smooth wave. **In the world of the Canvas API, negative y values are further up than positive ones.** This is a bit counter-intuitive, so keep it in mind as a possible source of bugs.
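A tiny illustrative fragment of that flipped axis (the coordinates here are made up):

```js
// Negative y moves the pen UP on screen
ctx.beginPath();
ctx.moveTo(10, 0);
ctx.lineTo(10, -40); // ends 40px above the starting point
ctx.stroke();
```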

On line 8, we draw a half-circle. `ctx.arc()` takes six parameters:

- The *x* and *y* coordinates of the center of the circle
- The radius of the circle
- The place on the circle to start drawing (`Math.PI`, or *π*, is the location, in radians, of 9 o'clock)
- The place on the circle to finish drawing (`0` in radians represents 3 o'clock)
- A boolean value telling our turtle to draw either counterclockwise (if `true`) or clockwise (if `false`). Using `isEven` in this last argument means that we'll draw the top half of a circle — clockwise from 9 o'clock to 3 o'clock — for even-numbered segments, and the bottom half for odd-numbered segments.
        \"\"<\/figure>\n\n\n\n

OK, back to the `draw()` function.

```js
const draw = normalizedData => {
  // Set up the canvas
  const canvas = document.querySelector("canvas");
  const dpr = window.devicePixelRatio || 1;
  const padding = 20;
  canvas.width = canvas.offsetWidth * dpr;
  canvas.height = (canvas.offsetHeight + padding * 2) * dpr;
  const ctx = canvas.getContext("2d");
  ctx.scale(dpr, dpr);
  ctx.translate(0, canvas.offsetHeight / 2 + padding); // Set Y = 0 to be in the middle of the canvas

  // draw the line segments
  const width = canvas.offsetWidth / normalizedData.length;
  for (let i = 0; i < normalizedData.length; i++) {
    const x = width * i;
    let height = normalizedData[i] * canvas.offsetHeight - padding;
    if (height < 0) {
      height = 0;
    } else if (height > canvas.offsetHeight / 2) {
      height = canvas.offsetHeight / 2; // cap the height so the segment never draws off the canvas
    }
    drawLineSegment(ctx, x, height, width, (i + 1) % 2);
  }
};
```

After our previous setup code, we need to calculate the pixel width of each line segment. This is the canvas's on-screen width divided by the number of segments we'd like to display.

Then, a for-loop goes through each entry in the array and draws a line segment using the function we defined earlier. We set the x value to the current iteration's index times the segment width. `height`, the desired height of the segment, comes from multiplying our normalized data point by the canvas's height and subtracting the padding we set earlier. We then check a few cases: subtracting the padding might have pushed `height` into the negative, so we re-set it to zero. And if the height of the segment would result in a line being drawn off the top of the canvas, we cap the height at half of the canvas's height.

We pass in the segment width, and for the `isEven` value, we use a neat trick: `(i + 1) % 2` means "find the remainder of `i + 1` divided by 2." We check `i + 1` because our counter starts at 0. If `i + 1` is even, its remainder will be zero (or false). If `i + 1` is odd, its remainder will be 1 (or true).
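A quick way to see the alternating pattern this produces:

```js
[0, 1, 2, 3, 4].map(i => (i + 1) % 2); // → [1, 0, 1, 0, 1]
```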

And that's all she wrote. Let's put it all together. Here's the whole script, in all its glory.