The Future of Voice Search: A Text to Voice Software Experiment


Should you record an audio version of each piece of content you produce? 

As voice search is expected to make up 50% of all web searches by 2020, I wanted to set out and understand what this means for the future of SEO and content marketing. 

While it’s still a fairly new frontier, this Google Home case study revealed that many of the same factors that help pages rank in traditional searches also help websites show up in voice searches.

For example, websites that load quickly get the upper hand in both voice and traditional searches. But there are differences in voice searches that brands can take advantage of now.

In this post, I’ll discuss the importance of voice search, look at how companies are using audio today, and analyze three different AI software tools that can be used to convert text to audio.

Looking at the Future of Voice Search

guy broadcasting on microphone wearing poop emoji hat

The way people record, consume, and search for content has changed. One example of this is the rapid growth of podcast listenership. The industry has grown so much that ad revenue from podcasts is expected to reach $1.6 billion by 2022.

Now, anyone can start, record, or listen to a podcast from anywhere. (Yes, even this guy above wearing a 💩 emoji hat.)

The trend toward audio content has to do with the freedom and convenience it offers. Audio lets consumers multitask in a way that reading doesn’t allow. Users can commute, work, or exercise all while listening to their favorite audio content. As podcasts and audiobooks continue to grow, so does voice search.

Voice search lets people talk to a device instead of typing their questions into a search engine. Apple’s Siri, Microsoft Cortana, Amazon Alexa, and Google Assistant all allow users to search through voice.

By 2023, it’s predicted that 8 billion digital voice assistants like the ones mentioned above will be in use.

In fact, I’m reminded of a comment I saw while scrolling through LinkedIn the other day. Someone said they had to change their Alexa wake word because their toddler had figured it out. The same way kids have learned to swipe through iPads, they’re learning to use voice assistants. It’s become a part of our daily lives.

You can expect to see audio making a greater impact on SEO as more people use voice search and search engines reward content that’s optimized for it.

How Companies are Currently Using Text to Voice Software

Companies are already utilizing text to voice software in a variety of ways to improve the user experience. Applications range from voice-activated customer service to instructional tutorials.

For example, GoAnimate (now known as Vyond) is a web-based, video creation platform that lets its users make videos with drag-and-drop tools. These videos often rely on a narrator and GoAnimate uses Amazon Polly to let their users create voices in multiple languages, tones, and accents. The text to voice software turns text into a lifelike voiceover for the videos.

Previously, GoAnimate was spending $2,000 per month on text to voice software. The switch to Amazon Polly let them cut their costs to $100 per month for similar software. With this in mind for my experiment, I decided to look for text to speech options that were affordable for any size business.

About the Text to Voice Software Experiment

While you could record your own content for this experiment, I wanted to test out some of the AI text to voice software that is already out there. I started by picking the same article – in this case my recent content mill experiment post – to convert to audio with three different platforms.

At 16,000 characters and 2,800+ words, it’s easy to scan with short paragraphs and has a conversational tone that should translate well to audio. It also includes numbers, business names, and abbreviations like USD that can really put the conversion software to the test.

I used this same post and converted it to audio through three popular services:

  • Amazon Polly
  • Natural Reader
  • IBM Watson 

I used eight metrics to grade each one’s performance:

Audio Quality

  • Did the delivery sound human or robotic?
  • Were the voices interesting and natural-sounding?

Language/Accent

  • How many languages were available? 
  • Were there different dialects and accents?

Customization

  • Could the playback speed be adjusted? 
  • Can you customize pronunciation or add lexicons?

Flexibility

  • Could the audio be converted to different file formats or just an MP3? 
  • Could it be shared as a link? Embedded directly in WordPress? 

Ease of Use

  • How easy was it to get started and convert the text to audio? 
  • Was the interface simple to navigate or was it made for developers?

Value for Money

  • Given the cost, what level of quality was the service and resulting audio?

Scalability

  • How easy is it to scale up the amount of content? Is the service still a good value at a larger scale?

Support

  • Were there informational forums, resources, and tech support available? Were these easy to access or only available to premium members?

Each metric was graded on a scale of 1 (terrible) to 5 (amazing). Then, the eight grades were averaged, with each metric counting equally, to get a cumulative score.
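The cumulative scores reported in the results are consistent with a plain average of the eight grades. Here is a minimal Python sketch of that calculation, using the per-metric grades listed for each tool in the results section. The function and variable names are my own, chosen for illustration.

```python
from statistics import mean

# Grades in the order: Audio Quality, Language/Accent, Customization,
# Flexibility, Ease of Use, Value for Money, Scalability, Support.
scores = {
    "IBM Watson Text to Speech": [4, 2, 3, 4, 1, 3, 4, 3],
    "Amazon Polly": [5, 4, 3, 2, 2, 3, 3, 4],
    "Natural Reader": [3, 2, 3, 4, 5, 5, 4, 4],
}


def cumulative_score(grades):
    """Average the grades, giving each metric equal weight."""
    return round(mean(grades), 2)


# Order the tools from lowest to highest score, as in the results section.
ranking = sorted(scores, key=lambda name: cumulative_score(scores[name]))

for name in ranking:
    print(f"{name}: {cumulative_score(scores[name]):.2f}")
```

If some metrics matter more to you (for example, Ease of Use for a non-technical team), you could swap the plain average for a weighted one and see whether the ranking changes.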

Analyzing the Results

The tools are ordered below from lowest to highest-scoring.

Each one was tested at the free or personal use level to give you an understanding of what’s possible with a limited budget for each service.

IBM Watson Text to Speech

  • Audio Quality – 4/5
  • Language/Accent – 2/5
  • Customization – 3/5
  • Flexibility – 4/5
  • Ease of Use – 1/5
  • Value for Money – 3/5
  • Scalability – 4/5
  • Support – 3/5

Cumulative score: 3.00

This software has a Lite pricing plan that’s free for up to 10,000 characters per month. After 10,000 characters (a little under two-thirds of my 16,000-character blog post), it costs $0.02 USD per thousand characters. (Not a bad deal!)
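To get a feel for what that pricing means, here is a rough Python estimate. It uses the figures quoted above and assumes that only the characters beyond the free 10,000 are billed, prorated per character. That billing model is my assumption, and the names below are made up for illustration, so check IBM's current pricing before relying on it.

```python
FREE_CHARS_PER_MONTH = 10_000   # free Lite allowance quoted in this post
PRICE_PER_THOUSAND_USD = 0.02   # price per 1,000 characters quoted in this post


def monthly_cost(chars: int) -> float:
    """Estimate the monthly cost in USD for converting `chars` characters.

    Assumes only characters beyond the free allowance are billed.
    """
    billable = max(0, chars - FREE_CHARS_PER_MONTH)
    return round(billable / 1000 * PRICE_PER_THOUSAND_USD, 2)


# One 16,000-character post in a month: 6,000 billable characters.
print(monthly_cost(16_000))
```

Under these assumptions, the 16,000-character post from this experiment would cost about 12 cents to convert.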

Once you enter the paid tiers, you get access to customization capabilities like the ability to personalize the voice to make it sound more natural. However, the languages were limited to English, French, Italian, Spanish, Japanese, German and Portuguese. For comparison, Amazon Polly offers 28 languages.

The IBM Cloud does give you access to a broader catalog of services that developers will find useful. You can even do the opposite of what we’re working on in this experiment and use IBM to convert audio to text.

Getting started with IBM Watson requires a bit of coding. The dashboard itself is user-friendly. It keeps your support cases, apps, and services all in one, clean interface. However, converting to audio can be complex and requires an understanding of web development.

The text to speech demo is helpful here and lets you test different voices. This Medium post also gives a round-up of IBM Watson voice examples that you can listen to.

Though IBM wasn’t the most user-friendly or straightforward with its pricing, it did offer a wide variety of services and customization options. However, I found that it wasn’t intuitive and someone with a developer background would do better with this platform. 

Recommended IBM Watson English, US voices: Lisa and Michael

Amazon Polly

  • Audio Quality – 5/5
  • Language/Accent – 4/5
  • Customization – 3/5
  • Flexibility – 2/5
  • Ease of Use – 2/5
  • Value for Money – 3/5
  • Scalability – 3/5
  • Support – 4/5

Cumulative score: 3.25

My experience with Amazon Polly was a bit rough at the start. When you sign up for Amazon Polly, you have to sign up through an Amazon Web Services (AWS) account, not your regular Amazon account.

Even though I planned on using the free tier only for this experiment, I had to give personal information and my credit card number to get started.

Once I signed in to the console, it was clear that this site is geared toward developers. Like IBM Watson, Amazon Polly is part of a larger catalog. The AWS Management Console had Amazon Polly located in the Machine Learning section, but included many additional categories.

Once you go into Amazon Polly, you can only convert a small amount of plain text at a time: up to 3,000 characters, which you can then listen to and download. If the text is any longer, like my 16,000-character blog post, you have to save the file to an S3 bucket. This is a task that may prove challenging to non-developers.
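One workaround for a limit like this is to split a long post into smaller pieces before converting it. The sketch below is a minimal, hypothetical Python helper (not part of Amazon Polly or any AWS library) that breaks text into chunks of at most 3,000 characters, splitting at sentence endings where it can. Each chunk could then be converted separately and the audio files stitched together afterward.

```python
import re

MAX_CHARS = 3000  # per-conversion limit described in this post


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks no longer than max_chars.

    Breaks at sentence boundaries where possible; a single sentence
    longer than max_chars is cut mid-sentence as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit; start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence endings matters for audio: a cut in the middle of a sentence tends to produce an audible hiccup when the pieces are joined.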

Pro Tip: You have to pay to store items in S3 buckets.

For this experiment, I shortened the length of the text in the post. In addition, the free tier is only available for 12 months, so Amazon will begin charging after a year of use. It’s similar to how Amazon Prime trials are designed to get you hooked before charging your card.

As far as audio quality goes, Amazon Polly is one of the best that I tested. In addition to having 28 languages available, Amazon Polly offers many different accents. You choose the voice by language and region. For English, you can pick from Australian, British, Indian, US, and Welsh regional accents. This can come in handy for companies that need to reach English-speaking audiences in both America and the UK. They can use the accent necessary for each region.

These were the two voices chosen for this experiment:

Overall, Amazon Polly had multiple customization options and resources, but the complicated pricing and difficulty of use hurt its score in the experiment.

Recommended Amazon Polly English, US voices: Joanna and Matthew

Natural Reader

  • Audio Quality – 3/5
  • Language/Accent – 2/5
  • Customization – 3/5
  • Flexibility – 4/5
  • Ease of Use – 5/5
  • Value for Money – 5/5
  • Scalability – 4/5
  • Support – 4/5

Cumulative score: 3.75

Natural Reader was by far the easiest software to use. Not only does it have a free personal-use option, but you also don’t have to log in to use the software online. You can also install Natural Reader in your browser.

A nice added bonus is the ability to toggle the font to Dyslexic Font. With an estimated 20% of the population experiencing symptoms of dyslexia, this can be a helpful reading aid. This feature isn’t available with the other services.

Natural Reader is simple to use. Just paste, play, and download. You can even upload documents in PDF, Pages, PNG, or JPG format to convert to audio with a premium account. Many of the free voices sounded robotic, but premium and commercial accounts can access premium voices or intelligent AI voices.

You can listen with premium voices at any time, but you can only download an MP3 file if you have a premium account. You can even alter the speed of the voices, but you need a paid account to modify pronunciation.

Though Natural Reader was the most user-friendly of the software in the experiment, its customization options were more limited. It had up to 10 languages for personal or premium use, but there were multiple accents for each language. Also, the service is limited to creating audio files and converting text into dyslexic-friendly font. This can be great if this is the only service you’re looking for, but developers will find the full catalog with AWS or IBM more suited to their needs.

Recommended Natural Reader English, US Free voice: Samantha

Hire a voice actor or narrate the content yourself

If you prefer not to use software to convert text to audio, you have other options. You can hire a voice actor on a site like Fiverr. However, you can run into quality issues with cheap freelance sites, as I discovered in my content mill experiment.

You can also record the audio yourself. This would give you more control of the audio, but it can also be time-consuming and expensive if you need to buy audio equipment. This post from Dan over at TropicalMBA shares all you need to consider when recording audio or podcasts.

Optimizing your content for voice search 

Based on this experiment, here are some ways to optimize your content for voice search.

Focus on the speed of your site

If your site is slow, then it’s unlikely to rank. This applies to traditional searches and voice searches. The average voice search result loads in just 4.6 seconds. That’s 52% faster than the average web page in a traditional search.

Write in a conversational tone with simple sentences

It helps to read the copy out loud once it’s written so you can tell if it sounds natural as an audio piece. Keep sentences short and concise. Write so Google or Amazon can take pieces from the content to answer questions.

Add audio recordings to blog posts

Use one of the platforms from the experiment to convert your blog posts into audio files.

Mark Manson is an excellent example of how a blogger can take advantage of audio to promote their work. Mark’s audio versions of his articles can be listened to as podcast episodes.

SoundCloud, which Mark uses, makes it easy to embed audio files in your web page or blog. Depending on where your website is hosted, you can also use WordPress plugins or CMS plugins to embed the files.

Add audio recordings to FAQ pages 

FAQ pages are great for audio. This is because they contain questions that users are likely to search for when looking for product or service information.

For example, an email marketing SaaS company could optimize their FAQ section as well as Knowledgebase articles. Questions should be shortened and written the way someone would search for them by voice.

Instead of:

  1. When switching to the new email marketing software how do you import an email subscriber list from the old email marketing platform? 

Use this:

  1. How do I import emails to the new marketing software?

The second question is concise and more likely to show up in search results. The first question is unnatural and stuffed with keywords. It’s unlikely to sound like what a real person would ask in a voice search. Say them both out loud and hear the difference for yourself.
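Reading aloud is the real test, but a quick word count can help you find candidates to rewrite across a long FAQ page. The Python sketch below flags sentences over a word limit. The 20-word threshold and the function name are my own assumptions for illustration, not a rule from any search engine.

```python
import re


def long_sentences(text: str, max_words: int = 20) -> list[tuple[int, str]]:
    """Return (word_count, sentence) pairs for sentences over max_words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        (len(sentence.split()), sentence)
        for sentence in sentences
        if len(sentence.split()) > max_words
    ]


faq = (
    "When switching to the new email marketing software how do you import "
    "an email subscriber list from the old email marketing platform? "
    "How do I import emails to the new marketing software?"
)

# Only the first, longer question is flagged.
for count, sentence in long_sentences(faq):
    print(f"{count} words: {sentence}")
```

A word count can't tell you whether a question sounds natural, so treat the flagged sentences as a to-do list for the read-aloud test rather than a verdict.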

Conclusion 

As voice search continues to grow in popularity, websites that optimize their content for voice today may have a leg up on their competition.

A lot of these optimizations are just good writing practices in general, such as writing in a conversational tone, using simple sentences, and cutting technical or marketing jargon.

You can take the first steps by starting with your most popular content or some of your FAQs, converting it to audio files, and optimizing it for voice search.

Jessica Malnik works with B2B SaaS and professional service firms to build marketing moats that compound over time using her signature content framework. As both a strategist and executor, she helps clients develop strategic content marketing roadmaps, scale content production, and provides guidance on campaigns and individual pieces.