What’s different in testing a voice project, and how can you make it go smoothly?
With 40% of UK homes expected to have an Amazon Echo by 2019, brands are looking for new ways to engage with their audiences through this rapidly growing channel. Designing rewarding and engaging experiences for voice has a lot in common with ‘traditional’ digital projects like web and mobile, but there are key differences as well.
We’ve worked with clients in a range of industries on voice projects, including the BBC, Channel 4, Honeywell, and a major UK bank. Most of our clients are new to voice, and often look to us to find out how voice experiences differ not only for the user, but for their organisation as well.
In our experience, one of the biggest challenges with voice projects is user acceptance testing (UAT). Our clients are often unsure what to expect, what to look out for, and how best to test the experiences we build.
“One of the biggest challenges our clients encounter with voice projects is user acceptance testing.”
So, for anyone new to voice skills, this article sets out to provide some of our best tips to help you succeed more quickly in the testing stages of your project.
If you don’t have a functional skill ready to test but still want to get a feel for the process, why not try one of our most recent projects, Channel 4’s Human Test, on Alexa or Google Assistant?
Both Amazon and Google offer simulators in their development portals which developers can use to quickly test a skill in text-only format. This is quick and effective, but it’s impractical to give all of your UAT testers access to a live development portal, and while the simulators catch the vast majority of issues, there are a small number of cases where the assistant behaves differently on an actual voice device than in the simulator.
For the first batch of testing we recommend using the Alexa and Google Assistant companion apps. Both apps can show exactly what the assistant heard, providing a record for developers to check against and helping clear up misunderstandings.
To test out beta skills, UAT testers need to be invited to your development environment. These invites can be sent out through Amazon and Google’s developer portals, with Amazon users receiving an invite to try out a demo skill by email, and Google testers visiting https://console.actions.google.com/, selecting the app, then selecting ‘Simulator’ from the left-hand menu and choosing their language and surface to test on.
As you test through your skill, the screen will display what Alexa has heard after each utterance.
Note: the Alexa companion app will display the history of all words said to any device connected to the same email account. This means you can test on an Echo speaker, and still have a record of Alexa’s understanding, but it can be impractical in a commercial setting to have each tester set up with a device and phone both connected to the same email address.
As you talk through your app, the screen will display what the Assistant has understood after each utterance, as well as displaying the assistant’s response.
Note: you can see all words heard by the Assistant through any device connected to the same email address by visiting myactivity.google.com, but the interface is a little clunky, and as with Alexa it can be impractical in a commercial setting to have each tester set up with both a device and phone connected to the same email address.
Providing planning documentation to your testers, such as a skill flow diagram and the content plan, is essential to ensuring your entire skill is covered, more so than in ‘traditional’ digital projects where users can typically see most of the paths available to them.
Bonus tip - for certain skills it can be better to supply content plans and flow documentation only after a tester has gone through your experience several times. This can be useful in detecting user answers you never planned for, and ensuring your skill’s opening lines provide sufficient information for the user to begin using your experience.
While both assistants are underpinned by powerful natural language recognition and processing technologies, they aren’t infallible and will misunderstand words. Sometimes your skill won’t do what you want or expect it to do, but this may not be your fault; rather, the assistant itself may have misunderstood the input. This is where testing on mobiles is particularly useful, as it allows you to see exactly what was understood.
“While both assistants are underpinned by powerful natural language recognition and processing technologies, they aren’t infallible.”
It may be that one of your expected utterances is frequently misunderstood; in this case you might need to tweak the wording of the question you put to users, pointing them towards words which are easier for the assistants to understand.
Another issue to watch out for is homophones - words which sound identical but mean different things. You may be expecting ‘sun’ as a response, but the assistant picks up ‘son’ as the utterance. Adding ‘son’ as a synonym for ‘sun’ would fix this particular case, and again, testing on the apps helps pick up homophones you may not have accounted for when planning your skill.
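In practice this fix usually lives in the skill’s interaction model, where a custom slot type can list the homophone as a synonym so both resolve to the same value. A minimal sketch in Alexa’s custom slot type format (the slot type name ‘WeatherObject’ is our own example):

```json
{
  "types": [
    {
      "name": "WeatherObject",
      "values": [
        {
          "name": {
            "value": "sun",
            "synonyms": ["son"]
          }
        },
        {
          "name": {
            "value": "rain",
            "synonyms": ["reign"]
          }
        }
      ]
    }
  ]
}
```

With this in place, whether the assistant hears ‘sun’ or ‘son’, entity resolution hands your code the canonical value ‘sun’, so your skill logic doesn’t need to handle each spelling separately.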
The way you say things will not be the way everyone says them. It can be tricky to recruit testers with different accents, but where this is possible, or where you are testing with a wide range of testers, it can be useful to have testers tell you their accent. This helps when trying to understand or replicate bugs: a tester may be pronouncing a word in a way the assistant interprets as another word altogether. Understanding these cases can help you fine-tune your script to nudge users towards words which vary less across accents.
Both Alexa and Google Assistant are improving every day, and both can now be trained to fine-tune themselves to their owner’s particular accent and voiceprint. While this tuning doesn’t make a huge impact, you may find a voice device struggling to cope with a wide range of testers using it in a short space of time. Where possible, have the same testers use the same devices to avoid this.
While the vast majority of your users will use your skill ‘in good faith’ and give reasonable answers to the assistant’s questions, there will be users who try to push your skill to its limits, and users who give answers you had not accounted for. In these instances your skill should ‘fail gracefully’, either carrying the user onward through the interaction or providing hints for things they can say.
“Your skill should ‘fail gracefully’, either carrying the user onward through the interaction or providing hints for things they can say.”
To test for these scenarios, at Screenmedia we do ‘the potato test’ - answering ‘potato’ to every question to ensure the assistant behaves as expected (assuming ‘potato’ isn’t an expected utterance!). Of course, you can use any wording to test fail paths, as long as it won’t be something the assistant expects. We have had testers recite their favourite actors, Scottish mountains, and even the Pledge of Allegiance to the assistants to test out failure paths.
Note: don’t make your fail-testing words too obscure; if the assistant can’t recognise what it hears as a word, it will act as if it has heard nothing, potentially killing your skill session.
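A graceful failure path can be as simple as a fallback handler that reprompts the user with a hint instead of ending the session. A minimal sketch in Python, building a raw Alexa-style response (the handler name and hint wording are our own):

```python
def handle_fallback():
    """Hypothetical fallback handler: when an utterance isn't
    understood, reprompt with a hint rather than ending the session."""
    hint = ("Sorry, I didn't catch that. "
            "You can say 'help' to hear what I understand.")
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": hint},
            "reprompt": {"outputSpeech": {"type": "PlainText", "text": hint}},
            # Keep the session open so a 'potato' answer doesn't kill the skill
            "shouldEndSession": False,
        },
    }
```

The key detail is `shouldEndSession: False` combined with a reprompt: the user stays in the interaction and gets a nudge towards something the skill actually expects.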
While testing on mobiles is convenient, don’t forget to test out your skills in real-life scenarios. Most people interact with voice assistants through smart speakers, which aren’t typically sitting right next to them. Where possible, you should test small smart speakers (e.g. Echo Dot, Google Home Mini) at a distance of 5 feet, and larger smart speakers (e.g. regular Google Home, Amazon Echo) at a distance of 7 feet.
All skills have built-in intents required by Amazon and Google, such as ‘help’, which tells a user how to use the skill, and ‘cancel’, ‘stop’, and ‘exit’, which all end a session. If your skill requires the user to say words which sound similar to these, it might be worth reconsidering your question and expected responses to guide users towards more distinct words. Remember, even if your accent doesn’t make these words sound similar, others might.
“Remember, even if your accent doesn’t make words sound similar, others might.”
For larger teams testing voice experiences, recording each session adds very little extra work, but can be a huge help for developers, allowing them to check the exact utterance used and the exact response supplied by the assistant. In our experience testers are unlikely to provide exact, accurate records of what they said to an assistant even when prompted to do so. Recording testing sessions solves this and allows for much more rapid debugging.
There’s a good chance your skill will have a lot of responses coded into it. By giving each response a unique ID and providing these to testers within your testing content plan, you can locate issues far more quickly; a tester quoting an exact ID leaves much less room for misinterpretation than describing where in the skill a bug occurred.
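One way to sketch this in Python (the IDs, response text, and the UAT flag are all our own invention): keep responses in a catalogue keyed by ID, and in test builds append the ID to the spoken text so testers can report it verbatim.

```python
# Hypothetical response catalogue; each spoken line gets a unique ID
# that also appears in the testing content plan.
RESPONSES = {
    "WELCOME_01": "Welcome! Would you like to start the quiz?",
    "HELP_01": "You can say 'start', 'repeat', or 'stop'.",
    "FAIL_01": "Sorry, I didn't understand. Try saying 'start'.",
}

UAT_BUILD = True  # switch off before certification and launch

def get_speech(response_id: str) -> str:
    """Look up a response; in UAT builds, append the ID so testers
    can tell developers exactly which response they heard."""
    text = RESPONSES[response_id]
    return f"{text} Response ID {response_id}." if UAT_BUILD else text
```

Speaking the ID aloud only in UAT builds keeps the production experience clean while giving testers an unambiguous reference for every bug report.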
Always make sure your voice devices have a strong Internet connection. Otherwise, poor connectivity may be raised unnecessarily as latency bugs, or genuine latency issues may be passed off as poor connectivity.
With many brands exploring voice projects, there can seem to be a daunting amount of learning to be done, whether for project teams getting to grips with voice for the first time, or for project managers and testers grappling with the best way to ensure their new voice skill is foolproof. While not a comprehensive list, following these tips will set you on the path to a smooth testing phase, after which lie certification and launch!
If you’re thinking about how your brand could use Alexa and the Google Assistant to reach your audiences, get in touch; we’d love to talk.