Voice UX for
Automotive AI Systems
The road doesn't forgive a second.
Your voice assistant should never need one.
"One of the most rigorous pieces of UX research documentation I have come across in a portfolio"
Overview
If you look at your phone for just 2 seconds while driving at 60 km/h, you travel 33 metres blind. A good infotainment system must keep your eyes on the road at all times, helping you focus on the primary task of driving. In the UX team at Cerence AI, it was my job to help design in-car voice experiences that were intuitive, user-friendly and reliable. I worked closely with engineers and QA to analyse patterns, document best practices and surface critical issues for the Cerence Voice Assistant.
Goals
Standardise UX evaluation across voice assistants
Ensure high-quality language localisation
Build and maintain a centralised UX knowledge repository
Role
UX Design Specialist
Responsibilities
End-to-End UX & UI Design Process
Collaborators
UI Designers, Quality Assurance, Product Managers, Software Engineers
Timeline
May 2023 – June 2025

Why the end user validation team existed
Releasing a product without checking how real users experience it can lead to serious usability issues and damage the brand. Cerence AI created the End User Validation team to make sure the product not only works technically but also makes sense to real users in real-world situations.
UX Expert Reviews
We evaluated voice assistant features across domains using usability principles to identify what worked, what didn’t, and where improvements were needed.
Language Localisation
We tested the voice assistant across different languages to ensure users could interact naturally and accurately in their own language.
UX Expert Review
Heuristic Evaluation and Competitive Analysis
I conducted structured heuristic evaluations of various voice assistants to identify usability gaps, interaction inconsistencies, and opportunities for improving conversational design and system feedback.
USABILITY PRINCIPLES
How well does the voice assistant align with established usability heuristics like clarity, consistency, and error handling?
INTERACTION QUALITY
How intuitive and efficient are the conversational flows across different domains such as navigation, HVAC, and communication?
ERROR & RECOVERY
How effectively does the system handle misinterpretations, guide users, and support recovery from failures?
Language Localisation
Getting voice UX right in every language
A phrase that feels natural in English can land as robotic, rude, or ambiguous once it is spoken in German, French, or Mandarin — and the same goes for how the system understands the driver. I led localisation reviews across multiple markets, working with linguists, QA, and engineering to catch the places where tone, prompt length, grammar, or cultural convention were quietly breaking the experience.
PROMPT & TONE
Are system prompts phrased in a way that feels natural in the target language?
RECOGNITION COVERAGE
Does the assistant reliably understand regional accents, dialects, and the many ways people naturally phrase the same request?
CULTURAL FIT
Do greetings, confirmations, units, address formats, and error messages match local convention, so the experience feels designed for the market rather than translated into it?
Why does recognition accuracy vary by language?
Traditional NLP pipelines translate the utterance from the target language into English before classifying its intent — but in doing so, meaning is often lost. In German, "Ruf Mama an" is a natural spoken command. Translated literally, it becomes "Call Mama on" — grammatically broken in English, and therefore misclassified by the model. The system heard the words but lost the intent.
These translation inconsistencies compound across dialects and domains. To measure this, we tested the most common utterances per language variant and benchmarked recognition accuracy — giving us a clear picture of where the model needed to improve before a real driver ever sat behind the wheel.
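The benchmarking idea can be sketched in a few lines. Everything here is illustrative — the gold utterances, intent labels, and the `recognise()` stub stand in for the real classifier, which is not shown in this case study:

```python
# Sketch: per-locale intent-recognition accuracy over a gold utterance set.
# Data and the recognise() stub are hypothetical, not the production pipeline.
from collections import defaultdict

# Hypothetical gold set: (locale, utterance, expected_intent)
GOLD_SET = [
    ("de-DE", "Ruf Mama an", "phone.call"),
    ("de-DE", "Mach die Klimaanlage an", "hvac.on"),
    ("en-US", "Call Mom", "phone.call"),
    ("en-US", "Turn on the AC", "hvac.on"),
]

def recognise(locale: str, utterance: str) -> str:
    """Stand-in for the assistant's intent classifier."""
    # Toy behaviour: a translation-based pipeline misses one German phrasing.
    if locale == "de-DE" and utterance == "Ruf Mama an":
        return "unknown"  # literal translation broke the intent
    return {
        "Call Mom": "phone.call",
        "Turn on the AC": "hvac.on",
        "Mach die Klimaanlage an": "hvac.on",
    }.get(utterance, "unknown")

def accuracy_by_locale(gold):
    """Fraction of gold utterances recognised correctly, per locale."""
    hits, totals = defaultdict(int), defaultdict(int)
    for locale, utterance, expected in gold:
        totals[locale] += 1
        if recognise(locale, utterance) == expected:
            hits[locale] += 1
    return {loc: hits[loc] / totals[loc] for loc in totals}

print(accuracy_by_locale(GOLD_SET))  # {'de-DE': 0.5, 'en-US': 1.0}
```

A per-locale breakdown like this is what makes the gap visible: aggregate accuracy can look healthy while one language variant quietly underperforms.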
Prompt localisation — before & after
Translating a prompt word for word does not make it localised. Register, tone, and cultural weight all shift — and a prompt that sounds fine in English can feel robotic or off to a native speaker.
We tested the most common prompts with native speakers across each market, scored them, and rewrote the ones that broke — until the system felt natural in every language it spoke.
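The triage step of that review can be sketched as a simple threshold over native-speaker ratings. The prompts, scores, and the 3.5 cut-off below are hypothetical placeholders for the real review data:

```python
# Sketch: flagging localised prompts whose mean native-speaker
# naturalness score (1-5) falls below a review threshold.
# All ratings here are hypothetical examples.
from statistics import mean

ratings = {
    ("de-DE", "Anruf wird getätigt."): [2, 3, 2],  # stiff, literal translation
    ("de-DE", "Ich rufe jetzt an."):   [5, 4, 5],  # natural rewrite
    ("fr-FR", "Appel en cours."):      [4, 4, 5],
}

def prompts_to_rewrite(ratings, threshold=3.5):
    """Return (locale, prompt) pairs whose mean score is below threshold."""
    return [key for key, scores in ratings.items() if mean(scores) < threshold]

print(prompts_to_rewrite(ratings))  # [('de-DE', 'Anruf wird getätigt.')]
```

Scoring first and rewriting only what falls below the bar keeps the review focused on the prompts that actually break for native speakers, rather than re-litigating every string.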
Conclusion
Research is often slow, repetitive work, and this role taught me to build the patience for it. There is something satisfying about going deep into the data until a small detail surfaces that changes how the entire team sees the product.
Every product company is ultimately user-facing, whether the end user is a business or a person behind a wheel. Working in this team gave me a chance to see how much a small, carefully chosen word can either earn a driver's trust or quietly lose it.
As a personal project, I also built an NLP sentiment-analysis algorithm for emotion detection from audio logs, inspired by a project we were working on with Mercedes-Benz.