Voice UX for

Automotive AI Systems


The road doesn't forgive a second.
Your voice assistant should never need one.


"One of the most rigorous pieces of UX research documentation I have come across in a portfolio"

Overview

If you look at your phone for just two seconds while driving at 60 km/h, you travel 33 metres blind. A good infotainment system must keep your eyes on the road at all times, helping you focus on the primary task of driving. In the UX team at Cerence AI, it was my job to help design in-car voice experiences that were intuitive, user-friendly and reliable. I worked closely with engineers and QA to analyse interaction patterns, document best practices and surface critical issues for the Cerence Voice Assistant.
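The 33-metre figure falls straight out of unit conversion; here is a quick sketch (the function name is mine):

```python
# Distance covered while the driver's eyes are off the road.
# speed_kmh / 3.6 converts km/h to m/s.
def blind_distance(speed_kmh: float, glance_s: float) -> float:
    return speed_kmh / 3.6 * glance_s

print(round(blind_distance(60, 2), 1))  # -> 33.3 metres
```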

Goals

  1. Standardise UX evaluation across voice assistants

  2. Ensure high-quality language localisation

  3. Build and maintain a centralised UX knowledge repository

Role

UX Design Specialist

Responsibilities

End-to-End UX & UI Design Process

Collaborators

UI Designers, Quality Assurance, Product Managers, Software Engineers


Timeline

May 2023 – June 2025

Why the end user validation team existed


Releasing a product without checking how real users experience it can lead to serious usability issues and damage the brand. Cerence AI created the End User Validation team to make sure the product works not just technically, but also makes sense to real users in real-world situations.

UX Expert Reviews

We evaluated voice assistant features across domains using usability principles to identify what worked, what didn’t, and where improvements were needed.

Language Localisation

We tested the voice assistant across different languages to ensure users could interact naturally and accurately in their own language.


UX Expert Review

Heuristic Evaluation and Competitive Analysis

I conducted structured heuristic evaluations on various Voice Assistants to identify usability gaps, interaction inconsistencies, and opportunities for improving conversational design and system feedback.

USABILITY PRINCIPLES

How well does the voice assistant align with established usability heuristics like clarity, consistency, and error handling?

INTERACTION QUALITY

How intuitive and efficient are the conversational flows across different domains such as navigation, HVAC, and communication?

ERROR & RECOVERY

How effectively does the system handle misinterpretations, guide users, and support recovery from failures?


Language Localisation

Getting voice UX right in every language

A phrase that feels natural in English can land as robotic, rude, or ambiguous once it is spoken in German, French, or Mandarin — and the same goes for how the system understands the driver. I led localisation reviews across multiple markets, working with linguists, QA, and engineering to catch the places where tone, prompt length, grammar, or cultural convention were quietly breaking the experience.

PROMPT & TONE

Are system prompts phrased in a way that feels natural in the target language?

RECOGNITION COVERAGE

Does the assistant reliably understand regional accents, dialects, and the many ways people naturally phrase the same request?

CULTURAL FIT

Do greetings, confirmations, units, address formats, and error messages match local convention, so the experience feels designed for the market rather than translated into it?

Why does recognition accuracy vary by language?

Traditional NLP pipelines often translate the utterance from the target language into English before extracting the intent — but in doing so, meaning is often lost. In German, "Ruf Mama an" is a natural, spoken command. Translated literally, it becomes "Call Mama on" — grammatically broken in English, and therefore misclassified by the model. The system heard the words but lost the intent.

These translation inconsistencies compound across dialects and domains. To measure this, we tested the most common utterances per language variant and benchmarked recognition accuracy — giving us a clear picture of where the model needed to improve before a real driver ever sat behind the wheel.
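A benchmark like that can be sketched in a few lines. Everything here — the `recognise_intent` stub, the variant codes, the utterance pairs — is illustrative rather than the actual Cerence tooling:

```python
def benchmark_accuracy(test_sets, recognise_intent):
    """Recognition accuracy per language variant.

    test_sets maps a variant code (e.g. "de-DE") to a list of
    (utterance, expected_intent) pairs; recognise_intent stands in
    for the assistant's NLU pipeline.
    """
    return {
        variant: sum(
            recognise_intent(utt, variant) == intent
            for utt, intent in pairs
        ) / len(pairs)
        for variant, pairs in test_sets.items()
    }
```

Scoring every variant the same way is what makes the numbers comparable: a dip in one locale points directly at where the model, not the driver, needs to change.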



Prompt localisation — before & after

Translating a prompt word for word does not make it localised. Register, tone, and cultural weight all shift — and a prompt that sounds fine in English can feel robotic or off to a native speaker.


We tested the most common prompts with native speakers across each market, scored them, and rewrote the ones that broke — until the system felt natural in every language it spoke.

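One way the scoring step might look on paper — the prompt IDs, the 1–5 naturalness ratings, and the rewrite threshold are all assumptions of mine:

```python
def flag_for_rewrite(ratings, threshold=4.0):
    """Prompts whose mean naturalness score from native speakers
    falls below the rewrite threshold (scores assumed on a 1-5 scale)."""
    return sorted(
        prompt_id
        for prompt_id, scores in ratings.items()
        if sum(scores) / len(scores) < threshold
    )

ratings = {
    "confirm_call": [5, 4, 5],   # reads naturally, keep
    "hvac_set_temp": [3, 2, 4],  # mean 3.0, queue for rewrite
}
print(flag_for_rewrite(ratings))  # -> ['hvac_set_temp']
```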

Conclusion

Research is often slow, repetitive work, and I built a capacity for it in this role. There is something satisfying about going deep into the data until a small detail surfaces that changes how the entire team sees the product.


Every product company is ultimately user-facing, whether the end user is a business or a person behind a wheel. Working in this team gave me a chance to see how much a small, carefully chosen word can either earn a driver's trust or quietly lose it.

As a personal project I also built an NLP sentiment-analysis model for emotion detection from audio logs, inspired by a project we were working on with Mercedes-Benz.
