Available MRCP grammars

Unless otherwise specified, these grammars work in French as well as in English.

Multi-grammars

Multiple grammars can be used in a single RECOGNIZE request.

Note that grammars cannot be weighted. The first matching grammar is returned: if several grammars match at the same time, the one listed first wins.
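Since weights are not available, the order in which you list the grammars is the only way to break ties. As a minimal sketch, assuming your MRCP client passes the grammars of a RECOGNIZE request as a `text/uri-list` body (one of the body types MRCPv2 allows; many client libraries build it for you), the hypothetical helper `build_uri_list` below puts the grammar you want to win first.

```python
def build_uri_list(grammar_uris: list[str]) -> str:
    """Build a text/uri-list body for a RECOGNIZE request (hypothetical helper).

    Order matters: when several grammars match, the one listed first is returned.
    """
    if not grammar_uris:
        raise ValueError("at least one grammar URI is required")
    # text/uri-list (RFC 2483) puts one URI per line, each line ending with CRLF.
    return "\r\n".join(grammar_uris) + "\r\n"


if __name__ == "__main__":
    body = build_uri_list([
        "builtin:speech/boolean",  # preferred when both grammars match
        "builtin:speech/keywords?alternatives=facture|commande|compte|conseiller",
    ])
    print(body.splitlines())
```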

Default

Raw ASR output, without any interpretation; numbers are spelled out as words (no digits).

Grammar name:

  • builtin:grammar/none
  • builtin:speech/transcribe: alias of builtin:grammar/none
  • builtin:speech/none: alias of builtin:speech/transcribe

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>je veux changer mon billet</instance>
<input mode="speech" timestamp-start="2020-12-22T11:11:45.620+01:00" timestamp-end="2020-12-22T11:11:47.060+01:00" confidence="1.00">je veux changer mon billet</input>
</interpretation>
</result>
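On the client side, a response like the one above can be read with Python's standard `xml.etree.ElementTree`. The sketch below is an illustration, not an official client: the function names are hypothetical, the response is a trimmed copy of the example above (timestamps removed), and it strips any `{namespace}` prefix in case your server declares the NLSML namespace, which this example does not. For structured grammars (address, date) the `<instance>` holds child elements instead of text, so `instance` comes back empty for those.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix, in case the server declares one."""
    return tag.rsplit("}", 1)[-1]


def parse_result(nlsml: str) -> dict:
    """Extract the first <interpretation> of an NLSML result (sketch)."""
    # Encode to bytes so the UTF-8 XML declaration is honoured by the parser.
    root = ET.fromstring(nlsml.encode("utf-8"))
    interp = next((el for el in root.iter() if _local(el.tag) == "interpretation"), None)
    if interp is None:
        raise ValueError("no <interpretation> in result")
    children = {_local(child.tag): child for child in interp}
    return {
        "grammar": interp.get("grammar"),
        "confidence": float(interp.get("confidence", "0")),
        # For builtin:grammar/none the instance is the raw transcription.
        "instance": (children["instance"].text or "").strip(),
        "input": (children["input"].text or "").strip(),
    }


RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>je veux changer mon billet</instance>
<input mode="speech" confidence="1.00">je veux changer mon billet</input>
</interpretation>
</result>"""

if __name__ == "__main__":
    print(parse_result(RESPONSE))
```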

Address

Address grammar for postal addresses.

A complete match requires all three of the following elements:

  • Street
  • City
  • Zip code

The user can give these three elements in any order.

The language model is built from real address dictionaries, so avoid imaginary addresses when you measure the performance of the grammar.

About the modes:

  • In normal mode: the user does not have to give all three elements (street, city and zip code). If one or more are missing, you get a "partial-match". A "no-match" only occurs if the ASR does not detect anything.
  • In hotword mode: if one or more of the three elements are missing, you get a "no-match". Hotword mode only returns "complete" or "no-match".

At the moment this feature is only available in French and for addresses in France.

Raw mode (deprecated)

Please use the structured mode instead.

Grammar name:

  • builtin:speech/address (will be removed in 2022)

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>37 rue du docteur leroy 72000 le mans</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">trente-sept rue du docteur leroy soixante-douze mille le mans</input>
</interpretation>
</result>

Structured Address

Instead of a plain-text result, this grammar returns an XML document with separate fields for the street number, street, city and postal code. It is quite robust: use this grammar for maximum accuracy and detailed address information.

At the moment this feature is only available in French and for addresses in France.

Grammar name:

  • builtin:speech/postal_address

The older alias builtin:speech/address?struct is deprecated and will be removed in 2022.

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>
<address>
<number>37</number>
<street>rue du docteur leroy</street>
<zipcode>72000</zipcode>
<city>le mans</city>
</address>
</instance>
<input mode="speech" timestamp-start="2021-03-31T11:12:12.606+02:00" timestamp-end="2021-03-31T11:12:16.716+02:00" confidence="1.00">alors j'habite au trente-sept rue du docteur leroy au mans et euh le code postal c'est soixante-douze mille</input>
</interpretation>
</result>
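To use the structured fields in your application, you can collect the children of `<address>` into a dictionary. This is a sketch with a hypothetical function name, run on a trimmed copy of the example above; it assumes that fields the grammar could not recognize (for instance on a partial match) are simply absent, so the usage below falls back to empty strings.

```python
import xml.etree.ElementTree as ET


def parse_address(nlsml: str) -> dict:
    """Return the <address> fields (number, street, zipcode, city) as a dict."""
    root = ET.fromstring(nlsml.encode("utf-8"))
    # Look for <address> anywhere under the result, ignoring any namespace prefix.
    address = next(
        (el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "address"), None
    )
    if address is None:
        return {}
    return {child.tag.rsplit("}", 1)[-1]: (child.text or "").strip() for child in address}


RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>
<address>
<number>37</number>
<street>rue du docteur leroy</street>
<zipcode>72000</zipcode>
<city>le mans</city>
</address>
</instance>
</interpretation>
</result>"""

if __name__ == "__main__":
    fields = parse_address(RESPONSE)
    # Missing fields default to "" so the line can still be printed.
    print(f"{fields.get('number', '')} {fields.get('street', '')}, "
          f"{fields.get('zipcode', '')} {fields.get('city', '')}")
```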

Boolean

The Yes or No grammar. Compatible with Hotword mode.

Is boolean the right builtin for your interaction? A closed-ended question can always be reworded to get a boolean answer. For an open-ended question, check the other builtins.

The grammar analyzes the user input to determine whether the user accepts/confirms or refuses/denies.

When the grammar matches, the interpretation contains either "yes" or "no".

Grammar name:

  • builtin:speech/boolean

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>yes</instance>
<input mode="speech" timestamp-start="2020-12-22T11:11:45.620+01:00" timestamp-end="2020-12-22T11:11:47.060+01:00" confidence="1.00">euh tout à fait</input>
</interpretation>
</result>
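Because the interpretation is always either "yes" or "no", mapping it to a boolean is straightforward. In this sketch the function name is hypothetical, and returning `None` when no usable `<instance>` is present (so the caller can re-prompt) is a design assumption; on a real no-match the server may not send a result body at all.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_boolean(nlsml: str) -> Optional[bool]:
    """Map a builtin:speech/boolean interpretation to True/False, or None."""
    root = ET.fromstring(nlsml.encode("utf-8"))
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] == "instance":
            value = (el.text or "").strip()
            if value == "yes":
                return True
            if value == "no":
                return False
    # No usable <instance>: treat as "not understood" and let the caller re-prompt.
    return None


YES = """<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>yes</instance>
<input mode="speech" confidence="1.00">euh tout à fait</input>
</interpretation>
</result>"""

if __name__ == "__main__":
    print(parse_boolean(YES))
```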

Date

The date grammar parses and understands dates in various formats, including relative dates. It returns a structured date in XML format, with a flag indicating whether the date is valid (actually exists). You can "hint" the grammar to favor past or future dates when part of the date is implied by the user (e.g. "on Monday", "on 10/10"…).

About the modes:

  • In normal mode: the user does not have to give all three elements (day, month and year). If the year is missing, the grammar assumes the nearest one that satisfies the time hint. If the day or month cannot be found, you get a "partial-match". A "no-match" only occurs if the ASR does not detect anything.
  • In hotword mode: if the day or the month is missing and cannot be inferred, you get a "no-match". Hotword mode only returns "complete" or "no-match".

About the time hint:

  • Set the time_hint parameter as shown in the grammar names below.
  • past, the default value, favors dates in the past.
  • future favors dates in the future.

Note that time_hint=future can still return a date in the past, for example when the user says a date that is explicitly in the past. At the moment this feature is only fully supported in French, and partially supported (numeric dates only) in European English.
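To make the hint concrete, here is an illustrative approximation of how a missing year could be resolved for something like "on 10/10" (day and month given, year implied). It is not the grammar's actual algorithm: `infer_year` is a hypothetical name, and whether a date equal to today counts as past or future is an assumption.

```python
from datetime import date


def infer_year(day: int, month: int, today: date, time_hint: str = "past") -> int:
    """Illustrative only: nearest year for day/month that satisfies the hint."""
    if time_hint not in ("past", "future"):
        raise ValueError("time_hint must be 'past' or 'future'")
    # Compare (month, day) tuples so that e.g. 29/02 never builds an invalid date.
    spoken, current = (month, day), (today.month, today.day)
    if time_hint == "past":
        # Assumption: a date equal to today counts as past.
        return today.year if spoken <= current else today.year - 1
    # Assumption: a date equal to today also counts as future.
    return today.year if spoken >= current else today.year + 1


if __name__ == "__main__":
    today = date(2022, 2, 1)
    print(infer_year(10, 10, today))            # default past hint
    print(infer_year(10, 10, today, "future"))  # future hint
```

With today set to 1 February 2022, "10/10" resolves to 2021 under the past hint and to 2022 under the future hint.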

Grammar name:

  • builtin:speech/date, which is equivalent to builtin:speech/date?time_hint=past
  • builtin:speech/date?time_hint=future

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>
<date>
<day>10</day>
<month>10</month>
<year>2010</year>
<valid>yes</valid>
</date>
</instance>
<input mode="speech" timestamp-start="2022-02-01T14:58:19.257+00:00" timestamp-end="2022-02-01T14:58:20.817+00:00" confidence="1.00">le dix dix dix</input>
</interpretation>
</result>
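The `<valid>` flag lets you reject dates that do not exist before building a date object. The sketch below (hypothetical function name, trimmed copy of the example above) returns a `datetime.date` only when the flag is `yes` and day, month and year are all present; it assumes a partial match may omit some of these fields.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

FIELDS = ("day", "month", "year", "valid")


def parse_date(nlsml: str) -> Optional[date]:
    """Return a datetime.date when the grammar flags the date as valid, else None."""
    root = ET.fromstring(nlsml.encode("utf-8"))
    found = {}
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]
        if tag in FIELDS:
            found[tag] = (el.text or "").strip()
    # Give up if the date is flagged invalid or day/month/year is missing.
    if found.get("valid") != "yes" or not all(found.get(f) for f in FIELDS[:3]):
        return None
    return date(int(found["year"]), int(found["month"]), int(found["day"]))


RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>
<date>
<day>10</day>
<month>10</month>
<year>2010</year>
<valid>yes</valid>
</date>
</instance>
</interpretation>
</result>"""

if __name__ == "__main__":
    print(parse_date(RESPONSE))
```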

Examples of fully understood dates in French (match):

  • cinq avril mille neuf cent quatre-vingt
  • quinze mai quatre-vingt
  • trois août dix-neuf cent soixante
  • onze onze cinquante (result depends on the time hint)
  • vingt zéro six mille neuf cent six
  • aujourd'hui
  • demain
  • hier
  • après demain
  • avant hier
  • lundi prochain
  • mardi dernier
  • dans cinq jours
  • il y a huit jours
  • lundi (result depends on the time hint)
  • 10 août (result depends on the time hint)

Examples of dates triggering partial matches:

  • janvier 2015
  • mars 80
  • zéro six soixante-dix-huit
  • en mars (year will be guessed according to the time hint)

Keyword spotting

Matches the user input against a set of keywords you define. The interpretation contains the matched keyword. Compatible with Hotword mode.

This grammar is useful, for example, to offer end users a set of choices at the beginning of their interaction with the IVR.

Grammar name:

  • builtin:speech/keywords?alternatives= + mutually exclusive keywords separated by |
  • Example: builtin:speech/keywords?alternatives=facture|commande|compte|conseiller
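If your keyword list comes from configuration, you can build the grammar name programmatically. In this sketch, `keywords_grammar` is a hypothetical helper; it rejects empty keywords, keywords containing the `|` separator, and duplicates (the keywords must be mutually exclusive). It does not percent-encode anything, matching the example above; whether spaces or accented characters need encoding in your MRCP client is an assumption to check.

```python
def keywords_grammar(keywords: list[str]) -> str:
    """Build a builtin:speech/keywords grammar name (hypothetical helper)."""
    cleaned = [k.strip() for k in keywords]
    if not cleaned or any(not k or "|" in k for k in cleaned):
        raise ValueError("keywords must be non-empty and must not contain '|'")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("keywords must be mutually exclusive (no duplicates)")
    # Assumption: no percent-encoding, as in the example grammar name.
    return "builtin:speech/keywords?alternatives=" + "|".join(cleaned)


if __name__ == "__main__":
    print(keywords_grammar(["facture", "commande", "compte", "conseiller"]))
```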

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>commande</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">
je voudrais euh suivre ma commande
</input>
</interpretation>
</result>

Numbers in digit form

Raw ASR output, with numbers interpreted and returned in digit form.

Grammar name:

  • builtin:speech/text2num

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>environ 1575 euros et 28 centimes</instance>
<input mode="speech" timestamp-start="2021-02-19T10:20:21.114+01:00" timestamp-end="2021-02-19T10:20:25.224+01:00" confidence="1.00">environ mille cinq cent soixante-quinze euros et vingt-huit centimes</input>
</interpretation>
</result>

Spelling

With the following grammars, the interpretation is restricted to letters and numbers (numbers are returned in digit form), and filler words are filtered out.

Spelled punctuation and diacritics are supported only with builtin:speech/spelling/mixed_with_punct.

Alphanumeric

Spelling mode for both letters and digits.

Grammar name:

  • builtin:speech/spelling/mixed

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.92">
<instance>r n e 123</instance>
<input mode="speech" timestamp-start="2021-02-19T10:28:30.294+01:00" timestamp-end="2021-02-19T10:28:32.944+01:00" confidence="0.98">air euh n e accent aigu tiret cent vingt-trois</input>
</interpretation>
</result>

Alphanumeric plus punctuation and diacritics

Grammar name:

  • builtin:speech/spelling/mixed_with_punct

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.92">
<instance>r e n é - m ü l l e r @ g m a i l . c o m</instance>
<input mode="speech" timestamp-start="2021-02-19T10:28:30.294+01:00" timestamp-end="2021-02-19T10:28:32.944+01:00" confidence="0.98">air euh n e accent aigu tiret m u tréma deux ailes e r arobase g m a i l point c o m</input>
</interpretation>
</result>

Digits only

Spelling mode restricted to numbers (in digit form) only.

Grammar name:

  • builtin:speech/spelling/digits

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.79">
<instance>1 2</instance>
<input mode="speech" timestamp-start="2021-02-19T10:30:12.684+01:00" timestamp-end="2021-02-19T10:30:17.934+01:00" confidence="0.63">ah ah ah un deux ah ah ah</input>
</interpretation>
</result>

Digit grouping works in one of two ways:

  • With a "regex" or "length" parameter: all digits are grouped together and returned without spaces.
  • Without a "regex" or "length" parameter: no formatting is applied, and digits are grouped according to how they were spoken.
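Without a regex or length parameter, the same number can therefore come back as "1 2" or "12" depending on how it was spoken. If your application compares the result against a stored identifier, one option is to compact it on the client side. This is a sketch with a hypothetical function name; rejecting anything that is not an ASCII digit is a design choice.

```python
def compact_digits(instance: str) -> str:
    """Join the digit groups of a builtin:speech/spelling/digits instance."""
    compact = "".join(instance.split())
    # isascii() rules out characters such as "²" that str.isdigit() accepts.
    if not (compact.isascii() and compact.isdigit()):
        raise ValueError(f"unexpected characters in {instance!r}")
    return compact


if __name__ == "__main__":
    print(compact_digits("1 2"))
    print(compact_digits("06 12 34 56 78"))
```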

Letters only

Spelling mode restricted to letters only. This grammar is especially useful for interpretation of names.

Grammar name:

  • builtin:speech/spelling/letters

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.91">
<instance>t n c a p b f f</instance>
<input mode="speech" timestamp-start="2021-02-19T10:31:54.664+01:00" timestamp-end="2021-02-19T10:32:06.814+01:00" confidence="0.96">alors t n c a p b f quatre-vingt-dix-huit f cinq dix-sept trente-huit soixante et un trente-trois dix-sept</input>
</interpretation>
</result>

Additional options

All of the above spelling grammars (mixed, digits and letters) accept additional options to narrow down the interpretation.

These options let you interpret phone numbers, customer identifiers, social security numbers, licence plates and many other entities.

The regex and length parameters are mutually exclusive.

Length

Forces the interpretation into a single word of the given length. Compatible with Hotword mode.

Grammar names:

  • builtin:speech/spelling/mixed?length= + integer
  • builtin:speech/spelling/digits?length= + integer
  • builtin:speech/spelling/letters?length= + integer
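A small builder can enforce the rules of this section before a request is sent: only the mixed, digits and letters grammars take options, and length and regex cannot be combined. `spelling_grammar` is a hypothetical helper; passing the regex verbatim without URL encoding mirrors the grammar names in this document and is an assumption to check against your MRCP client.

```python
from typing import Optional

# Only these spelling grammars accept the length/regex options.
KINDS = ("mixed", "digits", "letters")


def spelling_grammar(kind: str, length: Optional[int] = None,
                     regex: Optional[str] = None) -> str:
    """Build a builtin:speech/spelling/* grammar name (hypothetical helper)."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}")
    if length is not None and regex is not None:
        raise ValueError("length and regex are mutually exclusive")
    uri = f"builtin:speech/spelling/{kind}"
    if length is not None:
        if length <= 0:
            raise ValueError("length must be a positive integer")
        return f"{uri}?length={length}"
    if regex is not None:
        # Assumption: the pattern is passed verbatim, as in the examples.
        return f"{uri}?regex={regex}"
    return uri


if __name__ == "__main__":
    print(spelling_grammar("digits", length=5))
```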

Response example for builtin:speech/spelling/digits?length=5:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>35170</instance>
<input mode="speech" timestamp-start="2021-02-19T10:33:36.303+01:00" timestamp-end="2021-02-19T10:33:38.193+01:00" confidence="1.00">trente-cinq mille cent soixante-dix</input>
</interpretation>
</result>

For ZIP codes, have a look at the dedicated Zipcode grammar.

For phone numbers, use the spelling digits grammar with the appropriate length.

Regex

The interpretation returns the regular expression match as a single word. Compatible with Hotword mode.

Partial matches are not supported with this parameter, so it is best used in hotword mode.

The regex syntax supported is detailed here: https://docs.rs/regex/1.4.3/regex/#syntax. We also provide an additional non-standard operator: ||.

The || operator works like the alternation operator |, except that it enforces the precedence of the leftmost alternatives and, in normal mode, searches the whole utterance.

For example, given the string aaabb:

  • b+|a+ matches aaa
  • b+||a+ matches bb: the alternative before || takes precedence over the one after it.

This helps in practice: people hesitating or reading their account number from a piece of paper might say "123 humm wait... yeah 123456". Matching the rightmost element returns the correct value 123456 rather than 123123.

A more subtle example, given the string ab123xz512pr:

  • [a-z]{2}[0-9]{3}[a-z]{2}$|[a-z]{2}[0-9]{3}[a-z]{2} matches ab123xz
  • [a-z]{2}[0-9]{3}[a-z]{2}$||[a-z]{2}[0-9]{3}[a-z]{2} matches xz512pr

Do not use the || operator in hotword mode: there it behaves exactly like |, only slower.
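To experiment with patterns before deploying them, the difference between | and || can be reproduced locally with Python's `re` module, whose alternation is leftmost-first like the regex crate syntax referenced above. This is an emulation, not the server's implementation: the function names are hypothetical, the pattern is split naively on "||" (so it would misbehave if || appeared inside a character class), and the hesitation example assumes the digits reach the regex as one compact string such as "123123456".

```python
import re
from typing import Optional


def match_pipe(pattern: str, utterance: str) -> Optional[str]:
    """Standard alternation: the leftmost match in the utterance wins."""
    m = re.search(pattern, utterance)
    return m.group(0) if m else None


def match_double_pipe(pattern: str, utterance: str) -> Optional[str]:
    """Emulated ||: try each alternative in order over the whole utterance."""
    for alternative in pattern.split("||"):
        m = re.search(alternative, utterance)
        if m:
            return m.group(0)
    return None


if __name__ == "__main__":
    print(match_pipe("b+|a+", "aaabb"), match_double_pipe("b+||a+", "aaabb"))
    # Hesitation case, assuming the digits arrive as one compact string.
    print(match_pipe(r"[0-9]{6}", "123123456"),
          match_double_pipe(r"[0-9]{6}$||[0-9]{6}", "123123456"))
```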

Grammar names:

  • builtin:speech/spelling/mixed?regex= + pattern
  • builtin:speech/spelling/digits?regex= + pattern
  • builtin:speech/spelling/letters?regex= + pattern

Response example for builtin:speech/spelling/mixed?regex=[a-z]{2}[0-9]{9}[a-z]{2}:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="0.98">
<instance>bf969352528jd</instance>
<input mode="speech" timestamp-start="2021-02-19T10:35:20.474+01:00" timestamp-end="2021-02-19T10:35:30.014+01:00" confidence="0.98">b f neuf six neuf trois cinq deux cinq deux huit j d</input>
</interpretation>
</result>

Looking to interpret licence plates? The regex option is probably what you need:

  • for both old and new licence plates in France, the pattern is ([a-z]{2}[0-9]{3}[a-z]{2})|([0-9]{4}[a-z]{3}[0-9]{2});
  • for the UK, the pattern would be [a-z]{2}[0-9]{2}[a-z]{3}.

These patterns are deliberately simple and could be refined. For example, in the UK the letter Z cannot appear among the first two letters, and in France the letter O is never used.
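The character classes used here are written the same way in Python's `re` module, so you can check candidate plate patterns locally against sample instances (which come back in lowercase) before putting them in a grammar name. The tightened variants below are a sketch that encodes only the two restrictions just mentioned; real registration rules have more, and the constant and function names are hypothetical.

```python
import re

FR_PLATE = r"([a-z]{2}[0-9]{3}[a-z]{2})|([0-9]{4}[a-z]{3}[0-9]{2})"
UK_PLATE = r"[a-z]{2}[0-9]{2}[a-z]{3}"

# Tightened variants: no letter O in French plates, no Z in the first two UK letters.
FR_PLATE_NO_O = r"([a-np-z]{2}[0-9]{3}[a-np-z]{2})|([0-9]{4}[a-np-z]{3}[0-9]{2})"
UK_PLATE_NO_Z = r"[a-y]{2}[0-9]{2}[a-z]{3}"


def plate_ok(pattern: str, plate: str) -> bool:
    """True if the whole plate string matches the pattern."""
    return re.fullmatch(pattern, plate) is not None


if __name__ == "__main__":
    for plate in ("ab123cd", "1234abc75", "ab123od"):
        print(plate, plate_ok(FR_PLATE, plate), plate_ok(FR_PLATE_NO_O, plate))
```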

Clarifications on homophones

Spelling grammars can run into homophones that distort the expected result. For example, in "mon numéro DE colis", "de" may be heard as "deux" and transformed into the digit "2". In most cases, homophones are corrected and left out of the result. Some are not handled yet; we add new rules as we encounter such cases, so do not hesitate to send us feedback.

Zipcode

The interpretation returns a 5-digit ZIP code that can be split into 2+3 digits, as found in France, Spain or the USA. Compatible with Hotword mode.
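The 2+3 split mirrors how such codes are often spoken: in the response example below, "trente-cinq cent soixante-dix" is 35 followed by 170. If your application needs the two parts separately, a trivial helper is enough; the name `split_zipcode` is hypothetical.

```python
def split_zipcode(zipcode: str) -> tuple[str, str]:
    """Split a 5-digit ZIP code into its 2-digit and 3-digit parts."""
    if len(zipcode) != 5 or not (zipcode.isascii() and zipcode.isdigit()):
        raise ValueError(f"not a 5-digit ZIP code: {zipcode!r}")
    return zipcode[:2], zipcode[2:]


if __name__ == "__main__":
    print(split_zipcode("35170"))
```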

To conform to the MRCPv2 RFC (RFC 6787), note that:

  • in normal mode, it may return a partial-match if what the user said starts like a valid ZIP code;
  • in hotword mode, it only returns a match or a no-match (you have to set hotword-max-duration for this).

Grammar name:

  • builtin:speech/spelling/zipcode
  • builtin:speech/zipcode: alias of builtin:speech/spelling/zipcode

Response example:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<interpretation grammar="session:demo-grammar-0" confidence="1.00">
<instance>35170</instance>
<input mode="speech" timestamp-start="2021-02-19T10:37:17.755+01:00" timestamp-end="2021-02-19T10:37:20.125+01:00" confidence="1.00">trente-cinq cent soixante-dix</input>
</interpretation>
</result>

Beware that this builtin is not universal. For other ZIP code formats, we recommend the spelling digits grammar with the length option instead.