Getting started with the real-time WebSocket Stream H2B API

What are you building here

This guide walks you through building a very simple app: speaking into your microphone, you try to guess a number between 0 and 99 in five tries or fewer.

You can download the Python script we're going to build from our repository, in either a sync or an async version. The Git repository also contains other examples.

Get credentials

Please contact our support to ask for credentials.

Install the Python SDK

Get the Python SDK

The SDK is available on PyPI. It requires Python 3.7 or higher and can be installed using pip, along with the dependencies needed to run the examples:

$ pip install -U uhlive[examples]

Connect to the server

You need an access_token to connect to the API. This token has to be requested on our authentication server, using the client_id and client_secret credentials that were provided to you by your account manager.

Our Python SDK takes care of requesting this token for you. You'll just have to provide the ID and secret.

Once the access_token has been retrieved, the WebSocket connection is established with the server at wss://api.uh.live/bots.

You can read more about the whole authentication flow in the protocol documentation.

We recommend you pass credentials in this Getting Started script as environment variables. A simple way to make credentials available at script run time is to run it as follows:

UHLIVE_API_CLIENT="your-client-id" UHLIVE_API_SECRET="your-client-secret" python myapp.py
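
If one of these variables is missing, a direct os.environ[...] lookup fails with a bare KeyError. As a minimal sketch (the helper names missing_credentials and load_credentials are ours for illustration, not part of the SDK), the script could check both variables up front and report which one is absent:

```python
import os

REQUIRED_VARS = ("UHLIVE_API_CLIENT", "UHLIVE_API_SECRET")


def missing_credentials(env):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_credentials(env=None):
    """Return (client_id, client_secret), or raise with a readable message."""
    env = os.environ if env is None else env
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["UHLIVE_API_CLIENT"], env["UHLIVE_API_SECRET"]
```

Calling uhlive_client, uhlive_secret = load_credentials() could then stand in for the two os.environ lookups used in the rest of this guide.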

Our SDK is designed so that you are free to use the WebSocket library you want as transport, and to structure your code the way you like.

Here is an example of connecting to the API using either websocket-client (sync) or aiohttp (async):

Sync version (websocket-client) first, then async version (aiohttp):

import os
import requests
import websocket as ws
from uhlive.auth import build_authentication_request
from uhlive.stream.recognition import build_connection_request

uhlive_client = os.environ["UHLIVE_API_CLIENT"]
uhlive_secret = os.environ["UHLIVE_API_SECRET"]

# create transport
auth_url, auth_params = build_authentication_request(uhlive_client, uhlive_secret)
login = requests.post(auth_url, data=auth_params)
login.raise_for_status()
uhlive_token = login.json()["access_token"]

url, headers = build_connection_request(uhlive_token)
socket = ws.create_connection(url, header=headers)
print("Connected")

# Properly close WS connection
socket.close()

import asyncio
import os
from aiohttp import ClientSession
from uhlive.auth import build_authentication_request
from uhlive.stream.recognition import build_connection_request

async def main(uhlive_client: str, uhlive_secret: str):
    # create transport
    async with ClientSession() as session:
        auth_url, auth_params = build_authentication_request(
            uhlive_client, uhlive_secret
        )
        async with session.post(auth_url, data=auth_params) as login:
            login.raise_for_status()
            body = await login.json()
            uhlive_token = body["access_token"]

        url, headers = build_connection_request(uhlive_token)
        async with session.ws_connect(url, headers=headers) as socket:
            print("Connected")

uhlive_client = os.environ["UHLIVE_API_CLIENT"]
uhlive_secret = os.environ["UHLIVE_API_SECRET"]
asyncio.run(main(uhlive_client, uhlive_secret))

When running this code, you should see Connected printed on the console.

Open a session and start streaming

This API is meant for developing rich interactive voice applications such as chatbots or Interactive Voice Response (IVR) systems. You therefore need to be able to capture real-time audio from the user, be it from a local sound card interface (microphone) or a network stream. You also need to be able to respond to the user, either by playing audio (recorded prompts or Text-To-Speech) or by displaying some text.

To keep code snippets short and illustrative, we won't deal with the details of audio acquisition or TTS. Instead, we'll just display the prompts on the console and capture the voice from the microphone using sounddevice.

Update the previous code as follows:

Sync version (websocket-client) first, then async version (aiohttp):

# Update imports with:
import sounddevice as sd
from uhlive.stream.recognition import Recognizer, Opened

# Append the following to the code above, in place of the final socket.close()
# (the connection has to stay open for the next steps):
def stream_mic(socket, client):
    def callback(indata, frame_count, time_info, status):
        socket.send_binary(bytes(indata))

    stream = sd.RawInputStream(
        callback=callback, channels=1, samplerate=8000, dtype="int16", blocksize=960
    )
    stream.start()
    return stream

# instantiate service
client = Recognizer()
# Open a session
# Commands are sent as text frames
socket.send(client.open())
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, Opened), f"Expected Opened, got {event}"
# start streaming the user's voice
voice = stream_mic(socket, client)
print("Streaming mic")

# Update imports with:
import asyncio
import sounddevice as sd
from uhlive.stream.recognition import Recognizer, Opened

# Before the main function, add these functions:
async def inputstream_generator(channels=1, samplerate=8000, dtype="int16", **kwargs):
    """Generator that yields blocks of input data as raw bytes."""
    q_in = asyncio.Queue()
    # We are inside a coroutine, so grab the loop that is currently running
    loop = asyncio.get_running_loop()

    def callback(indata, frame_count, time_info, status):
        loop.call_soon_threadsafe(q_in.put_nowait, bytes(indata))

    stream = sd.RawInputStream(
        callback=callback,
        channels=channels,
        samplerate=samplerate,
        dtype=dtype,
        **kwargs,
    )
    with stream:
        while True:
            indata = await q_in.get()
            yield indata


async def play_prompt(text):
    print(text)
    # leave some time to read it
    await asyncio.sleep(len(text.split()) * 0.1)


async def stream(socket, client):
    try:
        async for block in inputstream_generator(blocksize=960):
            await socket.send_bytes(client.send_audio_chunk(block))
    except asyncio.CancelledError:
        pass


# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:

            # ...append the following to the code above:
            # instantiate service
            client = Recognizer()
            # Open a session
            # Commands are sent as text frames
            await socket.send_str(client.open())
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(event, Opened), f"Expected Opened, got {event}"
            # start streaming the user's voice
            voice = asyncio.create_task(stream(socket, client))

Voice is streamed, but nothing happens yet.
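
To get a feel for what is sent over the socket, the stream settings used above (8000 Hz, mono, int16, 960 frames per block) can be turned into sizes and durations. This is plain arithmetic on those values, not an SDK call:

```python
SAMPLE_RATE = 8000    # frames per second, as passed to RawInputStream
BLOCK_SIZE = 960      # frames delivered to each callback
CHANNELS = 1          # mono
BYTES_PER_SAMPLE = 2  # dtype="int16" means 16 bits per sample

# Size of each binary frame sent over the WebSocket
block_bytes = BLOCK_SIZE * CHANNELS * BYTES_PER_SAMPLE
# Amount of audio each block carries, in milliseconds
block_ms = 1000 * BLOCK_SIZE / SAMPLE_RATE

print(block_bytes, block_ms)  # 1920 120.0
```

Each block therefore carries 120 ms of audio in 1920 bytes, which means roughly eight binary frames per second.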

Define some default values and grammar aliases

In order to avoid repeating the same parameters on every recognize request, you can define them once for the entire session. See the input headers for a list of the possible options and their meanings.

Also, since we're talking to a bot and the interaction happens within a scenario, we'll use grammars, which give context to the expected speech recognition: for example, that we expect an address or a spelled serial number.

For our example, we're going to define a digits spelling grammar, restricted to one or two digits, which lets us recognize the user's guess between 0 and 99.

Sync version (websocket-client) first, then async version (aiohttp):

# Update imports with:
from uhlive.stream.recognition import ParamsSet, GrammarDefined

# Append the following to the code above:
voice.start()
socket.send(
    client.set_params(
        speech_language="en",  # or "fr"
        no_input_timeout=5000,
        speech_complete_timeout=1000,
        speech_incomplete_timeout=2000,
        speech_nomatch_timeout=3000,
        recognition_timeout=30000,
        logging_tag="any_tag_of_my_own_to_track_in_the_dev_console"
    )
)
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, ParamsSet), f"Expected ParamsSet, got {event}"
socket.send(
    client.define_grammar(
        "speech/spelling/digits?regex=[0-9]{1,2}", "num_in_range100"
    )
)
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, GrammarDefined), f"Expected GrammarDefined, got {event}"
print("Parameters set")

# Update imports with:
from uhlive.stream.recognition import ParamsSet, GrammarDefined

# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:

            # ...append the following to the code above:
            await socket.send_str(
                client.set_params(
                    speech_language="en",  # or "fr"
                    no_input_timeout=5000,
                    speech_complete_timeout=1000,
                    speech_incomplete_timeout=2000,
                    speech_nomatch_timeout=3000,
                    recognition_timeout=30000,
                )
            )
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(event, ParamsSet), f"Expected ParamsSet, got {event}"
            await socket.send_str(
                client.define_grammar(
                    "speech/spelling/digits?regex=[0-9]{1,2}", "num_in_range100"
                )
            )
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(
                event, GrammarDefined
            ), f"Expected GrammarDefined, got {event}"
            print("Parameters set")

Want to try in another language? You can set parameter speech_language to English with en or French with fr.
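
The regex part of the digits grammar defined above, [0-9]{1,2}, is what restricts guesses to the 0-99 range. The sketch below checks a few candidate strings locally with Python's re module. It assumes the regex has to match the whole spelled value, which is our reading and not something confirmed here; the helper name in_range100 is ours as well.

```python
import re

# Same pattern as in the grammar URI: one or two decimal digits
DIGITS = re.compile(r"[0-9]{1,2}")


def in_range100(text):
    """True if the whole string is one or two digits (assumed full match)."""
    return DIGITS.fullmatch(text) is not None


for candidate in ["7", "42", "99", "100", "4a", ""]:
    print(repr(candidate), in_range100(candidate))
```

Under that assumption, "100" and "4a" are rejected while "7" and "99" are accepted. A spelling such as "07" is accepted too, and int("07") still evaluates to 7.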

Define some convenience functions

To spare some typing, we're going to define some convenience functions (closures to be exact), now that you've learned the details.

Append these lines at the end of the previous code block:

Sync version (websocket-client) first, then async version (aiohttp):

send = socket.send

def expect(*event_classes):
    event = client.receive(socket.recv())
    assert isinstance(event, event_classes), f"Expected {event_classes} got {event}"
    return event

# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:

            # ...append the following to the code above:
            send = socket.send_str

            async def expect(*event_classes):
                msg = await socket.receive()
                event = client.receive(msg.data)
                assert isinstance(
                    event, event_classes
                ), f"expected {event_classes} got {event}"
                return event
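
Because *event_classes is collected into a tuple and isinstance accepts a tuple of classes, expect can wait for any one of several event types. The sketch below shows this offline, with no connection involved. The Stub classes and make_expect are stand-ins invented for the illustration, not SDK classes.

```python
class StubProgress:
    """Stand-in event class, not part of the SDK."""


class StubStart:
    """Stand-in event class, not part of the SDK."""


class StubComplete:
    """Stand-in event class, not part of the SDK."""


def make_expect(events):
    """Build an expect() closure that reads from a canned list of events."""
    pending = iter(events)

    def expect(*event_classes):
        event = next(pending)
        # event_classes is a tuple, so any of the listed classes is accepted
        assert isinstance(event, event_classes), f"Expected {event_classes} got {event}"
        return event

    return expect


expect = make_expect([StubProgress(), StubStart(), StubComplete()])
expect(StubProgress)
response = expect(StubComplete, StubStart)  # either type is fine here
if isinstance(response, StubStart):
    response = expect(StubComplete)
print(type(response).__name__)  # StubComplete
```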

Write your first interactions

Let's implement the guessing game! As a scenario, the vocal bot randomly chooses a number between 0 and 99 (both ends included) and asks the user to guess it within five tries.

On each guess, the user either wins or gets a hint from the bot telling whether the guess is above or below the secret number.

Sync version (websocket-client) first, then async version (aiohttp):

# Update imports with:
import time
from random import randint
from uhlive.stream.recognition import (
    RecognitionInProgress,
    RecognitionComplete,
    StartOfInput,
    CompletionCause
)

# Append the following to the code above:
def play_prompt(text):
    print(text)
    # leave some time to read it
    time.sleep(len(text.split()) * 0.1)

play_again = True
while play_again:
    secret = randint(0, 99)
    play_prompt(
        "I chose a number between 0 and 99. Try to guess it in five turns or fewer."
    )
    for i in range(1, 6):
        play_prompt(f"Turn {i}: what's your guess?")
        send(client.recognize("session:num_in_range100"))
        expect(RecognitionInProgress)
        response = expect(RecognitionComplete, StartOfInput)
        if isinstance(response, StartOfInput):
            response = expect(RecognitionComplete)
        if response.completion_cause == CompletionCause.NoInputTimeout:
            play_prompt("You should answer faster, you lose your turn!")
            continue
        if response.completion_cause != CompletionCause.Success:
            play_prompt("That's not a number between 0 and 99. You lose your turn.")
            continue
        # It's safe to access the NLU value now
        guess = int(response.body.nlu.value)
        if guess == secret:
            play_prompt(f"{guess} is correct! You win! Congratulations!")
            break
        elif guess > secret:
            play_prompt(f"Your guess, {guess}, is too high")
        else:
            play_prompt(f"Your guess, {guess}, is too low")
    else:
        play_prompt(f"You lose! My secret number was {secret}.")
    while True:
        play_prompt("Do you want to play again?")
        send(client.recognize("builtin:speech/boolean", recognition_mode="hotword"))
        expect(RecognitionInProgress)
        # No StartOfInput in hotword mode
        response = expect(RecognitionComplete)
        if response.completion_cause != CompletionCause.Success:
            play_prompt("Please, clearly answer the question.")
            continue
        play_again = response.body.nlu.value
        break
voice.stop()
send(client.close())
socket.close()

# Update imports with:
from random import randint
from uhlive.stream.recognition import (
    RecognitionInProgress,
    RecognitionComplete,
    StartOfInput,
    CompletionCause
)

# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:

            # ...append the following to the code above:
            play_again = True
            while play_again:
                secret = randint(0, 99)
                await play_prompt(
                    "I chose a number between 0 and 99. Try to guess it in five turns or fewer."
                )
                for i in range(1, 6):
                    await play_prompt(f"Turn {i}: what's your guess?")
                    await send(client.recognize("session:num_in_range100"))
                    await expect(RecognitionInProgress)
                    response = await expect(RecognitionComplete, StartOfInput)
                    if isinstance(response, StartOfInput):
                        response = await expect(RecognitionComplete)
                    if response.completion_cause == CompletionCause.NoInputTimeout:
                        await play_prompt(
                            "You should answer faster, you lose your turn!"
                        )
                        continue
                    if response.completion_cause != CompletionCause.Success:
                        print(response)
                        got = response.body.asr.transcript or response.completion_cause
                        await play_prompt(
                            f"{got} is not a number between 0 and 99. You lose your turn."
                        )
                        continue
                    # It's safe to access the NLU value now
                    guess = int(response.body.nlu.value)
                    if guess == secret:
                        await play_prompt(f"{guess} is correct! You win! Congratulations!")
                        break
                    elif guess > secret:
                        await play_prompt(f"Your guess, {guess}, is too high")
                    else:
                        await play_prompt(f"Your guess, {guess}, is too low")
                else:
                    await play_prompt(f"You lose! My secret number was {secret}.")
                while True:
                    await play_prompt("Do you want to play again?")
                    await send(
                        client.recognize(
                            "builtin:speech/boolean", recognition_mode="hotword"
                        )
                    )
                    await expect(RecognitionInProgress)
                    # No StartOfInput in hotword mode
                    response = await expect(RecognitionComplete)
                    if response.completion_cause != CompletionCause.Success:
                        await play_prompt("Please, clearly answer the question.")
                        continue
                    play_again = response.body.nlu.value
                    break
            voice.cancel()
            await voice
            await send(client.close())

Note how we're using the spelling digit grammar to guess the number, while we're using a boolean one to know if the player wants to try again or not.
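
The game rules themselves do not depend on speech recognition, so they can be tried without a microphone or a connection. Here is a minimal sketch that replays a list of guesses against a secret number, using the same five-turn limit and the same comparisons as the loop above. The function names hint and replay are ours for illustration.

```python
MAX_TURNS = 5  # same number of turns as range(1, 6) in the game loop


def hint(guess, secret):
    """Return 'correct', 'too high' or 'too low' for a single guess."""
    if guess == secret:
        return "correct"
    return "too high" if guess > secret else "too low"


def replay(guesses, secret):
    """Replay at most MAX_TURNS guesses and return the hints given."""
    hints = []
    for guess in guesses[:MAX_TURNS]:
        hints.append(hint(guess, secret))
        if hints[-1] == "correct":
            break  # the player won, no more turns
    return hints


print(replay([50, 25, 37], secret=37))  # ['too high', 'too low', 'correct']
```

Keeping the rules in plain functions like these makes it possible to test them separately from the recognition calls.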

If you need to use multiple grammars, just pass them as additional positional arguments to client.recognize().

Full example code

A full version of this example is available on our GitHub repository: