Getting started with the real-time WebSocket Stream H2B API
What you are building
This Getting Started guide walks you through building a very simple app: speaking into your microphone, you'll try to guess a number between 0 and 99 within five tries.
You can download the Python script we're going to build, in either its sync or async version, from our repository. Other examples are also available in the Git repository.
Get credentials
Please contact our support team to request credentials.
Install the Python SDK
The SDK is available on PyPI. It is built for Python 3.7 and higher and can be installed with pip, along with the dependencies needed to run the examples:
```shell
$ pip install -U "uhlive[examples]"
```
Connect to the server
You need an `access_token` to connect to the API. This token has to be requested from our authentication server, using the `client_id` and `client_secret` credentials that were provided to you by your account manager.
Our Python SDK takes care of requesting this token for you. You'll just have to provide the ID and secret.
Once the `access_token` has been retrieved, a WebSocket connection is established with the server at `wss://api.uh.live/bots`.
You can read more about the whole authentication flow in the protocol documentation.
We recommend passing the credentials to this Getting Started script as environment variables. A simple way to make them available at run time is to launch the script as follows:
```shell
UHLIVE_API_CLIENT="your-client-id" UHLIVE_API_SECRET="your-client-secret" python myapp.py
```
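With `os.environ[...]`, a missing variable surfaces as a bare `KeyError`. If you prefer a clearer message, a small helper such as the one below can be used instead. It is only a sketch: `require_env` is a hypothetical name, not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or exit with a clear message.

    Hypothetical helper, not part of the uhlive SDK. Empty values count as missing.
    """
    value = os.environ.get(name)
    if not value:
        raise SystemExit(
            f"Missing environment variable {name}; "
            f'run e.g. {name}="..." python myapp.py'
        )
    return value
```

You would then write `uhlive_client = require_env("UHLIVE_API_CLIENT")` in place of the `os.environ["UHLIVE_API_CLIENT"]` lookup.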
Our SDK is designed so that you are free to use whichever WebSocket library you want as transport, and to architect your code the way you like.
Here is an example of connecting to the API using either `websocket-client` (sync) or `aiohttp` (async):
Sync version (websocket-client):
```python
import os

import requests
import websocket as ws
from uhlive.auth import build_authentication_request
from uhlive.stream.recognition import build_connection_request

uhlive_client = os.environ["UHLIVE_API_CLIENT"]
uhlive_secret = os.environ["UHLIVE_API_SECRET"]

# create transport
auth_url, auth_params = build_authentication_request(uhlive_client, uhlive_secret)
login = requests.post(auth_url, data=auth_params)
login.raise_for_status()
uhlive_token = login.json()["access_token"]

url, headers = build_connection_request(uhlive_token)
socket = ws.create_connection(url, header=headers)
print("Connected")

# Properly close WS connection
socket.close()
```
Async version (aiohttp):
```python
import asyncio
import os

from aiohttp import ClientSession
from uhlive.auth import build_authentication_request
from uhlive.stream.recognition import build_connection_request


async def main(uhlive_client: str, uhlive_secret: str):
    # create transport
    async with ClientSession() as session:
        auth_url, auth_params = build_authentication_request(
            uhlive_client, uhlive_secret
        )
        async with session.post(auth_url, data=auth_params) as login:
            login.raise_for_status()
            body = await login.json()
            uhlive_token = body["access_token"]
        url, headers = build_connection_request(uhlive_token)
        async with session.ws_connect(url, headers=headers) as socket:
            print("Connected")


uhlive_client = os.environ["UHLIVE_API_CLIENT"]
uhlive_secret = os.environ["UHLIVE_API_SECRET"]
asyncio.run(main(uhlive_client, uhlive_secret))
```
When you run this code, you should see "Connected" printed on the console.
Open a session and start streaming
This API is meant for developing rich voice-interactive applications such as chatbots or Interactive Voice Response (IVR) systems. You therefore need to source real-time audio from the user, be it from a local soundcard interface (microphone) or a network stream. You also need to be able to respond to the user, either by playing audio (recorded prompts or Text-To-Speech) or by displaying some text.
To keep code snippets short and illustrative, we won't deal with the details of audio acquisition or TTS: instead, we'll just display the prompts on the console and capture the voice from the microphone using `sounddevice`.
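The microphone snippets in this section capture mono audio at 8 kHz with `dtype="int16"` (2 bytes per sample) in blocks of 960 frames. A quick calculation shows what each block sent over the WebSocket represents; `block_stats` is just an illustrative helper, not an SDK function.

```python
def block_stats(samplerate: int, blocksize: int, channels: int = 1, sample_width: int = 2):
    """Return (bytes per block, milliseconds of audio per block) for raw PCM input."""
    nbytes = blocksize * channels * sample_width
    duration_ms = 1000 * blocksize / samplerate
    return nbytes, duration_ms


# Parameters used by the microphone snippets: 8 kHz, mono, int16 (2 bytes), 960 frames
print(block_stats(8000, 960))  # (1920, 120.0)
```

Each 1920-byte binary frame thus carries 120 ms of audio. Smaller blocks reduce latency at the cost of more WebSocket frames; this guide does not say whether the server expects a particular block size, so treat 960 as the value used in these examples rather than a requirement.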
Update the previous code as follows:
Sync version:
```python
# Update imports with:
import sounddevice as sd
from uhlive.stream.recognition import Recognizer, Opened

# Append the following to the code above (keep socket.close() at the very end):
def stream_mic(socket, client):
    # sounddevice calls this from its audio thread for every captured block
    def callback(indata, frame_count, time_info, status):
        socket.send_binary(client.send_audio_chunk(bytes(indata)))

    stream = sd.RawInputStream(
        callback=callback, channels=1, samplerate=8000, dtype="int16", blocksize=960
    )
    stream.start()
    return stream


# instantiate service
client = Recognizer()
# Open a session
# Commands are sent as text frames
socket.send(client.open())
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, Opened), f"Expected Opened, got {event}"
# start streaming the user's voice
voice = stream_mic(socket, client)
print("Streaming mic")
```
Async version:
```python
# Update imports with:
import asyncio
import sounddevice as sd
from uhlive.stream.recognition import Recognizer, Opened

# Before the main function, add these functions:
async def inputstream_generator(channels=1, samplerate=8000, dtype="int16", **kwargs):
    """Async generator that yields blocks of raw input data as bytes."""
    q_in = asyncio.Queue()
    loop = asyncio.get_running_loop()

    # sounddevice calls this from its own thread: hand the data over to the event loop
    def callback(indata, frame_count, time_info, status):
        loop.call_soon_threadsafe(q_in.put_nowait, bytes(indata))

    stream = sd.RawInputStream(
        callback=callback,
        channels=channels,
        samplerate=samplerate,
        dtype=dtype,
        **kwargs,
    )
    with stream:
        while True:
            indata = await q_in.get()
            yield indata


async def play_prompt(text):
    print(text)
    # leave the user time to read it
    await asyncio.sleep(len(text.split()) * 0.1)


async def stream(socket, client):
    try:
        async for block in inputstream_generator(blocksize=960):
            await socket.send_bytes(client.send_audio_chunk(block))
    except asyncio.CancelledError:
        pass


# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:
            # ...append the following to the code above:
            # instantiate service
            client = Recognizer()
            # Open a session
            # Commands are sent as text frames
            await socket.send_str(client.open())
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(event, Opened), f"Expected Opened, got {event}"
            # start streaming the user's voice
            voice = asyncio.create_task(stream(socket, client))
```
Voice is streamed, but nothing happens yet.
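If you'd rather test without speaking, or your audio comes from a recording, you can stream a WAV file instead of the microphone. The sketch below uses only the standard `wave` module; `wav_chunks` is a hypothetical helper, and it assumes the API wants the same 8 kHz, mono, 16-bit PCM that the microphone snippets produce.

```python
import wave
from typing import BinaryIO, Iterator


def wav_chunks(source: BinaryIO, blocksize: int = 960) -> Iterator[bytes]:
    """Yield raw PCM blocks of `blocksize` frames from an 8 kHz, mono, 16-bit WAV file.

    Assumption: the format check mirrors the microphone settings used above
    (samplerate=8000, channels=1, dtype="int16"); adapt it if your setup differs.
    """
    with wave.open(source, "rb") as wav:
        if (wav.getframerate(), wav.getnchannels(), wav.getsampwidth()) != (8000, 1, 2):
            raise ValueError("expected 8 kHz, mono, 16-bit PCM")
        while True:
            block = wav.readframes(blocksize)
            if not block:
                break
            # the last block may be shorter than `blocksize` frames
            yield block
```

In the sync version, you could then send each block with `socket.send_binary(client.send_audio_chunk(block))`. A file is read much faster than real time, so you may want to pause about 120 ms between 960-frame blocks to mimic a live stream.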
Define some default values and grammar aliases
In order to avoid repeating the same parameters on every recognize request, you can define them once for the entire session. See the input headers for a list of the possible options and their meanings.
Also, since we're talking to a bot and the interaction follows a scenario, we'll use grammars, which give the speech recognizer context about what is expected: for example, that we want to recognize an address, or the spelling of a serial number.
For our example, we're going to define a digit-spelling grammar that lets us recognize the user's guess as a number between 0 and 99.
Sync version:
```python
# Update imports with:
from uhlive.stream.recognition import ParamsSet, GrammarDefined

# Append the following to the code above:
socket.send(
    client.set_params(
        speech_language="en",  # or "fr"
        no_input_timeout=5000,
        speech_complete_timeout=1000,
        speech_incomplete_timeout=2000,
        speech_nomatch_timeout=3000,
        recognition_timeout=30000,
        logging_tag="any_tag_of_my_own_to_track_in_the_dev_console",
    )
)
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, ParamsSet), f"Expected ParamsSet, got {event}"
socket.send(
    client.define_grammar(
        "speech/spelling/digits?regex=[0-9]{1,2}", "num_in_range100"
    )
)
# Check if successful
event = client.receive(socket.recv())
assert isinstance(event, GrammarDefined), f"Expected GrammarDefined, got {event}"
print("Parameters set")
```
Async version:
```python
# Update imports with:
from uhlive.stream.recognition import ParamsSet, GrammarDefined

# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:
            # ...append the following to the code above:
            await socket.send_str(
                client.set_params(
                    speech_language="en",  # or "fr"
                    no_input_timeout=5000,
                    speech_complete_timeout=1000,
                    speech_incomplete_timeout=2000,
                    speech_nomatch_timeout=3000,
                    recognition_timeout=30000,
                )
            )
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(event, ParamsSet), f"Expected ParamsSet, got {event}"
            await socket.send_str(
                client.define_grammar(
                    "speech/spelling/digits?regex=[0-9]{1,2}", "num_in_range100"
                )
            )
            # Check if successful
            msg = await socket.receive()
            event = client.receive(msg.data)
            assert isinstance(
                event, GrammarDefined
            ), f"Expected GrammarDefined, got {event}"
            print("Parameters set")
```
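The grammar above constrains the spelled digits with the regular expression `[0-9]{1,2}`. Assuming the server matches it against the whole spelled string (this guide does not say so explicitly), you can preview which answers it accepts with Python's `re.fullmatch`; `accepts` is an illustrative helper only.

```python
import re

# Regex taken from the grammar URI "speech/spelling/digits?regex=[0-9]{1,2}"
GUESS_PATTERN = re.compile(r"[0-9]{1,2}")


def accepts(spelled: str) -> bool:
    """Return True if the whole digit string matches the grammar's regex."""
    return GUESS_PATTERN.fullmatch(spelled) is not None


for candidate in ["7", "42", "07", "99", "100", ""]:
    print(repr(candidate), accepts(candidate))
```

Under this assumption "07" is accepted too; if the NLU value were "07", the `int()` conversion in the game code would turn it into 7.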
Want to try another language? Set the `speech_language` parameter to `en` for English or `fr` for French.
Define some convenience functions
Now that you've learned the details, let's spare some typing by defining a couple of convenience functions (closures, to be exact).
Append these lines at the end of the previous code block:
Sync version:
```python
send = socket.send


def expect(*event_classes):
    event = client.receive(socket.recv())
    assert isinstance(event, event_classes), f"Expected {event_classes} got {event}"
    return event
```
Async version:
```python
# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:
            # ...append the following to the code above:
            send = socket.send_str

            async def expect(*event_classes):
                msg = await socket.receive()
                event = client.receive(msg.data)
                assert isinstance(
                    event, event_classes
                ), f"Expected {event_classes} got {event}"
                return event
```
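`assert` statements are stripped when Python runs with the `-O` flag, so these checks would then silently disappear. If you want the check to survive in optimized mode, a variant can raise an exception instead. The sketch below is hypothetical (`make_expect` and `UnexpectedEvent` are not SDK names) and is written against any function that returns the next decoded event, so it can be tried without a server.

```python
from typing import Callable, Type


class UnexpectedEvent(Exception):
    """Raised when the next event is not one of the expected classes (hypothetical)."""


def make_expect(receive_event: Callable[[], object]):
    """Build an `expect` closure around a zero-argument function returning the next event."""

    def expect(*event_classes: Type) -> object:
        event = receive_event()
        # Unlike `assert`, this check still runs under `python -O`
        if not isinstance(event, event_classes):
            raise UnexpectedEvent(f"Expected {event_classes} got {event!r}")
        return event

    return expect
```

With the sync code, you would build it as `expect = make_expect(lambda: client.receive(socket.recv()))`.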
Write your first interactions
Let's implement the guessing game! In this scenario, the voice bot randomly chooses a number between 0 and 99 (both ends included) and asks the user to guess it within five tries.
After each guess, either the user wins or the bot gives a hint telling whether the guess is above or below the secret number.
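Keeping the game rules separate from speech I/O makes them testable without a microphone or a server. Here is a minimal sketch of those rules as a pure function; `referee` is a hypothetical helper that takes guesses already converted to integers and returns the prompts the bot would play, using the same wording as the game code.

```python
from itertools import islice
from typing import Iterable, List


def referee(secret: int, guesses: Iterable[int], max_turns: int = 5) -> List[str]:
    """Replay the guessing rules on already-recognized guesses; return the bot's prompts.

    Only the first `max_turns` guesses are considered, like the five-turn game loop.
    """
    replies: List[str] = []
    for guess in islice(guesses, max_turns):
        if guess == secret:
            replies.append(f"{guess} is correct! You win! Congratulations!")
            return replies
        if guess > secret:
            replies.append(f"Your guess, {guess}, is too high")
        else:
            replies.append(f"Your guess, {guess}, is too low")
    replies.append(f"You lose! My secret number was {secret}.")
    return replies


for prompt in referee(42, [50, 25, 42]):
    print(prompt)
```

The speech loop then only has to turn each `RecognitionComplete` into an integer (or a lost turn) before applying these rules.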
Sync version:
```python
# Update imports with:
import time
from random import randint

from uhlive.stream.recognition import (
    RecognitionInProgress,
    RecognitionComplete,
    StartOfInput,
    CompletionCause,
)

# Append the following to the code above:
def play_prompt(text):
    print(text)
    # leave the user time to read it
    time.sleep(len(text.split()) * 0.1)


play_again = True
while play_again:
    secret = randint(0, 99)
    play_prompt(
        "I chose a number between 0 and 99. Try to guess it in five turns or fewer."
    )
    for i in range(1, 6):
        play_prompt(f"Turn {i}: what's your guess?")
        send(client.recognize("session:num_in_range100"))
        expect(RecognitionInProgress)
        response = expect(RecognitionComplete, StartOfInput)
        if isinstance(response, StartOfInput):
            response = expect(RecognitionComplete)
        if response.completion_cause == CompletionCause.NoInputTimeout:
            play_prompt("You should answer faster, you lose your turn!")
            continue
        if response.completion_cause != CompletionCause.Success:
            play_prompt("That's not a number between 0 and 99. You lose your turn.")
            continue
        # It's safe to access the NLU value now
        guess = int(response.body.nlu.value)
        if guess == secret:
            play_prompt(f"{guess} is correct! You win! Congratulations!")
            break
        elif guess > secret:
            play_prompt(f"Your guess, {guess}, is too high")
        else:
            play_prompt(f"Your guess, {guess}, is too low")
    else:
        # no break: all five turns were used without finding the number
        play_prompt(f"You lose! My secret number was {secret}.")
    while True:
        play_prompt("Do you want to play again?")
        send(client.recognize("builtin:speech/boolean", recognition_mode="hotword"))
        expect(RecognitionInProgress)
        # No StartOfInput in hotword mode
        response = expect(RecognitionComplete)
        if response.completion_cause != CompletionCause.Success:
            play_prompt("Please, clearly answer the question.")
            continue
        play_again = response.body.nlu.value
        break

# stop streaming the microphone, then close the session and the connection
voice.stop()
send(client.close())
socket.close()
```
Async version:
```python
# Update imports with:
from random import randint

from uhlive.stream.recognition import (
    RecognitionInProgress,
    RecognitionComplete,
    StartOfInput,
    CompletionCause,
)

# Within the main function...
async def main(uhlive_client: str, uhlive_secret: str):
    async with ClientSession() as session:
        async with session.ws_connect(url, headers=headers) as socket:
            # ...append the following to the code above:
            play_again = True
            while play_again:
                secret = randint(0, 99)
                await play_prompt(
                    "I chose a number between 0 and 99. Try to guess it in five turns or fewer."
                )
                for i in range(1, 6):
                    await play_prompt(f"Turn {i}: what's your guess?")
                    await send(client.recognize("session:num_in_range100"))
                    await expect(RecognitionInProgress)
                    response = await expect(RecognitionComplete, StartOfInput)
                    if isinstance(response, StartOfInput):
                        response = await expect(RecognitionComplete)
                    if response.completion_cause == CompletionCause.NoInputTimeout:
                        await play_prompt(
                            "You should answer faster, you lose your turn!"
                        )
                        continue
                    if response.completion_cause != CompletionCause.Success:
                        got = response.body.asr.transcript or response.completion_cause
                        await play_prompt(
                            f"{got} is not a number between 0 and 99. You lose your turn."
                        )
                        continue
                    # It's safe to access the NLU value now
                    guess = int(response.body.nlu.value)
                    if guess == secret:
                        await play_prompt(f"{guess} is correct! You win! Congratulations!")
                        break
                    elif guess > secret:
                        await play_prompt(f"Your guess, {guess}, is too high")
                    else:
                        await play_prompt(f"Your guess, {guess}, is too low")
                else:
                    # no break: all five turns were used without finding the number
                    await play_prompt(f"You lose! My secret number was {secret}.")
                while True:
                    await play_prompt("Do you want to play again?")
                    await send(
                        client.recognize(
                            "builtin:speech/boolean", recognition_mode="hotword"
                        )
                    )
                    await expect(RecognitionInProgress)
                    # No StartOfInput in hotword mode
                    response = await expect(RecognitionComplete)
                    if response.completion_cause != CompletionCause.Success:
                        await play_prompt("Please, clearly answer the question.")
                        continue
                    play_again = response.body.nlu.value
                    break
            # stop streaming the microphone, then close the session
            voice.cancel()
            await voice
            await send(client.close())
```
Note how we use the digit-spelling grammar for the guesses, and the built-in boolean grammar to find out whether the player wants to play again.
Full example code
A full version of this example is available on our GitHub repository:
Further reading
More examples are available to learn how to use the SDK.
You will need a Google Text-to-Speech token to run the desktop bot examples.