
Stream API, for human-to-human interactions

Welcome to our automatic speech recognition API guide. This API lets you stream audio in real time and receive an enriched transcription in return.

Here you will find the documentation you need to use our API and build your products with it.

If you know what you're looking for, you might be interested in our API Reference.

Getting started

What you'll learn here: how to send an audio stream to our API and receive a text transcription in real time.

Overview

This guide describes how to consume our WebSocket API by sending it audio and receiving a transcription. It also introduces the basic concepts to keep in mind when using this API. Once you have read this page, you will know how to:

authenticate to our servers
send us some audio
deal with the transcription you'll get as a response

Examples are based on our SDKs, but you can dig into our API reference to implement the same logic in the language of your choice.

Authentication

See the dedicated section to learn how to get credentials and how to use them.

Install an SDK (or not)

For your convenience, we provide SDKs to ease the use of our API. Of course, you can dig into the API reference if you want to write your own implementation, or if your language of choice is not yet supported.

Limitation

There is a bandwidth limit of 20 kB per second per WebSocket connection. We therefore strongly recommend using one WebSocket connection for each audio stream you send. Both the Python and JavaScript SDKs handle this transparently.

This limitation allows us to provide the best possible quality of service to every user, while still allowing audio to be streamed in real time, in the spirit of this API.
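
If you write your own client instead of using an SDK, it can help to check that your audio fits within the limit and to pace your sends at real-time speed. The following is a minimal sketch, not the SDKs' actual implementation. It assumes that 20 kB means 20,000 bytes and that the audio is uncompressed PCM; the 8 kHz, 16-bit mono format is only an illustration (see the API reference for the formats actually accepted), and every function name here is hypothetical.

```python
import time
from typing import Callable, Iterator

# Assumption: "20 kB" is read as 20,000 bytes per second per connection.
MAX_BYTES_PER_SECOND = 20_000


def pcm_byte_rate(sample_rate_hz: int, sample_width_bytes: int, channels: int = 1) -> int:
    """Bytes per second produced by uncompressed PCM audio."""
    return sample_rate_hz * sample_width_bytes * channels


def fits_limit(byte_rate: int, limit: int = MAX_BYTES_PER_SECOND) -> bool:
    """True if a stream at this byte rate stays within the per-connection limit."""
    return byte_rate <= limit


def paced_chunks(
    data: bytes,
    byte_rate: int,
    chunk_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield chunks of `data`, pausing after each one for the audio duration it holds.

    Sending each yielded chunk over the connection as it arrives keeps the
    average throughput at `byte_rate`, i.e. at real-time speed.
    """
    chunk_size = byte_rate * chunk_ms // 1000
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield chunk
        sleep(len(chunk) / byte_rate)


if __name__ == "__main__":
    rate = pcm_byte_rate(8000, 2)  # illustrative format: 8 kHz, 16-bit, mono
    print(rate, fits_limit(rate))
    one_second_of_silence = bytes(rate)
    for chunk in paced_chunks(one_second_of_silence, rate):
        pass  # a real client would send `chunk` over its own WebSocket connection here
```

Pacing matters mostly for pre-recorded audio: a live microphone already produces data at real-time speed, whereas a file read from disk and sent as fast as possible would quickly exceed the limit. With several audio streams, each one would get its own connection and its own pacing loop.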