Manipulate transcript

What are you building here

In this example, we are going to process the transcript a bit before saving it to a file.

Please refer to the previous example for details on authentication, pagination, and how to run such scripts.

Let's say that, for some business purpose, we want to create a CSV file with one row per Call, containing the unique identifier of the Call, a link to the Call in Scribr (our speech analytics tool), and the first 50 words the customer said during the Call.
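To make the target format concrete, here is a minimal sketch of a single row, written with Python's standard `csv` module to an in-memory buffer instead of a file. The call identifier and the words are hypothetical placeholder values; the Scribr URL pattern and the column names are the ones used by the script in this example.

```python
import csv
import io

# Hypothetical sample values; the real script takes them from the API.
unique_id = "abc123"
customer_words = ["hello", "I", "would", "like", "to", "cancel", "my", "order"]

row = {
    "unique_id": unique_id,
    "url": f"https://app.uh.live/scribr/{unique_id}",
    # Keep at most the first 50 words; shorter transcripts are kept whole.
    "50words": " ".join(customer_words[:50]),
}

# Write the header and the row to a string buffer to inspect the CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["unique_id", "url", "50words"])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Each Call becomes one such row; the real script collects all rows first and writes them in one go with `writerows`.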

Code

In this example, we stop consuming the API after 3 pages of 20 Calls each. For more information about the API resources, please check the API reference.
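The limit/offset pagination used by the script can be tried without any network access. In this sketch, a hypothetical `fetch_page(limit, offset)` function backed by an in-memory list stands in for the HTTP call (it only mimics the `{"data": [...]}` shape the script reads), so you can check which offsets are requested and that the loop also stops early when a page comes back empty.

```python
from typing import Callable, Dict, List

LIMIT = 20
MAX_PAGES = 3

# Hypothetical stand-in for the API: 45 fake calls served by limit/offset.
FAKE_CALLS = [{"unique_id": f"call-{i}"} for i in range(45)]


def fetch_page(limit: int, offset: int) -> Dict[str, List[dict]]:
    """Return one page shaped like the API response read by the script."""
    return {"data": FAKE_CALLS[offset:offset + limit]}


def collect_calls(fetch: Callable[[int, int], Dict[str, List[dict]]],
                  limit: int = LIMIT, max_pages: int = MAX_PAGES) -> List[dict]:
    calls: List[dict] = []
    offset = 0
    # Request offsets 0, limit, 2 * limit, ... up to max_pages pages.
    while offset < max_pages * limit:
        page = fetch(limit, offset)["data"]
        if not page:  # no more calls: stop before reaching max_pages
            break
        calls += page
        offset += limit
    return calls


calls = collect_calls(fetch_page)
print(len(calls))  # 45: pages at offsets 0, 20 and 40 hold 20, 20 and 5 calls
```

With at least 60 calls available, the loop returns exactly 3 full pages; with fewer, it simply returns what exists.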

#!/usr/bin/env python3
import csv
import os
import requests

payload = {
    "client_id": os.getenv("CLIENT_ID"),
    "client_secret": os.getenv("CLIENT_SECRET"),
    "grant_type": "client_credentials",
}
r = requests.post("https://id.uh.live/realms/uhlive/protocol/openid-connect/token", data=payload)
r.raise_for_status()
access_token = r.json()['access_token']

headers = {"Authorization": f"Bearer {access_token}"}
LIMIT = 20
offset = 0


# First, fetch the 3 pages we want and store the calls in a list
call_list = []
while offset < 3 * LIMIT:
    r = requests.get(
        "https://activate.uh.live/calls",
        params={"limit": LIMIT, "offset": offset},
        headers=headers,
    )
    r.raise_for_status()
    data = r.json()
    if not data["data"]:
        # Fewer than 3 pages available: stop early
        break
    call_list += data["data"]
    offset += LIMIT

output = []
for call in call_list:
    # For each call, the transcript is under the `transcript_json` key. Apart from
    # metadata, the transcript itself is under `callData`.
    # A call without a transcript simply yields no words.
    transcript = (call.get("transcript_json") or {}).get("callData", [])
    words = []
    for segment in transcript:
        # A transcript is made of segments, each one an utterance by a single speaker.
        # We only keep the customer's segments, labelled `in`.
        if segment["from"] == "in":
            words += [word["value"] for word in segment["words"]]

    # Now we can prepare the data to save in the CSV:
    # - the call's unique identifier
    # - a link to the call in Scribr
    # - the customer's first 50 words
    output.append({
        "unique_id": call["unique_id"],
        "url": f"https://app.uh.live/scribr/{call['unique_id']}",
        "50words": " ".join(words[:50]),
    })

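# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open("client_first_words.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["unique_id", "url", "50words"])
    writer.writeheader()
    writer.writerows(output)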
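The transcript handling in the loop can be tried on its own. This sketch assumes the transcript shape described in the script's comments: a `callData` list of segments, each with a `from` speaker and a list of `words` carrying a `value`. The sample segments and the `out` label for the other speaker are hypothetical, so check a real API response before relying on them; `first_customer_words` is likewise a hypothetical helper name.

```python
from typing import List, Optional


def first_customer_words(transcript_json: Optional[dict], n: int = 50) -> str:
    """Join the first n words spoken by the customer (speaker "in")."""
    words: List[str] = []
    # Treat a missing transcript as having no segments.
    for segment in (transcript_json or {}).get("callData", []):
        if segment.get("from") == "in":
            words += [word["value"] for word in segment.get("words", [])]
    return " ".join(words[:n])


# Hypothetical transcript: customer ("in") and agent ("out", assumed label).
sample = {
    "callData": [
        {"from": "out", "words": [{"value": "Hello"}, {"value": "there"}]},
        {"from": "in", "words": [{"value": "Hi"}, {"value": "I"}]},
        {"from": "in", "words": [{"value": "need"}, {"value": "help"}]},
    ]
}

print(first_customer_words(sample))       # Hi I need help
print(first_customer_words(sample, n=2))  # Hi I
print(first_customer_words(None))         # prints an empty line
```

Because the customer's words are concatenated across all of their segments before slicing, the 50-word limit applies to the whole call, not to each segment.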