Speech recognition

Below is an example of streaming speech recognition from an audio file using SpeechKit Hybrid.

1. Obtain from Cloupard:

- the login and password for accessing the test stand,

- the link to the audio recognition method.

2. Generate an access token from the login and password (for example, using https://www.debugbear.com/basic-auth-header-generator).
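
For HTTP Basic authentication the token is the Base64 encoding of the string login:password, so it can also be generated locally instead of with an online tool. The following is a minimal sketch, assuming the stand accepts standard Basic credentials; the function name make_basic_token and the sample credentials are illustrative and are not part of SpeechKit Hybrid.

```python
import base64


def make_basic_token(login: str, password: str) -> str:
    """Return the Base64 part of an HTTP Basic 'Authorization' header."""
    raw = f"{login}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, for illustration only.
token = make_basic_token("user", "secret")
print(token)  # dXNlcjpzZWNyZXQ=
```

Only the Base64 part is needed as the token value: the recognition script in step 6 adds the "Basic " prefix itself when it builds the authorization metadata.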

3. Clone the Yandex Cloud API repository, which contains the SpeechKit Hybrid API specifications:

	git clone https://github.com/yandex-cloud/cloudapi

4. Install the grpcio-tools package using the pip package manager:

	pip install grpcio-tools

5. Go to the folder with the cloned repository, create an output folder, and generate the client interface code in it:



	cd <path_to_cloudapi_folder>
	mkdir output
	python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
	    --python_out=output \
	    --grpc_python_out=output \
	    google/api/http.proto \
	    google/api/annotations.proto \
	    yandex/cloud/api/operation.proto \
	    google/rpc/status.proto \
	    yandex/cloud/operation/operation.proto \
	    yandex/cloud/validation.proto \
	    yandex/cloud/ai/stt/v3/stt_service.proto \
	    yandex/cloud/ai/stt/v3/stt.proto

As a result, the client interface files stt_pb2.py, stt_pb2_grpc.py, stt_service_pb2.py, and stt_service_pb2_grpc.py, along with dependency files, will be created in the output folder.
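
Before moving on, it can help to confirm that the generation step produced the client files. The sketch below is one possible check, not part of the official procedure. It assumes the files are placed under yandex/cloud/ai/stt/v3 inside the output folder, mirroring the paths of the .proto files; the helper name missing_generated_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Client files expected after code generation, relative to the output folder.
# Assumption: protoc mirrors the .proto directory layout inside output.
EXPECTED = [
    "yandex/cloud/ai/stt/v3/stt_pb2.py",
    "yandex/cloud/ai/stt/v3/stt_pb2_grpc.py",
    "yandex/cloud/ai/stt/v3/stt_service_pb2.py",
    "yandex/cloud/ai/stt/v3/stt_service_pb2_grpc.py",
]


def missing_generated_files(output_dir: str = "output") -> List[str]:
    """Return the expected client files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


# Run from the folder that contains output.
missing = missing_generated_files()
print("All client files found" if not missing else f"Missing: {missing}")
```

An empty list means all four interface files are in place; any listed name points to a .proto file that was left out of the protoc command or to a different output path.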

6. Create a file in the root of the output folder, for example, the script speechkit_hybrid_stt.py, with the following code:



	# Install gRPC: pip install grpcio grpcio-tools
	# pip install yandex-speechkit
	# Installation is described at https://yandex.cloud/en/docs/speechkit/stt/api/streaming-examples-v3

	import json

	import grpc

	import yandex.cloud.ai.stt.v3.stt_pb2 as stt_pb2
	import yandex.cloud.ai.stt.v3.stt_service_pb2_grpc as stt_service_pb2_grpc

	CHUNK_SIZE = 4000
	AUDIO_FILE_NAME = r'{{AUDIO_FILE_NAME}}'
	RESULT_FILE_NAME = r'{{RESULT_FILE_NAME}}'
	TOKEN = '{{TOKEN}}'
	TARGET = '{{TARGET}}'
	LANG_CODE = '{{LANG_CODE}}'
	AUDIO_TYPE = '{{AUDIO_TYPE}}'


	def gen(audio_file_name):
	    # Map AUDIO_TYPE to the container enum; any other value falls back to MP3.
	    if AUDIO_TYPE == 'OGG_OPUS':
	        container_type = stt_pb2.ContainerAudio.OGG_OPUS
	    elif AUDIO_TYPE == 'WAV':
	        container_type = stt_pb2.ContainerAudio.WAV
	    else:
	        container_type = stt_pb2.ContainerAudio.MP3

	    recognize_options = stt_pb2.StreamingOptions(
	        speaker_labeling=stt_pb2.SpeakerLabelingOptions(
	            speaker_labeling=stt_pb2.SpeakerLabelingOptions.SPEAKER_LABELING_ENABLED
	        ),
	        recognition_model=stt_pb2.RecognitionModelOptions(
	            model="general",
	            audio_format=stt_pb2.AudioFormatOptions(
	                container_audio=stt_pb2.ContainerAudio(
	                    container_audio_type=container_type
	                )
	            ),
	            # Restrict recognition to the language set in LANG_CODE.
	            language_restriction=stt_pb2.LanguageRestrictionOptions(
	                restriction_type=stt_pb2.LanguageRestrictionOptions.WHITELIST,
	                language_code=[LANG_CODE]
	            ),
	            audio_processing_type=stt_pb2.RecognitionModelOptions.FULL_DATA
	        )
	    )

	    # The first message carries the session options.
	    yield stt_pb2.StreamingRequest(session_options=recognize_options)

	    # The following messages carry the audio in chunks of CHUNK_SIZE bytes.
	    with open(audio_file_name, 'rb') as f:
	        data = f.read(CHUNK_SIZE)
	        while data != b'':
	            yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))
	            data = f.read(CHUNK_SIZE)


	class Alternative:
	    def __init__(self, text, channelTag, startTimeMs, endTimeMs):
	        self.text = text
	        self.channelTag = channelTag
	        self.startTimeMs = startTimeMs
	        self.endTimeMs = endTimeMs


	class AlternativeEncoder(json.JSONEncoder):
	    def default(self, obj):
	        if isinstance(obj, Alternative):
	            return obj.__dict__
	        return json.JSONEncoder.default(self, obj)


	channel_creds = grpc.ssl_channel_credentials()
	channel = grpc.secure_channel(TARGET, channel_creds)
	stub = stt_service_pb2_grpc.RecognizerStub(channel)

	it = stub.RecognizeStreaming(gen(AUDIO_FILE_NAME), metadata=[('authorization', f'Basic {TOKEN}')], timeout=10000000)

	partialRes = []
	finalRes = []
	finalRefinementRes = []
	errorText = None

	try:
	    for r in it:
	        event_type = r.WhichOneof('Event')
	        if event_type == 'partial':
	            for a in r.partial.alternatives:
	                partialRes.append(Alternative(a.text, r.channel_tag, a.start_time_ms, a.end_time_ms))
	        elif event_type == 'final':
	            for a in r.final.alternatives:
	                finalRes.append(Alternative(a.text, r.channel_tag, a.start_time_ms, a.end_time_ms))
	        elif event_type == 'final_refinement':
	            for a in r.final_refinement.normalized_text.alternatives:
	                finalRefinementRes.append(Alternative(a.text, r.channel_tag, a.start_time_ms, a.end_time_ms))
	except grpc.RpcError as err:
	    # grpc.RpcError is the public base class for errors raised by gRPC calls.
	    errorText = f'Error code {err.code()}, message: {err.details()}'

	if errorText:
	    print(f"ERROR: {errorText}")
	else:
	    # Prefer final results, then normalized (refined) results, then partial ones.
	    results = finalRes or finalRefinementRes or partialRes
	    if results:
	        with open(RESULT_FILE_NAME, 'w', encoding='utf-8') as f:
	            json.dump(results, f, ensure_ascii=False, indent=4, cls=AlternativeEncoder)
	    print("RESULT: OK")

In this script:

· {{TOKEN}}: the access token generated from the login and password in step 2

· {{TARGET}}: the address of the speech recognition (STT) method obtained in step 1

· {{AUDIO_FILE_NAME}}: the full name of the audio file to recognize

· {{RESULT_FILE_NAME}}: the full name of the result file (the recognition result is written to it in JSON format as an array of Alternative objects; the Alternative class is defined in the script)

· {{LANG_CODE}}: the code of the recognition language (for example, ru-RU)

· {{AUDIO_TYPE}}: the type of the audio file (WAV, OGG_OPUS, or MP3)
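
Instead of editing the placeholders by hand, the same values could be passed on the command line. The following is a minimal sketch using the standard argparse module. The option names, the default values, and the address stt.example.com:443 are assumptions made for this example and are not defined by SpeechKit Hybrid.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options mirroring the script's placeholders (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Streaming recognition settings")
    parser.add_argument("--token", required=True, help="Base64 Basic auth token")
    parser.add_argument("--target", required=True, help="address of the recognition method")
    parser.add_argument("--audio", required=True, help="full name of the audio file")
    parser.add_argument("--result", required=True, help="full name of the JSON result file")
    parser.add_argument("--lang", default="ru-RU", help="recognition language code")
    parser.add_argument("--audio-type", default="WAV",
                        choices=["WAV", "OGG_OPUS", "MP3"], help="audio container type")
    return parser


# Example invocation with placeholder values.
args = build_parser().parse_args([
    "--token", "dXNlcjpzZWNyZXQ=",
    "--target", "stt.example.com:443",
    "--audio", "speech.wav",
    "--result", "result.json",
])
print(args.lang, args.audio_type)  # ru-RU WAV
```

Restricting --audio-type with choices rejects an unsupported value up front, whereas the script as written treats any value other than WAV or OGG_OPUS as MP3.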

7. Run the created file:

	python3 output/speechkit_hybrid_stt.py

On success, the recognition result is written in JSON format to the file named {{RESULT_FILE_NAME}}, and "RESULT: OK" is printed to the console. On error, "ERROR: <error text>" is printed.
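
The result file holds an array of objects with the fields text, channelTag, startTimeMs, and endTimeMs, as defined by the Alternative class in the script. The sketch below shows one way to turn that file into a plain transcript. The helper name load_transcript and the sample data are made up for illustration, and ordering fragments by start time is an assumption that suits single-channel audio.

```python
import json
import os
import tempfile


def load_transcript(result_file_name: str) -> str:
    """Join the 'text' fields of all alternatives, ordered by start time."""
    with open(result_file_name, encoding="utf-8") as f:
        alternatives = json.load(f)
    alternatives.sort(key=lambda a: a["startTimeMs"])
    return " ".join(a["text"] for a in alternatives)


# Made-up sample in the format written by the script (illustration only).
sample = [
    {"text": "world", "channelTag": "0", "startTimeMs": 900, "endTimeMs": 1500},
    {"text": "hello", "channelTag": "0", "startTimeMs": 0, "endTimeMs": 800},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)
    transcript = load_transcript(path)
print(transcript)  # hello world
```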