Continuous real-time speech to text using Watson in Python

I am trying to make a small Python program that uses the microphone and the Watson server to get text in real time, similar to how it works here.

This is the code I have come up with, but it only returns the text after I finish recording:

import json

import pyaudio
from watson_developer_cloud import SpeechToTextV1

# Recording parameters
CHUNK = 1024               # frames per read
FORMAT = pyaudio.paInt16   # 16-bit samples
CHANNELS = 2
RATE = 44100               # sampling rate in Hz
RECORD_SECONDS = 10

p = pyaudio.PyAudio()

# Open the default microphone for input
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")

# Collect every chunk in memory until the recording is over
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

data_feed = b''.join(frames)

speech_to_text = SpeechToTextV1(
    username='secret',
    password='secret too',
    x_watson_learning_opt_out=False
)

# The whole recording is sent in a single request,
# so the text can only come back once recording has ended
result = speech_to_text.recognize(data_feed,
                                  content_type="audio/l16;rate=44100;channels=2",
                                  word_confidence=True,
                                  max_alternatives=4,
                                  word_alternatives_threshold=0.5,
                                  model="en-US_BroadbandModel",
                                  continuous=True)

j = json.dumps(result, indent=2)
print(j)
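To see why this version cannot be real time, it helps to work out how much audio is buffered before the single `recognize` call. The following is only a rough calculation using the constants from the code above; the helper name `buffered_audio_bytes` is made up for illustration and is not part of PyAudio or the Watson SDK.

```python
# Illustrative arithmetic only; buffered_audio_bytes is a hypothetical helper.
CHUNK = 1024
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 10
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio (pyaudio.paInt16)


def buffered_audio_bytes(rate, chunk, seconds, channels, sample_width):
    """Return (number of reads, total bytes) collected before upload."""
    reads = int(rate / chunk * seconds)  # same bound as the recording loop
    total = reads * chunk * channels * sample_width
    return reads, total


reads, total = buffered_audio_bytes(RATE, CHUNK, RECORD_SECONDS,
                                    CHANNELS, SAMPLE_WIDTH)
print(reads, total)
```

All of those bytes are joined into `data_feed` and sent in one request, so no transcript can come back until the recording loop has ended.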

Source

2017-10-27 DBeck

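Is there a specific problem you are trying to solve? Have you tried stepping through your code? – alex

Hi @alex, the Python SDK is currently limited to working with audio files rather than taking audio directly from the microphone. I am working on a project where I need real-time text while using the microphone. – DBeck

I am using websockets to do this, and I think I will have something by tomorrow. – DBeck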

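I went ahead and wrote a program from scratch that connects to the Watson server over a WebSocket. It still does not do exactly what I expect, but it is very close.

The audio is sent to the server in real time, but I only get the transcript after the recording has finished.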

import asyncio
import json
import time

import pyaudio
import requests
import websockets

# Variables to use for recording audio
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 16000

p = pyaudio.PyAudio()

# This is the language model to use to transcribe the audio
model = "en-US_BroadbandModel"

# These are the urls we will be using to communicate with Watson
default_url = "https://stream.watsonplatform.net/speech-to-text/api"
token_url = "https://stream.watsonplatform.net/authorization/api/v1/token?" \
            "url=https://stream.watsonplatform.net/speech-to-text/api"
url = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=" + model

# BlueMix app credentials
username = ""  # Your Bluemix App username
password = ""  # Your Bluemix App password

# Send a request to get an authorization key
r = requests.get(token_url, auth=(username, password))
auth_token = r.text
token_header = {"X-Watson-Authorization-Token": auth_token}

# Params to use for Watson API
params = {
    "word_confidence": True,
    "content_type": "audio/l16;rate=16000;channels=2",
    "action": "start",
    "interim_results": True
}

# Opens the stream to start recording from the default microphone
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)


async def send_audio(ws):
    # Starts recording of microphone
    print("* READY *")

    start = time.time()
    try:
        while True:
            print(".")
            data = stream.read(CHUNK)
            await ws.send(data)
            if time.time() - start > 20:  # Records for n seconds
                await ws.send(json.dumps({'action': 'stop'}))
                return False
    except Exception as e:
        print(e)
        return False
    finally:
        # Always stop the stream and release the audio device
        stream.stop_stream()
        stream.close()
        p.terminate()


async def speech_to_text():
    async with websockets.connect(url, extra_headers=token_header) as conn:
        # Send request to watson and waits for the listening response
        await conn.send(json.dumps(params))
        rec = await conn.recv()
        print(rec)
        asyncio.ensure_future(send_audio(conn))

        # Keeps receiving transcripts until a message arrives without results
        while True:
            rec = await conn.recv()
            parsed = json.loads(rec)
            if "results" not in parsed:
                # Not a transcript message; leaving the block closes the socket
                return False
            if len(parsed["results"]) > 0:
                print(parsed["results"][0]["alternatives"][0]["transcript"])


# Starts the application loop
loop = asyncio.get_event_loop()
loop.run_until_complete(speech_to_text())
loop.close()
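The receive loop depends on the shape of each message: `results`, then `alternatives`, then `transcript`, plus a `final` flag. A small helper can isolate that parsing so it can be checked without a live connection. This is only a sketch: `extract_transcript` is a name I made up, and the sample messages are hand-written to match the fields the code above reads, not captured from the service.

```python
import json


def extract_transcript(message):
    """Return (transcript, is_final), or None if the message has no results."""
    parsed = json.loads(message)
    results = parsed.get("results")
    if not results:
        return None  # e.g. a state or error message
    first = results[0]
    alternatives = first.get("alternatives") or [{}]
    transcript = alternatives[0].get("transcript", "")
    return transcript, bool(first.get("final", False))


# Hand-written sample messages (assumed shape, not real service output)
interim = json.dumps(
    {"results": [{"alternatives": [{"transcript": "hello "}], "final": False}]}
)
state = json.dumps({"state": "listening"})

print(extract_transcript(interim))
print(extract_transcript(state))
```

With a helper like this, the loop can print interim text as it arrives and decide separately what to do once `is_final` is true.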

All that is left now is to get the transcript while still recording through the microphone.

Source
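One likely cause of the late transcripts is that `stream.read(CHUNK)` is a blocking call: while it waits for audio, the event loop cannot run the receiving coroutine. A common asyncio remedy is to move the blocking read into a worker thread with `run_in_executor`. The sketch below has not been tested against Watson. `fake_read` stands in for `stream.read` and `receiver` stands in for the `conn.recv()` loop; both are hypothetical and only show the two tasks interleaving.

```python
import asyncio
import time

events = []  # order in which the two tasks make progress


def fake_read(chunk):
    """Stand-in for the blocking stream.read(CHUNK)."""
    time.sleep(0.05)
    return b"\x00" * chunk


async def sender(n_chunks):
    loop = asyncio.get_running_loop()
    for _ in range(n_chunks):
        # Run the blocking read in a worker thread so the loop stays free
        data = await loop.run_in_executor(None, fake_read, 1024)
        events.append("sent")  # a real sender would await ws.send(data) here


async def receiver(stop):
    # Stand-in for the conn.recv() loop; it only ticks while the loop is free
    while not stop.is_set():
        events.append("recv")
        await asyncio.sleep(0.01)


async def main():
    stop = asyncio.Event()
    recv_task = asyncio.ensure_future(receiver(stop))
    await sender(5)
    stop.set()
    await recv_task


asyncio.run(main())
print(events)
```

In the real program the same idea would mean replacing `data = stream.read(CHUNK)` with `data = await loop.run_in_executor(None, stream.read, CHUNK)`, assuming the blocking read is indeed what starves the receive loop.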

2017-10-29 18:40:44 DBeck
