How to Use the Google Speech-to-Text API

The Google Speech-to-Text API is a powerful tool for converting speech to text.

Using it requires a Google Cloud Platform (GCP) account. The steps below walk through setting up the API and calling it.

1. Set up a Google Cloud project

  1. Log in to the Google Cloud Console
  2. Create a new project or select an existing one
  3. Go to APIs & Services → Enable APIs and Services
  4. Search for the Speech-to-Text API and enable it

2. Create and download a service account key

Creating a service account key

  1. Select your project in the Google Cloud Console
  2. Go to IAM & Admin → Service Accounts
  3. Click Create Service Account and grant it the permissions it needs
  4. Click the Add Key button and download the key file in JSON format
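Before wiring the downloaded key into your code, it can help to confirm that the file really is a service account key. The sketch below is a minimal check, assuming the key JSON carries the fields listed in REQUIRED_FIELDS; the field list and the function name check_service_account_key are illustrative choices, not an official schema.

```python
import json

# Fields that a downloaded service account key file is expected to contain.
# This list is an assumption for the sketch, not an exhaustive schema.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_text: str) -> list[str]:
    """Return a list of problems found in the key JSON (empty if none)."""
    try:
        data = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # A key of another credential type is present but not what we want here.
    if data.get("type") not in (None, "", "service_account"):
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

To use it, read the key file as text and pass the contents in; an empty list means nothing obviously wrong was found.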

Creating an API key (alternative)

You can also authenticate with an API key instead of a service account key:

  1. Go to APIs & Services → Credentials
  2. Click Create Credentials and select API key
  3. Save the generated API key (you will use it when calling the API)

Info

An API key is the simpler way to authenticate, but a service account key offers finer-grained permission control and stronger security. Choose the method that fits the scale of your project.
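If you go the API key route, the key is typically passed as a key query parameter on the REST endpoint rather than through the Python client library. The following is a rough sketch of how such a request could be assembled with the standard library only; the helper name build_recognize_request and its defaults are assumptions made for illustration, and the request is built but not sent.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

# REST endpoint for synchronous recognition (v1).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(api_key: str, audio_bytes: bytes,
                            language_code: str = "ko-KR",
                            encoding: str = "MP3",
                            sample_rate_hertz: int = 16000) -> Request:
    """Build (but do not send) a speech:recognize request authenticated by API key."""
    body = {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        # The REST API expects the audio bytes base64-encoded.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    url = f"{RECOGNIZE_URL}?{urlencode({'key': api_key})}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would send it. Keep the API key out of source control, since anyone holding it can make billable calls.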

3. Set the environment variable

The GOOGLE_APPLICATION_CREDENTIALS environment variable must point to the key file you downloaded.

Setting it in Python code

import os

# Point the Google client libraries at the service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"

Setting it in the terminal

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-key.json"
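A missing or mistyped path is a common reason the client fails at startup, so a quick check before creating the client can save some debugging. This is a small sketch using only the standard library; the function name credentials_problem is hypothetical.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def credentials_problem(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of what is wrong with the setting, or None if it looks usable."""
    path = environ.get(ENV_VAR)
    if not path:
        return f"{ENV_VAR} is not set"
    if not os.path.isfile(path):
        return f"{ENV_VAR} points to a missing file: {path}"
    return None
```

Calling credentials_problem() with no arguments inspects the real environment. It only checks that the file exists, not that the key inside it is valid.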

4. Google Speech-to-Text API usage example

Full code

Below is a Python example that uses the Google Speech-to-Text API:

import os
from google.cloud import speech

# Set the environment variable (must happen before the client is created)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/your-service-account-key.json"

# Create the client
client = speech.SpeechClient()

# Read the MP3 file as raw bytes
with open("/path/to/audio.mp3", "rb") as audio_file:
    content = audio_file.read()

# Create the RecognitionAudio object
audio = speech.RecognitionAudio(content=content)

# Configure recognition; sample_rate_hertz must match the audio file
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=16000,
    language_code="ko-KR",
)

# Send the API request and print the top transcript for each result
try:
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(f"Transcript: {result.alternatives[0].transcript}")
except Exception as e:
    print(f"Error during API call: {e}")

Key steps in the code

  1. Create the client
    • speech.SpeechClient() sets up the connection to the API
  2. Read the audio file
    • The file is read in binary mode and wrapped in a RecognitionAudio object
  3. Configure RecognitionConfig
    • Specify the audio file's encoding, sample rate, and language code
    • The example above uses the MP3 format and **Korean (ko-KR)**
  4. Send the request and handle the response
    • client.recognize() sends the request, and the results are printed
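The last step, reading results out of the response, can be pulled into a small helper so it is easy to test without calling the API. The sketch below assumes a response shaped like the one used above (results, each with a list of alternatives carrying transcript and confidence); collect_transcripts is a hypothetical name, and the fake response exists only for illustration.

```python
from types import SimpleNamespace


def collect_transcripts(response) -> list[tuple[str, float]]:
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.results:
        if not result.alternatives:
            continue  # skip results without any alternative
        top = result.alternatives[0]
        pairs.append((top.transcript, top.confidence))
    return pairs


# Stand-in object shaped like a recognize() response, for illustration only.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello", confidence=0.9)]),
    SimpleNamespace(alternatives=[]),
])
print(collect_transcripts(fake_response))  # [('hello', 0.9)]
```

Guarding against an empty alternatives list avoids an IndexError on result.alternatives[0], which the inline loop in the full example would raise in that case.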

5. ์‹คํ–‰ ๊ฒฐ๊ณผ

API๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ํ˜ธ์ถœํ•˜๋ฉด, ์˜ค๋””์˜ค ํŒŒ์ผ์˜ ํ…์ŠคํŠธ ๋ณ€ํ™˜ ๊ฒฐ๊ณผ๊ฐ€ ์ถœ๋ ฅ๋œ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด:

Transcript: ์•ˆ๋…•ํ•˜์„ธ์š”, ์—ฌ๊ธฐ๋Š” Google Cloud Speech-to-Text API์ž…๋‹ˆ๋‹ค.

์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด ์˜ˆ์™ธ(Exception) ๋ฉ”์‹œ์ง€๋ฅผ ์ถœ๋ ฅํ•œ๋‹ค:

Error during API call: 403 PERMISSION_DENIED
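An error string like the one above is easier to act on when it is mapped to a first thing to check. The following sketch assumes the message contains a status name such as PERMISSION_DENIED, as in the sample output; the HINTS table and the hint_for function are illustrative suggestions, not an official troubleshooting guide.

```python
# Suggested first checks per status name; these hints are assumptions
# for illustration, not an official troubleshooting table.
HINTS = {
    "PERMISSION_DENIED": "check that the Speech-to-Text API is enabled and the account has access",
    "UNAUTHENTICATED": "check GOOGLE_APPLICATION_CREDENTIALS and the key file",
    "INVALID_ARGUMENT": "check encoding, sample_rate_hertz, and language_code",
    "RESOURCE_EXHAUSTED": "check quota and billing in the Google Cloud Console",
}


def hint_for(error_message: str) -> str:
    """Return a troubleshooting hint for the first known status name in the message."""
    for status, hint in HINTS.items():
        if status in error_message:
            return hint
    return "no specific hint; read the full error message"
```

In the full example, the except branch could print hint_for(str(e)) alongside the raw message.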

6. Things to keep in mind when using the API

  1. Usage limits
    • The Speech-to-Text API is billed according to usage. Check your free quota and billing settings in the Google Cloud Console.
  2. Audio file format
    • Check which audio formats are supported, and set the sample rate (sample_rate_hertz) and encoding (encoding) to match your file exactly.
  3. Environment variable security
    • The service account key contains sensitive information about your project, so keep it in a secure location.
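On the audio format point, one way to avoid a mismatched sample_rate_hertz is to read the rate from the file instead of hard-coding it. The sketch below does this for WAV files using Python's standard wave module. Note the assumption: it applies to uncompressed WAV input, not to the MP3 file used in the example above, and make_silent_wav exists only to produce demo data.

```python
import io
import wave


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate (Hz) from the header of a WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Create a short mono 16-bit silent WAV, used here only to demonstrate."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


print(wav_sample_rate(make_silent_wav(16000)))  # 16000
```

The value returned by wav_sample_rate() could then be passed as sample_rate_hertz when building the RecognitionConfig for a WAV file.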