SpeechRecognition Python library: Spoken Language Processing in Python


slide-1
SLIDE 1

SpeechRecognition Python library

SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke

Machine Learning Engineer/YouTube Creator

slide-2
SLIDE 2

SPOKEN LANGUAGE PROCESSING IN PYTHON

Why the SpeechRecognition library?

Some existing Python libraries:
CMU Sphinx
Kaldi
SpeechRecognition
Wav2letter++ by Facebook

slide-3
SLIDE 3

SPOKEN LANGUAGE PROCESSING IN PYTHON

Getting started with SpeechRecognition

Install from PyPI:

$ pip install SpeechRecognition

Compatible with Python 2 and 3
We'll use Python 3

slide-4
SLIDE 4

SPOKEN LANGUAGE PROCESSING IN PYTHON

Using the Recognizer class

# Import the SpeechRecognition library
import speech_recognition as sr
# Create an instance of Recognizer
recognizer = sr.Recognizer()
# Set the energy threshold
recognizer.energy_threshold = 300
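
To build intuition for what an energy threshold does, here is a minimal standard-library sketch. It assumes the threshold is compared against the root-mean-square (RMS) amplitude of 16-bit audio samples. The helpers `rms_energy` and `is_speech` are hypothetical names used only for illustration; they are not part of SpeechRecognition, and the library's internal logic may differ.

```python
import math
import struct


def rms_energy(pcm_bytes: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit mono PCM audio."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm_bytes: bytes, energy_threshold: float = 300) -> bool:
    """Assumption: audio counts as speech when its RMS energy exceeds the threshold."""
    return rms_energy(pcm_bytes) > energy_threshold


# Two tiny synthetic clips: one quiet, one loud
quiet = struct.pack("<4h", 10, -10, 10, -10)
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)

print(is_speech(quiet))  # False: RMS of 10 is below 300
print(is_speech(loud))   # True: RMS of 1000 is above 300
```

Under this assumption, a higher threshold makes quiet background sound less likely to be treated as speech, while a lower one makes the check more sensitive.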

slide-5
SLIDE 5

SPOKEN LANGUAGE PROCESSING IN PYTHON

Using the Recognizer class to recognize speech

Recognizer class has built-in functions which interact with speech APIs:
recognize_bing()
recognize_google()
recognize_google_cloud()
recognize_wit()

Input: audio_file
Output: transcribed speech from audio_file

slide-6
SLIDE 6

SPOKEN LANGUAGE PROCESSING IN PYTHON

SpeechRecognition Example

Focus on recognize_google()
Recognize speech from an audio file with SpeechRecognition:

# Import SpeechRecognition library
import speech_recognition as sr
# Instantiate Recognizer class
recognizer = sr.Recognizer()
# Transcribe speech using Google web API
recognizer.recognize_google(audio_data=audio_file, language="en-US")

Learning speech recognition on DataCamp is awesome!

slide-7
SLIDE 7

Your turn!

SPOKEN LANGUAGE PROCESSING IN PYTHON

slide-8
SLIDE 8

Reading audio files with SpeechRecognition

SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke

Machine Learning Engineer/YouTube Creator

slide-9
SLIDE 9

SPOKEN LANGUAGE PROCESSING IN PYTHON

The AudioFile class

import speech_recognition as sr
# Setup recognizer instance
recognizer = sr.Recognizer()
# Read in audio file
clean_support_call = sr.AudioFile("clean-support-call.wav")
# Check type of clean_support_call
type(clean_support_call)

<class 'speech_recognition.AudioFile'>

slide-10
SLIDE 10

SPOKEN LANGUAGE PROCESSING IN PYTHON

From AudioFile to AudioData

recognizer.recognize_google(audio_data=clean_support_call)

AssertionError: ``audio_data`` must be audio data

# Convert from AudioFile to AudioData
with clean_support_call as source:
    # Record the audio
    clean_support_call_audio = recognizer.record(source)
# Check the type
type(clean_support_call_audio)

<class 'speech_recognition.AudioData'>

slide-11
SLIDE 11

SPOKEN LANGUAGE PROCESSING IN PYTHON

Transcribing our AudioData

# Transcribe clean support call
recognizer.recognize_google(audio_data=clean_support_call_audio)

hello I'd like to get some help setting up my account please

slide-12
SLIDE 12

SPOKEN LANGUAGE PROCESSING IN PYTHON

Duration and offset

duration and offset are both None by default

# Leave duration and offset as default
with clean_support_call as source:
    clean_support_call_audio = recognizer.record(source,
                                                 duration=None,
                                                 offset=None)

# Get first 2 seconds of clean support call
with clean_support_call as source:
    clean_support_call_audio = recognizer.record(source, duration=2.0)

hello I'd like to get
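
To make the effect of the two parameters concrete, the sketch below mimics the idea with only the standard-library `wave` module: `offset` skips that many seconds from the start, and `duration` caps how many seconds are read. `read_window` is a hypothetical helper written for this illustration; it is not how SpeechRecognition implements `record()`, and the 8000 Hz silent clip is made up for the demo.

```python
import io
import wave
from typing import Optional


def read_window(wav_bytes: bytes,
                offset: Optional[float] = None,
                duration: Optional[float] = None) -> bytes:
    """Return raw frames starting `offset` seconds in, for at most `duration` seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        total = wav.getnframes()
        # None means "start at the beginning"
        start = min(int((offset or 0) * rate), total)
        wav.setpos(start)
        remaining = total - start
        # None means "read to the end of the file"
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return wav.readframes(count)


# Build a 3-second silent mono WAV (16-bit, 8000 Hz) in memory
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000 * 3)
wav_bytes = buffer.getvalue()

full = read_window(wav_bytes)                    # whole clip
first_two = read_window(wav_bytes, duration=2.0)  # first 2 seconds
last_one = read_window(wav_bytes, offset=2.0)     # everything after 2 seconds

# Each second is 8000 frames * 2 bytes = 16000 bytes
print(len(full), len(first_two), len(last_one))
```

Combining both arguments selects a window in the middle of the clip, for example `offset=1.0, duration=1.0` for the second second only.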

slide-13
SLIDE 13

Let's practice!

SPOKEN LANGUAGE PROCESSING IN PYTHON

slide-14
SLIDE 14

Dealing with different kinds of audio

SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke

Machine Learning Engineer/YouTube Creator

slide-15
SLIDE 15

SPOKEN LANGUAGE PROCESSING IN PYTHON

What language?

# Create a recognizer instance
recognizer = sr.Recognizer()
# Pass the Japanese audio to recognize_google
text = recognizer.recognize_google(japanese_good_morning, language="en-US")
# Print the text
print(text)

Ohio gozaimasu

slide-16
SLIDE 16

SPOKEN LANGUAGE PROCESSING IN PYTHON

What language?

# Create a recognizer instance
recognizer = sr.Recognizer()
# Pass the Japanese audio to recognize_google
text = recognizer.recognize_google(japanese_good_morning, language="ja")
# Print the text
print(text)

おはようございます

slide-17
SLIDE 17

SPOKEN LANGUAGE PROCESSING IN PYTHON

Non-speech audio

# Import the leopard roar audio file
leopard_roar = sr.AudioFile("leopard_roar.wav")
# Convert the AudioFile to AudioData
with leopard_roar as source:
    leopard_roar_audio = recognizer.record(source)
# Recognize the AudioData
recognizer.recognize_google(leopard_roar_audio)

UnknownValueError:

slide-18
SLIDE 18

SPOKEN LANGUAGE PROCESSING IN PYTHON

Non-speech audio

# Import the leopard roar audio file
leopard_roar = sr.AudioFile("leopard_roar.wav")
# Convert the AudioFile to AudioData
with leopard_roar as source:
    leopard_roar_audio = recognizer.record(source)
# Recognize the AudioData with show_all turned on
recognizer.recognize_google(leopard_roar_audio, show_all=True)

[]

slide-19
SLIDE 19

SPOKEN LANGUAGE PROCESSING IN PYTHON

Showing all

# Recognizing Japanese audio with show_all=True
text = recognizer.recognize_google(japanese_good_morning,
                                   language="en-US",
                                   show_all=True)
# Print the text
print(text)

{'alternative': [{'transcript': 'Ohio gozaimasu', 'confidence': 0.89041114},
                 {'transcript': 'all hail gozaimasu'},
                 {'transcript': 'ohayo gozaimasu'},
                 {'transcript': 'olho gozaimasu'},
                 {'transcript': 'all Hale gozaimasu'}],
 'final': True}
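
Once you have the full response, you will usually want a single transcript back out of it. The sketch below is one way to do that, assuming the response has the shape shown on this slide (a dict with an 'alternative' list) or is empty, as in the non-speech example. `best_transcript` is a hypothetical helper, not part of SpeechRecognition.

```python
from typing import Optional


def best_transcript(response) -> Optional[str]:
    """Pick the highest-confidence transcript from a show_all=True response."""
    # Non-speech audio returned an empty list in the earlier example
    if not response:
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Only some alternatives carry a confidence score; treat a missing one as 0.0
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"]


# Response copied from the slide above
response = {
    "alternative": [
        {"transcript": "Ohio gozaimasu", "confidence": 0.89041114},
        {"transcript": "all hail gozaimasu"},
        {"transcript": "ohayo gozaimasu"},
        {"transcript": "olho gozaimasu"},
        {"transcript": "all Hale gozaimasu"},
    ],
    "final": True,
}

print(best_transcript(response))  # Ohio gozaimasu
print(best_transcript([]))        # None
```

Returning None for an empty response lets calling code decide how to handle audio with no recognizable speech.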

slide-20
SLIDE 20

SPOKEN LANGUAGE PROCESSING IN PYTHON

Multiple speakers

# Import an audio file with multiple speakers
multiple_speakers = sr.AudioFile("multiple-speakers.wav")
# Convert AudioFile to AudioData
with multiple_speakers as source:
    multiple_speakers_audio = recognizer.record(source)
# Recognize the AudioData
recognizer.recognize_google(multiple_speakers_audio)

one of the limitations of the speech recognition library is that it doesn't recognise different speakers and voices it will just return it all as one block of text

slide-21
SLIDE 21

SPOKEN LANGUAGE PROCESSING IN PYTHON

Multiple speakers

# Import audio files separately
speakers = [sr.AudioFile("s0.wav"),
            sr.AudioFile("s1.wav"),
            sr.AudioFile("s2.wav")]
# Transcribe each speaker individually
for i, speaker in enumerate(speakers):
    with speaker as source:
        speaker_audio = recognizer.record(source)
    print(f"Text from speaker {i}: {recognizer.recognize_google(speaker_audio)}")

Text from speaker 0: one of the limitations of the speech recognition library
Text from speaker 1: is that it doesn't recognise different speakers and voices
Text from speaker 2: it will just return it all as one block a text

slide-22
SLIDE 22

SPOKEN LANGUAGE PROCESSING IN PYTHON

Noisy audio

If you have trouble hearing the speech, so will the APIs

# Import audio file with background noise
noisy_support_call = sr.AudioFile("noisy_support_call.wav")
with noisy_support_call as source:
    # Adjust for ambient noise and record
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    noisy_support_call_audio = recognizer.record(source)
# Recognize the audio
recognizer.recognize_google(noisy_support_call_audio)

hello ID like to get some help setting up my calories

slide-23
SLIDE 23

Let's practice!

SPOKEN LANGUAGE PROCESSING IN PYTHON