SpeechRecognition Python library: Spoken Language Processing in Python

SpeechRecognition Python library
SPOKEN LANGUAGE PROCESSING IN PYTHON
Daniel Bourke, Machine Learning Engineer / YouTube Creator

Why the SpeechRecognition library?
Some existing Python libraries:
- CMU Sphinx
- Kaldi
- SpeechRecognition
- wav2letter++ by Facebook

Getting started with SpeechRecognition
Install from PyPI:

    $ pip install SpeechRecognition

Compatible with Python 2 and 3. We'll use Python 3.

Using the Recognizer class

    # Import the SpeechRecognition library
    import speech_recognition as sr

    # Create an instance of Recognizer
    recognizer = sr.Recognizer()

    # Set the energy threshold
    recognizer.energy_threshold = 300
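The energy_threshold attribute sets how loud a chunk of audio must be before the recognizer treats it as possible speech; as far as we know it mainly matters when listening to a live source (for example with listen()) rather than when reading a file with record(). The sketch below illustrates the idea with the standard library only: it computes the root-mean-square (RMS) energy of 16-bit mono PCM samples and compares it with a threshold. The helper names and the RMS-on-16-bit-mono assumption are ours; this is not the library's own code.

```python
import math
import struct


def rms_energy(frames: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM frames."""
    count = len(frames) // 2  # two bytes per 16-bit sample
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech_candidate(frames: bytes, energy_threshold: float = 300) -> bool:
    """True when the chunk is louder than the threshold (300 mirrors the slide)."""
    return rms_energy(frames) > energy_threshold


silence = struct.pack("<4h", 0, 0, 0, 0)
tone = struct.pack("<4h", 1000, -1000, 1000, -1000)
print(rms_energy(silence), is_speech_candidate(silence))  # 0.0 False
print(rms_energy(tone), is_speech_candidate(tone))        # 1000.0 True
```

Raising the threshold makes quiet background sound less likely to be mistaken for speech, at the cost of possibly missing soft-spoken words.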

Using the Recognizer class to recognize speech
The Recognizer class has built-in functions which interact with speech APIs:
- recognize_bing()
- recognize_google()
- recognize_google_cloud()
- recognize_wit()
Input: audio_data (an AudioData object)
Output: the speech transcribed from audio_data
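Because each recognize_*() method wraps a different service, one way to use several of them is to try them in order and keep the first transcript that comes back. Below is a minimal sketch; transcribe_with_fallback is a hypothetical helper (not part of SpeechRecognition), and since it accepts any callables, the demo uses two stand-in functions instead of real API calls.

```python
from typing import Callable, Sequence, Tuple, Type


def transcribe_with_fallback(audio, recognizers: Sequence[Callable],
                             errors: Tuple[Type[BaseException], ...] = (Exception,)):
    """Call each recognizer on `audio` in turn; return (index, transcript) from the first success."""
    failures = []
    for index, recognize in enumerate(recognizers):
        try:
            return index, recognize(audio)
        except errors as exc:  # remember why this recognizer failed, then try the next one
            failures.append(exc)
    raise RuntimeError(f"all {len(recognizers)} recognizers failed: {failures!r}")


# Stand-ins for real recognize_*() calls
def unreliable_api(audio):
    raise ValueError("could not understand audio")


def working_api(audio):
    return f"transcript of {audio}"


print(transcribe_with_fallback("support-call", [unreliable_api, working_api]))
# (1, 'transcript of support-call')
```

With the library installed, the list could hold bound methods such as recognizer.recognize_google, and errors could be narrowed to (sr.UnknownValueError, sr.RequestError), the exceptions SpeechRecognition raises for unintelligible audio and for failed API requests.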

SpeechRecognition example
Focus on recognize_google()
Recognize speech from an audio file with SpeechRecognition:

    # Import SpeechRecognition library
    import speech_recognition as sr

    # Instantiate Recognizer class
    recognizer = sr.Recognizer()

    # Transcribe speech using the Google web API
    # (audio_file must already hold AudioData, not an AudioFile)
    recognizer.recognize_google(audio_data=audio_file, language="en-US")

Output: Learning speech recognition on DataCamp is awesome!

Your turn!

Reading audio files with SpeechRecognition

The AudioFile class

    import speech_recognition as sr

    # Set up recognizer instance
    recognizer = sr.Recognizer()

    # Read in audio file
    clean_support_call = sr.AudioFile("clean-support-call.wav")

    # Check type of clean_support_call
    type(clean_support_call)

Output: <class 'speech_recognition.AudioFile'>

From AudioFile to AudioData

    recognizer.recognize_google(audio_data=clean_support_call)

Output: AssertionError: ``audio_data`` must be audio data

    # Convert from AudioFile to AudioData
    with clean_support_call as source:
        # Record the audio
        clean_support_call_audio = recognizer.record(source)

    # Check the type
    type(clean_support_call_audio)

Output: <class 'speech_recognition.AudioData'>
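An AudioData object essentially bundles three things: the raw frame bytes, the sample rate, and the sample width in bytes (its constructor is sr.AudioData(frame_data, sample_rate, sample_width)). The sketch below uses only the standard library's wave module to show what has to be pulled out of a WAV file to get there; make_silent_wav and read_audio_data are hypothetical helpers, and the silent one-second file stands in for clean-support-call.wav.

```python
import io
import wave


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Build a mono, 16-bit silent WAV file in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def read_audio_data(wav_bytes: bytes):
    """Return (frame_data, sample_rate, sample_width), the values an AudioData wraps."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return frames, wav.getframerate(), wav.getsampwidth()


frame_data, sample_rate, sample_width = read_audio_data(make_silent_wav(1.0))
print(len(frame_data), sample_rate, sample_width)  # 32000 16000 2
```

With SpeechRecognition installed, the returned tuple could be passed positionally as sr.AudioData(frame_data, sample_rate, sample_width), which is another route to an object recognize_google() accepts.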

Transcribing our AudioData

    # Transcribe clean support call
    recognizer.recognize_google(audio_data=clean_support_call_audio)

Output: hello I'd like to get some help setting up my account please

Duration and offset
duration and offset are both None by default: offset is the number of seconds to skip at the start of the file, and duration is the number of seconds to record after that.

    # Leave duration and offset as default
    with clean_support_call as source:
        clean_support_call_audio = recognizer.record(source, duration=None, offset=None)

    # Get the first 2 seconds of the clean support call
    with clean_support_call as source:
        clean_support_call_audio = recognizer.record(source, duration=2.0)

    # Transcribe the 2-second clip
    recognizer.recognize_google(audio_data=clean_support_call_audio)

Output: hello I'd like to get
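In terms of raw bytes, offset and duration both translate into a slice of the frame data. A minimal sketch for mono audio at 16 kHz with 2-byte samples; slice_frames is a hypothetical helper that approximates the effect of record(source, duration=..., offset=...), not the library's implementation.

```python
def slice_frames(frame_data: bytes, sample_rate: int, sample_width: int,
                 offset=None, duration=None) -> bytes:
    """Approximate record(source, duration=..., offset=...) on raw mono PCM bytes.

    offset:   seconds to skip at the start (None means 0)
    duration: seconds to keep after the offset (None means until the end)
    """
    start = int((offset or 0) * sample_rate) * sample_width
    if duration is None:
        return frame_data[start:]
    end = start + int(duration * sample_rate) * sample_width
    return frame_data[start:end]


# Three seconds of silent 16 kHz, 16-bit (2-byte) mono audio
audio = bytes(3 * 16000 * 2)
print(len(slice_frames(audio, 16000, 2)))                            # 96000: whole file
print(len(slice_frames(audio, 16000, 2, duration=2.0)))              # 64000: first 2 seconds
print(len(slice_frames(audio, 16000, 2, offset=2.5, duration=2.0)))  # 16000: only 0.5 s left
```

Note that a duration running past the end of the file simply returns whatever audio remains.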

Let's practice!

Dealing with different kinds of audio

What language?

    # Create a Recognizer instance
    recognizer = sr.Recognizer()

    # Pass the Japanese audio to recognize_google
    text = recognizer.recognize_google(japanese_good_morning, language="en-US")

    # Print the text
    print(text)

Output: Ohio gozaimasu

What language?

    # Create a Recognizer instance
    recognizer = sr.Recognizer()

    # Pass the Japanese audio to recognize_google
    text = recognizer.recognize_google(japanese_good_morning, language="ja")

    # Print the text
    print(text)

Output: おはようございます

Non-speech audio

    # Import the leopard roar audio file
    leopard_roar = sr.AudioFile("leopard_roar.wav")

    # Convert the AudioFile to AudioData
    with leopard_roar as source:
        leopard_roar_audio = recognizer.record(source)

    # Recognize the AudioData
    recognizer.recognize_google(leopard_roar_audio)

Output: UnknownValueError:

Non-speech audio

    # Import the leopard roar audio file
    leopard_roar = sr.AudioFile("leopard_roar.wav")

    # Convert the AudioFile to AudioData
    with leopard_roar as source:
        leopard_roar_audio = recognizer.record(source)

    # Recognize the AudioData with show_all turned on
    recognizer.recognize_google(leopard_roar_audio, show_all=True)

Output: []

Showing all

    # Recognize the Japanese audio with show_all=True
    text = recognizer.recognize_google(japanese_good_morning, language="en-US", show_all=True)

    # Print the text
    print(text)

Output: {'alternative': [{'transcript': 'Ohio gozaimasu', 'confidence': 0.89041114}, {'transcript': 'all hail gozaimasu'}, {'transcript': 'ohayo gozaimasu'}, {'transcript': 'olho gozaimasu'}, {'transcript': 'all Hale gozaimasu'}], 'final': True}
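With show_all=True, recognize_google returns the raw response instead of a single string: an empty list when nothing was recognized (as with the leopard roar) or a dictionary of alternatives. A small sketch for pulling transcripts out of either shape; best_transcript and all_transcripts are hypothetical helpers, and they assume the response layout shown on the slides above, with the most likely alternative first and a confidence value only on that entry.

```python
def best_transcript(response):
    """Return (transcript, confidence) for the top alternative, or (None, None) if nothing was recognized."""
    if not response or not response.get("alternative"):
        return None, None
    top = response["alternative"][0]  # assumed to be the most likely alternative
    return top["transcript"], top.get("confidence")


def all_transcripts(response):
    """Every candidate transcript, in the order the API returned them."""
    if not response:
        return []
    return [alt["transcript"] for alt in response.get("alternative", [])]


# Responses copied from the slides above
leopard_response = []
japanese_response = {
    'alternative': [
        {'transcript': 'Ohio gozaimasu', 'confidence': 0.89041114},
        {'transcript': 'all hail gozaimasu'},
        {'transcript': 'ohayo gozaimasu'},
        {'transcript': 'olho gozaimasu'},
        {'transcript': 'all Hale gozaimasu'},
    ],
    'final': True,
}
print(best_transcript(leopard_response))      # (None, None)
print(best_transcript(japanese_response))     # ('Ohio gozaimasu', 0.89041114)
print(all_transcripts(japanese_response)[2])  # ohayo gozaimasu
```

Checking for the empty list this way avoids wrapping every call in a try/except for UnknownValueError, at the cost of handling the raw response yourself.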

Multiple speakers

    # Import an audio file with multiple speakers
    multiple_speakers = sr.AudioFile("multiple-speakers.wav")

    # Convert AudioFile to AudioData
    with multiple_speakers as source:
        multiple_speakers_audio = recognizer.record(source)

    # Recognize the AudioData
    recognizer.recognize_google(multiple_speakers_audio)

Output: one of the limitations of the speech recognition library is that it doesn't recognise different speakers and voices it will just return it all as one block of text

Multiple speakers

    # Import audio files separately
    speakers = [sr.AudioFile("s0.wav"), sr.AudioFile("s1.wav"), sr.AudioFile("s2.wav")]

    # Transcribe each speaker individually
    for i, speaker in enumerate(speakers):
        with speaker as source:
            speaker_audio = recognizer.record(source)
        print(f"Text from speaker {i}: {recognizer.recognize_google(speaker_audio)}")

Output:
Text from speaker 0: one of the limitations of the speech recognition library
Text from speaker 1: is that it doesn't recognise different speakers and voices
Text from speaker 2: it will just return it all as one block a text
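Since the library returns one block of text, the workaround on this slide is to give it one file per speaker. If the times at which speakers change are already known (the library does not detect them), the standard library's wave module can cut a recording into s0.wav, s1.wav, and so on. A minimal sketch; split_wav and make_silent_wav are hypothetical helpers, and the boundaries are assumed to be supplied by hand or by a separate speaker-diarization tool.

```python
import io
import wave


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Mono, 16-bit silent WAV built in memory (stand-in for multiple-speakers.wav)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def split_wav(wav_bytes: bytes, boundaries_sec):
    """Cut a WAV at the given times (in seconds); return one WAV (bytes) per segment."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    bytes_per_frame = params.sampwidth * params.nchannels
    # Frame indices where each segment starts and ends
    cuts = [0] + [int(t * params.framerate) for t in boundaries_sec] + [params.nframes]
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        out = io.BytesIO()
        with wave.open(out, "wb") as dst:
            dst.setparams(params)  # the frame count in the header is corrected as frames are written
            dst.writeframes(frames[start * bytes_per_frame:end * bytes_per_frame])
        segments.append(out.getvalue())
    return segments


segments = split_wav(make_silent_wav(3.0), boundaries_sec=[1.0, 2.0])
for i, segment in enumerate(segments):
    with wave.open(io.BytesIO(segment), "rb") as wav:
        print(f"s{i}.wav: {wav.getnframes() / wav.getframerate():.1f} s")
    # To save to disk: open(f"s{i}.wav", "wb").write(segment)
```

Each saved segment could then be opened with sr.AudioFile and transcribed in the loop shown on the slide.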

Noisy audio
If you have trouble hearing the speech, so will the APIs.

    # Import audio file with background noise
    noisy_support_call = sr.AudioFile("noisy_support_call.wav")

    with noisy_support_call as source:
        # Adjust for ambient noise and record
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        noisy_support_call_audio = recognizer.record(source)

    # Recognize the audio
    recognizer.recognize_google(noisy_support_call_audio)

Output: hello ID like to get some help setting up my calories
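One way to put a number on how much the noise hurt is word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch compares the clean transcript from earlier in this section with the noisy one above; lowercasing and whitespace splitting are our own simplifications (a stricter WER would also normalize punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j]: edits needed to turn the reference words so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            current[j] = min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution (or match)
            )
        previous = current
    return previous[-1] / len(ref)


clean = "hello I'd like to get some help setting up my account please"
noisy = "hello ID like to get some help setting up my calories"
print(word_error_rate(clean, noisy))  # 0.25
```

Here 3 of the 12 reference words are wrong ("I'd" became "ID", "account" became "calories", and "please" was dropped), so the WER is 3/12 = 0.25.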

Let's practice!
