SpeechRecognition Python library



  1. SpeechRecognition Python library
     Spoken Language Processing in Python
     Daniel Bourke, Machine Learning Engineer / YouTube Creator

  2. Why the SpeechRecognition library?
     Some existing Python libraries:
         CMU Sphinx
         Kaldi
         SpeechRecognition
         Wav2letter++ by Facebook

  3. Getting started with SpeechRecognition
     Install from PyPI:
         $ pip install SpeechRecognition
     Compatible with Python 2 and 3
     We'll use Python 3

  4. Using the Recognizer class
         # Import the SpeechRecognition library
         import speech_recognition as sr
         # Create an instance of Recognizer
         recognizer = sr.Recognizer()
         # Set the energy threshold
         recognizer.energy_threshold = 300
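The slide sets `energy_threshold = 300` without saying what the number measures. The sketch below is one way to picture it, assuming that "energy" means the root-mean-square (RMS) amplitude of 16-bit audio samples and that chunks above the threshold are treated as speech while quieter ones are treated as silence. That assumption is a reading of the library's documentation, not something the slides state, and the helper names `rms_energy` and `sine_chunk` are hypothetical. Only the standard library is used.

```python
import math
import struct


def rms_energy(frame_bytes: bytes) -> float:
    """RMS amplitude of little-endian signed 16-bit PCM samples."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def sine_chunk(amplitude: int, n: int = 1600, freq: float = 440.0, rate: int = 16000) -> bytes:
    """Build n samples of a sine tone as 16-bit PCM bytes (synthetic stand-in for audio)."""
    values = [int(amplitude * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    return struct.pack(f"<{n}h", *values)


ENERGY_THRESHOLD = 300  # same value as recognizer.energy_threshold on the slide

quiet = sine_chunk(amplitude=100)   # RMS is roughly 100 / sqrt(2), well under 300
loud = sine_chunk(amplitude=5000)   # RMS is roughly 5000 / sqrt(2), well over 300

quiet_is_speech = rms_energy(quiet) > ENERGY_THRESHOLD
loud_is_speech = rms_energy(loud) > ENERGY_THRESHOLD
print(quiet_is_speech, loud_is_speech)  # False True
```

Under this reading, raising the threshold makes the recognizer less sensitive to quiet sounds, and lowering it makes background noise more likely to count as speech.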

  5. Using the Recognizer class to recognize speech
     The Recognizer class has built-in functions which interact with speech APIs:
         recognize_bing()
         recognize_google()
         recognize_google_cloud()
         recognize_wit()
     Input: audio_file
     Output: transcribed speech from audio_file

  6. SpeechRecognition example
     Focus on recognize_google()
     Recognize speech from an audio file with SpeechRecognition:
         # Import SpeechRecognition library
         import speech_recognition as sr
         # Instantiate Recognizer class
         recognizer = sr.Recognizer()
         # Transcribe speech using Google web API
         recognizer.recognize_google(audio_data=audio_file, language="en-US")
     Output:
         Learning speech recognition on DataCamp is awesome!

  7. Your turn!

  8. Reading audio files with SpeechRecognition

  9. The AudioFile class
         import speech_recognition as sr
         # Setup recognizer instance
         recognizer = sr.Recognizer()
         # Read in audio file
         clean_support_call = sr.AudioFile("clean-support-call.wav")
         # Check type of clean_support_call
         type(clean_support_call)
     Output:
         <class 'speech_recognition.AudioFile'>

  10. From AudioFile to AudioData
      Passing an AudioFile directly fails:
          recognizer.recognize_google(audio_data=clean_support_call)
      Error:
          AssertionError: ``audio_data`` must be audio data
      Convert it first:
          # Convert from AudioFile to AudioData
          with clean_support_call as source:
              # Record the audio
              clean_support_call_audio = recognizer.record(source)
          # Check the type
          type(clean_support_call_audio)
      Output:
          <class 'speech_recognition.AudioData'>

  11. Transcribing our AudioData
          # Transcribe clean support call
          recognizer.recognize_google(audio_data=clean_support_call_audio)
      Output:
          hello I'd like to get some help setting up my account please

  12. Duration and offset
      duration and offset are both None by default
          # Leave duration and offset as default
          with clean_support_call as source:
              clean_support_call_audio = recognizer.record(source,
                                                           duration=None,
                                                           offset=None)
          # Get first 2 seconds of clean support call
          with clean_support_call as source:
              clean_support_call_audio = recognizer.record(source,
                                                           duration=2.0)
      Transcription of the 2-second clip:
          hello I'd like to get
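To make the `offset` and `duration` semantics concrete, here is a standard-library sketch that reads the same kind of window from a WAV file with the `wave` module. It is an illustration only, not how `recognizer.record()` is implemented. The helpers `write_tone` and `read_window` and the synthetic five-second file are hypothetical stand-ins for `clean-support-call.wav`.

```python
import math
import os
import struct
import tempfile
import wave

RATE = 16000  # samples per second for the synthetic file


def write_tone(path, seconds, rate=RATE):
    """Write a mono 16-bit WAV file containing a 440 Hz tone."""
    n = int(seconds * rate)
    samples = [int(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(n)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{n}h", *samples))


def read_window(path, offset=None, duration=None):
    """Return (raw_frames, seconds_read); None means 'from the start' / 'to the end'."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        total = wav.getnframes()
        start = min(int((offset or 0) * rate), total)
        wav.setpos(start)  # skip the first `offset` seconds
        remaining = total - start
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return wav.readframes(count), count / rate


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "five_seconds.wav")
    write_tone(path, seconds=5)
    _, whole = read_window(path)                              # defaults: everything
    _, first_two = read_window(path, duration=2.0)            # first 2 seconds
    _, after_three = read_window(path, offset=3.0)            # skip 3 s, read the rest
    _, middle = read_window(path, offset=1.0, duration=2.5)   # 2.5 s starting at 1 s

print(whole, first_two, after_three, middle)  # 5.0 2.0 2.0 2.5
```

A shorter window means less audio is sent for transcription, which is why the 2-second clip on the slide yields only the first few words.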

  13. Let's practice!

  14. Dealing with different kinds of audio

  15. What language?
          # Create a recognizer class
          recognizer = sr.Recognizer()
          # Pass the Japanese audio to recognize_google
          text = recognizer.recognize_google(japanese_good_morning,
                                             language="en-US")
          # Print the text
          print(text)
      Output:
          Ohio gozaimasu

  16. What language?
          # Create a recognizer class
          recognizer = sr.Recognizer()
          # Pass the Japanese audio to recognize_google
          text = recognizer.recognize_google(japanese_good_morning,
                                             language="ja")
          # Print the text
          print(text)
      Output:
          おはようございます

  17. Non-speech audio
          # Import the leopard roar audio file
          leopard_roar = sr.AudioFile("leopard_roar.wav")
          # Convert the AudioFile to AudioData
          with leopard_roar as source:
              leopard_roar_audio = recognizer.record(source)
          # Recognize the AudioData
          recognizer.recognize_google(leopard_roar_audio)
      Error:
          UnknownValueError:

  18. Non-speech audio
          # Import the leopard roar audio file
          leopard_roar = sr.AudioFile("leopard_roar.wav")
          # Convert the AudioFile to AudioData
          with leopard_roar as source:
              leopard_roar_audio = recognizer.record(source)
          # Recognize the AudioData with show_all turned on
          recognizer.recognize_google(leopard_roar_audio, show_all=True)
      Output:
          []

  19. Showing all
          # Recognizing Japanese audio with show_all=True
          text = recognizer.recognize_google(japanese_good_morning,
                                             language="en-US",
                                             show_all=True)
          # Print the text
          print(text)
      Output:
          {'alternative': [{'transcript': 'Ohio gozaimasu', 'confidence': 0.89041114},
                           {'transcript': 'all hail gozaimasu'},
                           {'transcript': 'ohayo gozaimasu'},
                           {'transcript': 'olho gozaimasu'},
                           {'transcript': 'all Hale gozaimasu'}],
           'final': True}
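With `show_all=True` the result is no longer a plain string, so it needs a little unpacking before use. The sketch below works on the two shapes shown on the slides: the dictionary from the Japanese clip and the empty list from the leopard roar. The helper name `best_transcript` is hypothetical, and the code assumes other responses follow the same layout, which the slides do not guarantee.

```python
def best_transcript(response):
    """Return (transcript, confidence) for the top alternative, or None if nothing was recognized."""
    if not response:  # covers the empty list returned for non-speech audio
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Only some alternatives carry a confidence; rank a missing one as 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best["transcript"], best.get("confidence")


# Response copied from the slide (Japanese audio transcribed with language="en-US").
japanese_response = {
    "alternative": [
        {"transcript": "Ohio gozaimasu", "confidence": 0.89041114},
        {"transcript": "all hail gozaimasu"},
        {"transcript": "ohayo gozaimasu"},
        {"transcript": "olho gozaimasu"},
        {"transcript": "all Hale gozaimasu"},
    ],
    "final": True,
}

print(best_transcript(japanese_response))  # ('Ohio gozaimasu', 0.89041114)
print(best_transcript([]))                 # None
```

Checking for the empty result this way avoids the `UnknownValueError` path entirely when `show_all=True` is used, at the cost of having to handle `None` in the calling code.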

  20. Multiple speakers
          # Import an audio file with multiple speakers
          multiple_speakers = sr.AudioFile("multiple-speakers.wav")
          # Convert AudioFile to AudioData
          with multiple_speakers as source:
              multiple_speakers_audio = recognizer.record(source)
          # Recognize the AudioData
          recognizer.recognize_google(multiple_speakers_audio)
      Output:
          one of the limitations of the speech recognition library is that it
          doesn't recognise different speakers and voices it will just return
          it all as one block of text

  21. Multiple speakers
          # Import audio files separately
          speakers = [sr.AudioFile("s0.wav"),
                      sr.AudioFile("s1.wav"),
                      sr.AudioFile("s2.wav")]
          # Transcribe each speaker individually
          for i, speaker in enumerate(speakers):
              with speaker as source:
                  speaker_audio = recognizer.record(source)
              print(f"Text from speaker {i}: {recognizer.recognize_google(speaker_audio)}")
      Output:
          Text from speaker 0: one of the limitations of the speech recognition library
          Text from speaker 1: is that it doesn't recognise different speakers and voices
          Text from speaker 2: it will just return it all as one block a text

  22. Noisy audio
      If you have trouble hearing the speech, so will the APIs
          # Import audio file with background noise
          noisy_support_call = sr.AudioFile("noisy_support_call.wav")
          with noisy_support_call as source:
              # Adjust for ambient noise and record
              recognizer.adjust_for_ambient_noise(source, duration=0.5)
              noisy_support_call_audio = recognizer.record(source)
          # Recognize the audio
          recognizer.recognize_google(noisy_support_call_audio)
      Output:
          hello ID like to get some help setting up my calories

  23. Let's practice!
