Jarvis and NeMo GTC China

Jarvis

JARVIS
Platform to develop and deploy conversational AI applications
Designed for sensor fusion
Gaze & Speech
https://www.youtube.com/watch?v=r264lBi1nMU

USE CASES ACROSS ALL VERTICALS
Verticals: Online Store, Industrial, Finance, Energy / Oil & Gas, Consumer Internet, In car experience
• Provide conversational interface for shopping
• Collaborative robots: robots and humans collaborate in close proximity
• Call center: sentiment of customers calling
• Use camera and ask, "What are the safety guidelines for this chemical?"
• Video diarization: meeting/conversation transcription per person with timestamps
• Autonomous driving: enhanced in-car experience combining visual inputs with speech
• Insurance chatbot: "Add a wedding ring to an insurance policy via an image and receive policy price quote"
• Engineer troubleshooting with the help of AI assistant
• Loud environment: virtual assistant using lip reading
• Content tagging with image, text, audio: recommendation, ads

CHALLENGES OF CONVERSATIONAL AI
• Deployment: existing software not designed for modern production environments
• Multiple sensors: difficult to use multiple sensors efficiently
• Real time: requires low latency for natural interaction
• Custom models: cloud services not customizable; high costs; data sovereignty
• High accuracy: need state-of-the-art algorithms and models

JARVIS BENEFITS
• Deployment: micro-service approach; designed for K8s; simple APIs, easy to integrate
• Multiple sensors: framework for training and deploying models across modalities; tools to simplify fusion
• Real time: end-to-end inference on GPUs optimized to reduce latency
• Custom models: start from base model, train with your data on your infrastructure
• High accuracy: best-in-breed algorithms; direct access to cutting-edge research

JARVIS WORKFLOW OVERVIEW
Diagram labels:
• JARVIS AI Services: Gaze detection, Pose estimation, Wake word, Object detection, Lip activity, Speech Recognition, Intent Classification, Speech Synthesis
• Pretrained models; Data for customizing (Fine-Tuning)
• Jarvis Core, reached via gRPC or the Python client library
• Client Application (client): Sensor Fusion, Dialog Manager, Backend fulfillment (optional)
• End users; Multiple sensor input

Visual Diarization
Multiple speaker transcription based on video and audio streams
Interaction: Jupyter notebook with live video stream overlaying gaze detection and lip activity detection, and producing a text transcript per person from the audio stream
Technology of sensor fusion:
● Video stream
  ○ Gaze detection to engage the system
  ○ Lip activity to determine who is speaking
● Audio stream
  ○ Transcribe the audio
  ○ Label transcriptions per individual speaker
Implementation:
● Fusion graph via JSON to combine the multiple inference models
● gRPC end points for direct interaction with the inference models
● Jupyter notebook demonstrates Python APIs for interaction
Transcription (example shown on the slide):
Driver: Where is a good sushi restaurant?
Passenger: What's the weather in Chicago
Model Developer: Improve the conversational model accuracy via fine-tuning with NeMo
Developer Operations: Deploy via docker containers from NGC into Kubernetes (EGX)
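The per-speaker labeling step can be illustrated with a minimal sketch. It is not the Jarvis fusion graph: it assumes the audio stream has already produced timestamped transcripts and the video stream has produced per-person lip-activity intervals, and it assigns each transcript to the person whose lip activity overlaps it most. The `Segment` type, the function names, and all timestamps are hypothetical; only the two utterances come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A time span in seconds with a payload (transcript text or speaker name)."""
    start: float
    end: float
    label: str


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two spans (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def label_transcripts(transcripts, lip_activity):
    """Attach to each transcript the speaker whose lip activity overlaps it most.

    transcripts:  Segments whose label is the recognized text (audio stream).
    lip_activity: Segments whose label is a speaker name (video stream).
    Returns a list of (speaker, text); speaker is "unknown" when nothing overlaps.
    """
    labeled = []
    for t in transcripts:
        totals = {}
        for lip in lip_activity:
            totals[lip.label] = totals.get(lip.label, 0.0) + overlap(t, lip)
        speaker = max(totals, key=totals.get) if totals else "unknown"
        if totals and totals[speaker] == 0.0:
            speaker = "unknown"
        labeled.append((speaker, t.label))
    return labeled


# Hypothetical timestamps, made up for illustration.
transcripts = [
    Segment(0.0, 2.1, "Where is a good sushi restaurant?"),
    Segment(2.5, 4.4, "What's the weather in Chicago"),
]
lip_activity = [
    Segment(0.1, 2.0, "Driver"),
    Segment(2.4, 4.5, "Passenger"),
]
for speaker, text in label_transcripts(transcripts, lip_activity):
    print(f"{speaker}: {text}")
```

Maximum overlap is only one possible fusion rule; a real system would also have to handle overlapping speech and clock drift between the two streams.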

Jarvis ASR Service
Jarvis ASR TRTIS pipeline
Stages in the diagram, from Audio to Text:
• Feature Extractor (pre-processing)
• Jasper (acoustic model)
• End of Sentence Detector (post-processing)
• Greedy or Beam Decoder (post-processing)
• BERT-based Punctuator (post-processing)
Backend labels in the diagram: TRTIS custom backend on GPU; TRT on GPU; TRTIS custom backend, N-gram language model; TRT on GPU
Jarvis ASR API
Method Name: Description
Recognize: Given audio file as input, return transcript
StreamingRecognize: Process audio from a file or a microphone as it's being captured, returning partial transcripts
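The difference between the two API methods can be shown with a toy sketch. This is not the Jarvis client library: `toy_transcribe`, `recognize`, and `streaming_recognize` are hypothetical stand-ins, and each "audio chunk" is simply a piece of text so that the example runs without a server. It only illustrates the calling pattern: one final transcript for a complete file, versus a growing partial transcript for each chunk as it arrives.

```python
from typing import Iterable, Iterator, List


def toy_transcribe(chunks: List[str]) -> str:
    """Stand-in for the ASR pipeline: each 'audio chunk' is already its text."""
    return " ".join(c for c in chunks if c)


def recognize(audio_file_chunks: List[str]) -> str:
    """Batch style: the whole input is available, one final transcript comes back."""
    return toy_transcribe(audio_file_chunks)


def streaming_recognize(chunk_stream: Iterable[str]) -> Iterator[str]:
    """Streaming style: yield a partial transcript after every chunk received."""
    received: List[str] = []
    for chunk in chunk_stream:
        received.append(chunk)
        yield toy_transcribe(received)


chunks = ["what's", "the weather", "in Chicago"]
print(recognize(chunks))
for partial in streaming_recognize(iter(chunks)):
    print("partial:", partial)
```

Writing the streaming variant as a generator mirrors how a client would consume partial results while audio is still being captured.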

Jarvis: Weather Bot Architecture
Deployment of Jarvis components with simple dialog manager
Diagram components:
• Chat Application (e.g. iFlyTek)
• Jarvis Service ASR: spoken input to text
• Dialog Manager: state of conversation; route text to services; pass commands to fulfillment engine
• Jarvis Service Intent & Entity: text to intent and slots
• Fulfillment Engine (weather query, etc.): action in, result out
• Jarvis Service TTS: text response to audio
• NeMo (offline): trained model weights for Intent & Entity
• Dialog Authoring (offline): description of dialog states, transitions, response templates
Legend: Domain specific; NVIDIA
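The three dialog manager responsibilities named on the slide (keep conversation state, route text to services, pass commands to the fulfillment engine) can be sketched as follows. Everything here is an assumption for illustration: the keyword-based `toy_intent_and_entity` stands in for the Intent & Entity service, `toy_fulfillment` returns canned data instead of querying a weather backend, and the templates are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NluResult:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def toy_intent_and_entity(text: str) -> NluResult:
    """Stand-in for the Intent & Entity service: keyword rules, not a model."""
    lowered = text.lower().rstrip("?.! ")
    if "weather" in lowered:
        slots = {}
        if " in " in lowered:
            slots["location"] = lowered.split(" in ", 1)[1].title()
        return NluResult("weather", slots)
    return NluResult("unknown")


def toy_fulfillment(action: str, slots: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for the fulfillment engine: canned data, no network call."""
    if action == "weather":
        return {"location": slots["location"], "condition": "sunny"}
    raise ValueError(f"unsupported action: {action}")


class DialogManager:
    """Keeps conversation state, routes text, and fills response templates."""

    TEMPLATES = {
        "weather": "It is {condition} in {location}.",
        "ask_location": "Which city do you mean?",
        "fallback": "Sorry, I did not understand that.",
    }

    def __init__(self) -> None:
        self.pending_intent: Optional[str] = None  # state of the conversation

    def handle(self, text: str) -> str:
        nlu = toy_intent_and_entity(text)
        if nlu.intent == "unknown" and self.pending_intent == "weather":
            # Treat the reply as the missing location slot.
            nlu = NluResult("weather", {"location": text.strip().title()})
        if nlu.intent == "weather":
            if "location" not in nlu.slots:
                self.pending_intent = "weather"
                return self.TEMPLATES["ask_location"]
            self.pending_intent = None
            result = toy_fulfillment("weather", nlu.slots)
            return self.TEMPLATES["weather"].format(**result)
        return self.TEMPLATES["fallback"]


dm = DialogManager()
print(dm.handle("What's the weather in Chicago?"))
print(dm.handle("How is the weather?"))
print(dm.handle("Boston"))
```

In the architecture on the slide, the text passed to `handle` would come from the ASR service and the returned string would go to the TTS service.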

Neural Modules (NeMo)

CONVERSATIONAL AI WORKFLOW
Diagram labels:
• NeMo
• JARVIS AI Services: Gaze detection, Pose estimation, Wake word, Object detection, Lip activity, Speech Recognition, Intent Classification, Speech Synthesis
• Pretrained models; Data for customizing (Fine-Tuning)
• Jarvis Core, reached via gRPC or the Python client library
• Client Application (client): Sensor Fusion, Dialog Manager, Backend fulfillment (optional)
• End users; Multiple sensor input

NEMO: TRAINING CONVERSATIONAL AI MODELS
• Open source deep learning Python toolkit for training speech and language models
• High performance training on NVIDIA GPUs: uses Tensor Cores, multi-GPU, multi-node
• Based on concept of Neural Module: reusable high level building block for defining deep learning models
• PyTorch backend (TensorFlow on roadmap)
Stack diagram, top to bottom:
• Pretrained Models per module
• Neural Modules Collection Libraries: Voice Recognition, Natural Language, Speech Synthesis
• Neural Modules Core: Mixed Precision, Distributed training, Semantic checks
• Optimized Framework
• Accelerated Libraries: CUDA, cuBLAS, cuDNN etc.

NEMO COLLECTIONS
pip install nemo_asr
pip install nemo_nlp
pip install nemo_tts
nemo_asr (Speech Recognition)
• Jasper acoustic model
• QuartzNet acoustic model
• RNN with attention
• Transformer-based
• English and Mandarin tokenizers and dataset importers
nemo_nlp (Natural Lang Processing)
• BERT pre-training & finetuning
• GLUE tasks
• Language modeling
• Neural Machine Translation
• Intent classification & slot filling
• ASR spell correction
• Punctuation
• English and Mandarin dataset importers
nemo_tts (Speech Synthesis)
• Tacotron 2
• WaveGlow
• English and Mandarin output and datasets importers

NEMO EXAMPLE: JASPER ASR
Diagram of the Jasper training graph.
Modules: Audio To Text Data Layer, Audio Preprocessing, Jasper Encoder, Jasper Decoder For CTC, CTC Loss, Greedy CTC Decoder
Other elements: Train Action, Logging Callback (invokes)
Port labels: AUDIO, AUDIO LEN, SPECT, SPECT LEN, ENC, ENC LEN, LOG PROB, TEXT, TEXT LEN, PREDICT, LOSS
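The Greedy CTC Decoder in the diagram turns per-frame log probabilities into text. A minimal pure-Python sketch of greedy (best-path) CTC decoding is given below; it is not NeMo's implementation. The function name, the three-letter vocabulary, the blank index, and the frame scores are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def greedy_ctc_decode(log_probs: Sequence[Sequence[float]],
                      vocabulary: Sequence[str],
                      blank_id: int) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in log_probs
    ]
    decoded: List[str] = []
    previous = blank_id
    for token_id in best_path:
        # Emit a symbol only when it differs from the previous frame's symbol
        # and is not the blank; a blank between two equal symbols keeps both.
        if token_id != previous and token_id != blank_id:
            decoded.append(vocabulary[token_id])
        previous = token_id
    return "".join(decoded)


def peaked_frame(token_id: int, size: int = 4) -> List[float]:
    """Toy frame of scores whose maximum is at token_id (not normalized)."""
    return [0.0 if i == token_id else -10.0 for i in range(size)]


vocabulary = ["a", "c", "t"]  # index 3 is reserved for the CTC blank
frames = [peaked_frame(i) for i in (1, 1, 3, 0, 0, 2)]
print(greedy_ctc_decode(frames, vocabulary, blank_id=3))  # cat
```

Greedy decoding ignores any language model; the beam decoder mentioned for the Jarvis ASR pipeline is the alternative when an N-gram model should influence the result.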

NEMO EXAMPLE: JASPER ASR
• Create modules
• Connect them
• Define training and evaluation actions
"Jasper: An End-to-End Convolutional Neural Acoustic Model" by Li et al., INTERSPEECH 2019
https://arxiv.org/pdf/1904.03288.pdf

ASR COMPARISONS
English LibriSpeech dataset, %WER
Model | Language Model | Test-Clean | Test-Other | Params, M
DeepSpeech 2 | 5-gram | 5.33 | 13.25 | >70
wav2letter++ | ConvLM | 3.26 | 10.47 | 208
Listen-Attend-Spell (with SpecAugment) | RNN | 2.5 | 5.8 | 360
Jasper 10x5 | - | 3.77 | 11.08 | 333
Jasper 10x5 | 6-gram | 3.19 | 9.03 | 333
Jasper 10x5 | Transformer-XL | 2.86 | 8.17 | 333
QuartzNet 15x5 | - | 3.90 | 11.28 | 19
QuartzNet 15x5 | 6-gram | 2.96 | 8.07 | 19
QuartzNet 15x5 | Transformer-XL | 2.69 | 7.25 | 19
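The %WER figures are word error rates: the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognized one, divided by the number of reference words. A minimal sketch of that calculation is shown below, using a standard edit-distance recurrence; the function name and the sample sentences are illustrative, and real evaluations usually also normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 50.0
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.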

DOMAIN SPECIFIC ASR
Jupyter notebook transfer learning tutorial
• Start with pretrained base QuartzNet model
• Fine tune with WSJ data (newspaper read aloud)
• Add custom language model to base model
• Add custom language model to fine-tuned model for best performance
• Achieves Word Error Rate of < 2.5
Chart labels: Pretrained base model; Fine-tuned acoustic model; Pretrained acoustic model + custom language model; Fine-tuned acoustic model + custom language model
Tutorial here: https://ngc.nvidia.com/catalog/containers/nvidia:nemo_asr_app_img

TRANSFER LEARNING CUSTOMER STORY
An S&P Global Company
● S&P Global produces transcriptions of earnings calls: 10,000 hours of high quality data
● Scribe application works with ASR models
● Recognizes domain specific financial jargon
● Additional language models provide meta tags, punctuation
GTC Talk: https://events.rainfocus.com/widget/nvidia/gtcdc19/catalog-short?search=nemo

KENSHO ASR RESULTS
● QuartzNet trained on domain specific financial data outperformed all leading ASR models
● Fine tuning was faster and had higher accuracy than training from scratch

JARVIS AND NEMO TOGETHER
