Offline Language Translation Tool Capability Collaboration Event (PowerPoint presentation)

SLIDE 1

Offline Language Translation Tool Capability Collaboration Event

11 June 2019

sofwerx.org/translator

SLIDE 2

Blue Sky Working Group Brief Outcomes

Team votes:
  • Black: 15
  • Blue: 2
  • Purple: 4
  • Red: 1
  • Green: 18
  • Orange: 9

SLIDE 3

{Black} One-Year Product Development Plan

One-on-One, Two-Way Speech

1. Hardware

  • Current generation Android phone
  • Memory: 2 GB RAM, 16 GB internal mem (min)
  • Microphone: Push-to-Talk directional microphone
  • Audio: Speaker/AUX output

2. Software

  • Identify candidate existing algorithm sources for ASR, MT, TTS
  • Build application, UI, integrate components
  • Target MOPs: ILR = 1+; ASR Error < 50%; MT BLEU > 15; Latency = 0.5s; CTR* = X

3. Language & Acoustics

  • General Data Collection (leverage existing data) > Database population

  • Use-specific Data Collection > Database population
  • Build transcription and translation libraries
  • Build language and acoustics models
  • Target Languages: User’s single highest priority language

4. Other

  • User interface research
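The ASR target ("ASR Error < 50%") is read here as word error rate (WER), the usual ASR error measure; the slide does not define the metric, so that reading is an assumption. A minimal sketch of WER as word-level Levenshtein distance divided by the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words (one row of the Levenshtein table)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion (reference word dropped)
                          curr[j - 1] + 1,     # insertion (extra hypothesis word)
                          prev[j - 1] + cost)  # substitution, or match if cost is 0
        prev = curr
    return prev[-1] / len(ref)


def meets_asr_target(wer: float, threshold: float) -> bool:
    """True when WER is strictly below the target (0.50 in year one, 0.25 in years 2/3)."""
    return wer < threshold


wer = word_error_rate("where is the nearest clinic", "where is nearest clinic")
print(f"WER = {wer:.2f}; year-one target met: {meets_asr_target(wer, 0.50)}")
```

Because insertions are counted, WER can exceed 1.0 on very noisy output, which is why it is compared against a threshold rather than treated as a bounded percentage.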

Surreptitious/Background (Future)

1. Hardware

  • Current generation Android phone w/ large display
  • Memory: 2 GB RAM, 16 GB internal mem (min)
  • Microphone: Far-field and/or array

2. Software

  • Identify candidate existing algorithm sources for ASR, MT
  • Build application, UI, integrate components
  • Target MOPs: ASR Error < 50%; MT BLEU > 15; Latency = 0.5s; CTR* = X
  • Integrate keyword spotting algorithm

3. Language & Acoustics

  • General Data Collection (leverage existing data) > Database population

  • Use-specific Data Collection > Database population
  • Build transcription and translation libraries
  • Build language and acoustics models
  • Target Languages: User’s single highest priority language

4. Other

  • User interface research
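The surreptitious/background case adds a keyword spotting algorithm. Keyword spotters often run directly on audio or on ASR lattices; as a simpler stand-in, the sketch below scans timestamped ASR word output for a watch list. The `Word` and `KeywordHit` types, the confidence threshold, and the English example terms are all hypothetical; in practice the watch list would be in the monitored language.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    text: str          # recognized word from the ASR output
    start_s: float     # start time in seconds
    confidence: float  # ASR word confidence in [0, 1]


@dataclass
class KeywordHit:
    keyword: str
    start_s: float
    confidence: float


def spot_keywords(words: Iterable[Word], watch_list: Iterable[str],
                  min_confidence: float = 0.6) -> List[KeywordHit]:
    """Scan timestamped ASR words for watch-list terms (single words or phrases)."""
    tokens = list(words)
    phrases = [p.lower().split() for p in watch_list if p.strip()]
    hits = []
    for i in range(len(tokens)):
        for phrase in phrases:
            window = tokens[i:i + len(phrase)]
            if [w.text.lower() for w in window] != phrase:
                continue  # also rejects windows cut short at the end of the transcript
            # A phrase is only as reliable as its least confident word
            conf = min(w.confidence for w in window)
            if conf >= min_confidence:
                hits.append(KeywordHit(" ".join(phrase), window[0].start_s, conf))
    return hits


words = [Word("the", 0.0, 0.9), Word("car", 0.3, 0.8), Word("bomb", 0.6, 0.7),
         Word("is", 0.9, 0.95), Word("ready", 1.1, 0.5)]
for hit in spot_keywords(words, ["bomb", "car bomb"]):
    print(hit)
```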
SLIDE 4

{Black} Year 2/3 Product Improvement Plan

One-on-One, Two-Way Speech

1. Hardware

  • Upgrade Android host phones as newer models come out (refinements to software will only continue to increase computing load)

2. Software

  • Refine ASR, MT, TTS algorithms
  • Update UI based on testing/field data
  • Target MOPs (increase): ILR = 1+; ASR Error < 25%; MT BLEU > 25; Latency = 0.5s; CTR* = X

3. Language & Acoustics

  • Update language and acoustics models based on testing/field data
  • Target Languages (user-driven): either (A) add more languages, (B) increase sophistication on priority languages, or (C) do both at considerably higher cost

4. Other

  • User interface research

Surreptitious/Background

1. Hardware

  • Upgrade Android host phones as newer models come out (refinements to software will only continue to increase computing load)

2. Software

  • Refine ASR and MT algorithms
  • Update UI based on testing/field data
  • Target MOPs (increase): ASR Error < 25%; MT BLEU > 25; Latency = 0.5s; CTR* = X

3. Language & Acoustics

  • Update language and acoustics models based on testing/field data
  • Target Languages (user-driven): either (A) add more languages, (B) increase sophistication on priority languages, or (C) do both at considerably higher cost

4. Other

  • User interface research
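The MT targets above are stated as BLEU scores (> 15 in year one, > 25 in years two to three). The sketch below shows how such a score is formed: corpus-level BLEU with clipped n-gram precision up to 4-grams, a geometric mean, and a brevity penalty, assuming whitespace tokens, one reference per sentence, and no smoothing. Reported BLEU depends heavily on tokenization, so a standard scorer such as sacreBLEU would normally be used for the actual MOP; `meets_bleu_target` and the phase labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale; one reference per hypothesis, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = _ngrams(h, n), _ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


BLEU_TARGETS = {"year 1": 15.0, "years 2-3": 25.0}  # MT BLEU targets from the plans


def meets_bleu_target(score: float, phase: str) -> bool:
    return score > BLEU_TARGETS[phase]
```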
SLIDE 5

{Blue}

  • One-Year product development plan
    ▪ Select/Define Use Case
    ▪ Real-time transcription/translation via audio
    ▪ Must address concerns for:
      a) Hardware Architecture: Form Fit Factor, Processing Ability
        a) Wireless earpiece: Near Field Magnetic Solution
        b) Handheld/Body: S7 ATAK/Galaxy Note 8
        c) Centralized Processing Unit: KLAS VOYAGER; AWS Snowball/Outpost, etc.
          a) Bandwidth = 80MB (Trellisware/MANET)
          b) Computing power = 50 Watts

SLIDE 6

{Blue}

  • One-Year product development plan
    ▪ Select/Define Use Case
    ▪ Real-time transcription/translation via audio
    ▪ Must address concerns for:
      a) Software Architecture: AI software selection/configuration with HW
        a) Engine selection for transcription/translation with an appropriate mix of AI engines
          a) Via, SageMaker, Veritone aiWARE for orchestration and algorithm management
          b) Identify key acoustic engines (open source and proprietary)
          c) Identify key transcription/translation engines (open source and proprietary)
        b) Language: identify a set number of languages to process in the first year
          a) Arabic, Spanish, Russian, Chinese
        c) UI/UX definition (SW/HW)
          a) Data processing and analysis, training
          b) In-theater/on-scene interactions

  • How will you employ it?
    ▪ One on one, surreptitious/background, etc.?
    ▪ One on one: general collection/ground truth classification/low noise
    ▪ Background noise/combination of speakers: preprocessing/advanced classification/denoising/Acoustic Eng

SLIDE 7

{Blue}

  • Pre-Planned Product Improvements for year two to three:
  • Add Use Cases
  • Interactive Solution – More engines, recommender, Selective Speech ID
  • Add languages
  • Improve UI
  • Audio/video: earpiece/helmet cam/eyeglasses
  • Add additional cognitive categories
  • Speaker separation, Speech ID, Language ID, Voice Recognition, Sentiment
  • Object Detection, OCR, Clothing etc.
  • Behavioral Engines
  • Performance Improvements, Scale, Processing ability
  • Advanced architecture deployment: core/edge solution via cloud or standalone
  • On-the-fly training feedback loop/algorithm-model retraining process
  • Transition capability to additional problem sets
  • Transition data to cloud-based system
SLIDE 8

{Purple}

  • One-Year product development plan

▪ Must address concerns for:
  a) Hardware: microphones (far-field experience); beamforming tech on consumer smartphones; mini cloud; finite languages (2 or 3?); USB for terabyte extension
  b) Software: multiple OSes (Android/iOS); transcription tech to share; data-at-rest encryption; gather existing solutions
  c) Acoustics: capture enough noise environments; phonetics
  d) Language: decide on metrics/product viability; record data for languages without existing data sources
  e) Other: assess the cognitive load on the Warfighter when using machine translation

  • How will you employ it?

▪ One on one at first

SLIDE 9

{Purple} Year 1 Plan

  • Stage 1:
  • Test and evaluate using off-the-shelf commercial hardware and software, including Android/iOS using Google, Microsoft, iTranslate, etc., on a modern capable phone
  • Using Warfighters, role players, and interpreters, figure out how a warfighter’s operational capability is increased with machine translation (e.g., detect when the interpreter is changing the message or has missed something important)
  • Use the best language translation pairs
  • Evaluate key languages to begin field testing for the languages with less data, and begin improving them through acquisition of language data
  • Stage 2: collect data from louder environments for key languages
  • Stage 3: measures of success (task completion %)
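One way to detect when an interpreter changes the message or misses something is to compare the interpreter's English against a machine translation of the same utterance. The sketch below uses a deliberately crude lexical check: the share of the machine translation's content words that the interpreter also used, flagging turns below a threshold. The stop-word list, the 0.6 threshold, and the function names are hypothetical, and a fielded system would more likely use semantic similarity than word overlap.

```python
import re
from typing import Set

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in",
              "on", "at", "he", "she", "it", "they", "we", "you", "i"}


def content_words(text: str) -> Set[str]:
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def interpreter_check(machine_translation: str, interpreter_rendition: str,
                      min_recall: float = 0.6) -> dict:
    """Flag a turn when the interpreter's English covers too few content words
    from the machine translation of the same foreign-language utterance."""
    mt_words = content_words(machine_translation)
    interp_words = content_words(interpreter_rendition)
    if not mt_words:
        return {"recall": 1.0, "missing": [], "flag": False}
    missing = sorted(mt_words - interp_words)
    recall = 1 - len(missing) / len(mt_words)
    return {"recall": recall, "missing": missing, "flag": recall < min_recall}


print(interpreter_check("The men left two trucks near the bridge at night",
                        "They left near the bridge"))
```

Returning the missing words, not just a flag, lets the user see what may have been dropped.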

SLIDE 10

{Purple} Year 2/3 Plan

  • Pre-Planned Product Improvements for year two to three:
  • Hardware development: microphones for multi-person environments and background noise
  • Increase storage capacity on phones (USB?) for recordings, to improve the system and assess the accuracy of translations
  • Increase language availability through acquisition of language data

SLIDE 11

{Red} One Year

  • Hardware
  • Android-based device
  • Best-of-breed devices
  • Focus on using earphones for translation
  • Hands-free as much as possible
  • External microphones on user
  • Bottom line: device choice is based on program office procurements, today and in the future

SLIDE 12

{Red} One Year

  • Software
  • Application based
  • Cloud Updates for new phrases and common terms
  • Language Identification to distinguish unexpected languages
  • Loose coupling: data can be processed and viewed the same way across different systems

  • Algorithms: Speech to text, text to text (translation), text to speech
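The speech-to-text, text-to-text, text-to-speech chain and the loose-coupling goal can be sketched together: every stage reads and writes one shared record that serializes to JSON, so any system can process or display it the same way, and per-stage timings can be checked against a latency target. The record fields, language codes, and stub engines below are hypothetical placeholders for real on-device models.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TranslationRecord:
    """Hypothetical common record shared by every component and display."""
    source_lang: str
    target_lang: str
    transcript: Optional[str] = None   # speech-to-text output
    translation: Optional[str] = None  # text-to-text (MT) output
    audio_out_bytes: int = 0           # size of the synthesized speech
    stage_latency_s: Dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


def run_pipeline(audio: bytes, record: TranslationRecord,
                 stt: Callable[[bytes, str], str],
                 mt: Callable[[str, str, str], str],
                 tts: Callable[[str, str], bytes]) -> TranslationRecord:
    """Run speech-to-text, then machine translation, then text-to-speech,
    timing each stage so the total can be compared with a latency target."""
    t0 = time.perf_counter()
    record.transcript = stt(audio, record.source_lang)
    t1 = time.perf_counter()
    record.translation = mt(record.transcript, record.source_lang, record.target_lang)
    t2 = time.perf_counter()
    record.audio_out_bytes = len(tts(record.translation, record.target_lang))
    t3 = time.perf_counter()
    record.stage_latency_s = {"stt": t1 - t0, "mt": t2 - t1, "tts": t3 - t2}
    return record


# Stub engines standing in for real models (illustration only)
def fake_stt(audio: bytes, lang: str) -> str:
    return "¿dónde está el hospital?"


def fake_mt(text: str, src: str, tgt: str) -> str:
    return "where is the hospital?"


def fake_tts(text: str, lang: str) -> bytes:
    return b"\x00" * 3200


rec = run_pipeline(b"", TranslationRecord("es", "en"), fake_stt, fake_mt, fake_tts)
print(rec.to_json())
```

Passing the engines in as plain callables keeps the pipeline independent of any particular ASR, MT, or TTS vendor.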
SLIDE 13

{Red} One Year

  • Acoustics
  • Assess current technology to identify the most effective options
  • Beamforming: multiple microphones to distinguish voices (as in Amazon Alexa devices)
  • Audio processing: noise cancelling to separate the speaker from the environment
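Beamforming with multiple microphones is illustrated here with its simplest form, delay-and-sum: each channel is shifted by its arrival delay so sound from the look direction adds coherently while uncorrelated noise partially cancels. This is a sketch assuming a uniform linear array, a far-field source, and integer-sample delays; practical systems use fractional delays or frequency-domain weights, and nothing here describes what any commercial device actually runs.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float,
                    sample_rate_hz: int) -> List[int]:
    """Integer-sample arrival delays for a far-field source at angle_deg from
    broadside of a uniform linear array, relative to the earliest microphone."""
    delays = []
    for m in range(num_mics):
        tau = m * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
        delays.append(round(tau * sample_rate_hz))
    base = min(delays)  # shift so every delay is non-negative
    return [d - base for d in delays]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its arrival delay and average the aligned samples."""
    max_delay = max(delays)
    length = min(len(ch) for ch in channels) - max_delay
    out = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out


# 4 mics, 5 cm apart, steered 30 degrees off broadside at 16 kHz
print(steering_delays(4, 0.05, 30, 16_000))
```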

SLIDE 14

{Red} One Year

  • Language
  • Upload and focus on gathering information for mission-critical languages and dialects (local translators, language database via government, social media, television closed captioning)
  • Understand common phrases or indicators of harm (someone saying “bomb”)
  • Languages will be selected on device before entering mission area (most common languages/dialects in region)

  • User discretion for switching language
SLIDE 15

{Red}

  • Pre-Planned Product Improvements for year two to three:
  • User feedback process improvement
  • Upgrade size of devices
  • Enhance range of microphones
  • Continuously collect information for more languages and dialects
  • Bottom Line: stay consistent with program office and user needs

SLIDE 16

End solution

  • Different modules that could be used based on the situation and available hardware.

  • Mobile phone only
  • Mobile phone + microphone
  • Mobile phone + mic + Jetson
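Choosing among the three configurations can be sketched as a small mapping from auto-detected hardware to a profile. The detection flags, the profile contents, and the fallback for a Jetson without an external microphone (a combination the slide does not list) are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PHONE_ONLY = "Mobile phone only"
    PHONE_MIC = "Mobile phone + microphone"
    PHONE_MIC_JETSON = "Mobile phone + mic + Jetson"


@dataclass
class DetectedHardware:
    external_mic: bool    # e.g. a wearable directional microphone is connected
    edge_processor: bool  # e.g. a backpack Jetson answers on the local network


# Hypothetical profiles: audio source, where inference runs, model size
PROFILES = {
    Tier.PHONE_ONLY: {"audio_input": "built-in mic", "inference": "phone", "models": "compact"},
    Tier.PHONE_MIC: {"audio_input": "external mic", "inference": "phone", "models": "compact"},
    Tier.PHONE_MIC_JETSON: {"audio_input": "external mic", "inference": "jetson", "models": "large"},
}


def select_tier(hw: DetectedHardware) -> Tier:
    """Map auto-detected hardware onto one of the three listed configurations."""
    if hw.external_mic and hw.edge_processor:
        return Tier.PHONE_MIC_JETSON
    if hw.external_mic:
        return Tier.PHONE_MIC
    # A Jetson without an external mic is not one of the listed options;
    # this sketch conservatively falls back to phone-only processing.
    return Tier.PHONE_ONLY


tier = select_tier(DetectedHardware(external_mic=True, edge_processor=False))
print(tier.value, PROFILES[tier])
```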
SLIDE 17

Standalone unit (diagram): Jetson AGX Xavier + battery + storage, with microphone, for situations where other hardware is not available

SLIDE 18

Hardware

  • Discrete hardware the operator carries in their backpack (Jetson AGX Xavier + battery) connects to multiple mobile phones in the vicinity over a local network.
  • The phone is just an interface and can be any Android device if used with discrete processing hardware
  • In standalone mode (lower accuracy and capability), the mobile phone will be a Samsung Galaxy S7 / Note 8
  • Wearable directional microphone (optional)
  • Data storage (SD card/portable HDD) to store 24 hours of audio that can be offloaded and processed later
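A quick sizing check for the 24-hour recording requirement. The audio format is not specified on the slide; the sketch assumes 16 kHz, 16-bit mono PCM (a common ASR input format) and, for comparison, a hypothetical 32 kbit/s compressed speech codec. Container overhead is ignored.

```python
def audio_storage_bytes(hours: float, sample_rate_hz: int = 16_000,
                        bits_per_sample: int = 16, channels: int = 1) -> int:
    """Bytes needed for uncompressed PCM audio (no container overhead)."""
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * channels * bits_per_sample // 8)


def compressed_storage_bytes(hours: float, bitrate_bps: int) -> int:
    """Bytes needed for audio encoded at a constant bitrate."""
    return int(hours * 3600 * bitrate_bps // 8)


pcm = audio_storage_bytes(24)                      # 2,764,800,000 bytes
compressed = compressed_storage_bytes(24, 32_000)  # 345,600,000 bytes
print(f"PCM: {pcm / 1e9:.2f} GB, 32 kbit/s codec: {compressed / 1e9:.2f} GB")
```

Under these assumptions a full day of raw audio is about 2.8 GB, which a commodity SD card holds comfortably.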

SLIDE 19

Software

  • Plug-in architecture to add different modules to the system in the field
  • Auto-detect hardware
  • Different ML models based on different situations
  • HLT (human language technology) algorithms
  • Ability to use different models based on operator feedback (select language, dialect)
  • Retrain on stored data with periodic upgrades
  • Different noise cancellation algorithms based on environment
SLIDE 20

Acoustics

  • Microphone
  • Choice between a unidirectional and an omnidirectional microphone
  • Limit to environments with few (2-5) speakers in the first year
SLIDE 21

Language

  • Limit to few languages in first year (STT+MT)
  • Spanish
  • Portuguese
  • Russian
  • French
  • Mandarin
  • Farsi (stretch goal if data is available)
  • See what data is available from other government entities.
SLIDE 22

Other

  • Keyword alerts across devices
SLIDE 23

Phase 2 (2nd year)

  • Increase model accuracy for standalone mode
  • Multiple speaker diarization
  • Add more languages
  • Get more data (Transcribe the recorded audio)
SLIDE 24

Phase 3 (3rd year)

  • Add code switching
  • Add more languages
  • Refine technology
  • Add ability to focus on different conversations (e.g., based on important keywords)
  • Train speaker ID based on collected and annotated data (if required as a separate tool)

SLIDE 25

ORANGE Product development plan: Year 1

  • Use case:
  • One-on-one interaction: separate channel to verify interpreter performance
  • Integrate speaker recognition/tracking to separately track interpreter and foreign-language interlocutor
  • Speaker ID for pre-enrolled speakers, to flag if the interlocutor is a known speaker

  • User chooses language
  • Prioritize accuracy and performance over weight/compute power/battery life
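Flagging a pre-enrolled speaker is commonly done by comparing a speaker embedding of the incoming audio against stored embeddings of enrolled speakers. The sketch assumes embeddings already come from some speaker-embedding model (not shown, hypothetical) and uses cosine similarity with an arbitrary 0.75 threshold that would need calibration on real data.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_enrolled_speaker(embedding: Sequence[float],
                           enrolled: Dict[str, Sequence[float]],
                           threshold: float = 0.75) -> Tuple[Optional[str], float]:
    """Return (name, score) for the closest pre-enrolled speaker, or
    (None, best score) when nobody clears the threshold. The score is -1.0
    (the cosine lower bound) when no speakers are enrolled."""
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


enrolled = {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}
print(match_enrolled_speaker([0.9, 0.1, 0.0], enrolled))
```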
SLIDE 26

Hardware

  • Ruggedized laptop/tablet in backpack/vehicle
  • Tethered to the phone by Bluetooth, or wired (if in backpack)
  • Use built-in mic in phone
  • Earbud audio plus a scrolling on-screen display of translated output
SLIDE 27

Software

  • User interface, multiple iterations with users
  • Communications between phone and laptop, synchronization
  • Format transcript so it’s usable
  • Storage of audio recordings on laptop
  • Storage of written transcripts on laptop and optionally on phone
  • Integrate speaker recognition/tracking to separately track interpreter and foreign-language interlocutor
  • Means to download new language components, update models, upload stored recordings & transcripts
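A usable transcript needs, at minimum, time order, a speaker label from speaker tracking (interpreter vs. interlocutor), and the machine translation next to foreign-language turns. The `Segment` layout, role labels, and output format below are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    start_s: float                     # segment start, seconds from recording start
    speaker: str                       # e.g. "INTERLOCUTOR" or "INTERPRETER"
    text: str                          # what was said, in the spoken language
    translation: Optional[str] = None  # machine translation into English, if any


def format_timestamp(seconds: float) -> str:
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(segments: Iterable[Segment]) -> str:
    """Render segments in time order, with MT shown under foreign-language turns."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        lines.append(f"[{format_timestamp(seg.start_s)}] {seg.speaker}: {seg.text}")
        if seg.translation:
            lines.append(f"    (MT) {seg.translation}")
    return "\n".join(lines)


segs = [Segment(65.2, "INTERPRETER", "He says the road is closed."),
        Segment(61.0, "INTERLOCUTOR", "الطريق مغلق", "The road is closed.")]
print(format_transcript(segs))
```

Showing the MT line directly under the interlocutor's turn also supports the interpreter-verification use case.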

SLIDE 28

Languages

  • Year 1:
  • Modern Standard Arabic, Farsi, Russian, (African) French
  • Need to verify availability, quantity, quality, types of data
  • Further improve existing speech recognition and machine translation engines
  • Start data collection for out years
  • Egyptian Arabic, Levantine Arabic, Korean, Indonesian
  • Collect real or role-played interactions; make audio recordings, transcribe, translate (in text)
  • Gather data that’s already being created, e.g. US-funded/partnered radio/TV/etc. (including call-ins), which is being translated back to English
  • Collections of different dialect acoustics to support language ID
SLIDE 29

Program management

  • Bimonthly meetings with government customer and with users to solicit feedback
  • Trial system with most developed language (e.g. MSA) starting from month 8
  • Trial again in month 10.5
  • Delivery of systems at end of month 12
SLIDE 30

Year 2

  • Use case
  • Add language ID: user picks a set of most likely dialects (e.g. based on location); system can make its own choice from among those or others

  • Translate a conversation with no interpreter involved, just listening in
  • Ability to give some feedback to the user about audio quality coming in
  • Directional mic or array mic
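Feedback about incoming audio quality can start with two cheap measurements on each buffer: RMS level in dBFS and the fraction of clipped samples. The sketch assumes float samples normalized to [-1.0, 1.0]; the thresholds (-40 dBFS, 0.99 clip level, 1% clipped) and the advice strings are hypothetical and would need tuning per microphone.

```python
import math
from typing import Dict, Sequence


def audio_quality(samples: Sequence[float], clip_level: float = 0.99,
                  quiet_dbfs: float = -40.0,
                  max_clip_fraction: float = 0.01) -> Dict[str, object]:
    """Simple input-level feedback for float samples in [-1.0, 1.0]."""
    if not samples:
        return {"rms_dbfs": float("-inf"), "clip_fraction": 0.0, "advice": "no audio"}
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    clip_fraction = sum(1 for s in samples if abs(s) >= clip_level) / len(samples)
    if clip_fraction > max_clip_fraction:
        advice = "too loud: move the microphone away or lower the gain"
    elif rms_dbfs < quiet_dbfs:
        advice = "too quiet: move closer to the speaker"
    else:
        advice = "ok"
    return {"rms_dbfs": rms_dbfs, "clip_fraction": clip_fraction, "advice": advice}


print(audio_quality([0.5, -0.5] * 100))
```

Level and clipping do not measure background noise directly; a noise-floor or SNR estimate would be a natural next check.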
SLIDE 31

Year 2

  • Develop models for new languages for basic components
  • Egyptian Arabic, Levantine Arabic, Korean, Indonesian
  • Improvements, bug fixes, feature enhancements after year 1 delivery
  • Add language ID
  • Reduce component size with a view toward running on phone standalone

  • Plan for surreptitious/crowd-usage case
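The year-2 language ID use case (the user picks likely dialects, but the system may choose another) can be sketched as a decision rule over language ID scores: prefer the best user-selected candidate unless a language outside that set beats it by a clear margin. The score source (a language ID model, not shown), the dialect labels, and the 0.2 margin are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def choose_language(scores: Dict[str, float], user_candidates: Iterable[str],
                    override_margin: float = 0.2) -> Tuple[str, bool]:
    """Pick a language/dialect from language ID scores (probabilities).

    Prefer the user's pre-selected candidates, but allow a language outside that
    set when its score beats the best candidate by at least override_margin.
    Returns (language, picked_from_user_set)."""
    if not scores:
        raise ValueError("no language ID scores")
    candidates = set(user_candidates)
    in_set = {k: v for k, v in scores.items() if k in candidates}
    out_set = {k: v for k, v in scores.items() if k not in candidates}
    best_out = max(out_set.items(), key=lambda kv: kv[1], default=None)
    if not in_set:
        return best_out[0], False  # none of the user's picks was scored
    best_in = max(in_set.items(), key=lambda kv: kv[1])
    if best_out is not None and best_out[1] - best_in[1] >= override_margin:
        return best_out[0], False
    return best_in[0], True


scores = {"egyptian_arabic": 0.35, "levantine_arabic": 0.30, "msa": 0.20, "farsi": 0.15}
print(choose_language(scores, ["egyptian_arabic", "levantine_arabic"]))
```

The margin sets how much evidence is needed to override the user; a larger value trusts the user's pre-selection more.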