Advanced Animatronics Voice and Jaws v1.1
NordicFuzzCon – 20/02/2020
Floere T. Pillowcase, Devourer of Automobiles floere@robocow.be
What is this Talk About? An overview of the State of the Art of moving jaws and voice projection. Why
your acting
interaction
contained in the costume
Lip-syncing with a puppet mask (manually actuated)
Radula Castion – Zuzu’s White Rabbit https://www.youtube.com/watch?v=b2pDuWh3ik8
implement by hobbyists
animatronic with 30+ servos and a head full of gears
must suffice
– Off-the-shelf parts
– 3D printable
Gustav Hoegen
Wikipedia - Uncanny Valley Conjecture (Mori 1970)
(no lip-syncing or over-dubbing)
– Katey McGregor – Talking Mickey Mouse https://www.youtube.com/watch?v=762-tHwnAHg
– Mascot – Animatronic Mascots https://www.youtube.com/watch?v=Ve3vuxII6Dc
– Lunaspuppets – Human-Size Animatronic Robotic Talking Donkey Puppet https://www.youtube.com/watch?v=Cv5yAfHWEY4
– Bake Me Up Buttercup – How to Measure Flour Correctly https://www.youtube.com/watch?v=YBkT5woqmAY
– Beautyofthe Bass – Speaker Costume Talks Live! V3 https://www.youtube.com/watch?v=UWOWqe1kP7U
– DRAGON =^ ^= - Howwwwwwdy folks and welcome to Monday ‿ (Twitter: @GRNdragon0)
– Limited, static articulation (blinks + simple mouth)
– Good voice quality
– Most costumes are actually puppets, controlled by the
actor’s hand/chin/tongue, or a remote operator
– Let’s have a look at this…
The Character Academy – How Disney Characters Blink https://www.youtube.com/watch?v=YRDBFc-TrtM
– Articulated jaws can work (but often don’t)
– Voice is dull in real life
voice projection
Perform V” for voice projection, which works well (but bulky system)
– Speaking with exaggerated jaw motion
– E.g.: Buttercup and NIIC do this well
– Big and very powerful ones for chewing and large
jaw motions. These are slow!
– Little, fast ones for speech
– The big ones disengage when speaking
– Under ~0.3 cm pronouncing /ta/ and /te/
Ostry and Flanagan, 1989
– Under ~2.5 cm pronouncing /a/
Vatikiotis-Bateson and Ostry, 1995
“Human Jaw Movement in Mastication and Speech”, D.J. Ostry and J.R. Flanagan,
Sensor attached to the chin, just posterior to the mental notch.
Marker 4 cm from lower incisors, ~on the midsagittal plane.
“An Analysis of the Dimensionality of Jaw Motion in Speech”, E. Vatikiotis-Bateson and D.J. Ostry, Journal of Phonetics, Vol. 23, pp. 101-117, 1995
Haskins Laboratories
gosh.nhs.uk
Jörgen Ahlberg – Source-Filter Model of Speech Production
– Voiced speech starts from glottal impulses
– Bzz! Bzzzzzz!
– Recorded using a contact microphone
– Is also why throat microphones sound iffy…
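The source-filter model can be made concrete in a few lines. An impulse train at F0 stands in for the glottal "Bzz!" and a cascade of two-pole resonators stands in for the vocal-tract formants. This is a minimal illustrative sketch in standard-library Python. The function names and formant values are my own assumptions and do not come from the talk.

```python
import math

def glottal_pulse_train(f0_hz, fs_hz, n_samples):
    """Source: one unit impulse per glottal period (the 'Bzz!')."""
    period = fs_hz / f0_hz
    out = [0.0] * n_samples
    t = 0.0
    while t < n_samples:
        out[int(t)] = 1.0
        t += period
    return out

def formant_resonator(x, freq_hz, bandwidth_hz, fs_hz):
    """Filter: a two-pole resonator standing in for one vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius sets the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs_hz         # pole angle sets the centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s - a1 * y1 - a2 * y2
        y2, y1 = y1, out
        y.append(out)
    return y

def synthesize_vowel(f0_hz, formants, fs_hz=16000, n_samples=1600):
    """Pass the glottal source through a cascade of formant filters."""
    signal = glottal_pulse_train(f0_hz, fs_hz, n_samples)
    for freq_hz, bandwidth_hz in formants:
        signal = formant_resonator(signal, freq_hz, bandwidth_hz, fs_hz)
    return signal

# Illustrative formant values for an open, /a/-like vowel (assumed, not measured).
vowel = synthesize_vowel(120.0, [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)])
```

Changing `f0_hz` moves the pitch while the formants, and therefore the vowel identity, stay put. Those formant relationships are exactly what a naive pitch shifter breaks.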
visible speech is lip motion
– Many speech sounds (phonemes) look alike
– E.g.: to a lip reader, “elephant juice” = “I love you”
speech?
– A very hard problem
– Key to speech recognition
– Voiced or louder
– Nasal or unvoiced
Wolf Paulus – Viseme Model with 12 Mouth Shapes
– Estimate mouth state from jaw + lips
– No actual phoneme detection
– Don’t need perfection
– Chin motion (slow): measured from jaw; includes static poses
– Lip motion (fast): estimated from speech; no action when silent
(Diagram: jaw sensor, lip “sensor” via speech analysis, mouth estimation, jaw servos)
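The mouth-estimation idea above fuses two inputs. A slow jaw-sensor reading, which includes static poses, is combined with a fast lip estimate derived from speech that does nothing when silent. The result is one servo command. The sketch below is hypothetical: the class, parameter names, attack/release smoothing and servo range are illustrative assumptions, not the talk's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MouthEstimator:
    """Hypothetical fusion of a jaw sensor (slow, includes static poses) and
    speech-derived lip motion (fast, zero when silent) into one servo command."""
    lip_gain: float = 0.5         # how far speech can open the mouth beyond the jaw pose
    attack: float = 0.6           # envelope follows a rising speech level quickly...
    release: float = 0.2          # ...and falls back more slowly
    servo_min_deg: float = 0.0
    servo_max_deg: float = 40.0
    envelope: float = field(default=0.0, init=False)

    def update(self, jaw_opening: float, speech_level: float, voiced: bool) -> float:
        """jaw_opening and speech_level are normalised to 0..1; returns a servo angle in degrees."""
        target = speech_level if voiced else 0.0     # no lip action when silent
        coeff = self.attack if target > self.envelope else self.release
        self.envelope += coeff * (target - self.envelope)
        mouth = min(1.0, max(0.0, jaw_opening + self.lip_gain * self.envelope))
        return self.servo_min_deg + mouth * (self.servo_max_deg - self.servo_min_deg)
```

A held-open jaw with no speech maps straight to a static pose. Voiced speech adds a fast, clamped "lip" contribution on top of it.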
– Voiced, unvoiced, or silence?
– How much energy?
– Can we do this
“A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition,” Bishnu S. Atal, Lawrence R. Rabiner, 1976.
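Atal and Rabiner's classifier combines several measurements in a statistical decision rule. A much simpler sketch of the same voiced/unvoiced/silence decision uses only two classic features: short-time energy, which separates silence from speech, and zero-crossing rate, which separates noisy unvoiced sounds from periodic voiced ones. The thresholds here are hand-picked assumptions, not values from the paper or the talk.

```python
import math

def frame_features(frame):
    """Short-time energy (dB) and zero-crossing rate (crossings per sample pair)."""
    energy_db = 10.0 * math.log10(sum(s * s for s in frame) / len(frame) + 1e-12)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy_db, crossings / (len(frame) - 1)

def classify_frame(frame, silence_db=-40.0, zcr_unvoiced=0.3):
    """Return 'silence', 'unvoiced' or 'voiced' for one frame of samples in -1..1."""
    energy_db, zcr = frame_features(frame)
    if energy_db < silence_db:
        return "silence"
    if zcr > zcr_unvoiced:        # hiss-like sounds cross zero very often
        return "unvoiced"
    return "voiced"
```

A low-pitched periodic frame has few zero crossings, while /s/-like noise crosses zero almost every sample. That contrast is why the two features complement each other.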
– How nasal is voiced speech?
– Have done original research on sensors…
– But can’t use any of that work here!
underlying principle
airflow + speech
Donald Derrick – nasalance of na
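With two microphones, one at the mouth and one at the nose, a common speech-science measure is nasalance: nasal energy divided by nasal plus oral energy, times 100. The sketch below computes it per frame from RMS amplitudes. Using RMS, and the function names, are assumptions for illustration. This is not the author's sensor research, which the slide says can't be used here.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def nasalance_percent(nasal_frame, oral_frame, floor=1e-9):
    """Nasalance = nasal / (nasal + oral) * 100, from per-frame RMS amplitudes."""
    n, o = rms(nasal_frame), rms(oral_frame)
    if n + o < floor:             # both channels silent: call it non-nasal
        return 0.0
    return 100.0 * n / (n + o)
```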
– Lips can be separate or added to jaw motion
– Jaw → 1 servo
– Lips → 1 or 2 servos (optional)
– Two microphones (mouth + nose)
– Jaw motion sensor
Eva Taylor – Animatronic Alien
https://makezine.com/2014/10/27/the-making-of-an-animatronic-alien/
http://www.tioh.de/
https://radulacastion.wixsite.com/radulacastion
Rick Lazzarini, Stan Winston School of Character Arts
skud duncan – Animatronic Jaw Test https://www.youtube.com/watch?v=15IVl1VYdSk
Winter Snowmew – “Couple of my followers have been curious about the weird snout. Here is the snarl and mouth mechanics.”
– This is not the point of this project
– Affordability and “bang for the buck” is key
TheCharacterShop – TCSpolarbearWaldo.mov https://www.youtube.com/watch?v=bFW2azvVEdI
Shanetheactor – MetroPCS Commercial https://www.youtube.com/watch?v=udlQ7SH_RtM
Radula Castion – Zuzu’s White Rabbit https://www.youtube.com/watch?v=b2pDuWh3ik8
– Conventions are LOUD!
– Voice acting gives bizarre speech patterns
– Sensors don’t stay put
– Computer vision systems not practical (yet)
– Mouth held open for a long time
– Mouth unmoving while speaking
– Mouth held shut while mumbling
– Smile = mouth a little open for now...
– Loud, even during calm moments
– Noise is non-stationary
much of our voice out of the noise as possible, so L3 can really go to town on the noise. (Which can also be a voice! This is how it can tell the difference.)
(Diagram: L1 – cardioid mic + ambient mic; L2 – GCCPF; L3 – MMSE)
* This test recording was actually done using an omni-directional microphone, thus worst-case
Figure Eight – Knowles Acoustics; Cardioid – SoundGuys
Coupled Paired Filter
noise reference and speech microphones, then subtract the noise reference from the signal and vice versa
algorithm to take better advantage of the close-talking mic and self-adjust better to the stupid acoustic environment
“Low Distortion Noise Cancellers – Revival of a Classical Technique,” Akihiko Sugiyama
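The classical technique revived here is the adaptive noise canceller. An adaptive FIR filter learns how noise at the reference microphone shows up in the speech microphone, predicts it, and subtracts it. The sketch below is only the basic one-directional NLMS canceller. It does not include the coupled/paired structure, which also removes speech leaking into the reference. Function name, tap count and step size are illustrative assumptions.

```python
import random

def nlms_noise_canceller(primary, reference, taps=16, mu=0.5, eps=1e-6):
    """Classic adaptive noise canceller (NLMS).
    primary:   speech microphone = speech + (filtered) noise
    reference: ambient microphone = mostly noise
    Returns e[n] = primary[n] - estimated noise, i.e. the cleaned speech."""
    w = [0.0] * taps                 # adaptive FIR estimate of the noise path
    history = [0.0] * taps           # reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - noise_estimate
        step = mu / (eps + sum(xi * xi for xi in history))   # normalised step size
        w = [wi + step * e * xi for wi, xi in zip(w, history)]
        cleaned.append(e)
    return cleaned

# Usage: noise reaches the speech mic 2 samples later and 20 % quieter (a made-up acoustic path).
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
primary = [0.0, 0.0] + [0.8 * v for v in noise[:-2]]
cleaned = nlms_noise_canceller(primary, noise)
```

With no speech present, the residual shrinks towards zero as the filter converges. With speech present, the speech would remain in the residual, because it is uncorrelated with the reference.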
noise estimation (Minimum Mean-Square Error Short-Time Spectral Amplitude)
noise canceller, but able to handle non-stationary noise
does!
aggressive!
“Development of speech technologies to support hearing through mobile terminal users,” T. Togawa, T. Otani, K. Suzuki, T. Taniguchi, 2015. (Not the exact algorithm used in my code – used here for the nice figure)
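The full MMSE short-time spectral amplitude gain (Ephraim and Malah) needs Bessel functions. This sketch therefore swaps in the simpler Wiener gain. It keeps the "decision-directed" a priori SNR estimate that was introduced alongside the original MMSE-STSA estimator. It works per frequency bin on power spectra. The per-bin noise power is assumed to be supplied from elsewhere (for example, derived from the ambient channel), and FFT framing is omitted. Class name, smoothing factor and gain floor are assumptions.

```python
class DecisionDirectedWiener:
    """Per-bin spectral suppression gains using a decision-directed a priori SNR.
    NOTE: Wiener gain substituted for the MMSE-STSA gain, for brevity."""

    def __init__(self, n_bins, alpha=0.98, gain_floor=0.1):
        self.alpha = alpha                     # weight on the previous frame's clean estimate
        self.gain_floor = gain_floor           # limits suppression depth (and musical noise)
        self.prev_clean_snr = [0.0] * n_bins   # |A_hat|^2 / noise from the previous frame

    def gains(self, noisy_power, noise_power):
        out = []
        for k, (py, pn) in enumerate(zip(noisy_power, noise_power)):
            gamma = py / max(pn, 1e-12)                           # a posteriori SNR
            xi = (self.alpha * self.prev_clean_snr[k]
                  + (1.0 - self.alpha) * max(gamma - 1.0, 0.0))   # a priori SNR
            g = max(xi / (1.0 + xi), self.gain_floor)             # Wiener gain, floored
            self.prev_clean_snr[k] = g * g * gamma
            out.append(g)
        return out
```

A bin that contains only noise sits at the gain floor. A bin with a strong, steady signal converges towards a gain close to 1.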
– Latency low enough
– Very noise robust
– Works over a wide range with the same settings
– They will work in most environments
– They will work with most speakers and languages
– They will work with squeakers
– They can get it wrong at times
– Many, many parameters to configure
– These are some of the most robust algorithms out there
– Most of the parameters are fixed for the application
– The remainder tunes easily to a specific costume
strap with a stretch sensor and…
comfortable anyway
elastic, etc…
– Shifts around too much
– Interferes with speech
– Very comfy
– Quite robust
– Cheap
– Easy to manufacture
– Looks boss!
bend angle and causing light to leak out
– Needs an adaptive algorithm!
(Figure: sensor output while saying “mama, papa”)
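One simple way an adaptive algorithm could cope with a drifting, shifting sensor is to track its working range on the fly. The bounds widen instantly to new extremes and slowly relax towards the current reading, so stale extremes are forgotten. The output is then always mapped to 0..1. This is a hypothetical sketch: the class name and decay constant are assumptions, and the talk does not say which adaptive method is used.

```python
class AdaptiveRange:
    """Hypothetical adaptive normaliser: tracks slowly relaxing min/max bounds
    so a drifting sensor still maps to a 0..1 mouth opening."""

    def __init__(self, decay=0.001, min_span=1e-3):
        self.decay = decay          # how fast stale extremes are forgotten (per sample)
        self.min_span = min_span    # avoids dividing by ~0 before the range is known
        self.lo = None
        self.hi = None

    def update(self, raw):
        if self.lo is None:
            self.lo = self.hi = raw
        # Expand instantly to new extremes...
        self.lo = min(self.lo, raw)
        self.hi = max(self.hi, raw)
        # ...and let both bounds creep back towards the reading.
        self.lo += self.decay * (raw - self.lo)
        self.hi += self.decay * (raw - self.hi)
        span = max(self.hi - self.lo, self.min_span)
        return min(1.0, max(0.0, (raw - self.lo) / span))
```

After the sensor's baseline shifts, the old range is forgotten within a few thousand samples, and "mouth shut" and "mouth open" again map to roughly 0 and 1.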
– Aside from the latency? (need >50 fps)
– Contrast with beards, balaclavas; lighting (IR)
– Powerful computer needed
– Readily-available algorithms for facial landmarking
– Works very well
– Good accuracy
costume
– Need clear view of face from a distance
– Complex algorithms need powerful computers
Cara Motion Capture (www.vicon.com)
DisneyResearchHub – Synthetic prior design for real time facial capture https://www.youtube.com/watch?v=w71vxi60SzM
landmark annotation
smoothing (Kalman)
– Filters out all the little motions
– Some overshoot
requirements and lighting not practical
RoboCow Industries
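The Kalman smoothing of a landmark track can be sketched with a one-dimensional constant-velocity model. Low process noise is what filters out the little motions. The velocity state is one plausible reason for the overshoot after a sudden jaw movement, since the filter keeps "coasting" briefly. The frame rate, noise values and class name are illustrative assumptions; the slide does not specify the model used.

```python
import random
import statistics

class ConstantVelocityKalman:
    """1-D constant-velocity Kalman filter for one landmark coordinate.
    dt assumes 50 frames/s; q and r are illustrative tuning values."""

    def __init__(self, dt=1 / 50, q=50.0, r=4.0):
        self.dt, self.q, self.r = dt, q, r
        self.x = None                          # state [position, velocity]
        self.P = [[1e3, 0.0], [0.0, 1e3]]      # state covariance

    def update(self, z):
        if self.x is None:                     # initialise on the first measurement
            self.x = [z, 0.0]
            return z
        dt, q = self.dt, self.q
        # Predict with F = [[1, dt], [0, 1]]: x <- F x, P <- F P F^T + Q
        p, v = self.x[0] + dt * self.x[1], self.x[1]
        (a, b), (c, d) = self.P
        a, b, c, d = a + dt * (b + c) + dt * dt * d, b + dt * d, c + dt * d, d
        a += q * dt ** 4 / 4                   # Q: white-noise acceleration model
        b += q * dt ** 3 / 2
        c += q * dt ** 3 / 2
        d += q * dt ** 2
        # Update with a position-only measurement (H = [1, 0])
        s = a + self.r
        k0, k1 = a / s, c / s                  # Kalman gain
        innovation = z - p
        self.x = [p + k0 * innovation, v + k1 * innovation]
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]

# Usage: a jittery but stationary landmark (made-up data, in pixels).
rng = random.Random(0)
raw = [10.0 + rng.gauss(0.0, 2.0) for _ in range(2000)]
kf = ConstantVelocityKalman()
smoothed = [kf.update(z) for z in raw]
```

Raising `q` makes the filter follow fast motion more eagerly but lets more jitter through. Lowering it smooths harder at the cost of lag.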
(Diagram: Close-Talking Cardioid-ish Microphone, 3L Noise Reduction, Feed-Back Canceller, Parametric Equalizer, Cross-Over, Sound Effects, Amplifier, Tweeter, Mid-Range, Woofer)
– Larsen effect
– Why there are few costume voice systems
– Microphone design
– Speaker design
– Feed-back control
pitch shifting)
microphone and speaker
– Cardioid mic + decent speaker design ~20 dB
– Total: 30 dB system gain!
your voice, at about the same volume. (Or “big creature” volume)
– Not “punk band in a suit”!
– If you can speak loud, the suit can also be LOUD
“Robust and Efficient Implementation of the PEM–AFROW Algorithm for Acoustic Feedback Cancellation,” G. Rombouts, T. Van Waterschoot,
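The gain budget is easier to appreciate in linear terms. Independent improvements expressed in dB simply add, while the corresponding ratios multiply. So 20 dB is a 10× amplitude ratio, and the 30 dB total is about 31.6× in amplitude, or 1000× in power. The helper names below are illustrative.

```python
def db_to_amplitude_ratio(db):
    """Sound pressure / voltage ratio for a gain in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Acoustic / electrical power ratio for a gain in dB."""
    return 10 ** (db / 10)

def total_gain_db(*gains_db):
    """Independent gains or improvements add in dB (their linear ratios multiply)."""
    return sum(gains_db)
```

For example, `total_gain_db(20, 10)` gives the slide's 30 dB. Here the 10 dB simply stands for whatever the remaining measures contribute.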
– But it often sounds bad (kinda incomprehensible)
commonly used on an Arduino) are NOT formant preserving
– This ruins the formant relationships in speech
– A time-domain pitch shifter has to lock to F0 for that
– Help the algorithm and actually voice act!
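Locking to F0 first requires estimating it. A classic time-domain approach picks the lag with the strongest normalised autocorrelation within a plausible pitch range. The sketch below is illustrative only: the search range and the 0.3 voicing threshold are assumptions, and it is not necessarily the pitch tracker used in the talk.

```python
import math

def estimate_f0(frame, fs_hz, f0_min=60.0, f0_max=500.0):
    """Return F0 in Hz from the autocorrelation peak, or None if the frame looks unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]              # remove DC so it doesn't dominate
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    lag_min = int(fs_hz / f0_max)
    lag_max = min(int(fs_hz / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < 0.3:       # weak periodicity: treat as unvoiced
        return None
    return fs_hz / best_lag
```

A formant-preserving time-domain shifter (PSOLA-style, for instance) would use this period to cut and re-space pitch-synchronous grains.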
muffled voice
effect of the costume head, speaker response, microphone, etc...
– REW to the rescue
– With help from my own method for transfer function estimation
https://www.roomeqwizard.com/
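A parametric equalizer that undoes the muffling is typically built from peaking biquad sections. The coefficients below follow the widely used formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The talk doesn't state which EQ design its code uses. The example settings (+6 dB at 3 kHz, Q = 1.4) are made up for illustration; in practice they would come from the measured response.

```python
import cmath
import math

def peaking_eq(fs_hz, f0_hz, gain_db, q):
    """Peaking-EQ biquad (Audio EQ Cookbook); returns normalised (b, a) coefficients."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def magnitude_db(b, a, f_hz, fs_hz):
    """Filter gain in dB at frequency f_hz."""
    z = cmath.exp(-1j * 2 * math.pi * f_hz / fs_hz)   # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))

def biquad_filter(b, a, x):
    """Direct-form I biquad."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, s, y1, out
        y.append(out)
    return y
```

Several such sections in series, each tuned against the measured transfer function, form the equalizer.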
for realism
the mouth
– High frequencies do most for sound localization
– Tweeter in the nose
– Tweeters are small!
else (e.g.: muzzle, cheeks, forehead, chin, chest, shoulders)
(no directionality)
3-D Audio & Applied Acoustics Lab Princeton
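To send the highs to a tweeter in the nose and the rest elsewhere, the signal needs a crossover. The simplest digital version is a complementary pair: a one-pole low-pass, plus a high-pass defined as "input minus low-pass", so the two bands always sum back to the original. This is a textbook sketch with an assumed 3 kHz split point. Real speaker crossovers typically use steeper slopes (e.g. Linkwitz–Riley filters).

```python
import math

def complementary_crossover(x, fc_hz, fs_hz):
    """Split x into (low, high) bands; high = x - low, so low + high reconstructs x."""
    k = 1.0 - math.exp(-2.0 * math.pi * fc_hz / fs_hz)   # one-pole smoothing coefficient
    low, high = [], []
    state = 0.0
    for s in x:
        state += k * (s - state)     # first-order low-pass (mid-range / woofer feed)
        low.append(state)
        high.append(s - state)       # complementary high-pass (tweeter feed)
    return low, high
```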
required for proper sound
– Avoid comb filtering due to acoustic short-circuit
microphone isolation
microphone (if cardioid)
Elliot Sound Products