Machine Learning for Signal Processing
Lecture 1: Signal Representations
Class 1. 29 August 2013 Instructor: Bhiksha Raj
29 Aug 2013 11-755/18-797 1
What is a signal
A mechanism for conveying information
– Semaphores, gestures, traffic lights..
– from a source to a destination
– about a real world phenomenon
– Or sets of numbers (for color images)
– 0 is minimum / black, 1 is maximum / white
– Position / order is important
Pixel = 0.5
– Learning patterns in data
analysis
– Learning to classify between different kinds of data
– Learning to predict data
– Such as audio, images, video, etc.
– Characterizing signals
– Detecting signals
– Recognizing signals
– Predicting signals
– Etc.
– Linear Algebra, Signal Processing, Probability
– Methods of modelling, estimation, classification, prediction
– Sounds:
mixtures, Music retrieval
– Images:
– Other forms of data
– Representation
– Sensing and recovery.
progresses
– Fourier transforms, linear systems, basic statistical signal processing
– Definitions, vectors, matrices, operations, properties
– Basics: what is a random variable, probability distributions, functions of a random variable
– Learning, modelling and classification techniques
– Mini projects
– Will be assigned during course
– Minimum 3, Maximum 4
– You will not catch up if you slack on any homework
– Will be assigned early in course
– Dec 5: Poster presentation for all projects, with demos (if possible)
– Room 6705 Hillman Building
– bhiksha@cs.cmu.edu
– 412 268 9826
– James Ding
– Varun Gupta
– Bhiksha Raj: Wed 3:30-4:30
– TA: TBD
[Figure: map locating the instructor's office; labels: Hillman, Windows, My office, Forbes]
moving through the air
– Essentially by producing puff after puff of air
– Any sound producing mechanism actually produces pressure waves
– Highs push it in, lows suck it out
– We sense these motions of our eardrum as “sound”
[Figure: arcs show pressure highs; spaces between arcs show pressure lows]
– On the microphone
– Many ways to do this
computer have anything to do with the recorded sound really?
– Recreate the sense of sound
signal
produce a pressure wave
– That we sense as sound
sinusoids with frequency
many sinusoids of different frequencies
– Frequency is a physically motivated unit
– Each hair cell in our inner ear is tuned to a specific frequency
components
– We can hear frequencies up to 16000 Hz
be heard by children and some young adults
[Figure: a sinusoid; pressure plotted against time in secs.]
[Figure: spectrograms (time vs. frequency) of a signal at three sampling rates: 44.1 kHz SR is OK; 22 kHz SR, aliasing; 11 kHz SR, double aliasing]
[Examples sampled at 44 kHz, 22 kHz, 11 kHz, 5 kHz, 4 kHz and 3 kHz]
– And then some
– Cannot control the rate of variation of pressure waves in nature
– Cut off all frequencies above (sampling frequency)/2
– E.g., to sample at 44.1 kHz, filter the signal to eliminate all frequencies above 22050 Hz
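The cutoff rule above can be checked with a little arithmetic. The following is a minimal sketch, assuming a pure tone sampled with no antialiasing filter; the function names are illustrative and do not come from the slides.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sampling_rate_hz / 2.0


def aliased_frequency(tone_hz, sampling_rate_hz):
    """Apparent frequency of a pure tone after sampling without a filter.

    The tone folds around the nearest integer multiple of the sampling rate,
    so the result always lies between 0 and sampling_rate_hz / 2.
    """
    nearest_multiple = round(tone_hz / sampling_rate_hz)
    return abs(tone_hz - nearest_multiple * sampling_rate_hz)


if __name__ == "__main__":
    for rate in (44100, 22050, 11025):
        print(rate, nyquist_frequency(rate), aliased_frequency(15000, rate))
```

A 15000 Hz tone is below the cutoff at 44.1 kHz and is preserved, but at a 22050 Hz sampling rate it shows up at 7050 Hz, which is why the antialiasing filter must remove it before sampling.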
[Figure: Analog signal → Antialiasing Filter → Sampling → Digital signal]
– The pressure wave can take any value (within limits)
– The diaphragm can also move continuously
– The electrical signal from the diaphragm has continuous variations
– Numbers can only be stored to finite resolution
– E.g. a 16-bit number can store only 65536 values, while a 4-bit number can store only 16 values
– To store the sound wave on the computer, the continuous variation must be “mapped” on to the discrete set of numbers we can store
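The mapping from a continuous value to a finite set of numbers can be sketched as a uniform quantizer. This is an illustration only: the input range of -1.0 to 1.0 and the choice of reconstructing at the centre of each step are assumptions, not something the slides specify.

```python
def num_levels(bits):
    """Number of distinct values a word of the given width can store."""
    return 2 ** bits


def quantize_uniform(x, bits, lo=-1.0, hi=1.0):
    """Map x to one of 2**bits equally spaced steps on [lo, hi].

    Returns (index, value): the stored integer and the value it stands for
    (assumed here to be the centre of the step).
    """
    levels = num_levels(bits)
    x = min(max(x, lo), hi)          # values outside the range are clipped
    step = (hi - lo) / levels
    index = min(levels - 1, int((x - lo) / step))
    return index, lo + (index + 0.5) * step


if __name__ == "__main__":
    for bits in (4, 16):
        index, value = quantize_uniform(0.3, bits)
        print(bits, num_levels(bits), index, abs(value - 0.3))
```

With more bits the step shrinks, so the stored value lands closer to the original sample.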
Signal Value    Bit sequence    Mapped to
S > 2.5v        1               1 * const
S <= 2.5v       0               0
[Figure: original signal and its quantized approximation]
Signal Value          Bit sequence    Mapped to
S >= 3.75v            11              3 * const
3.75v > S >= 2.5v     10              2 * const
2.5v > S >= 1.25v     01              1 * const
1.25v > S >= 0v       00              0
Signal Value          Bits    Mapped to
S >= 3.75v            11      3 * const
3.75v > S >= 2.5v     10      2 * const
2.5v > S >= 1.25v     01      1 * const
1.25v > S >= 0v       00      0

Signal Value          Bits    Mapped to
S >= 4v               11      4.5 * const
4v > S >= 2.5v        10      3.25 * const
2.5v > S >= 1v        01      1.25 * const
1.0v > S >= 0v        00      0.5 * const
At the sampling instant, the actual value of the
Values entirely outside the range are quantized to
Quantization levels are non-uniformly spaced
At the sampling instant, the actual value of the
Values entirely outside the range are quantized to
Original Uniform Nonuniform
UPON BEING SAMPLED AT ONLY 3 BITS (8 LEVELS)
There is a lot more action in the central region than outside. Assigning only four levels to the busy central region and four
Assigning more levels to the central region and fewer to the outer region can give better fidelity for the same storage
Uniform Non-uniform
wider farther away
– The curve that the steps are drawn on follows a logarithmic law:
sampling as 12bits of uniform sampling
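The logarithmic law itself did not survive in the text above. As a hedged illustration, the sketch below assumes the widely used mu-law curve with mu = 255 and samples scaled to the range -1.0 to 1.0. Both are assumptions, and the function names are illustrative.

```python
import math

MU = 255.0  # assumed value; the slides do not state it


def mu_law_compress(x, mu=MU):
    """Logarithmic compression of x in [-1, 1] to y in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def mu_law_quantize(x, bits=8, mu=MU):
    """Compress, then quantize uniformly in the compressed domain."""
    levels = 2 ** bits
    y = mu_law_compress(x, mu)
    return min(levels - 1, int((y + 1.0) / 2.0 * levels))


def mu_law_dequantize(code, bits=8, mu=MU):
    """Map a stored code back to the value at the centre of its step."""
    levels = 2 ** bits
    y = (code + 0.5) / levels * 2.0 - 1.0
    return mu_law_expand(y, mu)


if __name__ == "__main__":
    near_zero = mu_law_dequantize(129) - mu_law_dequantize(128)
    near_peak = mu_law_dequantize(255) - mu_law_dequantize(254)
    print(near_zero, near_peak)
```

Uniform steps in the compressed domain become narrow steps near zero and wide steps near the extremes, which is the non-uniform spacing described above.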
[Figure: quantized value vs. analog value, for nonlinear and uniform quantization]
– Linear PCM, Mu-law, A-law,
– I.e. map the bits onto the number on the right column
– This mapping is typically provided by a table computed from the sample compression function
– No lookup for data stored in PCM
– http://www.speech.cs.cmu.edu/comp.speech/Section2/Q2.7.html
Signal Value          Bits    Mapped to
S >= 3.75v            11      3
3.75v > S >= 2.5v     10      2
2.5v > S >= 1.25v     01      1
1.25v > S >= 0v       00      0

Signal Value          Bits    Mapped to
S >= 4v               11      4.5
4v > S >= 2.5v        10      3.25
2.5v > S >= 1v        01      1.25
1.0v > S >= 0v        00      0.5
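The two tables above can be read as an encode step (signal value to bits) and a table lookup (bits to stored number). Here is a minimal sketch of both. The `00` codes and the `0` in the first table are assumed, since they do not appear in the extracted text, and the function names are illustrative.

```python
# Each row: (lower threshold in volts, bit pattern, number the bits map to).
# The "00" codes and the 0.0 in UNIFORM are assumptions (missing in the source).
UNIFORM = [(3.75, "11", 3.0), (2.5, "10", 2.0), (1.25, "01", 1.0), (0.0, "00", 0.0)]
NONUNIFORM = [(4.0, "11", 4.5), (2.5, "10", 3.25), (1.0, "01", 1.25), (0.0, "00", 0.5)]


def encode(signal_volts, table):
    """Return the bit pattern of the first row whose threshold is met."""
    for threshold, bits, _ in table:
        if signal_volts >= threshold:
            return bits
    return table[-1][1]  # below the range: use the lowest level


def decode(bits, table):
    """Look up the number that a stored bit pattern maps to."""
    lookup = {pattern: value for _, pattern, value in table}
    return lookup[bits]


if __name__ == "__main__":
    for volts in (0.7, 3.0, 4.2):
        print(volts,
              decode(encode(volts, UNIFORM), UNIFORM),
              decode(encode(volts, NONUNIFORM), NONUNIFORM))
```

The same stored bits decode to different numbers depending on the table, which is why the decoder needs the table for non-uniform formats.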
Basic Neuroscience: Anatomy and Physiology Arthur C. Guyton, M.D. 1987 W.B.Saunders Co.
Retina
http://www.brad.ac.uk/acad/lifesci/optometry/resources/modules/stage1/pvp1/Retina.html
– Fast
– Sensitive
– Grey scale
– Predominate in the periphery
– Slow
– Not so sensitive
– Fovea / Macula
– COLOR!
Basic Neuroscience: Anatomy and Physiology Arthur C. Guyton, M.D. 1987 W.B.Saunders Co.
– The region immediately surrounding the fovea is the macula
(From Foundations of Vision, by Brian Wandell, Sinauer Assoc.)
[Figure: normalized response vs. wavelength in nm]
– Sufficient to trigger each of the three cone types in a manner that produces the sensation of the desired color
– Some new-world monkeys are tetrachromatic
– By appropriate combinations of these colors, the cones can be excited to produce a very large set of colours
– How many colours? …
Wright and John Guild
– Subjects adjusted x, y, and z on the right of a circular screen to match a colour on the left
sensors
– X + Y + Z is 1.0
– The outer curve represents monochromatic light
– The lower line is the line of purples
– The newer charts are less popular
International council on illumination, 1931
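The constraint that the three coordinates sum to 1.0 comes from normalizing the three matched amounts by their total. A minimal sketch of that normalization follows; the function name and the use of lowercase names for the normalized values are assumptions made here for illustration.

```python
def chromaticity(X, Y, Z):
    """Normalize three tristimulus amounts so that they sum to 1.0."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("at least one of X, Y, Z must be non-zero")
    return X / total, Y / total, Z / total


if __name__ == "__main__":
    x, y, z = chromaticity(2.0, 3.0, 5.0)
    print(x, y, z, x + y + z)
```

Because the three normalized values always sum to 1, any two of them fix the third, which is why a colour can be placed on a two-dimensional chart.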
– Colours outside this area cannot be matched by additively combining only 3 colours
would have a differently restricted area
coordinate of one of the three “primary” colours used in images
fraction of our visual acuity
– Also affected by the quantization of levels
– Each number represents the intensity of the image at a specific location in the image
– Implicitly, R = G = B at all locations
– The matrices represent different things in different representations
– RGB Colorspace: Matrices represent intensity of Red, Green and Blue
– CMYK Colorspace: Cyan, Magenta, Yellow (and Black)
– YIQ Colorspace..
– HSV Colorspace..
R = G = B. Only a single number need be stored per pixel
From: Digital Image Processing, by Gonzalez and Woods, Addison Wesley, 1992
– Tonescale to change image brightness
– Threshold to reduce the information in an image
– Colorspace operations
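The first two operations can be sketched on a greyscale image stored as a matrix of numbers between 0 and 1. This is an illustration only: the slides do not give the tone curve, so a simple power-law (gamma) curve is assumed here, and the function names are hypothetical.

```python
def threshold(image, cutoff=0.5):
    """Reduce each pixel to 0.0 or 1.0 depending on the cutoff."""
    return [[1.0 if pixel >= cutoff else 0.0 for pixel in row] for row in image]


def tonescale(image, gamma):
    """Apply an assumed power-law tone curve to pixels in [0, 1].

    gamma < 1 brightens the image, gamma > 1 darkens it.
    """
    return [[pixel ** gamma for pixel in row] for row in image]


if __name__ == "__main__":
    image = [[0.2, 0.5], [0.9, 0.4]]
    print(threshold(image))
    print(tonescale(image, 0.5))
```

Thresholding keeps only one bit of information per pixel, while the tone curve changes brightness without changing the order of the pixel values.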
– Adding equal parts of red, green and blue creates white
– Clue – paint colouring is subtractive..
– The base is white
– Masking it with equal parts of C, M and Y creates Black
– Masking it with C and Y creates Green
– Masking it with M and Y creates Red
– Masking it with M and C creates Blue
– Designed specifically for printing
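The masking rules above follow from treating each ink as subtracting its complementary primary from a white base. Below is a minimal sketch under that idealized assumption (real inks are not perfect complements); the function names are illustrative.

```python
def cmy_to_rgb(c, m, y):
    """Idealized subtractive mixing: each ink removes one primary from white."""
    return (1 - c, 1 - m, 1 - y)


def rgb_to_cmy(r, g, b):
    """Inverse mapping: how much of each ink is needed on a white base."""
    return (1 - r, 1 - g, 1 - b)


if __name__ == "__main__":
    print("C + Y ->", cmy_to_rgb(1, 0, 1))      # green
    print("M + Y ->", cmy_to_rgb(0, 1, 1))      # red
    print("M + C ->", cmy_to_rgb(1, 1, 0))      # blue
    print("C + M + Y ->", cmy_to_rgb(1, 1, 1))  # black
```

Cyan masks red and yellow masks blue, so only green is left, matching the "C and Y creates Green" rule.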
– Each paint masks out some colours
– Mixing paint subtracts combinations of colors
– Paintings represent subtractive colour masks
– How do you think he did it?
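\[
\begin{bmatrix} Y \\ I \\ Q \end{bmatrix}
=
\begin{bmatrix}
.299 & .587 & .114 \\
.596 & -.275 & -.321 \\
.212 & -.523 & .311
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]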
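The RGB to YIQ conversion is a single matrix multiplication. The sketch below uses the coefficients listed above and assumes the usual NTSC signs, under which the I and Q rows each sum to zero. The function name is illustrative.

```python
# Rows give Y, I and Q as weighted sums of R, G and B.
# The minus signs are assumed (standard NTSC convention): with them the
# I and Q rows sum to zero, so a grey pixel carries no chrominance.
RGB_TO_YIQ = (
    (0.299, 0.587, 0.114),
    (0.596, -0.275, -0.321),
    (0.212, -0.523, 0.311),
)


def rgb_to_yiq(r, g, b):
    """Convert one RGB pixel (values in [0, 1]) to (Y, I, Q)."""
    return tuple(row[0] * r + row[1] * g + row[2] * b for row in RGB_TO_YIQ)


if __name__ == "__main__":
    print(rgb_to_yiq(0.5, 0.5, 0.5))  # grey: I and Q are (numerically) zero
    print(rgb_to_yiq(1.0, 0.0, 0.0))  # pure red
```

For any pixel with R = G = B, only Y is non-zero, so Y alone is enough to reproduce a greyscale picture.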
– From http://wikipedia.org/
– In RGB must be done on all three
– A black and white TV only needs Y
[Figure: amplitude vs. frequency (MHz) of the luminance and chrominance signals]
Understanding image perception allowed NTSC to add color to the black and white television signal. The eye is more sensitive to I than Q, so less bandwidth is needed for Q. Both together use much less bandwidth than Y, allowing color to be added for a minimal increase in transmission bandwidth.
The HSV Colour Model By Mark Roberts http://www.cs.bham.ac.uk/~mer/colour/hsv.html
– 0 = Black
– 1 = Max (white at S = 0)
– As H goes from 0 (Red) to 360, it represents different combinations of 2 colors
Max is the maximum of (R,G,B)
Min is the minimum of (R,G,B)
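The conversion formulas that use Max and Min did not survive in the text. As a hedged illustration, the sketch below implements the commonly used hexcone conversion (V = Max, S = (Max - Min) / Max, H in degrees from 0 at red). It may differ in detail from the slide's version. It is checked against Python's standard `colorsys` module, and the function names are illustrative.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert RGB in [0, 1] to (H in degrees, S, V)."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    v = mx                                  # value is the largest component
    s = 0.0 if mx == 0 else delta / mx      # saturation is the relative spread
    if delta == 0:
        h = 0.0                             # grey: hue is undefined, use 0
    elif mx == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif mx == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    return h, s, v


def agrees_with_colorsys(r, g, b, tol=1e-9):
    """Compare with colorsys.rgb_to_hsv, which returns hue in [0, 1)."""
    h, s, v = rgb_to_hsv_degrees(r, g, b)
    h_ref, s_ref, v_ref = colorsys.rgb_to_hsv(r, g, b)
    return (abs(h / 360.0 - h_ref) < tol
            and abs(s - s_ref) < tol
            and abs(v - v_ref) < tol)


if __name__ == "__main__":
    for rgb in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.2, 0.4, 0.6)]:
        print(rgb, rgb_to_hsv_degrees(*rgb), agrees_with_colorsys(*rgb))
```

Pure red, green and blue land at hues 0, 120 and 240 degrees, and hues in between correspond to mixtures of the two neighbouring primaries.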
H S V
– j may be time, position, etc..
– Usually continuously valued
– Q is the space of all j
– K(j) is a measurement kernel
– Ideally a delta (which takes non-zero value only at the desired j)
– But in reality not
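The role of the measurement kernel can be illustrated with a discrete stand-in, where j runs over a finite list of positions and a measurement is the kernel-weighted sum of the underlying values. This is a sketch under that assumption. The names, and the box-shaped kernel used for the non-ideal case, are hypothetical.

```python
def measure(signal, kernel):
    """Weighted sum of the signal values, with the kernel as weights."""
    if len(signal) != len(kernel):
        raise ValueError("signal and kernel must have the same length")
    return sum(k * x for k, x in zip(kernel, signal))


def delta_kernel(length, index):
    """Ideal kernel: non-zero only at the desired position."""
    return [1.0 if j == index else 0.0 for j in range(length)]


def box_kernel(length, index, half_width):
    """Non-ideal kernel (assumed shape): averages over nearby positions."""
    lo = max(0, index - half_width)
    hi = min(length - 1, index + half_width)
    width = hi - lo + 1
    return [1.0 / width if lo <= j <= hi else 0.0 for j in range(length)]


if __name__ == "__main__":
    signal = [0.0, 1.0, 4.0, 9.0, 16.0]
    print(measure(signal, delta_kernel(5, 2)))   # exact value at position 2
    print(measure(signal, box_kernel(5, 2, 1)))  # blurred by the neighbours
```

With the delta kernel the measurement equals the value at the desired position, while a wider kernel mixes in neighbouring values, which is the sense in which real sensors are not ideal.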