Unsupervised Learning II
George Konidaris gdk@cs.brown.edu
Fall 2019
Machine Learning
Subfield of AI concerned with learning from data.
Broadly, using:
Experience
To Improve Performance
On Some Task
(Tom Mitchell, 1997)
Input: X = {x1, …, xn} (inputs)
Try to understand the structure of the data.
E.g., how many types of cars? How can they vary?
Clustering
Given:
Find:
Density Estimation
Given:
Find:
Dimensionality Reduction
Given:
Find:
|X′| << |X|
Find the eigenvectors Vi and eigenvalues vi of the covariance matrix C. Each Vi is a direction, and each vi is its importance: the amount of the data's variance along that direction.
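Subtract the mean (m data points):
$X_i \leftarrow X_i - \frac{1}{m}\sum_{j} X_j$
New data points (projection onto the top $p$ eigenvectors, each $V_k$ a column):
$\hat{X}_i = [V_1, \ldots, V_p]^\top X_i$
Reconstruction:
$\bar{X}_i = V_1 \hat{X}_i[1] + V_2 \hat{X}_i[2] + \cdots + V_p \hat{X}_i[p]$
The coefficients $\hat{X}_i[k]$ are real-valued numbers; the $V_k$ are the axes.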
Every data point is expressed as a point in a new coordinate frame. Equivalently: weighted sum of basis (eigenvector) functions.
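The PCA steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function names `pca` and `reconstruct` and the toy data are assumptions, and the reconstruction adds the subtracted mean back so that it can be compared with the original points.

```python
import numpy as np

def pca(X, p):
    """Project the rows of X, an (m, d) array, onto the top-p eigenvectors.

    Returns the mean, the (d, p) eigenvector matrix V, the matching
    eigenvalues, and the (m, p) array of new data points X_hat.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                            # subtract the mean
    C = np.cov(Xc, rowvar=False)             # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # C is symmetric; ascending order
    order = np.argsort(eigvals)[::-1][:p]    # indices of the p largest eigenvalues
    V = eigvecs[:, order]
    X_hat = Xc @ V                           # coordinates in the new frame
    return mean, V, eigvals[order], X_hat

def reconstruct(mean, V, X_hat):
    """Weighted sum of eigenvectors, plus the mean that was subtracted."""
    return X_hat @ V.T + mean

# Toy data (an assumption for illustration): 200 points near a line in 3-D.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))

mean, V, eigvals, X_hat = pca(X, p=1)
X_bar = reconstruct(mean, V, X_hat)
mse = np.mean((X - X_bar) ** 2)              # small: one axis captures most variance
```

Keeping all d eigenvectors (p = d) makes the reconstruction exact; dropping the low-eigenvalue directions trades a small error for fewer dimensions.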
Fundamental issue with PCA: Linear reconstruction. Can we use a nonlinear method for reconstruction?
Yes, there are several.
[Figure: neural network with an input layer (x1, x2) and a hidden layer (h1, h2, h3)]
σ(w · x + c)
w · x + c (regression)
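Feed forward: values are computed layer by layer.
Input layer: $x_1, x_2 \in [0, 1]$
Hidden layer (value computed):
$h_1 = \sigma(w^{h_1}_1 x_1 + w^{h_1}_2 x_2 + w^{h_1}_3)$
$h_2 = \sigma(w^{h_2}_1 x_1 + w^{h_2}_2 x_2 + w^{h_2}_3)$
$h_3 = \sigma(w^{h_3}_1 x_1 + w^{h_3}_2 x_2 + w^{h_3}_3)$
Output layer (value computed):
$o_1 = \sigma(w^{o_1}_1 h_1 + w^{o_1}_2 h_2 + w^{o_1}_3 h_3 + w^{o_1}_4)$
$o_2 = \sigma(w^{o_2}_1 h_1 + w^{o_2}_2 h_2 + w^{o_2}_3 h_3 + w^{o_2}_4)$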
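The feed-forward computation for this 2-3-2 network can be sketched as below. The weight layout (last column of each matrix holds the bias, matching the final weight in each sum) and the random weight values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def feed_forward(x, W_h, W_o):
    """Compute hidden and output values for a 2-3-2 network.

    x:   (2,) inputs in [0, 1]
    W_h: (3, 3) hidden weights; row k is (w_1, w_2, w_3) for hidden unit k
    W_o: (2, 4) output weights; row k is (w_1, w_2, w_3, w_4) for output k
    The last entry of each row is the bias term.
    """
    h = sigmoid(W_h[:, :2] @ x + W_h[:, 2])   # hidden layer values
    o = sigmoid(W_o[:, :3] @ h + W_o[:, 3])   # output layer values
    return h, o

# Random weights, chosen only to make the example runnable.
rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 3))
W_o = rng.normal(size=(2, 4))
x = np.array([0.2, 0.9])
h, o = feed_forward(x, W_h, W_o)
```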
Idea: train the network to reproduce its input at the output (an autoencoder).
[Figure: network with inputs x1 to x6, hidden units h1, h2, h3 (the compressed representation), and outputs x1 to x6; error is measured against the input]
The compressed representation is sufficient to reproduce input.
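A minimal sketch of this idea, assuming a 6-3-6 network with sigmoid units trained by plain gradient descent on squared error. The function name `train_autoencoder`, the hyperparameters, and the one-hot toy inputs are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden=3, lr=1.0, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder; returns weights and loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)       # compressed representation
        Y = sigmoid(H @ W2 + b2)       # reconstruction of the input
        err = Y - X                    # error measured against the input
        losses.append(np.mean(err ** 2))
        # Backpropagate the squared error through both sigmoid layers
        # (constant factors are absorbed into the learning rate).
        dY = err * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY / n
        b2 -= lr * dY.mean(axis=0)
        W1 -= lr * X.T @ dH / n
        b1 -= lr * dH.mean(axis=0)
    return (W1, b1, W2, b2), losses

# Toy inputs (an assumption): six one-hot vectors squeezed through 3 hidden units.
X = np.eye(6)
(W1, b1, W2, b2), losses = train_autoencoder(X)
codes = sigmoid(X @ W1 + b1)           # one 3-number code per input
```

Note that the target in the loss is `X` itself, so no labels are needed; the hidden activations `codes` are the learned representation.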
(wiki)
[Figure: the same network (inputs x1 to x6, hidden units h1, h2, h3, outputs x1 to x6) used for pretraining, followed by training]
How helpful is this for classification?
[Erhan et al., 2010]
Denoising Autoencoders
via OpenDeep.org
Image completion
via Yijun Li
Yet another type!
Latent Structure Learning
What hidden structure explains the data?
Given:
Find:
Common problem in Natural Language Processing. Collection of documents
Assume that they are about something. Specifically:
Bayes Net for describing topic models. There is a set of hidden topics, Z, and a set of words, W. Each topic zi has a conditional probability of each word wj appearing in a document: P(wj | zi)
[Figure: Bayes net with hidden topic nodes z1, z2, …, zn and word nodes w1, w2, w3, …, wm-1, wm]
(wiki)
Each document is modeled as …
A combination of topics
A collection of words
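One way to make the "combination of topics" concrete is to treat a document as a mixture, so that P(w | d) = Σ_i P(w | z_i) P(z_i | d). The sketch below assumes that mixture form; the vocabulary, the two topics, and all probability values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary and two topics (illustrative values only).
vocab = ["ball", "team", "vote", "law"]

# P(w_j | z_i): one row per topic, each row sums to 1.
p_word_given_topic = np.array([
    [0.50, 0.40, 0.05, 0.05],   # z1: sports-like topic
    [0.05, 0.05, 0.50, 0.40],   # z2: politics-like topic
])

# P(z_i | d): this document's combination of topics (assumed mixture weights).
p_topic_given_doc = np.array([0.7, 0.3])

# P(w_j | d) = sum_i P(w_j | z_i) P(z_i | d)
p_word_given_doc = p_topic_given_doc @ p_word_given_topic

def sample_document(n_words, rng):
    """Generate a document as a collection of words: pick a topic, then a word."""
    topics = rng.choice(len(p_topic_given_doc), size=n_words, p=p_topic_given_doc)
    return [vocab[rng.choice(len(vocab), p=p_word_given_topic[z])] for z in topics]

doc = sample_document(10, np.random.default_rng(0))
```

Learning a topic model runs this in reverse: the words are observed, and the topic-word and document-topic probabilities are the hidden quantities to be estimated.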
Goal:
AP corpus: 16k articles
Most common application of unsupervised learning.
Given a large corpus of data, what can be learned?
Lots of subproblems:
Any unsupervised method can be applied in principle. Most common in industry:
“As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages
One Target employee I spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August.”
https://www.technologyreview.com/s/412529/mapping-a-citys-rhythm/
So far, Jebara says, Sense Networks has categorized 20 types, or “tribes,” of people in cities, including “young and edgy,” “business traveler,” “weekend mole,” and “homebody.” These tribes are determined using three types of data: a person’s “flow,” or movements around a city; publicly available data concerning the company addresses in a city; and demographic data collected by the U.S. Census Bureau. If a person spends the evening in a certain neighborhood, it’s more likely that she lives in that neighborhood and shares some of its demographic traits.
http://www.tylervigen.com/spurious-correlations