Final projects
21 May: Class split into 4 sub-classes (one for each TA). Each group gives a ~8 min presentation (each person ~2 min):
- Motivation & background; which data?
- Small example
- Final outcome (focused on method and data)
- Difficulties
- 1: Source Localization in an Ocean Waveguide Using Unsupervised ML
- 3: ML Methods for Ship Detection in Satellite Images
- 4: Transparent Conductor Prediction
- 4: Classify ships in San Francisco Bay using Planet satellite imagery
- 2: Fruit Recognition
- 3: Bone Age Prediction
- 1: Facial Expression Classification into Emotions
- 2: Urban Scene Segmentation for Autonomous Vehicles
- 1: Face Detection Using Deep Learning
- 2: Understanding the Amazon Rainforest from Space using NN
- 4: Mercedes-Benz Bench Test Time Estimation
- 3: Vegetation Classification in Hyperspectral Images
- 4: Threat Detection with CNN
- 2: Plankton Classification Using ResNet and Inception V3
- 3: U-net on Biomedical Images
- 4: Image to Image Transformation using GAN
- 1: Dog Breed Classification Using CNN
- 1: Dog Breed Identification
- 2: Plankton Image Classification
- 3: Sunspot Detection
[Figure: undirected tree, directed tree, polytree]
[Slide: spectral coherence between microphones i and j at f = 750 Hz, computed on a 30-microphone array (normalization: |X(f,t)|^2 = 1). Audio examples at Location 1: Prince, "Sign o' the Times"; Otis Redding, "Hard to Handle".]
Example decision tree:
- Homework deadline tonight? Yes → do homework.
- No → Party invitation? Yes → go to the party.
- No → Do I have friends? Yes → hang out with friends; No → read a book.
[Figure: recursive binary partition of (X1, X2) by the splits X1 ≤ t1, X2 ≤ t2, X1 ≤ t3, X2 ≤ t4, yielding regions R1, ..., R5]
We divide the predictor space into J distinct and non-overlapping regions, R1, R2, . . . , RJ. The prediction in region Rj is ŷ_Rj, which is simply the mean of the response values for the training observations in Rj. The goal is to find boxes R1, ..., RJ that minimize the RSS (residual sum of squares), given by

RSS = Σ_{j=1}^{J} Σ_{i ∈ Rj} (y_i − ŷ_Rj)^2,

where ŷ_Rj is the mean response for the training observations within the jth box.
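As a small sketch (function and variable names are hypothetical), the RSS of a given partition can be computed by summing squared deviations from each region's mean response:

```python
import numpy as np

def partition_rss(y, region_ids):
    """RSS of a partition: for each region, sum the squared deviations
    of the responses from that region's mean."""
    y = np.asarray(y, dtype=float)
    region_ids = np.asarray(region_ids)
    rss = 0.0
    for r in np.unique(region_ids):
        y_r = y[region_ids == r]
        rss += np.sum((y_r - y_r.mean()) ** 2)
    return rss
```

For example, a partition whose regions each contain identical responses has RSS 0.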
We can write the model in the following form:

f(x) = E[y|x] = Σ_{m=1}^{M} w_m I(x ∈ R_m) = Σ_{m=1}^{M} w_m φ(x; v_m)   (16.4)
The idea is to learn useful features φ(x) directly from the input data. That is, we will create what we call an adaptive basis-function model (ABM), which is a model of the form

f(x) = w_0 + Σ_{m=1}^{M} w_m φ_m(x)   (16.3)
Classification and regression trees
f(x) = Σ_{m=1}^{M} c_m I(x ∈ R_m).   (9.10)
We use a sum of squares criterion, Σ_i (y_i − f(x_i))^2.
R1(j, s) = {X | Xj ≤ s} and R2(j, s) = {X | Xj > s}.   (9.12)

Then we seek the splitting variable j and split point s that solve

min_{j,s} [ min_{c1} Σ_{x_i ∈ R1(j,s)} (y_i − c1)^2 + min_{c2} Σ_{x_i ∈ R2(j,s)} (y_i − c2)^2 ].   (9.13)
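This search can be done exhaustively, since for each candidate (j, s) the inner minimizers c1, c2 are just the region means. A minimal sketch (names hypothetical):

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the split (j, s) minimizing
    sum_{R1}(y - mean(R1))^2 + sum_{R2}(y - mean(R2))^2."""
    best_j, best_s, best_loss = None, None, np.inf
    for j in range(X.shape[1]):                 # each splitting variable
        for s in np.unique(X[:, j]):            # each candidate split point
            left = y[X[:, j] <= s]
            right = y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue                        # skip degenerate splits
            loss = (np.sum((left - left.mean()) ** 2)
                    + np.sum((right - right.mean()) ** 2))
            if loss < best_loss:
                best_j, best_s, best_loss = j, s, loss
    return best_j, best_s, best_loss
```

For example, on responses that jump from 0 to 10 at X1 = 1, the search recovers that split with zero loss.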
Define a split: s
In a region R_m, the proportion of points in class k is

p̂_k(R_m) = (1/n_m) Σ_{x_i ∈ R_m} 1{y_i = k}.
We then greedily choose j, s by minimizing the misclassification error:

argmin_{j,s} [ (1 − p̂_{c1}(R1)) + (1 − p̂_{c2}(R2)) ],

where c1 is the most common class in R1 (and similarly for c2).
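The classification split can be sketched the same way as the regression one (a hypothetical sketch; names are illustrative):

```python
import numpy as np

def class_proportions(y):
    """p_hat_k(R): fraction of region points in each class k."""
    vals, counts = np.unique(y, return_counts=True)
    return counts / len(y)

def misclass_split(X, y):
    """Greedy split minimizing (1 - p_hat_c1(R1)) + (1 - p_hat_c2(R2)),
    where c1, c2 are the majority classes of the two regions."""
    best_j, best_s, best_err = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            # majority-class proportion = max over k of p_hat_k
            err = ((1 - class_proportions(left).max())
                   + (1 - class_proportions(right).max()))
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s, best_err
```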
[Figure: example splits of the (x1, x2) feature space]
Finding the best split on a variable in region R_m requires checking at most N_m candidate split points, one per observation in the region.
The bootstrap is a fundamental resampling tool in statistics. The idea is that we can estimate the true distribution F by the so-called empirical distribution F̂. Given the training data (x_i, y_i), i = 1, ..., n, the empirical distribution function is a discrete probability distribution that puts equal weight 1/n on each observed training point:

P_F̂(x, y) = 1/n  if (x, y) = (x_i, y_i) for some i, and 0 otherwise.

A bootstrap sample of size m from the training data is (x*_i, y*_i), i = 1, ..., m, where each sample is drawn uniformly at random from the training data with replacement. This corresponds exactly to m independent draws from F̂, and approximates what we would see if we could sample more data from the true F. We often consider m = n, which is like sampling an entirely new training set.
From Ryan Tibshirani
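Drawing a bootstrap sample takes only a few lines; this sketch (names and seed arbitrary) draws m points uniformly at random with replacement, i.e. m i.i.d. draws from F̂:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

def bootstrap_sample(X, y, m=None):
    """Draw m points WITH replacement from the training data
    (m independent draws from the empirical distribution F_hat).
    Default m = n, i.e. a resampled training set of the same size."""
    n = len(y)
    m = n if m is None else m
    idx = rng.integers(0, n, size=m)   # indices may repeat
    return X[idx], y[idx]
```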
The bagged estimate averages over B bootstrap fits:

f̂_bag(x) = (1/B) Σ_{b=1}^{B} f̂*b(x).

Bagging helps decrease the misclassification rate of the classifier (evaluated on a large independent test set). Look at the voting proportions

p̂_k(x) = (1/B) Σ_{b=1}^{B} 1{f̂*b(x) = k},

and take the majority vote:

f̂_bag(x) = argmax_{k=1,...,K} p̂_k(x).
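The majority-vote step can be sketched directly (hypothetical names; `classifiers` stands for B models, each already fit on its own bootstrap sample):

```python
from collections import Counter

def bag_predict(classifiers, x):
    """Bagged classification: each bootstrap-fit classifier votes,
    and we return argmax_k of the voting proportions p_hat_k(x)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

For instance, with three classifiers predicting classes 1, 1, 0 for some input, the bagged prediction is class 1.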
10.1.4 Graph terminology
Before we continue, we must define a few basic terms, most of which are very intuitive. A graph G = (V, E) consists of a set of nodes or vertices, V = {1, . . . , V }, and a set of edges, E = {(s, t) : s, t ∈ V},
which we write G(s, t) = 1 to denote (s, t) ∈ E, that is, s → t is an edge in the graph. If G(s, t) = 1 iff G(t, s) = 1, we say the graph is undirected, otherwise it is directed. We usually assume G(s, s) = 0, which means there are no self loops. Here are some other terms we will commonly use:
- Parent: for a directed graph, the parents of a node are all the nodes that feed into it: pa(s) = {t : G(t, s) = 1}.
- Child: the children of a node are all the nodes that feed out of it: ch(s) = {t : G(s, t) = 1}.
- Family: the family of a node is the node and its parents, fam(s) = {s} ∪ pa(s).
- Ancestors: the ancestors of t is the set of nodes that connect to t via a trail: anc(t) = {s : s ❀ t}.
- Descendants: the descendants of s is the set of nodes that can be reached via trails from s: desc(s) = {t : s ❀ t}.
- Neighbors: the neighbors of a node are all immediately connected nodes, nbr(s) = {t : G(s, t) = 1 ∨ G(t, s) = 1}. For an undirected graph, we write s ∼ t to indicate that s and t are neighbors (so (s, t) ∈ E is an edge in the graph).
- Degree: the degree of a node is the number of its neighbors. For directed graphs, we speak of the in-degree and out-degree, which count the number of parents and children.
- Cycle or loop: a cycle exists if we can get back to where we started by following edges, s1 − s2 · · · − sn − s1, n ≥ 2. If the graph is directed, we may speak of a directed cycle. For example, in Figure 10.1(a), there are no directed cycles, but 1 → 2 → 4 → 3 → 1 is an undirected cycle.
- DAG: a directed acyclic graph is a directed graph with no directed cycles. See Figure 10.1(a) for an example.
- Topological ordering: for a DAG, a numbering of the nodes such that parents have lower numbers than their children. For Figure 10.1(a), we can use (1, 2, 3, 4, 5), or (1, 3, 2, 5, 4), etc.
- Tree: an undirected tree is an undirected graph with no cycles; a directed tree is a DAG in which there are no directed cycles. If we allow a node to have multiple parents, we call it a polytree, otherwise we call it a moral directed tree.
- Subgraph: the subgraph G_A induced by a set of nodes A consists of those nodes and their corresponding edges, G_A = (V_A, E_A).
- Clique: for an undirected graph, a clique is a set of nodes that are all neighbors of each other. A maximal clique is one that cannot be made any larger while retaining the clique property. For example, in Figure 10.1(b), {1, 2} is a clique but it is not maximal, since we can add 3 and still maintain the clique property. In fact, the maximal cliques are as follows: {1, 2, 3}, {2, 3, 4}, {3, 5}.
[Slide: plate notation for a polynomial regression model with a shared prior]
If the M variables are discrete with K states each, the general joint distribution has O(K^M) parameters.
For a conditional over binary variables, a full table requires 2^M parameters, whereas the parameterized (e.g. logistic) form requires only M + 1 parameters.
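As a quick check of these counts in the binary case (helper names hypothetical):

```python
# Parameter counting for p(y | x_1, ..., x_M) with binary variables:
# a full conditional table needs one probability per parent
# configuration (2^M), while a logistic model needs M weights + 1 bias.
def table_params(M):
    return 2 ** M      # exponential in the number of parents

def logistic_params(M):
    return M + 1       # linear in the number of parents

for M in (3, 10):
    print(M, table_params(M), logistic_params(M))
```

Already at M = 10 the table needs 1024 parameters versus 11 for the logistic form.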
Note: this is the opposite of Example 1, with c unobserved.
A path is said to be blocked if it includes a node such that either a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C.
If all paths are blocked, then A is d-separated from B by C, and the joint distribution over all variables satisfies A ⊥⊥ B | C.