The Promise and Perils of Big Data. Some Slides from A. Efros and A. Torralba

Step 4: Score Word-String Candidates
• Scoring of candidates based on:
  – Proximity (minimize extraneous words in target n-gram ≈ precision)
  – Number of word matches (maximize coverage ≈ recall)
  – Regular words given more weight than function words
  – Combine results (e.g., optimize F1 or p-norm or …)
[Table: target word-string candidates ranked under the headings "Regular" Words, Word Matches, Proximity, Total Scoring; ranks listed in slide order]
  T3-b T(x) T2-d T(x) T(x) T6-c: 3rd, 3rd, ---, 3rd, 3rd
  T4-a T6-b T(x) T2-c T3-a: 1st, 2nd, ---, 2nd, 1st
  T3-c T2-b T4-e T5-a T6-a: 1st, 1st, ---, 1st, 1st
Slide by Jaime Carbonell
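The scoring criteria above can be illustrated with a small sketch. It is not the scorer from the slide: the function-word list, the 1.0/0.25 weights, and the choice of F1 as the combination rule are all assumptions made for illustration. Precision drops when the candidate carries extraneous words, and recall drops when expected words are missing.

```python
# Hypothetical toy list of function words (an assumption, not from the slides).
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "to", "in"}


def weight(word, regular=1.0, function=0.25):
    """Regular words count more than function words (weights are assumed)."""
    return function if word.lower() in FUNCTION_WORDS else regular


def score_candidate(candidate, expected):
    """Weighted F1 of a candidate word string against a set of expected words."""
    cand = candidate.split()
    cand_weight = sum(weight(w) for w in cand)
    exp_weight = sum(weight(w) for w in expected)
    # Precision: share of the candidate's weight made up of expected words.
    precision = (
        sum(weight(w) for w in cand if w in expected) / cand_weight
        if cand_weight else 0.0
    )
    # Recall: share of the expected words' weight covered by the candidate.
    recall = (
        sum(weight(w) for w in expected if w in cand) / exp_weight
        if exp_weight else 0.0
    )
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


expected = {"soldier", "died", "Monday"}
candidates = ["a soldier died", "soldier died Monday", "the president of the country"]
ranked = sorted(candidates, key=lambda c: score_candidate(c, expected), reverse=True)
```

Swapping the final F1 line for a p-norm of (precision, recall) would give the other combination rule the slide mentions.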

Step 5: Select Candidates Using Overlap (Propagate context over entire sentence)
[Figure: candidate lists for Word-String 1, Word-String 2, and Word-String 3, with overlapping target n-grams aligned across adjacent word strings]
Slide by Jaime Carbonell

Step 5: Select Candidates Using Overlap
Best translations selected via maximal overlap
Alternative 1:
  T(x2) T4-a T6-b T(x3) T2-c
  T4-a T6-b T(x3) T2-c T3-a
  T6-b T(x3) T2-c T3-a T(x8)
  Result: T(x2) T4-a T6-b T(x3) T2-c T3-a T(x8)
Alternative 2:
  T(x1) T3-c T2-b T4-e
  T3-c T2-b T4-e T5-a T6-a
  T2-b T4-e T5-a T6-a T(x8)
  Result: T(x1) T3-c T2-b T4-e T5-a T6-a T(x8)
Slide by Jaime Carbonell
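One way to read "maximal overlap" is as a search over one candidate per word string, keeping the combination whose neighbours share the most tokens. The sketch below takes that reading as an assumption: it scores a combination by the total suffix/prefix overlap between adjacent candidates and searches exhaustively, which is only practical for tiny inputs. The function names are hypothetical.

```python
from itertools import product


def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def best_chain(windows):
    """Pick one candidate per window so that adjacent picks overlap the most."""
    best, best_score = None, -1
    for combo in product(*windows):
        tokens = [c.split() for c in combo]
        score = sum(overlap(a, b) for a, b in zip(tokens, tokens[1:]))
        if score > best_score:
            best, best_score = combo, score
    return best, best_score


# Candidates taken from the two alternatives on the slide.
windows = [
    ["T(x2) T4-a T6-b T(x3) T2-c", "T(x1) T3-c T2-b T4-e"],
    ["T4-a T6-b T(x3) T2-c T3-a", "T3-c T2-b T4-e T5-a T6-a"],
    ["T6-b T(x3) T2-c T3-a T(x8)", "T2-b T4-e T5-a T6-a T(x8)"],
]
best, score = best_chain(windows)
```

Under this particular scoring, candidates from different alternatives barely overlap, so mixed combinations score well below either consistent chain.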

A (Simple) Real Example of Overlap
Flooding -> N-gram fidelity
Overlap -> Long range fidelity
N-grams generated from Flooding:
  a United States soldier
  United States soldier died
  soldier died and two others
  died and two others were injured
  two others were injured Monday
N-grams connected via Overlap:
  a United States soldier died and two others were injured Monday
Slide by Jaime Carbonell
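The example above can be reproduced with a short sketch that joins consecutive n-grams on their longest shared suffix/prefix. This is a minimal illustration under the assumption that the n-grams arrive already in order; the function names are hypothetical and it is not presented as the system's actual decoder.

```python
def merge_by_overlap(left, right):
    """Join two token lists on their longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None  # no shared words, so the two n-grams do not connect


def chain(ngrams):
    """Connect an ordered list of n-gram strings into one word string."""
    tokens = ngrams[0].split()
    for ngram in ngrams[1:]:
        merged = merge_by_overlap(tokens, ngram.split())
        if merged is None:
            raise ValueError(f"no overlap with {ngram!r}")
        tokens = merged
    return " ".join(tokens)


ngrams = [
    "a United States soldier",
    "United States soldier died",
    "soldier died and two others",
    "died and two others were injured",
    "two others were injured Monday",
]
sentence = chain(ngrams)
```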

Texture Synthesis

So, how do we use big data?

Two ways to use Lots of Data
• Brute Force Vision: Find that needle in the haystack (a.k.a. kNN)
• See what different subsets of data think of you and disregard the rest

kNN matching is great… • because we live in a (mostly) boring world!
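As a rough illustration of the "needle in the haystack" idea, the sketch below does brute-force k-nearest-neighbour lookup and lets the neighbours vote on a label. The two-dimensional feature vectors and scene labels are made-up toy values; a real system would use image descriptors and a far larger database.

```python
import math
from collections import Counter


def k_nearest(query, database, k=3):
    """Return the k (distance, label) pairs closest to `query` by Euclidean distance."""
    scored = [(math.dist(query, features), label) for features, label in database]
    return sorted(scored)[:k]


def majority_label(query, database, k=3):
    """Label the query with the most common label among its k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(query, database, k))
    return votes.most_common(1)[0][0]


# Toy database of (feature vector, label) pairs; all values are assumptions.
database = [
    ((0.10, 0.20), "street"),
    ((0.15, 0.25), "street"),
    ((0.90, 0.80), "beach"),
    ((0.12, 0.18), "street"),
    ((0.50, 0.50), "forest"),
]
prediction = majority_label((0.10, 0.20), database, k=3)
```

This works well exactly when the database contains close matches, which is the point of the "boring world" remark: common scenes have many near neighbours, rare ones do not.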

Lots Of Images. A. Torralba, R. Fergus, W.T. Freeman. PAMI 2008

Lots Of Images

Automatic Colorization Result. Grayscale input (high resolution); colorization of input using average. A. Torralba, R. Fergus, W.T. Freeman. 2008

im2gps: Instead of object labels, the web provides other kinds of metadata associated with large collections of images: 20 million geotagged and geographic text-labeled images. Hays & Efros. CVPR 2008

im2gps Hays & Efros. CVPR 2008

Image completion: Instead, generate proposals using millions of images. Input; 16 nearest neighbors (gist+color matching); output. Hays, Efros, 2007

With a good image similarity and a lot of data… Input image; nearest neighbors; 22,000 LabelMe scenes. Hays, Efros, Siggraph 2006. Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

With a good image similarity and a lot of data… Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

Outputs Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

While many scenes are boring… Slide by Antonio Torralba

Some scenes are unique Slide by Antonio Torralba

Dealing with sparse data (rare scenes) • better similarity

Medici Fountain, Paris

Medici Fountain, Paris (winter)

Our Goal

Input Query; Top Matches

Important Parts? Input Query; Important Parts

Input Query; Top Matches

“Data-driven Uniqueness”

Search using Images: Input Query; Top Matches

Some slides from A. Efros and A. Torralba. Why do we need data? Most problems in vision are ambiguous and hard: 2D -> 3D, segmentation/edges. So, how do we solve these problems? Magic of data!