R, SciDB, Julia
Mert Terzihan, Zhixiong Chen

1. What is R
● In the 1970s, at Bell Labs, John Chambers developed a statistical programming language, S
  ○ The aim was to turn ideas into software, quickly and faithfully
  ○ R is an implementation of S, initially written by Robert Gentleman and Ross Ihaka in 1993
● R is a language and environment for statistical computing and graphics

2. Features
● Object-oriented
  ○ similar to Python
● Optimized for vector/matrix operations
  ○ similar to Matlab
● Full support for statistical analysis
● Part of the GNU free software project
● Over 4300 user-contributed packages

3. Study Plan
● Scalar
● Vector
● Matrix
● Data Frame
● The apply Function
● Statistics
● Plot

Scalar
● Use R as a calculator
    > 4+6
    [1] 10
    > x<-6                 # '<-' assigns the value 6 to the object x
    > y<-4
    > x+y
    [1] 10
    > x<-"Hello world"     # strings are supported
    > x
    [1] "Hello world"

Vector
● Create a vector
    > x<-c(5,9,1,0)          # c() concatenates individual elements
    > x
    [1] 5 9 1 0
    > x<-1:10                # generate the integers from 1 to 10
    > x
     [1]  1  2  3  4  5  6  7  8  9 10
    > seq(1,9,by=2)          # from 1 to 9 in steps of 2
    [1] 1 3 5 7 9
    > seq(8,20,length=6)     # 6 evenly spaced numbers from 8 to 20, inclusive
    [1]  8.0 10.4 12.8 15.2 17.6 20.0
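The `length=6` argument above works because R matches argument names partially; the full name of that argument to `seq` is `length.out`. A small sketch (the variable names `a`, `b`, and `step` are only for illustration) showing that both spellings give the same vector, and where the 2.4 spacing comes from:

```r
a <- seq(8, 20, length.out = 6)   # full argument name
b <- seq(8, 20, length = 6)       # partial match, as written on the slide
same <- identical(a, b)           # both calls build the same vector

# 6 points span 5 equal gaps between 8 and 20
step <- (20 - 8) / (6 - 1)
print(same)
print(step)
```

Spelling out `length.out` is usually the safer habit, since partial matching can become ambiguous if a function later gains another argument starting with the same letters.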

Vector
● Access a vector: indexing starts at 1 and uses []
    > x<-rep(1:3,6)     # repeat the sequence 1 to 3 six times
    > x
     [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
    > x[1:9]            # elements at positions 1 to 9
    [1] 1 2 3 1 2 3 1 2 3
    > x[c(3,6,9)]       # elements at positions 3, 6, and 9
    [1] 3 3 3
    > x[-c(3,6,9)]      # '-' excludes the given positions
     [1] 1 2 1 2 1 2 1 2 3 1 2 3 1 2 3

Vector
● Access a vector, masking
    > x
     [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
    > mask = x == 3     # create a mask
    > mask              # the mask is stored as a vector of logical (boolean) values
     [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE
    [16] FALSE FALSE  TRUE
    > x[mask]
    [1] 3 3 3 3 3 3
    > x[!mask]          # '!' negates each logical value in the mask
     [1] 1 2 1 2 1 2 1 2 1 2 1 2
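A few related idioms build on the same mask, sketched here with illustrative variable names (`idx`, `n3`, `twos` do not appear in the slides): `which()` turns a mask into positions, `sum()` counts the TRUE values, and `&` combines conditions element by element.

```r
x <- rep(1:3, 6)
mask <- x == 3

idx <- which(mask)         # positions where the mask is TRUE
n3 <- sum(mask)            # TRUE counts as 1, so this counts the matches
twos <- x[x > 1 & x < 3]   # '&' is the element-wise AND of two masks

print(idx)
print(n3)
print(twos)
```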

Matrix
● Create a matrix
    > x<-c(5,7,9)
    > y<-c(6,3,4)
    > z<-cbind(x,y)                     # bind two vectors as the columns of a matrix
    > z
         x y
    [1,] 5 6
    [2,] 7 3
    [3,] 9 4
    > matrix(c(5,7,9,6,3,4),nrow=3)     # create a 3-row matrix from the vector, filled column by column
         [,1] [,2]
    [1,]    5    6
    [2,]    7    3
    [3,]    9    4
    > diag(3)                           # identity matrix
         [,1] [,2] [,3]
    [1,]    1    0    0
    [2,]    0    1    0
    [3,]    0    0    1

Matrix
● Matrix operations, component-wise
    > z<-matrix(c(5,7,9,6,3,4),nrow=3,byrow=T)
    > z
         [,1] [,2]
    [1,]    5    7
    [2,]    9    6
    [3,]    3    4
    > y<-matrix(c(1,3,0,9,5,-1),nrow=3,byrow=T)
    > y
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    [3,]    5   -1
    > y+z
         [,1] [,2]
    [1,]    6   10
    [2,]    9   15
    [3,]    8    3
    > y*z
         [,1] [,2]
    [1,]    5   21
    [2,]    0   54
    [3,]   15   -4

Matrix
● Matrix operations, based on the mathematical definition
    > y
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    [3,]    5   -1
    > z<-matrix(c(3,4,-2,6),nrow=2,byrow=T)
    > z
         [,1] [,2]
    [1,]    3    4
    [2,]   -2    6
    > t(z)          # transpose
         [,1] [,2]
    [1,]    3   -2
    [2,]    4    6
    > solve(z)      # inverse
               [,1]       [,2]
    [1,] 0.23076923 -0.1538462
    [2,] 0.07692308  0.1153846
    > y%*%x         # matrix multiplication; the output shown corresponds to x = c(5,7)
         [,1]
    [1,]   26
    [2,]   63
    [3,]   18
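One quick way to check the inverse is to multiply it back and compare with the identity. The sketch below also uses the two-argument form `solve(a, b)`, which solves a linear system directly; the right-hand side `b` is an arbitrary value chosen for illustration.

```r
z <- matrix(c(3, 4, -2, 6), nrow = 2, byrow = TRUE)

z_inv <- solve(z)
check <- z %*% z_inv                     # should be the 2x2 identity
ok <- isTRUE(all.equal(check, diag(2)))  # compare up to floating-point tolerance

# Solve z %*% v = b without forming the inverse explicitly
b <- c(10, 2)                            # illustrative right-hand side
v <- solve(z, b)

print(ok)
print(v)
```

Comparing with `all.equal` rather than `==` matters here, because the product is only equal to the identity up to rounding error.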

Matrix
● Access a matrix, indexing
    > y
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    [3,]    5   -1
    > y[1,2]        # fetch a specific value
    [1] 3
    > y[c(1,2),]    # fetch rows using a vector of indices
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    > y[1:2,]       # fetch rows using a range
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    > y[,2]         # fetch a column
    [1]  3  9 -1
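Note that `y[,2]` above comes back as a plain vector rather than a one-column matrix. A short sketch of how to keep the matrix shape with `drop = FALSE` (the names `col_vec` and `col_mat` are illustrative):

```r
y <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, byrow = TRUE)

col_vec <- y[, 2]                 # dimensions are dropped: a plain vector
col_mat <- y[, 2, drop = FALSE]   # dimensions are kept: a 3x1 matrix

print(dim(col_vec))
print(dim(col_mat))
```

Keeping the dimensions tends to matter when the result is passed on to code that expects a matrix, such as `%*%` with a specific orientation.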

Matrix
● Access a matrix, masking
    > y
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    [3,]    5   -1
    > mask<-y>0
    > mask
          [,1]  [,2]
    [1,]  TRUE  TRUE
    [2,] FALSE  TRUE
    [3,]  TRUE FALSE
    > y[mask]
    [1] 1 5 3 9
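The result `1 5 3 9` is in column-major order: R walks down the first column, then the second. The sketch below recovers the row and column of each selected element with `which(..., arr.ind = TRUE)` and shows that a mask can also be used on the left-hand side of an assignment; `pos` and `y2` are illustrative names.

```r
y <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, byrow = TRUE)
mask <- y > 0

pos <- which(mask, arr.ind = TRUE)   # one row per TRUE entry: (row, col)
y2 <- y
y2[!mask] <- NA                      # overwrite the non-positive entries

print(pos)
print(y2)
```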

Data Frame
● Create, like a table in a database
    mydata <- data.frame(col1, col2, col3,...)
    > patientID <- c(1, 2, 3, 4)
    > age <- c(25, 34, 28, 52)
    > diabetes <- c("Type1", "Type2", "Type1", "Type1")
    > status <- c("Poor", "Improved", "Excellent", "Poor")
    > patientdata <- data.frame(patientID, age, diabetes, status)
    > patientdata
      patientID age diabetes    status
    1         1  25    Type1      Poor
    2         2  34    Type2  Improved
    3         3  28    Type1 Excellent
    4         4  52    Type1      Poor

Data Frame
● Access a data frame
    > patientdata
      patientID age diabetes    status
    1         1  25    Type1      Poor
    2         2  34    Type2  Improved
    3         3  28    Type1 Excellent
    4         4  52    Type1      Poor
    > patientdata[1:3,]        # treat it as a special matrix
      patientID age diabetes    status
    1         1  25    Type1      Poor
    2         2  34    Type2  Improved
    3         3  28    Type1 Excellent
    > patientdata$patientID    # access a column by name
    [1] 1 2 3 4
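The masking idea from vectors and matrices carries over to data frames, which gives something close to a SQL `WHERE` clause. A minimal sketch that rebuilds the same data and filters it (the names `type1`, `older`, and `mean_age` are illustrative):

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

# Rows where a condition holds (note the trailing comma: all columns)
type1 <- patientdata[patientdata$diabetes == "Type1", ]

# Rows by condition and columns by name at the same time
older <- patientdata[patientdata$age > 30, c("patientID", "age")]

mean_age <- mean(patientdata$age)

print(type1)
print(older)
print(mean_age)
```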

The apply Function
● Apply a function to the elements of a data structure
    > y
         [,1] [,2]
    [1,]    1    3
    [2,]    0    9
    [3,]    5   -1
    > func <- function(x){      # define func(x) = 1 + 0.1*x
    +   x = x+10
    +   return (x/10)
    + }
    > apply(y,c(1,2),func)      # apply func to every element of the matrix y
         [,1] [,2]
    [1,]  1.1  1.3
    [2,]  1.0  1.9
    [3,]  1.5  0.9
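The second argument of `apply` is the margin: `c(1,2)` visits every element, while `1` and `2` pass whole rows or whole columns to the function. A sketch with illustrative names (`row_sums`, `col_max`, `scaled`), which also shows that the element-wise case can be written as plain vectorized arithmetic:

```r
y <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, byrow = TRUE)

row_sums <- apply(y, 1, sum)   # margin 1: one result per row
col_max  <- apply(y, 2, max)   # margin 2: one result per column

# For simple arithmetic, the vectorized form gives the same matrix as
# apply(y, c(1, 2), func) and avoids the per-element function calls
scaled <- (y + 10) / 10

print(row_sums)
print(col_max)
print(scaled)
```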

Statistics
● Some handy distributions
    > dnorm(c(3,2),0,1)       # density of the normal distribution with mean 0, sd 1
    [1] 0.004431848 0.053990967
    > x<-seq(-5,10,by=.1)
    > dnorm(x,3,2)
    [1] 6.691511e-05 8.162820e-05 9.932774e-05 1.205633e-04 1.459735e-04 1.762978e-04
    [7] 2.123901e-04 2.552325e-04 3.059510e-04 3.658322e-04 … ...
  ○ d*: density function
  ○ p*: distribution function
  ○ q*: quantile function (the inverse distribution function)
  ○ Normal: dnorm, pnorm, qnorm; Student's t: dt, pt, qt
  ○ Also available: binomial, exponential, Poisson, gamma
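The d/p/q prefixes are related to one another: p* is the integral of d*, and q* inverts p*. A small sketch for the standard normal, plus one discrete example with the binomial; the cutoff 1.96 and the variable names are chosen only for illustration.

```r
p <- pnorm(1.96)                               # P(Z <= 1.96), roughly 0.975
q <- qnorm(p)                                  # the quantile function inverts pnorm
area <- integrate(dnorm, -Inf, 1.96)$value     # integrating the density gives the same probability

pb <- dbinom(2, size = 5, prob = 0.5)          # P(exactly 2 heads in 5 fair coin flips) = 10/32

print(c(p = p, q = q, area = area, pb = pb))
```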

Statistics
● Simulations: randomly draw 100 observations from N(3,4), i.e. mean 3 and variance 4 (rnorm takes the standard deviation, 2)
    > rnorm(100,3,2)
    [1] 2.75259237 0.99932968 0.63348792 3.48292324 2.60880274 3.78258364 5.68923819
    [8] 0.08003764 1.93627124 2.53843236 3.52610754 5.31448617 2.73017110 3.35264165
    ……
  ○ rnorm, rt, rpois
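Because the draws are random, the numbers above will differ on every run. Fixing the seed makes a simulation repeatable; the sketch below uses an arbitrary seed value (42 is an assumption, any integer works) and checks the sample against the parameters it was drawn from.

```r
set.seed(42)                          # arbitrary seed, chosen for illustration
obs <- rnorm(100, mean = 3, sd = 2)   # third argument is the sd, not the variance

set.seed(42)                          # same seed again ...
obs2 <- rnorm(100, mean = 3, sd = 2)  # ... reproduces the same sample

repeatable <- identical(obs, obs2)
print(repeatable)
print(mean(obs))                      # should land near 3
print(var(obs))                       # should land near 4
```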

Plot
● Plotting x*sin(x)
    > f <- function(x) {        # define the function f(x) = x*sin(x)
    +   return (x*sin(x))
    + }
    > plot(f,-20*pi,20*pi)      # plot f between -20*pi and 20*pi
    > abline(0,1,lty=2)         # add a dashed line (lty = 2) with intercept 0 and slope 1
    > abline(0,-1,lty=2)        # add a dashed line with intercept 0 and slope -1
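The two dashed lines, y = x and y = -x, form an envelope for the curve, since |x*sin(x)| can never exceed |x|. A numeric check of that claim on a grid over the plotted range (the grid size 2001 is an arbitrary choice):

```r
f <- function(x) x * sin(x)

xs <- seq(-20 * pi, 20 * pi, length.out = 2001)   # sample points across the plotted range
inside <- all(abs(f(xs)) <= abs(xs))              # the curve never leaves the envelope

print(inside)
```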

More?
● The help() function
● Refer to the official manual
  ○ http://cran.r-project.org/manuals.html
● A wonderful 4-week online course
  ○ http://blog.revolutionanalytics.com/2012/12/coursera-videos.html
● A good book
  ○ ‘R in Action’ by Robert Kabacoff
● Google

4. Bonus
● Installation
  ○ Tested on Ubuntu 12.04: http://livesoncoffee.wordpress.com/2012/12/09/installing-r-on-ubuntu-12-04/
  ○ Errors such as “Unknown media type in type 'all/all'” can be ignored
● RStudio
  ○ a wonderful IDE for R programmers
  ○ http://www.rstudio.com/

Ricardo Integrating R and Hadoop

Motivation
● Statistical software, such as R, provides rich functionality for data analysis and modeling, but can handle only limited amounts of data
● Data management systems, such as Hadoop, can handle large data, but provide insufficient analytical functionality
Union is strength!

Solution
● Ricardo decomposes data-analysis algorithms into
  ○ parts executed by the R statistical analysis system
  ○ parts handled by the Hadoop data management system
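To make the idea of such a decomposition concrete, here is a rough sketch in plain R; it is not Ricardo's or Jaql's actual interface, and the data, the four partitions, and all variable names are assumptions for illustration. A least-squares fit is split into a "large data" part that only computes small per-partition aggregates (the role a system like Hadoop would play) and a "small data" part that solves a 2x2 system (the role of R).

```r
set.seed(1)
n <- 1000
x <- runif(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.1)     # synthetic data, roughly y = 2 + 3x

# Pretend the rows live in 4 separate partitions (hypothetical)
chunks <- split(seq_len(n), rep(1:4, length.out = n))

# "Data management" part: each partition emits only small aggregates
partial <- lapply(chunks, function(i) {
  X <- cbind(1, x[i])                    # design matrix with an intercept column
  list(xtx = crossprod(X),               # t(X) %*% X, a 2x2 matrix
       xty = crossprod(X, y[i]))         # t(X) %*% y, a 2x1 matrix
})

# Combine the aggregates by summation
xtx <- Reduce(`+`, lapply(partial, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(partial, `[[`, "xty"))

# "R" part: solve the small normal-equations system
beta <- solve(xtx, xty)
print(beta)
print(coef(lm(y ~ x)))                   # reference fit on the full data
```

The point of the split is that only the 2x2 and 2x1 aggregates ever need to reach the statistical side, regardless of how many rows each partition holds.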

Components
● R
  ○ The core of statistical analysis
● Large-Scale Data Management Systems
  ○ HDFS
  ○ Work with dirty, semi-structured or unstructured data
  ○ Massive data storage, manipulation and parallel processing
● Jaql
  ○ A JSON query language
  ○ The declarative interface to Hadoop for Ricardo
  ○ Like Pig, Hive

Architecture

Conclusion
● The current version has poor performance

Overview of SciDB Large Scale Array Storage, Processing and Analysis

Context
1. Background and Motivation
2. Features and Functionality
3. Data Definition
4. Data Manipulation
5. Architecture

What is SciDB?
● Massively parallel storage manager
● Able to parallelize large-scale array processing algorithms

1. Background and Motivation
● Modern scientific data differs from business data in three important respects:
  ○ Sensor arrays consist of rectangular ‘arrays’ of individual sensors
  ○ Scientific analysis requires sophisticated data processing methods
    ■ Ex: Noisy data needs to be ‘cleaned’
  ○ Data generated by modern scientific instruments is extremely large
● Array Data Model is more desirable in scientific domains
  ○ With notions of adjacency or neighborhood
  ○ Ordering is fundamental
● Complexity of data processing needs a much more flexible data management platform
  ○ A different kind of DBMS
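As an illustration of why adjacency matters for this kind of data, the sketch below "cleans" a noisy sensor grid by replacing each interior cell with the mean of its 3x3 neighborhood. It is plain R rather than SciDB syntax, and the 5x5 grid, the noise level, and the seed are all made-up values for the example.

```r
set.seed(7)                                   # arbitrary seed
truth <- matrix(10, nrow = 5, ncol = 5)       # hypothetical 5x5 sensor array, true value 10
noisy <- truth + rnorm(25, sd = 1)            # add measurement noise

smooth <- noisy
for (i in 2:4) {
  for (j in 2:4) {
    # Each interior cell becomes the mean of itself and its 8 neighbors
    smooth[i, j] <- mean(noisy[(i - 1):(i + 1), (j - 1):(j + 1)])
  }
}

print(round(noisy, 2))
print(round(smooth, 2))
```

The operation only makes sense because cell (i, j) has well-defined neighbors, which is exactly the notion of ordering and adjacency that a relational table of unordered rows does not provide directly.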
