The package {bigstatsr}: memory- and computation-efficient tools for big matrices stored on disk

Mar 13, 2024

The R package {bigstatsr}: memory- and computation-efficient tools for big matrices stored on disk
Florian Privé (@privefl) eRum 2018 1 / 15
About I'm a PhD Student (2016-2019) in Predictive Human Genetics in Grenoble. Disease ∼ DNA mutations + ⋯ 2 / 15
Very large genotype matrices
previously: 15K x 280K, celiac disease (~30GB)
currently: 500K x 500K, UK Biobank (~2TB)
But I still want to use R.. 3 / 15
The solution I found
FBM is very similar to filebacked.big.matrix from package {bigmemory}. 4 / 15
Similar accessor as R matrices
    X <- FBM(2, 5, init = 1:10, backingfile = "test")
    X$backingfile
    ## [1] "/home/privef/Bureau/eRum-2018/test.bk"
    X[, 1]  ## ok
    ## [1] 1 2
    X[1, ]  ## bad
    ## [1] 1 3 5 7 9
    X[]     ## super bad
    ##      [,1] [,2] [,3] [,4] [,5]
    ## [1,]    1    3    5    7    9
    ## [2,]    2    4    6    8   10
5 / 15
Similar accessor as R matrices
    colSums(X[])  ## super bad
    ## [1]  3  7 11 15 19
6 / 15
Split-(par)Apply-Combine Strategy
Apply standard R functions to big matrices (in parallel).
Implemented in big_apply(). 7 / 15
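As an illustration of the split-apply-combine idea, here is a minimal sketch that computes column sums block by block instead of calling colSums(X[]), which loads the whole matrix into memory. It assumes {bigstatsr} is installed and reuses the 2 x 5 example matrix from the accessor slide. The name col_sums and the tiny block.size are choices made for this sketch only, not something taken from the talk.

```r
# Sketch only: assumes the {bigstatsr} package is installed.
library(bigstatsr)

# Same small example as on the accessor slide (backing file in a temp location).
X <- FBM(2, 5, init = 1:10)

# Split:   big_apply() cuts the column indices into blocks.
# Apply:   a.FUN receives the FBM and the indices `ind` of one block,
#          so only the columns of that block are read into memory.
# Combine: the per-block results are concatenated with c().
col_sums <- big_apply(
  X,
  a.FUN = function(X, ind) colSums(X[, ind, drop = FALSE]),
  a.combine = "c",
  block.size = 2  # deliberately tiny here, to force several blocks
)

col_sums
```

The result should match what colSums(X[]) returns on the slide above, while only touching a few columns at a time. For a real matrix the default block size is usually a reasonable starting point, and ncores can be raised to process blocks in parallel.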
Similar accessor as Rcpp matrices
    // [[Rcpp::depends(BH, bigstatsr)]]
    #include <bigstatsr/BMAcc.h>

    // [[Rcpp::export]]
    NumericVector big_colsums(Environment BM) {

      XPtr<FBM> xpBM = BM["address"];
      BMAcc<double> macc(xpBM);

      size_t n = macc.nrow();
      size_t m = macc.ncol();

      NumericVector res(m);

      for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
          res[j] += macc(i, j);

      return res;
    }
8 / 15
Partial Singular Value Decomposition
15K × 100K -- 10 first PCs -- 6 cores -- 1 min (vs 2h in base R)
Implemented in big_randomSVD(), powered by R packages {RSpectra} and {Rcpp}. 9 / 15
Sparse linear models
Predicting complex diseases with a penalized logistic regression
15K × 280K -- 6 cores -- 2 min 10 / 15
Other functions: matrix operations, association of each variable with an output, plotting functions, read from text files, many other functions..
Parallel: most of the functions are parallelized (memory-mapping makes it easy!), and you can parallelize your own functions with big_parallelize(). 11 / 15
I'm able to run algorithms on 100GB of data in R on my computer 12 / 15
R Packages
{bigstatsr}: to be used by any field of research
{bigsnpr}: algorithms specific to my field of research 13 / 15
Contributors are welcomed! 14 / 15
Thanks!
Presentation: https://privefl.github.io/eRum-2018/slides.html
Package's website: https://privefl.github.io/bigstatsr/
DOI: 10.1093/bioinformatics/bty185
privefl  privefl  F. Privé
Slides created via the R package xaringan. 15 / 15
