big data sets seen as a big problem and how to deal with
play

Big Data Sets Seen as a Big Problem and How to Deal with Them - PowerPoint PPT Presentation

Big Data Sets Seen as a Big Problem and How to Deal with Them Daniil Shliakhov, Kharkiv, Ukraine Frankfurt 2018 B I G D ATA S E T S S E E N A S A B I G P R O B L E M INTRO Running time is an issue! Running time Normal Size Data Sets


  1. Big Data Sets Seen as a Big Problem and How to Deal with Them Daniil Shliakhov, Kharkiv, Ukraine Frankfurt 2018

  2. B I G D ATA S E T S S E E N A S A B I G P R O B L E M INTRO Running time is an issue! Running time Normal Size Data Sets Large Data Sets Page 2

  3. B I G D ATA S E T S S E E N A S A B I G P R O B L E M INTRO Parameter Treatment n Mean SD Median Min Max Alkaline Baseline Phosphatase (U/L) Pooled TRT1 xxx xxx.x xxx.xx xxx.xx xxx xxx Pooled TRT2 xxx xxx.x xxx.xx xxx.xx xxx xxx Cycle 1 Pooled TRT1 xxx xxx.x xxx.xx xxx.xx xxx xxx Pooled TRT2 xxx xxx.x xxx.xx xxx.xx xxx xxx Cycle 2 Pooled TRT1 xxx xxx.x xxx.xx xxx.xx xxx xxx Pooled TRT2 xxx xxx.x xxx.xx xxx.xx xxx xxx Page 3

  4. GENERAL TIPS RETRIEVING SAS DATA SETS VIEW option

  5. G E N E R A L T I P S VIEW OPTION A simple data step? How much time it may take to run this step if ADAM.ADLB is huge? data adlb; set adam.adlb; run; Page 5

  6. G E N E R A L T I P S VIEW OPTION 35 seconds? Too looooong L data adlb; set adam.adlb; run; Real time 34:23 CPU time 3:10 Page 6

  7. G E N E R A L T I P S VIEW OPTION A SAS VIEW is a type of SAS data set that retrieves data values from other files data adlb / VIEW=adlb; set adam.adlb; run; Page 7

  8. G E N E R A L T I P S VIEW OPTION Less than 1 second? MAGIC! data adlb / VIEW=adlb; set adam.adlb; run; Real time 0:03 CPU time 0:01 Page 8

  9. G E N E R A L T I P S VIEW OPTION Simple merge, common sort… How long? data analysis ; merge adam.adsl adam.adlb; by studyid usubjid; run ; proc sort data=analysis; by trt01an parcat paramcd avisitn; run; Page 9

  10. G E N E R A L T I P S VIEW OPTION Simple merge, common sort… Too looooong again data analysis ; merge adam.adsl adam.adlb; by studyid usubjid; run ; proc sort data=analysis; by trt01an parcat paramcd avisitn; run; Data Step PROC step Real time 36:04 1:12.40 CPU time 8:18 12.45 Page 10

  11. G E N E R A L T I P S VIEW OPTION Adding VIEW option to do magic data analysis / VIEW=analysis ; merge adam.adsl adam.adlb; by studyid usubjid; run ; proc sort data=analysis out=analysis_sorted; by trt01an parcat paramcd avisitn; run; Page 11

  12. G E N E R A L T I P S VIEW OPTION VIEW option gives extra time to drink coffee with colleagues data analysis / VIEW=analysis ; merge adam.adsl adam.adlb; by studyid usubjid; run ; proc sort data=analysis out=analysis_sorted; by trt01an parcat paramcd avisitn; run; Data Step PROC step Real time 0:04 1:17.65 CPU time 0:02 20.32 Page 12

  13. GENERAL TIPS RETRIEVING SAS DATA SETS IF or WHERE?

  14. G E N E R A L T I P S IF or WHERE? IF vs. WHERE. Who is the champion? data adlb ; set adam.adlb; if ANL01FL = 'Y'; run ; data adlb ; set adam.adlb; where ANL01FL = 'Y'; run ; Page 14

  15. G E N E R A L T I P S IF or WHERE? IF is champion! Woohoo! data adlb ; set adam.adlb; IF statement WHERE if ANL01FL = 'Y'; statement run ; Real time 31:64 33:31 data adlb ; CPU time 3:53 5:68 set adam.adlb; where ANL01FL = 'Y'; run ; Page 15

  16. G E N E R A L T I P S IF or WHERE? INDEX helps WHERE to win J data adlb ; set adam.adlb; if ANL01FL = 'Y'; run ; INDEX APPLIED data adlb ; set adam.adlb; where ANL01FL = 'Y'; run ; IF statement WHERE statement Real time 32:15 27:26 CPU time 4:28 2.98 Page 16

  17. SAS PROCEDURES DESCRIPTIVE STATS

  18. S A S P R O C E D U R E S DESCRIPTIVE STATS proc means data=adlb noprint; by trt01an parcat paramcd avisitn; var aval; output out = mnout n = n mean = mean median = median std = std min = min max = max; run; Page 18

  19. S A S P R O C E D U R E S DESCRIPTIVE STATS proc univariate data=adlb noprint; by trt01an parcat paramcd avisitn; var aval; output out = mnout n = n mean = mean median = median std = std min = min max = max; run; Page 19

  20. S A S P R O C E D U R E S DESCRIPTIVE STATS proc summary data=adlb noprint; by trt01an parcat paramcd avisitn; var aval; output out = mnout n = n mean = mean median = median std = std min = min max = max; run; Page 20

  21. S A S P R O C E D U R E S DESCRIPTIVE STATS proc sql noprint; create table mnout as select trt01an, parcat, paramcd, avisitn, COUNT(*) as n, MEAN(aval) as mean, MEDIAN(aval) as median, STD(aval) as std, MIN(aval) as min, MAX(aval) as max from adlb group by trt01an, parcat, paramcd, avisitn quit; Page 21

  22. S A S P R O C E D U R E S DESCRIPTIVE STATS DESCRIPTIVE STATS COMPARISON MEANS UNIVARIATE SUMMARY SQL Real time 15:14 24:78 13:24 13:45 CPU time 3:38 1:76 3:33 2:53 Page 22

  23. SAS PROCEDURES FREQUENCY

  24. S A S P R O C E D U R E S FREQUENCY proc freq data=adlb noprint; by trt01an parcat paramcd; tables avisitn / out=frout; run; Page 24

  25. S A S P R O C E D U R E S FREQUENCY proc summary data=adlb nway noprint; by trt01an parcat paramcd avisitn; output out=frout; run; Page 25

  26. S A S P R O C E D U R E S FREQUENCY proc sql noprint; create table frout as select trt01an, parcat, paramcd, avisitn, COUNT(*) as count from adlb group by trt01an, parcat, paramcd, avisitn quit; Page 26

  27. S A S P R O C E D U R E S FREQUENCY FREQUENCY COMPARISON FREQ SQL SUMMARY Real time 13:62 12:19 12:02 CPU time 2:04 1:63 0:88 Page 27

  28. CONCLUSIONS Do not be afraid to work with big data sets. Just choose the “right” procedure!

  29. T H A N K YO U Daniil Shliakhov daniil.shlyakhov@intego-group.com Kharkiv, Ukraine www.intego-group.com

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend