Converting High Volume Data Challenges to Relevant Clinical Data Insight

Navneet Kumar
Manager, CDM, Icon Clinical Research plc

Introduction

Focus Area
- Introduction: Why is data important?
- Data Challenges: Changing paradigm in the industry; types of data challenges
- Overcoming Data Challenges: Architecture framework; data scaling; data wrangling; data lakes; clinical text mining
- New Approach to Clinical Data Management: Data slicing; aggregate data review; risk-based data quality management

- 90% of the world's data was generated in the last decade.
- More than 2.5 exabytes of data are created each day.
- Data volume nearly doubles every two years.
- By 2020, 1.7 megabytes of new information will be created for every human being on earth.
- The digital universe of data will grow from 4.4 zettabytes (4.4 trillion gigabytes) to around 44 zettabytes.
- Massive growth of unstructured data:
  - 1 trillion photos
  - 300 hours of video uploaded every minute
- 6.1 billion smartphone users

Deliver the right insights, to the right person, in real time.
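As a rough consistency check on the growth figures above, the sketch below asks how long a roughly tenfold increase would take if data volume really doubles every two years. The function names are illustrative, and the only inputs are the two-year doubling period and the tenfold factor implied by the slide.

```python
import math


def growth_factor(years: float, doubling_years: float = 2.0) -> float:
    """Factor by which volume grows after `years` at the given doubling period."""
    return 2.0 ** (years / doubling_years)


def years_to_grow(factor: float, doubling_years: float = 2.0) -> float:
    """Years needed to grow by `factor` at the given doubling period."""
    return doubling_years * math.log2(factor)


# Tenfold growth (about 4.4 to about 44 zettabytes) at a two-year doubling period.
tenfold_years = years_to_grow(10.0)
print(f"Tenfold growth takes about {tenfold_years:.1f} years")
```

Under these assumptions a tenfold increase takes a little under seven years, so the two claims on the slide are broadly compatible with each other.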

Data Types

Epidemiology data
- The Surveillance, Epidemiology, and End Results (SEER) Program at NIH.
- Publishes cancer incidence and survival data from population-based cancer registries covering approximately 28% of the population of the US.
- Collected over the past 40 years (starting from January 1973 until now).
- Contains a total of 7.7M cases, and >350,000 cases are added each year.
- Collects data on patient demographics, tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status.

Genomic data
- The human genome consists of 3 billion pairs of bases, and the particular order of As, Ts, Cs, and Gs is extremely important.
- The size of a single human genome is about 3 GB.
- Whole genome sequence data is currently being annotated, but not many analytics have been applied to this relatively new data.

Source: Tutorial presented on SIAM International Conference
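The 3 GB figure for a single genome follows from the 3 billion base pairs if each base is stored as one byte (one text character). The sketch below works through that arithmetic and, as an added assumption not made on the slide, shows the size under a denser 2-bit encoding, which is enough to distinguish A, T, C, and G.

```python
BASE_PAIRS = 3_000_000_000  # figure quoted on the slide


def genome_size_gb(bits_per_base: int) -> float:
    """Storage for one genome in gigabytes (10**9 bytes) at a given encoding."""
    total_bytes = BASE_PAIRS * bits_per_base / 8
    return total_bytes / 1e9


one_byte_per_base = genome_size_gb(8)  # plain-text letters
two_bits_per_base = genome_size_gb(2)  # assumed compact encoding for 4 symbols

print(one_byte_per_base, two_bits_per_base)
```

Real sequencing output is considerably larger than either figure because it also stores read quality scores and redundant coverage, which this estimate ignores.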

Image Data Is Really Big
- The average hospital will have two-thirds of a petabyte (665 terabytes) of patient data, of which 80% will be unstructured image data.
- Medical imaging archives are increasing by 20%-40%.

By better integrating big data, healthcare could save as much as $300 billion a year, which is equal to reducing costs by $1000 a year for every man, woman, and child. For a typical Fortune 1000 company, just a 10% increase in data accessibility will result in more than $65 million in additional net income.

What We Want to Achieve
- Lower cost
- Evidence + insight
- Improved outcomes

Source: Tutorial presented on SIAM International Conference

Data Challenges

Changing Paradigm: Shift Towards the Patient
- Precision care
- Technology
- 1-1 relationship
- Mobile health
- Precision medicines
- Real-time insightful decision making
- Regulation
- Expectations
- Care everywhere
- Affordable health care
- Faster treatment
- Access to medicine
- 24x7 personalized care

Roadblocks to Converting Data into Insight
- 01 Data challenges: the group of challenges that pertain to the characteristics of the data itself.
- 02 Process challenges: all the challenges encountered while processing big data, starting with the capture step and ending with presenting the output to clients so they can understand the overall picture.
- 03 Management challenges: the legal and ethical issues related to accessing data.

Source: (PDF) Big Data Challenges

Data Challenges
- Volume: Complex trials, EHRs, insurance penetration, surveillance data, etc.
- Variety: Data types, formats, sensors, smart devices, etc. Only about 20% of data can be processed by current traditional systems; the remaining 80% is not analyzed and therefore not used for decision making and insight processes.
- Velocity: The capacity of current software applications to handle and process data streams generated continuously and constantly. Pace becomes critical because of the short shelf life of data that must be analyzed in near real time if we plan to find insight in it.
- Volatility: Data validity; how long to keep the data.
- Dogmatism: Enhance domain understanding; look at what is happening around us.
- Discovery: Identifying the right data for our analysis.
- Quality: All data is updated and free of data issues, is available on request, and is up to date.
- Veracity: Biases, uncertainties, imprecision, untruths, and missing values in the data.

Process Challenges

Data sources: medical data, payment data, legacy data, video/images, social data.

Pipeline stages: (1) Data Acquisition, (2) Extraction & Cleaning, (3) Integration & Aggregation, (4) Analysis & Reporting, (5) Interpretation.

Data acquisition
- Smart filters
- Data reduction
- Automatic metadata generation
- Data fidelity

Data cleaning
- Noisy, untrustworthy, heterogeneous data
- Converting structureless data to an analytics-friendly format
- Extracting the right information
- Adequate error models

Data aggregation
- Heterogeneity of data
- Automation of data integration and aggregation

Data analysis
- Integrating DB systems and analytical systems
- Analytics on the fly

Data insight
- Wrong modeling
- Erroneous data used

Management Challenges
- Security: The variety, velocity, and volume attributes of big data amplify the security management challenges, as does the distributed nature of the data.
- Governance: To make decisions with confidence, to plan accurately for the future, to avoid the costs that result from low-quality data and the need to redo work, and to provide big data reporting compatible with government standards.
- Privacy: There is an increasing fear of inappropriate use of personal data, especially when combining data from multiple sources.
- Legal and ethical aspects of data

Overcoming Data Challenges

Framework to Manage Data Volumes
- Data sources: internal and external; multiple formats, multiple locations, multiple applications
- Data transformation (raw data to transformed data): middleware, ETL, data wrangling, traditional way
- Data tools: Hadoop, MapReduce, Pig, Hive, Oozie, Mahout, SAS, others
- Data applications: queries, reports, analytics, OLAP, data mining
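MapReduce, listed among the tools in this framework, splits work into a map step that emits key/value pairs and a reduce step that combines all values sharing a key. The sketch below imitates that pattern in plain Python on a handful of invented records; it does not use Hadoop or any of its APIs, and the record fields and function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical incoming records; source and format values are invented.
RECORDS = [
    {"source": "internal", "format": "csv"},
    {"source": "external", "format": "hl7"},
    {"source": "internal", "format": "csv"},
    {"source": "external", "format": "dicom"},
]


def map_phase(record):
    """Map: emit one (key, 1) pair per record, keyed by data format."""
    yield record["format"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(RECORDS)
print(counts)
```

In a real cluster the map and reduce steps run in parallel on different machines and the framework performs the shuffle; here everything runs in one process to keep the pattern visible.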

Level of Detailing
- Every piece of data has value
  - Information
  - Knowledge
  - Wisdom
- Depth of analysis
  - Descriptive
  - Diagnostic
  - Predictive
  - Prescriptive

(Figure: depth of analysis plotted against data scale.)

Data Wrangling

Steps: Discovery, Structuring, Cleaning, Enriching, Validating, Publishing

Use case I: Accelerating detection of adverse drug reactions in pharmacovigilance
- Better collaboration
- Provide the right information to agencies, healthcare providers, and patients
- Improve response times
- Resolve drug safety concerns quickly

Use case II: Sanofi accelerated the standardization of clinical trial, marketing, and commercial data to deliver new insights on consumer health and drug development using the data wrangling software Trifacta.

Source: https://www.trifacta.com/data-wrangling/
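The wrangling steps listed above can be sketched on a few invented adverse-event rows. This is a minimal illustration in plain Python, not how Trifacta or any particular tool works; the row layout, the severity coding, and every function name are assumptions, and the Discovery step (inspecting the raw data by eye) is left out.

```python
from datetime import date

# Hypothetical raw adverse-event rows, invented for illustration only.
RAW_ROWS = [
    "S001 | Headache | 2019-03-01 | mild",
    "s002|NAUSEA |2019-03-04|Moderate",
    "S003 | rash | not recorded | severe",
    "S001 | Headache | 2019-03-01 | mild",  # duplicate of the first row
]

SEVERITY_RANK = {"mild": 1, "moderate": 2, "severe": 3}  # assumed coding


def structure(line):
    """Structuring: split a delimited line into named fields."""
    subject, term, onset, severity = (part.strip() for part in line.split("|"))
    return {"subject": subject, "term": term, "onset": onset, "severity": severity}


def clean(record):
    """Cleaning: normalise case so equivalent values compare equal."""
    return {
        "subject": record["subject"].upper(),
        "term": record["term"].lower(),
        "onset": record["onset"],
        "severity": record["severity"].lower(),
    }


def enrich(record):
    """Enriching: add a derived numeric severity rank."""
    return {**record, "severity_rank": SEVERITY_RANK.get(record["severity"])}


def validate(record):
    """Validating: return a list of problems found in the record."""
    problems = []
    try:
        date.fromisoformat(record["onset"])
    except ValueError:
        problems.append("onset is not an ISO date")
    if record["severity_rank"] is None:
        problems.append("unknown severity")
    return problems


def wrangle(lines):
    """Publishing: return clean records plus rejected records with reasons."""
    published, rejected, seen = [], [], set()
    for line in lines:
        record = enrich(clean(structure(line)))
        key = (record["subject"], record["term"], record["onset"])
        if key in seen:  # drop duplicates after normalisation
            continue
        seen.add(key)
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            published.append(record)
    return published, rejected


published, rejected = wrangle(RAW_ROWS)
print(len(published), "published;", len(rejected), "rejected")
```

Keeping rejected rows together with the reason for rejection, rather than silently dropping them, gives data managers something concrete to query back to the site.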

Data Lakes
- Build applications
- Flexibility and accessibility
- Data authenticity
- Speed
- Explore and analyze

ISASA: Ingest, Store, Analyze, Surface, Act

Source: https://40uu5c99f3a2ja7s7miveqgqu-wpengine.netdna-ssl.com/wp-content/uploads/2017/02/Understanding-data-lakes-EMC.pdf

Clinical Text Mining
- Text mining
  - Information extraction
    - Named entity recognition
  - Information retrieval
    - Index of words
    - Ranking of matching documents
- Clinical text vs. biomedical text
  - Biomedical text: medical literature
  - Clinical text: clinical notes
- Auto encoding
  - Extracting codes from clinical text
- Context analysis: negation
  - NegEx
  - NegExpander
  - NegFinder
- Context analysis: temporality

Source: Tutorial presented on SIAM International Conference
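Negation detection matters because a note saying "denies chest pain" must not be read as a finding of chest pain. The sketch below shows the general idea behind trigger-based approaches such as NegEx in a heavily simplified form: a concept is treated as negated when a negation trigger appears shortly before it in the same sentence. The trigger list, the five-token window, and the function names are assumptions for illustration; the published NegEx algorithm uses a much larger trigger set and additional rules.

```python
import re

# Small hand-picked trigger subset (assumption, not the NegEx trigger list).
NEGATION_TRIGGERS = ["no evidence of", "denies", "without", "no"]
WINDOW = 5  # assumed scope: tokens after a trigger that may be negated


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, phrase_tokens):
    """Start indexes at which phrase_tokens occurs inside tokens."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]


def negation_status(text, concepts):
    """Return (concept, 'negated' or 'affirmed') for each mention found."""
    results = []
    for sentence in re.split(r"[.;!?]", text):
        tokens = tokenize(sentence)
        # Index just past the end of every trigger occurrence in this sentence.
        trigger_ends = [
            start + len(trigger)
            for trigger in map(tokenize, NEGATION_TRIGGERS)
            for start in positions(tokens, trigger)
        ]
        for concept in concepts:
            for start in positions(tokens, tokenize(concept)):
                negated = any(0 <= start - end < WINDOW for end in trigger_ends)
                results.append((concept, "negated" if negated else "affirmed"))
    return results


note = "Patient denies chest pain. No evidence of pneumonia; reports mild fever."
statuses = negation_status(note, ["chest pain", "pneumonia", "fever"])
print(statuses)
```

Splitting on sentence punctuation is what keeps "No evidence of pneumonia" from also negating "fever" in the clause that follows; the concept list itself would normally come from a named entity recognition step.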

New Approach to Clinical Data Management

Risk-Based Data Quality Management
- Monitor data, taking into account risk factors and categories, in order to track study progression and resolve critical situations.
- Focus on data directly impacting primary and secondary objectives.
- Develop data checks based on data peculiarity.

Source: Reflection paper on risk based quality management in clinical trials, EMA/269011/2013
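One way to picture a risk-based check is to apply a tighter tolerance to fields that feed the primary and secondary objectives than to low-risk fields. The sketch below flags sites whose missing-data rate exceeds the tolerance for a field's risk tier. The field names, tiers, thresholds, and site counts are all invented for illustration and are not taken from the EMA reflection paper.

```python
# Assumed risk tiers: fields tied to study objectives are "critical".
FIELD_RISK = {
    "primary_endpoint": "critical",
    "adverse_event": "critical",
    "height_cm": "low",
}

# Assumed tolerated missing-data rate per tier (illustrative numbers only).
MAX_MISSING_RATE = {"critical": 0.05, "low": 0.20}

# Hypothetical per-site counts: field -> (records expected, records missing).
SITE_COUNTS = {
    "Site A": {"primary_endpoint": (100, 2), "adverse_event": (100, 4), "height_cm": (100, 15)},
    "Site B": {"primary_endpoint": (80, 8), "adverse_event": (80, 1), "height_cm": (80, 10)},
}


def flag_sites(site_counts):
    """Return (site, field, tier, missing rate) for every breach of tolerance."""
    findings = []
    for site, fields in sorted(site_counts.items()):
        for field, (expected, missing) in sorted(fields.items()):
            rate = missing / expected
            tier = FIELD_RISK[field]
            if rate > MAX_MISSING_RATE[tier]:
                findings.append((site, field, tier, round(rate, 3)))
    return findings


findings = flag_sites(SITE_COUNTS)
print(findings)
```

With these numbers only Site B's primary endpoint is flagged: a 10% gap breaches the 5% tolerance for a critical field, while larger gaps in the low-risk height field at both sites stay within their 20% tolerance and do not draw review effort.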

Focusing on Trend and Fraud

New Approach to Data Management

Summary
- Health care and life sciences are data-rich domains.
- Unraveling huge data complexities can provide many insights for making the right decisions at the right time for patients.
- Efficiently utilizing this colossal volume of data can help improve patient outcomes and reduce cost.
