STEAMS: THE ANTI-HEART DISEASE WARRIORS, Mason Chen, Black Belt (PowerPoint presentation)



SLIDE 1

STEAMS: THE ANTI-HEART DISEASE WARRIORS

Mason Chen, Black Belt, Stanford OHS

1st Place, Best Contributed Paper, 2018 JMP Discovery Summit, Cary, NC

SLIDE 2

Project Scope and Presentation Flow

  • Many people like chocolate but worry that it is unhealthy.
  • Some people with heart disease might want to eat chocolate but do not know which kind to choose.

Presentation flow:
  • 1. Anti-Oxidant Science Literature Research
  • 2. Clustering Nutrients & Science
  • 3. Clustering Chocolate Types
  • 4. Missing Value Neural Imputation of Cocoa%
  • 5. DSD (Definitive Screening Design) Optimization of Neural Settings

STEAMS category: ALL

SLIDE 3

STEAMS DIAGRAM

Mason C., "STEAMS Methodology of Conducting Chocolate Science Research", submitted to NSTA STEM Expo

STEAMS = Science, Technology, Engineering, Artificial Intelligence, Math, Statistics

[Figure: STEAMS diagram; nodes are numbered 1 to 5]
  • Ingredients and Nutrition; Chocolate Products; Chocolate Process (Technology and Engineering)
  • Anti-Oxidant Science
  • Clustering Nutrients (Artificial Intelligence)
  • Clustering Products (Artificial Intelligence)
  • Clustering Algorithm (Math), with linkage choices Single, Complete, and Centroid
  • Neural Imputation (Artificial Intelligence)
  • DSD Optimization (Statistics)

STEAMS category: ALL

SLIDE 4

2015-2018 "STEAMS" Journey

Develop Math & Science Foundation (Stanford OHS):
  • Math, Physics, Biology, Chemistry, JAVA, Statistics
  • Literature Research/Writing

Certify 6 Professional Certificates:
  • IBM SPSS Statistics
  • IBM Modeler DA/DM (IBM000129876)
  • IASSC YB/GB/BB (GR764000541MC)
  • JMP Statistical Thinking (2018 goal)
  • JMP DOE (2019 goal)
  • JMP Script Specialist (2019 goal)

Enhance STEAMS Skills:
  • JMP/Pro, Python, LaTeX
  • Paper Proceedings, Oral/Poster Presentations, Team Building

Learning "STEAMS" techniques helps motivate school learning in a project-based, practical way: fun, real, and hands-on.

STEAMS category: ALL

SLIDE 5

Global Vision Leadership: 2017-2018 Conferences

  • IEOM/ASQ STEM, Palo Alto, CA
  • AQI Six Sigma, Nashville, TN
  • ASA JSM, Baltimore, MD
  • IEOM, Bogota, Colombia
  • IEOM, Rabat, Morocco
  • IWSM, Groningen, Netherlands
  • JMP, Prague, Czech Republic
  • FSDM, Hualien, Taiwan
  • IEOM, Bandung, Indonesia
  • ISF, Cairns, Australia
  • ASA JSM, Vancouver, Canada
  • ISF, Boulder, CO
  • ASA SDSS, Reston, VA
  • IBC, Barcelona, Spain
  • IWSM, Bristol, UK
  • ISCB, Melbourne, Australia
  • JMP, Cary, NC
  • FSDM, Bangkok, Thailand
  • IEOM, DC
  • IEOM/ASQ STEM, Santa Clara, CA
  • IEOM, Paris, France

Culture-shocking!

STEAMS category: ALL

SLIDE 6

1. Chocolate Anti-Oxidant Science

JMP 13 >> Analyze >> Fit Y by X >> Nonpar Density

Mason C., (2018 July), "Multivariate Statistics of Antioxidant Chocolate", SMS IWSM Bristol Proceedings, Vol. 2, 37-40

SLIDE 7

IS EATING CHOCOLATE UNHEALTHY?

Chocolate has not been proven harmful.

[Chart: Life Expectancy (2015 estimate); median 74.75; labeled ranks and values: #9: 82.50, #32: 80.57, #31: 80.68, #33: 80.54, #20: 81.70, #15: 81.98, #13: 82.15, #24: 81.23, #43: 79.68, #19: 81.75]

Dark chocolate is a powerful source of antioxidants. For a serving the same size as an apple, it has the highest antioxidant content of the foods in the chart.

[Chart: Anti-Oxidant Capacity/gram]

JMP 13 >> Analyze >> Fit Y by X >> Nonpar Density

STEAMS category: ENGINEERING

SLIDE 8

CHOCOLATE & ATRIAL FIBRILLATION (AF)

Eating 2 servings of chocolate per week (1 serving = 30 g) is associated with a lower risk of coronary heart disease (CHD).

  • Chocolate intake may be inversely associated with AF.
  • Dark chocolate may be a healthy snacking option.
  • AF = atrial fibrillation (a cardiovascular disease).
  • Next: how chocolate might reduce CHD risk and AF-associated cardiovascular disease.

https://heart.bmj.com/content/103/15/1163
https://www.bmj.com/content/343/bmj.d4488

STEAMS category: TECHNOLOGY

SLIDE 9

FLAVONOIDS SCIENCE & STRUCTURE

  • Flavonoids are the most abundant polyphenols in the human diet and have antioxidant properties.
  • Flavonoids share a general 15-carbon skeleton, C6-C3-C6:
    ▪ two phenyl rings (A and B) and a heterocyclic ring (C).
  • Based on their chemical structure, there are seven types of flavonoids:
    ▪ flavones, flavanols, flavanones, isoflavones, anthocyanidins, chalcones, and catechins.
  • Chocolate flavonoids are flavanols, which can promote healthy blood flow from head to toe.

STEAMS category: SCIENCE

SLIDE 10

FREE RADICALS AND ANTIOXIDANTS

▪ Free radicals are atoms or molecules with an odd number of electrons (an unpaired electron).
▪ Antioxidants reduce free-radical formation.
▪ Reactive free radicals cause cells to malfunction.
▪ Excess free radicals damage blood vessels.
▪ LDL (low-density lipoprotein) oxidized by free radicals can lead to CVD (cardiovascular disease).
▪ The oxidized components attract macrophages, which absorb and deposit cholesterol.

STEAMS category: SCIENCE

SLIDE 11

DARK CHOCOLATE LITERATURE RESEARCH

  • Benefits:
    – A lot of soluble fiber
    – Many minerals: iron, magnesium, copper, manganese, potassium, phosphorus, zinc, selenium
    – Powerful source of antioxidants
    – Improves blood flow and lowers blood pressure
    – Increases HDL (good cholesterol) and decreases LDL (bad cholesterol)
    – Lowers the risk of cardiovascular disease (CVD)
    – Improves brain function
  • Concerns:
    – Can trigger migraines
    – Increases the chance of kidney stones
    – Side effects from caffeine, such as irregular heartbeat

STEAMS category: TECHNOLOGY

SLIDE 12

2. Clustering Nutrients & Science

(1) JMP 13 >> Analyze >> Distribution
(2) JMP 13 >> Analyze >> Multivariate Methods >> Multivariate
(3) JMP 13 >> Analyze >> Clustering >> Cluster Variables
(4) JMP 13 >> Analyze >> Multivariate Methods >> Principal Components >> Bi-Plot

Mason C., (2018 July), "Choose Healthy Chocolate", IEOM Europe Paris Proceedings, 434-441

SLIDE 13

(1) CHOCOLATE NUTRITION DISTRIBUTION

Compared with milk and white chocolate, most dark chocolates have (qualitative clustering criteria):
  • 1st cluster: higher cocoa percent, dietary fiber, and iron
  • 2nd cluster: lower cholesterol, calcium, and sugar

The chocolate product nutrition data indicate that dark chocolate is healthier than milk and white chocolate.

  • Nutrition data for 60+ chocolate products were collected from a "Target" store.
  • The quantities of the eight most critical ingredients were analyzed.

JMP 13 >> Analyze >> Distribution

STEAMS category: STATISTICS

SLIDE 14

(2) DARK CHOCOLATE CORRELATION

Pairwise Pearson correlation:
  • 1st cluster: sugar and Cocoa_Percent have a negative correlation of -0.9162.
  • 2nd cluster: dietary fiber and iron have a positive correlation of 0.7722.

Is there a better way to cluster nutrients?

JMP 13 >> Analyze >> Multivariate Methods >> Multivariate

STEAMS category: SCIENCE
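The pairwise Pearson correlations above come from JMP's Multivariate platform. As a rough sketch of the statistic itself, the plain-Python function below computes r for two columns. The function name and the five cocoa/sugar values are hypothetical; they are not the Target data set.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical products: sugar falls as cocoa percent rises
cocoa_pct = [35, 50, 60, 70, 85]
sugar_g = [24, 18, 15, 12, 6]
print(round(pearson_r(cocoa_pct, sugar_g), 4))  # prints -0.9983
```

A value near -1 means the two nutrients move in opposite directions almost linearly. The sketch divides by zero if either column is constant, which a real analysis would need to guard against.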
SLIDE 15

(3) VARIABLE CLUSTERING

Clustering the nutrients yields chocolate-science insights that are easy to interpret:
  • Cluster 1: the higher the saturated fat, the higher the total fat and the calories.
  • Cluster 2: calcium and cholesterol are negatively correlated with cocoa percent.
  • Cluster 3: the higher the sugar, the higher the carbohydrates.
  • Cluster 4: iron and dietary fiber are positively correlated.

Annotations: Common Sense; Signal-to-Noise (S-N) Ratio

JMP 13 >> Analyze >> Clustering >> Cluster Variables

STEAMS category: SCIENCE & AI
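JMP's Cluster Variables platform uses its own procedure (as far as we know, one based on principal components), which is not reproduced here. As a much simpler stand-in that conveys the idea of grouping nutrients that move together, the sketch below places two variables in the same group whenever |r| reaches a chosen threshold (connected components of a correlation graph). The threshold of 0.7, the function names, and all data values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_variables(data, threshold=0.7):
    """Connected components of the graph linking variables with |r| >= threshold."""
    names = list(data)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson_r(data[a], data[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical nutrient values for five products
nutrients = {
    "Total_Fat": [10, 14, 18, 22, 30],
    "Saturated_Fat": [6, 8, 11, 13, 18],
    "Sugar": [20, 10, 25, 12, 18],
    "Carbs": [26, 15, 30, 17, 24],
}
print(group_variables(nutrients))  # [['Total_Fat', 'Saturated_Fat'], ['Sugar', 'Carbs']]
```

The grouping mirrors Cluster 1 (fats) and Cluster 3 (sugar and carbohydrates) in spirit only; a fixed correlation threshold is cruder than JMP's method and can merge groups through a single strong link.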

SLIDE 16

(4) Principal Component Bi-Plot

  • 1st cluster: Cocoa Percent, Dietary Fiber, and Iron lie near each other (higher for dark chocolate).
  • 2nd cluster: Total Fat, Saturated Fat, and Calories.
  • 3rd cluster: Calcium, Sugar, and Cholesterol lie near each other (higher for milk/white chocolate).

Annotation: Visualization

JMP 13 >> Analyze >> Multivariate Methods >> Principal Components >> Bi-Plot

STEAMS category: MATH & AI
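A biplot draws each variable as a loading vector built from the eigenvectors of the correlation matrix. The sketch below finds those eigenpairs with power iteration plus deflation, a simple textbook method that is not necessarily what JMP uses internally. The 3x3 correlation matrix is hypothetical, and the uneven start vector is an assumption made to avoid starting orthogonal to an eigenvector.

```python
from math import sqrt


def power_iteration(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(matrix)
    v = [1.0 + 0.1 * i for i in range(p)]  # uneven start vector (assumption)
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to find along this start vector
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v of the unit vector v
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def principal_components(matrix, k):
    """Top-k eigenpairs by power iteration with deflation A <- A - lam * v v^T."""
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        lam, vec = power_iteration(a)
        pairs.append((lam, vec))
        a = [[a[i][j] - lam * vec[i] * vec[j] for j in range(len(a))] for i in range(len(a))]
    return pairs


# Hypothetical correlations among Cocoa_Percent, Dietary_Fiber, and Sugar
R = [
    [1.0, 0.9, -0.8],
    [0.9, 1.0, -0.7],
    [-0.8, -0.7, 1.0],
]
pairs = principal_components(R, 3)
for lam, vec in pairs:
    # Biplot-style loadings: eigenvector scaled by sqrt(eigenvalue)
    print(round(lam, 3), [round(x * sqrt(max(lam, 0.0)), 3) for x in vec])
```

Variables whose loading vectors point the same way (here cocoa and fiber, with sugar opposite) plot close together, which is the visual cue the slide uses to read its nutrient clusters. The eigenvalues of a correlation matrix sum to the number of variables.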
SLIDE 17

Comparing Four Clustering Methods

  • The four clustering methods show similar clustering patterns.
  • The clustering results (statistics and engineering) agree well with the chocolate literature research (science and technology), in line with the STEAMS approach.

STEAMS category: ENGINEERING

SLIDE 18

3. Clustering Chocolate Types

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster
JMP 13 >> Analyze >> Distribution
JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Column Summary
JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Clustering Distance Method
JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Constellation Plot
JMP 13 >> Analyze >> Multivariate Methods >> Principal Components >> Eigenvalues

Mason C., (2018 Dec.), "Statistics Application on the Study of Chocolate Science with Heart Disease", ASA SDSS Proceedings

SLIDE 19

CLUSTERING PRODUCTS

Objective: find a way to identify healthy chocolate products for heart disease patients.

  • Use hierarchical clustering to cluster the chocolate products.
  • All milk and white chocolates form the third cluster, while dark chocolates split between the first and second clusters.
  • Why are there two clusters for dark chocolate (why not one cluster per chocolate type)?

Annotation: cluster too small; 39 points missing

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster
JMP 13 >> Analyze >> Distribution

STEAMS category: STATISTICS & AI

SLIDE 20

PRINCIPAL CLUSTERING DECIDING FACTORS

  • 1st cluster: dark chocolate, high Cocoa%, and low calcium. Healthiest?
  • 2nd cluster: dark chocolate, medium Cocoa%, and low calcium.
  • 3rd cluster: milk/white chocolate, low Cocoa%, and high calcium.

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Column Summary

STEAMS category: AI & TECHNOLOGY

SLIDE 21

CLUSTERING DISTANCE METHODS

Clustering patterns depend on the number of observations in each cluster, the cluster variance, and outliers.

Linkage choices: Single (minimum distance), Complete (maximum distance), Centroid (center-to-center distance), Average, and Ward (ANOVA, MS)

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster

STEAMS category: MATH
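The linkage choices differ only in how they measure the distance between two clusters. The sketch below is a minimal agglomerative clustering loop with four of them (single = minimum pairwise distance, complete = maximum, average = mean pairwise distance, centroid = distance between cluster means), assuming Euclidean distance. It is a brute-force illustration, not JMP's Hierarchical Cluster implementation, and the six points are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


LINKAGES = {
    "single": lambda A, B: min(dist(a, b) for a in A for b in B),
    "complete": lambda A, B: max(dist(a, b) for a in A for b in B),
    "average": lambda A, B: sum(dist(a, b) for a in A for b in B) / (len(A) * len(B)),
    "centroid": lambda A, B: dist(centroid(A), centroid(B)),
}


def agglomerate(points, k, method="single"):
    """Merge the two closest clusters until k remain; returns sorted index lists."""
    link = LINKAGES[method]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair (i < j) with the smallest linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: link(
                [points[p] for p in clusters[ij[0]]],
                [points[p] for p in clusters[ij[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Two well-separated hypothetical groups of products (2 standardized nutrients)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for method in LINKAGES:
    print(method, agglomerate(points, 2, method))
```

With well-separated groups all four linkages agree. They typically diverge when clusters are elongated, unequal in size, or contain outliers: single linkage tends to chain through intermediate points, while complete linkage favors compact clusters.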

SLIDE 22

Selecting DISTANCE METHODS

Depending on the data distribution, selecting an appropriate clustering distance (linkage) method is critical to clustering-pattern analysis.

[Figure: Centroid linkage]

STEAMS category: MATH

SLIDE 23

WARD VS SINGLE METHOD (10 Clusters)

[Figure panels: Single (joins clusters with larger variances) | Ward (joins clusters with fewer observations)]

Annotation: risk of misclassifying healthy chocolate

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Constellation Plot

STEAMS category: AI & MATH
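Ward's method merges the pair of clusters whose union increases the total within-cluster sum of squares (SSE) the least. That increase has the closed form n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2, where c_A and c_B are the cluster centroids. So for the same centroid gap, clusters with fewer observations are cheaper to merge, which fits the "joins clusters with fewer observations" caption. The helper names and points below are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_merge_cost(a, b):
    """Increase in total within-cluster SSE when clusters a and b are merged."""
    n_a, n_b = len(a), len(b)
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return n_a * n_b / (n_a + n_b) * gap


# Same centroid gap (2 units), very different cluster sizes
small_a, small_b = [(0.0, 0.0)], [(2.0, 0.0)]
big_a, big_b = [(0.0, 0.0)] * 10, [(2.0, 0.0)] * 10
print(ward_merge_cost(small_a, small_b))  # 2.0
print(ward_merge_cost(big_a, big_b))  # 20.0

# The closed form matches the direct SSE difference
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 1.0), (7.0, 3.0), (6.0, 2.0)]
print(ward_merge_cost(a, b), sse(a + b) - sse(a) - sse(b))
```

Single linkage, by contrast, ignores cluster size entirely and looks only at the closest pair of points, which is one reason the two methods can assign borderline products to different clusters.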
SLIDE 24

DETERMINE NUMBER OF CLUSTERS

From both the scree plot and the PCA eigenvalues (80% cumulative, Pareto style), we can pick 4 clusters.

The clustering pattern is highly dependent on the number of clusters.

JMP 13 >> Analyze >> Multivariate Methods >> Principal Components >> Eigenvalues

STEAMS category: MATH
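The 80% rule can be stated as: pick the smallest k whose k largest eigenvalues explain at least 80% of the total. The sketch below applies it to invented eigenvalues for 15 standardized nutrients (so they sum to 15); they are not the project's PCA output, and the function name is hypothetical.

```python
def components_for_share(eigenvalues, share=0.80):
    """Smallest k whose k largest eigenvalues reach `share` of the total."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= share:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a 15 x 15 correlation matrix (they sum to 15)
eigenvalues = [5.1, 3.2, 2.4, 1.4, 0.8, 0.7, 0.5, 0.3, 0.2, 0.15, 0.1, 0.05, 0.04, 0.03, 0.03]
print(components_for_share(eigenvalues))  # 4
```

Here three components explain about 71% and four about 81%, so the rule stops at 4. A scree plot reads the same list visually, looking for the elbow where the eigenvalues level off.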

SLIDE 25

WARD VS SINGLE (3-5 CLUSTERS)

  • Single linkage shows no significant difference between 3, 4, or 5 clusters.
  • Ward clusters become more similar in size as the number of clusters increases.

Annotations: Better Choice; Worse Choice

JMP 13 >> Analyze >> Distribution

STEAMS category: STATISTICS

SLIDE 26

4. Missing Value Neural Imputation

JMP 13 >> Analyze >> Clustering >> Hierarchical Cluster >> Missing Value Imputation
JMP 13 >> Analyze >> Predictive Modeling >> Partition
JMP 13 >> Analyze >> Predictive Modeling >> Neural
JMP 13 >> Analyze >> Predictive Modeling >> Neural >> Diagram
JMP 13 >> Analyze >> Screening >> Explore Missing Values

Mason C., "Neural Network Algorithm of Missing Value Imputation for Chocolate Science Research", submitted to SIAM SDM19

SLIDE 27

Explore MISSING VALUES

  • Objective: among 63 commercial chocolate products, 39 are missing Cocoa% information (most are milk chocolates), so these values must be imputed.
  • Is there a better imputation method?

JMP 13 >> Analyze >> Screening >> Explore Missing Values

STEAMS category: AI & MATH
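The bookkeeping behind missing-value imputation is simple: count what is missing, then fill only those cells from a fitted model and flag them. The sketch below assumes rows are dictionaries with None marking a missing value; the table, the "_imputed" flag column, and `placeholder_model` are hypothetical stand-ins (the project used a JMP neural model instead).

```python
def missing_summary(rows, column):
    """(missing count, total rows, missing share) for one column; None marks missing."""
    n_missing = sum(row.get(column) is None for row in rows)
    return n_missing, len(rows), n_missing / len(rows)


def impute(rows, column, predict):
    """Copy rows, filling a missing `column` with predict(row) and flagging it."""
    filled = []
    for row in rows:
        new_row = dict(row)
        new_row[column + "_imputed"] = new_row.get(column) is None
        if new_row[column + "_imputed"]:
            new_row[column] = predict(row)
        filled.append(new_row)
    return filled


# Hypothetical table shaped like the slide: 24 known Cocoa% values, 39 missing
rows = [{"Type": "Dark", "Cocoa_Percent": 70.0} for _ in range(24)]
rows += [{"Type": "Milk", "Cocoa_Percent": None} for _ in range(39)]


def placeholder_model(row):
    # Hypothetical stand-in for the fitted neural model
    return 70.0 if row["Type"] == "Dark" else 30.0


missing, total, share = missing_summary(rows, "Cocoa_Percent")
print(missing, total, round(share, 3))  # 39 63 0.619
completed = impute(rows, "Cocoa_Percent", placeholder_model)
```

Keeping an explicit flag lets later analyses (such as the product clustering) separate observed Cocoa% values from imputed ones.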
SLIDE 28

JMP Neural Network Platform

  • Implements a fully connected perceptron with one layer of hidden nodes.
  • Main advantage: can efficiently model different response surfaces given enough hidden nodes and layers.
  • Main disadvantage: results are not easily interpretable because of the intermediate layers (black box).

Standard JMP edition:
  • Only the TanH activation function
  • Can fit only one hidden layer

Annotation: TanH is a more powerful (exponential) transformation than the linear transformation of PCA; perceptron.

STEAMS category: AI
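A single hidden layer of TanH nodes feeding one linear output node is the standard form of such a perceptron for a continuous response like Cocoa%. The sketch below computes one prediction under that assumption; it presumes the inputs are already standardized, and all weights and the function name are hypothetical, not values from the JMP fit.

```python
from math import tanh


def neural_predict(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer of TanH nodes feeding a single linear output node."""
    hidden = [
        tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_w, hidden_b)
    ]
    return out_b + sum(c * h for c, h in zip(out_w, hidden))


# Hypothetical weights: 2 standardized inputs, 3 hidden nodes
hidden_w = [[0.8, -0.5], [0.3, 0.9], [-1.2, 0.4]]
hidden_b = [0.1, -0.2, 0.0]
out_w = [1.5, -0.7, 2.0]
out_b = 50.0
print(neural_predict([1.0, -0.5], hidden_w, hidden_b, out_w, out_b))

# For small arguments tanh(z) is close to z, so small weights act almost linearly
print(tanh(0.05), 0.05)
```

The second print connects to the annotation above: TanH saturates at -1 and +1 for large inputs but behaves nearly linearly near zero, so a TanH layer with small weights is close to a linear projection.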

SLIDE 29

MISSING VALUE - Neural Network

  • The R-square of both training and validation is above 0.7.
  • The validation R-square is weaker, however (a typical overfitting concern for neural networks).
  • Chocolate type, calcium, and sugar are the top three predictors of Cocoa%.

Annotation: Stronger

JMP 13 >> Analyze >> Predictive Modeling >> Neural

STEAMS category: AI & STATISTICS
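Comparing training and validation R-square is how the overfitting concern is quantified. The sketch below uses the usual definition R^2 = 1 - SS_residual / SS_total; the Cocoa% values and predictions are invented for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical Cocoa% values and model predictions
train_actual, train_pred = [30, 45, 60, 70, 85], [32, 44, 58, 71, 84]
valid_actual, valid_pred = [40, 55, 72], [48, 50, 65]
r2_train = r_squared(train_actual, train_pred)
r2_valid = r_squared(valid_actual, valid_pred)
print(round(r2_train, 3), round(r2_valid, 3), "gap:", round(r2_train - r2_valid, 3))
```

A validation R-square well below the training R-square is the overfitting symptom described above. Under this definition the validation R-square can even be negative when the model predicts worse than the validation mean.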

SLIDE 30

Neural Network Estimates (JMP Default Setting)

[Figure: Neural TanH diagram and estimates]

Annotations: higher sensitivity: 3rd node; Chocolate Type

JMP 13 >> Analyze >> Predictive Modeling >> Neural >> Diagram

STEAMS category: AI

SLIDE 31

5. DSD of Neural Setting

JMP 13 >> DOE >> Definitive Screening Design
JMP 13 >> DOE >> Design Diagnostics >> Evaluate Design
JMP 13 >> Analyze >> Fit Model
JMP 13 >> Analyze >> Fit Y by X
JMP 13 >> Analyze >> Distribution
JMP 13 >> Save Script >> To Data Table / To Script Window >> Edit/Save/Run Script

Mason C., "Optimize Neural Network Algorithm of Missing Value Imputation", submitted to 2019 ASA ENAR Spring Meeting

SLIDE 32

Resolve Neural Over-Fit Concern

Four DOE input variables:
  • Validation Method (categorical)
  • Validation Setting (continuous), "nested" under Validation Method
  • Random Seed (categorical)
  • Number of Hidden Nodes (continuous)

Two DOE output responses:
  • R-Square of Training Set
  • R-Square of Validation Set (more important because of neural overfitting)

Objective: optimize the neural settings to resolve overfitting by improving the R-square of both the training and validation sets for Cocoa% missing-value imputation.

JMP Neural validation methods:
  • Holdback: randomly divides the original data into training and validation (holdback portion) sets.
  • KFold: divides the data into K subsets. Each subset in turn validates the model fit on the rest of the data, fitting a total of K models, and the model with the best validation statistic is chosen. Best for small data sets (makes efficient use of limited data).

JMP 13 >> DOE >> Definitive Screening Design

STEAMS category: ENGINEERING
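The two validation schemes can be sketched as index splits. The helpers below are hypothetical, and Python's random seed does not reproduce JMP's Random Seed behavior. The row count 24 is derived from the slides (63 products minus the 39 with missing Cocoa%).

```python
import random


def kfold_indices(n, k, seed=None):
    """Shuffle row indices 0..n-1 and cut them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra row
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def holdback_split(n, portion, seed=None):
    """Return (training, validation) index lists with `portion` held back."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_valid = round(n * portion)
    return idx[n_valid:], idx[:n_valid]


# 63 products minus 39 with missing Cocoa% leaves 24 usable rows
folds = kfold_indices(24, 5, seed=5)
print([len(f) for f in folds])  # [5, 5, 5, 5, 4]
for f, valid in enumerate(folds):
    train = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # A real run would fit on `train`, score on `valid`, and keep the best of 5
    assert len(train) + len(valid) == 24

train_idx, valid_idx = holdback_split(24, 0.2, seed=5)
print(len(train_idx), len(valid_idx))  # 19 5
```

Every row is used for validation exactly once under KFold, whereas Holdback spends its 5 validation rows only once. That difference matters most when the whole data set is as small as 24 rows.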

SLIDE 33

Evaluate DSD of Optimizing Neural Settings

Compared: 14 DSD runs vs. 18 DSD runs (the 14 runs plus four random corner points)

✓ 18 DSD runs is safer on:
  • Power of the significance test (> 90%)
  • Correlation of confounding (< 0.3)
  • Uniformity of prediction power

JMP 13 >> DOE >> Design Diagnostics >> Evaluate Design

STEAMS category: ENGINEERING
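A definitive screening design (Jones and Nachtsheim) for m continuous factors can be built by stacking a conference matrix C, its negative -C, and one center run. This gives 2m + 1 runs whose main effects are orthogonal to each other and to every pure quadratic term. The sketch builds the textbook 9-run design for four continuous factors. The slide's 14-run JMP design also involves categorical factors, so JMP's actual construction differs. The corner-point helper only mimics the idea of adding four runs and is hypothetical.

```python
import random
from itertools import combinations

# Order-4 conference matrix: zero diagonal, +/-1 elsewhere, and C^T C = 3I
C4 = [
    [0, 1, 1, 1],
    [-1, 0, 1, -1],
    [-1, -1, 0, 1],
    [-1, 1, -1, 0],
]


def definitive_screening_design(conference):
    """Stack C, -C, and one center run: 2m + 1 runs at levels -1, 0, +1."""
    m = len(conference)
    runs = [list(row) for row in conference]
    runs += [[-v for v in row] for row in conference]  # fold-over
    runs.append([0] * m)  # center run
    return runs


def add_corner_points(design, count, seed=None):
    """Append `count` random +/-1 corner runs (hypothetical augmentation)."""
    rng = random.Random(seed)
    m = len(design[0])
    return design + [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(count)]


dsd = definitive_screening_design(C4)
cols = list(zip(*dsd))
p = len(cols)
# Every main-effect column sums to zero, so a zero cross-product means zero correlation
main_main = [sum(a * b for a, b in zip(cols[i], cols[j])) for i, j in combinations(range(p), 2)]
main_quad = [sum(a * b * b for a, b in zip(cols[i], cols[j])) for i in range(p) for j in range(p)]
print(len(dsd), max(map(abs, main_main)), max(map(abs, main_quad)))  # 9 0 0
print(len(add_corner_points(dsd, 4, seed=1)))  # 13
```

Added corner runs generally break this exact orthogonality, which is why the slide checks that the correlation of confounding stays below 0.3 after augmentation rather than at zero.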

SLIDE 34

Fit Model (Nested) and Set Desirability

Construct model effects:
  • Validation Setting is "nested" under Validation Method.
  • Choose Response Surface (RS).

Goal: resolve neural overfitting

JMP 13 >> Analyze >> Fit Model

STEAMS category: STATISTICS
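Nesting is needed because the Validation Setting means different things under each method: a proportion (such as 0.2) for Holdback and a fold count (such as 5) for KFold, so one shared slope would mix incompatible scales. The sketch builds one hypothetical model-matrix row in which each method level gets its own Setting column, plus a response-surface term for hidden nodes. It uses simple 0/1 indicator coding; JMP parameterizes nominal effects differently, so its coefficients would not match this layout.

```python
LEVELS = ("Holdback", "KFold")


def model_row(method, setting, hidden_nodes):
    """One row of a hypothetical model matrix with Setting nested in Method."""
    intercept = 1.0
    is_kfold = 1.0 if method == "KFold" else 0.0  # method main effect (indicator coding)
    # Nested effect: a separate Setting column (and slope) for each Method level
    nested = [float(setting) if method == level else 0.0 for level in LEVELS]
    nodes = float(hidden_nodes)
    return [intercept, is_kfold, *nested, nodes, nodes ** 2]  # RS adds the squared term


print(model_row("Holdback", 0.2, 3))  # [1.0, 0.0, 0.2, 0.0, 3.0, 9.0]
print(model_row("KFold", 5, 4))  # [1.0, 1.0, 0.0, 5.0, 4.0, 16.0]
```

With these columns, a least-squares fit estimates how R-square changes per unit of holdback portion and, separately, per additional fold.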

SLIDE 35

Optimal Neural Network Setting

Optimal neural setting:
  • KFold is better than Holdback (small sample size; favors validation).
  • 5 folds (24 products with known Cocoa%, so 24/5 ≈ 5 rows per validation fold)
  • Use Random Seed = 5 to improve reproducibility.
  • 4 hidden nodes is best (constrained by the 15 input variables for one layer).
  • Achieves > 99% R-square fit in predicting Cocoa%.

Annotation: Nested

Future work:
  • R-square > 100%: not following a normal distribution?
  • Wider confidence interval: small validation dataset, or outliers?

JMP 13 >> Analyze >> Fit Y by X

STEAMS category: STATISTICS

SLIDE 36

Understand Neural Optimization (Future Work)

[Figure: training (T) and validation (V) partitions; Holdback portion = 0.2; KFold K = 5, selecting the best of 5 models]

Consider neural overfitting (lower validation R-square):
  • If K is large, each fold is small, making the validation overfitting concern worse.
  • If K is small, the advantage of KFold over Holdback is lost.
  • When the total sample size is small, KFold with a smaller K may be preferred.

Coincidence with four hidden nodes?
  • The optimal neural network uses four hidden nodes to transform the 15 input nutrition variables.
  • Section 2 (variable clustering) also suggests four clusters.
  • Is the neural network related to the PCA eigen algorithm (TanH ~ linear)?

Why KFold over Holdback?

STEAMS category: STATISTICS

SLIDE 37

Neural Model Enhancement

[Figure panels: Original Neural Setting | Optimal DSD Neural Setting]

  • Validation R-square improved by 20%.
  • The optimal setting indicates a 4th chocolate type: fruit chocolate (vitamin C).

JMP 13 >> Analyze >> Predictive Modeling >> Neural

STEAMS category: ENGINEERING

SLIDE 38

Default vs. Optimal Neural: Missing Value Imputation (Predicted Cocoa %)

[Figure panels: Default: Missing Value Imputation | Optimal Neural (Predicted Cocoa %)]

The optimal neural imputation reduces the risk of misclassifying unhealthy dark chocolates.

STEAMS category: ENGINEERING
SLIDE 39

Achievements AND FUTURE RESEARCH

Achievements:
✓ Adopted and integrated the "STEAMS" methodology successfully
✓ Learned about chocolate products, nutrition, and anti-oxidant science
✓ Applied multivariate statistics, clustering, and neural algorithms
✓ Conducted DSD optimization to resolve neural overfitting

Future JMP research:
  • Investigate the "Fruit" chocolate type and outlier effects
  • JMP Pro Partition: Bootstrap Forest, Boosted Tree, K-Nested, Naïve Bayes
  • JMP Pro Neural: Deep Learning, Hidden Layer Structure, Fitting Options
  • Certify JMP Script Specialist

SLIDE 40

Q&A: Thank You

Advisors:
  • JAVA/LaTeX Advisor: Dr. Ying Huang
  • Biology/Writing Advisor: CMQ/OE Patrick Giuliano
  • Biology Advisor: Dr. I-Chen Chen
  • Robotics Advisor: CQE/CRE Roland Jones
  • Statistics/MATH/AI Advisor: Dr. Charles Chen
  • STEAMS Advisor: ASQ Fellow Dr. John Flaig
  • STEAMS Advisors: Stanford OHS
  • STEAMS Advisor: JMP, Chuck Boiler
  • STEAMS Advisor: ASA, Dr. Chris Barker
  • STEAMS Advisors: IEOM, Dr. Ali Ahad; IEOM, Dr. Don Reimer

SLIDE 41

Optimize Desirability Function Importance

Optimize the number of hidden nodes:
  • Higher R-square of the training set at Nodes = 3
  • Higher R-square of the validation set at Nodes = 4
  • Relative importance can change the optimal setting.

There is little room for further improvement by tuning the relative importance of the training set and the validation set.

Conduct a 2-factor, 3-level full factorial comparing the relative importance in the (1, 2) range:
  • Set the desirability function range to (0.999, 0.95, 0.9).
  • In general, the optimal results show a similar trend: 3 hidden nodes favor the training set and 4 hidden nodes favor the validation set.

STEAMS category: ENGINEERING
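JMP's profiler uses its own smooth desirability curves. As a simplified stand-in, the sketch uses piecewise-linear "maximize" desirabilities anchored at the slide's range (0 at 0.9, 0.5 at 0.95, 1 at 0.999), combines the two R-square desirabilities with an importance-weighted geometric mean, and loops over a 2-factor, 3-level full factorial of importance weights in [1, 2]. The two candidate settings and their R-square values are invented.

```python
from itertools import product
from math import prod


def desirability(x, low=0.9, mid=0.95, high=0.999):
    """Piecewise-linear 'maximize' desirability: 0 at low, 0.5 at mid, 1 at high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x < mid:
        return 0.5 * (x - low) / (mid - low)
    return 0.5 + 0.5 * (x - mid) / (high - mid)


def overall_desirability(ds, weights):
    """Importance-weighted geometric mean of individual desirabilities."""
    return prod(d ** w for d, w in zip(ds, weights)) ** (1.0 / sum(weights))


# Hypothetical (training R-square, validation R-square) for two settings
candidates = {"nodes=3": (0.995, 0.93), "nodes=4": (0.975, 0.97)}
levels = (1.0, 1.5, 2.0)  # 3 importance levels for each of the 2 responses
winners = {}
for w_train, w_valid in product(levels, repeat=2):
    scores = {
        name: overall_desirability([desirability(t), desirability(v)], (w_train, w_valid))
        for name, (t, v) in candidates.items()
    }
    winners[(w_train, w_valid)] = max(scores, key=scores.get)
print(winners)
```

With these made-up numbers the same setting wins at all nine weight pairs. This shows how importance weights in a narrow range barely move the optimum when one setting is clearly stronger on validation.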

SLIDE 42

Document Key JMP Scripts

// Partition (decision tree) model for Cocoa_Percent
Partition(
    Y( :Cocoa_Percent ),
    X(
        :Type, :Name( "Calories (g)" ), :Name( "Calories_from_Fat (g)" ), :Name( "Total_Fat (g)" ),
        :Name( "Saturated_Fat (g)" ), :Trans_Fat, :Name( "Cholesterol (mg)" ), :Name( "Sodium (mg)" ),
        :Name( "Carbs (g)" ), :Name( "Dietary_Fiber (g)" ), :Name( "Sugar (g)" ), :Name( "Protein (g)" ),
        :Vitamin_A, :Vitamin_C, :Calcium, :Iron
    ),
    Minimum Size Split( 3 ),
    Validation Portion( 0.6 ),
    Split History( 1 ),
    Informative Missing( 1 ),
    Column Contributions( 1 ),
    Initial Splits( :Name( "Cholesterol (mg)" ) >= 5 ),
    SendToReport( Dispatch( {}, "Partition", FrameBox, {Frame Size( 480, 56 )} ) )
);

// Neural network for Cocoa_Percent: one hidden layer of 4 TanH nodes, 5-fold validation
Neural(
    Y( :Cocoa_Percent ),
    X(
        :Name( "Calories (g)" ), :Name( "Calories_from_Fat (g)" ), :Name( "Total_Fat (g)" ),
        :Name( "Saturated_Fat (g)" ), :Trans_Fat, :Name( "Cholesterol (mg)" ), :Name( "Sodium (mg)" ),
        :Name( "Carbs (g)" ), :Name( "Dietary_Fiber (g)" ), :Name( "Sugar (g)" ), :Name( "Protein (g)" ),
        :Vitamin_A, :Vitamin_C, :Calcium, :Iron, :Type
    ),
    Informative Missing( 0 ),
    Validation Method( "KFold", 5 ),
    Fit( NTanH( 4 ), Diagram( 1 ) )
);

// Nested response-surface model of the DSD results (two R-square responses);
// column names must match the DOE data table exactly
Fit Model(
    Y( :Name( "R-Square of Training Set" ), :Name( "R-Square of Validaiton Set" ) ),
    Effects(
        :Validation Setting[:Vaidation Method], :Vaidation Method, :Random Seed, :Hidden Nodes & RS,
        :Vaidation Method * :Random Seed, :Vaidation Method * :Hidden Nodes,
        :Random Seed * :Hidden Nodes, :Hidden Nodes * :Hidden Nodes
    ),
    Personality( "Standard Least Squares" ),
    Emphasis( "Effect Screening" ),
    Run(
        :Name( "R-Square of Training Set" ) << {Summary of Fit( 0 ), Analysis of Variance( 0 ),
        Lack of Fit( 0 ), Sorted Estimates( 0 ), Plot Actual by Predicted( 1 ), Plot Regression( 0 ),
        Plot Residual by Predicted( 1 ), Plot Studentized Residuals( 1 ), Plot Effect Leverage( 0 ),
        Box Cox Y Transformation( 1 )},
        :Name( "R-Square of Validaiton Set" ) << {Summary of Fit( 0 ), Analysis of Variance( 0 ),
        Lack of Fit( 0 ), Sorted Estimates( 0 ), Plot Actual by Predicted( 1 ), Plot Regression( 0 ),
        Plot Residual by Predicted( 1 ), Plot Studentized Residuals( 1 ), Plot Effect Leverage( 0 ),
        Box Cox Y Transformation( 1 )}
    )
);

JMP 13 >> Save Script >> To Data Table / To Script Window >> Edit/Save/Run Script

STEAMS category: AI