Extracting Metadata from Stata Datasets: Suzanna Vidmar and Luke Stevens


slide-1
SLIDE 1

Extracting Metadata from Stata Datasets

Suzanna Vidmar and Luke Stevens
Clinical Epidemiology and Biostatistics Unit
Murdoch Children’s Research Institute

slide-2
SLIDE 2

Data sharing and storage

  • To enable data sharing, the data should be stored in a format that does not require a particular version of a particular statistical package
  • At the conclusion of a study, data should be stored in a retrievable format, and not one that may become obsolete
  • The safest retrievable format is to have the data stored in CSV or text files
  • Stata’s export delimited command writes data from a Stata dataset to a text file
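
As a minimal sketch of the last point, the lines below write a dataset to CSV and read it back. The auto dataset (shipped with Stata) and the file name auto.csv are stand-ins chosen for illustration; they are not from the talk.

```stata
* auto.dta ships with Stata; any dataset in memory would do
sysuse auto, clear

* write the data to a comma-separated text file (file name is illustrative)
export delimited using "auto.csv", replace

* read the text file back in to confirm it can be opened on its own
import delimited using "auto.csv", varnames(1) clear
describe
```

By default export delimited writes the value labels of labeled variables rather than their numeric codes (the nolabel option writes the codes instead). Either way, variable labels and the code-to-label mapping do not travel with the CSV file, which is why a separate data dictionary is needed.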

slide-3
SLIDE 3

But what do the data mean?

Without a description of the data, the data file is of limited use

slide-4
SLIDE 4

Metadata

  • Metadata is data that describes other data
  • My focus is on variable-level metadata, also known as a data dictionary
  • Examples of variable-level metadata are data types, variable labels and value labels

Metadata is a love note to the future

slide-5
SLIDE 5

Extracting the data dictionary from Stata

filename.CSV

slide-6
SLIDE 6

But wait, there’s more!

Data and metadata can be imported into data capture software such as REDCap

slide-7
SLIDE 7

Features of REDCap

  • Secure, web-based application for research databases and surveys
  • Very easy to use
  • Audit trail
  • User permission controls
  • Data quality measures
  • Data export to statistical software
  • Generate summary report and letters

https://projectredcap.org/

slide-8
SLIDE 8

Building a REDCap database

  • As with all data capture software, data entry forms can be developed within REDCap
  • A REDCap database can also be built by uploading an external data dictionary

slide-9
SLIDE 9

metadatacsv.ado

slide-10
SLIDE 10

Example using metadatacsv.ado

example.dta dict_example.csv

slide-11
SLIDE 11

Directory and file name

describe, replace
local fullpath: char _dta[d_filename]
mata: st_local("fullname", pathbasename("`fullpath'"))
local length=strpos("`fullname'",".")-1
local filestub=substr("`fullname'",1,`length')

slide-15
SLIDE 15

Directory and file name

describe, replace
local fullpath: char _dta[d_filename]

  • di "`fullpath'"
  • C:\Users\suzanna.vidmar\Documents\Suzanna\Metadata\example.dta

mata: st_local("fullname", pathbasename("`fullpath'"))

  • di "`fullname'"
  • example.dta

local length=strpos("`fullname'",".")-1

  • di "`length'"
  • 7

local filestub=substr("`fullname'",1,`length')

  • di "`filestub'"
  • example
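
The steps above cut the name at the first full stop, so a file called my.data.dta would give the stub my. The sketch below is a possible variant, not the authors' code: it uses Mata's pathrmsuffix(), which removes only the final suffix. The auto dataset is a stand-in for example.dta.

```stata
* stand-in dataset; the talk uses example.dta
sysuse auto, clear

* replace the data in memory with one row per variable
describe, replace clear

* full path of the original file, stored as a dataset characteristic
local fullpath : char _dta[d_filename]

* strip the directory, then strip only the last suffix (".dta")
* st_local("fullpath") reads the macro inside Mata, which avoids
* re-expanding a Windows path that contains backslashes
mata: st_local("filestub", pathrmsuffix(pathbasename(st_local("fullpath"))))

display "`filestub'"
```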
slide-16
SLIDE 16

Saving data dictionary

export delimited "dict_`filestub'.csv", replace

Saves the data file: dict_example.csv

slide-17
SLIDE 17

describe, replace

  • describe usually produces a written report
  • When the replace option is specified, instead of a report the data in memory are replaced with a dataset containing the information that would have been presented in the report. The new dataset has an observation for each variable in the original data.
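
A short sketch of this behaviour, using the auto dataset shipped with Stata as a stand-in for the study data:

```stata
* stand-in dataset; not the example.dta used in the talk
sysuse auto, clear

* swap the data in memory for a dataset describing its variables
describe, replace clear

* one observation per original variable: name, storage type,
* display format, attached value label and variable label
list name type format vallab varlab in 1/5, noobs abbreviate(12)
```

The resulting dataset also holds position and isnumeric, so it already covers most of a variable-level data dictionary apart from the contents of the value labels.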
slide-18
SLIDE 18

[Figure: output of describe alongside output of describe, replace]

slide-19
SLIDE 19

uselabel

Creates a dataset containing value-label information
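
For illustration only, this is what uselabel produces for the auto dataset shipped with Stata (a stand-in for the study data):

```stata
* stand-in dataset with one value label (origin)
sysuse auto, clear

* replace the data in memory with one row per value-label entry
uselabel, clear

* lname = label name, value = numeric code, label = text shown for that code
list lname value label, noobs
```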

slide-20
SLIDE 20

Extracting value label names

gen recnum=_n

  • recnum contains the number of the current observation

levelsof lname, local(levels)
`"coblab"' `"genderlab"' `"noyes"'

  • These are stored in the local macro `levels'
slide-21
SLIDE 21

Creating the contents of each value label

foreach x of local levels {
    local fullab
    qui su recnum if lname=="`x'"
    local j=r(min)
    local k=r(max)
    forval i=`j'/`k' {
        local val=value[`i']
        local lab=label[`i']
        local fullab `fullab' `val', `lab' |
    }
    local lenlab=strlen("`fullab'")-2
    local fullab=substr("`fullab'",1,`lenlab')
}
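
The loop can be tried end to end with the sketch below. It follows the same logic as the slide but is self-contained: it uses the auto dataset shipped with Stata in place of example.dta, and it displays each result rather than writing it to a file, which is an assumption made for illustration.

```stata
* stand-in dataset; uselabel leaves one row per value-label entry,
* sorted by label name and value
sysuse auto, clear
uselabel, clear

gen recnum = _n
levelsof lname, local(levels)

foreach x of local levels {
    local fullab
    * first and last row belonging to this value label
    quietly summarize recnum if lname == "`x'"
    local j = r(min)
    local k = r(max)
    forvalues i = `j'/`k' {
        local val = value[`i']
        local lab = label[`i']
        * append "value, label |" to the running string
        local fullab `fullab' `val', `lab' |
    }
    * drop the trailing " |"
    local lenlab = strlen("`fullab'") - 2
    local fullab = substr("`fullab'", 1, `lenlab')
    display "`x': `fullab'"
}
```

The "value, label | value, label" layout is the one built up on the coblab slides.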

slide-23
SLIDE 23

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=1

  • 1, Missing |
slide-24
SLIDE 24

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=2

  • 1, Missing | 1, Australia |
slide-25
SLIDE 25

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=3

  • 1, Missing | 1, Australia | 2, United Kingdom |
slide-26
SLIDE 26

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=4

  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam |
slide-27
SLIDE 27

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=5

  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China |
slide-28
SLIDE 28

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=6

  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore |
slide-29
SLIDE 29

Example with coblab

forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}

`i'=7

  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |
slide-30
SLIDE 30

Example with coblab

foreach x of local levels {
    …
    forval i=`j'/`k' {
        local val=value[`i']
        local lab=label[`i']
        local fullab `fullab' `val', `lab' |
    }
    local lenlab=strlen("`fullab'")-2
    local fullab=substr("`fullab'",1,`lenlab')
}

  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |
  • 1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand
slide-31
SLIDE 31

Allowing for extremely long strings

tempname mem
file write `mem' "`x'" _tab "`fullab'" _newline

  • file allows for extremely long string values, up to 2 billion characters
  • With postfile the limit is 2045 characters
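
The slide shows only the file write step. A minimal sketch of the surrounding steps might look like this; the output file name labels_demo.txt and the hard-coded macro contents are assumptions for illustration, and the open and close calls are not shown in the talk.

```stata
* handle name that cannot clash with other open files
tempname mem

* open a text file for writing (file name is illustrative)
file open `mem' using "labels_demo.txt", write text replace

* stand-ins for the macros built in the value-label loop
local x coblab
local fullab 1, Australia | 2, United Kingdom

* one line per value label: name, tab, "value, label | value, label"
file write `mem' "`x'" _tab "`fullab'" _newline

file close `mem'

* show what was written
type "labels_demo.txt"
```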
slide-32
SLIDE 32

One week after submitting my abstract for this meeting …

slide-33
SLIDE 33
slide-34
SLIDE 34

Beaten to the punch

Seth Lirette et al
Alfred Russel Wallace

slide-35
SLIDE 35

metadatacsv.ado

slide-36
SLIDE 36

The redcapture command

slide-37
SLIDE 37

redcapture syntax

redcapture varlist, file(string) form(string)
    [text(varlist) dropdown(varlist) radio(varlist) header(string)
    validate(varlist) validtype(validtypes)
    validmin(minlist) validmax(maxlist)
    matrix1(varlist) matrix2(varlist) matrix3(varlist) matrix4(varlist)
    matrix5(varlist) matrix6(varlist) matrix7(varlist) matrix8(varlist)
    matrix9(varlist) matrix10(varlist)]

slide-38
SLIDE 38

First, some background on

slide-39
SLIDE 39

REDCap field types

slide-40
SLIDE 40

REDCap validations for text fields

slide-41
SLIDE 41

Capturing categorical data in REDCap

slide-42
SLIDE 42

Example Stata dataset

slide-43
SLIDE 43

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

  • Metadata are saved in example.csv. This is the data dictionary that will be uploaded to REDCap.
  • The form/instrument name in REDCap is example_form
  • Its header is "Example"
slide-44
SLIDE 44

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

For categorical variables. They must be numeric with value labels attached.

slide-45
SLIDE 45

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

  • These are text fields
  • All variables in the validate() option must be declared as text fields

slide-46
SLIDE 46

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

  • id is a social security number
  • bdate is a date field in YMD format
  • dbp is an integer
  • comment is a string

slide-47
SLIDE 47

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

  • To omit range checks for any or all of the validation variables, "none" should be entered into the corresponding location
  • These are soft checks

slide-48
SLIDE 48

Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

  • Radio fields with a common set of response options can be grouped in a matrix
  • See next slide

slide-49
SLIDE 49

Matrix of fields

slide-50
SLIDE 50

Data dictionary

The redcapture command created this data dictionary …
which can be uploaded into REDCap

slide-51
SLIDE 51

In conclusion …

  1. Ensure data will be retrievable 10 or 20 years from now
  2. Ensure the next generation of researchers will be able to understand currently archived data

How? By storing both data and metadata in text files

Stata's export delimited and redcapture commands facilitate this

Data and metadata can be uploaded to data capture software such as REDCap