6th WCRI 2019: Effectiveness of data auditing as a tool to reinforce good Research Data Management (RDM) practice



SLIDE 1

6th WCRI 2019 Effectiveness of data auditing as a tool to reinforce good Research Data Management (RDM) practice

Yusuf Ali, Assistant Professor
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
5 June 2019

SLIDE 2

Why focus on RDM?

(Source: Campos-Varela & Ruano-Raviña)

Reason for retraction        Articles, n (%)   Misconduct, n (%)
                                               Yes          No        Uncertain
Plagiarism                   354 (32.7)        354 (100)
Data*                        352 (32.5)        129 (36.6)   1 (0.3)   222 (63.1)
Review process compromised   152 (14.1)        152 (100)

Table 1. Top three reasons for retraction for 1082 retracted papers. Reproduced from Campos‐Varela & Ruano‐Raviña, 2018.
*Data: unreliable results due to honest errors or deliberate fabrication or manipulation of data or images.

SLIDE 3

Why focus on RDM?

(Source: Retraction Watch)

Figure 1. 445 out of 1721 papers were retracted due to improper data management from 1 January 2018 to 10 April 2019. Adapted from Retraction Watch, 2019.

Data management issues (26%)

  • Concerns/issues about data/image
  • Error in data/image
  • Unreliable data/image
  • Non‐reproducible data

Pie chart values: 445 papers with data management issues; 1276 other retracted papers.
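The 26% share quoted for data management issues can be checked from the two counts in Figure 1. A minimal Python sketch; the variable names are illustrative and not from the slides:

```python
# Counts taken from Figure 1 (Retraction Watch, 1 January 2018 to 10 April 2019).
total_retracted = 1721
data_management_issues = 445

# Remaining retractions, matching the second pie chart value.
other_retractions = total_retracted - data_management_issues

# Share of retractions attributed to data management issues, in percent.
share_percent = 100 * data_management_issues / total_retracted

print(f"Other retractions: {other_retractions}")
print(f"Data management share: {share_percent:.1f}%")
```

Rounded to the nearest whole number, the share matches the 26% shown on the slide.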

SLIDE 4

Aim

  • Good Research Data Management (RDM) safeguards data integrity and reproducibility.
  • The data management plan (DMP) was instituted to reinforce RDM.
  • Since 14 April 2016, the release of research funds at NTU has required a DMP.
  • As of July 2018, many research staff and students were unaware of DMPs, and there were no compliance checks on DMPs.

Hypothesis: Audits of DMPs will improve RDM awareness and compliance in the research laboratories.

(Pre‐registered on the Open Science Framework, DOI 10.17605/OSF.IO/694E7)

SLIDE 5

Methods (1): Survey of PIs and Researchers

  • Participants: research PIs (n = 15) and researchers (n = 20).
  • Each group completed the survey pre‐audit and post‐audit (interval shown in the flow diagram: 4 weeks).
  • 12 questions on:
    – awareness of RDM
    – compliance with storing data in the school central data repository
    – receptiveness to DMP
  • Response scale: from least favourable to most favourable reaction to audit.
  • If multiple answers were accidentally chosen, the answer was considered invalid.
  • Analysis: Shapiro‐Wilk test; paired t‐test on total scores (α = 0.05); sign test on individual questions (α = 0.05).
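The sign test used for individual questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the responses below are made up, the function name `sign_test_p_value` is hypothetical, and representing an invalid answer as `None` and dropping that pair is an assumption about how invalid answers were handled.

```python
from math import comb

def sign_test_p_value(pre, post):
    """Two-sided exact sign test on paired ordinal responses.

    Pairs containing an invalid answer (None) or showing no change are dropped.
    """
    diffs = [b - a for a, b in zip(pre, post) if a is not None and b is not None]
    positives = sum(d > 0 for d in diffs)
    negatives = sum(d < 0 for d in diffs)
    n = positives + negatives
    if n == 0:
        return 1.0  # no informative pairs
    k = min(positives, negatives)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical responses to one question (not data from the study).
pre_audit = [3, 3, 2, 4, 3, 2, None]
post_audit = [4, 4, 3, 5, 4, 3, 4]
p_value = sign_test_p_value(pre_audit, post_audit)
print(f"p = {p_value:.5f}")
```

With six valid pairs all moving in the same direction, the two-sided p-value is 2 × (1/2)^6 = 0.03125.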

SLIDE 6

Methods (2): Data usage

  • Our medical school mandates that primary data should be stored in the data repository.
    – This has to be stated in the DMP.
  • Data usage was measured for audited labs (n = 7) and controls (n = 5).
  • Measurement time points:
    – Pre‐audit: 0 weeks
    – Start of audit: 2 weeks
    – End of audit: 4 weeks
    – Post 1 month: 8 weeks
    – Post 3 months: 16 weeks
    – Post 6 months: 28 weeks
  • Analysis: Shapiro‐Wilk test; nparLD (F1‐LD‐F1) (Noguchi et al., 2012), α = 0.05; Friedman test, α = 0.05; sign test with Bonferroni correction, α = 0.017.
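The corrected threshold of α = 0.017 is what a Bonferroni correction gives when α = 0.05 is split across three comparisons. The slide does not state the number of comparisons, so three is an assumption here, as is the helper name `bonferroni_alpha`.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons

# Assumption: three pairwise comparisons, which reproduces the 0.017 on the slide.
adjusted_alpha = bonferroni_alpha(0.05, 3)
print(round(adjusted_alpha, 3))
```

Each individual sign test is then judged against the adjusted threshold rather than against 0.05.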

SLIDE 7

Results (1): Survey

(Individual Question, Research PIs)

Figure 2. Numerical difference in responses for each question between pre‐ and post‐audit for research PIs. (x‐axis: Question, grouped as RDM, Data, DMP.)

Table 2. Results of sign test for audits of research PIs (n ≥ 14).

  • No significant difference in individual questions for research PIs.

SLIDE 8

Results (1): Survey

(Total score, Research PIs)

Figure 3. Graph of mean of total scores of pre‐audit vs post‐audit surveys for research PIs. Error bar represents standard deviation. (y‐axis: Total score.)

* Paired t‐test p = 0.03

  • Audits had an overall positive impact on research PIs.
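A paired t‐test compares each respondent's total score before and after the audit. The following Python sketch computes only the test statistic and degrees of freedom; the scores are made up (the survey data are not given in the slides) and the helper name `paired_t_statistic` is hypothetical. Obtaining a p‐value would additionally need a t‐distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a paired t-test on matched scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical total scores for five respondents (not data from the study).
pre_scores = [40, 42, 38, 45, 41]
post_scores = [43, 44, 39, 49, 41]
t_stat, dof = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_stat:.2f}, df = {dof}")
```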

SLIDE 9

Results (2): Survey

(Individual Question, Researchers)

Figure 4. Numerical difference in responses for each question between pre‐ and post‐audit for researchers. (x‐axis: Question, grouped as RDM, Data, DMP.)

Table 3. Results of sign test for audits of researchers (n ≥ 19).

  • Q8. If storage of research data on the school central data repository is not mandatory, rate how likely you will store research data within it.

SLIDE 10

Results (2): Survey

(Individual Question, Researchers)

  • Researchers felt that they were more likely to store data in the school central data repository system after the audit, even if it is not mandatory.
    – They work with data daily and are in charge of data storage on the repository.
    – They had higher contact time with the auditor during the audit, and most of it was spent on checking the data in the repository.

SLIDE 11

Results (2): Survey

(Total score)

Figure 5. Graph of mean of total scores of pre‐audit vs post‐audit surveys for research staff. Error bar represents standard deviation. (y‐axis: Total score; group: Researchers.)

Paired t‐test p = 0.086

  • Researchers generally gave high scores for the pre‐audit survey.

SLIDE 12

Results (3): Data Usage

Figure 6. Rate of increase of data usage over different time periods. (x‐axis: Time, with periods Pre‐Start, Start‐End, End‐Post 1 Month; y‐axis: Rate of increase (GB/week); series: Audited (n = 7), Control (n = 5).)

Time points: Pre‐audit (0 weeks), Start of audit (2 weeks), End of audit (4 weeks), Post 1 month (8 weeks).

Table 4. Results of nparLD (F1‐LD‐F1).
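The slides do not define how the rate of increase was computed. One plausible reading, taken here as an assumption, is the change in repository usage between consecutive time points divided by the weeks elapsed. The usage figures and the function name below are hypothetical.

```python
def rates_of_increase(weeks, usage_gb):
    """GB/week between consecutive measurements of repository usage."""
    if len(weeks) != len(usage_gb):
        raise ValueError("weeks and usage_gb must have the same length")
    return [
        (u1 - u0) / (w1 - w0)
        for (w0, u0), (w1, u1) in zip(
            zip(weeks, usage_gb), zip(weeks[1:], usage_gb[1:])
        )
    ]

# Time points from the slide: pre-audit, start of audit, end of audit, post 1 month.
weeks = [0, 2, 4, 8]
# Hypothetical repository usage in GB for one lab (not data from the study).
usage_gb = [100.0, 500.0, 520.0, 560.0]

periods = ["Pre-Start", "Start-End", "End-Post 1 Month"]
rates = rates_of_increase(weeks, usage_gb)
for period, rate in zip(periods, rates):
    print(f"{period}: {rate:.1f} GB/week")
```

Dividing by elapsed weeks keeps the 2‐week and 4‐week intervals comparable.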

SLIDE 13

Results (3): Data Usage

Figure 7. Rate of increase of data usage over different time periods. (y‐axis: Rate of increase (GB/week).)

Time points: Pre‐audit (0 weeks), Start of audit (2 weeks), End of audit (4 weeks), Post 1 month (8 weeks).

Friedman test p = 0.013; sign test p = 0.016 and p = 0.016 (two comparisons marked * in the figure).

  • 6 out of 7 audited laboratories did not store data in the school central data repository before the audit and were using alternative forms of storage.
    – Large input of data before the audit

SLIDE 14

Audit Lapses

Figure 8. Common lapses from data audits (n = 17). One did not have any lapse.

  • Missing file name in lab notebook / file name not unique / organisation of folders not robust: 34.1% (14)
  • DMP not updated: 22.0% (9)
  • Storage device accountability log absent / not updated: 14.6% (6)
  • Missing primary data: 12.2% (5)
  • No person in charge of data documentation and training: 7.3% (3)
  • Others (e.g. missing protocols, staff not given access to data repository and irregular backup on data repository): 9.8% (4)
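The percentages in Figure 8 appear to be each lapse count divided by the total number of lapses recorded (14 + 9 + 6 + 5 + 3 + 4 = 41), not by the 17 audits. A short Python check of that reading, using the counts in the order they appear in the figure; the variable names are illustrative:

```python
# Lapse counts in the order given in Figure 8.
lapse_counts = [14, 9, 6, 5, 3, 4]

total_lapses = sum(lapse_counts)

# Each count as a percentage of all recorded lapses, to one decimal place.
percentages = [round(100 * count / total_lapses, 1) for count in lapse_counts]

print(total_lapses)
print(percentages)
```

The computed values agree with the percentages printed in the figure.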

SLIDE 15

Conclusion

  • The audit helped research staff understand the importance of storing data in the data repository.
  • The audit increased the positive outlook of research PIs towards RDM, usage of the data repository and DMP.
  • The audit triggered data storage in the data repository before the audit but did not change the culture of laboratories.
  • Limitations
    – Surveys are not anonymous
    – Selection of controls
    – Different data production patterns

SLIDE 16

Acknowledgements

This research is supported by the Singapore Ministry of Education under its Singapore Ministry of Education Academic Research Fund Tier 1 (RGI03/18).


  • Ms Celine Lee
  • Ms Lau Hui Xing
  • Ms Goh Su Nee
  • Mr Alan Loe

I would like to thank Prof James Best, Prof Russell Gruen and Prof Fabian Lim for their support.

SLIDE 17

References

  1. Campos‐Varela, I., & Ruano‐Raviña, A. (2018). Misconduct as the main cause for retraction. A descriptive study of retracted publications and their authors. Gaceta sanitaria. doi:10.1016/j.gaceta.2018.01.009
  2. Retraction Watch. (2019). The Retraction Watch Database. Retrieved 10 April 2019 from http://retractiondatabase.org/RetractionSearch.aspx
  3. Noguchi, K., Gel, Y. R., Brunner, E., & Konietschke, F. (2012). nparLD: an R software package for the nonparametric analysis of longitudinal data in factorial experiments. Journal of Statistical Software, 50(12).

SLIDE 18

Thank you
