
A Systematic Literature Review on Evaluation of Digital Tools for Authoring Evidence-based Clinical Guidelines

Soudabeh KHODAMBASHI and Øystein NYTRØ Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway

Abstract. To facilitate the clinical guideline (GL) development process, different groups of researchers have proposed tools that enable computer-supported authoring and publishing of GLs. In a previous study we interviewed GL authors in different Norwegian institutions and identified tool shortcomings. In this follow-up study our goal is to explore to what extent GL authoring tools have been evaluated by researchers, guideline organisations, or GL authors. This article presents results from a systematic literature review of evaluation (including usability) of GL authoring tools. A controlled database search and backward snowballing were used to identify relevant articles. Of the 12692 abstracts found, 188 papers were fully reviewed and 26 papers were identified as relevant. The GRADEPro tool has attracted some evaluation; however, popular tools and platforms such as DECIDE, Doctor Evidence, JBI-SUMARI and the G-I-N library have not been subject to specific evaluation from an authoring perspective. We therefore found that little attention has been paid to the evaluation of the tools in general. We could not find any evaluation of how tools integrate with and support the complex GL development workflow. The results of this paper are highly relevant to GL authors, tool developers and GL publishing organisations seeking to improve and control the GL development and maintenance process.

Keywords. Evidence-based medicine, clinical guidelines, evaluation

Introduction

Clinical guidelines (GL) are "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances" [1]. To facilitate the GL development process, different groups of researchers have proposed and developed tools that enable computer-supported (digital) authoring and publishing [2, 3]. These tools partially or fully support the GL development process. Different organisations seem to employ varying strategies and methods in the GL authoring workflow. The reported software tools also vary in their functionalities and features. Hence, there is no "standard" tool (set) for GL development. Some of the tools focus on the GL development and maintenance process, but also support publishing, presentation and dissemination [4].

In our previous case study, we identified substantial shortcomings of GL tools (including content management systems (CMS)) in a total of four organisations in Norway [5]. The study was based on interviews and observations of authors maintaining digital GLs. As part of that empirical study, we concluded that a review of


tool evaluation was necessary. Hence, this paper systematically reviews the literature regarding evaluation of GL development tools identified and discussed in our previous studies (see Section 2) [2, 3]. We note that our literature search did not include software tools for developing computer-interpretable/executable GLs.

1. Digital Tools for Authoring Clinical Guidelines

In our previous review, based on a systematic literature search using PubMed and Google Scholar [3], contact with Norwegian GL authoring organisations [2, 5], and overviews made by the Guidelines International Network (G-I-N) [6], we identified a total of 21 unique tools and platforms/repositories supporting GL authoring. We categorised the identified tools according to the parts of the GL authoring workflow they cover [3]. Figure 1 presents the identified tools and their intended process coverage. Håndboka, 'The Handbook', is one representative (Norwegian) CMS designed specifically for GL authoring.

Figure 1. Tools supporting guideline authoring process.
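As a rough sketch of the categorisation just described, tool coverage can be modelled as a mapping from tools to the workflow stages they cover. The tool names and coverage sets below are invented for illustration and are not taken from Figure 1; the stage names follow the GL authoring steps discussed later in the paper.

```python
# Hypothetical sketch: categorising tools by workflow-stage coverage.
# Tool names and coverage sets are INVENTED, not taken from Figure 1.

STAGES = [
    "identify scope", "plan development", "systematic review",
    "appraise evidence", "extract and synthesise",
    "develop recommendations", "draft GL", "publishing layout",
]

# Invented coverage, for illustration only
coverage = {
    "ToolA": {"systematic review", "appraise evidence"},
    "ToolB": {"develop recommendations", "draft GL", "publishing layout"},
}

def uncovered_stages(coverage, stages=STAGES):
    """Return workflow stages not covered by any tool in the mapping."""
    covered = set().union(*coverage.values()) if coverage else set()
    return [s for s in stages if s not in covered]

print(uncovered_stages(coverage))
```

A mapping like this makes gaps in tool support immediately visible, which is the point of the categorisation in Figure 1.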

2. Material and Methods

To find published evaluations of these tools, we searched PubMed and Google Scholar, as shown in Figure 2, using the tool names as search criteria. The most recent search was conducted in March 2017. Titles and abstracts of all hits were retrieved and screened. Where the title and abstract of a paper were unclear or ambiguous, we screened the full text. After screening for relevance and removing duplicates, 182 papers were retained for further review. To find further relevant literature, we used backward snowballing, i.e. checking the papers cited by these 182 papers. This added 6 papers for full review. After screening the resulting 188 papers, only 26 were identified as relevant for inclusion in this study.
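The backward snowballing step described above can be sketched as a single pass over the reference lists of the retained papers. The paper IDs and citation map below are invented for illustration; they are not from the review itself.

```python
# Illustrative sketch of one round of backward snowballing.
# Paper IDs and the citation map are INVENTED, not from the review.

def backward_snowball(included, references):
    """Return papers cited by the included set but not already in it."""
    cited = set()
    for paper in included:
        cited.update(references.get(paper, []))
    return cited - set(included)

# Toy citation map: each retained paper maps to the papers it cites
refs = {
    "paper-1": ["paper-4", "paper-5"],
    "paper-2": ["paper-5", "paper-6"],
    "paper-3": [],
}
extra = backward_snowball(["paper-1", "paper-2", "paper-3"], refs)
print(sorted(extra))  # extra candidates queued for full review
```

Each paper found this way is then screened in full, exactly as the 6 snowballed papers were in this study.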


Figure 2. Selection process of articles.

3. Results

Table 1 presents a compressed review of the evaluation reported for each tool, showing the focus of each evaluation. Some of the tools mentioned in Figure 1 (the DECIDE tool, the Doctor Evidence platforms, JBI-SUMARI, and the G-I-N guideline library) were not found to be the subject of any evaluation and are therefore not present in the table.

Table 1. Details of evaluation of each tool (based on Figure 1).

MAGICapp: how it supports the World Health Organisation (WHO) GL authoring process (functions and features) [2]; feedback from users on how it supports them in the authoring process, covering functionalities such as collaboration, communication, version control, archiving, reference management, support for standard terminologies (ICPC, ICD, SNOMED-CT, ATC, RxNorm, MeSH), management of users' feedback, guideline templates, drawing of workflow-based GLs, and export file formats, in one organisation [5]; feedback on restructuring available recommendations into the multilayered format supported by MAGICapp [7].

GRADEPro: support of the WHO GL development process [2]; software tested with users in workshops and in authoring processes [8]; feedback from GL methodologists about the Evidence to Decision (EtD) framework [9]; assessment of the GRADE methodology and the tool for systematic reviews (SR) in terms of inter-rater agreement, identifying areas of uncertainty [10]; formative evaluation during the design and update of the EtD framework, followed by summative user testing [11]; assessment of the effects of formatting alternatives in GRADE evidence profiles on GL panelists' preferences, comprehension, and accessibility [12]; evaluation of how authors explain the reasons for downgrading and upgrading the strength of recommendations according to the GRADE methodology, and where guidance is needed to create concise, clear, accurate, and relevant explanatory footnotes and comments [13]; testing of what information to include in evidence summary tables and how to present it [14].

Internet Portal for guideline development: feedback on using the tool and its effect on reducing development time and cost [15].

Håndboka: support of the WHO GL development process [2].


BRIDGE-Wiz: support of the WHO GL development process [2]; perceived usefulness and usability [16].

CAN-IMPLEMENT: comparison with ADAPTE (a methodological template for guideline adaptation) in the guideline adaptation process [17].

RevMan 5: comparison with EPPI-Reviewer 4 in performing SRs in software engineering [18]; brief review in comparison with Abstrackr and EPPI-Reviewer 4 in SRs (not an evaluation) [19].

DistillerSR: brief review in comparison with Abstrackr and RevMan in performing SRs (not an evaluation) [19].

Rayyan: beta testing of different features of the tool, with a survey of users collecting feedback through a built-in feature [20]; comparison of Rayyan to Covidence with regard to features, usability, compatibility, strengths, and weaknesses [21]-[22]; evaluation of whether the tool is robust enough to capture most errors in real-world data sets, identifying quantitative and qualitative errors in the tool [23].

SRDR: examination of major non-technical challenges related to future governance of the repository, as well as user certification, data curation and quality control, and intellectual property rights [24]; error rates and the time taken to complete abstraction [25]; comparison of Abstrackr and SRDR, discussing their advantages and disadvantages [26].

EPPI-Reviewer 4: comparison with RevMan in performing SRs in software engineering [18]; brief review in comparison with Abstrackr and RevMan in performing SRs (not an evaluation) [19].

CREBP SRA: evaluation of the effectiveness of the tool in duplicate detection compared to EndNote for SRs [27].

Covidence: comparison with Rayyan regarding features, usability, compatibility, strengths, and weaknesses [21]-[22].

Epistemonikos: investigation of whether Epistemonikos can replace The Cochrane Library by comparing the results of similar search strategies in both sources [28].

Abstrackr: comparison of Abstrackr and SRDR, their advantages and disadvantages [26]; survey of the literature on the extent to which it reduces workload, and the tool's limitations (its effect on recall) [29]; evaluation of the performance of Abstrackr (accuracy "according to the number of relevant studies missed, the workload saving, the false negative rate, and the precision of the algorithm to correctly predict relevant studies for inclusion") [30].
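The screening-performance measures quoted from [30] (relevant studies missed, workload saving, false negative rate, precision) can be illustrated with a small sketch. The confusion counts below are invented, and the workload-saving formula (the share of abstracts the reviewer would not need to screen manually) is one common definition assumed here, not necessarily the exact formula used in [30].

```python
# Illustrative sketch (counts INVENTED): the screening-performance measures
# named in the Abstrackr evaluation [30]. The workload-saving formula is an
# assumed common definition, not necessarily the one used in that study.

def screening_metrics(tp, fp, fn, tn):
    """tp/fp/fn/tn: confusion counts for 'predicted relevant' vs. truth."""
    total = tp + fp + fn + tn
    return {
        "recall": tp / (tp + fn),               # relevant studies found
        "precision": tp / (tp + fp),            # predictions that are relevant
        "false_negative_rate": fn / (tp + fn),  # relevant studies missed
        "workload_saving": (tn + fn) / total,   # abstracts left unscreened
    }

m = screening_metrics(tp=90, fp=110, fn=10, tn=790)
print(m)
```

The tension these measures capture is the one raised in [29]: a higher workload saving is only acceptable if the false negative rate (relevant studies missed) stays low.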

4. Discussion and Conclusion

Based on the results given in Table 1, GRADEPro has attracted the most attention for evaluation, whilst evaluation of the DECIDE tool, the Doctor Evidence platforms, JBI-SUMARI and the G-I-N guideline library was not addressed at all. In some of the reviewed articles ([8], [11], [15], [19]) the details of the evaluation have not been thoroughly reported, so it is unclear how the evaluation was conducted. We could not see any formative evaluation during the design of any of the tools, except in [11]. Although we found comparisons of tools for performing SRs, the articles did not evaluate the tools as such; rather, they reviewed their features [19] or explored the application of the tools in other domains such as software engineering [18]. Hence, little attention was paid to the


evaluation of the tools specifically for SRs. From the results we identified a total of seven evaluation themes: 1) full support of the GL development process ([2]); 2) usefulness of tool(s) in practice, regarding implemented features or perceived usefulness and usability ([5], [16], [21]-[22]); 3) formatting alternatives ([7], [12], [14]); 4) testing or evaluating the implemented functions and features ([8], [9], [10], [11], [13], [20], [21], [22], [23], [24]); 5) tool performance, i.e. error rate, time taken, duplicate detection, recall, and workload reduction ([25], [27], [29], [30]); 6) the effect of using a tool on reducing development time and cost ([15]); and 7) comparison of tools ([17], [18], [19], [21], [22], [26], [27], [28]). Although the effects of formatting alternatives (i.e. multilayered presentation, how to present a summary of the evidence in a table) have been evaluated for MAGICapp and GRADEPro, we believe that optimising the presentation of GLs and numeric data is not only important during GL authoring but can also benefit GL users, according to the results in [31]. Full support of the GL development process has been the subject of one study [2], in which MAGICapp, GRADEPro, and Håndboka were evaluated against the WHO guideline manual. However, we could not find any

other reports evaluating tool support for GL manuals published by other organisations.

Although there are software tools designed to support specific aspects of the GL authoring process, not all of them are commonly used. Furthermore, to date, little is known about the usefulness of these tools in practice, and frequently only speculation concerning their potential use for GL authoring is reported. It is also important to evaluate how tools that do not fully support the authoring process can be integrated with regard to data exchange. We envisage an optimised process in which authors can export data/information from one tool to another in order to complete the process of GL authoring (i.e. identify the scope, plan the development, perform the systematic review, appraise the evidence, extract and synthesise data, develop recommendations, draft the GL, and create a publishing layout). Therefore, not only is the evaluation of a single tool in itself necessary, it is also important to evaluate the integration, regarding data exchange (import/export), of the tools that are supposed to support and streamline the different steps in GL development. To aid tool development and help tools become more mature, we encourage GL authors and authoring organisations to report back on their experiences regarding tool use. To this end, the results of this paper may be useful to other GL developers as well as GL authoring organisations.

References

[1] Field MJ, Lohr KN. Guidelines for clinical practice: from development to use. National Academies Press; 1992.
[2] Khodambashi S, Nytrø Ø. Information System Support for the WHO Clinical Guideline Development Process: A Case Study Approach. ECIME 2016: 10th European Conference on IS Management and Evaluation; 2016 (accepted, under publication). Academic Conferences and Publishing Limited.
[3] Khodambashi S, Nytrø Ø. Reviewing Clinical Guideline Development Tools: Features and Characteristics. 2017:30.
[4] Bernstam E, Ash N, Peleg M, Tu S, Boxwala AA, Mork P, Shortliffe EH, Greenes RA. Guideline classification to assist modeling, authoring, implementation and retrieval. Proceedings of the AMIA Symposium; 2000. American Medical Informatics Association.
[5] Khodambashi S, Nytrø Ø. Tool Support for Maintaining Clinical Guidelines: A Case Study. ECIME 2015: 9th European Conference on IS Management and Evaluation; 2015. Academic Conferences and Publishing Limited.


[6] G-I-N. Guideline process: useful tools (list of useful tools for the guideline process, their features and cross-compatibilities). 2015 [cited 2016 May]. Available from: https://docs.google.com/spreadsheets/d/1XE-zbU-FqK08nFyfhvB17tXeq_r2j5BRAeZZ4Xa5Axg/edit#gid=0.
[7] Kristiansen A, Brandt L, Agoritsas T, Akl EA, Berge E, Jacobsen AF, Granan L-P, Halvorsen S, Guyatt G, Vandvik PO. Applying new strategies for the national adaptation, updating, and dissemination of trustworthy guidelines: results from the Norwegian adaptation of the antithrombotic therapy and the prevention of thrombosis, American College of Chest Physicians evidence-based clinical practice guidelines. CHEST Journal. 2014;146(3):735-61.

[8] Brozek J, Akl E, Falck-Ytter Y, Kunstman P, Meerpohl J, Mustafa R, Nowak A, Oxman A, Santesso N, Wiercioch W. 046 Guideline Development Tool (GDT): web-based solution for guideline developers and authors of systematic reviews. BMJ Quality & Safety. 2013;22(Suppl 1):A26-A.
[9] Neumann I, Brignardello-Petersen R, Wiercioch W, Carrasco-Labra A, Cuello C, Akl E, Mustafa RA, Al-Hazzani W, Etxeandia-Ikobaltzeta I, Rojas MX. The GRADE evidence-to-decision framework: a report of its testing and application in 15 international guideline panels. Implementation Science. 2016;11(1):93.
[10] Hartling L, Fernandes RM, Seida J, Vandermeer B, Dryden DM. From the trenches: a cross-sectional study applying the GRADE tool in systematic reviews of healthcare interventions. PLoS ONE. 2012;7(4):e34697.
[11] Schünemann HJ, Mustafa R, Brozek J, Santesso N, Alonso-Coello P, Guyatt G, Scholten R, Langendam M, Leeflang MM, Akl EA. GRADE guidelines: 16. GRADE evidence to decision frameworks for tests in clinical practice and public health. Journal of Clinical Epidemiology. 2016;76:89-98.
[12] Vandvik PO, Santesso N, Akl EA, You J, Mulla S, Spencer FA, Johnston BC, Brozek J, Kreis J, Brandt L. Formatting modifications in GRADE evidence profiles improved guideline panelists' comprehension and accessibility to information: a randomized trial. Journal of Clinical Epidemiology. 2012;65(7):748-55.
[13] Langendam M, Carrasco-Labra A, Santesso N, Mustafa RA, Brignardello-Petersen R, Ventresca M, Heus P, Lasserson T, Moustgaard R, Brozek J. Improving GRADE evidence tables part 2: a systematic survey of explanatory notes shows more guidance is needed. Journal of Clinical Epidemiology. 2016;74:19-27.
[14] Mustafa RA, Wiercioch W, Santesso N, Cheung A, Prediger B, Baldeh T, Carrasco-Labra A, Brignardello-Petersen R, Neumann I, Bossuyt P. Decision-making about healthcare-related tests and diagnostic strategies: user testing of GRADE evidence tables. PLoS ONE. 2015;10(10):e0134553.
[15] Höhne W, Karge T, Siegmund B, Preiss J, Hoffmann J, Zeitz M, Fölsch U. An internet portal for the development of clinical practice guidelines. Applied Clinical Informatics. 2010;1(4):430.
[16] Shiffman RN, Michel G, Rosenfeld RM, Davidson C. Building better guidelines with BRIDGE-Wiz: development and evaluation of a software assistant to promote clarity, transparency, and implementability. Journal of the American Medical Informatics Association. 2012;19(1):94-101.
[17] Harrison MB, Graham ID, Fervers B, van den Hoek J. Adapting knowledge to a local context. In: Knowledge translation in health care: moving from evidence to practice. 2009:73-82.
[18] Marshall C, Brereton P, Kitchenham B. Tools to support systematic reviews in software engineering: a cross-domain survey using semi-structured interviews. Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering; 2015. ACM.
[19] Cook DA, West CP. Conducting systematic reviews in medical education: a stepwise approach. Medical Education. 2012;46(10):943-52.
[20] Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan: a web and mobile app for systematic reviews. Systematic Reviews. 2016;5(1):210.
[21] Couban R. Covidence and Rayyan. Journal of the Canadian Health Libraries Association / Journal de l'Association des bibliothèques de la santé du Canada. 2016;37(3).
[22] Clarke K. Medically Assisted Death in Canada: Unsettled (and Unsettling?) Law. 2016;37(3):118.
[23] Abedjan Z, Chu X, Deng D, Fernandez RC, Ilyas IF, Ouzzani M, Papotti P, Stonebraker M, Tang N. Detecting data errors: where are we and what needs to be done? Proceedings of the VLDB Endowment. 2016;9(12):993-1004.
[24] Lau J, Hadar N, Iovin R, Ip S, Balk EM. Proposed governance and data management policy for the Systematic Review Data Repository. Agency for Healthcare Research and Quality (US), Rockville (MD); 2012.
[25] Saldanha IJ, Schmid CH, Lau J, Dickersin K, Berlin JA, Jap J, Smith BT, Carini S, Chan W, Bruijn B. Evaluating Data Abstraction Assistant, a novel software application for data abstraction during systematic reviews: protocol for a randomized controlled trial. Systematic Reviews. 2016;5(1):196.


[26] Bonardi A, Clifford CJ, Hadar N. A structured approach using the Systematic Review Data Repository (SRDR): building the evidence for oral health interventions in the population with intellectual and developmental disability. Evaluation Review. 2016:0193841X16664811.
[27] Rathbone J, Carter M, Hoffmann T, Glasziou P. Better duplicate detection for systematic reviewers: evaluation of the Systematic Review Assistant-Deduplication Module. Systematic Reviews. 2015;4(1):6.
[28] Stromme H, Straumann GH, Kirkehei I, Heintz M, Hafstad E. Can Epistemonikos replace The Cochrane Library?
[29] Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Systematic Reviews. 2014;3(1):74.
[30] Rathbone J, Hoffmann T, Glasziou P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Systematic Reviews. 2015;4(1):80.
[31] Khodambashi S, Nytrø Ø. Usability evaluation of published clinical guidelines on the web: a case study. 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS); 2016. IEEE.