The Tools of the Trade: How to Find or Create the Evaluation Tools You Need

Dan A. McDonald and Donna J. Peterson, CYFERnet Evaluation


SLIDE 1

The Tools of the Trade: How to Find or Create the Evaluation Tools You Need

Dan A. McDonald
Donna J. Peterson
CYFERnet Evaluation
The University of Arizona

SLIDE 2

CYFERnet Evaluation Web Resources

• Designing a Program Evaluation
• Process Evaluation Tools and Instruments
• Outcome Evaluation Tools and Instruments
• Data Analysis and Reporting
• Evaluating Early Childhood Outcomes
• Evaluating School Age Outcomes
• Evaluating Teen Outcomes
• Evaluating Parent/Family Outcomes
• Evaluating Community Outcomes
• Evaluating Organizational Capacity
• Evaluating Technology Use
• Evaluating Program Sustainability
• Building Capacity for Evaluation

SLIDE 3

CYFERnet Evaluation Resources

SLIDE 4

Reliability and Validity

A Quick Reading Assessment
• How reliable is this measure?
• How valid is this measure?

SLIDE 5

Reliability and Validity

• Reliability: Are things being measured consistently?
• Validity: Are we measuring what we think we are?
• Bathroom scale example

SLIDE 6

SLIDE 7

Why are these concepts important?

Without the agreement of independent observers able to replicate research/evaluation procedures, or the ability to use research tools and procedures that yield consistent measurements, researchers/evaluators would be unable to satisfactorily draw conclusions, formulate theories, or make claims about the generalizability of their work.

SLIDE 8

Reliability

• Extent to which an experiment, test, or any measuring procedure yields the same result when repeated
• Refers to the precision of a measurement
• Are things being measured consistently?

SLIDE 9

Four Types of Reliability

• Equivalency or Parallel Forms Reliability
• Stability or Test-Retest Reliability
• Internal Consistency Reliability
• Interrater or Interobserver Reliability

SLIDE 10

Equivalency or Parallel Forms Reliability

• Extent to which two items/sets of scores measure identical concepts at an identical level of difficulty
• Two different instruments designed to measure identical constructs are developed and the degree of relationship (correlation) assessed
• The higher the correlation coefficient, statistically referred to as r, the better
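The correlation step the slide describes can be sketched in a few lines of Python. The two score lists below are invented for illustration; the same calculation applies to any pair of parallel forms completed by the same respondents.

```python
import statistics as st

# Invented scores: the same eight respondents complete Form A and Form B
# of an instrument that is meant to measure the same construct.
form_a = [12, 15, 11, 18, 14, 16, 10, 17]
form_b = [13, 16, 10, 19, 15, 15, 11, 18]

def pearson_r(x, y):
    """Pearson correlation coefficient r between two score lists."""
    mx, my = st.mean(x), st.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# An r close to 1 suggests the two forms are equivalent.
print(f"parallel-forms r = {pearson_r(form_a, form_b):.2f}")
```

The same function applies to test-retest reliability: correlate the first and second administrations instead of Form A and Form B.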

SLIDE 11

Stability or Test-Retest Reliability

• Consistency of repeated measurements on the same subjects
• To determine stability, a measure or test is repeated on the same subjects at two different times and the results are correlated
• Two possible drawbacks:
  1. A person may have changed between the first and second measurement
  2. The initial administration of an instrument might in itself induce a person to answer differently on the second administration ("practice effect")

SLIDE 12

Internal Consistency Reliability

• Extent to which tests or procedures assess the same characteristic, skill, or quality
• Do the items in a measure correlate highly?
• Cronbach's alpha is used to show how well the items complement each other in measuring different aspects of the same variable
  – alpha reliabilities above .70 are considered good
• Helps researchers interpret data and predict the value of scores and the limits of the relationship among variables
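Cronbach's alpha can be computed directly from the item variances and the variance of respondents' total scores. A minimal sketch, using an invented 5-respondent, 4-item response matrix:

```python
import statistics as st

# Invented data: rows are respondents, columns are items on a 1-5 scale.
responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])                         # number of items
    items = list(zip(*rows))                 # one column of scores per item
    item_vars = [st.variance(col) for col in items]
    total_var = st.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# For this toy matrix alpha is about 0.93, above the .70 rule of thumb.
print(f"alpha = {cronbach_alpha(responses):.2f}")
```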

SLIDE 13

Interrater Reliability

• Extent to which two or more individuals (coders, raters, observers) agree
• Addresses the consistency of the implementation of a rating system
• Interrater reliability is dependent upon the ability of two or more individuals to be consistent
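The slide does not name a statistic for interrater agreement; Cohen's kappa is a common chance-corrected choice for two raters, sketched here with invented codes:

```python
from collections import Counter

# Invented data: two raters independently code the same ten observations.
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if each rater coded at random with their own base rates.
    p_expected = sum(ca[c] * cb[c] for c in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Here observed agreement is 0.80 but chance agreement is 0.52,
# so kappa comes out around 0.58.
print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")
```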
SLIDE 14

Validity

• Extent to which the measurement procedure actually measures the concept that it is intended to measure
• Refers to whether a measurement actually taps into some underlying 'reality'
• Are we measuring what we think we are?

SLIDE 15

Internal and External Validity

• Internal validity: evidence that what you did in the study (i.e., the program) caused what you observed (i.e., the outcome) to happen
• External validity: extent to which the results of a study are generalizable or transferable to other persons in other places and at other times

SLIDE 16

Types of Internal Validity

• Face Validity
• Criterion Related Validity
• Construct Validity
• Content Validity

SLIDE 17

Face Validity

• Does it seem that we are measuring what we claim?
• Does the measure seem like a reasonable way to gain the information we are attempting to obtain?
• A subjective measure of validity

SLIDE 18

Content Validity

• Extent to which items in the instrument reflect the purpose of the data collection effort
• Does the content of the measuring instrument reflect the specific intended domain of the concept?

SLIDE 19

Criterion Related Validity

• Demonstrates the accuracy of a measure or procedure by correlating it with another measure or procedure which has been demonstrated to be valid (called the criterion)
• Concurrent criterion validity: are results of a new questionnaire consistent with results of established measures, e.g., a "gold standard"?
• Predictive criterion validity: assesses the ability of a survey to predict future phenomena

SLIDE 20

Construct Validity

• Seeks agreement between a theoretical concept and a specific measuring device or procedure
• Does the measured concept relate empirically to other measured variables in ways that are theoretically expected?
• Convergent validity: the agreement among independently gathered ratings where measures should be theoretically related; the extent to which different data collection approaches produce similar results
• Discriminant validity: the lack of a relationship among measures which theoretically should not be related; the extent to which results do not correlate with similar but distinct concepts

SLIDE 21

Enhancing Internal Validity

• Think through your constructs carefully before developing an instrument.
• Have experts critique your instruments.
• Implement multiple versions of an instrument to see if they produce the same results.
• Implement multiple measures of key constructs and show that they behave as you theoretically expect.

SLIDE 22

Enhancing Internal Validity (contd.)

• Keep the population with which the measure was normed in mind.
• Be aware that participants may try to guess the purpose of the study and answer questions based on their guess.
• Make participants feel comfortable.
• Do not consciously or unconsciously bias participant responses.

SLIDE 23

Enhancing External Validity

• Be thoughtful when selecting a sample.
• Once selected, try to keep dropout rates low.
• Thoroughly document your sampling methods, participant characteristics, and study procedures.
• Do your study in a variety of places, with different people, and at different times.

SLIDE 24

Qualitative Research Quality Standards

• Confirmability (equivalent to objectivity)
• Dependability (equivalent to reliability)
• Credibility (equivalent to internal validity)
• Transferability (equivalent to external validity or generalizability)

SLIDE 25

Confirmability

• Equivalent to objectivity
• Are the data and interpretations grounded in the participants' actual descriptions and not in the researcher's imagination?
• Questions to assess this standard include:
  1. Are the study's methods described explicitly and in detail?
  2. Can the sequence of data collection, processing, transforming, and conclusion drawing be followed?
  3. Are the conclusions linked to displayed data?
  4. Has the researcher been explicit about personal assumptions, values, biases, and affective states and how they may have played a role in the study?
  5. Are the data available for reanalysis by other individuals?

SLIDE 26

Dependability

• Equivalent to reliability
• Was the data collection process consistent and stable over time and across researchers and methods?
• Questions to assess this standard include:
  1. Are the research questions clear and the study procedures congruent with them?
  2. Were data collected across the full range of appropriate settings, times, respondents, etc., suggested by the research questions?
  3. Are findings meaningfully parallel across data sources?
  4. Were any forms of peer or colleague review in place?

SLIDE 27

Credibility

• Equivalent to internal validity
• Is there a match between the realities described by participants and those represented by the researcher? In other words, are the findings credible, and do they present an accurate description of the topic being examined?
• Questions to assess this standard include:
  1. How context-rich and meaningful are the descriptions?
  2. Does the account make sense and seem convincing or plausible to the reader?
  3. Is the account comprehensive?
  4. Did the original informants feel the conclusions were accurate?

SLIDE 28

Transferability

• Equivalent to external validity or generalizability
• Are the conclusions of a study transferable to other contexts?
• Questions to assess this standard include:
  1. Are the characteristics of the original sample, settings, and methods described in enough detail to permit comparisons to other samples?
  2. Do the findings include enough thick description for readers to assess the potential transferability or application to their own situations?
  3. Does a range of readers report the findings to be consistent with their own experience?
  4. Are the processes and outcomes generic enough to apply to other settings?

SLIDE 29

Critiquing Measures

• What are the positive and negative aspects of these measures for the sample program?
• What can you tell about the reliability and validity of these instruments?

SLIDE 30

Critiquing Measures

• Program 1: Community Enhancement (ages 4-13): community, literacy, and life skills
• Program 2: Seeds to Success (high school age): work force prep, life skills, service learning
• Program 3: Parent Education (adults): parenting and life skills

SLIDE 31

Electronic Survey Formats

• Distributed as e-mail messages
• Posted as web forms online
• Distributed via publicly available computers in high-traffic areas such as libraries and shopping malls
• Placed on a laptop and completed there rather than on paper

SLIDE 32

Strengths of Electronic Surveys

• Cost savings
• Ease of editing/analysis
• Faster transmission time
• Easy use of pre-letters
• Higher response rate
• More candid responses
• Potentially quicker response time with wider magnitude coverage

SLIDE 33

Weaknesses of Electronic Surveys

• Sample demographic limitations
• Lower levels of confidentiality
• Layout and presentation issues
• Computer equipment
• Additional orientation/instructions
• Potential technical problems with hardware and software
• Response rate

SLIDE 34

Dillman Method

• Prenotice: 3 days before questionnaire is sent
• Questionnaire
• Thank You/Reminder/Replacement Questionnaire: 1 week after first questionnaire
• Additional contacts do not significantly increase response rate

SLIDE 35

A Few Online Survey Sites

• SurveyMonkey
• QuestionPro
• Zoomerang

SLIDE 36

Factors to Consider

When selecting an online survey source, consider:
1. Number of responses
2. Number of surveys
3. Number of pages and/or questions
4. Question format/styles
5. Ease of survey design
6. Customizability
7. Download data for analysis in SPSS or Excel
8. Reporting features
9. Price

SLIDE 37

References

Barribeau, P., Butler, B., Corney, J., Doney, M., Gault, J., Gordon, J., Fetzer, R., Klein, A., Ackerson Rogers, C., Stein, I. F., Steiner, C., Urschel, H., Waggoner, T., & Palmquist, M. (2005). Survey research. Writing@CSU. Colorado State University Department of English. Retrieved 4/28/08 from http://writing.colostate.edu/guides/research/survey/.

Dillman, D. A. (2006). Mail and internet surveys: The tailored design method, 2007 update (2nd ed.). Wiley Publications.

Fisher, S., Andersen, R., & Heath, A. (no date). Survey research methods: Reliability, validity and scale construction. Retrieved 4/28/08 from http://malroy.econ.ox.ac.uk/fisher/survey/week2.ppt

Guba, E. G., & Lincoln, Y. S. (1985). Fourth generation evaluation. Newbury Park, CA: Sage Publications.

Howell, J., Miller, P., Hee Park, H., Sattler, D., Schack, T., Spery, E., Widhalm, S., & Palmquist, M. (2005). Reliability and validity. Writing@CSU. Colorado State University Department of English. Retrieved 4/28/08 from http://writing.colostate.edu/guides/research/relval/.

SLIDE 38

References (contd.)

Kitchenham, B., & Lawrence, S. (2002). Principles of survey research: Part 4, questionnaire evaluation. ACM SIGSOFT Software Engineering Notes, 27(3), 20-23. Retrieved 4/28/08 from http://delivery.acm.org/10.1145/640000/638580/p20-kitchenham.pdf?key1=638580&key2=1905207021&coll=GUIDE&dl=GUIDE&CFID=22360817&CFTOKEN=37007042

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage Publications.

Russ-Eft, Darlene F. (1980). Validity and reliability in survey research (Technical report No. 15). Washington, DC: National Center for Education Statistics. Retrieved 4/28/08 from http://eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/34/a3/4a.pdf

Trochim, W. K. (2006). Research methods knowledge base. Retrieved 4/29/08 from http://www.socialresearchmethods.net/kb/index.php

Walonick, D. (no date). Survival statistics: Validity and reliability. Retrieved 4/28/08 from http://www.statpac.com/statistics-book/basics.htm#ValidityandReliability

SLIDE 39

Contact Information

Dan McDonald: mcdonald@ag.arizona.edu
Donna Peterson: pdonna@ag.arizona.edu

SLIDE 40

Thank You!

Questions?