Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But where external validity involves generalizing from your study context to other people, places, or times, construct validity involves generalizing from your program or measures to the concept of your program or measures. You might think of construct validity as a "labeling" issue. When you implement a program that you call a "Head Start" program, is your label an accurate one? When you measure what you term "self-esteem," is that what you are really measuring?
I would like to tell two major stories here. The first is the more straightforward one. I'll discuss several ways of thinking about the idea of construct validity, several metaphors that might provide you with a foundation in the richness of this idea. Then, I'll discuss the major construct validity threats, the kinds of arguments your critics are likely to raise when you make a claim that your program or measure is valid. In most research methods texts, construct validity is presented in the section on measurement. And, it is typically presented as one of many different types of validity (e.g., face validity, predictive validity, concurrent validity) that you might want to be sure your measures have. I don't see it that way at all. I see construct validity as the overarching quality, with all of the other measurement validity labels falling beneath it. And, I don't see construct validity as limited only to measurement. As I've already implied, I think it is as much a part of the independent variable -- the program or treatment -- as it is the dependent variable. So, I'll try to make some sense of the various measurement validity types and try to move you to think of the validity of any operationalization as falling within the general category of construct validity, with a variety of subcategories and subtypes.
The second story I want to tell is more historical in nature. During World War II, the U.S. government involved hundreds (and perhaps thousands) of psychologists and psychology graduate students in the development of a wide array of measures that were relevant to the war effort. They needed personality screening tests for prospective fighter pilots, personnel measures that would enable sensible assignment of people to job skills, psychophysical measures to test reaction times, and so on. After the war, these psychologists needed to find gainful employment outside of the military context, and it's not surprising that many of them moved into testing and measurement in a civilian context. During the early 1950s, the American Psychological Association grew increasingly concerned with the quality or validity of all of the new measures that were being generated and decided to convene an effort to set standards for psychological measures. The first formal articulation of the idea of construct validity came from this effort and was couched under the somewhat grandiose idea of the nomological network. The nomological network provided a theoretical basis for the idea of construct validity, but it didn't provide practicing researchers with a way to actually establish whether their measures had construct validity. In 1959, an attempt was made to develop a method for assessing construct validity using what is called a multitrait-multimethod matrix, or MTMM for short. In order to argue that your measures had construct validity under the MTMM approach, you had to demonstrate that there was both convergent and discriminant validity in your measures. You demonstrated convergent validity when you showed that measures that are theoretically supposed to be highly interrelated are, in practice, highly interrelated. And, you showed discriminant validity when you demonstrated that measures that shouldn't be related to each other in fact were not.
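The convergent/discriminant logic behind the MTMM can be sketched in a few lines of code. The trait and measure names below are hypothetical and the scores are synthetic; this is a minimal illustration of the comparison logic, not a full MTMM analysis:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic "true" trait scores for two hypothetical traits.
self_esteem = rng.normal(size=n)
anxiety = rng.normal(size=n)

# Each trait is measured by two methods; same-trait measures share a common
# component, so they should correlate highly (convergent validity), while
# different-trait measures should not (discriminant validity).
measures = {
    "esteem_survey":     self_esteem + rng.normal(scale=0.5, size=n),
    "esteem_interview":  self_esteem + rng.normal(scale=0.5, size=n),
    "anxiety_survey":    anxiety + rng.normal(scale=0.5, size=n),
    "anxiety_interview": anxiety + rng.normal(scale=0.5, size=n),
}

names = list(measures)
data = np.column_stack([measures[k] for k in names])
mtmm = np.corrcoef(data, rowvar=False)  # the full correlation matrix

# Convergent: same trait, different methods. Discriminant: different traits.
convergent = mtmm[names.index("esteem_survey"), names.index("esteem_interview")]
discriminant = mtmm[names.index("esteem_survey"), names.index("anxiety_survey")]

print(f"convergent r = {convergent:.2f}")      # expected: high
print(f"discriminant r = {discriminant:.2f}")  # expected: near zero
```

In a real MTMM analysis you would inspect the entire matrix, not just two cells, but the same comparison logic applies throughout.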
While the MTMM did provide a methodology for assessing construct validity, it was a difficult one to implement well, especially in applied social research contexts and, in fact, has seldom been formally attempted. When we examine carefully the thinking about construct validity that underlies both the nomological network and the MTMM, one of the key themes we can identify in both is the idea of "pattern." When we claim that our programs or measures have construct validity, we are essentially claiming that we as researchers understand how our constructs or theories of the programs and measures operate in theory, and we claim that we can provide evidence that they behave in practice the way we think they should. The researcher essentially has a theory of how the programs and measures relate to each other (and to other theoretical terms), a theoretical pattern if you will. And, the researcher provides evidence through observation that the programs or measures actually behave that way in reality, an observed pattern. When we claim construct validity, we're essentially claiming that our observed pattern -- how things operate in reality -- corresponds with our theoretical pattern -- how we think the world works. I call this process pattern matching, and I believe that it is the heart of construct validity. It is clearly an underlying theme in both the nomological network and the MTMM ideas. And, I think that we can develop concrete and feasible methods that enable practicing researchers to assess pattern matches -- to assess the construct validity of their research. The section on pattern matching lays out my idea of how we might use this approach to assess construct validity.
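The pattern-matching idea can itself be expressed as a simple computation: state the correlations your theory predicts among your measures, compute the correlations actually observed in your data, and then correlate the two patterns. The numbers below are made-up placeholders, a sketch of the logic rather than a worked study:

```python
import numpy as np

# Theoretical pattern: the correlations among pairs of measures that our
# theory predicts. Observed pattern: the correlations actually computed
# from the data. Both vectors here are hypothetical illustrations.
theoretical = np.array([0.80, 0.70, 0.10, 0.20, 0.10, 0.75])
observed    = np.array([0.74, 0.66, 0.05, 0.25, 0.12, 0.70])

# The pattern match is itself a correlation: how closely the observed
# pattern tracks the theoretical one. Values near 1 support construct
# validity; values near 0 suggest the measures are not behaving as theorized.
match = np.corrcoef(theoretical, observed)[0, 1]
print(f"pattern match r = {match:.2f}")
```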
Copyright ©2006, William M.K. Trochim, All Rights Reserved
Last Revised: 10/20/2006
Construct validity can be viewed as an overarching term to assess the validity of the measurement procedure (e.g., a questionnaire) that you use to measure a given construct (e.g., depression, commitment, trust, etc.). If you are unsure what we mean by terms such as constructs, variables, and conceptual and operational definitions, we would recommend that you first read the articles in the section on Constructs in quantitative research.
Construct validity is considered an overarching term to assess the measurement procedure used to measure a given construct because it incorporates a number of other forms of validity (i.e., content validity, convergent and divergent validity, and criterion validity) that help in the assessment of such construct validity (Messick, 1980). It is for this reason that construct validity is viewed as a process that you go through to assess the validity of a measurement procedure, whilst a number of other forms of validity are procedures (or tools) that you use to more practically assess whether the measurement procedure measures a given construct (Wainer & Braun, 1988). We explain this distinction, as well as the relationship between construct validity and other forms of validity, in the first section of this article: What is construct validity?
Overall, you should be aware that even if a measurement procedure is shown to have strong construct validity, such validity develops gradually over time. You cannot say that a measurement procedure has permanently or absolutely established construct validity; rather, this is an ideal. With each additional study that shows a measurement procedure to have strong construct validity, especially across a wide range of contexts/situations, the claim of strong construct validity becomes stronger. In this article, we (a) explain what construct validity is, (b) discuss the various threats to construct validity that you may face, and (c) show you where you can find out more.
What is construct validity?
As briefly discussed above, construct validity can be viewed as an overarching term to assess the validity of the measurement procedure (e.g., a questionnaire) that you use to measure a given construct (e.g., depression, commitment, trust, etc.). This is because it incorporates a number of other forms of validity (i.e., content validity, convergent and divergent validity, and criterion validity) that help in the assessment of such construct validity (Messick, 1980). In this sense, construct validity is a process that you work through, involving a number of procedures (i.e., tests of validity, such as content validity, convergent validity, etc.) to assess the validity of the measurement procedure that you use in your dissertation to measure a given construct.
For example, let's imagine that we were interested in studying the construct, post-natal depression. In order to do this, new mothers taking part in the research were asked (a) to complete a 10-question survey (i.e., as a form of self-assessment) to assess various characteristics of post-natal depression, and (b) to be observed (i.e., participant observation) by trained psychiatric nurses, who used a scale to measure these different characteristics of post-natal depression. When assessing the construct validity of these two measurement procedures to measure the construct, post-natal depression, we would want to know:
Are the elements/questions used in the 10-question survey and the participant observation scale relevant and representative of the construct, post-natal depression, which they were supposed to be measuring? In terms of relevance, are the elements/questions appropriate considering the purpose of the study and the theory from which they are drawn? Furthermore, does the measurement procedure include all the necessary elements/questions? Is there an appropriate balance of elements, or are some over- or under-represented? This reflects the desire to assess the content validity of the measurement procedure [see the article: Content validity].
Do the 10 questions and participant observation scale only measure the construct we are interested in (i.e., post-natal depression), and not one or more additional constructs; perhaps constructs such as post-partum mood, stress or anxiety? After all, when assessing the construct validity of a measurement procedure, we should not only check that the contents (i.e., elements) are relevant and representative of the construct we are interested in, but also that the measurement procedure is not measuring something that it should not be measuring. When this happens, the results can be confounded, which threatens the internal validity and external validity of your study [see the articles: Internal validity and External validity]. This reflects the desire to assess the divergent validity of the measurement procedure [see the article: Convergent and divergent validity].
Since the study used two different measurement procedures, how confident can we be that both measurement procedures were measuring the same construct (i.e., post-natal depression)? If both measurement procedures were new (i.e., you created them for your dissertation), we would want to assess their convergent validity, but if one was new (e.g., the 10-question survey), but the other was well-established (e.g., the participant observation scale), we would assess their concurrent validity [see the articles: Convergent and divergent validity and Criterion validity: (concurrent and predictive validity)].
Do the scores from the two measurement procedures used make accurate predictions (i.e., both theoretically and logically) about the construct they represent (i.e., post-natal depression)? This reflects the desire to assess the predictive validity of the measurement procedure [see the article: Criterion validity: (concurrent and predictive validity)].
Ultimately, for construct validity to exist, there needs to be (a) a clear link between the construct you are interested in and the measures and interventions that are used to operationalize it (i.e., measure it), and (b) a clear distinction between different constructs (Cronbach and Meehl, 1955; Nunnally, 1978). This involves creating clear and precise conceptual and operational definitions of the constructs you are interested in [see the section on Constructs in quantitative research], as well as performing various tests of validity.
You will not be able to demonstrate construct validity in a single study, although it is good practice, and valued by dissertation supervisors, to approach a study wanting to establish as much construct validity as possible. Clearly, there are some tests of validity that you will need to carry out during your study that will help to improve the construct validity of your measurement procedure (e.g., content validity). It may also be possible, and will certainly be desirable, to carry out other tests of validity that will give you more confidence that your measurement procedure has construct validity (e.g., convergent and divergent validity, and concurrent and predictive validity). However, one of the more difficult assessments of construct validity during a single study, which is extremely important, but less likely to be carried out, is ensuring that the scores attained from your measurement procedure for a given construct behave in a way that is consistent with that construct. For example, imagine that you are interested in the construct, post-natal depression, and want to create a single measurement procedure to measure post-natal depression. Imagine also that a number of studies have shown that another construct, financial stress, is strongly related to post-natal depression; that is, as financial stress increases, post-natal depression increases by a certain amount. Whilst your dissertation on post-natal depression may not have looked at financial stress at all, you need to show that the scores you obtained from your measurement procedure are consistent with the scores (i.e., behaviour) from the related construct (i.e., financial stress).
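The consistency check described above can be sketched with synthetic data. All names, effect sizes, and scores here are hypothetical; the point is only the direction of the check: if prior studies report a positive relationship between the two constructs, scores from the new depression measure should correlate positively with scores from an established financial-stress measure:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150

# Hypothetical latent post-natal depression scores.
latent_depression = rng.normal(size=n)

# Scores from the new measurement procedure (latent score plus noise).
new_depression_scores = latent_depression + rng.normal(scale=0.6, size=n)

# Scores from an established financial-stress measure; prior work says the
# two constructs are positively related, so that relationship is built in.
financial_stress = 0.5 * latent_depression + rng.normal(scale=0.9, size=n)

# If the new measure behaves consistently with the construct, its scores
# should correlate positively with financial stress, matching prior findings.
r = np.corrcoef(new_depression_scores, financial_stress)[0, 1]
print(f"r with financial stress = {r:.2f}")  # expected: clearly positive
```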
In the section that follows, we discuss potential threats to construct validity.