Desvousges, Mathews and Train (2015) [DMT] find that their contingent valuation method (CVM) survey does not pass the adding up test. In a previous post (Whitehead, Sept. 20, 2016), I examined the data used by Desvousges, Mathews and Train (2015) and found that "it is ... not clear if [the data] passes the most basic validity test in contingent valuation over 89% of the range of bids."
There are other things wrong with the paper. DMT take the opportunity to make another comment on Haab et al. (2013) in the context of their empirical study. Here is the introduction to their section III titled "potential difficulties in implementing the adding-up test":
There are several potential difficulties that must be addressed in implementing an adding-up test. Haab et al. (2013) describe these issues and seem to suggest that the potential problems are so great that they outweigh the potential benefits of the test. We believe that these issues need to be considered on a case-by-case basis. In the paragraphs below, we describe these potential difficulties and how they are addressed in our application.
There are four issues: cognitive burden, income effects, provision mechanism and cost. Here is what DMT say about cost:
The adding-up test is usually more expensive to apply than a scope test because it requires at least one more subsample. Fielding the survey is only one element of the overall cost of a project, and so a study with, for example, three subsamples is not 50% more expensive than a study with two subsamples. In our application, the cost of fielding one additional subsample increased the overall cost by less than 5%. ...
A focus on the percentage increase in cost obscures the magnitude of the additional cost. Total cost is the sum of fixed and variable costs, and in a capital-intensive project variable costs are a small share of the total. The variable costs of a CVM study are mostly labor and data collection (survey) costs. When labor costs are large, the survey costs are a small share of the total, so any increase in survey costs will look small when expressed as a percentage.
An inexpensive online panel (i.e., not the GfK KnowledgePanel) like that used by DMT will cost about $5 per completed response (I've purchased samples for anywhere from $3 to $8 per complete). An additional sample the size of the average of DMT's versions one, three and four (each about n = 200) would cost about $1,000. That is not a lot of money for a survey. But the money cost of an additional adding up treatment could be high, especially if the surveys are fielded on a high-quality survey sample as in Chapman et al. (2009). An additional sample of n = 200 at $50 per completed response is $10,000 (and n = 200 is likely too small).
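To make the arithmetic concrete, here is a minimal sketch of the cost calculation in Python. The per-complete prices and the subsample size come from the figures above; the fixed cost is an assumed placeholder, not a number from DMT's budget.

```python
# Back-of-the-envelope cost sketch. The per-complete prices come from the text;
# the fixed (labor, design, analysis) cost is an assumed placeholder.
n_extra = 200                # one additional adding up subsample
price_optin = 5              # $ per complete, inexpensive opt-in online panel
price_probability = 50       # $ per complete, probability-based panel
fixed_cost = 100_000         # assumed labor, design and analysis costs

for label, price in [("opt-in panel", price_optin),
                     ("probability panel", price_probability)]:
    extra = n_extra * price
    share = 100 * extra / (fixed_cost + extra)
    print(f"{label}: extra cost ${extra:,} ({share:.1f}% of total project cost)")
```

Under these assumptions the extra $10,000 is still under 10% of the project, which is exactly how a large dollar amount gets obscured when it is reported as a percentage.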
These are accounting costs. The true costs are opportunity costs. Suppose you have a fixed survey budget and choose to add a subsample in order to conduct an adding up test. If your survey budget allows for n = 1000, a split-sample external scope test has n = 500 in each of the base and scope treatments (surveys 1 and 2). An additional subsample reduces the subsample sizes to n = 333 (surveys 1, 2 and 3) and reduces the power of the statistical tests. DMT (2015) fielded three additional samples with n = 293, 164 and 175. The latter two samples are of questionable validity (i.e., demand doesn't slope down over 89% of the price range), so low statistical power could be a problem. In this study, the opportunity cost of the additional treatments is sample size lost from the other treatments.
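Here is a rough sketch of the power argument, assuming a two-sided two-sample test of proportions with a normal approximation; the 10-percentage-point difference in "yes" responses (0.50 vs. 0.60) is a made-up effect size, not anything estimated from DMT's data.

```python
# Illustration of how splitting a fixed number of completes across more
# treatments erodes statistical power. Two-sided two-sample z-test of
# proportions, normal approximation; the effect size is assumed.
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test of proportions."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z = (abs(p1 - p2) - z_crit * se_null) / se_alt
    return NormalDist().cdf(z)

budget = 1000  # total completed surveys
for treatments in (2, 3):
    n = budget // treatments
    print(f"{treatments} treatments, n = {n} each: "
          f"power = {power_two_proportions(0.50, 0.60, n):.2f}")
```

Under these assumptions, moving from two to three treatments drops the power to detect the 10-point difference from roughly 0.89 to about 0.74, and subsamples the size of DMT's n = 164 and n = 175 would fare worse still.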
There is another opportunity cost of an adding up subsample: the value of the other statistical tests that could be conducted with the additional sample instead. In Whitehead (2016) I argue that it is more important to identify another point on the total benefit curve (i.e., conduct another scope test beyond the typical base and scope treatments) in order to pin down the curvature of the total benefit curve; two points identify only a slope, while a third allows the curvature to be estimated. This matters most in the context of benefit-cost analysis, where maximizing the net benefits of a policy is the objective. Knowledge of the shape of the total benefit curve can give policy makers a better handle on the optimal amount of pollution or some other quantity.
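The curvature point can be illustrated with a simple sketch (my own, not something from Whitehead 2016): two scope treatments pin down only a slope, while a third treatment makes a second-order (curvature) term estimable. The quantities and WTP values below are entirely hypothetical.

```python
# Hypothetical (quantity, mean WTP) points. Two points identify only a line;
# a third makes the curvature of the total benefit curve estimable.
import numpy as np

q = np.array([10.0, 20.0, 30.0])     # quantity of the environmental good
wtp = np.array([40.0, 65.0, 75.0])   # hypothetical mean WTP at each quantity

slope_only = np.polyfit(q[:2], wtp[:2], 1)   # base + scope treatments
with_curve = np.polyfit(q, wtp, 2)           # add a third scope treatment

print("slope from two points:", slope_only[0])              # 2.5
print("curvature term from three points:", with_curve[0])   # -0.075 (concave)
```

A negative second-order coefficient is what diminishing marginal willingness to pay looks like, and that is the information a benefit-cost analyst needs in order to say anything about the optimal quantity.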
However, this all changes in the context of natural resource damage assessment. If you are working for a plaintiff (e.g., the State of Alaska), the goal of the study is to demonstrate the validity of the method and estimate defensible damages [1]. Pursuing this goal is relatively expensive because of the probability-based survey samples that must be employed. When you are working for the defendant (e.g., Exxon), the goal is to demonstrate that the method is not valid and to estimate damages that are lower than the plaintiff's. Pursuing this goal is relatively inexpensive when telephone, mall-intercept and non-probability online survey modes are used and small samples of data are collected. These samples are of lower quality than probability-based samples and will tend to yield lower quality results [2].
Finally, the last sentence of DMT's cost paragraph is misleading:
... Given that the adding-up test potentially addresses the expert panel’s concern about adequate response while the scope test does not, and that the burden of meeting the panel’s concern “must rest” with the researcher, the extra cost seems justified, at least in some studies.
The NOAA panel was more concerned about the magnitude of the differences in willingness to pay across scope treatments than with the adding up test, which was developed by researchers working for the defendant in the Exxon Valdez oil spill case. Scope elasticity is a more useful approach to assessing plausibility, which was the panel's intent, rather than adequacy (see Whitehead 2016). So this justification for the extra cost of the adding up test is an assertion developed by researchers searching for a "critical test" with which to discredit the CVM.
In my opinion, the net benefits of conducting the adding up test are low relative to all of the other useful things that could be done to improve the CVM with the additional money. Number one on that list is an additional scope treatment to establish the curvature of the total benefit curve.
That said, I don't want to discourage researchers who are curious about the adding up test. I just don't think it should be required the way price and quantity tests (i.e., bid curve and scope tests) have become for the CVM.
Notes:
[1] Tim and I worked for the State of Florida in the BP/Deepwater Horizon case (Larkin 2016). We made a number of decisions focused on developing a conservative willingness to pay estimate.
[2] A non-probability sample can serve many useful research purposes. Inexpensive non-probability samples are useful for exploratory research when budgets are limited. However, researchers should be cautious: negative or null results may be more likely with non-representative samples populated by low-paid survey respondents.
References
Desvousges, William, Kristy Mathews, and Kenneth Train. "An Adding Up Test on Contingent Valuations of River and Lake Quality." Land Economics 91(3) (2015): 556-571. http://le.uwpress.org/content/91/3/556.refs
Kennedy, Courtney, Andrew Mercer, Scott Keeter, Nick Hatley, Kyley McGeeney, and Alejandra Gimenez. Evaluating Online Nonprobability Surveys. Pew Research Center, May 2, 2016. http://www.pewresearch.org/2016/05/02/evaluating-online-nonprobability-surveys/
Larkin, Sherry. "The Deepwater Horizon Oil Spill." AERE Newsletter 36(1): 24-28, May 2016. http://www.aere.org/newsletters/
Whitehead, John C. "Plausible responsiveness to scope in contingent valuation." Ecological Economics 128 (2016); 17-22. http://www.sciencedirect.com/science/article/pii/S0921800916302890
Whitehead, John C., "A comment on '“An Adding Up Test on Contingent Valuations of River and Lake Quality,'” The Environmental Economics Blog, September 20, 2016. http://www.env-econ.net/2016/09/a-comment-on-an-adding-up-test-on-contingent-valuations-of-river-and-lake-quality.html
Yeager, David S., Jon A. Krosnick, LinChiat Chang, Harold S. Javitz, Matthew S. Levendusky, Alberto Simpser, and Rui Wang. "Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples." Public Opinion Quarterly (2011). http://poq.oxfordjournals.org/content/early/2011/10/05/poq.nfr020.short