When dichotomous choice CVM data are of low quality, the measure of central tendency is sensitive to assumptions. As I showed in a paper presented earlier this year (Landry and Whitehead 2020), with the highest quality data it makes no difference which WTP estimator is used. The Turnbull, Kristrom, linear logit (under both zero WTP assumptions), and linear probability models all produce the same estimate.

As data quality falls, however, the choice of WTP estimate can matter a great deal. In this situation, to avoid sponsor and other biases, it is important for the CVM researcher to present the full range of WTP estimates and avoid the impression that results have been cherry-picked. This range of WTP provides a more complete depiction of analyst uncertainty and allows for sensitivity and other analyses.

I have grown accustomed to intense suspicion whenever I see hypothesis tests conducted with only the Turnbull WTP estimate. First, it is a lower bound WTP estimate and potential differences are minimized. Second, its standard errors are smaller (relative to the mean) than those of parametric WTP estimates. This second observation is due to the way that the standard errors are calculated and to the fact that the data are smoothed when there are non-monotonicities. As Haab and McConnell (1997, p. 253) explained (emphasis added): "We demonstrate that the Turnbull model ... provides a straightforward alternative to parametric models, **so long as one simply wants to estimate mean willingness to pay**." When hypothesis tests are being conducted, a range of WTP estimates should be used to determine if the results are robust to the estimation method.

So, is it reasonable to include the linear-in-bid parametric model in this collection of WTP estimates? Hanemann (1984, 1989) showed that in a linear utility model, U = a(Q) + bY where Q is a good and Y is income, the mean (and median) willingness to pay is WTP = -a*/b, where a* is the change in utility from changes in Q and b is the marginal utility of income. One benefit of this estimate is that it is insensitive to fat tails. However, this estimate allows for negative WTP values unless the probability of a yes response to a dichotomous choice question is 100% when the bid amount is zero. Negative WTP values can enter into the analysis in two ways. First, the WTP estimate itself can be negative. This will occur when the probability of a yes response at the lowest bid amount is less than 50% (it is this possibility that, I think, motivated Haab and McConnell). The second possibility is that the empirical distribution of WTP can include negative values. This is of little consequence to the analysis unless the confidence interval includes zero. Both circumstances arise with the DMT (2015) data.
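The mechanics of the linear model's welfare measure are simple: mean (and median) WTP is the negative ratio of the constant to the bid coefficient. A minimal sketch, with purely hypothetical coefficients (not estimates from any study discussed here):

```python
# Hedged sketch: mean/median WTP from a linear-in-bid logit,
# Pr(yes) = 1 / (1 + exp(-(a + b * bid))), so WTP = -a/b (Hanemann 1984).
# The coefficient values below are hypothetical illustrations.

a = 1.2     # hypothetical constant (utility change from Q)
b = -0.004  # hypothetical bid coefficient (negative marginal effect of cost)

wtp = -a / b
print(wtp)  # 300.0
```

Note that if the constant were negative (fewer than 50% yes responses at a zero bid), this ratio would be negative, which is exactly the possibility discussed above.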

DMT (2020) dismiss outright the possibility of negative WTP. Their dismissal is consistent with Haab and McConnell's argument that, since public goods are freely disposable, negative WTP is only an empirical artifact of a distributional assumption. But, with government policy, free disposal is not always possible. In the case of a cleanup of natural resource damages, the cleanup could be considered a wasteful intrusion into a private business decision. Bohara, Kerkvliet and Berrens (2001) discuss how and why negative WTP values might arise, along with empirical examples. Considering this, I would not be surprised if some of the respondents to CVM scenarios demanded compensation for environmental cleanup.

There have been a number of suggestions about how to handle negative willingness to pay. Many of these involve obtaining more data with follow-up questions (Landry and Whitehead 2020). Unfortunately, the DMT (2015) survey data does not have any of this supplemental information. In that case, in my opinion, the possibility of negative WTP cannot be ruled out. Inclusion of the linear model allowing for negative WTP, as long as it is presented along with other estimates, should not be dismissed outright.

DMT (2020) state: "This means that adding-up passed in his calculations on linear models not because of the data but because of his implausible additional assumption that many people have a negative WTP for the environmental programs." It is not true that the linear model finds that "many" people have negative willingness to pay in each of the scenarios. According to the Krinsky-Robb WTP simulation, the percentages of negative WTP values for the whole, first, second, third, and fourth scenarios in the DMT (2020) data are 2%, 0.01%, 77%, 25%, and 0.83%. The WTP from the second scenario is negative (situation 1 above). The WTP from the third scenario has a Delta Method confidence interval that includes zero (situation 2 above).
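For readers unfamiliar with the mechanics, the share of negative WTP values in a Krinsky-Robb style simulation comes from drawing coefficient vectors from the estimated sampling distribution and forming the WTP ratio for each draw. A sketch under hypothetical coefficient estimates and covariance (not the DMT (2015) values):

```python
# Hedged sketch of a Krinsky-Robb style simulation: draw (constant, bid)
# coefficient vectors from the estimated sampling distribution, form
# WTP = -a/b for each draw, and report the share of negative draws.
# The point estimates and covariance matrix below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
beta_hat = np.array([0.3, -0.004])       # hypothetical (constant, bid)
vcov = np.array([[0.04, -0.0001],
                 [-0.0001, 0.000001]])   # hypothetical covariance matrix

draws = rng.multivariate_normal(beta_hat, vcov, size=100_000)
wtp = -draws[:, 0] / draws[:, 1]
share_negative = np.mean(wtp < 0)
print(round(share_negative, 3))
```

With these hypothetical values the constant is only about 1.5 standard errors above zero, so a non-trivial share of the simulated WTP distribution is negative, which is the mechanism behind situation 2 in the text.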

If the negative mean WTP from the second scenario is set equal to zero, then the difference in WTP for the whole and the sum of its parts is statistically significant at the p=0.088 level with the Delta Method confidence intervals. The Krinsky-Robb confidence interval is [68, 788], which includes the sum of the WTP for the parts with the WTP from the second scenario set equal to zero ($467), indicating that the adding-up test is supported. It is still my contention that the adding-up test passed in the (untruncated) linear model not because of the data.

My conclusion is that the negative WTP values do not have an important effect on the adding-up tests. Dismissing these tests because negative WTP values are implausible ignores the literature and the empirical evidence.

References

Bohara, Alok K., Joe Kerkvliet, and Robert P. Berrens. "Addressing negative willingness to pay in dichotomous choice contingent valuation." Environmental and Resource Economics 20, no. 3 (2001): 173-195.

Landry, Craig, and John Whitehead, "Estimating Willingness to Pay with Referendum Follow-up Multiple-Bounded Payment Cards," paper presented at the 2020 W-4133, Athens, GA, February.

Haab, Timothy C., and Kenneth E. McConnell. "Referendum models and negative willingness to pay: alternative solutions." Journal of Environmental Economics and Management 32, no. 2 (1997): 251-270.

Hanemann, W. Michael. "Welfare evaluations in contingent valuation experiments with discrete responses." American Journal of Agricultural Economics 66, no. 3 (1984): 332-341.

Hanemann, W. Michael. "Welfare evaluations in contingent valuation experiments with discrete response data: reply." American Journal of Agricultural Economics 71, no. 4 (1989): 1057-1061.

When dichotomous choice CVM data have a negative WTP problem, one of the standard corrections is to estimate a log-linear model and present the median WTP. With many estimated log-linear models the mean WTP is undefined. This is because the log-linear model flattens the estimated survival curve and, in contrast to a linear model, the probability of a no response does not approach zero at any reasonable bid amount. The median WTP estimate from the log-linear model tends to be a useful supplement to the welfare measures available from the linear model.
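The log-linear model's median is a simple closed form: for a logit in the log of the bid, setting the probability of a yes response to one half gives median WTP = exp(-a/b). A hedged sketch with hypothetical coefficients:

```python
# Hedged sketch: median WTP from a log-linear (log-bid) logit,
# Pr(yes) = 1 / (1 + exp(-(a + b * ln(bid)))). The median solves
# Pr(yes) = 0.5, giving median WTP = exp(-a/b). Coefficients are
# hypothetical. When |b| <= 1 the implied mean WTP is infinite, which
# is the flattened survival curve problem described in the text.
import math

a = 3.0   # hypothetical constant
b = -0.6  # hypothetical log-bid coefficient; |b| < 1 implies infinite mean

median_wtp = math.exp(-a / b)
print(round(median_wtp, 2))  # 148.41
```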

Desvousges, Mathews and Train (2020) rightly argue that (1) the sum of medians is not equal to the median of the sums and (2) the mean WTP estimates for each of their five scenarios are essentially infinite (or in the millions of dollars when the data are estimated with a log-linear probit). Given this empirical fact, there seems to be no median estimate available for the sum of the four individual WTP amounts (that could be compared to the median of the whole). However, DMT (2020) are able to estimate the "median of the sum of WTPs through simulation" to be $4904. They then explain that since $4904 is 24 times the median of the whole scenario, $201, this is "clearly a violation of adding up". Given that the confidence interval from the Delta Method, [-47, 449], does not include $4904, we could reject the notion that the willingness to pay estimates pass this version of the adding-up test.

Previously however, I've argued that these standard errors are likely the wrong ones to use since the cost parameter is measured without much precision. In this case the Krinsky-Robb confidence intervals are more appropriate. The Krinsky-Robb confidence interval for the median WTP estimate for the whole scenario is [54, 9558]. Since the median of the sum of the WTP estimates from the four adding-up scenarios, estimated by DMT (2020) to be $4904, lies within the 95% Krinsky-Robb confidence interval then we fail to reject the adding-up hypothesis at the 95% confidence level. In contrast, the 95% Krinsky-Robb confidence interval estimated with the whole scenario data from Chapman et al. (2009), which I've argued is higher quality data, is relatively tight around the median of $167: [134, 217].

The only other adding-up test that can be conducted with median WTP estimates is to compare the median for the whole with the sum of the medians for the four parts. In this test I found that the median WTP estimates passed the adding up test using the confidence intervals from the Delta Method (Whitehead 2020). It still seems to me that this test is a useful supplement when one is inclined to consider the robustness of the adding-up test conducted with only the Turnbull estimator (as in DMT 2015). Otherwise, we're treating mean WTP estimates in the millions as if they are meaningful.

To answer the question in the title of this post, the log-linear model is not meaningless. In fact, it is a better model statistically than the linear model for the first and third WTP scenarios in DMT (2015). Information gleaned from the log-linear model provides insights into the quality of the DMT (2015) data.

While I argue with DMT over the minutiae of these tests and different estimators, the reader shouldn't lose sight of how silly the debate over DMT (2015) has become. The bottom line is that the DMT (2015) data are of low quality and do not rise to the threshold needed to support an adding-up test, which requires estimates of willingness to pay as ratios of coefficients.

In Whitehead (2020) I describe the problems in the DMT (2015) data. It is full of non-monotonicities, flat portions of bid curves and fat tails. A non-monotonicity is when the percentage of respondents in favor of a policy increases when the cost increases. In other words, for a pair of cost amounts it appears that respondents are irrational when responding to the survey. This problem could be due to a number of things besides irrationality. First, respondents may not be paying close attention to the cost amounts. Second, the sample sizes may be simply too small to detect a difference in the correct direction. Whatever the cause, non-monotonicities increase the standard errors of the slope coefficient in a parametric model.
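A non-monotonicity check is straightforward to code: compute the share of yes responses at each cost amount and flag any adjacent pair where the share rises with the bid. A sketch with hypothetical bid amounts and yes-shares (not the DMT (2015) values):

```python
# Hedged sketch: flag non-monotonicities in a dichotomous choice bid
# curve, i.e., adjacent cost amounts where the share of yes responses
# rises as the cost rises. The bids and yes-shares are hypothetical.
bids      = [10,   25,   50,   100,  200,  400]
yes_share = [0.72, 0.61, 0.65, 0.48, 0.50, 0.31]

non_monotonic = [(bids[i], bids[i + 1])
                 for i in range(len(bids) - 1)
                 if yes_share[i + 1] > yes_share[i]]
print(non_monotonic)  # [(25, 50), (100, 200)]
```

In practice each flagged pair should also be checked against sampling error: with small samples per cost amount, an apparent reversal may simply be noise.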

Flat portions of the bid curve exist when the bid curve may be downward sloping but the slope is not statistically different from zero. This could be caused by small differences in cost amounts and/or it is due to sample sizes that are too small to detect a statistically significant difference. For example, there may be little perceived difference between a cost amount of $5 and $10 compared to $5 and $50. And, even if the percentage of responses in favor of a policy is economically different between two cost amounts, this difference may not be statistically different due to small sample sizes.

Fat tails may exist when the percentage of respondents who are in favor of a policy is high at the highest cost amount. However, this is only a necessary condition. A sufficient condition for a fat tail is when the percentage of respondents who are in favor of a policy is high at two or more of the highest cost amounts. In this case, the fat tail will cause a parametric model to predict a very high cost amount that drives the probability that respondents are in favor of a policy to (near) zero. A fat tail will bias a willingness to pay estimate upwards because much of the WTP estimate is derived from the portion of the bid curve when the cost amount is higher than the cost amount in the survey.

DMT (2020) state that these problems also occur in Chapman et al. (2009) and a number of my own CVM data sets. They are correct. But, DMT (2020) are confusing the existence of the problem, in the case of non-monotonicity and flat portions, with the magnitude of the problem. And, they are assuming that if the necessary condition for fat tails exists then the sufficient condition also exists. Many, if not most, CVM data sets will exhibit non-monotonicities and flat portions of the bid curve. But, these issues are not necessarily an empirical problem. The extent of the three problems in DMT (2015) is severe -- so severe that it makes their attempt to conduct an adding up test (or any test) near impossible.

To prove this to myself I estimated the logit model, WTP value and 95% Krinsky-Robb confidence intervals for 20 data sets. Five of the data sets are from DMT (2015), two are from Chapman et al. (2009) and 13 are from some of my papers published between 1992 and 2009 (DMT (2020) mention 15 data sets, but two of the studies use the same data as another paper). The average sample size for these 20 data sets is 336 and the average number of cost amounts is 5.45. The average sample size per cost amount is 64, which is typically sufficient to avoid data quality problems (a good rule of thumb is that the number of data points for each cost amount should be n > 40, even in the most poorly funded study).

These averages obscure differences across study authors. The average sample size for the DMT (2015) data sets is 196. With 6 cost amounts the average sample size per cost amount is 33. The Chapman et al. (2009) study is the best funded and the two sample sizes are 1093 and 544. With 6 cost amounts the sample sizes per cost amount are 182 and 91. The Whitehead studies have an average sample size of 317 and, with an average of 5 cost amounts, the sample size per cost amount is 65 (the variance of these means is large). Already, differences across these three groups of studies emerge.

There are a number of dimensions over which to compare the logit models in these studies. My preferred measure is the ratio of the upper limit of the 95% Krinsky-Robb confidence interval for WTP to the median WTP estimate. This ratio will be larger the more extensive the three empirical problems mentioned above are. As these problems worsen, hypothesis testing with the WTP estimates (again, a function of the ratio of coefficients) becomes less feasible. It is very difficult to find differences in WTP estimates when the confidence intervals are very wide. To suggest that this measure has some validity, the correlation between the ratio and the p-value on the slope coefficient is r = 0.96.

The results of this analysis are shown below. The ratio of the upper limit of the confidence interval to the median is sorted from lowest to highest. The DMT (2015) values are displayed as orange squares, the Chapman et al. (2009) values are displayed as green diamonds and the Whitehead results are displayed as blue circles and one blue triangle. The blue triangle is literally "off the chart" so I have divided the ratio by 2. This observation, one of three data sets from Whitehead and Cherry (2007), does not have a statistically significant slope coefficient.

Considering the DMT data, observation 19, with a ratio of 4.82 (i.e., the upper limit of the K-R confidence interval is about 5 times greater than the median WTP estimate), is the worst data set. Observation 8, the best DMT data set, has their largest sample size of n=293. The Chapman et al. (2009) data sets are two of the three best in terms of quality. The Whitehead data sets range from good to bad in terms of quality. Overall, four of the five DMT data sets are in the lower quality half of the sample (defined by DMT*).

Of course, data quality should also be assessed by the purpose of the study. About half of the Whitehead studies received external funding. The primary purpose of these studies was to develop a benefit estimate. The other studies were funded internally with a primary purpose of testing low stakes hypotheses. In hindsight, these internally funded studies were poorly designed with sample sizes per bid amount too small and/or poorly chosen bid amounts. With the mail surveys the number of bid amounts was chosen with optimistic response rates in mind. With the A-P sounds study a bid amount lower than $100 should have been included. Many of the bid amounts are too close together to obtain much useful information.

In contrast, considering the history of the CVM debate and the study's funding source (Maas and Svorenčík 2017), the likely primary purpose of the DMT (2015) study is to discredit the contingent valuation method in the context of natural resource damage assessment. In that context, the study is very high stakes and, therefore, its problems should receive considerable attention. The DMT (2015) study suffers from some of the same problems that my older data suffers from. The primary problem with the DMT (2015) study is that the sample sizes are too low. It is not clear why the authors chose to pursue valuation of 5 samples instead of 3 to conduct their adding up test (DMT (2012) describe a 3 sample adding up test with the Chapman et al. (2009) study). Three samples may have generated confidence intervals tight enough to conduct a credible test.

In the title of this post I ask "Are the DMT data problems typical in other CVM studies?" The subtitle should really be "Are the DMT data problems typical of Whitehead's CVM data problems in a different era?" The survey mode for my older studies was either mail or telephone. Both survey modes were common back in the old days but they have lost favor relative to internet surveys. The reasons are numerous but one is that internet surveys are much more cost-effective and the uncertainty about a response rate is non-existent. Another reason is that internet survey programming is much more effective (with visual aids, piping, ease of randomization, etc.). Many of the problems with my old data were due to small sample sizes. This was a result of either poor study design (in hindsight, many CVM studies with small samples should have reduced their bid amounts) or unexpectedly low mail response rates.

It is not clear why DMT (2020) chose to compare their data problems to those that I experienced 15-30 years ago -- unless, in a fit of pique at my comment on their paper, they decided it would be a good idea to accuse me of hypocrisy. I've convinced myself that my data compares favorably to the DMT (2015) data, especially considering the goals of the research. My goals were more modest than testing whether the CVM passes an adding-up test, for which a WTP estimate (the ratio of two regression coefficients) is required (as opposed to considering the sign, or the sign and significance, of a regression coefficient).

*****

*Note that there are more Whitehead data sets than are called out by DMT. I haven't had time to include all of these into this analysis. But, my guess is that the resulting circles would be no worse than those displayed in the picture below.

Reference

Maas, Harro, and Andrej Svorenčík. "“Fraught with controversy”: organizing expertise against contingent valuation." History of Political Economy 49, no. 2 (2017): 315-345.

As described in the introduction of my (draft) "Reply to 'Reply to Whitehead'", I suspect that I have used the incorrect confidence intervals when analyzing the Desvousges, Mathews and Train (2015) data. Park, Loomis and Creel (1991) introduced the Krinsky-Robb approach for estimating confidence intervals for willingness to pay estimates from dichotomous choice contingent valuation models. Cameron (1991) introduced the Delta Method approach. As indicated by their Google Scholar citations, 461 and 229 respectively, both have been used extensively in the applied CVM literature. Hole (2007) compares the two approaches (along with Fieller and bootstrap approaches) and finds little difference among them for well-behaved simulated data. However, Hole (2007) points out that the Delta Method requires that willingness to pay be normally distributed for the confidence interval to be accurate. He states that "... it is likely that WTP is approximately normally distributed when the model is estimated using a large sample and the estimate of the coefficient for the cost attribute is sufficiently precise." (p. 830) In Whitehead (2020) I used the Delta Method confidence intervals in my statistical tests. This is very likely an inappropriate approach due to the imprecision of the estimate of the parameter on the cost amount.
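For reference, the Delta Method interval comes from a first-order Taylor expansion of WTP = -a/b around the estimated coefficients, which forces symmetry around the point estimate. A sketch with hypothetical numbers (not from any study discussed here):

```python
# Hedged sketch of the Delta Method interval for WTP = -a/b: a
# first-order Taylor expansion gives Var(WTP) = g' V g with gradient
# g = (-1/b, a/b**2). All coefficient values are hypothetical.
import numpy as np

beta = np.array([1.2, -0.004])           # hypothetical (constant, bid)
vcov = np.array([[0.04, -0.0001],
                 [-0.0001, 0.000001]])   # hypothetical covariance matrix

a, b = beta
wtp = -a / b                             # 300.0
grad = np.array([-1.0 / b, a / b**2])    # gradient of -a/b
se = np.sqrt(grad @ vcov @ grad)
ci = (wtp - 1.96 * se, wtp + 1.96 * se)  # symmetric by construction
print(round(wtp, 1), [round(x, 1) for x in ci])
```

The symmetry is baked in by the ±1.96·se construction, regardless of how skewed the true sampling distribution of the ratio is; that is exactly the concern Hole (2007) raises.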

When working on Whitehead (2020) I used NLogit (www.limdep.com) software to estimate the confidence intervals. NLogit allows for both the Delta Method and Krinsky-Robb approaches to be used. But the Krinsky-Robb confidence intervals may require the assumption of normality. Hole (2007): "The [Krinsky-Robb] confidence interval could also be derived by using the draws to calculate the variance of WTP ..., but this approach, like the delta method confidence interval, hinges on the assumption that WTP is symmetrically distributed." (p. 831) Almost all of the Krinsky-Robb confidence intervals estimated by NLogit "blew up" when using the DMT (2015) data; in other words, the upper and lower limits were in the tens and hundreds of thousands (positive and negative). This made little sense to me at the time, but now my guess is that when the WTP normality assumption is violated the NLogit software cannot handle the estimation. Typically, Delta Method and Krinsky-Robb confidence intervals are not very different when estimated in NLogit (as shown below).

Following my reading of Desvousges, Mathews and Train (forthcoming) I thought through the above (obviously, I should have thought it through before) and estimated WTP with Krinsky-Robb intervals in SAS (my program is available upon request). My Krinsky-Robb intervals are akin to what Hole (2007) calls the Monte Carlo percentile approach. I take one million draws from the variance-covariance matrix and trim the α/2 highest and lowest WTP values, where α=0.05. Hole's (2007) Krinsky-Robb intervals are based on a resampling approach, but he finds little difference between the resampling and Monte Carlo Krinsky-Robb intervals.
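The Monte Carlo percentile approach is easy to sketch in Python (my actual program is in SAS); the coefficient estimates and covariance below are hypothetical, not the DMT (2015) values:

```python
# Hedged sketch of the Monte Carlo percentile version of the
# Krinsky-Robb interval: draw from the estimated coefficient
# distribution, compute WTP = -a/b for each draw, and take the
# empirical alpha/2 and 1 - alpha/2 percentiles (trimming the tails
# rather than assuming symmetry). All inputs are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
beta_hat = np.array([1.2, -0.004])       # hypothetical (constant, bid)
vcov = np.array([[0.04, -0.0001],
                 [-0.0001, 0.000001]])   # hypothetical covariance matrix

draws = rng.multivariate_normal(beta_hat, vcov, size=1_000_000)
wtp = -draws[:, 0] / draws[:, 1]
lo, hi = np.percentile(wtp, [2.5, 97.5])
print(round(lo, 1), round(hi, 1))
```

Because the interval is read off the empirical distribution of the simulated ratio, it can be (and usually is) asymmetric around the point estimate when the cost coefficient is imprecise.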

For this analysis I am only using the whole scenario from DMT (2015) since this is sufficient to show that WTP for the whole cannot be statistically distinguished from WTP for the sum of the parts with the Krinsky-Robb Monte Carlo percentile intervals. The logit models are presented below for the full sample (n=172), the sample with observations with missing demographics deleted (n=163) and the Chapman et al. (2009) data. In each model the constant and the coefficient on the cost amount are statistically different from zero. But the precision of the cost coefficients with the DMT (2015) data is low relative to other CVM studies. Combined with small samples, this means the Desvousges, Mathews and Train WTP estimate may not be normally distributed. The Chapman et al. (2009) study, on the other hand, has a large sample size and a precisely estimated coefficient on the cost amount.

The WTP estimates (restricting WTP to be positive) and confidence intervals are presented below. The Delta Method confidence intervals are estimated in NLogit and the Krinsky-Robb percentile intervals are estimated in SAS. The appropriateness of the Delta Method with the DMT (2015) data is questionable. First, the Krinsky-Robb lower bound on the DMT (2015) full sample (n=172) estimate is less than 50% of the Delta Method lower bound. Second, the Krinsky-Robb upper bound is 269% larger than the Delta Method upper bound. The imprecision of the coefficient on the cost amount is driving the asymmetry. The cost coefficient in the less-than-full sample (n=163) is estimated even more imprecisely. Its Krinsky-Robb confidence interval includes zero.

These results should be considered in contrast to the WTP estimate from the Chapman et al. (2009) data. The Delta Method and Krinsky-Robb intervals are very close. The symmetric Krinsky-Robb confidence interval estimated in NLogit is [236.90, 320.37], which is also very close to the Delta Method interval. One benchmark for symmetric confidence intervals in CVM studies, therefore, is a sample size greater than 1000 and a t-statistic on the cost coefficient of -9.5. Of course, sensitivity around these benchmarks should be assessed since not many CVM studies have these characteristics. (Note: I'll do some of this sort of work when I go back to my past and analyze some of the CVM data from the old days that Desvousges, Mathews and Train (forthcoming) assert is just as bad as their own data.)

The point estimate of the sum of the WTP parts for the full sample is $1114.36. The WTP for the sum of the parts is within the Krinsky-Robb interval for the whole, suggesting that we cannot reject the hypothesis that WTP for the whole is equal to WTP for the sum of the parts at the p<0.05 level. The 90% interval is [264.84, 1314.14], which indicates that any statistical equality is at a confidence level below p=0.10. The point estimate of the sum of the WTP parts for the trimmed sample is $1079.73. Again, the WTP for the sum of the parts is within the Krinsky-Robb 95% interval for the whole. Note that the WTP for the whole is not different from zero with this sample, so any statistical inference makes less sense than if WTP were different from zero. These results are consistent with my (erroneous) conclusion (Whitehead 2020) that the data in Desvousges, Mathews and Train (2015) are not sufficient to conclude that contingent valuation does not pass the adding-up test.

References

Cameron, Trudy Ann. "Interval estimates of non-market resource values from referendum contingent valuation surveys." Land Economics 67, no. 4 (1991): 413-421.

Hole, Arne Risa. "A comparison of approaches to estimating confidence intervals for willingness to pay measures." Health Economics 16, no. 8 (2007): 827-840.

Park, Timothy, John B. Loomis, and Michael Creel. "Confidence intervals for evaluating benefits estimates from dichotomous choice contingent valuation studies." Land Economics 67, no. 1 (1991): 64-73.

Desvousges, Mathews and Train (Land Economics, 2015) use the contingent valuation method (CVM) to conduct an adding-up test (i.e., does WTP_{A} + WTP_{B} = WTP_{A+B}?). They use the nonparametric Turnbull estimator and find that the data do not pass the adding-up test. This suggests that the CVM lacks internal validity.

In September 2016 I began writing a comment on this paper, first posting a series of blog posts questioning the validity of the underlying data and their implementation of the survey. The comment went through several rounds of review: it was submitted, reviewed, revised and rejected at Land Economics (due to concerns about the DMT reply); submitted, reviewed, revised and then withdrawn from Economics E-Journal; and submitted, reviewed and accepted for publication at Ecological Economics. The comment goes further than the blog posts by showing that the adding-up test, though flawed in implementation (and another hypothesis test is more appropriate), is actually supported in some tests using parametric WTP estimators.

Desvousges, Mathews and Train (Ecological Economics, forthcoming) have now replied to my comment by describing 12 mistakes (12!) that I made. I agree that I made one of the mistakes on their list. I conducted an adding-up test by examining whether the confidence intervals for two willingness to pay estimates (the whole vs. the sum of the parts) overlap. It is well-known that confidence intervals can overlap and yet the t-statistic for the test can indicate that the difference in means is statistically significant. The mistake that I made was not checking the t-statistic. This is an embarrassing mistake. The worst part is that I teach this to undergraduates in the business statistics course. I tell them not to make this mistake and I've made it in a published journal article. I'm very embarrassed.
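The overlapping-intervals point is easy to demonstrate numerically: two estimates whose 95% confidence intervals overlap can still differ significantly by a two-sample test. A sketch with hypothetical means and standard errors (not values from any study here):

```python
# Hedged sketch of the mistake described above: overlapping 95%
# confidence intervals do not imply a failure to reject equality of
# means. The means and standard errors below are hypothetical.
import math

m1, se1 = 100.0, 10.0
m2, se2 = 130.0, 10.0

ci1 = (m1 - 1.96 * se1, m1 + 1.96 * se1)    # (80.4, 119.6)
ci2 = (m2 - 1.96 * se2, m2 + 1.96 * se2)    # (110.4, 149.6)
overlap = ci1[1] > ci2[0]                   # True: intervals overlap

# Two-sample z statistic for the difference in means
z = (m2 - m1) / math.sqrt(se1**2 + se2**2)  # 30 / 14.14 ~ 2.12 > 1.96
print(overlap, round(z, 2))                 # True 2.12
```

The intuition is that the standard error of a difference, sqrt(se1² + se2²), is smaller than the sum of the two standard errors, so the difference can be significant even when the individual intervals touch.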

There are a variety of reasons, though not excuses, for this mistake which I will describe in another blog post. But today, let me point out another mistake that I made that concerns me almost as much as the t-statistic: *I used the wrong confidence intervals*. In Whitehead (2020) I used the confidence intervals from the Delta Method (a first-order Taylor Series expansion from the variance-covariance matrix) which are symmetric. It is well-known that the distribution of a ratio of parameters (such as WTP) is not necessarily symmetric. The asymmetry gets more severe when the parameter in the denominator is imprecisely estimated as in Desvousges, Mathews and Train (Land Economics, 2015). Another approach that is common is the Krinsky-Robb (KR) confidence intervals. These are based on a simulation from the variance-covariance matrix of the estimated parameters. In a forthcoming blog post I'll show that the KR confidence intervals are very wide. So wide that the WTP for the sum of the parts lies within the confidence interval for the WTP for the whole, supporting the conclusions of Whitehead (2020). I'm embarrassed that I made this mistake too.

My biggest concern with the Desvousges, Mathews and Train "Reply to Whitehead", other than my big mistake (and the Delta Method confidence interval mistake), is that they do not take the problems with their own research very seriously. In contrast, when I've had papers that have received comments I've tried to learn from the comment and then tried to fix my paper (e.g., see Whitehead, Land Economics, 2004). Desvousges, Mathews and Train instead adopt the strategy that the best defense is a good offense. Their attitude seems to be that their data is no worse than any other CVM data set (in particular, they point to my own data from 15-30 years ago in footnote 3). I don't believe that this approach is the best way to advance economic science.

My comment on Desvousges, Mathews and Train (Land Economics, 2015) addresses three main issues: (1) the data are flawed/low quality, (2) implementation of the adding-up test in the survey is flawed and (3) additional statistical tests for adding-up do not support the DMT (2015) results. None of these issues are refuted by Desvousges, Mathews and Train (forthcoming). Instead, each of these issues has been confused by the Desvousges, Mathews and Train "Reply to Whitehead".

First, I provide a correction to my mistake described above. Second, here is a response to each of my 12 "mistakes":

(1) The log-linear models are not meaningless as claimed by DMT. The log-linear model and median WTP is a simple way of addressing negative WTP. The fact that the mean WTP from these models is infinite is not a functional form problem; it is a data problem. The median of the sum of WTP estimates provided by DMT (2020) lies within the 95% Krinsky-Robb confidence interval for the median WTP of the whole scenario. [more here]

(2) The linear-in-bid model that allows negative willingness to pay is not inappropriate. Negative WTP can arise from this functional form if the percentage of yes responses is less than 50% at the lowest bid or if the WTP estimate is statistically imprecise. The point estimate of mean WTP from this model is positive in four out of the five scenarios. The negative WTP estimate is from the troublesome second scenario data (see (4)). The third scenario generates negative WTP values from the statistical distribution. Accounting for these in a statistical adding-up test supports my result that the sum of the WTP parts cannot be statistically distinguished from the whole scenario. [more here]

(3) Following the approach taken in the correction, the adding-up test passes when respondents with missing demographics are dropped when the more appropriate confidence intervals (KR) are used. [more here]

(4) The weighted data do not support the results in Desvousges, Mathews and Train (Land Economics, 2015) as DMT claim. With the weighted data, the whole and second scenarios are "roller coaster" and "Nike swoosh" shaped instead of downward sloping as required by theory. This suggests that the weighted data reveal some irrationality amongst respondents. DMT's approach is to impose respondent rationality across the scenarios: they constrain the cost coefficients to be equal across scenarios in order to impose a downward-sloping cost effect. This is inappropriate when it is done to hide statistically insignificant (roller coaster) and wrong-signed (Nike swoosh) slope coefficients. [more here]

(5) DMT notice that I conducted an adding-up test with the Kristrom nonparametric estimator in a 2016 blog post (here). They claim that I "inadvertently dropped observations" when conducting these calculations. Dropping these observations was not "inadvertent." In the blog post at issue I used a sample size of n=950, which is the same sample size that DMT (2015) used in their Table 5 (dropping observations with a missing age variable).

DMT (2020) report that the adding-up test fails with the Kristrom estimator and that I "failed to report relevant findings" because I did not include this in Whitehead (2020). This raises the question: how many additional tests should be conducted in a comment on a paper? In Whitehead (2020) I provided three parametric tests using some of the standard models in the literature. I then considered the robustness of these tests with (a) weighted data and (b) the complete case data set (n=934 after dropping those with missing age and income).
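For readers who have not seen the Kristrom estimator: it interpolates the survivor function (share of yes responses) linearly between bids and takes mean WTP as the area under that curve. A minimal sketch, with made-up bids and yes shares (not the DMT data); it truncates at the highest bid, which is one simple assumption about the upper tail, and assumes the shares are already monotone.

```python
def kristrom_mean_wtp(bids, yes_shares):
    """Mean WTP as the area under a piecewise-linear survivor function
    through (0, 1) and the observed (bid, share-yes) points, truncated
    at the highest bid. Non-monotone shares would need smoothing first."""
    x = [0.0] + list(bids)
    y = [1.0] + list(yes_shares)
    # Trapezoid rule over each bid interval
    return sum((y[j] + y[j + 1]) / 2 * (x[j + 1] - x[j])
               for j in range(len(x) - 1))

# Illustrative bid design and yes shares
print(round(kristrom_mean_wtp([25, 80, 205, 405], [0.60, 0.45, 0.30, 0.15]), 2))
```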

(6) Claims that the Chapman et al. (2009) data and a number of my own data sets (circa 1992 - 2011) are of the same low quality as the Desvousges, Mathews and Train (Land Economics, 2015) data are overstated. I showed in an Appendix in Whitehead (2020) that the Chapman et al. (2009) data are far superior in quality to the Desvousges, Mathews and Train (Land Economics, 2015) data. Using the length of the upper tail as a measure of quality, I find that my own data mostly range between the Chapman and DMT data (one of my data sets is a literal "off the chart" low quality outlier). Quality is an increasing function of sample size. [more here]

(7) Desvousges, Mathews and Train (2015) have not provided their internet survey for review. I asked twice. The first time Bill Desvousges had his assistant send me the Chapman et al. (2009) report containing their in-person surveys. The second time I asked he notified the Economics: E-Journal editor about my request. The editor told me that he thought I had everything I needed to write my replication paper and not to email Bill Desvousges again (I won't). Claims that their survey conveys information about substitution effects to survey respondents are simply assertions. It would be forthright to provide the survey for review.

(8) DMT (2020) are correct in pointing out that "implicit claim" may be poor word choice. In DMT (2015) they have an empirical finding that there are no income effects. But, in mistake (9) they acknowledge that there is a statistically significant income coefficient when they use the weighted data. They have not explained why they chose to impose this "external" income constraint instead of incorporating income effects "internally" in the survey scenarios. Internal/external may be a better word choice than implicit/explicit.

(9) My statistically significant income coefficient was found using the models with weighted data. Desvousges, Mathews and Train (2020) state that they re-ran their simulations with the weighted income coefficient and found similar results. But, if they re-ran their simulations with the weighted income coefficient they should have conducted the test with the weighted WTP models, which lack validity (see (4) above). The "external" income test cannot be conducted in a model with consistent assumptions about the data unless one constrains the cost coefficients to be equal (which is done to hide statistically insignificant and wrong-signed cost coefficients).

In Whitehead (2020) I also doubt that income is the correct budget constraint. I suspect that survey respondents have some environmental contribution budget in mind when answering CVM questions. In footnote 4 DMT state that this is a violation of microeconomic theory. I assume that they are referring to neoclassical microeconomics and ignoring behavioral economics. Even then, an environmental contribution budget is consistent with two-stage budgeting, in which a household first allocates income to different budget categories and then maximizes subutility functions subject to the category budget constraints (Deaton and Muellbauer 1980 -- this theory led to the development of the Almost Ideal Demand System econometric model).
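The two-stage idea can be written down in a few lines (my notation, a sketch of the Deaton and Muellbauer setup):

```latex
% Stage 1: allocate income y across budget categories g, one of which
% (m_e) is an environmental contribution budget:
y = \sum_{g} m_g
% Stage 2: maximize the environmental subutility subject to the
% category budget, not full income:
\max_{q}\; u_e(q) \quad \text{s.t.} \quad p \cdot q \le m_e
```

A WTP response governed by m_e rather than y is then perfectly rational, which is why I doubt that income is the operative budget constraint in CVM responses.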

(10) My proposed hypothesis, based on my reading of Desvousges, Mathews and Train (Land Economics, 2015) and the lack of the survey instrument (see (7)), is a one-tailed scope test. It is not a one-tailed adding-up test. Note also that any information provided in a CVM survey about "substitute" environmental goods can be interpreted by respondents as complements (see Whitehead and Blomquist, WRR, 1991).

(11) Arrow et al. (1994) regretted using the term "adequate" (in reference to the size of scope effects) in the NOAA Panel report. Instead they suggested that the appropriate word is "plausible" scope effects. I pointed this out in Whitehead (Ecol. Econ 2016) and proposed scope elasticity as a measure. Scope elasticity is a more useful measure of plausibility than the adding-up test is of adequacy, given the difficulties in conducting an adding-up test.
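For concreteness, one natural arc formulation of scope elasticity (the percentage change in WTP per percentage change in the quantity of the good, comparing a smaller scenario A to a larger scenario B; see the Ecol. Econ paper for the full treatment):

```latex
\varepsilon_{\text{scope}}
  = \frac{(WTP_B - WTP_A)\,/\,\overline{WTP}}{(Q_B - Q_A)\,/\,\overline{Q}},
\qquad
\overline{WTP} = \tfrac{WTP_A + WTP_B}{2}, \quad
\overline{Q} = \tfrac{Q_A + Q_B}{2}
```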

(12) My Turnbull standard error estimates differ from DMT's (2015) standard errors. I applied the formulas in Haab and McConnell (2002) with pooled (smoothed) data. DMT (2020) report that they used the raw data to construct confidence intervals with the smoothed data WTP estimate. My estimates of the standard errors are larger than DMT's (2020). But, it seems like standard errors with the raw data (not smoothed) should be larger than standard errors from the smoothed data. DMT (2020) do not provide much information on this estimation so it is difficult to say more.
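Since the smoothing is the crux of the disagreement in (12), here is a sketch of the pooled (smoothed) Turnbull lower-bound calculation as I read the Haab and McConnell (2002) formulas; the bid design and response counts below are made up, not the DMT data.

```python
def turnbull_lower_bound(bids, n_no, n_total):
    """Turnbull lower-bound mean WTP and standard error on pooled
    (smoothed) data. n_no[j] of n_total[j] respondents said no at bids[j]."""
    bids, n_no, n_total = list(bids), list(n_no), list(n_total)
    # Pool adjacent cells whenever the empirical 'no' CDF is non-monotone,
    # keeping the lower bid of each pooled pair
    j = 1
    while j < len(bids):
        if n_no[j] / n_total[j] < n_no[j - 1] / n_total[j - 1]:
            n_no[j - 1] += n_no.pop(j)
            n_total[j - 1] += n_total.pop(j)
            bids.pop(j)
            j = max(j - 1, 1)
        else:
            j += 1
    t = [0.0] + bids                     # t[0] = 0 anchors the lower bound
    F = [0.0] + [n_no[j] / n_total[j] for j in range(len(bids))] + [1.0]
    # Lower bound: all mass between t[j] and t[j+1] is assigned to t[j]
    mean = sum(t[j] * (F[j + 1] - F[j]) for j in range(len(t)))
    # Variance built from the binomial variance of each (pooled) cell
    var = sum(F[j] * (1 - F[j]) / n_total[j - 1] * (t[j] - t[j - 1]) ** 2
              for j in range(1, len(t)))
    return mean, var ** 0.5

# Illustrative bid design and 'no' counts
mean, se = turnbull_lower_bound([25, 80, 205], [40, 55, 70], [100, 100, 100])
print(round(mean, 2), round(se, 2))
```

Computing the variance from the raw (unpooled) cells instead would change the bid spacings and cell sizes entering the sum, which is presumably where my standard errors and DMT's (2020) diverge.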

Comments welcome!

I first wrote about this in September 2016. I then submitted the comment to Land Economics. The editor sent me the results of an internal review and I revised it accordingly. Then he sent it out for external review and it received a favorable review in February 2017. But, the referee took issue with the reply to my comment. Apparently, the referee suggested that s/he would write a comment on the reply if it was published. The editor decided to reject the comment/reply because he said it is Land Economics' policy to only publish comments AND replies. That policy seems strange to me as I think it creates an incentive to write an objectionable reply to comments at Land Economics.

Next I sent it to the Economics: The Open Access, Open Assessment E-journal replication section in 2017. Again, I revised the comment before it was sent out for review. Once it was sent out for review I received three supportive reviews (based on my experience, I'm fairly sure they all supported a revision based on tone and suggestions). I revised and resubmitted the paper several times from December 2018 through 2019 in response to referees and a few rounds of editor comments. Finally, I felt that the editor, who was "unconvinced by [my] argument as [I] were presenting it," was pushing me in directions that I didn't want to take the comment, and I withdrew it from review in November 2019.

I next submitted the paper to Ecological Economics in December 2019. I received three reviews, each of which was thorough and supportive of publication (again, if my experience reading reviews is correct). I revised the paper and it was accepted for publication. Then, the editors sent it to Desvousges et al. for their reply. I have not yet read the reply. I imagine I'll have more to say when I have read it.

This has been a very frustrating process. It is difficult to get a comment published, especially after it was rejected at the journal where the original article was published. But, I'm glad that the paper is published since I don't think the Desvousges et al. data supports their conclusions.

The link to the paper is https://authors.elsevier.com/a/1bTdB3Hb~0IaFx.

The most recent issue of the Journal of Economic Perspectives features a three-article symposium on the 50th anniversary of the Clean Air and Clean Water Acts. (Though that's an iffy anniversary: the CWA was passed in 1972, so it's only 50 with rounding, and the CAA was originally passed in 1963, though the 1970 amendments contained much of the most important stuff.)

In the first article, Janet Currie and Reed Walker provide "a reflection on the 50-year anniversary of the formation of the Environmental Protection Agency, describing what economic research says about the ways in which the Clean Air Act has shaped our society—in terms of costs, benefits, and important distributional concerns."

In the second article, Richard Schmalensee and Robert Stavins discuss the evolution of the CAA over time: "We trace and assess the historical evolution of the Environmental Protection Agency's policy instrument use, with particular focus on the increased use of market-based policy instruments, beginning in the 1970s and culminating in the 1990s."

In the third article, David Keiser and Joseph Shapiro study the CWA and the Safe Drinking Water Act. They summarize four main conclusions: "First, water pollution has fallen since these laws were passed, in part due to their interventions. Second, investments made under these laws could be more cost effective. Third, most recent studies estimate benefits of cleaning up pollution in rivers and lakes that are less than the costs ... Fourth, economic research and teaching on water pollution are relatively uncommon."

I'm looking forward to hearing who gets hired for this job:

Nature, the international weekly journal of science, seeks to appoint an editor specializing in Environmental/Energy Economics to further our aim to publish the world’s best original research linking social sciences – including economics – to the physical and biological processes important to society. Central to this role will be determining how economics-based research is represented in our pages, particularly as it applies in the spheres of environmental, climate, ecological and energy sciences. This will be achieved through the solicitation, selection and preparation of manuscripts for publication, and by interacting closely with the relevant research communities. This is a demanding and intellectually stimulating position, and calls for a keen interest in the practice and communication of science.

via www.nature.com

Note: not that I'll ever submit a paper to Nature.

Here is a comment I received on a referendum repeated contingent valuation (aka, discrete choice experiment) survey:

I seriously dont think this is a realistic way to make decisions regarding such a complex situation. I hope my answers arent used to help anyone. I truly breezed through and became bored by the end. Much too complex for a lighthearted survey.

The respondent was from the SurveyMonkey (SM) Audience Panel, which is more expensive and more problematic (the respondents voiced complaints to SM about my survey) than respondents from either the Research Now/SSI (and now they have a new name) or Qualtrics panels.

The good news is that they answered the stated attribute non-attendance question consistently with this comment, so sensitivity analysis will account for respondents who considered this a "lighthearted survey" (at $8 per complete I don't consider it lighthearted).

I'm shouting because the journal is shouting:

Marine Resource Economics is now accepting submissions for a new section of the journal titled Case Studies, which is intended to provide an outlet for rigorous, theoretically grounded analyses of the governance of individual fisheries and/or aquaculture systems. The new section will be edited by Tracy Yandle of Emory University, and the editors expect the first Case Study to be published in the forthcoming volume of the journal.

"Case studies play a valuable role in the development of our understanding of effective marine resource governance, yet they are underrepresented in the economics literature. This new section presents a unique opportunity for researchers to apply an economic perspective to rigorous case studies—whether comparative case studies, or single case studies focused at a range of scales," said section editor Tracy Yandle. "I look forward to continuing the strong intellectual tradition of Marine Resource Economics, while expanding its coverage to a broader range of settings and research methods."

The Case Studies section joins four sections currently published in Marine Resource Economics: Articles, Perspectives, Systematic Reviews, and Book Reviews. Its published pieces will provide description and analysis of a particular regionally defined fishery, aquaculture system, marine resource, or comparisons of two or more cases, with an emphasis on an economic analytical perspective and a focus on historic and/or current issues of marine or coastal zone policy and governance.

"I'm very excited for the potential of this new section to expand the reach of the journal to a wider range of scholars and resource management practitioners," said Joshua K. Abbott, Marine Resource Economics editor.

The editors encourage submissions focusing on small-scale fisheries and aquaculture in developing nations. Case studies drawing upon quantitative evidence are preferred, though qualitative analyses are also encouraged—particularly in data-poor settings. All submissions to the Case Studies section are subject to a single-blind peer review process. For more information, please review the Marine Resource Economics Instructions for Authors webpage.

This is sure to be the home for many studies that haven't been able to find one in the past.