Whatever the result of today's study of the numbers, the way the calculations are done, and whatever errors were made, the thing you are all missing is that the fundamental ongoing issue is data quality! Suppose we debate and eventually conclude what the correct methodology for handling the data is in terms of computing averages, etc. The fact remains that new data are entered every day and things change (however and for whatever reasons those changes come about). If you are depending on those numbers for serious work, you need tools to ensure data quality.

What does that mean? It means that NOAA and other reporting agencies should publish new statistics and tools when they report their data. They should tell us things like:

a) number of infilled data points and changes in infilled data points
b) percentage of infilled vs real data
c) changes in averages because of infilling
d) areas where adjustments have resulted in significant changes
e) areas where there are a significant number of anomalous readings
f) measures of the number of anomalous readings reported
g) correlation of news stories to reported results in specific regions
h) the average size and direction of corrections
i) the number of each kind of adjustment, and a comparison of these numbers with previous periods.
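As a minimal sketch of how a few of the statistics above could be computed, here is some illustrative Python. The record layout and field names (`raw`, `adjusted`) are hypothetical, invented for the example; no agency's actual data format is assumed.

```python
# Sketch of quality statistics (a), (b), and (h) from the list above,
# using a hypothetical record layout: 'raw' is the original reading
# (None if the point was infilled), 'adjusted' is the reported value.
from statistics import mean

def quality_report(records):
    """Summarize infilling and adjustments for a batch of readings."""
    infilled = [r for r in records if r['raw'] is None]
    measured = [r for r in records if r['raw'] is not None]
    deltas = [r['adjusted'] - r['raw'] for r in measured]
    return {
        'n_total': len(records),
        'n_infilled': len(infilled),                           # statistic (a)
        'pct_infilled': 100.0 * len(infilled) / len(records),  # statistic (b)
        'mean_adjustment': mean(deltas) if deltas else 0.0,    # (h): size
        'n_positive': sum(d > 0 for d in deltas),              # (h): direction
        'n_negative': sum(d < 0 for d in deltas),
    }

readings = [
    {'raw': 14.1, 'adjusted': 14.3},
    {'raw': 13.9, 'adjusted': 13.9},
    {'raw': None, 'adjusted': 14.0},   # an infilled station
    {'raw': 14.5, 'adjusted': 14.2},
]
print(quality_report(readings)['pct_infilled'])  # 25.0
```

Published alongside each release, numbers like these could be compared period over period, which is exactly the comparison item (i) asks for.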

What I am saying has to do with the constant doubt that plagues me and others: that the data are being manipulated, purposely or accidentally, too frequently. We need to know this, but the agency itself NEEDS to know this, because how can they be certain of their results without such data? They could be fooling themselves. There could be a mole in the organization futzing with data or doing mischief. Even if they don't believe anything is wrong and think everything is perfect, they should do this anyway, because outside folks who doubt them remain suspicious of their data.

This is standard procedure in the financial industry, where data means money. If we see a number that jumps by a higher percentage than expected, we have automated and manual ways of checking it. We will check news stories to see whether the data make sense. We can cross-correlate data with other data to see if they make sense. Maybe this data is not worth billions of dollars, but if these agencies want to look clean and bring some semblance of transparency to this so they can be removed from the debate (which I hope they would want), then they should institute data quality procedures like the ones I've described.
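The automated "jumped by more than expected" check can be sketched very simply. This is not any firm's actual system, just an illustration: it flags steps that are much larger than the typical (median) step in a series, and the multiplier `factor` is an arbitrary assumed tolerance.

```python
# Flag points whose change from the previous value is far larger than
# the typical step size. The median is used rather than the mean so that
# the anomaly itself does not inflate the baseline it is compared against.
from statistics import median

def flag_jumps(series, factor=10.0):
    """Return indices whose step from the previous value exceeds
    `factor` times the median step size in the series."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    if not steps:
        return []
    typical = median(steps)
    return [i + 1 for i, s in enumerate(steps) if s > factor * typical]

# A stable series with one suspicious spike at index 5; note the check
# fires on both the jump in and the jump back out of the spike.
values = [10.0, 10.1, 9.9, 10.2, 10.0, 15.0, 10.1, 9.8]
print(flag_jumps(values))  # [5, 6]
```

In practice a flagged point would then go to the manual side of the process: checking news stories, cross-correlating with neighboring series, and so on, as described above.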

Further, of course, we need a full vetting of all the methods they use for adjusting data, so that everyone understands the methods and parameters used and can analyze and debate the efficacy of those methods. The data quality statistics can then help ensure that those methods are being applied correctly. Then the debate can move on from all of this constant doubt.

As someone has pointed out, if the amount of adjustment is large, either in magnitude or in number of adjustments, that reduces confidence in the data. Calculated data CANNOT improve the quality of the data or its accuracy. If the amount of raw data declines, then the certainty declines, all else being equal. The point is that knowing the amount and number of adjustments helps to define the certainty of the results. If 30% of the data is calculated, that is a serious problem. If the magnitude of the adjustments is on the order of the total variation, that is a problem. We also need to understand the accuracy of the adjustments we are making. We need continuing statistical validation: not just once, but ongoing proof over time that our adjustments make sense and are accurate.
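The two red lines named above (30% calculated data, adjustments comparable to the total variation) can be turned into a mechanical check. To be clear, those thresholds come straight from the paragraph, not from any agency standard, and the function and its parameters are hypothetical.

```python
# Sketch of confidence checks on an adjusted series, using the two
# thresholds from the text above as assumed (not official) limits.
from statistics import pstdev

def confidence_flags(raw, adjusted, n_calculated, n_total,
                     max_calc_frac=0.30):
    """Return warnings about an adjusted series.

    raw / adjusted : parallel lists of values before and after adjustment
    n_calculated   : how many reported points were calculated (infilled)
    n_total        : total number of reported points
    """
    warnings = []
    calc_frac = n_calculated / n_total
    if calc_frac >= max_calc_frac:
        warnings.append(f"{calc_frac:.0%} of the data is calculated")
    adj_sizes = [abs(a - r) for r, a in zip(raw, adjusted)]
    total_variation = pstdev(raw)
    if total_variation > 0 and max(adj_sizes) >= total_variation:
        warnings.append("adjustments are on the order of the total variation")
    return warnings

# Half the points calculated, and one adjustment as large as the spread
# of the raw data: both checks fire.
flags = confidence_flags([10.0, 11.0, 12.0, 13.0],
                         [10.0, 11.0, 12.0, 14.5],
                         n_calculated=2, n_total=4)
print(len(flags))  # 2
```

Run continuously as new data arrives, checks like this are the "ongoing proof over time" the paragraph calls for, rather than a one-time validation.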

In academia we have people to validate papers, and rigor is applied, to an extent, to a particular paper for some time, as a static document. However, when you are in business applying something repeatedly, where data is coming in continuously and you have to depend on things working, we have learned that what works and seems good in academia may be insufficient. I have seen egregious errors by these agencies over the years. I don't think they can take many more hits to their credibility.