“It’s not that figures lie, but that liars sometimes figure.”

Karl Pearson, Mathematician

On February 26, 2007, the Daily Hampshire Gazette published an array of charts and statistics in a special business newspaper insert, Business 2007: Data Dashboard. The insert was informative, and many newspapers use the same types of charts and numbers, but it also raised some issues about using statistics to tell a story or frame a circumstance. Let’s take a closer look.

To the left I posted a gallery of the graphs and the statistics used. (If the print is too small for you, open the gallery, right-click each document, and save it to your computer. You can then open and enlarge the documents for a clearer view.) Notably, the graphs carry no citation for whoever constructed them, so presumably they are the work of newspaper staff. In addition, please examine the statistics and charts used by the City of Northampton Planning Department for the Sustainability Plan here and see if you can find any of the following:

  • Selection bias: Is there a description of the sample statistics, how they were gathered, their sources, what can be inferred from them, and whether they could be classified as representative?

  • Aspect ratio: the ratio of the width of the graph to its height. A tall, narrow graph with a vertical axis that DOESN’T start at zero exaggerates differences. A short graph that DOES begin at zero minimizes differences. Further, charts using percentages that DO NOT start at zero or DO NOT end at 100 can be misleading as well.

  • Disappearing baseline: comparing raw numbers without adjusting for expected differences in population. For example (not from the Gazette or Planning Dept.), five crimes committed in a community of 3,000 residents would have a different impact than five crimes committed in a community of 30,000, but they could be graphed to show an equal level of crime. Further, comparing financial figures across years or decades without using the Consumer Price Index or otherwise accounting for inflation is another example.

  • Arbitrary or misleading comparisons or selection of data.

  • Unequal class intervals.

  • Distortion of images.
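
To make two of the items above concrete, here is a minimal Python sketch of the arithmetic behind the disappearing baseline and the truncated axis. The crime counts and populations come from the example in the list; the price index values (100 and 150) and the bar values (52 and 50) are made-up numbers for illustration only, not figures from the Gazette, the Planning Department, or any real index. The function names are hypothetical as well.

```python
def rate_per_1000(count, population):
    """Events per 1,000 residents, so communities of different sizes compare fairly."""
    return count / population * 1000


def adjust_for_inflation(amount, index_then, index_now):
    """Restate a past dollar amount in current dollars using a price index ratio."""
    return amount * index_now / index_then


def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)


# Disappearing baseline: the same five crimes in two communities.
small_town = rate_per_1000(5, 3_000)
large_town = rate_per_1000(5, 30_000)
print(f"{small_town:.2f} vs {large_town:.2f} crimes per 1,000 residents")

# Inflation: index values below are hypothetical, chosen only to show the method.
print(adjust_for_inflation(1_000, index_then=100, index_now=150))

# Truncated axis: hypothetical values 52 and 50 drawn from zero, then from 48.
print(apparent_ratio(52, 50))                 # bars drawn from zero
print(apparent_ratio(52, 50, axis_start=48))  # bars drawn from 48
```

Graphed as raw counts, the two communities look identical; as rates, the smaller one has ten times the crime per resident. Likewise, the same pair of values looks nearly equal on an axis that starts at zero and doubled on an axis that starts just below the smaller value.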

Also of note: several of the references contained in the newspaper insert are somewhat vague. For instance, the citation for the article Valley workforce in search of its future is, “The Pioneer Valley Planning Commission, in a 2006 report, took stock of changes afoot in the region’s workforce. Among the findings:” Another example, State workforce growth stalls, was cited as follows: “In a 2006 report, MassINC examined the stall in growth of the state’s workforce. Among its findings:” Another, Road woes rank high with residents, was cited this way: “In a recent quality of life poll conducted by MassINC, state residents ranked ‘the roads and traffic situation’ second highest as a policy issue most in need of major improvement.”

These citations provide some information, and although fuller detail might push the boundaries of mainstream news reporting, more specifics would help interested readers. For instance, which report from the Pioneer Valley Planning Commission addressed changes in the region’s workforce, and how or where can we examine it?

When newspaper articles, government reports, or blogs for that matter are used as the basis for determining our public policies, it’s important for elected officials and other government decision makers to consider the sources of the statistics, the parameters of the studies, and the representativeness of the samples. A better understanding of the methods used to compile and then disseminate information will aid officials in establishing more appropriate public policies.

For a pdf file on the use of statistics and the source for some of the above-mentioned information (Columbia) click here.