Introduction: Forensic auditing has opened a new way to monitor medical data. It uses Benford’s law, which describes the frequency distribution of leading digits in naturally occurring data sets. We applied this law to data on maternal mortality (MMR), an extremely important number in policy-making for sustainable project implementation. Methodology: The law states that the probability of a leading digit $d$ can be calculated through the equation $P(d) = \log_{10}(1 + 1/d)$. Observed and expected values were compared. To test for statistical significance we used the chi-square test. Results: The chi-square value for MMR was 21.08 for the 2012 report and 19.97 for the 2014 report. In both cases chi-square was higher than the cut-off value, which leads to the rejection of the null hypothesis. The rejection of the null hypothesis means that the numbers observed in the publication do not follow Benford’s law. Explanations range from errors, operational discrepancies and psychological challenges to manipulation in the struggle for international funding. Conclusion: Knowledge of this mathematical relation is not widely used in medicine, despite being a very valuable and quick tool to identify data sets in need of close scrutiny.
In recent years a mathematical law discovered more than a hundred years ago has given new insights into the quality of public health data. Operational inefficiencies, systemic flaws and even fraud can be detected more easily. This tool is called the “first-digit law” or “Benford’s law” [
The exact mathematics of Benford’s law are beyond the scope of this article, but the intuition behind it is easy to understand. A small city with 100 pregnant women needs to double (to grow by 100%) before the first digit in the number of pregnant women becomes a “2” (e.g. 200). The number of pregnancies then needs to grow by a further 50% to reach a leading 3, and by about another 33% to reach a leading 4 (400 - 499 pregnancies). Lower leading digits therefore persist over a longer stretch of growth and are observed more frequently than higher ones. This fact is addressed in Benford’s law.
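
The growth argument above can be illustrated with a small simulation. The following is only a sketch under assumed parameters (a start value of 100, a growth rate of 1% per step and a stop at 100,000 are our illustrative choices, not values from the study): it tallies how often each leading digit appears while a quantity grows steadily over three orders of magnitude.

```python
from collections import Counter


def growth_needed(d: int) -> float:
    """Relative growth needed to move the leading digit from d to d + 1."""
    return 1 / d


def simulate_leading_digits(start: float = 100.0, rate: float = 0.01,
                            target: float = 100_000.0) -> Counter:
    """Grow a quantity by a fixed rate per step and tally its leading digit.

    All three parameters are illustrative assumptions.
    """
    counts = Counter()
    value = start
    while value < target:
        counts[int(str(int(value))[0])] += 1  # first digit of the integer part
        value *= 1 + rate
    return counts


for d in (1, 2, 3):
    print(f"leading {d} -> {d + 1}: grow by {growth_needed(d):.0%}")

counts = simulate_leading_digits()
total = sum(counts.values())
for d in range(1, 10):
    print(f"digit {d}: {counts[d] / total:.1%} of steps")
```

Under these assumptions the share of steps with a leading “1” should come out near 30%, with the shares falling for higher digits.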
In recent years Benford’s law has been used to detect widespread fraud in large datasets (e.g. for the Greek economy [
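Benford’s law states that the probability $P(d)$ of a leading digit $d$ (with $d = 1, 2, \dots, 9$) can be calculated through the following equation:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$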
This distribution shows that “1” is much more common as the first or leading digit than any other digit, occurring in around 30.1% of cases; “2” occurs in 17.6%, “3” in 12.5%, “4” in 9.7%, “5” in 7.9%, “6” in 6.7%, “7” in 5.8%, “8” in 5.1% and “9” in around 4.6% [
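
The percentages listed above follow from the standard first-digit formula $P(d) = \log_{10}(1 + 1/d)$. As a minimal sketch (the function name is ours, chosen for illustration), they can be recomputed as follows:

```python
import math


def benford_first_digit(d: int) -> float:
    """Benford probability that the leading digit equals d (1 to 9)."""
    return math.log10(1 + 1 / d)


probs = {d: benford_first_digit(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"{d}: {100 * p:.1f}%")
```

The nine probabilities sum to 1, since the product of the terms $(d + 1)/d$ for $d = 1, \dots, 9$ is 10.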
The expected frequencies for the second, third and fourth digits can be calculated too [
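
For later digit positions, the usual generalization sums the first-digit expression over all possible preceding digits. The sketch below assumes this standard formulation (the source does not spell it out), and the function name is hypothetical:

```python
import math


def benford_digit_probability(position: int, digit: int) -> float:
    """Probability that the digit at `position` (1 = first) equals `digit`."""
    if position == 1:
        if digit == 0:
            return 0.0  # a leading digit is never zero
        return math.log10(1 + 1 / digit)
    # Sum over every possible prefix made of (position - 1) digits.
    low, high = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * k + digit)) for k in range(low, high))


for position in (2, 3, 4):
    row = [benford_digit_probability(position, d) for d in range(10)]
    print(position, [f"{100 * p:.2f}%" for p in row])
```

The distribution flattens quickly: the second digit still favours 0 over 9 by a few percentage points, while by the fourth digit the frequencies are nearly uniform.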
Natural demographic data that cover more than two orders of magnitude, have no artificial cut-off point and provide at least five data points in each group are likely to satisfy Benford’s law well [
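
Two of these conditions can be checked mechanically before any test is run. The sketch below is an assumption-laden illustration: it reads “five data in each group” as an expected count of at least five for every leading digit, and the function name, thresholds and sample values are ours. The absence of an artificial cut-off point still has to be judged from knowledge of the data.

```python
import math


def benford_preconditions(values, min_orders: float = 2.0,
                          min_expected: float = 5.0) -> dict:
    """Check the span and the group size of a data set (illustrative only)."""
    positives = [v for v in values if v > 0]
    orders = math.log10(max(positives) / min(positives))
    # The rarest leading digit is 9, so its expected count is the smallest.
    smallest_expected = len(positives) * math.log10(1 + 1 / 9)
    return {
        "orders_of_magnitude": orders,
        "spans_enough_orders": orders > min_orders,
        "smallest_expected_count": smallest_expected,
        "enough_data_per_group": smallest_expected >= min_expected,
    }


# Made-up sample values, not taken from any report.
sample = [3, 45, 120, 980, 2100] * 30
print(benford_preconditions(sample))
```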
Deviating and non-conforming data raise the suspicion of systemic data challenges, arbitrary assignment of numbers, irregularities, psychological considerations, errors or fraud.
Benford’s law was applied to the internationally available data for MMR. Because of their widespread use in organizations worldwide, we used the data published in UNICEF’s survey “The State of the World’s Children”. The reports of 2012 and 2014 were evaluated. They show adjusted data from 2008 and 2010 [
1) The data of the UNICEF reports “The State of the World’s Children” from 2012 and 2014 [
For the 2014 report 180 countries (2012 report: 172) were reviewed, and for all MMR values the frequency of each digit from 1 to 9 in the first position was counted. The expected and the observed values are summarized in Tables 1-3.
2) Graphical comparison of the observed and expected data:
3) The calculation of significance:
Chi-square for the Maternal Mortality Rate is 21.08 for the 2012 report and 19.97 for the 2014 report. The cut-off value for a chi-square distribution with eight degrees of freedom and a level of significance of 0.05 (alpha = 0.05) is 15.51 [
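
The counting described in step 1) can be sketched in a few lines. This is an illustration only: the helper names are hypothetical and the sample values are made up, not taken from the UNICEF reports.

```python
from collections import Counter


def first_digit(value):
    """Return the first non-zero digit of a number, or None for zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None


def first_digit_counts(values):
    """Count how often each digit 1 to 9 occurs in the first position."""
    counts = Counter(first_digit(v) for v in values)
    return [counts.get(d, 0) for d in range(1, 10)]


# Made-up example values; zero has no leading digit and is skipped.
example = [12, 130, 2, 27, 350, 9, 0]
print(first_digit_counts(example))
```

The resulting list of nine counts has the same layout as one row of observed values in the tables.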

Table 1. Observed values: number of countries by first digit (1 to 9) of the MMR.

Report Year | Available Numbers (years) | Number of Countries | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|---|
2012 | 2008 | 172 | 34 | 27 | 20 | 15 | 21 | 15 | 11 | 16 | 13 |
2014 | 2012 | 180 | 30 | 34 | 24 | 19 | 19 | 19 | 13 | 14 | 8 |


Table 2. Expected values by first digit (1 to 9) according to Benford’s law.

Report Year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
2012 | 51.772 | 30.272 | 21.5 | 16.684 | 13.588 | 11.524 | 9.976 | 8.772 | 7.912 |
2014 | 54.180 | 31.68 | 22.5 | 17.46 | 14.22 | 12.06 | 10.44 | 9.18 | 8.28 |


Table 3. Chi-square test results.

Report Year | α | df | χ² | Cut-off for χ² | H0 |
---|---|---|---|---|---|
2012 | 0.05 | 8 | 21.08 | 15.51 | Rejected |
2014 | 0.05 | 8 | 19.97 | 15.51 | Rejected |

For 2008 and 2010 (the reports of 2012 and 2014) chi-square was higher than the cut-off value. This led to the rejection of the null hypothesis.
The rejection of the null hypothesis means that the numbers observed in the publication do not follow Benford’s law in the reports of 2012 and 2014. The deviation from the expected values needs an explanation other than chance.
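
The comparison above can be retraced from the tabulated counts. The sketch below is not the authors’ original calculation. It assumes the standard Pearson chi-square statistic and the rounded Benford percentages quoted in the text; the observed counts and the cut-off of 15.51 are taken from the tables, and the function names are ours.

```python
# Rounded first-digit percentages as quoted in the text (digits 1 to 9).
BENFORD_PERCENT = [30.1, 17.6, 12.5, 9.7, 7.9, 6.7, 5.8, 5.1, 4.6]

# Observed first-digit counts per report year, as listed in Table 1.
OBSERVED = {
    2012: [34, 27, 20, 15, 21, 15, 11, 16, 13],
    2014: [30, 34, 24, 19, 19, 19, 13, 14, 8],
}

CUTOFF = 15.51  # chi-square cut-off for df = 8 and alpha = 0.05, from the text


def expected_counts(n: int) -> list:
    """Expected number of values per leading digit for n observations."""
    return [n * p / 100 for p in BENFORD_PERCENT]


def chi_square(observed: list) -> float:
    """Pearson chi-square statistic of observed counts against Benford."""
    expected = expected_counts(sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


for year, observed in OBSERVED.items():
    stat = chi_square(observed)
    verdict = "rejected" if stat > CUTOFF else "not rejected"
    print(f"{year} report: chi-square = {stat:.2f}, H0 {verdict}")
```

Under these assumptions both statistics should land close to the reported 21.08 and 19.97 and above the cut-off; small differences in the second decimal can arise from rounding of the expected values.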
The first-digit law or Benford’s law was introduced into the scrutiny of data through so-called “forensic accounting”. It tests whether data sets are of natural origin or not [
Surprisingly, this statistical relation is not widely used in medicine. In our opinion the first-digit law is an easy-to-use, valuable and quick tool to identify data sets in need of closer scrutiny. Additionally, this law is accessible to the hard-working clinician without thorough knowledge of statistics.
We used Benford’s law to scrutinize one of the most important data sets for policy making and programme evaluation in international public health. In the UNICEF report “The State of the World’s Children” [
Nevertheless, the data for MMR in the reports of 2012 and 2014 deviate significantly from the expected values.
Distributions that do not follow the expected first-digit distribution need an explanation. Benford’s law only establishes the deviation; it offers no explanation other than that the cause is neither natural nor due to chance. It signals that the anomaly needs deeper evaluation. The reasons for the anomaly can range from computational challenges, human errors or systemic operational discrepancies to psychological challenges and deliberate manipulation in the struggle for international funding.
The purpose of this paper was to show that Benford’s law can indicate when to be careful with data, even when they originate from well-established sources and were put together with care, expertise and experience. Awareness needs to be raised of the importance of a degree of suspicion towards established data sources. Additionally, we wanted to show that this relatively new tool makes such checks possible even for clinicians without special training.
The following limitations of our study need to be considered. It was not possible to find the MMR for several countries in the UNICEF report, which lowered the number of available data. The Vatican had to be counted as a country even though it has “by definition” no MMR. Some countries might not be considered “full” countries by some scholars, and other places with questionable status were not incorporated in the list of countries. Countries with an extremely low MMR will be much less affected by a deviation of the values, so the importance of the registered imbalances lies much more on the shoulders of the (usually) poor countries with a high MMR.
Altogether we have to raise the suspicion that these important data have not been reliable and accurate in recent years. Our analysis through the Benford distribution showed that the data are not only flawed by the well-known difficulties in data collection worldwide, which we all appreciate (remote areas, dictatorships, no money to pay the collectors, computer challenges, etc.), but we have to suspect that there might be another systematic flaw in them. This would be well known from other socioeconomic data (e.g. tax evasion [
The international data available for MMR seem not to be as reliable as is often thought. We should reflect very critically on the importance of this finding, especially for future public health planning and funding in resource-poor countries.