The Power of Green Tea Extract

A good antioxidant should contain green tea extract. Here’s why:

“Green Tea Extract (120 mg, 95 percent polyphenols): Green tea antioxidants are of the same family as grape seed and pine bark extracts. They are polyphenols, chief of which are the flavonoids called proanthocyanidins. In green tea, the main proanthocyanidins are the catechins, and the most powerful of the catechins is epigallocatechin gallate (EGCG), found in the highest concentration in green tea. It works to prevent tumors from developing the blood vessels they need to survive (anti-angiogenesis), and it has been shown to inhibit metastasis. It is the first known natural telomerase inhibitor, eliminating the ‘immortality’ of cancer cells, which is what makes them so deadly. Green tea is particularly effective against leukemia, prostate cancer, and breast cancer. It has also been shown to be effective in regulating blood sugar, reducing triglycerides, and reversing the ravages of heart disease. (Incidentally, the Japanese, who drink large amounts of green tea, have some of the lowest rates of cardiovascular disease in the world.) Green tea seems to almost totally prevent cancer-causing DNA damage in smokers, a possible explanation as to why the Japanese, who are among the world’s heaviest smokers, have such a low incidence of lung cancer. Finally, green tea has great benefits for the brain as well, serving as an effective monoamine oxidase (MAO) inhibitor and protecting against brain-cell death. The net result is that there are strong indications that green tea extract may play a major role in protecting against both Parkinson’s and Alzheimer’s disease.

  • Note: the consumption of casein from dairy products can completely block the absorption of the main catechins found in green tea. In other words, drink your tea without milk, and take your green tea supplements separate from any dairy in your diet. Or, even better, just think of this as another reason to eliminate dairy from your diet.”


Dandelion Root Health Benefits from

Many think of dandelion as nothing more than a pesky weed. However, that hasn’t always been the case. In fact, gardeners once weeded out grass, which they considered a weed, to make room for this medicinal flower. This may seem surprising at first, but less so when you consider that the dandelion is probably more nutritious than most of the vegetables in your garden.

Dandelion is considered a bitter herb that is chock full of vitamins A, B, C, and D and contains minerals such as iron, potassium, and zinc. In fact, it contains more protein than spinach, almost as much iron, and four times as much vitamin A! It also has more vitamin C than tomatoes! How’s that for a common weed?

The dandelion was an essential herb that all natural healers kept as part of their medicinal arsenal. Today it is believed that it was so effective because it acted as a kind of multivitamin at a time when the concept of vitamins was completely foreign. In addition to its vitamin content, the dandelion contains powerful phytochemicals that have profound cleansing and healing effects on the body.

Historically, the root and leaves have been used to treat liver problems. Native Americans boiled dandelion in water and used it to treat a variety of issues, including kidney disease, swelling, skin problems, heartburn, and upset stomach. In Chinese medicine, it has been used to treat conditions such as stomach problems and appendicitis. In Europe, it was used for more common ailments such as fever, boils, diabetes, and diarrhea.

Today, dandelion is commonly used as a diuretic that improves the function of the pancreas, spleen, stomach, and kidneys without depleting potassium from the body. The leaves are also helpful in stimulating the appetite and helping with digestion. But true herbalists know that medicinally the most powerful part of the dandelion is the root, and its true power lies in helping to detoxify the liver. This is because dandelion is one of the strongest herbal lipotropics known. That is to say, it flushes fat deposits from the liver, thereby helping to relieve chronic liver congestion.

It also increases the production of bile, and studies have shown that it has “liver healing” properties. Considering its primary benefits for the liver and kidneys, it’s not surprising, then, that Jon Barron uses dandelion root in both his liver formulas (liver flush tea and liver flush tincture) and his kidney care formula.

Another benefit is that every part of the dandelion has some antioxidant properties and can also help improve the immune system. Dandelion root is also highly effective as a blood cleanser that strains and filters toxins from the blood and has beneficial effects on both red blood cell count and hemoglobin count.

In addition, the dandelion has been shown to…

  • Help regulate blood sugar and insulin levels.
  • Regulate blood pressure in the body due to its fiber and potassium content.
  • Help lower and control cholesterol levels.
  • Relieve pain and swelling.
  • Help slow cancer growth and prevent its spread.
  • Help in maintaining bone health.
  • Help treat skin diseases caused by microbial and fungal infections through use of dandelion sap.
  • Help treat acne through use of dandelion juice.

Considering all these health benefits and the fact that a cup of dandelion leaves contains merely 25 calories, you will often find dandelion leaves in salads, sandwiches, and teas. They also make a good addition to your morning smoothie. However, the leaves can have a bitter taste (a trait common to most liver herbs), so it is recommended to blend them with sweeter, flavorful fruits. Some people even use ground, roasted dandelion root as a caffeine-free coffee substitute.

You can find dandelion in a variety of forms, from fresh to dried to tinctures, liquid extracts, teas, tablets, and capsules. If you use fresh dandelion, make sure it is organic or, if you pick it from your garden, that the leaves haven’t been treated with pesticides.



Selecting the Right Doctor for Male Hormone Replacement Therapy

Aside from its findings on the safety and efficacy of various forms of HRT in men, Renew Man provided a valuable service to the public by describing the caution a male patient must exercise in selecting a physician to prescribe HRT initially and to monitor his progress throughout the course of treatment. According to Renew Man, because of relatively lax medical laws and regulations in the state of Florida, many subpar providers of HRT operate out of that state. Among the many errors committed by these subpar providers were using the same HRT protocol for all patients rather than an individualized approach; failing to test thyroid hormone and DHEA levels; failing to monitor and control hematocrit, thereby increasing the risk of stroke that some men face when taking therapeutic doses of testosterone; inconsistent or even nonexistent follow-up biometric testing; and “stacking hormones,” which is the use of several synthetic versions of testosterone rather than a bioidentical hormone product, such as (generic) compounded testosterone.[14]

Family doctors, “though well intentioned,” have woefully inadequate training to begin HRT for male patients. They frequently attempt to boost testosterone with a synthetic FDA-approved prescription drug without realizing that any increase in testosterone will alter the male patient’s entire hormone cascade. It never dawns on many of these providers that they need to check, and prescribe drugs to control, the patient’s estradiol, dihydrotestosterone, and even thyroid hormone levels. According to Renew Man, the errors committed by inadequately trained family medicine practitioners include an incorrect initial diagnosis, e.g., treating depression and lethargy with an antidepressant rather than recognizing them as symptoms of the larger andropause condition; treating all male patients with the exact same protocol; failing to periodically retest estradiol, which, when elevated in men, is known to cause myocardial infarctions, prostate cancer, and gynecomastia; failing to test for DHEA and thyroid hormones, compounded by a failure to recognize that low-normal Free T3 readings reflect hypothyroidism, another medical condition often present with andropause; and inconsistent or nonexistent biometric testing of male patients on HRT.[14]

“Urologists often have the knowledge base necessary to treat andropause, but will not take the time to provide safe, quality treatment. Urologists are often high-production doctors, and their practices see a high volume of patients. Hormone replacement therapy and treatment (if done correctly) takes time, patience, and follow-up.”[14] An award-winning, board-certified urologist in Tennessee told me that he was unfamiliar with, and uncomfortable prescribing, the CC and bioidentical testosterone needed in male HRT.

As to endocrinologists, “hormones are their specialty. However, 90+% of their clients are usually women and diabetics. Endocrinologists may know a lot about hormones, but they treat few if any andropausal men, and surprisingly, they usually lack the knowledge to do so effectively.”[14] My own research confirms these statements from Renew Man. An endocrinologist affiliated with a medical research foundation told me that male patients seeking HRT would be better served by a primary care physician who had taken the time to learn about male HRT through continuing medical education than by the typical endocrinologist. In addition, a board-certified endocrinologist in East Tennessee gave me a list of blood tests he would order for a male patient on HRT, but he failed to check for serum dihydrotestosterone levels, an obvious error.



14. [accessed on Sept. 2, 2014].

Fallacy of Relying Too Much on ORAC Value of Antioxidants

“Even if we assume that the tested ORAC figures are accurate, it is important to understand that having a high ORAC value in and of itself does not confer any particular advantage. That’s because not all antioxidants that are confirmed as present in a test tube can be absorbed and utilized by the human body. It doesn’t matter how high the value is in a test; if it doesn’t work in the body, it has no value to you. In addition, different antioxidants target different free radicals. Taking a supplement with an ORAC value of 17,000 that targets one group of free radicals still leaves you vulnerable to the ones not targeted.

Also, keep in mind that different antioxidants work in different areas of the body. The herb Ginkgo biloba, for example, works in the brain and cardiovascular system, whereas curcumin is active in the colon and silymarin in the liver. Again, having 5,000 ORAC units working in the brain isn’t much consolation if you have liver problems.

ORAC value tells only a very small part of the story. Saying that pycnogenol is twenty times more powerful than vitamin C, for example, is meaningless when it comes to scurvy. In that regard, vitamin C is infinitely more powerful than pycnogenol. Or to say that mangosteen is ten times stronger than noni is also meaningless. When it comes to raising nitric oxide levels, noni is infinitely stronger because mangosteen doesn’t do that. On the other hand, mangosteen appears to have much stronger anti-pathogenic activity than noni. So, ORAC value by itself presents a very incomplete picture.

Finally, there is a limit to how much you can benefit from an increased intake of antioxidants. The maximum number of ORAC units the body can handle in a given day is about 3,000 to 5,000 units. This is because the antioxidant capacity of the blood is tightly regulated, so there is an upper limit to the benefit that can be derived from antioxidants. Taking in 25,000 ORAC units at one time (as reputedly occurs with some mangosteen drinks, if you believe what you read on some websites) would be no more beneficial than taking in a fifth of that amount (at least in terms of its ORAC value). The excess is simply excreted by the kidneys. Let me rephrase this to make it even clearer. Taking more than 3,000-5,000 ORAC units a day of the same antioxidant is a bit like using a tank to go to the grocery store: it’s overkill. And promoting those super-high numbers in advertising is a bit like a car dealer trying to convince you to buy that tank for your grocery shopping in the first place. It’s less than honest.

ORAC values are normally calculated on the basis of 100-gram portions. The reason is that ORAC was originally developed to give data on whole foods, and 100 grams works out to just under a 4-ounce portion. It is essential, therefore, to make sure that the comparison cited for ORAC values is based on equivalent volumes (or servings). When sellers of mangosteen drinks claim ORAC values far superior to other antioxidants, are they comparing serving to serving? Probably not. In many cases, they have extended the numbers out to give the ORAC values in a liter/quart of mangosteen juice and then compared that to 1-ounce servings of other liquid antioxidant supplements. To get the true value per recommended 1-ounce serving, you would have to divide by 32, which takes you down to a more reasonable 500–600 ORAC units per serving. Don’t get me wrong: I like mangosteen and use it in some of my own formulations, but I don’t think it’s useful to exaggerate the numbers. And besides, as discussed above, since antioxidant levels in the blood are tightly regulated by your body, there are probably no health benefits to numbers over 3,000–5,000 ORAC per serving of a single antioxidant anyway.

And when it comes to capsules, most capsules are 500 milligrams, which means it would take 56 capsules of an unconcentrated extract to equal an ounce of a food-based source of the antioxidants. In other words, it would take over 200 capsules to give you the same amount as a 4-ounce serving of the same antioxidant-rich whole food. That means the capsule contents need to be about 200 times more concentrated than the whole food in order to give you an equivalent ORAC value. This can be done by removing the water and fiber, which have no ORAC value. Grape skin extract, for example, has a much higher ORAC value than whole grape skins, but this does not mean that, from the standpoint of cost, dose, and/or serving, the extract is necessarily superior. But keep in mind, there is the convenience factor. Isn’t it worth paying a premium to easily supplement with a full-spectrum antioxidant that works throughout the entire body and on all types of free radicals, an antioxidant that makes up for the fact that you aren’t including all necessary beneficial foods in your daily diet? Most definitely.”

— Jon Barron,
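The serving and capsule arithmetic in the Barron excerpt above is easy to check. The Python sketch below is illustrative only: it assumes a 32-ounce US quart, 28.35 grams per ounce, 500-milligram capsules, and a hard daily ceiling of 5,000 ORAC units (the upper end of the quoted 3,000-5,000 range). The function names and the 17,600-unit per-quart figure are hypothetical, and a hard cutoff is a simplification of the "tightly regulated" claim, not a physiological model.

```python
"""Sanity checks for the ORAC arithmetic in the excerpt (illustrative sketch)."""

GRAMS_PER_OUNCE = 28.35      # avoirdupois ounce, rounded
OUNCES_PER_QUART = 32        # US liquid quart
CAPSULE_MG = 500             # capsule fill weight cited in the excerpt
DAILY_ORAC_CEILING = 5_000   # assumed hard cap: upper end of the quoted 3,000-5,000 range


def usable_orac(intake: float, ceiling: float = DAILY_ORAC_CEILING) -> float:
    """ORAC units counted as useful if everything above the ceiling is excreted."""
    return min(intake, ceiling)


def orac_per_serving(orac_per_quart: float, serving_oz: float = 1.0) -> float:
    """Scale a per-quart ORAC claim down to one serving of `serving_oz` ounces."""
    return orac_per_quart * serving_oz / OUNCES_PER_QUART


def capsules_for_ounces(ounces: float, capsule_mg: float = CAPSULE_MG) -> float:
    """Capsules whose combined fill weight equals `ounces` of food."""
    return ounces * GRAMS_PER_OUNCE * 1000 / capsule_mg


if __name__ == "__main__":
    # Under the cap, 25,000 units are worth no more than a fifth of that intake.
    print(usable_orac(25_000), usable_orac(25_000 / 5))
    # Hypothetical per-quart claim of 17,600 units, scaled to one 1-ounce serving.
    print(orac_per_serving(17_600))
    # Capsules needed to match 1 ounce and a 4-ounce serving of unconcentrated material.
    print(round(capsules_for_ounces(1), 1), round(capsules_for_ounces(4)))
```

Under these assumptions the hypothetical claim works out to 550 units per serving (inside the 500-600 range mentioned in the excerpt), and the capsule counts come to about 56.7 per ounce and about 227 per 4-ounce serving, consistent with the excerpt's "56" and "over 200."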

Excerpt from Walt Whitman’s “Song of Myself”

Song of Myself, Walt Whitman (from Leaves of Grass, first published in the 1855 edition)

And I know that the hand of God is the promise of my own,
And I know that the spirit of God is the brother of my own;
And that all the men ever born are also my brothers, and the women, my sisters and lovers;
And that a kelson of the creation . . . . . . is love; [5]

It may be if I had known them I would have loved them,
It may be you are from old people, or from offspring taken
soon out of their mothers’ laps,
And here you are the mothers’ laps.  [6]

It is not chaos or death – it is form, union, plan – it is eternal life.
It is HAPPINESS.  [50]

As adapted in the movie Peace, Love, and Misunderstanding (2011)

Jon Barron on Free Radicals & Disease

“A free radical is a cellular killer that wreaks havoc by damaging DNA, altering biochemical compounds, corroding cell membranes, and destroying cells outright. In this sense, a free radical can be thought of as an invader attacking the cells of your body. More technically, a free radical is a molecule that has lost one of its electrons and become highly unbalanced. It seeks to restore its balance by stealing a vital electron from another molecule.

Scientists now know that free radicals play a major role in the aging process as well as in the onset of cancer, heart disease, stroke, arthritis, and possibly allergies and a host of other ailments. The link between free radicals and the ‘aging diseases’ is the most important discovery since doctors learned that some illnesses are caused by germs.

In a very real sense, the free radical process in our bodies is much the same as the process that causes fuel to burn, oil to go rancid, or an apple to turn brown if you slice it open and expose it to air. It is as though our bodies rust from the inside out, causing, among other things, dry, wrinkled skin. But wrinkles are the least of our problems. When the process gets out of control, it can cause tumors, hardening of the arteries, and macular degeneration, to name just a few of the problems. Think of free radicals as ravenous molecular sharks, sharks so hungry that in little more than a millionth of a second, they can launch a frenzied attack on a healthy neighboring cellular molecule. A single free radical can destroy an enzyme, a protein molecule, a strand of DNA, or an entire cell. Even worse, it can unleash, in a fraction of a second, a torrential chain reaction that produces a million or more additional killer free radicals.”

Jon Barron

Excellent Article from Paul Elsass, MSM

Empathize With the Unemployed

Have you ever lost your job? No, I am not talking about the time when you were 16 and you got fired at the local hamburger joint for showing up late and refusing to wear the funny yellow hat you were issued. What about in your professional career? I did, and it was horrible.

Let’s back up for a minute… Like many of you, I had been working in my professional career successfully for many years. Over a 13-year period, I had worked for 7 different employers. Each time I chose to leave a company, I only did it if the move would advance my career and salary. This was working quite well, and I was on top of the world. I was making good money, working as a director with over 70 employees reporting to me, and feeling unstoppable.

It was 2008, and a hospital system asked me to leave a great job to come help them open a brand-new $21 million facility. I would increase my salary by another 30% in this one move. I figured I had nothing to lose… famous last words, right? Things went extremely well there for the first few months. I was far ahead of goals and the C-suite was very pleased. I felt confident that even though they were hiring a new CEO, that CEO would see me as the golden boy and I would be the last person he ever fired. Well, I was wrong.

It was February of 2009. I was called to meet with my boss and the company’s lawyer. I never even saw it coming. They were laying off several people, in an effort to reduce expenses and look better on the bottom line. Since the facility was so far ahead on the budget, the new CEO figured it was not necessary to have me running it anymore. He could just put it on auto-pilot and let some lower level manager handle it.

I remember walking home, tears in my eyes, asking God why this was happening to me. What had I done wrong?!? You cannot imagine the pain I felt when I delivered the news to my wife, who immediately burst into tears. We had two boys, a lot of debt and very little savings. I was only given one month’s severance and this was right when the economy tanked.

One task I dreaded was calling the bank and telling them that I could not pay for my FJ Cruiser anymore. They told me to put the keys on the floorboard and they would come pick it up. When that tow truck drove away with my vehicle, I pretty much lost it. Whatever confidence I had left plunged.

Do you know what the worst part was? The rest of the world moved on, as if nothing had even happened. My life felt like it was not even remotely important to anyone. I reached out to every friend, family member and contact I knew. I wasn’t asking them to give me a job or money; I was only asking for leads or suggestions. I will be the first to tell you that prayer is important, but I will also tell you that after you hear 100+ people tell you that they will keep you in their prayers or “thoughts,” you get pretty cynical. Seriously, I thought, the best you can do is “think” about me? How hard is it to make a couple of calls or send me some job leads you know about? Each day, while I was desperately searching for a job, I would think about all those people I knew who were getting up, having their coffee, reading the paper and heading off to work. How lucky they were. I missed it so badly!

If you have never stood in line at the unemployment office, you should try it. It’s an incredibly humbling experience. Here I was, a guy with an MS in Business Management and a BS in Kinesiology, with over a decade of professional management experience, and I was asking the government for some money. For 3 months, I spent every single day waking up at 5am and applying for every job I could find that might pay at least more than what I was getting from Uncle Sam. While I did finally land a job, it was in a place where the cost of living was double what I was accustomed to and the salary was half what I had been making. Still, I was so happy to have any job at all.

Now that I have thoroughly bummed you out, let me tell you 3 things I hope you will take from this:

1.) When someone loses their job, do just a bit more than offer to “think” or “pray” for them. Tell them the names of some folks you think they should call, or call a few folks for them. One good friend’s wife sent me at least 5 job leads per week, while she was working her full-time job and caring for her family. I will never forget her kindness!

2.) Empathize with them, because not one of us is immune from having our careers take a tragic turn. The day you lose your job, you will be so happy that you helped others, because that kindness will come back to you when you need it most.

3.) Feel how incredibly powerful it is to help someone find a job. I recently met a young person through LinkedIn who was trying very hard to find a job. She mentioned a company I had worked for in the past and I said I knew some folks there. I made a couple of calls and she got an interview. She nailed the interview and was offered the job. Later, she wrote me and said how much she appreciated what I did. Honestly, I didn’t really do that much, but to her it was life-changing. The warm feeling I got from that cannot be described. Give it a shot and see what happens.


I hope you enjoyed this article! Feel free to reach out anytime.

-Paul Elsass Twitter @paulthoughts


Is lipstick still being put on the pig or do we need a rethink about interpreting patient reported outcomes?

PRO article

“Look at any paper or presentation reporting the development or use of a patient reported outcome (PRO) measure and, without doubt, there will be an array of statistical significance levels, standard deviations, standard errors and correlation coefficients in an attempt to help us understand what the data is telling us. But is the application of classic statistical methods really telling us what we want to know? Interpreting data derived from patient reported outcome (PRO) and experience (PRE) measures can be challenging, and even the final FDA PRO guidance admits that judgement is still required when evaluating whether individuals’ responses are meaningful. So we ask the question: ‘Do we need a rethink about interpreting patient reported outcomes?’”


Rand Study: Less than 1 million Americans lost health insurance due to ACA changes

  • Mark Flores

    Vice President and Co-Founder at AVYM Corporation

    Michael, what do you make of this Rand study (Rand being a traditionally right-leaning group), which suggests that at least 9.3 million more Americans have health insurance now than in September 2013, virtually all of them as a result of the law?

    Additionally, as summarized in an LA Times article (,0,6208659.column#ixzz2yLGPAsYt), the Rand study confirms other surveys that placed the number of people who lost their old insurance and did not or could not replace it (the focus of an enormous volume of anti-Obamacare rhetoric) at less than 1 million. The Rand experts call this a “very small” number, less than 1% of the U.S. population aged 18 to 64.

    Rand acknowledges that its figures have limitations: they’re based on a survey sampling, meaning that the breakdowns are subject to various margins of error, and they don’t include much of the surge in enrollments in late March and early April. Those 3.2 million sign-ups not counted by Rand could “dramatically affect” the figures on total insureds, the organization said.

  • Michael A. S. Guth, Ph.D., J.D.

    ►Health Economist | Population Health Strategist | Healthcare IT Program Manager| Healthcare Management Consulting◄

    My first comment is that this is interesting. If the number of those who lost their old insurance is really less than 1 million (less than 1% of the population), that would make an effective TV commercial for Democratic candidates this election year: “Less than 1% of people lost their existing plan….” In my case, Humana gave me the option of keeping my insurance for one more year or terminating coverage and buying a new plan on the marketplace. I could argue either way about whether I should be counted among those who lost insurance, because my loss was delayed but I planned to get a new policy anyway.

    My intuition was that those who lost their insurance coverage would be closer to 5 million, because the insurers were free to drop policyholders, and many insurers made it clear that they wanted to jettison their existing (medically underwritten) individual policy customers in favor of customers buying higher-premium policies with no medical underwriting. But my intuition was based on the number of media stories about people losing coverage: was it a case of media distortion of reality?

    Here are two uncertainties with the Rand modeling effort. (1) “The HROS is conducted using the RAND American Life Panel, a nationally representative panel of individuals who regularly participate in surveys.” What kind of people regularly participate in surveys? People who like to hear themselves talk? People who can’t wait to have an opportunity to express their opinions? Is that a biased sample at the outset? (2) “We extrapolated from our sample to estimate the number of people in the population as a whole in each insurance category, as discussed in more detail below.” The details indicate that various weighted averages were used. Who determined the weights? Rand said it developed the weights in an unbiased manner using census data. I suspect different econometricians could develop different weights, all based on an “unbiased” method. Rand states “5 percent of respondents in our survey would be associated with 9.9 million individuals in the population as a whole.”

    They report in Table 2 that 26.2% of the sample (52 million Americans) were uninsured in 2013. That seems about right. Previous estimates were set at 49 million uninsured. This much passes the sanity check. Overall, given the somewhat conservative leanings of Rand Corp, this study seems like good news for the Obama Administration.
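Guth's sanity check can be reproduced from the two figures he quotes. Rand's statement that 5 percent of respondents correspond to 9.9 million people implies a weighted base population of about 198 million, and 26.2 percent of that base is about 52 million, matching Table 2. The `weighted_share` function below also illustrates his question about weights: the same answers yield different population estimates under two different weighting schemes. This is a minimal sketch; the function names, the respondent answers, and both weighting schemes are hypothetical and are not Rand's data or method.

```python
"""Back-of-the-envelope check of the Rand extrapolation (illustrative only)."""

# Figure quoted from the Rand report: 5% of respondents ~ 9.9 million people.
IMPLIED_POPULATION = 9.9e6 / 0.05  # about 198 million


def extrapolate(share: float, population: float = IMPLIED_POPULATION) -> float:
    """Scale a survey share up to a head count in the whole population."""
    return share * population


def weighted_share(answers: list[int], weights: list[float]) -> float:
    """Weighted fraction of respondents answering 1 (e.g. 'uninsured')."""
    if len(answers) != len(weights):
        raise ValueError("answers and weights must have the same length")
    return sum(a * w for a, w in zip(answers, weights)) / sum(weights)


if __name__ == "__main__":
    print(f"implied base population: {IMPLIED_POPULATION / 1e6:.0f} million")
    print(f"26.2% uninsured -> {extrapolate(0.262) / 1e6:.1f} million people")

    # Hypothetical respondents: 1 = uninsured, 0 = insured.
    answers = [1, 0, 0, 1, 0, 0, 0, 0]
    scheme_a = [1.0] * 8  # every respondent counts equally
    # Doubles the weight of two respondents, e.g. from an under-sampled group.
    scheme_b = [2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0]
    for name, weights in [("A", scheme_a), ("B", scheme_b)]:
        share = weighted_share(answers, weights)
        print(f"scheme {name}: {share:.1%} -> {extrapolate(share) / 1e6:.1f} million")
```

With these made-up numbers, scheme A gives 25% (about 49.5 million people) and scheme B gives 40% (about 79.2 million), which is the kind of sensitivity Guth is asking about; how Rand actually built its census-based weights is not described here.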

Big data: are we making a big mistake? By Tim Harford

March 28, 2014 11:38 am

Big data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media

Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature. Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries. Google was faster because it was tracking the outbreak by finding a correlation between what people searched for online and whether they had flu symptoms.

Not only was “Google Flu Trends” quick, accurate and cheap, it was theory-free. Google’s engineers didn’t bother to develop a hypothesis about what search terms – “flu symptoms” or “pharmacies near me” – might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work.

The success of Google Flu Trends became emblematic of the hot new trend in business, technology and science: “Big Data”. What, excited journalists asked, can science learn from Google?

As with so many buzzwords, “big data” is a vague term, often thrown around by people with something to sell. Some emphasise the sheer scale of the data sets that now exist – the Large Hadron Collider’s computers, for example, store 15 petabytes a year of data, equivalent to about 15,000 years’ worth of your favourite music.

But the “big data” that interests many companies is what we might call “found data”, the digital exhaust of web searches, credit card payments and mobiles pinging the nearest phone mast. Google Flu Trends was built on found data and it’s this sort of data that interests me here. Such data sets can be even bigger than the LHC data – Facebook’s is – but just as noteworthy is the fact that they are cheap to collect relative to their size, they are a messy collage of datapoints collected for disparate purposes and they can be updated in real time. As our communication, leisure and commerce have moved to the internet and the internet has moved into our phones, our cars and even our glasses, life can be recorded and quantified in a way that would have been hard to imagine just a decade ago.

Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.

Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”

Found data underpin the new internet economy as companies such as Google, Facebook and Amazon seek new ways to understand our lives through our data exhaust. Since Edward Snowden’s leaks about the scale and scope of US electronic surveillance it has become apparent that security services are just as fascinated with what they might learn from our data exhaust, too.

Consultants urge the data-naive to wise up to the potential of big data. A recent report from the McKinsey Global Institute reckoned that the US healthcare system could save $300bn a year – $1,000 per American – through better integration and analysis of the data produced by everything from clinical trials to health insurance transactions to smart running shoes.

But while big data promise much to scientists, entrepreneurs and governments, they are doomed to disappoint us if we ignore some very familiar statistical lessons.

“There are a lot of small data problems that occur in big data,” says Spiegelhalter. “They don’t disappear because you’ve got lots of the stuff. They get worse.”

Four years after the original Nature paper was published, Nature News had sad tidings to convey. The latest flu outbreak had claimed an unexpected victim: Google Flu Trends. After reliably providing a swift and accurate account of flu outbreaks for several winters, the theory-free, data-rich model had lost its nose for where flu was going. Google’s model pointed to a severe outbreak, but when the slow-and-steady data from the CDC arrived, they showed that Google’s estimates of the spread of flu-like illnesses were overstated by almost a factor of two.

The problem was that Google did not know – could not begin to know – what linked the search terms with the spread of flu. Google’s engineers weren’t trying to figure out what caused what. They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in big data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is much cheaper and easier. That is why, according to Viktor Mayer-Schönberger and Kenneth Cukier’s book, Big Data, “causality won’t be discarded, but it is being knocked off its pedestal as the primary fountain of meaning”.

But a theory-free analysis of mere correlations is inevitably fragile. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. One explanation of the Flu Trends failure is that the news was full of scary stories about flu in December 2012 and that these stories provoked internet searches by people who were healthy. Another possible explanation is that Google’s own search algorithm moved the goalposts when it began automatically suggesting diagnoses when people entered medical symptoms.
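The first explanation, news-driven searches by healthy people, can be sketched as a toy model. All the numbers below are hypothetical, not Google’s: searches are assumed to come from the sick, from worried friends and family in proportion to real cases, and, in one scare week, from readers of frightening news stories. A model calibrated on the quiet weeks finds a flawless correlation and then has no way of knowing that the link has changed.

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin, so that ys ≈ slope * xs."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def weekly_searches(flu_cases, news_searches=0.0):
    """Hypothetical search volume: 2 searches per sick person, 0.5 per case from
    worried friends and family, plus any searches prompted purely by the news."""
    return 2.0 * flu_cases + 0.5 * flu_cases + news_searches

# Calibration weeks: searches track flu exactly, so the correlation looks perfect.
train_flu = [100, 200, 400, 800, 400, 200]
train_searches = [weekly_searches(c) for c in train_flu]
cases_per_search = fit_slope(train_searches, train_flu)

# A scare week: the same 400 real cases, but news coverage adds 800 searches from healthy people.
actual = 400
predicted = cases_per_search * weekly_searches(actual, news_searches=800)
print(f"predicted {predicted:.0f} cases vs {actual} actual ({predicted / actual:.1f}x)")
```

With these made-up inputs the model reports 720 cases against 400 real ones. Nothing in the training data could have warned it, because the news-driven searches never appeared there.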

Google Flu Trends will bounce back, recalibrated with fresh data – and rightly so. There are many reasons to be excited about the broader opportunities offered to us by the ease with which we can gather and analyse vast data sets. But unless we learn the lessons of this episode, we will find ourselves repeating it.

Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days – but we must not pretend that the traps have all been made safe. They have not.

In 1936, the Republican Alfred Landon stood for election against President Franklin Delano Roosevelt. The respected magazine, The Literary Digest, shouldered the responsibility of forecasting the result. It conducted a postal opinion poll of astonishing ambition, with the aim of reaching 10 million people, a quarter of the electorate. The deluge of mailed-in replies can hardly be imagined but the Digest seemed to be relishing the scale of the task. In late August it reported, “Next week, the first answers from these ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five-times cross-classified and totalled.”

After tabulating an astonishing 2.4 million returns as they flowed in over two months, The Literary Digest announced its conclusions: Landon would win by a convincing 55 per cent to 41 per cent, with a few voters favouring a third candidate.

The election delivered a very different result: Roosevelt crushed Landon by 61 per cent to 37 per cent. To add to The Literary Digest’s agony, a far smaller survey conducted by the opinion poll pioneer George Gallup came much closer to the final vote, forecasting a comfortable victory for Roosevelt. Mr Gallup understood something that The Literary Digest did not. When it comes to data, size isn’t everything.

Opinion polls are based on samples of the voting population at large. This means that opinion pollsters need to deal with two issues: sampling error and sampling bias.

Sampling error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The “margin of error” reported in opinion polls reflects this risk, and the larger the sample, the smaller the margin of error. A thousand interviews is a large enough sample for many purposes and Mr Gallup is reported to have conducted 3,000 interviews.
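How fast the margin of error shrinks can be checked with the textbook approximation for a simple random sample: the 95 per cent margin of error on a proportion p is about 1.96 × √(p(1 − p)/n). The sketch below assumes the worst case, p = 0.5, and ignores the weighting and design adjustments that real pollsters make.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 3_000, 2_400_000):
    print(f"n = {n:>9,}: ±{100 * margin_of_error(n):.2f} percentage points")
```

Under these assumptions the margin is roughly ±3.1 points at 1,000 interviews, ±1.8 at 3,000 and about ±0.06 at 2.4 million. The formula only measures sampling error; it says nothing about whether the sample was chosen at random.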

But if 3,000 interviews were good, why weren’t 2.4 million far better? The answer is that sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all. George Gallup took pains to find an unbiased sample because he knew that was far more important than finding a big one.

The Literary Digest, in its quest for a bigger data set, fumbled the question of a biased sample. It mailed out forms to people on a list it had compiled from automobile registrations and telephone directories – a sample that, at least in 1936, was disproportionately prosperous. To compound the problem, Landon supporters turned out to be more likely to mail back their answers. The combination of those two biases was enough to doom The Literary Digest’s poll. For each person George Gallup’s pollsters interviewed, The Literary Digest received 800 responses. All that gave them for their pains was a very precise estimate of the wrong answer.
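A small simulation shows how size and bias interact. It compares an idealised simple random sample of 3,000 voters (not a reconstruction of Gallup’s actual method) with 2.4 million returns drawn from a skewed list. The only figure taken from the article is Roosevelt’s 61 per cent; the share of Roosevelt supporters on the car-and-telephone list (50 per cent) and the reply rates (20 per cent for Roosevelt voters, 28 per cent for Landon voters) are hypothetical values chosen to illustrate the two biases described above.

```python
import random

TRUE_ROOSEVELT_SHARE = 0.61  # Roosevelt's actual share of the 1936 vote

def random_sample_estimate(rng: random.Random, n: int) -> float:
    """Estimate Roosevelt's share from n voters drawn at random from the whole electorate."""
    return sum(rng.random() < TRUE_ROOSEVELT_SHARE for _ in range(n)) / n

def biased_returns_estimate(rng: random.Random, n_returns: int,
                            frame_share: float = 0.50,      # hypothetical Roosevelt share on the mailing list
                            reply_roosevelt: float = 0.20,  # hypothetical reply rate, Roosevelt voters
                            reply_landon: float = 0.28) -> float:  # hypothetical reply rate, Landon voters
    """Estimate Roosevelt's share from n_returns ballots mailed back from a biased list."""
    # Probability that a returned ballot backs Roosevelt, given both the list bias and the reply bias.
    q = frame_share * reply_roosevelt / (
        frame_share * reply_roosevelt + (1 - frame_share) * reply_landon)
    return sum(rng.random() < q for _ in range(n_returns)) / n_returns

rng = random.Random(1936)
small = random_sample_estimate(rng, 3_000)
huge = biased_returns_estimate(rng, 2_400_000)
print(f"random sample of 3,000:       Roosevelt {small:.1%}")
print(f"biased sample of 2.4 million: Roosevelt {huge:.1%}")
```

The small random sample should land within a couple of points of 61 per cent. The huge biased one lands within a tenth of a point of about 41.7 per cent, the value the assumed biases imply: a very precise estimate of the wrong answer.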

The big data craze threatens to be The Literary Digest all over again. Because found data sets are so messy, it can be hard to figure out what biases lurk inside them – and because they are so large, some analysts seem to have decided the sampling problem isn’t worth worrying about. It is.

Professor Viktor Mayer-Schönberger of Oxford’s Internet Institute, co-author of Big Data, told me that his favoured definition of a big data set is one where “N = All” – where we no longer have to sample, but we have the entire background population. Returning officers do not estimate an election result with a representative tally: they count the votes – all the votes. And when “N = All” there is indeed no issue of sampling bias because the sample includes everyone.

But is “N = All” really a good description of most of the found data sets we are considering? Probably not. “I would challenge the notion that one could ever have all the data,” says Patrick Wolfe, a computer scientist and professor of statistics at University College London.

An example is Twitter. It is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood. (In practice, most researchers use a subset of that vast “fire hose” of data.) But while we can look at all the tweets, Twitter users are not representative of the population as a whole. (According to the Pew Research Internet Project, in 2013, US-based Twitter users were disproportionately young, urban or suburban, and black.)

There must always be a question about who and what is missing, especially with a messy pile of found data. Kaiser Fung, a data analyst and author of Numbersense, warns against simply assuming we have everything that matters. “N = All is often an assumption rather than a fact about the data,” he says.

Consider Boston’s Street Bump smartphone app, which uses a phone’s accelerometer to detect potholes without the need for city workers to patrol the streets. As citizens of Boston download the app and drive around, their phones automatically notify City Hall of the need to repair the road surface. Solving the technical challenges involved has produced, rather beautifully, an informative data exhaust that addresses a problem in a way that would have been inconceivable a few years ago. The City of Boston proudly proclaims that the “data provides the City with real-time information it uses to fix problems and plan long term investments.”

Yet what Street Bump really produces, left to its own devices, is a map of potholes that systematically favours young, affluent areas where more people own smartphones. Street Bump offers us “N = All” in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole. As Microsoft researcher Kate Crawford points out, found data contain systematic biases and it takes careful thought to spot and correct for those biases. Big data sets can seem comprehensive but the “N = All” is often a seductive illusion.
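Correcting for such a bias can be sketched as a toy simulation. Everything here is hypothetical: two neighbourhoods with the same number of potholes, a made-up number of car passes per pothole, and app adoption of 30 per cent in one area and 5 per cent in the other. Raw bump counts make the affluent area look several times worse; dividing each count by the probability that a pass gets reported (inverse-probability weighting, one standard correction) brings the estimates back into line. The catch is that the correction needs the adoption rates, which is the kind of information a found data set may not contain.

```python
import random

PASSES_PER_POTHOLE = 10  # hypothetical: cars driving over each pothole during the period

def simulate_reports(rng: random.Random, potholes: int, app_share: float,
                     passes: int = PASSES_PER_POTHOLE) -> int:
    """Count bump reports: a pass over a pothole is reported only if that driver runs the app."""
    return sum(rng.random() < app_share for _ in range(potholes * passes))

def reweighted_estimate(reports: int, app_share: float,
                        passes: int = PASSES_PER_POTHOLE) -> float:
    """Scale reports up by 1 / app_share to estimate all passes, then divide by passes per pothole."""
    return reports / (app_share * passes)

# Hypothetical neighbourhoods: (true number of potholes, share of drivers running the app).
areas = {"affluent": (100, 0.30), "poorer": (100, 0.05)}

rng = random.Random(7)
results = {}
for name, (potholes, share) in areas.items():
    raw = simulate_reports(rng, potholes, share)
    results[name] = (raw, reweighted_estimate(raw, share))
    print(f"{name:>8}: {raw:4d} raw reports, about {results[name][1]:.0f} potholes after reweighting")
```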

Who cares about causation or sampling bias, though, when there is money to be made? Corporations around the world must be salivating as they contemplate the uncanny success of the US discount department store Target, as famously reported by Charles Duhigg in The New York Times in 2012. Duhigg explained that Target has collected so much data on its customers, and is so skilled at analysing that data, that its insight into consumers can seem like magic.

Duhigg’s killer anecdote was of the man who stormed into a Target near Minneapolis and complained to the manager that the company was sending coupons for baby clothes and maternity wear to his teenage daughter. The manager apologised profusely and later called to apologise again – only to be told that the teenager was indeed pregnant. Her father hadn’t realised. Target, after analysing her purchases of unscented wipes and magnesium supplements, had.

Statistical sorcery? There is a more mundane explanation.

“There’s a huge false positive issue,” says Kaiser Fung, who has spent years developing similar approaches for retailers and advertisers. What Fung means is that we didn’t get to hear the countless stories about all the women who received coupons for babywear but who weren’t pregnant.

Hearing the anecdote, it’s easy to assume that Target’s algorithms are infallible – that everybody receiving coupons for onesies and wet wipes is pregnant. This is vanishingly unlikely. Indeed, it could be that pregnant women receive such offers merely because everybody on Target’s mailing list receives such offers. We should not buy the idea that Target employs mind-readers before considering how many misses attend each hit.
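Counting the misses is simple arithmetic once base rates are written down. None of the inputs below come from Target or from Duhigg’s reporting; the customer count, the share who are pregnant, and the model’s hit and false-alarm rates are hypothetical, chosen only to show how a model that sounds accurate can still send most of its baby coupons to the wrong people.

```python
def coupon_outcomes(customers: int, base_rate: float,
                    sensitivity: float, false_positive_rate: float):
    """Return (true hits, false hits, share of flagged customers who are actually pregnant)."""
    pregnant = customers * base_rate
    true_hits = pregnant * sensitivity                            # pregnant customers the model flags
    false_hits = (customers - pregnant) * false_positive_rate     # non-pregnant customers it flags anyway
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision

# Hypothetical inputs, not Target's figures.
hits, misses, precision = coupon_outcomes(
    customers=100_000, base_rate=0.02, sensitivity=0.80, false_positive_rate=0.05)
print(f"{hits:,.0f} pregnant customers flagged, {misses:,.0f} flagged who are not")
print(f"share of baby coupon books reaching a pregnant customer: {precision:.0%}")
```

With these numbers the model catches 80 per cent of pregnancies and wrongly flags only 5 per cent of everyone else, yet only about one flagged customer in four is pregnant, because non-pregnant customers vastly outnumber pregnant ones.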

In Charles Duhigg’s account, Target mixes in random offers, such as coupons for wine glasses, because pregnant customers would feel spooked if they realised how intimately the company’s computers understood them.

Fung has another explanation: Target mixes up its offers not because it would be weird to send an all-baby coupon book to a woman who was pregnant but because the company knows that many of those coupon books will be sent to women who aren’t pregnant after all.

None of this suggests that such data analysis is worthless: it may be highly profitable. Even a modest increase in the accuracy of targeted special offers would be a prize worth winning. But profitability should not be conflated with omniscience.

In 2005, John Ioannidis, an epidemiologist, published a research paper with the self-explanatory title, “Why Most Published Research Findings Are False”. The paper became famous as a provocative diagnosis of a serious issue. One of the key ideas behind Ioannidis’s work is what statisticians call the “multiple-comparisons problem”.

It is routine, when examining a pattern in data, to ask whether such a pattern might have emerged by chance. If it is unlikely that the observed pattern could have emerged at random, we call that pattern “statistically significant”.

The multiple-comparisons problem arises when a researcher looks at many possible patterns. Consider a randomised trial in which vitamins are given to some primary schoolchildren and placebos are given to others. Do the vitamins work? That all depends on what we mean by “work”. The researchers could look at the children’s height, weight, prevalence of tooth decay, classroom behaviour, test scores, even (after waiting) prison record or earnings at the age of 25. Then there are combinations to check: do the vitamins have an effect on the poorer kids, the richer kids, the boys, the girls? Test enough different correlations and fluke results will drown out the real discoveries.

There are various ways to deal with this but the problem is more serious in large data sets, because there are vastly more possible comparisons than there are data points to compare. Without careful analysis, the ratio of genuine patterns to spurious patterns – of signal to noise – quickly tends to zero.
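The multiple-comparisons problem is easy to reproduce with pure noise. The sketch below is a hypothetical vitamin trial in which the vitamins do nothing at all: 200 made-up outcomes, 50 children per arm, each outcome tested with a simple two-sample z-test (assuming a known standard deviation of 1). About one test in twenty still comes out “significant” at p < 0.05. The Bonferroni correction, which divides the threshold by the number of tests, is shown as one of the various fixes; it is deliberately conservative and is not the only option.

```python
import math
import random

def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, assuming known standard deviation sigma."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (sigma * math.sqrt(1 / n_a + 1 / n_b))
    return math.erfc(abs(z) / math.sqrt(2))

def null_trial(rng, n_outcomes, n_per_arm):
    """Every outcome is pure noise: the 'vitamins' have no effect on anything."""
    p_values = []
    for _ in range(n_outcomes):
        vitamins = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        placebo = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        p_values.append(two_sided_p(vitamins, placebo))
    return p_values

rng = random.Random(2005)
p_values = null_trial(rng, n_outcomes=200, n_per_arm=50)
naive_hits = sum(p < 0.05 for p in p_values)
bonferroni_hits = sum(p < 0.05 / len(p_values) for p in p_values)
print(f"'significant' at p < 0.05: {naive_hits} of {len(p_values)}")
print(f"after Bonferroni correction: {bonferroni_hits}")

# Chance of at least one fluke among k independent tests at the 5% level.
fluke_chance = {k: 1 - 0.95 ** k for k in (1, 20, 100)}
for k, chance in fluke_chance.items():
    print(f"{k:>3} independent tests: {chance:.0%} chance of at least one fluke")
```

With 20 independent tests the chance of at least one fluke is already about 64 per cent; with 100 it is about 99 per cent.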

Worse still, one of the antidotes to the multiple-comparisons problem is transparency, allowing other researchers to figure out how many hypotheses were tested and how many contrary results are languishing in desk drawers because they just didn’t seem interesting enough to publish. Yet found data sets are rarely transparent. Amazon and Google, Facebook and Twitter, Target and Tesco – these companies aren’t about to share their data with you or anyone else.

New, large, cheap data sets and powerful analytical tools will pay dividends – nobody doubts that. And there are a few cases in which analysis of very large data sets has worked miracles. David Spiegelhalter of Cambridge points to Google Translate, which operates by statistically analysing hundreds of millions of documents that have been translated by humans and looking for patterns it can copy. This is an example of what computer scientists call “machine learning”, and it can deliver astonishing results with no preprogrammed grammatical rules. Google Translate is as close to a theory-free, data-driven algorithmic black box as we have – and it is, says Spiegelhalter, “an amazing achievement”. That achievement is built on the clever processing of enormous data sets.

But big data do not solve the problem that has obsessed statisticians and scientists for centuries: the problem of insight, of inferring what is going on, and figuring out how we might intervene to change a system for the better.

“We have a new resource here,” says Professor David Hand of Imperial College London. “But nobody wants ‘data’. What they want are the answers.”

To use big data to produce such answers will require large strides in statistical methods.

“It’s the wild west right now,” says Patrick Wolfe of UCL. “People who are clever and driven will twist and turn and use every tool to get sense out of these data sets, and that’s cool. But we’re flying a little bit blind at the moment.”

Statisticians are scrambling to develop new methods to seize the opportunity of big data. Such new methods are essential but they will work by building on the old statistical lessons, not by ignoring them.

Recall big data’s four articles of faith. Uncanny accuracy is easy to overrate if we simply ignore false positives, as with Target’s pregnancy predictor. The claim that causation has been “knocked off its pedestal” is fine if we are making predictions in a stable environment but not if the world is changing (as with Flu Trends) or if we ourselves hope to change it. The promise that “N = All”, and therefore that sampling bias does not matter, is simply not true in most cases that count. As for the idea that “with enough data, the numbers speak for themselves” – that seems hopelessly naive in data sets where spurious patterns vastly outnumber genuine discoveries.

“Big data” has arrived, but big insights have not. The challenge now is to solve new problems and gain new answers – without making the same old statistical mistakes on a grander scale than ever.

Tim Harford’s latest book is ‘The Undercover Economist Strikes Back’.