10 Ways We're Using Data to Fight Disease

Bruce Aylward, World Health Organization assistant director-general, speaks during a press conference on the Ebola roadmap in Geneva, Switzerland. Data science has greatly helped with mapping diseases. Murat Unlu/Anadolu Agency/Getty Images

Big data is one of the most powerful tools we have in the fight against disease. The more data we have in hand, the more educated we can be in the health care choices we make. Data can paint a picture of the health of a particular community and teach us about patient commonalities so we can estimate risk factors. It can help us learn more about a disease and therefore find a cure, or let us see how outbreaks travel so we can contain them effectively.

Data science is one of the most interdisciplinary fields in existence. Scientists, doctors, mathematicians, computer programmers and epidemiologists are just a few of the professionals involved in data science. All of them play a part in collecting data, analyzing it, figuring out how to use it or acting on it.


Here are 10 ways data science has been used with different diseases and epidemics.

10: Preventing Cancer

A woman gets a mammogram at a hospital in Haute-Savoie, France. Recommendations for when to get mammograms have changed in recent years. BSIP/UIG via Getty Images

Not all cancers are preventable, but wouldn't you want to stop the ones that are? Screening for predisposition and early growth exists for cervical, breast, lung, prostate and colon cancers. But how do doctors determine guidelines on who should get screened, how often and when? The answer lies in big data.

The U.S. Preventive Services Task Force uses high-quality big data from large epidemiological studies to determine screening guidelines. For example, after studying the rate of false-positive cancer diagnoses in women in their 40s, the task force determined that getting routine mammograms before age 50 is unnecessary (unless there is a history of breast cancer in the family) [source: WebMD].


Pulling as much data as possible from cancer patients also teaches doctors about how cancers grow. The Oregon Health and Science University is undertaking trials of gene-sequencing thousands of cancer patients to learn more about how cancer formation occurs in different people so they can offer quicker diagnoses. The university even envisions being able to diagnose cancer within 24 hours by 2020, thanks to what they learn [source: Oregon Health and Science University].

9: Predicting Outbreaks for Mosquito-borne Diseases

Aedes aegypti mosquitoes are seen in a lab at the Fiocruz institute in Recife, Pernambuco state, Brazil. This mosquito transmits the Zika virus and is being studied at the institute. Mario Tama/Getty Images

Mosquitoes have long been spreaders of illnesses like malaria and dengue fever, so gathering information about the types of mosquitoes that carry these diseases and where they live can help us in our fight against these conditions. The more recent outbreak of the mosquito-borne virus Zika has shown us just how scary it can be to have a lack of data on how a disease spreads and what it can do to people.

To help battle these mosquito-spread illnesses, scientists from IBM, Johns Hopkins and the University of California San Francisco have collaborated on creating open source software that allows epidemiologists to make predictive disease models [source: Ungerleider]. The software is designed so that epidemiologists with minimal coding knowledge can still use it to run data analysis, predict the trajectory of outbreaks and plan strategies to contain disease spread.


The program uses data from the World Health Organization that shows a region's general sensitivity to outbreaks, population models of both humans and mosquitoes, and climate data that pinpoints potential outbreak locations. Taken together, this data can help public health workers slow the spread of mosquito-borne viruses.
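To give a feel for what a predictive disease model does, here is a minimal sketch of a classic SIR (susceptible, infected, recovered) model in Python. It is not the software described above, and it leaves out the mosquito populations and climate data that a real tool would include. The function names and the `beta` (transmission rate) and `gamma` (recovery rate) values are assumptions chosen only for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Run a simple day-by-day SIR model.

    Returns a list of (susceptible, infected, recovered) tuples,
    one for the starting state plus one per simulated day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contact between susceptible and infected people.
        new_infections = beta * s * i / population
        # A fixed fraction of infected people recover each day.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_day(history):
    """Return the index of the day with the most infected people."""
    return max(range(len(history)), key=lambda day: history[day][1])
```

Running `simulate_sir` with different `beta` values gives a rough sense of how slowing transmission (say, through mosquito control) changes the size and timing of the peak.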

8: Detecting Symptoms of Parkinson's Disease

Boxer Muhammad Ali, who had Parkinson's disease for years, is shown with his wife Yolanda Ali at an event. Axel Koester/Sygma/Sygma via Getty Images

Parkinson's disease, a neurological condition that affects more than 10 million people worldwide, provides a great example of how data collection combined with technology can make a difference in health care [source: Parkinson's Disease Foundation].

A person with Parkinson's often has very severe body tremors. These occur because the brain slowly stops producing a neurotransmitter called dopamine. The less dopamine a person has, the less able he or she is to control movements and emotions [source: National Parkinson Foundation].


However, by the time a person has visible symptoms (like shaking) and is diagnosed with Parkinson's, as much as 80 percent of the neurons in the brain associated with dopamine have been destroyed [source: Feber]. While there is currently no cure for Parkinson's, there are treatments to keep the symptoms under control. So, if doctors can detect symptoms earlier, then treatment can start sooner.

To this end, several companies have been investigating wearable technology to gather data about barely noticeable tremors, walking gait and sleep quality. As the data is pulled together, it can provide information to the technology wearers about whether they might have a predisposition to Parkinson's and help them get treatment early. Collecting this massive amount of data in a central hub also gives doctors and scientists the ability to search for common threads in Parkinson's patients, perhaps one day leading to a cure.

7: Mapping Ebola Outbreaks

A woman looks at a map at the Dutch National Institute for Public Health and the Environment (RIVM) nationwide telephone information center at The Hague, set up for people who have questions about the virus of Ebola in 2014. VALERIE KUYPERS/AFP/Getty Images

From 2014 to 2015, a massive outbreak of Ebola occurred, mostly in West Africa. More than 11,000 people died of this disease in that region alone [source: Centers for Disease Control and Prevention (CDC)]. With the outbreak of the virus occurring in some of the poorest countries in the world, it was difficult to get medical information to citizens, and there was little infrastructure to combat the disease. A major concern in the global fight against Ebola was understanding where the virus was spreading in order to determine the areas with the most urgent needs for aid. And this is where data science stepped in.

Using real-time mapping software, scientists and public health workers can track the disease across Africa and predict which areas are most vulnerable to a future outbreak. By pulling together data points about the location of bat species (the likely carriers of the Ebola virus), population density, travel time from the nearest major settlement and a handful of other factors, scientists can get in front of the disease.


The mapping tool was rolled out at a workshop in February 2016. "I can easily go through the maps and see specifically the districts in Ghana where the niche of Ebola virus is, where is there likely going to be an outbreak, and then from there we can do the animal surveillance," said attendee Dr. Richard Suu-Ire, head of the wildlife veterinary unit in Ghana that is responsible for collecting bat samples for Ebola surveillance in his country [source: Fortunati].

6: Calculating Risk for Heart Disease

Lawanda Fearrington (left) and her sister Nicole both have familial dilated cardiomyopathy, a heart condition that killed their father in 2003 (shown in the picture they are looking at). Their other two sisters have the same illness. Michael S. Williamson/The Washington Post via Getty Images

One of the most powerful ways that data can be used in medicine is to calculate risk. When enough data points are gathered and analyzed, physicians and public health workers can determine not only what factors might play a role in a disease, but also the point at which someone becomes at high risk for developing it.

Heart disease is an excellent example of this. It is the No. 1 cause of death in the U.S., attributable to one in four deaths [source: CDC]. Previously, physicians calculated the risk of heart disease primarily using cholesterol values. If cholesterol was high, patients were prescribed medication; if low, they were deemed to not be at risk.


However, using a collection of data gathered from multiple sources, the American College of Cardiology and the American Heart Association found commonalities in heart disease patients that extended far beyond simply having high cholesterol. With massive data sets on weight, race, age, history, cholesterol and a few other factors, the groups have generated a test that acts as a much more comprehensive and personalized risk calculator, called the ASCVD Risk Estimator [source: Gaglioti]. As a result, doctors have changed the way they practice and calculate risk for heart disease.
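A risk calculator of this kind typically combines several patient measurements into a single probability. The sketch below shows the general idea with a simple logistic formula in Python. The weights and intercept are hypothetical numbers invented for illustration. They are not the ASCVD Risk Estimator's coefficients, and the output should not be read as a medical estimate.

```python
import math

# Hypothetical weights chosen only for illustration; NOT real ASCVD coefficients.
ILLUSTRATIVE_WEIGHTS = {
    "age_years": 0.06,
    "total_cholesterol_mg_dl": 0.005,
    "weight_kg": 0.01,
}
ILLUSTRATIVE_INTERCEPT = -7.0


def illustrative_risk(patient):
    """Combine several factors into one number between 0 and 1.

    `patient` is a dict with a value for every key in ILLUSTRATIVE_WEIGHTS.
    """
    score = ILLUSTRATIVE_INTERCEPT + sum(
        weight * patient[name] for name, weight in ILLUSTRATIVE_WEIGHTS.items()
    )
    # The logistic function squeezes any score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))
```

The point of the design is that no single factor decides the result: two patients with the same cholesterol can land at different risk levels because their other measurements differ.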

5: Halting Drug Epidemics

A police officer holds bags of heroin confiscated as evidence in Gloucester, Massachusetts. In 2015, Gloucester created the Angel Program, which directs addicts to treatment centers, instead of jailing them. The program has been copied by many police departments. John Moore/Getty Images

Drug use can ravage communities, just as many diseases do. The number of deaths from overdose in the United States is staggering: more than 47,000 in 2014 alone [source: American Society of Addiction Medicine]. In fact, drug overdose is the leading cause of accidental death in the United States, and opioid addiction is driving the majority of deaths.

Tracking mortality data in different communities can give health care providers, governments and community activists a solid sense of how drugs might be influencing a particular region. Based on this data, they could know where particularly lethal strains of drugs might be infiltrating towns and use government action to stop the spread. Finding out more about where people are dying from overdoses can clue governments in to which communities need interventions, such as rehabilitation services or doctors to provide harm reduction strategies.


This type of strategy has helped many rural communities take action against the opioid epidemic, leading to very positive results. Several rural areas in the U.S. have followed the rehabilitation strategies set forth by the Gloucester, Massachusetts, Police Department. There, anyone with an addiction can walk into the police department, and staff on hand will help get that person into a treatment program. In just one year, the approach led to more than 400 patients being referred to treatment and overnight incarceration costs dropping 75 percent [source: Toliver].

Finally, having drug-related mortality data in hand has led the Centers for Disease Control and Prevention to come up with guidelines for physicians on opioid prescription practices [source: Gaglioti]. Not only does the data help fight the epidemic, but it also gets at the root of the problem and can stop substance abuse before it takes hold.

4: Community-based Causes

Dr. Mona Hanna-Attisha, director of the Pediatric Residency Program at Hurley Medical Center who exposed Flint, Michigan's high lead levels in the water supply, testifies during a hearing on Capitol Hill. SAUL LOEB/AFP/Getty Images

Sometimes the data doesn't need to be "big" to have a major impact on fighting disease. A smaller, focused set of data can be eye-opening about the health of a community. The Flint, Michigan, water crisis is a perfect example.

An investigation by a civil engineer showed water samples from Flint homes contained high levels of lead; however, the evidence he unearthed was not enough to convince government leaders that the water was contaminated. After hearing about the engineer's study, a pediatrician in town decided to pull together her own data set.


Dr. Mona Hanna-Attisha gathered information from hospital records and found extraordinarily high levels of lead in the blood of child patients. Rather than waiting to get her findings published in a medical journal, she held a press conference, and the city officials were forced to listen.

Lead poisoning can have long-term effects on a child's brain development and behavior, and in Flint, nearly 27,000 children were exposed to lead in the city's water [source: D'Angelo]. Without the data set that proved there was something wrong, thousands more children could have been harmed.

3: Long-term Cohort Studies

NYC mayor Bill de Blasio delivered an address at an event honoring FDNY member Ray Pfeifer who died of a rare cancer believed to have come from 8 months' duty at Ground Zero. Pfeifer was an activist for expanded benefits. Andy Katz/Pacific Press/LightRocket via Getty Images

Pools of big data are great places to go fishing for patterns. Scientists and physicians will sometimes engage in long-term studies of specific groups of people to learn if there are any commonalities in how their health progresses. For example, public health workers are currently engaged in a study of 9/11 first responders to learn the long-term effects of their exposure at Ground Zero. Being able to attribute rare cancers and respiratory illnesses they may develop to this exposure arms physicians and the government with more information about how to set up care and support systems.

One of the most impactful cohort studies is the Women's Health Initiative (WHI). Launched in 1993, this long-term clinical trial gathered data on 161,000 post-menopausal women to learn strategies for preventing heart disease, breast and colorectal cancers, and osteoporotic fractures [source: WHI].


The patterns the scientists noted in these women have changed the way health care providers prevent and treat these diseases, bringing a huge return on investment. Researchers employed a disease simulation model over a nine-year range (2003-2012) to compare the differences in women's health based on the findings from the WHI trials.

The model showed that by following the guidelines from the WHI, there were 76,000 fewer instances of cardiovascular disease, 126,000 fewer breast cancer cases and 4.3 million fewer combined hormone therapy users. Further, the disease model simulation showed that by employing the findings from the WHI over that nine-year stretch, Americans saved an estimated $35.2 billion in direct costs for health care [source: National Institutes of Health].

2: Tracking the Spread of Flu

A woman gets a flu shot at a pharmacy. The website FluNearYou.org allows Americans to post flu symptoms and scientists use the info to track flu trends. Terry Vine/Getty Images

Despite the push every year to encourage people to get vaccinated for the flu, this highly contagious respiratory illness still manages to strike millions of people in the U.S. every year and kill thousands of those who do get ill [source: CDC].

A person with influenza can infect others one day before symptoms are present, and up to seven days after getting sick, so knowing where and when the flu is hitting its peak around the country is really valuable [source: CDC].


The website FluNearYou.org allows Americans to post symptoms they are having in weekly health reports. Thousands of individuals submit their reports to the website, and scientists map the crowdsourced data to find which symptoms are present and in which locations across the country.
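Mapping crowdsourced reports starts with a simple tally: how many people in each place reported each symptom. Here is a minimal Python sketch of that step. The report format (a dict with `location` and `symptoms` keys) is an assumption made for illustration and is not FluNearYou's actual data format.

```python
from collections import Counter


def symptom_counts_by_location(reports):
    """Tally how many reports in each location mention each symptom.

    `reports` is a list of dicts such as
    {"location": "GA", "symptoms": ["fever", "cough"]} (hypothetical format).
    """
    counts = {}
    for report in reports:
        location_counts = counts.setdefault(report["location"], Counter())
        # Use a set so one report cannot count the same symptom twice.
        location_counts.update(set(report["symptoms"]))
    return counts
```

A real system would also need to adjust for how many people report from each area, since a place with more participants will naturally show more symptoms.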

Data science, however, is not always perfect. Google delved into the world of flu predictions with Google Flu Trends (GFT). Based on people's searches for symptoms, the company claimed it could gather enough data to provide accurate estimates of flu prevalence up to two weeks earlier than the CDC [source: Lazer]. Unfortunately, GFT badly missed the peak of the 2013 flu season (its algorithm included too many seasonal search terms unrelated to flu). While GFT failed, the concept of crowdsourcing data to make predictions about disease is one that often works quite well.

1: Crowdsourcing Computers

The World Community Grid asks people to donate the spare computing power of their personal devices to do research calculations for scientists. Kohei Hara/Getty Images

Gathering data into a central hub isn't the only way we can use crowdsourcing to help fight disease. Crowdsourcing the computers that process the information is just as important.

The World Community Grid is an effort spearheaded by IBM that asks people to donate the spare computing power of their personal devices to fight disease. When your device is idle, it can do research calculations for scientists, so results that would have taken decades can be had in months. Crowdsourced computers have run simulations of cellular functions to understand diseases like tuberculosis; screened millions of chemical compounds against the target proteins that Zika likely uses to thrive in human bodies; and identified genetic markers to help predict cancer.

More than 700,000 volunteers have signed on to help with these different projects already [source: World Community Grid]. With the amount of idle time that our collective devices could offer to these causes, this is one way that big data can make a big difference.

Lots More Information

Author's Note: 10 Ways We're Using Data to Fight Disease

Reading about the ways in which data can be crowdsourced for the good really made me want to participate in something like FluNearYou. It would feel great to be one of the pieces of data that helps shape the picture of the health landscape, thereby affecting the way doctors choose treatment plans. Everyone can do their own little part!

Sources

  • American Society of Addiction Medicine. "Opioid Addiction, 2016 Facts & Figures." (Oct. 6, 2016) http://www.asam.org/docs/default-source/advocacy/opioid-addiction-disease-facts-figures.pdf
  • Centers for Disease Control and Prevention. "Heart Disease Facts." Aug. 10, 2015. (Oct. 6, 2016) http://www.cdc.gov/heartdisease/facts.htm
  • Centers for Disease Control and Prevention. "How Flu Spreads." Aug. 15, 2015. (Oct. 6, 2016) http://www.cdc.gov/flu/about/disease/spread.htm
  • Centers for Disease Control and Prevention. "Seasonal Influenza, More Information." May 4, 2016. (Oct. 6, 2016) http://www.cdc.gov/flu/about/qa/disease.htm
  • D'Angelo, Chris. "How a Stubborn Pediatrician Forced the State to Take Flint's Water Crisis Seriously." Huffington Post. Jan. 23, 2016. (Oct. 6, 2016) http://www.huffingtonpost.com/entry/pediatrician-forced-state-to-take-flint-crisis-seriously_us_569febbfe4b076aadcc5014e
  • Feber, Kit. "How is Data Science Fighting Disease?" LinkedIn. Feb. 19, 2016. (Oct. 6, 2016) https://www.linkedin.com/pulse/how-data-science-fighting-disease-kit-feber
  • Fortunati, Rachel. "Mapping Ebola to prepare for future outbreaks." Institute for Health Metrics and Evaluation. (Oct. 6, 2016) http://www.healthdata.org/acting-data/mapping-ebola-prepare-future-outbreaks
  • Gaglioti, Anne. Assistant Professor of Family Medicine, Morehouse School of Medicine. Personal Interview. Sept. 26, 2016.
  • Lazer, David; Kennedy, Ryan. "What We Can Learn From the Epic Failure of Google Flu Trends." Wired. Oct. 1, 2015. (Oct. 6, 2016) https://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/
  • National Institutes of Health. "Health and financial analysis reinforces NIH's decision to fund Women's Health Initiative." May 5, 2014. (Oct. 7, 2016) https://www.nhlbi.nih.gov/news/press-releases/2014/health-and-financial-analysis-reinforces-nihs-decision-to-fund-womens-health-initiative
  • Parkinson's Disease Foundation. "Statistics on Parkinson's." 2016. (Nov. 1, 2016) http://www.pdf.org/en/parkinson_statistics
  • Toliver, Zachary. "The Opioid Epidemic: Rural Organizations Fighting Back." The Rural Monitor. June 13, 2016. (Nov. 1, 2016) https://www.ruralhealthinfo.org/rural-monitor/opioid-epidemic-rural-organizations-fight-back/
  • Ungerleider, Neal. "Using Data, Scientists Can Predict Disease Outbreaks." Fast Company. Sept. 30, 2013. (Oct. 6, 2016) https://www.fastcompany.com/3018843/fast-feed/using-data-scientists-can-predict-disease-outbreaks
  • U.S. Preventive Services Task Force. "Breast Cancer: Screening." Jan. 2016. (Nov. 1, 2016) https://www.uspreventiveservicestaskforce.org/Page/Document/UpdateSummaryFinal/breast-cancer-screening1?ds=1&s=breast%20cancer