In the summer of 2011, an epidemic of dengue fever hit the Pakistani province of Punjab, home to 100 million people. With no way of accurately detecting cases, health workers struggled to contain the disease. It spread quickly, especially through the populous city of Lahore. More than 21,000 people were eventually infected, and 350 of them died. “Hospitals were crowded with patients, and people were standing in line just to get themselves tested,” recalls Nabeel Abdur Rehman from Information Technology University in Lahore.
Rehman came up with a plan to ease the crowds of worried people. His boss, Umar Saif, happened to be the chair of the Punjab Information Technology Board, and together, they set up a free hotline manned by a hundred-strong team of medically trained operators. People could call, report their symptoms, get directions to the nearest hospitals if they were genuinely showing signs of dengue, find out if beds were available, and even request insecticide sprays for their homes or neighborhoods.
The hotline worked well. Since its inception, in September 2011, it has fielded more than 300,000 calls. But more importantly, Rehman’s team learned that they could use the volume of calls to forecast dengue outbreaks a few weeks in advance. And their predictions helped public health workers to focus their efforts in areas at greatest risk. “The forecast was being distributed to a large range of hospitals, and a lot of health workers acted upon it,” says Lakshmi Subramanian from New York University, who co-led the project. “It’s a system where the results were actionable.”
Containing a dengue outbreak is a data-driven game of whack-a-mole. There’s no cure or vaccine, so health workers typically focus on other means of preventing the disease. They poison the mosquitoes that spread it, and remove the stagnant water in which the insects breed. To do that effectively, workers really need to know exactly where the disease is rearing its head. “Getting access to that data in a developing country isn’t easy,” says Subramanian. When crowds are thick and resources are thin, data comes neither readily nor accurately.
One alternative is to use indirect data sources like helpline calls or internet search queries. As outbreaks begin, people start searching for information about why they’re feeling sick, and these searches could potentially be used to track diseases. That was the intuitive logic behind Google Flu Trends, a much-hyped way of predicting flu outbreaks by mining search queries. Unfortunately, it grossly overestimated flu levels in America three years in a row, and became known as a “poster child of the foibles of big data.” Telephone hotlines haven’t fared much better. Studies have found that call volumes correlate with flu levels in some regions but not others, making them too unreliable as a means of surveillance over large scales.
At first, that’s also what Rehman’s team found. They showed that the number of calls to their hotline correlated with the number of dengue patients in hospital a few weeks later—but only across Lahore as a whole, and not at finer scales.
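The kind of check described here, whether calls in one week line up with hospital cases a few weeks later, can be illustrated with a lagged correlation. What follows is a minimal sketch, not the team's actual analysis. It assumes weekly counts and uses Pearson correlation. The function names and the numbers in the usage example are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(calls, cases, lag):
    """Correlate hotline calls in week t with hospital cases in week t + lag."""
    if lag <= 0 or lag >= len(calls):
        raise ValueError("lag must be between 1 and len(calls) - 1")
    return pearson(calls[:-lag], cases[lag:])

def best_lag(calls, cases, max_lag):
    """Return the lag (in weeks) at which calls best anticipate cases."""
    return max(range(1, max_lag + 1),
               key=lambda k: lagged_correlation(calls, cases, k))

# Hypothetical city-wide weekly counts, for illustration only.
calls = [5, 8, 20, 45, 90, 120, 100, 60, 30, 10]
cases = [1, 1, 2.5, 4, 10, 22.5, 45, 60, 50, 30]
print(best_lag(calls, cases, 4))
```

A strong correlation at the city-wide level can still vanish when the same series are split by town. This is because smaller areas have noisier counts, and the counts also reflect local differences in how aware residents are of the hotline.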
The problem is that the search for information is driven by awareness as well as need. “News articles or public awareness campaigns can increase internet searches for a disease, which can limit the usefulness of this type of data,” says Hannah Clapham, an epidemiologist at the Oxford University Clinical Research Unit in Vietnam.
Rehman’s team realized this, and they knew how to deal with it. After the 2011 epidemic, the government of Punjab launched a string of awareness campaigns to teach people about symptoms, prevention measures, and the hotline itself. The team had information about the timing and location of these activities, so when they created a statistical model to predict dengue levels based on call volumes, they added data on awareness levels too. And for good measure, they included weather conditions that influence the lives of mosquitoes, like rainfall, temperature, and humidity.
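The article does not specify the team's model. As a stand-in, ordinary least squares shows how adding covariates lets a model separate calls driven by illness from calls driven by awareness campaigns. This is a minimal sketch under that assumption, not the published method. The feature layout is hypothetical: lagged call count, a 0/1 awareness-campaign flag, rainfall (mm), temperature (°C), and humidity (%).

```python
def fit_ols(rows, targets):
    """Fit ordinary least squares by solving the normal equations
    (X^T X) b = X^T y. An intercept column is prepended to every row."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; cannot solve")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coefs, row):
    """Predicted dengue cases for one (hypothetical) town-week feature row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

# Hypothetical rows: [calls, campaign_flag, rain_mm, temp_c, humidity_pct]
rows = [
    [10, 0, 5, 28, 60],
    [25, 1, 12, 30, 70],
    [40, 0, 30, 31, 80],
    [60, 1, 45, 29, 85],
    [35, 0, 20, 27, 65],
    [80, 1, 50, 32, 90],
    [15, 1, 8, 26, 55],
    [50, 0, 35, 33, 75],
]
true_coefs = [2.0, 0.5, -8.0, 0.1, 0.3, 0.05]  # made up for the demo
targets = [predict(true_coefs, r) for r in rows]
coefs = fit_ols(rows, targets)
```

In this setup, a negative weight on the campaign flag would mean that some calls during a campaign reflect awareness rather than infection. The model can then discount those calls instead of mistaking them for a coming outbreak.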
With these factors accounted for, the model predicted the future numbers of dengue patients in Lahore’s 10 component towns with an average accuracy of 86 percent. The team then set up an app that allowed public health workers to check the model’s predictions in real time, using their government-issued phones. Workers could then spray insecticides or clean up stagnant water at specific places to contain the spread of the disease.

The fact that public health workers are actually using the system “enables evaluation in real time,” says Elaine Nsoesie, a professor of global health at the University of Washington. “Like other systems using non-traditional data sources, there is always a need for continuous maintenance and re-evaluation.” Indeed, Subramanian notes that their model isn’t static. It continuously retrains itself as new data comes in.
So far, it seems to be working. From the peak of 21,000 cases in 2011, Lahore experienced just 257 cases of dengue in 2012, and 1,600 in 2013. Of course, that decrease could also be due to awareness campaigns, other control efforts, and weather patterns. “But one of the things that majorly changed between 2011 and 2012 was the forecasting system we introduced,” says Rehman. “It’s part of a larger ecosystem that controlled dengue, but it did have an effect in targeting field workers only to areas where the disease was spreading.”
“The real test of any forecast is how it continues to perform going forward in time, handling longer-term changes in transmission, immunity, and behavior,” says Clapham. After all, Google Flu Trends looked promising at first, too. “It is also important to understand whether actions taken because of forecasts lead to an improvement in disease containment, which is not straightforward.”
That’s what Rehman and Subramanian are now working on. The app that they created also allowed public health workers to log their activities, and the team is trying to determine which measures were most effective. “This is a system that’s already running at scale in Pakistan,” says Subramanian. “It can be extended beyond phone calls to text messages or emails. It could be adapted to other countries. It’s being used for other infectious disease outbreaks, and you can see it being used for Zika or Ebola. It’s easy to deploy, and the cost is low.”