Featured Research: Bridging the Worldwide Healthcare Gap
ICSI's Srini Narayanan, head of the AI Group, is collaborating with UC Berkeley and the Hesperian Foundation, based in Berkeley, to bring accessible health care information to rural areas in developing countries. He and Matt Gedigian, a master's student at the iSchool at UC Berkeley, are working on a Wikipedia-like Web site that would combine semantic search, digitized voice recordings, and multilingual translations of health care manuals to bring relevant information to users in poor communities around the world.
A Need for Digital Materials
Narayanan became interested in the work of the Hesperian Foundation in 2007 after talking with Madelaine Plauché, a postdoc in the Speech Group at the time. Plauché worked with TIER, a group of the UC Berkeley Computer Science Department that researches new technologies for developing regions. Through TIER, she had become familiar with Hesperian, a publisher of health care manuals designed for poor and semi-literate users. Hesperian's primary publication, Where There Is No Doctor, has been in use since 1977, when it was translated from the Spanish text Donde No Hay Doctor. The manual lays out, in simple terms and with frequent illustrations, how to prevent, diagnose, and treat diseases. It has been translated into 80 languages and is used in 108 countries. The foundation also offers books on dental care, community and environmental health, women's health, and a wide array of other issues. In 2008, the foundation shipped out 20,000 copies of its materials and users downloaded 300,000 free copies from its Web site.
After Narayanan read the text, the question for him became, he says, "Can we help this book be used by more?" Sending a copy of the book costs $30. While some copies are offered for free through the foundation's Gratis Book Fund and all materials are offered in PDF form on its Web site, Narayanan saw the need for an easy-to-use version of the online books, a cross-referenced and searchable database that would deliver the right information to the right user at the right time.
Plauché was involved in the original plan to create a Hesperian Digital Commons, a Web site similar to Wikipedia that would tailor Hesperian materials to an individual user's needs. In the last year, Narayanan and Gedigian have begun the work. The Rockefeller Foundation and the Bill & Melinda Gates Foundation, through the latter's grant to Hesperian, fund the work.
Making It Relevant
At the root of the research is the question of how to "customize large amounts of information to one user's needs," Narayanan says. He and Gedigian are working toward searchable versions of Hesperian's materials. So far, the researchers have uploaded and annotated ten of the 20 or so Hesperian books available in English. They are working to upload all of the English books, in part to understand how difficult it will be to do the same for Hesperian books in other languages. They have also uploaded and begun tagging versions of Where There Is No Doctor in Spanish and Tamil, a language spoken in India, Sri Lanka, and Singapore.
To this end, Narayanan and Gedigian have created an ontology, a structured set of concepts and their relationships. For the researchers, this means thinking about the relationships between a disease, what causes it, and what cures it. The work is similar to that done in the FrameNet group: the researchers must think in terms of "frames," structures of related concepts. The researchers must know the relationship between aspirin and fever, and the relationships of those terms to a disease like the flu. They then tag these terms so that a search engine will return relevant results.
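One way to picture such an ontology is as a set of subject, relation, object triples. The sketch below is only an illustration under that assumption: the relation names (has_symptom, treated_with, relieves) and the helper functions are hypothetical and are not the project's actual vocabulary or code. The facts themselves are limited to the flu, fever, and aspirin example used in this article.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object).
# The relation names are assumptions made for illustration only.
TRIPLES = [
    ("flu", "has_symptom", "fever"),
    ("flu", "has_symptom", "body aches"),
    ("flu", "treated_with", "aspirin"),
    ("flu", "treated_with", "acetaminophen"),
    ("aspirin", "relieves", "fever"),
]


def build_index(triples):
    """Group triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def related(index, subject, relation):
    """Return the objects linked to a subject by a relation, sorted by name."""
    return sorted(index.get(subject, {}).get(relation, set()))


index = build_index(TRIPLES)
print(related(index, "flu", "treated_with"))
```

Storing relationships this way lets a search follow links between concepts, for example from a disease to its treatments, instead of matching words alone.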
Much of this work can be done automatically. When text is scanned into the Wiki, it can be checked against lists of diseases and drugs. "Malaria," for example, will be automatically tagged as a disease and "penicillin" as a drug. Some symptoms can also be automatically tagged. More complicated are the causes and consequences of diseases: many of these must be annotated by hand.
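The list-based tagging described above can be sketched as a simple dictionary lookup. This is a minimal illustration, not the researchers' tool: the word lists, the tag_terms function, and the category labels are all assumptions, and a real system would need to handle multi-word terms and spelling variants.

```python
import re

# Hypothetical term lists; the project's real lists are not given in the article.
DISEASES = {"malaria", "flu"}
DRUGS = {"penicillin", "aspirin", "acetaminophen"}


def tag_terms(text):
    """Return (term, category) pairs for known diseases and drugs, in text order."""
    tags = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        key = word.lower()  # compare case-insensitively
        if key in DISEASES:
            tags.append((word, "disease"))
        elif key in DRUGS:
            tags.append((word, "drug"))
    return tags


print(tag_terms("Malaria and penicillin are both mentioned on this page."))
```

Words that appear on neither list are simply skipped, which is why relationships such as causes and consequences still need a human annotator.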
Terms are tagged with a property so that they will show up in property searches, which look not just for a word but for the relationship the word has to its page. For example, an article on the flu contains the phrase, "Aspirin or acetaminophen helps lower fever and relieve body aches and headaches." This page will show up under a property search with "treated with" as the property and "aspirin" as the value. Such searches are helpful when, for example, users need to know what they might have if they are running a fever, or what viruses they might catch through contact with blood.
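A property search of this kind can be sketched as a filter over pages that carry property and value annotations. The Page class, the property_search function, and the sample pages below are hypothetical and stand in for whatever the wiki software actually stores; only the "treated with" and "aspirin" pairing comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    # Maps a property name to the set of values tagged on this page.
    properties: dict = field(default_factory=dict)


def property_search(pages, prop, value):
    """Return titles of pages where `value` is tagged under property `prop`."""
    return [page.title for page in pages if value in page.properties.get(prop, set())]


# Illustrative pages only; the annotations are assumptions, not project data.
pages = [
    Page("Flu", {"treated with": {"aspirin", "acetaminophen"},
                 "has symptom": {"fever", "body aches", "headaches"}}),
    Page("Malaria", {"has symptom": {"fever"}}),
]

print(property_search(pages, "treated with", "aspirin"))
print(property_search(pages, "has symptom", "fever"))
```

Unlike a plain keyword search, the query names the relationship as well as the word, so a page that merely mentions aspirin in passing would not match a "treated with" search.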
After tagging the books in English, the researchers want to do the same in five other languages. In addition to Spanish and Tamil, there is preliminary work being done in Pular, a West African language, and Narayanan hopes to work with French and Wolof, another West African language, in the future. French and Spanish, says Narayanan, would be the first or second language of most of the people he wants to reach.
The researchers chose Tamil as one of the first languages to upload to the digital commons because the script and grammar are dramatically different from those used in English and Spanish. Using Tamil, Narayanan hopes to work out low-level technical difficulties with Unicode, the standard that seeks to represent every character in every human language in a standardized way. While they work, the researchers are developing technical tools and extensions to open source software to make their work easier. Many of these tools make uploading PDFs onto their Web site easier, for example by allowing them to move captions and figures around. It is possible that in the future they will develop tools that allow them to use optical character recognition for languages not currently recognized.
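To illustrate why a script like Tamil can exercise Unicode handling, the sketch below inspects the Tamil word "vanakkam" (a greeting) with Python's standard unicodedata module. It is offered as a general example and is not a description of the specific difficulties the researchers encountered. The word is written with escape sequences so that each code point is explicit; the describe and count_marks helpers are hypothetical names.

```python
import unicodedata

# The Tamil word "vanakkam", spelled out code point by code point.
WORD = "\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd"


def describe(text):
    """List (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


def count_marks(text):
    """Count combining marks, which attach to a base letter when displayed."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


for row in describe(WORD):
    print(row)
print(len(WORD), "code points,", count_marks(WORD), "combining marks")
```

The word is stored as seven code points, but two of them are combining signs that attach to the preceding consonant, so a reader perceives five characters. Software that counts, splits, or searches text one code point at a time can therefore behave differently than it does on English or Spanish text.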
Looking Forward
According to a study from New York University, over 90 percent of those who bought Where There Is No Doctor did so to teach or train others, and almost 95 percent bought the book to promote health within a community. While the initial set of online materials is designed to deliver highly personalized information to individual users, Narayanan and his colleagues are also looking forward to community-wide uses. They hope one day to be able to automatically produce flyers, pamphlets, and radio broadcasts with information gleaned from multiple books to respond to specific health threats and everyday health problems. Further down the road, the researchers also hope to look into whether video, audio, or text information is most persuasive, particularly for users who are semi-literate.
Narayanan's work is part of a larger effort by Hesperian, funded by the Bill & Melinda Gates Foundation, to update its materials for the 21st century. Hesperian has asked its partners, who translate its materials into local languages, to comment on how efficient the materials are. There will be larger field tests in 2010.
But in the meantime, Narayanan says, "If we can help more people reach [these materials], I construe that as being a great thing. It makes sense to do, and we believe we have something to offer."