Digital Life

Georgia Tech Researchers Design Machine Learning Technique to Improve Consumer Medical Searches

‘DiaTM’ can learn vernacular terms for health problems, symptoms

Atlanta, GA (November 17, 2010) – Medical websites like WebMD give consumers more access than ever before to comprehensive health and medical information, but the sites become less useful when people describe their conditions in unclear or unorthodox language in a site search. A group of Georgia Tech researchers has now created a machine-learning model that enables such sites to “learn” dialect and other medical vernacular, improving search performance for users who describe their health problems that way.

Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing many medical documents written at different levels of technical language. After comparing enough of these documents, diaTM learns which medical conditions, symptoms and procedures are associated with particular dialectal words or phrases, narrowing the “language gap” between consumers with health questions and the medical databases they turn to for answers.

“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”

To train diaTM on the various modes of medical language, lead author Steven Crain and his fellow researchers pulled publicly available documents not only from WebMD but also from Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website and other sources. After processing enough documents, he said, diaTM can learn, for example, that the word “gunk” is often a vernacular term for “discharge,” and can then handle user searches containing “gunk” appropriately.

In this initial study, which used small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a standard information-retrieval metric that measures how well a search engine places the most relevant results near the top of its result list. Zha, whose research focuses on Internet search engines and their underlying algorithms, said that an nDCG improvement of even 5 percent is “very significant.”
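For readers unfamiliar with the metric, nDCG can be computed in a few lines. This is an illustrative sketch only: the release does not say which result cutoff k or gain formula the study used, so the version below assumes the common graded form, in which a result with relevance grade rel at rank i contributes (2^rel - 1) / log2(i + 1), and divides by the score of the ideal, relevance-sorted ordering so that a perfect ranking scores 1.0. Some tools, such as scikit-learn's `ndcg_score`, use the linear gain rel instead; the two agree when relevance is binary. The function names and the example grades are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results.

    relevances[0] is the grade of the first-ranked result; each grade is
    turned into a gain of 2**rel - 1 and discounted by log2(rank + 1).
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal ordering of the same grades."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades (0 = irrelevant, 2 = highly relevant)
# for the top four results of one query under two rankers.
baseline = [0, 2, 1, 0]
improved = [2, 1, 0, 0]
print(round(ndcg(baseline, k=4), 3))  # 0.659
print(round(ndcg(improved, k=4), 3))  # 1.0
```

Dividing by the ideal DCG keeps every query's score between 0 and 1, which is what makes scores from different queries comparable and averageable.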

“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”
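Crain’s point that “gunk” only needs to be tied to the right topic, not to one specific clinical word, can be illustrated with a toy topic-level matcher. This is a hypothetical sketch, not the diaTM model or its training procedure: the two topics, the “consumer” and “clinical” dialect labels, every word probability, and the function names are invented for illustration, and each document’s dialect is assumed to be known. Each topic keeps a separate word distribution per dialect; the query and each document are mapped to a posterior over topics, and documents are ranked by how much those posteriors overlap rather than by shared words.

```python
import math

# Hypothetical parameters standing in for what a trained model would learn:
# P(word | topic, dialect). Each topic has one word table per dialect.
TOPIC_WORDS = {
    "infection": {
        "consumer": {"gunk": 0.4, "eye": 0.3, "crusty": 0.3},
        "clinical": {"discharge": 0.4, "conjunctivitis": 0.3, "purulent": 0.3},
    },
    "cardiac": {
        "consumer": {"chest": 0.4, "pain": 0.3, "heart": 0.3},
        "clinical": {"angina": 0.4, "myocardial": 0.3, "ischemia": 0.3},
    },
}
FLOOR = 1e-3  # assumed smoothing probability for words absent from a table


def topic_posterior(words: list[str], dialect: str) -> dict[str, float]:
    """P(topic | words, dialect), assuming a uniform topic prior and words
    that are independent given the topic (a naive-Bayes simplification)."""
    log_scores = {
        topic: sum(math.log(by_dialect[dialect].get(w, FLOOR)) for w in words)
        for topic, by_dialect in TOPIC_WORDS.items()
    }
    # Subtract the max before exponentiating to avoid underflow.
    top = max(log_scores.values())
    unnorm = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    return {t: v / total for t, v in unnorm.items()}


def topic_match(query: list[str], query_dialect: str,
                doc: list[str], doc_dialect: str) -> float:
    """Score a document by the dot product of the two topic posteriors."""
    q = topic_posterior(query, query_dialect)
    d = topic_posterior(doc, doc_dialect)
    return sum(q[t] * d[t] for t in q)


# A consumer query that shares no words with either clinical document.
query = ["gunk", "eye"]
docs = {
    "pink-eye article": ["purulent", "discharge", "conjunctivitis"],
    "heart article": ["angina", "ischemia"],
}
ranked = sorted(
    docs,
    key=lambda name: topic_match(query, "consumer", docs[name], "clinical"),
    reverse=True,
)
print(ranked)  # ['pink-eye article', 'heart article']
```

In a real system the per-dialect word tables would presumably be estimated from a document collection rather than written by hand, and a document’s dialect would itself have to be inferred.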

Also, diaTM is not limited to medical search; the same machine-learning technique would work equally well for searches on any topic. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next step is to extend the model so that it can learn dialects by looking for word patterns that do not make sense from a topical perspective. For example, using a similar algorithm, he was able to automatically discover dialects such as text-speak (e.g., “b4” as a substitute for “before”), but those dialects were mixed in with groups of topically related words.

“We’re trying to get to where you can isolate just the dialects,” Crain said.

“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”

DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.

For more information contact:

Michael Terrazas

Assistant Director of Communications

College of Computing at Georgia Tech

404-245-0707

Faculty

  • Amy Bruckman

    Associate Professor
    School of Interactive Computing, College of Computing

    Areas of Expertise:
    Educational Technology, Social Networking/Online Communities, Wikipedia, Twitter, Facebook, Internet Research Ethics, Human Computer Interaction, Human Computer Interaction for Kids

  • Carl DiSalvo

    Assistant Professor
    School of Literature, Communication and Culture, Ivan Allen College of Liberal Arts

    Areas of Expertise:
    Participatory Design, Critical Design, Design Studies, Robotics and Sensing in Art and Community Settings

  • Keith Edwards

    Associate Professor
    School of Interactive Computing, College of Computing

    Areas of Expertise:
    Social Impacts of Technology, Home Network Security, Home Networking, Human-Computer Interaction

  • Irfan Essa

    Professor
    School of Interactive Computing, College of Computing
    School of Electrical and Computer Engineering, College of Engineering

    Areas of Expertise:
    Computational Video, Computational Photography, Computational Journalism, Computational Media, Computational Perception

  • Beki Grinter

    Associate Professor
    School of Interactive Computing, College of Computing

    Areas of Expertise:
    Societal Impacts of Technology, Human-Computer Interaction, Computer Supported Cooperative Work

  • Renu Kulkarni

    Executive Director, FutureMedia

    Areas of Expertise:
    Convergence of digital, social, mobile and multimedia industries, Strategic Alliances, Industry Partnerships, Open Innovation Practices

  • Blair MacIntyre

    Associate Professor
    School of Interactive Computing, College of Computing
    School of Literature, Communication and Culture, Ivan Allen College of Liberal Arts

    Areas of Expertise:
    Augmented Reality, Virtual Reality, Mobile Games, Social Games, Augmented Reality Games, Video Game Design, Video Game Architecture

  • Ali Mazalek

    Assistant Professor
    School of Literature, Communication and Culture, Ivan Allen College of Liberal Arts

    Areas of Expertise:
    Tangible Interfaces, Experimental Media, Media Arts, Interaction Design, Emerging Technologies

  • Janet H. Murray

    Ivan Allen College Dean's Professor
    School of Literature, Communication and Culture, Ivan Allen College of Liberal Arts

    Areas of Expertise:
    Game Design, Interactive Narrative, Interactive Television, Media Convergence, Information Design, Digital Media and Education

  • Elizabeth Mynatt

    Director, GVU Center
    Professor, School of Interactive Computing
    Associate Dean for Strategic Planning and Initiatives
    College of Computing

    Areas of Expertise:
    Human-Computer Interaction, Human-Centered Computing, Health Informatics, Ubiquitous Computing, Assistive Technologies

  • Ashwin Ram

    Associate Professor
    School of Interactive Computing, College of Computing

    Areas of Expertise:
    Artificial Intelligence (AI) (Case-Based Reasoning, Natural Language, & Game/Entertainment AI), Human-Centered Computing - Cognitive Science, Healthcare Informatics

  • Bruce Walker

    Associate Professor
    School of Psychology, College of Sciences
    School of Interactive Computing, College of Computing

    Areas of Expertise:
    Interactive Music, Mobile Music, Human-Computer Interaction, Auditory Perception, Psychology