
Nvidia researchers detail AI-powered clinical speech transcription system

At the Conference on Machine Intelligence in Medical Imaging 2020, held virtually this year, Nvidia researchers presented a paper describing an AI system that captures and transcribes clinical patients’ speech. The system identifies clinical words and maps them to concepts in a standardized health database, tasks the researchers say could alleviate pressure on clinicians experiencing pandemic-related overwork.

The coauthors suggest telemedicine, a field that has seen unprecedented demand during the coronavirus pandemic, as one potential application of the system. In March, virtual health consultations grew by 50%, according to Frost & Sullivan research, with general online medical visits on course to hit 200 million this year.

At the core of the researchers’ system is a BERT-based language model pretrained in a self-supervised manner on biomedical text. (Self-supervised learning trains models to perform tasks without requiring labeled data.) The model, BioMegatron, has 345 million parameters (configuration variables internal to the model) and ingested and learned patterns from 6.1 billion words extracted from PubMed, a search engine for abstracts on life sciences topics.
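As a rough illustration of this kind of self-supervised pretraining, the sketch below uses a Hugging Face-style masked-language-modeling setup; the base model, toy corpus, and training settings are placeholders rather than Nvidia’s actual configuration.

```python
# Minimal sketch of self-supervised masked-language-model pretraining.
# Model name, corpus, and hyperparameters are illustrative stand-ins,
# not the setup described in the paper.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import Dataset

# Toy corpus standing in for the 6.1 billion words of PubMed abstracts.
corpus = Dataset.from_dict({"text": [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "The patient presented with acute myocardial infarction.",
]})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly masks 15% of tokens; predicting them back is the
# self-supervised objective, so no human-written labels are needed.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-pretrain",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```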

After pretraining, the model was fine-tuned on a clinical natural language processing dataset created by a former National Institutes of Health (NIH)-funded National Center for Biomedical Computing. It was then incorporated into an automatic speech recognition component that identifies clinical words and checks them against concepts in the Unified Medical Language System (UMLS), an ontology developed by the NIH’s National Library of Medicine.
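The paper’s entity-linking code isn’t spelled out here, but the idea of checking transcribed words against UMLS concepts can be sketched with a toy lookup table; the surface forms, concept identifiers, and matching logic below are illustrative only.

```python
# Toy sketch of linking words in an ASR transcript to UMLS concepts.
# The lookup table and matching logic are illustrative, not the
# paper's actual entity-linking pipeline.
from typing import Dict, List, Tuple

# Tiny stand-in for the UMLS Metathesaurus: surface form -> concept ID.
UMLS_LOOKUP: Dict[str, str] = {
    "myocardial infarction": "C0027051",
    "hypertension": "C0020538",
}

def link_concepts(transcript: str) -> List[Tuple[str, str]]:
    """Return (mention, concept_id) pairs found in a transcript."""
    text = transcript.lower()
    return [(term, cui) for term, cui in UMLS_LOOKUP.items() if term in text]

# Example: a sentence as it might come out of the speech recognizer.
print(link_concepts(
    "Patient has a history of hypertension and prior myocardial infarction."))
# -> [('myocardial infarction', 'C0027051'), ('hypertension', 'C0020538')]
```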

In experiments running on Nvidia V100 and T4 graphics cards, the researchers report that BioMegatron achieved 92.05% accuracy with about 1 millisecond of processing, a figure that accounts for both precision and recall. “This opens significant new capabilities in systems where responsiveness to patients, clinicians, and researchers is paramount … An automatic speech recognition model that can extract and relate key clinical concepts from clinical conversations can be very useful,” they wrote. “We hope our contribution will help achieve faster and better patient responses, ultimately leading to improved patient care.”
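For readers unfamiliar with the metrics, the snippet below shows how precision and recall are computed and combined into a single F1-style score; the counts are made up and do not reflect the paper’s results.

```python
# How precision and recall combine into one score (F1).
# The counts below are invented for illustration only.
def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp)   # fraction of predicted concepts that were correct
    recall = tp / (tp + fn)      # fraction of true concepts that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=92, fp=8, fn=8))  # -> (0.92, 0.92, 0.92)
```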

Nvidia’s contribution to the research community comes after Microsoft researchers proposed a “state-of-the-art” biomedical language model dubbed PubMedBERT, claiming industry-leading results on tasks including named entity recognition, evidence-based medical information extraction, document classification, and more.