Quality Data Inputs Essential For Machine Learning – Forbes

Machine Learning Requires Quality Data
Multiple times over the last decade, this column has covered the importance of data quality in decision making, whether the decisions are made by executives or by machines. Back in 2014, when the “big data” craze was mesmerizing the C-suite, the warning was issued in “Big Data and the Madness of Crowds.” More recently, the topic was revisited in “How Bad Data Is Undermining Big Data Analytics” in December 2020.
Since then, more and more news has emerged about failed AI and machine learning initiatives, with faulty data blamed as the cause. The recent demise of IBM Watson Health is the latest example.
To weigh in on these developments, I recently interviewed Sandeep Konam, co-founder and chief technology officer at Abridge.
Gary Drenik: Sandeep, before we dive into the industry issues, please tell us about your expertise and experience that led to the formation of Abridge.
Sandeep Konam: I’ve always believed that artificial intelligence has the potential to radically alter the field of healthcare and fundamentally change the way we look at health and wellbeing. Over the past decade, I have built various healthcare AI tools, ranging from a cancer biomarker detection app to a low-vision navigation aid. If I had to pick the one project that taught me the most, it would be EXAID, an NLP-based clinical trial matching tool for cancer patients and oncologists that I built during my time at Carnegie Mellon University. It was inspired by my personal experience accompanying my grandmother to hospital visits for her breast cancer treatment, and by a desire to improve care delivery and outcomes for patients.
Working on EXAID exposed me to various fault lines in the US healthcare system, most notably the lack of patient centricity on the one hand and the documentation burden on physicians on the other. Shaped by those observations, and sharing the conviction that the medical conversations between patients and physicians can unlock value for both groups, I co-founded Abridge with Shiv Rao. At Abridge, we are building conversation understanding AI to summarize healthcare conversations, helping patients better understand and follow through on their care and helping providers get a head start on their documentation.
It was also while building the clinical trial tool that I started closely following, and taking notes on, IBM Watson’s healthcare work, some of which I recently summarized in a Quartz op-ed.
Drenik: So where did IBM go wrong with Watson Health?
Konam: In response to my Quartz op-ed, an ex-Watson employee commented that ‘Watson Health was a hammer immediately searching for about a thousand nails.’ That, I think, was their biggest problem: IBM tried to throw AI at everything from medical imaging to clinical trial recruitment. Moreover, they got too far ahead of their skis, spinning up the PR machine while achieving little success with scalable deployments and results.
Their timing could also have played a key part in how it all went down. AI techniques, especially around NLP, are way more advanced now than they were at the peak of Watson Health’s trajectory. Whether it is OpenAI’s GPT-3, DeepMind’s Chinchilla, or Google’s latest PaLM, large language models are showing great promise across a wide range of language understanding and generation tasks. It’s only a matter of time before some of these advancements can power clinical NLP applications. It’s crazy that something like BERT, which we use in medical conversation processing pipelines, didn’t exist before 2018.
Drenik: Data from Prosper Insights & Analytics shows how the pandemic drove telemedicine adoption. When will AI be ready to truly understand a conversation between a physician and a patient?
Prosper – Telemedicine Trends
Konam: Conversations in general are difficult for machines to track, as they are filled with interruptions, overlapping speech, false starts, filler words, and sometimes different accents. Medical conversations are even more difficult, as physicians and patients bounce from topic to topic and often switch between medical terminology and colloquial phrases. Through years of investment and effort at Abridge, we’ve managed to tackle some complex research challenges across information extraction, classification, and summarization in the medical conversation domain. Beyond solving research challenges and publishing at conferences, we’ve also managed to productize the technology and unlock value for users.
Today, we can automatically:

All of this was possible because of our differentiated access to a one-of-a-kind dataset containing transcribed conversations and thorough annotations.
Drenik: How important is quality data in machine learning initiatives, especially in healthcare?
Konam: Quality data is essential to train deployable machine learning models in healthcare. Quality here doesn’t mean noise-free; it is critical to ensure that the dataset is representative of the deployment setting, which can sometimes be noisy and messy. Getting high-quality annotations is another crucial step toward building reliable machine learning systems. Annotation is often seen as unappealing work, but if done right it can lead to tremendous performance gains. Beyond initial data and annotations, we’ve also actively prioritized enabling user feedback loops to drive continuous learning and improvement.
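One common way to put a number on annotation quality is inter-annotator agreement. The sketch below computes Cohen's kappa for two annotators who labeled the same items. It is an illustration under stated assumptions, not a description of Abridge's process: the function name and the utterance labels (symptom, medication, plan, other) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random using their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label for every item; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical utterance-level labels from two annotators on the same six utterances.
annotator_1 = ["symptom", "medication", "plan", "symptom", "other", "plan"]
annotator_2 = ["symptom", "medication", "plan", "other", "other", "symptom"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.556
```

Kappa discounts raw agreement (here 4 of 6 items) by the agreement expected from chance alone, so a value near 0 flags labels that are no more consistent than random guessing, even when raw agreement looks respectable.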
In settings like medical conversations, it is also important for our system to generalize across multiple specialties, from cardiology to primary care. We’ve ensured that our dataset has wide coverage across specialties since the early days. Now, Abridge is the only automated solution that can work out of the box across any specialty!
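Specialty coverage of the kind described here can be checked before training by comparing how often each specialty appears in the training data with the expected deployment mix. The sketch below is a minimal, hypothetical audit: the function names, the 0.5 threshold, and the example counts are all assumptions for illustration, not details of Abridge's dataset.

```python
from collections import Counter
from typing import Dict, Iterable


def specialty_shares(specialties: Iterable[str]) -> Dict[str, float]:
    """Fraction of conversations that belong to each specialty."""
    counts = Counter(specialties)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def coverage_gaps(
    train: Iterable[str], deployment: Iterable[str], min_ratio: float = 0.5
) -> Dict[str, float]:
    """Flag specialties that are underrepresented in training relative to deployment.

    Returns {specialty: train_share / deployment_share} for every specialty whose
    ratio falls below min_ratio; 0.0 means the specialty is absent from training.
    """
    train_shares = specialty_shares(train)
    deploy_shares = specialty_shares(deployment)
    gaps: Dict[str, float] = {}
    for name, deploy_share in deploy_shares.items():
        ratio = train_shares.get(name, 0.0) / deploy_share
        if ratio < min_ratio:
            gaps[name] = ratio
    return gaps


# Hypothetical specialty tags for 100 training and 100 deployment conversations.
train_tags = ["primary care"] * 60 + ["cardiology"] * 35 + ["oncology"] * 5
deploy_tags = (
    ["primary care"] * 40 + ["cardiology"] * 30 + ["oncology"] * 20 + ["dermatology"] * 10
)

gaps = coverage_gaps(train_tags, deploy_tags)
print({name: round(ratio, 2) for name, ratio in gaps.items()})
# prints {'oncology': 0.25, 'dermatology': 0.0}
```

Measuring coverage against the deployment mix, rather than demanding an equal split across specialties, ties the check to where the model will actually be used, which matches the point that training data should be representative of the deployment setting.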
Drenik: Thank you, Sandeep, for sharing your experience and insights. Indeed, quality data is essential to train deployable machine learning models.
