Big data in the clinic: Using data to guide practice

Elizabeth A Krupinski

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


Radiology and pathology are, almost by definition, "big data" in both the healthcare and research arenas. The data sources are not only big, but incredibly varied. At the core of both medical practices are the acquired images that must be processed, transmitted, stored/archived and interpreted. Radiology is the most common medical imaging specialty, with recent estimates of 3.6 billion medical procedures using ionizing radiation being performed worldwide annually (3.1 billion diagnostic radiologic, 0.5 billion dental, and 37 million nuclear medicine examinations) [1]. The workload in pathology, in terms of the number of images that need to be processed and interpreted, is incredibly large as well [2]. One of the key differences between radiology and pathology is that radiology has been digital for over 25 years and digital radiology has been used routinely in clinical practice since the early 1990s, while pathology, although it has had digital capability in the form of whole slide imaging (WSI) since the late 1990s, has yet to take off clinically, due in most part to the lack of FDA approval of WSI acquisition and viewing systems for clinical use. With the advent of telemedicine, the number of types of "big data" has increased exponentially, with everything from store-and-forward dermatology and ophthalmology images, to real-time video data, to monitoring data of all sorts from wearable and other types of physiological monitoring sensors operated in every mode from periodic to 24/7. On the clinical side, a number of tools have been developed over the years to deal with all these data, most recently in the form of "dashboards" that help visualize aspects of clinical productivity such as device (e.g., MRI, CT, ultrasound) utilization to optimize patient scheduling and identify bottlenecks, clinician tracking to monitor RVUs (relative value units), QA/QC monitoring and a host of other metrics required to run a large or even small healthcare enterprise.
Most of these tools include access not only to the raw data, but also to graphic and other visual representations to help users interpret the data efficiently and effectively. Surprisingly, there have been very few studies on this aspect, although companies are starting to realize that there is usually no one-size-fits-all GUI and that product offerings likely need to be tailored somewhat at the individual site level. In terms of remote monitoring data, there are a number of aspects relevant to the big data question, most typically centered on what data are useful and what data are not, and how to interpret the data (either by human or computer) in a meaningful way. For example, remote ECG monitoring systems are increasingly being used to monitor patients with a variety of cardiac conditions, but there are significant challenges [3]. One of the key problems is automating the analysis of the incoming data to reliably identify abnormal events (and intervene) without false positive rates that render the system useless due to the "annoyance" factor. Numerous groups are working on developing tools to classify the signals, but the problem is confounded by the fact that clinical guidelines are typically based on population-based data, which are often outdated and often applicable to only a narrow group of patients, and which may be meaningless at the individual level (e.g., baseline "normal" function for one patient may be quite different from baseline for someone else, so the algorithm needs to take this into account in order to maximize the true positive and minimize the false positive rates).
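The point about individual baselines can be illustrated with a minimal sketch. It is not the method of any system cited here: the function names, the fixed population range of 60 to 100 beats per minute, and the z-score threshold of 3 are all assumptions chosen for illustration. The sketch compares a fixed population rule with a rule that measures each reading against the patient's own history.

```python
from statistics import mean, stdev


def population_alerts(readings, low=60.0, high=100.0):
    """Flag readings outside a fixed population range.

    The 60-100 bpm limits are illustrative assumptions, not a clinical rule.
    """
    return [r < low or r > high for r in readings]


def personalized_alerts(baseline, readings, z_threshold=3.0):
    """Flag readings that deviate strongly from this patient's own baseline.

    `baseline` is a hypothetical list of the patient's earlier readings;
    `z_threshold` is an assumed cutoff in standard deviations.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variability; cannot compute z-scores")
    return [abs(r - mu) / sigma > z_threshold for r in readings]


if __name__ == "__main__":
    # Hypothetical patient whose normal resting heart rate is about 50 bpm.
    baseline = [48, 50, 52, 49, 51, 50, 47, 53]
    readings = [49, 52, 51, 75]
    print(population_alerts(readings))
    print(personalized_alerts(baseline, readings))
```

For this hypothetical patient, the population rule raises an alert on the three readings that are normal for that person and stays silent on the reading of 75, which is far from the patient's baseline, while the personalized rule does the opposite. A real system would need a far more careful model of the signal than a single mean and standard deviation.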
The American Telemedicine Association (ATA) is developing a guideline document to address another important big data question in remote monitoring - the relationship between health care professionals and patient data, the responsibilities of the health care professional or clinical team to assess, interpret, and respond to the data, and the responsibilities of the patient with respect to the data. For example, the nature and intended use of the data are important to consider. If the data have a measurable impact on quality improvements in care delivery, they should receive a higher level of importance. If the data represent information about leading indicators for quality, then importance is high, but if they represent information about trailing indicators, the data are less important. Data sources can also be weighted by importance. Data requested by a healthcare provider and generated by the patient in a formal program based on improving health indicators, particularly for high-risk patients, have high importance, while information generated by a patient and sent to a clinician outside of a formal care process is of less importance. The technology itself can be taken into account, as consumer-grade mobile technologies can be weighted less than FDA-cleared medical devices. From the research perspective, there are numerous ways that big data are being used, not only in terms of developing better ways to visualize and interpret image data (for example, the use of computer-aided detection and decision tools to analyze radiographic, pathologic and other clinical images), but also to analyze non-image medical data such as radiographic and pathologic reports. With respect to the latter, for example, there has been recent work in our department using sentiment analysis to assess whether the sentiment of a provider is associated with radiographic imaging utilization in the context of medical notes on ICU patients.
Ten years of EMR data were analyzed using software for sentiment analysis, a branch of natural language processing (NLP) that combines text analysis and computational linguistics to assess emotion in written text. Initial data suggest that negative sentiment is significantly associated with an increase in image utilization, and that utilization decreased over time. In other words, physicians who judged their patients to be failing ordered more imaging exams for them than for those they perceived as healthier or more stable, but as time passed ordering was reduced overall. Another project is conducting a retrospective review of records from over 100,000 breast imaging studies to study rates of recall based on time intervals from the last interpreted examination, permitting an analysis of the impact of human capital depreciation, fatigue, and more generally the day of the week on the likelihood of a recall recommendation. This type of analysis could guide how we schedule our clinicians (easing them back into their routine, so to speak) after vacations, multiple days on service and so on in order to avoid errors. Although much of the research in big data is focused on the computational side of the equation, there is some work being conducted on the human side - how does the clinician or other healthcare provider deal with all this information? On the one hand are studies that assess whether or not data from these sources (e.g., remote monitoring) have any impact on patient care [4]. On the other hand are the studies that approach data interpretation and use from a more human factors perspective. For example, we use eye tracking technology (Figure 1) to investigate and understand how clinicians interact with images and whether the information provided impacts both the decisions rendered and the efficiency with which they are rendered.
(Figure Presented) In one recent study we examined the impact of adding a photo of the patient to the interface when radiologists interpreted a series of chest images. The goal was to address the fact that, due to mislabeling of images, "wrong patient" errors can and do occur. With 10 observers viewing a series of 21 portable chest x-ray images with and without photos, the presence of the photograph reduced the number of fixations (chest radiograph only mean 98.68; chest radiograph with photograph present 80.81; photograph 10.59) and total dwell (chest radiograph only mean 30.84 sec; chest radiograph with photograph present 25.68 sec; photograph 3.93 sec) on the chest radiograph as a result of periodically looking at the photograph. Overall time viewing the cases did not differ, as time not spent on the radiograph was spent on the photograph. On average, readers scanned from the radiograph to the photograph about 4 times during search. The added information of photographs may help with certain image interpretation tasks and may help personalize the reading experience for radiologists. There are numerous ways that big data are used in clinical practice. This presentation will highlight some of the novel approaches being used to analyze clinical data from a variety of sources to address pressing questions in the clinic regarding ways to optimize data presentation for the clinical user as well as how to use big data to guide clinical practice and optimize patient care.

Original language: English (US)
Title of host publication: Proceedings - 2016 International Conference on Collaboration Technologies and Systems, CTS 2016
Editors: Waleed W. Smari, Joseph Natarian
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 2
ISBN (Electronic): 9781509022991
State: Published - 2016
Event: 2016 International Conference on Collaboration Technologies and Systems, CTS 2016 - Orlando, United States
Duration: Oct 31 2016 to Nov 4 2016

Publication series

Name: Proceedings - 2016 International Conference on Collaboration Technologies and Systems, CTS 2016


Other: 2016 International Conference on Collaboration Technologies and Systems, CTS 2016
Country/Territory: United States


Keywords

  • Big data in healthcare
  • Clinical practice
  • Medical imaging
  • Sentiment analysis
  • Telemedicine

ASJC Scopus subject areas

  • Communication
  • Computer Networks and Communications
  • Computer Science Applications

