Review of Literature: Use of Deep Learning for Cancer Detection in Endoscopy Procedures

By Nitya Lorber, Biology and Human Physiology ’23

Author’s Note: I think now more than ever, the reality of artificial intelligence is knocking on our doors. We are already seeing how the use of AI programs is becoming more and more normalized in our daily lives. AI is now driving our cars, talking to us through chatbots, and unlocking our phones with facial recognition. Frankly, I find it both incredible and intimidating to have an artificial, computerized program making decisions with the intent of modeling the reasoning capabilities of the human mind. As an aspiring oncologist, I was really interested to see how AI is being used in the healthcare system, specifically in the field of oncology. So when my biological sciences writing class asked me to write a literature review on a topic of my choice, it was a no-brainer – no AI needed. I hope that readers of this review can come away with a sense of comfort that AI is being used to improve cancer detection and potentially save lives.

ABSTRACT

Deep learning is a relatively new branch of artificial intelligence designed to emulate and extend human intellect [1]. With technological improvements and the development of state-of-the-art machine learning algorithms, deep learning has wide-ranging applications in medicine, specifically in the field of oncology. Several facilities worldwide train deep learning systems to recognize lesions, polyps, neoplasms, and other irregularities that may suggest the presence of various cancers. For colorectal cancers, deep learning can aid early detection during colonoscopies by increasing the adenoma detection rate (ADR) and decreasing the adenoma miss rate (AMR), both essential indicators of colonoscopy quality. For upper gastrointestinal cancers, deep learning systems such as ENDOANGEL, GRAIDS, and A-CNN can aid early detection, giving patients a higher chance of survival. Further research is required to evaluate how these programs will perform in a clinical setting as a potential secondary tool for diagnosis and treatment.

INTRODUCTION

Artificial intelligence is the ability of a computer to execute functions generally linked to human intelligence, such as the ability to reason, find meaning, summarize information, or learn from experience [2]. Over the years, computing power has improved significantly, and this progress has opened several opportunities for machine learning applications in medicine [1]. Generally, deep learning in medicine uses machine learning models to search medical data and highlight pathways to improve the health and well-being of the patient, most commonly through physician decision support and medical imaging analysis [3]. These models collect data and identify pixel-level features in microimaging structures that are easily overlooked or invisible to the naked eye [1, 4]. Deep learning is a subfield of machine learning that uses artificial neural networks to learn patterns and relationships in data. Its basic structure involves trained, interconnected nodes, or “neurons,” organized into layers [1]. What sets deep learning apart from other types of machine learning is the depth of the neural network, which allows it to learn increasingly complex features and relationships in the data. The field of oncology has begun to incorporate deep learning into cancer screening by training models to recognize lesions, polyps, neoplasms, and other irregularities that may suggest the presence of various cancers, including lung, breast, and skin cancers. In experimental trial settings, deep learning has shown its ability to aid early detection of a variety of cancers, specifically colorectal and upper gastrointestinal cancers. Although few studies have examined its performance in clinical settings, preliminary studies show promising results for future deep learning applications in oncology.

The traditional approach to detecting colorectal and gastrointestinal cancers is screening endoscopy, which allows physicians to view internal structures [5-8]. Colonoscopy is a type of endoscopy in which a long flexible tube called a colonoscope is inserted through the rectum into the large intestine to detect abnormalities, such as precancerous and cancerous lesions [7-9]. Improving the diagnostic sensitivity and accuracy of cancer detection through deep learning could help save lives by catching the disease before it progresses too far [1, 4].

DETECTION OF COLORECTAL CANCERS

Colorectal cancers (CRC), cancers of the colon and rectum, have the second highest cancer death rate for men and women worldwide [5]. Frequent colonoscopy screening and polypectomy can reduce the occurrence of and mortality from CRC by up to 68% [5, 7]. However, several significant factors determine colonoscopy quality: the number of polyps and adenomas found during colonoscopy, procedural factors such as bowel preparation, morphological characteristics of the lesion, and, most importantly, the endoscopist [5-8]. Endoscopist performance can vary with several factors, including level of training, technical and cognitive skills, knowledge, and years of experience inspecting the colorectal mucosa to recognize polypoid (elevated) and non-polypoid (non-elevated) lesions [6, 7].

The most essential and reliable performance indicator for individual endoscopists is their adenoma detection rate (ADR) [5, 6]. ADR is the percentage of average-risk screening colonoscopies in which one or more adenomatous colorectal lesions are found, quantifying the endoscopist’s sensitivity for detecting colorectal neoplasia [5, 7]. ADR is inversely related to the incidence of and mortality from CRC after routine colonoscopies [5-7]. Another performance indicator commonly used to investigate differences between endoscopists or technologies is the adenoma miss rate (AMR), which is calculated from two back-to-back colonoscopies on the same subject by counting the lesions missed in the first examination but found in the second [7]. The issue with the current approach to detecting CRC is variability in performance, which leads to widely diverse ADRs and AMRs among endoscopists. This variability often results in missed polyps and overlooked adenomatous lesions, which can have serious consequences for patients [5-8].
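
To make the two indicators concrete, the following is a minimal Python sketch of how ADR and AMR could be computed. The record layout, function names, and numbers are hypothetical and chosen only for illustration. The AMR denominator used here (all adenomas found across both examinations) is an assumption based on the common tandem-colonoscopy convention and is not spelled out in the studies reviewed above.

```python
from dataclasses import dataclass


@dataclass
class Colonoscopy:
    """One average-risk screening colonoscopy (hypothetical record layout)."""
    adenomas_found: int


def adenoma_detection_rate(procedures):
    """Share of colonoscopies in which at least one adenoma was found."""
    if not procedures:
        raise ValueError("need at least one colonoscopy")
    with_adenoma = sum(1 for p in procedures if p.adenomas_found >= 1)
    return with_adenoma / len(procedures)


def adenoma_miss_rate(found_first_pass, found_only_second_pass):
    """Share of adenomas from a tandem pair that the first pass missed.

    Assumption: the denominator is every adenoma found in either pass.
    """
    total = found_first_pass + found_only_second_pass
    if total == 0:
        raise ValueError("no adenomas found in either pass")
    return found_only_second_pass / total


# Made-up numbers, for illustration only.
screenings = [Colonoscopy(0), Colonoscopy(2), Colonoscopy(1), Colonoscopy(0)]
adr = adenoma_detection_rate(screenings)
amr = adenoma_miss_rate(found_first_pass=8, found_only_second_pass=2)
print(f"ADR = {adr:.0%}, AMR = {amr:.0%}")  # ADR = 50%, AMR = 20%
```

Note that ADR counts procedures while AMR counts lesions, so an endoscopist can find at least one adenoma in most patients and still miss a substantial share of lesions.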

DEEP LEARNING IN COLONOSCOPIES

Deep learning provides a possible solution to the endoscopist performance variability problem. It could provide a standardized approach to colonoscopy imaging that would help eliminate inaccuracies introduced by endoscopists who may be distracted, exhausted, or less experienced [6, 8]. Over the past few years, several studies have analyzed deep learning’s impact on measures of endoscopy quality (i.e., ADR and AMR) and its role in reducing the rate of CRC. Convolutional neural networks (CNNs) excel at image analysis tasks, including finding and categorizing lesions [5]. One experimental approach involves developing a computer-aided detection (CADe) system that uses an original CNN-based algorithm to assist endoscopists in detecting colorectal lesions during colonoscopy [7]. Overall, deep learning systems can improve endoscopy quality and possibly reduce the CRC death rate by increasing ADR and polyp detection rates in the general population [5-8].
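
For readers unfamiliar with CNNs, the sketch below illustrates the basic operation they are built on: sliding a small filter over an image to produce a feature map. It is a toy example in plain Python, not the algorithm of any CADe system discussed here. The image, the filter values, and the function name are all hypothetical, and the filter is hand-picked, whereas a trained CNN learns its filters from labeled images.

```python
def convolve2d(image, kernel):
    """Slide a kernel over an image without padding (cross-correlation,
    which is what CNN layers conventionally compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image": dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is nonzero only where the filter straddles the boundary between the dark and bright regions. Stacking many such learned filters in successive layers is what lets a CNN respond to progressively more complex visual patterns.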

The finding that deep learning can increase ADR has led to several subsequent studies on how this technology may affect current practice. For instance, it was not previously known how the increase in ADR from deep learning relates to physician experience. In trying to determine this relationship, Repici A, et al. (2022) found that experienced and non-experienced endoscopists alike displayed a similar ADR increase during routine colonoscopies with CADe assistance compared to those without CADe assistance [6]. Surprisingly, this study concluded that deep learning was a significant factor for ADR, while the endoscopist’s level of experience was not [6]. Turning from ADR to AMR, Kamba et al. (2021) found a lower AMR in colonoscopies conducted with CADe assistance than in standard colonoscopies [7]. This study further supported the conclusions of Repici A, et al., indicating that endoscopists of all experience levels using CADe benefit from reduced AMR and increased ADR [6, 7].

Moreover, deep learning performs particularly well in detecting flat lesions, which are often overlooked by endoscopists [6-8]. In evaluating deep learning for colonoscopy in patients with Lynch syndrome (LS), the most common hereditary CRC syndrome, Hüneburg R, et al. found a higher detection rate of flat adenomas with deep learning than with High-Definition White-Light Endoscopy (HD-WLE), a standard protocol commonly used to examine polyps [8]. However, unlike in other studies, the overall ADR did not differ significantly between the deep learning and HD-WLE groups, most likely because of the study’s small sample size and exploratory nature [8]. This study was not the only one to observe no significant increase in ADR. Zippelius C, et al. (2022) sought to assess the accuracy and diagnostic performance of a commercially available deep learning system, GI Genius, in real-time colonoscopy [5]. Although GI Genius performed well in daily clinical practice and could reduce performance variability and increase overall ADR among less experienced endoscopists [8], it performed no better than expert endoscopists [5]. Overall, deep learning proved superior or equal to standard colonoscopy, but never worse [5-8].

DETECTION OF UPPER GASTROINTESTINAL CANCERS

Upper gastrointestinal cancers, including esophageal and gastric cancer, are among the most common malignancies and leading causes of cancer-related death worldwide [4, 10, 11]. Of these, gastric cancer is the fifth most common form of cancer and the third leading cause of cancer-related death worldwide, with approximately 730,000 deaths each year [10, 11]. Most upper gastrointestinal cancers are diagnosed at late stages because their signs and symptoms go unnoticed or are too general to prompt a correct diagnosis [10]. If these cancers are detected early, however, the 5-year survival rate can exceed 90% [10, 11]. To diagnose upper gastrointestinal cancers, endoscopists first perform esophagogastroduodenoscopy (EGD), examining upper gastrointestinal lesions to find early gastric cancer (EGC) [4, 11]. However, as with colonoscopies, endoscopists require long-term specialized training and experience to accurately detect difficult-to-see EGC lesions with EGD [4, 11]. EGD quality varies significantly with endoscopist performance and consequently affects patient health [4, 10-11]. Because of the subjective, operator-dependent nature of endoscopic diagnosis, many patients are at risk of leaving their endoscopy examinations with undetected upper gastrointestinal cancers, especially in less developed, remote regions [10]. The rate of undetected upper gastrointestinal cancers is as high as 25.8%, and 73% of these cases result from endoscopist errors, such as failing to detect a specific lesion or mischaracterizing a lesion as benign during a biopsy [11]. There is a dire need for improved endoscopy quality and reliability, as current examinations rely too heavily on endoscopist knowledge and experience, introducing too much variability into EGC detection [10, 11].

DEEP LEARNING IN ENDOSCOPIES

Deep learning systems may effectively monitor blind spots during EGDs, but very little research on deep learning applications in upper gastrointestinal cancers was conducted before 2019 [4, 11]. Previously, deep learning had mainly been used to distinguish between neoplastic, or monoclonal, and non-neoplastic, or polyclonal, lesions [10, 11]. However, CNNs were not among the algorithms studied, and the systems examined at the time could not sufficiently distinguish between malignant and benign lesions [10, 11]. The first functional deep learning system built specifically to detect gastric cancer was the 2019 “original convolutional neural network” (O-CNN), but its low precision rendered it unviable for clinical practice [11]. This gap in research led to the development of three deep learning systems for detecting and diagnosing upper gastrointestinal cancers, with the aim of catching the disease in its early stages: GRAIDS, A-CNN, and ENDOANGEL.

The first deep learning system developed and validated was the Gastrointestinal Artificial Intelligence Diagnostic System (GRAIDS), a deep learning semantic segmentation model capable of providing the first real-time automated detection of upper gastrointestinal cancers [10]. Luo H, et al. (2019) trained GRAIDS to detect suspicious lesions during endoscopic examination using over one million endoscopy images from six hospitals with varying levels of experience across China [10]. GRAIDS is designed to provide real-time assistance in diagnosing upper gastrointestinal cancers during endoscopies as well as to assess images retrospectively [10]. In the study, Luo H, et al. (2019) found that GRAIDS could detect upper gastrointestinal cancers both retrospectively and in a prospective observational setting with high accuracy and specificity, and its sensitivity was similar to that of expert endoscopists [10]. However, GRAIDS cannot recognize some gastric contours delineated by experts, which increases the risk of false positives and suggests that the system is most effective as a secondary tool [10]. GRAIDS is seen as a cost-effective method for early cancer detection that can help endoscopists of every experience level [10].

The second deep learning diagnostic system is the Advanced Convolutional Neural Network (A-CNN), an upgraded version of O-CNN developed by Namikawa K, et al. (2020) [11]. Improving upon its predecessor, A-CNN successfully distinguished gastric cancers from gastric ulcers with high accuracy, sensitivity, and specificity [11]. This is an essential improvement because gastric ulcers are often mistaken for cancer, leading to unnecessary cancer treatments for the patient. A-CNN can help endoscopists with early diagnosis, improving survival rates for gastric cancer [11]. In addition, the program helps standardize the endoscopy approach, reducing some of the variability in endoscopist performance [11].

The third deep learning system is ENDOANGEL, developed by Wu L, et al. (2021). Like A-CNN, ENDOANGEL is an upgrade of an older CNN-based algorithm, WISENSE [4]. Before the update, WISENSE demonstrated the ability to monitor blind spots and create photodocumentation in real time during EGD [4]. Compared to WISENSE, ENDOANGEL achieved real-time monitoring during EGD with fewer endoscopic blind spots, a longer inspection time, and EGC detection with high accuracy, sensitivity, and specificity [4]. The program shows potential for detecting EGC in real clinical settings [4].
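
The three systems above are all evaluated in terms of accuracy, sensitivity, and specificity. As a reminder of what these measures capture, the following Python sketch computes them from the four cells of a confusion matrix using their standard definitions. The function name and the counts are hypothetical and are not taken from any of the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    tp: cancers correctly flagged      fn: cancers missed
    tn: benign cases correctly cleared fp: benign cases wrongly flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true cancers detected
        "specificity": tn / (tn + fp),  # share of benign cases cleared
    }


# Made-up counts, for illustration only.
metrics = diagnostic_metrics(tp=45, fp=10, tn=90, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity matters most for early detection, because a missed cancer (a false negative) delays treatment, while specificity reflects how often a benign finding such as a gastric ulcer is wrongly flagged as cancer.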

FUTURE IMPROVEMENT IN DEEP LEARNING DEVELOPMENT

Because deep learning is a relatively new technology, much of the available research is prospective. These studies attempt to determine whether deep learning is a viable approach to reducing endoscopist performance variability. However, most require further research to show how this technology will be used in a clinical setting. For example, most studies involved deep learning systems that were not commercially available and were conducted in highly specialized centers, so they cannot indicate deep learning’s performance for lesion detection in daily clinical practice across different populations around the world [4, 6-7, 10-11]. Additionally, studies need larger patient samples before their findings can be generalized to a larger population [7, 8]. Lastly, researchers should still consider endoscopist performance in their trials to explore every option and ensure that each patient gets the same standard of care regardless of who their physician is or of that physician’s personal views on and acceptance of deep learning technology [4, 5, 8]. These preliminary studies show potential, but the systems need further improvement and research before they can be used as standalone options [4, 10-11].

CONCLUSION

Overall, deep learning has demonstrated impressive ability to detect colorectal and upper gastrointestinal cancers in experimental trial settings. Deep learning provides a more standardized approach to conducting colonoscopies and endoscopies that may help make screening consistent for every patient, regardless of their endoscopist. In colorectal cancers, studies have shown increased ADR and decreased AMR with deep learning. In upper gastrointestinal studies, deep learning has shown its ability to detect cancer as well as expert endoscopists. Despite these advances, neural networks can only partially address the cancer detection problem at hand. Even if neural networks improve the overall accuracy and sensitivity of cancer screenings, those gains will be of little use if patients do not get their recommended screenings at the recommended time. At the moment, human intervention is still required, in conjunction with deep learning support, to give patients the most accurate results. How deep learning will perform in clinical settings as a secondary tool, locally and globally, is not yet fully understood. However, the preliminary studies discussed in this review show promising results for future deep learning applications in oncology.