Google AI System Outperforms General Pathologists in Grading Prostate Cancers, Study Finds

A novel artificial intelligence (AI) technology developed by Google is better at grading prostate cancer biopsies than general pathologists and is as good at identifying cancerous samples, a new study reports.

The study, “Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens,” was published in JAMA Oncology. The study was funded by Google.

Prostate biopsies involve removing cells from the prostate for evaluation under a microscope; this procedure is crucial for making accurate diagnoses and prognoses of prostate cancer. Traditionally, biopsies have been evaluated by specialist pathologists. However, there are drawbacks to having human beings do this evaluation, including cost, time, person-to-person variation, and human error.

Using computer programs to assess prostate biopsies may help mitigate or overcome some of these drawbacks.

In the new study, investigators at Google Health, working together with colleagues at institutions in the U.S. and Canada, developed a deep learning system (DLS), a form of artificial intelligence algorithm with the ability to learn from new data, to evaluate prostate cancer biopsies.

In essence, the tool was created by giving a computer a collection of previously scored biopsies and allowing it to develop algorithms that sort the biopsies according to their known scores. The computer can then use these algorithms to score biopsies it has not seen before.

In the study, the DLS was compared against 19 general pathologists, who had an average of 25 years of experience, in determining whether each of 752 biopsy images showed prostate cancer.

Of note, a general pathologist typically would not be called upon to do this kind of reading. Instead, it would be done by a pathologist who specializes in urology. As such, scores from at least two specialist pathologists were used as the “true” value against which to compare both the DLS and the general pathologists.

For tumor detection, the rate of agreement with specialists was similar for the DLS and the general pathologists: 94.3% vs. 94.7%. The sensitivity for the DLS was 95.5%, meaning that this system correctly identified 95.5% of positive biopsies, while that of the general pathologists was 92.8%.

The respective specificities, or the proportion of negative biopsies that were correctly identified as negative, were 91.7% for the DLS and 97% for the general pathologists. This means that the DLS produced more false positives.
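For readers unfamiliar with these measures, the short Python sketch below shows how sensitivity, specificity, and overall agreement are typically calculated from counts of correct and incorrect calls. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and overall agreement.

    tp: cancerous biopsies correctly called positive
    fp: benign biopsies incorrectly called positive
    tn: benign biopsies correctly called negative
    fn: cancerous biopsies incorrectly called negative
    """
    sensitivity = tp / (tp + fn)  # share of positive biopsies that were caught
    specificity = tn / (tn + fp)  # share of negative biopsies identified as such
    agreement = (tp + tn) / (tp + fp + tn + fn)  # share of all calls that match the reference
    return sensitivity, specificity, agreement


# Hypothetical counts for illustration only (not data from the study).
sens, spec, agree = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} agreement={agree:.1%}")
```

A reader with higher sensitivity but lower specificity, as reported for the DLS, misses fewer tumors but flags more benign biopsies as cancerous.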

“The DLS showed similar overall tumor detection rates compared with general pathologists, by catching more cases of tumor than general pathologists at the cost of some false-positives,” the researchers wrote. “This trade-off suggests that the DLS could help alert pathologists to tumors that may otherwise be missed while relying on pathologist judgment to overrule false-positive categorizations.”

The researchers also compared how accurately the DLS and the general pathologists determined Gleason grades. Put simply, a higher Gleason grade indicates more “cancer-like” cell features, such as cells that are less well differentiated.

Across 498 images, the rate of agreement between the DLS and the specialist pathologists for Gleason grading was 71.7%, which was significantly higher than that of the general pathologists (58%). Notably, the DLS outperformed the general pathologists at differentiating between lower-grade biopsies.

“The DLS showed better agreement rates with subspecialists than pathologists did for Gleason pattern quantitation, which is an important prognostic signal and independent predictor of biochemical recurrence,” the researchers wrote.

Collectively, these data indicate that the DLS performs about as well as general pathologists at identifying prostate tumors and outperforms them at grading those tumors.

“A DLS such as this could therefore create efficiencies for health care systems by improving consistency of grading, reducing the consultation-associated costs and turnaround delays, and potentially decreasing treatment-related morbidity for men with low-risk disease,” the researchers wrote.

“In particular, the DLS was substantially less likely to overgrade … while being slightly more likely to undergrade cases than general pathologists,” they wrote.

The findings suggest that the DLS could help identify low-risk cases for conservative management.

“The exact implementation and benefit of using such a tool remains to be determined but must be guided by prospective validation studies that examine the influence on diagnostic reporting and patient outcomes,” the researchers added.