Data sharing, when combined with a crowdsourcing challenge, can be a robust and powerful framework for developing new prognostic models for advanced prostate cancers, according to a study of a competition that identified new biomarkers of progression in metastatic castration-resistant cancers.
The study, “Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data,” was published in the journal The Lancet Oncology.
A competition called the Prostate Cancer DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenge — organized by Project Data Sphere, the Prostate Cancer Foundation, and Sage Bionetworks — aimed to address clinically relevant research questions regarding patients with metastatic castration-resistant prostate cancer (mCRPC). It explored new research approaches using the Project Data Sphere platform, an online service sharing patient-level data from multiple Phase 3 trials for consideration and analysis, with the goal of advancing cancer research.
In the DREAM challenge, 50 teams created models of mCRPC risk factors using data hosted on the Project Data Sphere platform. The winning model, developed by researchers at the Institute for Molecular Medicine Finland (FIMM) at the University of Helsinki and the University of Turku (UTU), predicts patient outcomes well, a capability that could lead to improved clinical trial design and treatment options.
“Analyses of PDS-shared patient data using machine learning models led to the identification of biomarker combinations that accurately predict how a patient’s disease will progress,” Tero Aittokallio, group leader at FIMM and professor in the Department of Mathematics and Statistics at UTU, said in a press release. “In addition to immune system biomarkers and renal and hepatic function, our algorithm identified an under-reported cancer biomarker, aspartate aminotransferase, as an important factor in making prognoses.”
These findings show that data mining (the analysis step of the “knowledge discovery in databases” process) can provide useful and new insights about disease from patterns in patients’ clinical data.
“The fact that we were able to gain such deep insights from clinical studies that concluded years ago shows how important it is for scientists in industry, government and academia to share clinical trial data on an ongoing basis,” said Dr. Justin Guinney, director of Computational Oncology at Sage Bionetworks and a Challenge co-director.
The Project Data Sphere datasets included over 150 variables for use in the modeling, including patient demographics, medical history, laboratory results, lesion volume, and previous treatments. The teams trained their models on three clinical trial datasets, then tested and validated them using a version of a fourth trial dataset.
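For readers curious about what training on three trials and testing on a fourth looks like in practice, the split can be sketched in a few lines of Python. This is only an illustration, not the Challenge's actual pipeline: the trial labels, the `patient_id` field, and the `split_by_trial` helper are all hypothetical, and the records are synthetic placeholders rather than real patient data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_trial(
    records: List[Record], held_out_trial: str
) -> Tuple[List[Record], List[Record]]:
    """Split records so that one whole trial is held out for testing.

    Hypothetical helper: every record must carry a "trial" key.
    """
    trials = {r["trial"] for r in records}
    if held_out_trial not in trials:
        raise ValueError(f"unknown trial: {held_out_trial!r}")
    # Training data comes from every trial except the held-out one.
    train = [r for r in records if r["trial"] != held_out_trial]
    # Test data comes only from the held-out trial.
    test = [r for r in records if r["trial"] == held_out_trial]
    return train, test


# Synthetic placeholder records (not real data): two patients per trial.
records: List[Record] = [
    {"trial": name, "patient_id": f"{name}-{i}"}
    for name in ("trial_A", "trial_B", "trial_C", "trial_D")
    for i in range(2)
]

train, test = split_by_trial(records, held_out_trial="trial_D")
train_trials = sorted({r["trial"] for r in train})
test_trials = sorted({r["trial"] for r in test})
print(train_trials, len(train))
print(test_trials, len(test))
```

Holding out an entire trial, rather than a random sample of patients, checks whether a model generalizes to a patient population it has never seen.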
Project Data Sphere is an independent, nonprofit initiative of the Life Sciences Consortium of the CEO Roundtable on Cancer, established by President George H.W. Bush.