Reinforcement learning model optimizes brain cancer treatment, reduces dosing cycles and improves patient quality of life

Researchers at MIT have come up with an intriguing approach to combat ‘Glioblastoma’- a malignant tumor of the brain/spinal cord- using machine learning techniques. By reducing the toxic chemotherapy and radiotherapy that is involved in treating this cancer, the researchers aim to improve the quality of life for patients, while also reducing the various side effects caused by the former using Reinforcement learning techniques.

While the prognosis for adults is no more than 5 years, medical professionals try to shrink the tumor by administering drug doses in safe amounts. However, the pharmaceuticals are so strong that patients end up suffering from their side effects.

Enter Machine Learning and Artificial Intelligence to save the day. While it's no hidden truth that machine learning is being incorporated into healthcare on a huge scale, the MIT researchers have taken this to the next level.

Using Reinforcement Learning as the Big Idea to train the model

Media Lab researcher Gregory Yauney will be presenting a paper next week at the 2018 Machine Learning for Healthcare conference at Stanford University. This paper details how the MIT Media Lab researchers have come up with a model that could make dosing cycles less toxic but still effective. Incorporating a “self-learning” machine-learning technique, the model studies treatment regimens being used presently, and iteratively changes the measurements.

In the end, it finds an ideal treatment design suited to the patient. This has proven to reduce the tumor sizes to a degree almost identical to that of original medical regimens.

The model simulated trials of 50 patients and designed treatments that either reduced dosages to twice a year or skipped them all together. This was done keeping in mind that the model has to shrink the size of the tumor but at the same time ensuring that reduced dosages did not lead to harmful side effects.

The model is designed to used reinforced learning (RL)- that comprises artificially intelligent “agents” that complete “actions” in an unpredictable, complex environment to reach the desired outcome.

The model’s agent goes through traditionally administered regimens. It uses a combination of the drugs temozolomide (TMZ) and procarbazine, lomustine, and vincristine (PVC), administered to the patients over weeks or months. These regimens are based on protocols that have been used clinically for ages and are based on both, animal testing and various clinical tests and scenarios. The protocols are then used by Oncologists to predict how many doses the patients have to be administered based on weight.

As the model explores the regimen, it decides on one of the two actions-

Initiate a dose

Withhold a dose

If it does administer a dose, it has to make the decision if the patient needs the entire dose, or only a portion.
After a decision is taken, the model checks with another clinical model to see if the tumor’s size has changed or if it’s still the same. If the tumor’s size has reduced, the model receives a reward else it is penalised. Rewards and penalties essentially are positive and negative numbers, say +1 or – 1.

The researchers also had to ensure that the model does not over-dose or give out the maximum number of doses to reduce the mean diameter of the tumor. Therefore, the model is programmed in such a way that whenever it chooses to administer all full doses, it gets penalized. Thus the model is forced to administer fewer, smaller doses.

Patik Shah, a principal investigator at the Media Lab who supervised this research, further stresses on the fact that, as compared to traditional RL models that work toward a single outcome, such as winning a game, and take any and all actions that maximize that outcome, the model implemented by the MIT researchers is a “unorthodox RL model that weighs potential negative consequences of actions (doses) against an outcome (tumor reduction)”
The model is strikingly wired to find a dose that does not necessarily maximize tumor reduction, but also establishes a perfect balance between maximum tumor reduction and low toxicity for the patients.

The training and testing methodology used

The model was trained on 50 simulated patients - randomly selected from a large database of glioblastoma patients. These patients had previously undergone traditional treatments. The model conducted about 20,000 trial-and-error test runs for every patient. Once training was complete, the model understood the parameters for optimal regimens.

The model was then tested on 50 new simulated patients and used the above-learned parameters to formulate new regimens based on various constraints that the researchers provided.

The models treatment regimen was compared to the results of a conventional regimen using both TMZ and PVC. The outcome obtained was practically similar to the results obtained after the human counterparts administered treatments.

The model was also able to treat each patient individually, as well as in a single cohort, and achieved similar results (medical data for each patient was available to the researchers).
In short, the model has helped to generate precision medicine-based treatments by conducting one-person trials using unorthodox machine-learning architectures.
Nicholas J. Schork, a professor and director of human biology at the J. Craig Venter Institute, and an expert in clinical trial design explains “Humans don’t have the in-depth perception that a machine looking at tons of data has, so the human process is slow, tedious, and inexact,” he further adds “Here, you’re just letting a computer look for patterns in the data, which would take forever for a human to sift through, and use those patterns to find optimal doses.”

To sum it all up, Machine learning is again proving to be an essential asset in the medical field- helping both researchers as well as patients to view medical treatments in an all new perspective.

If you would like to know more about the progress done so far, head over to MIIT news.

23andMe shares 5mn client genetic data with GSK for drug target discovery

Machine learning for genomics is bridging the gap between research and clinical trials

6 use cases of Machine Learning in Healthcare