Summary
In this chapter, we built a summarization application for medical transcriptions. We began by listing the challenges of generating a good parallel corpus for the summarization task in the medical domain. For our baseline approach, we used readily available Python libraries such as PyTeaser and Sumy. In the revised approach, we used word frequencies to generate the summary of a medical document. In the best possible approach, we combined the word frequency-based approach with a ranking mechanism to generate summaries of medical notes.
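As a reminder of the idea behind the frequency-plus-ranking approach, here is a minimal sketch using only the standard library. It is not the chapter's actual code: the function names `tokenize` and `summarize`, the tiny stopword list, and the scoring rule (average normalized word frequency per sentence) are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "with", "for", "on", "no"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    # Word frequencies over the whole document.
    freq = Counter(tokenize(text))
    if not freq:
        return ""
    top = max(freq.values())

    def score(sentence):
        # Average normalized frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best n, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(note_text, n_sentences=2)` on a short transcription returns the two sentences whose words occur most often across the note. Averaging by sentence length is one possible design choice; it keeps long sentences from winning on word count alone.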
Finally, we developed a solution that uses Amazon's review dataset, which provides a parallel corpus for the summarization task, to build a deep learning-based summarization model. I would recommend that researchers, community members, and everyone else come forward to build high-quality datasets that can be used to build great data science applications for the health and medical domains...