Automatic Structured Reporting from Narrative Cancer Pathology Reports

Ying Ou, Jon Patrick

Abstract


Objective: To extract pertinent information from narrative pathology reports and automatically populate structured templates.

Materials and methods: A processing pipeline was developed with two stages: a supervised machine learning approach, using a conditional random field (CRF) learner, for medical entity recognition; and rule-based methods for populating structured templates. In total, 612 narrative pathology reports of colorectal cancer were collected for evaluation.
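The second stage of the pipeline can be illustrated with a minimal sketch. It assumes the entity recogniser has already produced labelled spans, and it applies one simple rule (keep the first mention of each entity type, normalised to lower case). The `Entity` class, the `FIELD_RULES` mapping, the template field names and the "De:Tumour Site" label are hypothetical and are not taken from the system described here; only the "De:Mesorectal Integrity" label appears in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A recognised span: entity type label plus the matched report text."""
    label: str
    text: str


# Hypothetical mapping from entity type to structured template field.
FIELD_RULES = {
    "De:Mesorectal Integrity": "mesorectal_integrity",
    "De:Tumour Site": "tumour_site",  # hypothetical label, for illustration only
}

NOT_STATED = "Not stated"


def populate_template(entities):
    """Fill each template field with the first matching entity mention.

    Fields with no matching entity keep the NOT_STATED placeholder, so a
    missed entity in stage one shows up as a missing value in the output.
    """
    template = {field: NOT_STATED for field in FIELD_RULES.values()}
    for ent in entities:
        field = FIELD_RULES.get(ent.label)
        if field is not None and template[field] == NOT_STATED:
            template[field] = ent.text.strip().lower()
    return template


if __name__ == "__main__":
    found = [Entity("De:Mesorectal Integrity", " Intact ")]
    print(populate_template(found))
```

In a cascade of this kind, any entity the recogniser misses leaves its field unfilled no matter how good the rules are, which is why recognition recall bounds end-to-end performance.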

Results: In the medical entity recognition experiments, the best model achieved a micro-averaged precision of 80.58%, recall of 76.33% and F-score of 78.40% under 10-fold cross-validation on the training set. In the end-to-end evaluation on the test set, the overall micro-averaged precision, recall and F-score were 85.18%, 78.75% and 81.84% respectively.
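The reported F-scores are consistent with the harmonic mean of the reported precision and recall, which the short sketch below checks. The `micro_average` helper shows how micro-averaging is conventionally computed, by pooling counts over all entity types before taking the ratios; the function names are illustrative and the per-type counts are not given in the abstract.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Micro-averaged (precision, recall, F) from per-type (tp, fp, fn) counts.

    Counts are summed across entity types first, so frequent types
    contribute more to the result than rare ones.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Figures quoted in the Results paragraph (percentages).
cv_f = f_score(80.58, 76.33)    # cross-validation, entity recognition
test_f = f_score(85.18, 78.75)  # end-to-end, test set

if __name__ == "__main__":
    print(round(cv_f, 2), round(test_f, 2))  # prints 78.4 81.84
```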

Discussion: Our study shows that it is feasible to populate structured reports automatically using a cascaded approach that integrates machine learning with several rule-based methods. It also shows that the rules designed for structured template population perform adequately, and that incorrect results from medical entity recognition, such as the low recall on De:Mesorectal Integrity, are the major cause of errors (over 80%).

Conclusion: With further improvement, especially in medical entity recognition, the system can contribute to higher-quality pathology reporting and improve efficiency for cancer registries, clinical audits and epidemiology research.


Keywords


Medical Entity Recognition; Structured Template Population; Automatic Structured Reporting; Pathology Reports





::::::::::::::  eJHI - electronic Journal of Health Informatics - ISSN 1446-4381  ::::::::::::::
