Structured Pathology Reporting for Cancer from Free Text: Lung Cancer Case Study

Anthony Nguyen, Michael Lawley, David Hansen, Shoni Colquist


Objective: To automatically generate structured reports for cancer, including TNM (Tumour-Node-Metastases) staging information, from free-text (non-structured) pathology reports.
Method: A symbolic rule-based classification approach was proposed to identify symbols (clinical concepts) in free-text reports that are subsumed by items specified in a structured report. The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) was used as the base ontology, providing the semantics and the relationships between concepts needed for subsumption querying. Synthesised values in the structured report, such as TNM stages, were also classified using logic built from the relevant structured report items. The College of American Pathologists’ (CAP) surgical lung resection cancer checklist was used to demonstrate the methodology.
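The subsumption step and the rule-based synthesis of a stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small hand-written is-a hierarchy (`IS_A`) stands in for SNOMED CT, the concept names are hypothetical labels rather than SNOMED CT identifiers, and `synthesise_n_stage` is a deliberately simplified, hypothetical nodal rule, not the complete AJCC/CAP staging criteria.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy "is-a" hierarchy standing in for SNOMED CT.
# Keys are child concepts; values are their direct parents (subsumers).
IS_A: Dict[str, Set[str]] = {
    "adenocarcinoma of lung": {"carcinoma of lung"},
    "squamous cell carcinoma of lung": {"carcinoma of lung"},
    "carcinoma of lung": {"malignant neoplasm of lung"},
    "hilar lymph node metastasis": {"regional lymph node metastasis"},
    "mediastinal lymph node metastasis": {"regional lymph node metastasis"},
    "subcarinal lymph node metastasis": {"mediastinal lymph node metastasis"},
}


def ancestors(concept: str) -> Set[str]:
    """Return every concept that subsumes `concept` (transitive closure of is-a)."""
    seen: Set[str] = set()
    stack = list(IS_A.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen


def subsumed_by(concept: str, item: str) -> bool:
    """True if `concept` is the checklist item itself or one of its descendants."""
    return concept == item or item in ancestors(concept)


def fill_checklist(found: Iterable[str], items: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each checklist item to the report concepts that it subsumes."""
    concepts = list(found)
    return {item: {c for c in concepts if subsumed_by(c, item)} for item in items}


def synthesise_n_stage(concepts: Set[str]) -> str:
    """Hypothetical, simplified nodal rule built from checklist items (not full AJCC)."""
    if any(subsumed_by(c, "mediastinal lymph node metastasis") for c in concepts):
        return "N2"
    if any(subsumed_by(c, "hilar lymph node metastasis") for c in concepts):
        return "N1"
    if "no regional lymph node metastasis" in concepts:
        return "N0"
    return "NX"  # nodal status cannot be determined from the report


# Usage: concepts assumed to have already been recognised in one free-text report.
report_concepts = {"adenocarcinoma of lung", "hilar lymph node metastasis"}
checklist = fill_checklist(
    report_concepts, ["carcinoma of lung", "regional lymph node metastasis"]
)
n_stage = synthesise_n_stage(report_concepts)
```

Because the rules test subsumption rather than exact string matches, a more specific finding (here, a subcarinal node) still satisfies the broader checklist item it falls under without needing its own rule.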
Results: Checklist items were identified in the free-text reports and used for structured reporting. The synthesised TNM staging values classified by the system were evaluated against the TNM stages explicitly mentioned in 487 reports, achieving an overall accuracy of 78%, 89% and 95% for the T, N and M stages, respectively.
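Per-stage accuracy of the kind reported above can be computed as in the sketch below. It assumes that each report yields a system-synthesised value and an explicitly mentioned (reference) value for each of T, N and M, and, as a further assumption, skips an axis for a report when no reference value was mentioned. The records are illustrative toy data, not the study's reports, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]  # e.g. {"T": "T2", "N": "N1", "M": "MX"}


def stage_accuracy(pairs: List[Tuple[Optional[str], Optional[str]]]) -> Optional[float]:
    """Fraction of (predicted, reference) pairs that agree, ignoring missing references."""
    scored = [(pred, ref) for pred, ref in pairs if ref is not None]
    if not scored:
        return None  # no explicitly mentioned stage to compare against
    correct = sum(1 for pred, ref in scored if pred == ref)
    return correct / len(scored)


def per_axis_accuracy(predicted: List[Record], reference: List[Record]) -> Dict[str, Optional[float]]:
    """Compute accuracy separately for the T, N and M axes."""
    return {
        axis: stage_accuracy([(p.get(axis), r.get(axis)) for p, r in zip(predicted, reference)])
        for axis in ("T", "N", "M")
    }


# Illustrative toy data: the second report does not mention an M stage.
predicted = [
    {"T": "T2", "N": "N1", "M": "MX"},
    {"T": "T1", "N": "N0", "M": "MX"},
    {"T": "T3", "N": "N2", "M": "M1"},
]
reference = [
    {"T": "T2", "N": "N1", "M": "MX"},
    {"T": "T2", "N": "N0"},
    {"T": "T3", "N": "N2", "M": "M1"},
]
acc = per_axis_accuracy(predicted, reference)
```

Scoring each axis independently matches how the results are reported (one accuracy per stage), so a report can count as correct for N while being wrong for T.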
Conclusion: A system to generate structured cancer case reports from free-text pathology reports using symbolic rule-based classification techniques was developed and shows promise. The approach can be easily adapted for other cancer case structured reports.


Keywords: Cancer Staging; Information Extraction; Lung Cancer; Natural Language Processing; Synoptic Reporting; Systematized Nomenclature of Medicine



eJHI - electronic Journal of Health Informatics - ISSN 1446-4381