Guocai Chen is a PhD student in Computer Science at The University of Auckland, supervised by Professor Jim Warren. He is currently working on a data mining problem: using decision forests and other methods to automatically infer non-trivial metadata attributes of consumer websites from features derived from Hyperspace Analogue to Language models. He is also very interested in image processing and video compression.
Jim Warren
University of Auckland New Zealand
Jim Warren is Professor of Health Informatics and Chief Scientist of the National Institute for Health Innovation at The University of Auckland. He is also Chair of Health Informatics New Zealand (2008-2010). Professor Warren’s main research interest is the innovative use of IT for chronic condition management, including methods for consumer empowerment and clinical decision support. He is a Foundation Fellow of the Australian College of Health Informatics. Prior to joining The University of Auckland in 2005 he was with the University of South Australia. He received his B.Sc. and Ph.D. from the University of Maryland.
Joanne Evans
Monash University Australia
Joanne Evans is a Research Fellow for the Smart Information Portal (SIP) Project at the Centre for Organisational and Social Informatics, Monash University. Her research investigates how intelligent technologies can enhance the efficiency and effectiveness of the processes for selecting and describing resources for inclusion in a virtual and distributed knowledge repository. This builds on her doctoral research exploring sustainable pathways of recordkeeping metadata creation as part of the Clever Recordkeeping Metadata Project, for which she received a Vice Chancellor's Commendation for Doctoral Thesis Excellence in 2008. Joanne is also a Research Fellow at the eScholarship Research Centre at the University of Melbourne, where she has been involved in the research and development of archival information systems.
‘Qualities’ not ‘Quality’ – Text Analysis Methods to Classify Consumer Health Websites
Guocai Chen, Jim Warren, Joanne Evans
Abstract
There is an increasing need to help health consumers achieve timely, differentiated access to quality online healthcare resources. This paper describes and evaluates methods for automated classification of consumer health Web content with respect to qualitative attributes relevant to the preferences of individual health consumers. This is illustrated in the context of identifying breast cancer consumer web pages that take a ‘supportive’ versus a ‘medical’ perspective, as compared to an existing manual classification employed by a breast cancer portal with personalised search preference options. Classification is performed based on analysis of word co-occurrences and an enhanced decision tree classifier (a decision forest). Current classification test results for ‘medical’ versus ‘supportive’ type resources are 90% accurate (95% confidence interval, 86-94%) using this decision forest classifier. These early results indicate that language use patterns can be used to automate such classification with acceptable accuracy; however, a wider range of websites and metadata attributes needs to be assessed and compared to end-user feedback. Future application may be either in a tool to assist metadata coders in populating the databases of domain-specific portals such as BCKOnline, or in providing tagging or sorting by content type on live search results for health consumers.
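As an illustration of the word co-occurrence analysis mentioned above, the following is a minimal sketch of a Hyperspace Analogue to Language (HAL) style count, in which each word is paired with the words that precede it inside a sliding window and nearer words receive larger weights. The function name `hal_cooccurrence`, the window size, and the linear weighting are assumptions made for this example; the abstract does not state the settings used in the study.

```python
from collections import defaultdict


def hal_cooccurrence(tokens, window=5):
    """Return HAL-style weighted co-occurrence counts.

    Keys are (target_word, preceding_word) pairs. A preceding word at
    distance d (1 <= d <= window) contributes a weight of window - d + 1,
    so adjacent words count most. The window size and this linear
    weighting are illustrative assumptions, not the study's settings.
    """
    weights = defaultdict(int)
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran past the start of the text
            weights[(target, tokens[j])] += window - distance + 1
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical snippet of page text, already lower-cased and tokenised.
    sample = "support group offers support".split()
    for pair, weight in sorted(hal_cooccurrence(sample, window=2).items()):
        print(pair, weight)
```

The resulting weights for a chosen vocabulary could then be flattened into a feature vector per web page and passed to a tree-based classifier; how the features are selected and how the decision forest is built are not specified here and would need to follow the body of the paper.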