A range of risk factors increases the likelihood of developing chronic illness. Awareness and understanding of these possible causes give patients the best chance of survival by enabling informed life choices. Finding patterns among risk factors and chronic illnesses can suggest shared causes and provide guidance for healthier lifestyles, and where outliers appear, they can give clues to possible treatments. Prior studies have typically addressed the data challenges of single-disease datasets in isolation; however, to establish a truly healthy lifestyle, the predictive power of features across many diseases is more useful. We discuss the four most common data challenges in health surveys and propose a novel approach to feature selection that optimizes a multi-label classifier of diabetes and 30 types of cancer, with the aim of establishing a total healthy lifestyle. A novel knowledge graph is constructed from the text of health survey questions and used to weight feature relationships based on World Health Organization (WHO) International Classification of Diseases (ICD) codes, in order to prioritize selection. The results of our experiments demonstrate that our knowledge graph-based feature selection, when applied to a number of machine learning and deep learning multi-label classifiers, improves precision, recall, and F1 score.