Subject title: Medical decision analytics using health data: application to the lung cancer case study
Supervisors: Vincent Augusto, Xiaolan Xie and Raksmey Phan
Structure: Mines Saint-Etienne, UMR CNRS 6158 LIMOS
Location: 158 cours Fauriel 42023 Saint-Etienne cedex 2 France
Starting date: October 1st, 2017
Data analytics consists of developing optimization and/or machine-learning-based algorithms that learn
to recognize complex patterns within valuable and massive data. The challenges related to this topic are
numerous, and many scientific fields are involved: computer science, data science, operational
research, process and data mining.
When applied to health care, the objectives are often summarized as improving the quality and timeliness
of care, maximizing financial performance, and decreasing practice variability across organizations. This
relies on the following tasks: (i) identifying critical features that impact outcomes (allocating limited
resources and time for greater effect); (ii) making greater use of treatment evidence to advance the quality
and effectiveness of care delivery; (iii) enabling rapid learning and the dissemination of best practices. Process mining
is also a closely connected field of research (van der Aalst 2004).
The main objective of this thesis is to develop innovative optimization and machine learning
techniques to aid medical decision making using available health databases such as PMSI (Programme de
Médicalisation des Systèmes d'Information, national database of hospital stays in France), SNIIRAM
(Système national d'information inter-régimes de l'Assurance maladie, national database of the health
insurance) and local databases of the CLB (Centre Léon Bérard, hospital specialized in cancer treatment
in Lyon, France). In previous work (Prodel 2017), classification methods such as
decision trees and random forests proved very effective at predicting the clinical pathway of patients
from a set of medical features. The same approach can be applied to the prediction of medical acts
within a hospital stay, for example: "Depending on his/her medical history, can an obese patient with
a severe heart condition undergo heart surgery to implant a defibrillator?" or "Depending on the stage
of the lung cancer, which treatment among chemotherapy, radiotherapy and surgery
maximizes the survival rate of the patient?" This problem is partly related to the tuning of the parameters
used to build decision trees (Camilleri et al. 2014; Coroiu 2016).
Particle swarm optimization for feature selection, coupled with an optimization-based
discriminant analysis model (DAMIP), is an emerging and promising approach for
identifying classification rules based on relatively small subsets of discriminatory factors, which can be used to
predict resource needs, treatment outcomes, etc. For example, Lee et al. (2012) proposed a clinical
decision tool for predicting patient care characteristics and demonstrated that optimization achieved
better results than classical machine learning techniques. Such an approach was also used for modeling
and optimizing clinic workflow (Lee et al. 2016).
The scientific challenge of this thesis is twofold:
- Conduct theoretical research to develop new algorithms combining optimization and
classification methods applied to medical decision making, taking into account the special
features of diagnosis-related groups in France and in Europe. To do so, we will capitalize
on the lead the I4S laboratory has in (i) research on the application of optimization
to predict patient clinical pathways by combining data/process mining, operational research
and machine learning and (ii) knowledge of health-care databases in France and in the UK (through
the emerging collaboration with the NHS and Westminster University, London, UK).
- Develop a comprehensive testbed experiment to assess the proposed algorithms on a lung
cancer case study. Several levels of discovery will be used: hospital databases (PMSI), health
insurance databases (PMSI+SNIIRAM) and finally local hospital databases (PMSI+SNIIRAM+Centre
Léon Bérard database), to understand the amount of data needed to aid the medical
decision. To the best of our knowledge, such a case study has never been investigated in the
National databases such as PMSI and SNIIRAM are already available for such a study, and a collaboration
with the Centre Léon Bérard (Lyon, France) is already in place to provide access to more detailed data, with the
support of the I-Care cluster (Lyon, France) for dissemination of ongoing research on big data and
The candidate should have a strong background in data science (machine learning, process mining, data
mining, artificial intelligence), industrial engineering (formal modelling, flow simulation, performance
evaluation), operational research (mathematical modelling, optimization) and computer science (coding
in C/C++, Python, Java; knowledge of mathematical libraries, statistics, optimization).
Send a CV, a motivation letter, and marks for the 3 previous years (including the current one) to email@example.com
in order to schedule an interview.
(Camilleri et al. 2014) M. Camilleri, F. Neri, and M. Papoutsidakis. "An algorithmic approach to parameter selection in machine learning using meta-optimization techniques". WSEAS Transactions on Systems, 13:202–213, 2014.
(Coroiu 2016) A. M. Coroiu. "Tuning model parameters through a genetic algorithm approach". In IEEE
12th International Conference on Intelligent Computer Communication and Processing (ICCP), pages
135–140, Sept 2016.
(Lee et al. 2012) E. K. Lee, F. Yuan, D. A. Hirsh, M. D. Mallory, and H. K. Simon. "A clinical decision tool for
predicting patient care characteristics: patients returning within 72 hours in the emergency department". AMIA Annu Symp Proc., 495–504, 2012.
(Lee et al. 2016) E. K. Lee et al. "Systems Analytics: Modeling and Optimizing Clinic Workflow and Patient
Care". In Healthcare Analytics: From Data to Knowledge to Healthcare Improvement, chapter 9, 2016.
(Prodel 2017) M. Prodel. "Process discovery and simulation of clinical pathways using health-care
databases". PhD Thesis, 2017.
(van der Aalst 2004) W. M. P. van der Aalst, T. Weijters, and L. Maruster. "Workflow mining: Discovering process models from event
logs". IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, 2004.