Predicting A&E Attendance - Retrospective Thoughts on an AI & ML Course

Introduction

Six months ago, in a fit of madness, I decided to enrol in an AI and ML course from Imperial College Business School (Professional Certificate in Machine Learning & Artificial Intelligence). The course is very fast paced and took 10-20 hours a week out of my already busy schedule, but it was incredibly useful for taking a step into an area everyone is talking about, and it sated the FOMO ghost which had been haunting me until this point. I came back realising that there is so much more to AI than just LLMs (Large Language Models). In fact we didn’t do LLMs at all, stopping at Convolutional Neural Networks. As a medic I was also fascinated by how, now and in the future, the more advanced topics look to mimic the neural pathways of the brain and visual cortex, trying to emulate the complex journey of converting an image into a recognisable entity, or of reasoning, with pure maths and code. I also realised how useful domain knowledge is in solving problems with AI and ML and hope to put this to good use in the future.

Pre-Requisites

The course required knowledge of maths in the form of linear algebra and calculus. I brushed up on this with the help of my son, who is a university student, but in the end I felt the other requirement, coding, was more important. You really need to understand at least the fundamentals of coding, and it’s all in Python and Jupyter notebooks with NumPy and pandas at the core. I had to take a C# to Python course beforehand, but Python has a simpler syntax. I recommend https://www.youtube.com/@3blue1brown as a primer for the maths side. The mathematical notation, although simple, really got me, but once I got my head around it, it seemed ok (at times!). The beauty of the AI course is that they really helped you with the maths and made the concepts really clear, much of it from first principles, which was a great relief.

Topics Covered with brief explanations

The scope of the course covered both supervised and unsupervised learning with statistical analysis at its core. You could call an aspect of ML “Stats on Steroids”. At the core of a lot of the basic concepts is understanding the relationship between inputs and output via a function, with the addition of noise or an unknown entity. You then take your dataset and split it into training data, testing data and possibly validation data to see if your model fits. The following topics were covered:

Bias-Variance Trade-off - Balancing accuracy and overfitting in predictive models.
Nearest Neighbour Methods - Predicts based on closest data points.
Decision Trees - Uses branching questions to classify or predict, e.g. with Random Forests. Also covered Gradient Boosting, which sequentially refines weak models to reduce errors each round, and viewing classification problems with Confusion Matrices.
Naïve Bayes - Treats each feature separately and uses probabilities
Bayesian Optimization - Efficiently searches for optimal model parameters via a “Black Box” analogy.
Logistic Regression - Estimates probability for yes/no outcomes.
Support Vector Machines - Draws an optimal boundary separating data classes.
Principal Component Analysis - Reduces data dimensions by identifying key directions
Deep Learning and Neural Networks - Layered "brain-like" systems for complex patterns.
Reinforcement Learning - Learns actions through trial-and-error rewards.

When you add classification predictors (yes/no) vs regression (continuous values) it gets complex on which type of “tool” best fits which type of problem. I get the impression that will come with experience, trial and error and trying different methods for the same problem for you to get the best fit.
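To make the “try different tools on the same problem” idea concrete, here is a minimal sketch using scikit-learn. The data is synthetic (an output built from a function of the inputs plus noise), and the three models and their settings are my own illustrative choices, not anything prescribed by the course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: the output is a function of the inputs plus noise.
X = rng.normal(size=(600, 4))
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1]
noise = rng.normal(scale=1.0, size=600)
y = (signal + noise > 0).astype(int)

# Split into training, validation and test sets (roughly 60/20/20).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Three candidate "tools" for the same classification problem.
models = {
    "logistic regression": LogisticRegression(),
    "nearest neighbours": KNeighborsClassifier(n_neighbors=15),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Fit each model on the training data and compare on the validation data.
val_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

# Only the model that wins on validation is scored on the held-out test set.
best_name = max(val_scores, key=val_scores.get)
test_score = models[best_name].score(X_test, y_test)

for name, score in val_scores.items():
    print(f"{name}: validation accuracy {score:.2f}")
print(f"best on validation: {best_name}, test accuracy {test_score:.2f}")
```

Keeping the test set out of the comparison means the final figure is an honest estimate rather than the best of several tries.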

Is this Course for You?

You probably know by now if this course is for you or not. For me it gave a good introduction to each aspect, explaining concepts and principles clearly and to the point. The material we went over was extensive and lays the groundwork to open up kinds of analysis that were not possible for me before. You still have to develop and learn more in these areas, but I have example code and principles under my belt. It treated you like an adult and appreciated that we all have busy lives, yet also pushed you. If any of the above interests you I’d very much recommend it.

Anyway over to my project. We had to do a project independently at the end preferably in our own domain knowledge. I chose A&E attendance (not admissions) which has always been a topical issue within the health ecosystem and represents the interface between primary and secondary care.

My profile project - Predicting A&E Attendance from Primary Care Data

GitHub link to A&E Project


This is a link to the project with source code. I was unable to upload patient data even though it was anonymised as it is from my surgery only.

This is a link to the jupyter notebook used for the project if you’d like to go over the code base

This project investigates factors predicting A&E (Accident & Emergency) attendances in the UK using a dataset of 4,495 patients from a single practice. It includes demographic, health, and deprivation data, alongside A&E attendance records over three years. The aim is to identify links between primary care factors (e.g., long-term conditions, deprivation index) and A&E visits to improve resource allocation and preventive care. Mental health and access to primary care are highlighted for future exploration. The dataset is anonymised to comply with data protection laws, with a focus on avoiding stigmatisation or overgeneralisation in its application.

The dataset was created from two sources:

  • A file from the South West London BI service, who securely emailed A&E attendances over the last 3 years for patients from my surgery only

  • A data analysis tool I use in my surgery to risk stratify and link population groups to deprivation and population health metrics.

From these 2 files I had to cleanse the data, manage anomalies and outliers, and make sure the information mapped between them. I also anonymised the data at this point. This resulted in a csv with the following columns (inputs followed by outputs). I partly used pandas to help.

Inputs (data required to make a decision on attendance)

Age, Is BAME, Conditions, Condition Count, Longterm Conditions, Longterm Condition Count, AF, ASTHMA, BP, CANCER, CHD, CHOLESTEROL, CKD, COPD, DEMENTIA, DM, HF, MH, NDH, PAD, PALLIATIVE, PLD, STROKE, THYROID, WEIGHTMX, Deprivation, Index, Health Index, PHM Level, All RST Cond. + PHM Level, All RST Cond. Level, Is Housebound

Outputs (if that patient went to A+E and how often did they attend)

A+E Frequency, A+E Attendance
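The merge-and-clean step described above could look roughly like the sketch below. This is not the project's actual code: the two small tables are made-up stand-ins for the source files, and `patient_id`, `attendance_date`, the age limit of 120 and the choice to treat a missing condition count as 0 are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the attendance file: one row per A&E attendance.
attendances = pd.DataFrame({
    "patient_id": [1, 1, 3, 3, 3, 5],
    "attendance_date": pd.to_datetime([
        "2023-01-10", "2023-06-02", "2022-11-20",
        "2023-03-15", "2024-01-05", "2023-09-09",
    ]),
})

# Hypothetical stand-in for the risk stratification export: one row per patient.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "Age": [72, 35, 58, 210, 44],  # 210 is a deliberate data-entry anomaly
    "Longterm Condition Count": [3, 0, 2, 1, None],
    "Is Housebound": ["Yes", "No", "No", "No", "Yes"],
})

# Count attendances per patient to get the frequency output.
frequency = (
    attendances.groupby("patient_id").size().rename("A+E Frequency").reset_index()
)

# Left join so patients who never attended are kept, with a frequency of 0.
df = patients.merge(frequency, on="patient_id", how="left")
df["A+E Frequency"] = df["A+E Frequency"].fillna(0).astype(int)
df["A+E Attendance"] = (df["A+E Frequency"] > 0).astype(int)

# Basic cleansing: drop implausible ages, fill gaps, encode yes/no as 1/0.
df = df[df["Age"].between(0, 120)].copy()
# Assumption: a missing count means no recorded long-term conditions.
df["Longterm Condition Count"] = df["Longterm Condition Count"].fillna(0).astype(int)
df["Is Housebound"] = df["Is Housebound"].map({"Yes": 1, "No": 0})

# Anonymise by dropping the identifier once the two sources are linked.
df = df.drop(columns="patient_id").reset_index(drop=True)
print(df)
```

The left join matters here: an inner join would silently drop every patient who never attended, which is exactly the group the model needs as its "no" class.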

After importing the cleaned dataset, the project is divided into 3 sections
1. Exploratory Data Analysis
2. Logistic Regression to visualise relationships for binary classification and see which features influence A&E Attendance
3. Random Forests to establish if there is non-linear relationship between input and outputs to see how much each factor contributes to A&E Visits

I settled on Random Forest with Oversampling. This gave me the best outcome on the most important aspect which is predicting A+E Attendance.
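For readers who have not met the idea, here is a minimal sketch of a Random Forest trained on oversampled data. It uses synthetic data and scikit-learn's `resample` helper to duplicate minority-class rows. The write-up does not say which oversampling method the project used, so treat the method, the parameters and the data as assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(42)

# Synthetic, imbalanced stand-in data: class 1 ("attended") is the minority.
n = 2000
X = rng.normal(size=(n, 5))
logit = -2.0 + 1.2 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Oversample the minority class in the training data only, so the test set
# keeps the real-world class balance.
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=n_majority, random_state=42,
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_bal, y_bal)

# Recall on class 1: of those who really attended, how many were caught.
recall = recall_score(y_test, forest.predict(X_test))
importances = forest.feature_importances_

print(f"recall on attenders: {recall:.2f}")
print("feature importances:", importances.round(2))
```

Oversampling only after the split is the important design choice: copying rows before splitting would leak duplicates of test patients into training.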

Exploratory Data Analysis

The first task is to work out if there is any correlation at all between the predictors/inputs and A&E attendance, otherwise you won’t get a good model if you use all the values in your predictions. I used a heatmap for this initially.

It looks like patients with more long term conditions (0.12) and a low deprivation index (0.08) tend to visit A&E more, but the correlation is weak and might be diluted by the list size.
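The numbers behind a heatmap like this are just a correlation matrix. Below is a minimal sketch with pandas on made-up data. The column names are borrowed from the dataset above, but the values are randomly generated, so the printed correlations are not the project's results.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000

# Randomly generated stand-in data (not real patients).
df = pd.DataFrame({
    "Age": rng.integers(18, 95, size=n),
    "Longterm Condition Count": rng.poisson(1.5, size=n),
    "Is Housebound": rng.integers(0, 2, size=n),
})

# Make attendance weakly depend on the number of long term conditions.
p_attend = 0.15 + 0.04 * df["Longterm Condition Count"].clip(upper=5)
df["A+E Attendance"] = (rng.random(n) < p_attend).astype(int)

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# Correlation of each input with the output, strongest first.
with_target = (
    corr["A+E Attendance"].drop("A+E Attendance").sort_values(ascending=False)
)
print(with_target.round(2))

# If seaborn is installed, the same matrix can be drawn as a heatmap:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True)
```

Sorting the target column is often easier to read than the full heatmap when there are thirty-odd inputs.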

Let’s see which categories have the strongest relationship with A&E attendance that is if they attend or not

So we are in an exploratory phase here, slicing and dicing the data until we end up with something which makes sense. This is why domain expert knowledge is important when analysing the data. For example, in the above table, why should having high cholesterol be a predictor for A&E Attendance? I put it down to a spurious reading as it doesn’t make sense, but it might be something to explore further, especially if we find the same thing when analysing other surgeries.

The most interesting analysis of this phase was the following

Looks like being housebound and having co-morbidities are the top two predictors of frequency of A&E attendances. Mental Health is also high for frequency, which is in line with High Intensity Users.
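A comparison like this can be produced by averaging attendance frequency over the patients who carry each flag. Here is a minimal sketch with pandas. The eight rows are toy values invented purely to show the mechanics, so the resulting ranking says nothing about the real data.

```python
import pandas as pd

# Toy values for illustration only (not from the project dataset).
df = pd.DataFrame({
    "Is Housebound": [1, 0, 0, 1, 0, 0, 0, 1],
    "MH":            [0, 1, 0, 1, 0, 0, 1, 0],
    "COPD":          [0, 0, 1, 0, 0, 1, 0, 0],
    "A+E Frequency": [4, 3, 1, 6, 0, 0, 2, 3],
})

flags = ["Is Housebound", "MH", "COPD"]

# Mean attendance frequency among patients who have each flag set.
mean_freq = pd.Series(
    {flag: df.loc[df[flag] == 1, "A+E Frequency"].mean() for flag in flags}
).sort_values(ascending=False)

print(mean_freq.round(2))
```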

Based on the above the following predictors were chosen to run a Logistic Regression on as these values from the data make sense to be areas to focus on and are backed up with evidence elsewhere too

  • Housebound

  • Mental Health

  • AF

  • COPD

  • HF

  • Health Index

  • Number of Long Term Conditions

  • Age

After a bit of further analysis, overfitting, a threshold adjustment to 0.3 and a trial of SVG Boost which didn’t work out, I settled on the following confusion matrix. See this link for more details

So what does this all mean? Well, if I have a spreadsheet of patients with the following columns as yes/no (Housebound, Mental Health, AF, COPD, HF, Deprivation Health Index), along with their age and number of long term conditions, we can push them through the model and predict with 69% accuracy if they are at risk of attending A&E. However there is also a 40% chance that the model will pick a patient who it thinks is at risk of attending when they are not (False Positives or worried well).
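To show how a threshold of 0.3 turns into a confusion matrix and figures like these, here is a minimal sketch with scikit-learn on synthetic data. It is not the project's model or data, so the numbers it prints will differ from the 69% and 40% above. It reports two different false positive measures because they are easy to mix up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic stand-in data with 8 inputs; only the first two carry signal.
n = 3000
X = rng.normal(size=(n, 8))
logit = -1.0 + 0.9 * X[:, 0] + 0.6 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7
)

model = LogisticRegression().fit(X_train, y_train)

# Predicted probability of attending, then a custom cut-off instead of 0.5.
proba = model.predict_proba(X_test)[:, 1]
threshold = 0.3
y_pred = (proba >= threshold).astype(int)

# With labels=[0, 1], ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)               # attenders correctly flagged
false_positive_rate = fp / (fp + tn)  # non-attenders wrongly flagged
flagged_wrongly = fp / (fp + tp)      # flagged patients who did not attend

print(f"accuracy {accuracy:.2f}, recall {recall:.2f}")
print(f"false positive rate {false_positive_rate:.2f}")
print(f"share of flagged patients who did not attend {flagged_wrongly:.2f}")
```

Lowering the threshold from 0.5 to 0.3 flags more patients, which catches more true attenders at the cost of more false positives. That trade-off is the reason for tuning it.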

From my point of view it’s a start but there is so much more which determines if patients attend A&E rather than get admitted to hospital. Factors such as access to primary care and perception and trust of the patient with their own family doctor are important too. So to improve on this figure we’ll need data from other surgeries and also more data around access and trust to see if we can improve on true positives and reduce the false positives further.

What use would this have in the real world? Currently we tend to triage each patient the same when they come through the front door. If we knew their risk of attending A+E we could prioritise them to avoid attendance which might ultimately lead to admission.

The AI Bandwagon

The new shiny object, not only in health care but in all areas of software development, is AI. As usual, health care in the UK is behind the game. Sometimes I feel we are joining the AI Bandwagon to head west to find gold in them there hills when we need to see where it best fits. Outside the NHS we are talking about AI Agents, which are autonomous bots that self reason to get the job done, while in the NHS we are still thinking about the risks of adopting basic LLM integration in the form of AI Scribes, which are rightfully held back by clinical safety and IG. It is also to do with a reluctance to take up the digital tools so ubiquitous in our world. I love AI Scribes and they are a good example of a use case which will showcase how AI works well, but we are also in a position where there are so many AI scribes out there all doing similar things. What will make them sink or swim ironically isn’t the product; it’s the marketing, pitching and selling.

When we went from face-to-face only to telephone consultations, everyone was in uproar about how unsafe it was, and now we also converse with our patients via text or email. It’s the same with AI. We all have a reluctance to adopt new technologies and either need them to be forced on us or need to develop trust that the new system will work. This takes time, as it always does when new digital technologies reach the end user in health care. Ironically the main accelerator of digital adoption in the last few years has been covid, which has nothing to do with digital perceptions.

There is always a perception that you just slap the word .ai at the end and it will sell. As you can appreciate from this blog AI and ML is not just self driving cars and AI scribes. There is a lot more to how we can use this technology and it’s also been around for longer than you think with established best practices. As always it’s important to be pragmatic and use the right technology for the right job whether it’s patient facing or data analysis within confined projects with outcomes which actually make sense.

Summary

I hope this blog gives some indication of the power and usefulness of AI and ML as a tool, and of my experience of the Professional Certificate course I attended. The question you should ask, and I do frequently, is why bother doing this. There are bigger fish in the sea who know more and, more importantly, can spend more time than we ever could as doctors creating better models with more experience, and that is true. If you are a medic and have read this far, you have to ask yourself if you want to stick to your domain expertise and let others do a better job, or dip your toes into their world to help you interface better with experts in the field. I have no idea where this certification will take me but I do know I’ll enjoy whatever I do and that’s the most important thing!
