At our institution, we take immense pride in fostering innovation that addresses real-world challenges through technology-driven solutions. One of the most pressing global health concerns today is heart disease, which remains a leading cause of mortality worldwide.
Early diagnosis is crucial, yet traditional diagnostic methods can be time-consuming and resource-intensive, often delaying life-saving interventions. Recognizing this critical need, a team of our students has developed an advanced Predictive Analytics Framework for Heart Disease Detection using Machine Learning—a solution with the potential to transform early detection in healthcare.
As part of the Global Immersion Programme at the National University of Singapore (NUS), ranked among the top eight universities worldwide and the leading institution in Asia, our students had an incredible opportunity to explore Artificial Intelligence (AI), Deep Learning, and Machine Learning and to use them to derive insights for business and industry.
The 55 participating students, a mix of III-year and II-year Data Science students, were divided into teams. Each team showcased its analytics, machine learning, and data science skills by applying what it had learned at NUS to a real-world problem using machine learning predictions.
Among the 11 participating teams, Jeyanth’s team (III – Data Science), with team members Akshay Roopan, Anamika, Dharshinisrii, and Anjana, won the prestigious title of “Best Project Work for Application of AI Concepts” for their innovative project. Their achievement highlights the exceptional talent and problem-solving abilities nurtured at our institution.
Let’s delve into their problem statement and explore the solution developed by their team.
Heart disease is often referred to as a silent killer because it can progress without noticeable symptoms, leaving many patients undiagnosed until it is too late. Conventional diagnostic methods, while effective, require significant time and medical resources. Machine learning, however, presents an opportunity to revolutionize this process by analyzing multiple patient features simultaneously, enabling faster and more accurate predictions.
With this goal in mind, our students built a machine learning model that analyzes 13 key medical indicators – including age, chest pain type, cholesterol levels, and resting blood pressure – to predict the likelihood of heart disease. The dataset for this project contains real patient data with an indicator variable distinguishing between the presence and absence of heart disease.
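The post names only four of the 13 indicators. As a minimal sketch of how such a dataset is typically laid out, the code below assumes the column names of the widely used public UCI Heart Disease (Cleveland) dataset, which also has 13 features and a binary target. The column names, the `TARGET` encoding, the helper `split_features_target`, and the two sample records are all hypothetical and are not the team's data.

```python
# Hypothetical column layout, assumed from the public UCI Heart Disease
# (Cleveland) dataset; the team's actual column names may differ.
FEATURES = [
    "age",       # age in years
    "sex",
    "cp",        # chest pain type
    "trestbps",  # resting blood pressure
    "chol",      # serum cholesterol
    "fbs",       # fasting blood sugar flag
    "restecg",   # resting ECG result
    "thalach",   # maximum heart rate achieved
    "exang",     # exercise-induced angina
    "oldpeak",   # ST depression induced by exercise
    "slope",     # slope of the peak exercise ST segment
    "ca",        # number of major vessels coloured by fluoroscopy
    "thal",      # thalassemia category
]
TARGET = "target"  # assumed encoding: 1 = heart disease present, 0 = absent


def split_features_target(rows):
    """Split a list of dict records into a feature matrix X and a label list y."""
    X = [[row[name] for name in FEATURES] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


# Two made-up records for illustration only (not real patient data).
sample = [
    {**dict(zip(FEATURES, [55, 0, 1, 130, 240, 0, 1, 160, 0, 1.0, 2, 0, 2])), TARGET: 0},
    {**dict(zip(FEATURES, [61, 1, 0, 140, 260, 0, 0, 120, 1, 2.5, 1, 1, 3])), TARGET: 1},
]
X, y = split_features_target(sample)
print(len(FEATURES), X, y)
```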
The challenge was not just to create a highly accurate model but also to ensure its interpretability so that medical professionals could effectively use it. Machine learning models are often viewed as black boxes, but by choosing the right algorithm and data preprocessing techniques, the team worked towards making this tool accessible and practical for real-world healthcare applications.
To address this challenge, the team selected the Decision Tree Classifier, a well-known machine learning algorithm for classification tasks. One of the key advantages of decision trees is their interpretability: medical professionals can visually follow the decision-making process, which makes the model easier to integrate into clinical practice.
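To illustrate the interpretability point, here is a minimal sketch using scikit-learn's `DecisionTreeClassifier` together with `export_text`, which prints a fitted tree as nested if/else rules. It trains on synthetic data from `make_classification` rather than the team's dataset, and the `feature_i` names are placeholders; the post does not show the team's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with 13 features (not real patient records).
X, y = make_classification(n_samples=300, n_features=13, n_informative=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(13)]  # placeholder names

# A shallow tree keeps the rule set short enough for a person to read.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# export_text renders the fitted tree as human-readable threshold rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Each printed path from the root to a `class:` leaf is a readable chain of threshold tests, which is the property that lets a clinician check why the model reached a given prediction.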
Before training the model, extensive data preprocessing was conducted to ensure reliability. This involved:
One of the most significant challenges was dealing with an imbalanced dataset, in which cases of heart disease outnumbered cases without heart disease. This imbalance risked biasing the model toward predicting heart disease, leading to inaccurate results. To address this, the team employed SMOTE (Synthetic Minority Over-sampling Technique), which generates synthetic samples for the minority class (patients without heart disease). This approach improved the model’s ability to recognize both classes.
To measure the model’s effectiveness, the team used multiple evaluation metrics: accuracy, precision, recall, and F1-score. The model was first trained on the original imbalanced dataset, and its performance was then compared with that of the SMOTE-enhanced version.
These results underscore the importance of data balancing techniques in medical prediction models. A false positive (predicting heart disease in a patient who does not have it) can lead to unnecessary stress and medical tests, while a false negative (a missed diagnosis) can delay critical treatment. By incorporating SMOTE, the students significantly improved the model’s ability to differentiate between patients with and without heart disease, making it a more reliable predictive tool.
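The core idea of SMOTE fits in a few lines of NumPy: for each synthetic sample, pick a random minority-class point, pick one of its k nearest minority-class neighbours, and place a new point at a random position on the line segment between them. This is a simplified illustration, not the implementation the team used (in practice SMOTE is commonly applied through the `SMOTE` class of the `imbalanced-learn` library); the function name `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np


def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new minority-class rows by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)  # cannot have more neighbours than other minority points

    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per point

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                   # random minority sample
        nb = neighbours[base, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                       # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (X_minority[nb] - X_minority[base])
    return synthetic


# Toy minority class (e.g. patients without heart disease), 2 features each.
X_minor = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_rows = smote_sketch(X_minor, n_synthetic=5, k=2)
print(new_rows)
```

The synthetic rows would then be appended to the training data with the minority label. A common practice is to oversample only the training split, so that synthetic points never leak into the data used for evaluation.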
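To make the link between these error types and the metrics concrete, the sketch below computes accuracy, precision, recall, and F1-score from the confusion counts. With heart disease as the positive class, precision falls as false positives (unnecessary tests) rise, and recall falls as false negatives (missed diagnoses) rise. The helper name `classification_metrics` and the ten labels are hypothetical, chosen only to illustrate the formulas.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion counts, accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed cases
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
m = classification_metrics(y_true, y_pred)
print(m)  # tp=4, fp=1, fn=1, tn=4; accuracy, precision, recall and f1 all 0.8
```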
Another key challenge was avoiding overfitting, where the model performs exceptionally well on training data but poorly on new cases. Initially, the decision tree became too complex, capturing noise rather than meaningful patterns. By fine-tuning hyperparameters, such as limiting the tree’s depth, the team was able to strike a balance between accuracy and generalizability.
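The depth-tuning step can be sketched with scikit-learn: for each candidate `max_depth`, compare training accuracy with cross-validated accuracy on the training split, pick the depth with the best cross-validated score, and only then check the held-out test set. The synthetic data, the candidate depths, and the use of 5-fold cross-validation are assumptions for illustration; the post does not say which depths or validation scheme the team used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; flip_y adds label noise that a deep tree will memorize.
X, y = make_classification(n_samples=400, n_features=13, n_informative=5,
                           flip_y=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

results = {}
for depth in [1, 2, 3, 4, 5, 6, 8, None]:  # None = grow until leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=1)
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
    train_acc = model.fit(X_train, y_train).score(X_train, y_train)
    results[depth] = (train_acc, cv_acc)
    print(f"max_depth={str(depth):>4}  train={train_acc:.3f}  cv={cv_acc:.3f}")

# Choose the depth by cross-validated accuracy, not by training accuracy.
best_depth = max(results, key=lambda d: results[d][1])
final = DecisionTreeClassifier(max_depth=best_depth, random_state=1).fit(X_train, y_train)
test_acc = final.score(X_test, y_test)
print("best max_depth:", best_depth, "test accuracy:", round(test_acc, 3))
```

An unrestricted tree reaches perfect training accuracy here, while its cross-validated accuracy typically lags behind; that gap is the overfitting the paragraph describes. Selecting the depth on cross-validation rather than on the test set keeps the final test score an honest estimate.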
Throughout the development of this project, the students encountered several technical and analytical challenges. The most significant of these was handling data imbalance—without proper adjustments, the model leaned towards predicting heart disease excessively, leading to misleading accuracy scores. Applying SMOTE was a pivotal step in rectifying this issue.
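Why imbalance produces misleading accuracy scores can be shown with a degenerate "model" that always predicts heart disease. On a hypothetical test set with 80 heart-disease cases and 20 cases without heart disease (numbers invented for illustration), it scores 80% accuracy while never identifying a single patient without heart disease.

```python
# Hypothetical imbalanced test set: 80 patients with heart disease (1), 20 without (0).
y_true = [1] * 80 + [0] * 20
# A degenerate "model" that always predicts heart disease.
y_pred = [1] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class (no heart disease): how many of them were identified?
minority_total = y_true.count(0)
minority_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
minority_recall = minority_caught / minority_total

print(accuracy, minority_recall)  # 0.8 0.0
```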
Another key takeaway was the importance of feature selection. While some features, such as chest pain type, were expected to be strong predictors, others, such as maximum heart rate, emerged through exploratory data analysis as equally significant. This reinforced the value of data-driven insights in machine learning, where unexpected variables sometimes play crucial roles in predictive models.
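One simple exploratory screen for feature relevance is to rank each feature by the absolute value of its correlation with the binary target (for a 0/1 target, Pearson correlation equals the point-biserial correlation). The sketch below is a hypothetical illustration on synthetic data in which the target is deliberately driven by `f0` and `f2` only; it is not the team's analysis, and correlation captures only linear, one-feature-at-a-time relationships.

```python
import numpy as np


def rank_by_correlation(X, y, names):
    """Rank features by absolute Pearson correlation with a binary target."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]  # point-biserial correlation for 0/1 y
        scores[name] = abs(r)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 4))
# Synthetic target driven by f0 and f2 only; f1 and f3 are pure noise.
y = (X[:, 0] - X[:, 2] + 0.5 * rng.normal(size=n) > 0).astype(int)

ranking = rank_by_correlation(X, y, ["f0", "f1", "f2", "f3"])
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```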
This project exemplifies how machine learning can revolutionize healthcare, particularly in the early detection of life-threatening diseases. By integrating techniques such as decision trees and SMOTE, the students have demonstrated the potential to enhance diagnostic accuracy and improve patient outcomes.
Looking ahead, the team plans to:
As technology continues to evolve, the intersection of machine learning and healthcare will open new doors for proactive, personalized, and data-driven medical care. This project stands as a testament to our institution’s commitment to fostering innovation, problem-solving, and industry-relevant learning among our students.
By empowering the next generation of researchers and professionals, we continue to drive forward-thinking solutions that have the potential to save lives and redefine healthcare as we know it.
– Dr Sindhana Devi M
Assistant Professor, School of Data Science, KCLAS
Passionate about mentoring young innovators and guiding them toward impactful solutions, Dr Sindhana Devi M uses this blog to highlight our students’ journey in using predictive analytics for heart disease detection, showcasing the power of technology in transforming healthcare.
– Jeyanth A & Team
School of Data Science, KCLAS
A dedicated group of data science students committed to solving real-world challenges. Their journey through the Global Immersion Programme at NUS led them to address a critical health issue – Heart Disease Detection. Through their collaborative efforts, they developed an innovative Predictive Analytics Framework using machine learning to improve early diagnosis and healthcare outcomes.