Posted: August 26th, 2021

Exploratory Data Analysis on Heart Disease

Heart disease is increasing rapidly worldwide, mainly because of people’s lifestyles. It disrupts healthy living through associated complications such as cardiac arrest, which is sometimes fatal. Concern about combating the disease is therefore growing: it affects both genders, it is costly to treat for individuals and hospitals alike, and its cumulative effects weigh on the overall economy. Several studies on machine learning models and data analysis of the causes and prevention of heart disease have been conducted, and others are ongoing. In this dataset, observations from a sample are used to represent the whole population, and exploratory data analysis is applied to draw general conclusions about the disease. The study’s primary objective is to use exploratory data analysis to determine which attributes or variables are most strongly associated with heart disease. Through this analysis, a conclusion will be drawn on which variables have a strong influence on the prevalence of heart disease.

Data Set

The dataset used in this review is a raw dataset on heart disease with 788 patients and nine features. Some samples were removed during pre-processing because of inconsistencies, which eliminates errors from the data. The nine independent variables are cost, age, gender, interventions, drugs, ER visits, complications, comorbidities, and duration. Cost is the total dollar amount of claims by the subscriber, age is the subscriber’s age in years, gender is either male or female, and interventions is the total number of procedures carried out per person. Drugs is categorized by the number of prescriptions for each subscriber: 0 for none, 1 for one, and 2 for more than one. ER visits is the number of emergency room visits. Complications records whether the patient experienced any problems, with 1 representing yes and 0 representing no. Finally, comorbidities is the total number of diseases a patient had, and duration is the length of treatment for the condition.

The sample data set has nine columns and 788 observations, and no missing values were detected in the analysis. The nine independent variables are recorded as numeric or categorical values, as shown in Table 1 below.

Table 1: Nine independent variables

Cost 179.1 319 9310.7 280.9 18727.1 453.4 323.1
Age 63 59 62 60 55 66 64
Gender Female Female Female Male Female Female Male
Interventions 2 2 17 9 5 1 2
Drugs 1 0 0 0 2 0 0
ERVisit 4 6 2 7 7 3 3
Complications 0 0 0 0 0 0 0
Comorbidities 3 0 5 2 0 4 1
Duration 300 120 353 352 18 296 247
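
As a quick check of the claim that no values are missing, the seven sample rows in Table 1 can be typed in and validated. This is a minimal sketch using only the Python standard library; the field names (cost, er_visits, and so on) and the helper functions missing_counts and invalid_codes are illustrative assumptions, not part of the original analysis.

```python
# Seven sample subscribers transcribed from Table 1 (one tuple per column of the table).
# Field names are assumptions chosen for this sketch.
columns = ["cost", "age", "gender", "interventions", "drugs",
           "er_visits", "complications", "comorbidities", "duration"]
rows = [
    (179.1, 63, "Female", 2, 1, 4, 0, 3, 300),
    (319.0, 59, "Female", 2, 0, 6, 0, 0, 120),
    (9310.7, 62, "Female", 17, 0, 2, 0, 5, 353),
    (280.9, 60, "Male", 9, 0, 7, 0, 2, 352),
    (18727.1, 55, "Female", 5, 2, 7, 0, 0, 18),
    (453.4, 66, "Female", 1, 0, 3, 0, 4, 296),
    (323.1, 64, "Male", 2, 0, 3, 0, 1, 247),
]
records = [dict(zip(columns, row)) for row in rows]


def missing_counts(records, columns):
    """Count missing (None) entries per column."""
    return {c: sum(1 for r in records if r.get(c) is None) for c in columns}


def invalid_codes(record):
    """List the coded fields whose value falls outside the coding described in the text."""
    problems = []
    if record["gender"] not in {"Male", "Female"}:
        problems.append("gender")
    if record["drugs"] not in {0, 1, 2}:
        problems.append("drugs")
    if record["complications"] not in {0, 1}:
        problems.append("complications")
    return problems


print(missing_counts(records, columns))
print([invalid_codes(r) for r in records])
```

The same two checks would apply unchanged to the full set of 788 records once it is loaded into the same list-of-dictionaries shape.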

The dataset’s primary focus is to quantify the independent variables in support of an exploratory data analysis of heart disease. Descriptive statistics were therefore calculated from the dataset to help study the development of complications among heart disease patients. Table 2 below shows descriptive statistics for patients with and without complications from the sample of 788 persons.

Table 2: Descriptive statistics of patients with complications

Complications 0 1
Mean 3.709 4.767
Median 1.000 2.000
Standard deviation 5.929 6.316
Skewness 3.123 1.445
Max 60.000 23.000
Min 0.000 0.000
N 745 43
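
The statistics in Table 2 can be computed per group with a few lines of code. The sketch below assumes the summarized variable is available as a list of numbers for each complications group; the toy values are invented purely to exercise the function and are not taken from the dataset. Skewness is computed here as the population moment ratio m3 / m2^1.5, which is an assumption, since the source does not say which definition was used.

```python
import statistics


def describe(values):
    """Return the summary statistics listed in Table 2 for one group of values."""
    n = len(values)
    mean = statistics.fmean(values)
    # Second and third central moments (population form).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "max": max(values),
        "min": min(values),
        "n": n,
    }


# Toy values, NOT from the dataset, keyed by the complications flag (0 or 1).
toy_groups = {0: [0, 1, 1, 2, 6], 1: [0, 2, 2, 3, 8]}
summary = {flag: describe(vals) for flag, vals in toy_groups.items()}
for flag, stats in summary.items():
    print(flag, {name: round(value, 3) for name, value in stats.items()})
```

A positive skewness, as in both columns of Table 2, indicates a long right tail: most patients have low values and a few have very high ones.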

The research scenarios below examine the relationship between age, gender, and the complications developed by patients with heart disease, and analyze it further to draw conclusions from the data. Age and gender are considered the highest risk factors for heart disease, and females are considered more prone to heart disease than males.

Research Scenarios

The three research scenarios developed in the study are the effect of the subscriber’s gender, the possibility of developing complications, and the effect of drug prescriptions, including how that effect varies with the patient’s age.

Scenario 1- Effects of the Subscriber’s Gender on Heart Disease

The patient’s gender is considered a significant risk factor for heart disease. In the sample, females appear to be at a higher risk of developing heart disease than their male counterparts. Of the 788 patients in the dataset, the majority are female: 608 patients, compared with only 180 male patients. Table 3 below shows the number of female versus male patients in the sample.

Table 3: Male versus female patients

Gender No. of patients Percentage
Male 180 22.84%
Female 608 77.16%
Total 788 100%

The female subscribers therefore outnumber the male subscribers, making up 77.16% of the patients against 22.84% for males.
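
The percentages in Table 3 follow directly from the counts. A minimal sketch of the arithmetic, with variable names chosen for illustration:

```python
# Patient counts per gender, as given in Table 3.
gender_counts = {"Male": 180, "Female": 608}

total = sum(gender_counts.values())

# Share of each gender in the sample, as a percentage rounded to two decimals.
gender_share = {g: round(100 * n / total, 2) for g, n in gender_counts.items()}

print(total)
print(gender_share)
```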

Scenario 2- The Possibility of Developing Complications in Heart Disease

There is a high risk of developing complications for patients suffering from heart disease. As shown in Table 4 below, the descriptive statistics for patients with complications (denoted by 1) are higher than those for patients with no difficulties (denoted by 0). The mean is 4.767 for patients with complications and 3.709 for those without, and the standard deviations are 6.316 and 5.929, respectively.

Table 4: Descriptive statistics of developing complications in heart disease

Complications Patients with No Difficulties (0) Patients with Complications (1)
Mean 3.709 4.767
Median 1.000 2.000
Standard deviation 5.929 6.316
Skewness 3.123 1.445
Max 60.000 23.000
Min 0.000 0.000
N 745 43

Therefore, there is a higher possibility of developing complications when a person is suffering from heart disease than when one is not. Furthermore, females are at a higher risk of developing complications from heart disease than males. The box plot in Figure 1 below shows that the distribution for females is more elevated than that for males.

Figure 1: Box plot relationship of gender, comorbidities, and complications

From the box plot above, females are more likely to develop other related diseases and more complications when suffering from heart disease. The median for females is slightly higher than that for males, which is consistent with the comparison of the computed means. There are also high outliers in females’ comorbidities, with a value of about 60. This suggests that females are more likely to develop comorbidities from heart disease. Therefore, although the differences between the medians and means of males and females in the sample are small, they are considered notable, implying a substantial relationship between the patient’s gender and developing complications when suffering from heart disease. The box plots thus differ between the two genders.
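
Box plots usually mark a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule; it assumes the plotting tool used this common convention, which the source does not state. The list toy_comorbidities is invented for illustration, apart from the value 60, which echoes the outlier mentioned above.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Toy comorbidity counts, NOT from the dataset (only the 60 mirrors the text).
toy_comorbidities = [0, 1, 1, 2, 3, 4, 5, 60]
outliers = iqr_outliers(toy_comorbidities)
print(outliers)
```

A single extreme value like this pulls the mean upward much more than the median, which is one reason the two measures can tell different stories for a skewed variable.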

In the sample, 43 patients developed complications while suffering from heart disease, and 745 did not experience any difficulties. Of the 43 patients, 32 are female and only 11 are male. Table 5 below gives the number of male and female patients with and without complications.

Table 5: Number of males and females with and without complications

Gender 0 1 Percentage of 1 Total
Male 169 11 25.58% 180
Female 576 32 74.42% 608
Total 745 43 100% 788

As indicated in Table 5, females account for 74.42% of the patients who developed complications, compared with 25.58% for males. Hence, from the box plot and Tables 4 and 5, we can conclude that female patients are more likely to suffer from heart disease and develop complications.
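
The percentages quoted here are shares of the 43 patients with complications. A minimal sketch of that arithmetic follows, using the counts stated in the text (608 females and 180 males, of whom 32 and 11 developed complications). It also computes the complication rate within each gender, a measure the original analysis does not report and which is included only as an assumption about what a reader may want to compare alongside the shares.

```python
# (no complications, complications) per gender, from the counts stated in the text.
counts = {"Female": (576, 32), "Male": (169, 11)}

total_with = sum(c[1] for c in counts.values())

# Share of all patients with complications that belongs to each gender.
share_of_complications = {
    g: round(100 * c[1] / total_with, 2) for g, c in counts.items()
}

# Complication rate inside each gender (not reported in the original analysis).
rate_within_gender = {
    g: round(100 * c[1] / (c[0] + c[1]), 2) for g, c in counts.items()
}

print(total_with)
print(share_of_complications)
print(rate_within_gender)
```

The two measures answer different questions: the share depends on how many females and males are in the sample, while the within-gender rate does not.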

Scenario 3- Effects of Drug Prescriptions on Heart Disease Patients

This scenario examines the effect of drug use on heart disease. The drugs variable is categorized by the number of prescriptions for each subscriber: 0 for none, 1 for one, and 2 for more than one. Table 6 below shows the number of subscribers against the patients suffering from heart disease. There is a low possibility that patients suffering from heart disease will develop complications when taking drugs. The bar chart in Figure 2 below shows that most patients on drugs do not have any difficulties. In the chart, 0 denotes no complications, 1 mild complications, and 2 severe complications.

Figure 2: Complications on given drugs

From the bar chart, the majority of patients do not develop any difficulties when they take drugs while suffering from heart disease. However, as demonstrated in Figure 3, an increase in the drugs prescribed to a patient has a smaller relative effect as age advances. In this case, as the age of heart disease patients advances, each unit increase in drug prescriptions has a diminishing influence on improving the patient’s health.

Figure 3: Effects of drugs on patient’s age

Figure 3 shows the relationship between age and the number of prescriptions issued to heart disease patients.


This exploratory data analysis of heart disease, based on a sample of 788 patients and nine independent variables, examined the relationships among drug prescriptions, complications, and gender. There is a weak relationship between drug prescriptions and heart disease. The analysis indicates that patients suffering from heart disease have a low chance of developing complications when they are on drugs, as demonstrated by the low number of difficulties such patients reported. However, further examination of the effect of drugs across age differences establishes that an increase in the number of prescriptions inversely affects the response of patients with heart disease as age advances. Thus, the analysis suggests that age and drugs are among the critical factors when dealing with heart disease.

The study also indicates that a person’s gender is associated with acquiring heart disease, as there is a strong correlation between gender and heart disease in the sample. The majority of the patients are female, making up 77.16% of the sample (608 patients), while males make up 22.84% (180 patients). Finally, there is a weak relationship between suffering from heart disease and developing complications and comorbidities. Of the 788 patients in the sample, only 43 developed complications, while 745 did not have any difficulties. However, most of the patients with complications were female: females formed 74.42% of them and males 25.58%. Hence, it can be noted that older females are at a higher risk of suffering from heart disease and developing complications from it. The analysis can help individuals mitigate the risk of acquiring heart disease, especially through their lifestyles, and help healthcare providers understand current trends in heart disease.

