Diabetes in the United States

      More than 29 million people in the United States currently have diabetes, and one in four of them is unaware of having the disease. Another one third of adults have prediabetes, meaning their blood sugar levels are higher than normal. Around 1.4 million new cases are diagnosed in the United States every year (Santos-Longhurst, 2018).
      Diabetes, and specifically type 2 diabetes, is a chronic condition that alters the way the body metabolizes sugar (glucose). This is a problem because sugar is an important source of fuel for the body. Diabetes can affect major organs, including the heart, blood vessels, nerves, eyes and kidneys (Mayo Clinic, 2018).
      In this report, we focus on the diabetes percentage of each state in the United States. Through data analysis, we compare state-level trends across four factors that we think contribute to higher diabetes rates: obesity percentage, physical activity, number of fast food restaurants, and fruit and vegetable consumption. We found one dataset for each factor, in addition to our dataset on diabetes. Originally, these datasets had multiple columns that were unnecessary for our analysis, so we cleaned each one down to two columns, region and value. Region is the state, and value is the percentage of the factor associated with that state. We then merged the datasets on state name, keeping each factor's value as its own column in a new merged dataset.
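The cleaning and merging steps described above can be sketched roughly as follows. This is only an illustration in plain Python, not the code used for the report: the raw column names (`State`, `Percent`, `LocationDesc`, `Data_Value`), the helper names `clean` and `merge_on_region`, the lowercasing of state names, and every number are assumptions made up for the example. Only the `region`/`value` layout and the `value` and `OB.value` column names come from the report.

```python
import csv
import io

# Hypothetical raw extracts: column names and numbers are invented for illustration.
RAW_DIABETES = """State,Year,Percent,Source
Alabama,2015,12.0,survey
Colorado,2015,6.0,survey
Vermont,2015,8.0,survey
"""

RAW_OBESITY = """LocationDesc,Data_Value,SampleSize
Colorado,20.0,100
Alabama,35.0,100
Guam,30.0,100
"""


def clean(raw_csv, state_col, value_col):
    """Reduce a raw table to a {region: value} mapping, dropping other columns."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Normalizing the state name is an assumption that keeps the join key consistent.
    return {row[state_col].strip().lower(): float(row[value_col]) for row in reader}


def merge_on_region(tables):
    """Inner-join cleaned tables on region; each table keeps its own value column."""
    shared = set.intersection(*(set(t) for t in tables.values()))
    return [
        {"region": region, **{col: t[region] for col, t in tables.items()}}
        for region in sorted(shared)
    ]


diabetes = clean(RAW_DIABETES, "State", "Percent")
obesity = clean(RAW_OBESITY, "LocationDesc", "Data_Value")
merged = merge_on_region({"value": diabetes, "OB.value": obesity})

for row in merged:
    print(row)
```

Using an inner join means a region that is missing from any one dataset (Vermont and Guam in this toy input) drops out of the merged table, which is worth checking before comparing states.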
      We are especially interested in comparing the 10 states with the highest diabetes rates against the 10 states with the lowest. The report starts with a visual analysis of the United States and then moves to deeper analytical methods to find the factors that relate most strongly to diabetes.

Analytics and Insights:

1. Diabetes in the United States

[Figure: map of diabetes percentage by state]

      Research by the Centers for Disease Control and Prevention (CDC) in 2015 shows that approximately 9.4% of the United States population is living with diabetes. That equates to about 30.3 million Americans as of 2015.
      On the map, the states with the highest diabetes rates are Missouri, Louisiana, Alabama, Tennessee, Kentucky and West Virginia, topped by Mississippi, where 13.6% of the population has diabetes. The state with the lowest rate is Colorado, at 6.4%.
      The map makes it clear that states in the South and Southeast have the highest diabetes percentages, while states in the North and West have considerably lower rates.


2. Physical Activity in the United States

[Figure: map of physical activity by state]

      Physical activity is highest in states such as Alaska (8.8%), New York, Vermont, Montana and Oregon. It is far lower in the South: the least active states are Alabama (1.2%), Tennessee (1.5%), Mississippi (1.7%), Georgia (1.8%), and Arkansas and Texas, tied at 1.9%.
      Comparing this map with the diabetes map starts to explain why the diabetes rate in the southeastern states is so high: one likely reason is that they are the least active states in the country. Conversely, the states with the highest activity rates, such as Alaska, New York and Montana, have some of the lowest diabetes rates. Visually, physical activity shows a clear inverse relationship with diabetes percentage.


3. Fruit and Vegetable Consumption in the United States

[Figure: map of fruit and vegetable consumption by state]

      The states with the highest fruit and vegetable consumption are Mississippi, Oklahoma, Louisiana, West Virginia, Arkansas and Alabama, with Mississippi at an average consumption rate of 51.2%. The states with significantly lower consumption are New Hampshire, Vermont and Massachusetts, with New Hampshire at 33.3%.
      This result is very interesting because it is the exact opposite of what we expected. Our group predicted that the Southeast would rank lowest, as it does on the physical activity map, on the assumption that the region with the highest diabetes rate would also eat the fewest healthy foods such as fruits and vegetables.
      It is also interesting that the western and northeastern states with low diabetes rates also have, surprisingly, low fruit and vegetable consumption. This variable may not be as useful an indicator of diabetes percentage as we originally thought.


4. Fast Food Restaurants per Capita in the United States

[Figure: map of fast food restaurants per capita by state]

      The states with the highest density of fast food restaurants include Arizona, Wyoming and South Dakota, with Arizona the highest. On the other hand, Alabama, Rhode Island and New Jersey have the fewest fast food restaurants per capita, with Alabama the lowest. We measured fast food restaurants per capita so that restaurant counts are comparable between larger and smaller states.
      Across the United States, the pattern of fast food restaurants per capita looks broadly similar to the pattern of diabetes rates. Our thinking is that if a state has more fast food restaurants per person, its diabetes percentage is likely to be higher as well because residents eat more of that unhealthy food.
      Two states that are outliers to this thinking are Mississippi and Alabama. They are among the states with the fewest fast food restaurants per capita but have some of the highest diabetes rates. One possible explanation is that Alabama and Mississippi do not have many major cities, so they may have only a small number of these restaurants, concentrated in the few cities they have.
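The per capita measure used in this section can be illustrated with a small sketch. The report does not state the scale it used, so the "per 100,000 residents" scale, the function name `per_capita`, and the restaurant and population counts below are all assumptions chosen for the example.

```python
def per_capita(restaurant_count, population, per=100_000):
    """Restaurants per `per` residents (the 100,000 scale is an assumed convention)."""
    return restaurant_count * per / population


# Hypothetical states: the larger one has ten times as many restaurants in total,
# but fewer restaurants relative to its population.
large_state = per_capita(restaurant_count=10_000, population=10_000_000)
small_state = per_capita(restaurant_count=1_000, population=500_000)

print(large_state, small_state)
```

Ranking by the raw count would put the larger state first, while ranking by the per capita value puts the smaller state first, which is the balancing effect described above.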


5. Obesity in the United States

[Figure: map of obesity rate by state]

      Obesity is the factor most people associate with diabetes. Visually, this is the map that most closely tracks the first diabetes percentage map: as seen before, the Southeast has the highest obesity rates. So it makes sense that research identifies obesity as the number one factor.
      To further support this claim, the plot below shows the positive correlation between diabetes and obesity: as the value of diabetes increases, so does the value of obesity.

[Figure: plot of diabetes versus obesity]

Analytics and Insights: Focusing on Top 10 and Bottom 10 Diabetic States

[Figure: table of average diabetes percentage for the top 10 and bottom 10 states by factor]

      After analyzing and visualizing our four factors across the United States, we now focus on the top 10 and bottom 10 states and on which states are associated with each factor. We do this because we want to see the difference between the top and bottom states for each feature, since the differences in value among states 11-39 may be small.
      The table above summarizes the average diabetes percentage for groups of states selected by each feature (physical activity, obesity, fast food restaurants per capita, and fruit and vegetable consumption). For example, “bot10OB” means that we sorted the states by obesity percentage and took the bottom ten, that is, the ten states with the lowest obesity rates. The average diabetes percentage of these ten states is 7.8. Similarly, “top10OB” means that after sorting by obesity we averaged the diabetes values of the top ten states, which gives 11.44.
      To be clear, “bot10PA” is the average diabetes value of the bottom ten states after sorting by physical activity, and “top10PA” is the average diabetes value of the top ten states after the same sort. “FRBottom10avgDBrate” and “FRTop10avgDBrate” are the average diabetes values of the bottom ten and top ten states after sorting by fast food restaurants per capita. “FVBottom10avgeDBrate” and “FVTop10avgDBrate” are the average diabetes values of the bottom ten and top ten states after sorting by fruit and vegetable consumption.
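The sort-then-average procedure behind these table entries can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the report: the function name `avg_diabetes_by_rank`, the six made-up regions, and all numbers are hypothetical, and the example uses n=2 instead of 10 only because the toy table is small.

```python
from statistics import mean

# Hypothetical merged rows (regions and numbers invented for illustration).
rows = [
    {"region": "a", "value": 12.0, "OB.value": 36.0},
    {"region": "b", "value": 11.0, "OB.value": 34.0},
    {"region": "c", "value": 9.0, "OB.value": 30.0},
    {"region": "d", "value": 8.0, "OB.value": 26.0},
    {"region": "e", "value": 7.0, "OB.value": 24.0},
    {"region": "f", "value": 6.0, "OB.value": 22.0},
]


def avg_diabetes_by_rank(rows, factor, n=10):
    """Sort by `factor`, then average the diabetes `value` of the bottom and top n rows."""
    ranked = sorted(rows, key=lambda r: r[factor])
    bottom, top = ranked[:n], ranked[-n:]
    return mean(r["value"] for r in bottom), mean(r["value"] for r in top)


# Analogous to "bot10OB" and "top10OB", but with n=2 for the toy data.
bot_ob, top_ob = avg_diabetes_by_rank(rows, "OB.value", n=2)
print(bot_ob, top_ob)
```

A large gap between the two averages suggests the factor separates high-diabetes and low-diabetes states; a small gap, like the 9.45 versus 8.97 reported for fast food restaurants per capita, suggests it does not.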


Diabetes and Obesity Relation: Top 10

      The ten states with the highest obesity rates are Louisiana, Alabama, Mississippi, West Virginia, Kentucky, Arkansas, Kansas, Oklahoma, Tennessee, and Missouri. These states are all in the Midwest and the South, which suggests that people who live there are more likely to be obese.


Diabetes and Obesity Relation: Bottom 10

      The ten states with the lowest obesity rates are Colorado, Hawaii, Montana, California, Massachusetts, Utah, New York, Vermont, Connecticut, and New Jersey. Many of these states are clustered around New York and California. One possible interpretation is that people who live there pay more attention to their personal well-being, since their obesity rates are relatively low.


Diabetes and Physical Activity Relation: Top 10

      The ten states with the highest physical activity rates are Alaska, New York, Vermont, Oregon, Montana, Hawaii, Massachusetts, Wyoming, South Dakota, and Maine. Their average diabetes rate is 8.01%, which is relatively low. In other words, people in these states do more physical activity and have a low diabetes rate.


Diabetes and Physical Activity Relation: Bottom 10

      The ten states with the lowest physical activity rates are Alabama, Tennessee, Mississippi, Georgia, Arkansas, Texas, North Carolina, Oklahoma, Florida, and Missouri. These states are clustered in the southern US, and their average diabetes rate is much higher, at 10.99%.


Diabetes and Fruit and Vegetable Consumption: Top 10

      The ten states with the highest fruit and vegetable consumption rates are Mississippi, Oklahoma, Louisiana, West Virginia, Arkansas, Alabama, South Carolina, Kentucky, Tennessee, and Georgia. Their average diabetes rate is 11.65%.


Diabetes and Fruit and Vegetable Consumption: Bottom 10

      The ten states with the lowest fruit and vegetable consumption rates are Washington, Maryland, New Jersey, California, Colorado, Connecticut, Maine, Massachusetts, Vermont, and New Hampshire. Their average diabetes rate is 7.92%.


Diabetes and Number of Fast Food Restaurants Relation: Top 10

      The ten states with the most fast food restaurants per capita are Wyoming, South Dakota, Arizona, Delaware, North Dakota, Nebraska, Tennessee, Ohio, Louisiana, and Oklahoma. Their average diabetes rate is 9.45%.


Diabetes and Number of Fast Food Restaurants Relation: Bottom 10

      The ten states with the fewest fast food restaurants per capita are Hawaii, Alaska, Utah, Maine, Mississippi, New York, Connecticut, New Jersey, Rhode Island, and Alabama. Their average diabetes rate is 8.97%.


Further Diabetes Analysis: Correlation and Linear Regression

      To identify which variables have more influence on diabetes across all states, we sorted the states by diabetes percentage from lowest to highest. We also added a column of state abbreviations so that every state name fits on a plot. From the sorted data, we created a scatter plot of diabetes percentage by state. Mississippi shows an overwhelmingly high diabetes percentage, while Colorado shows the lowest value (13.6 vs. 6.4).

[Figure: scatter plot of diabetes percentage by state, sorted from lowest to highest]

      Next, keeping the states in that same order from lowest to highest diabetes value, we created scatter plots of the PA, OB, FR and FV values to find which variables follow a pattern similar to the diabetes scatter plot. The scatter plots for each variable follow.


[Figures: scatter plots for PA (Physical Activity), OB (Obesity), FR (Fast Food Restaurant), and FV (Fruit & Vegetable)]

      From these plots, we can see that the Obesity and Fruit and Vegetable Consumption values follow a pattern similar to the diabetes scatter plot. In contrast, the Physical Activity value indicates a negative correlation with our target variable. The Fast Food Restaurants value does not show any significant relationship.
      Next is the correlation result. We cover only the correlations between the dependent variable and the independent variables, such as the correlation between value and PA.value or between value and OB.value (the first-row values).

[Figures: correlation result and scatter matrix plot]

      This correlation analysis gives the same result as the scatter plot analysis. Plot 1 in the scatter matrix, with a red trend line, shows a negative correlation of -0.6213417 with the Physical Activity value. Plots 2 and 4, also with red trend lines, show strong positive correlations: 0.7266586 for the Obesity value and 0.7818462 for the Fruit and Vegetable Consumption value. For plot 3, Fast Food Restaurants, we could not find a reliable correlation.

[Figure: linear regression plots with gray confidence lines]

      As a brief check, we created a linear regression model. Assuming the lm model is correct, the gray lines mark an approximately 95% confidence range. The x-axis is the diabetes value. Some states, such as Kansas and Louisiana, fall well outside the gray lines, but overall the OB (Obesity) and FV (Fruit and Vegetables) values show a positive correlation, converging on the red trend lines.
      For the FR (Fast Food) value, most states fall outside the gray lines, so the fit does not mean much. Finally, the PA (Physical Activity) value shows a negative correlation, except for some outlier states such as Alaska.
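For readers who want to see what these two checks compute, here is a minimal sketch of a Pearson correlation and a least-squares trend line. It is written by hand in Python and is not the lm model used above; it also does not compute the confidence range. The function names `pearson_r` and `fit_line` and all four-state values are assumptions invented for illustration, with diabetes on the x-axis as in the plots described above.

```python
from math import sqrt

# Hypothetical values for four states (invented for illustration).
diabetes = [6.0, 8.0, 10.0, 12.0]   # dependent variable, "value"
obesity = [20.0, 26.0, 27.0, 33.0]  # stands in for "OB.value"
activity = [8.0, 6.0, 4.0, 2.0]     # stands in for "PA.value"


def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


r_ob = pearson_r(diabetes, obesity)   # positive in this toy data
r_pa = pearson_r(diabetes, activity)  # negative in this toy data
slope, intercept = fit_line(diabetes, obesity)
print(r_ob, r_pa, slope, intercept)
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates none, which is how the reported values for obesity, fruit and vegetable consumption, and physical activity can be read.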


Conclusion:

      When we explored the top causes of diabetes visually and looked at each state's values, we thought that obesity and physical activity had the most impact on diabetes across the United States. Moving on to more analytical methods, we correlated each factor with diabetes. After this work, we found that obesity and fruit and vegetable consumption had the strongest positive correlations with diabetes. We also saw that physical activity had a negative correlation, and that the presence of fast food restaurants did not show a useful relationship with diabetes.

Shawn Kim
Actively seeking for full-time opportunities | Analytics Position
