CDC Dataset Task-1 – lhiteshmth522.sites.umassd.edu

I observed that the CDC dataset contains 3 Excel sheets of data. The Diabetes sheet has 4 numerical (int) columns and 2 object columns. The Obesity and Inactivity sheets have the same data types. This post looks only at the Diabetes data.

I used the info function to get the data type of each column, as mentioned above. The data types can be seen in the screenshots below.
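As a rough sketch of this step, the snippet below builds a tiny stand-in table and inspects its column types. The column names and the three rows are made up for illustration and are not taken from the CDC file; with the real data, the frame would come from `pd.read_excel` instead.

```python
import pandas as pd

# Hypothetical stand-in for the Diabetes sheet (names and values are assumptions).
diabetes = pd.DataFrame({
    "YEAR": [2018, 2018, 2018],
    "FIPS": [1001, 1003, 1005],
    "COUNTY": ["County A", "County B", "County C"],
    "STATE": ["State X", "State X", "State X"],
    "% DIABETIC": [9.5, 8.4, 13.5],
})

# Prints one line per column with its non-null count and dtype.
diabetes.info()

# List only the numeric columns, in their original order.
numeric_cols = diabetes.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Listing the numeric columns separately is a quick way to cross-check the counts read off the info output.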

I used the describe function to get summary statistics for the Diabetes data.

From the picture above, we can see that there are 3142 rows of data. The diabetes column has a mean of 8.719796 and a standard deviation of 1.794854, with a minimum of 3.8 and a maximum of 17.9. The quartiles show how the data is distributed across that range.

The first quartile (25%) is 7.3, the median (50%) is 8.4, and the third quartile (75%) is 9.7, so the middle half of the values spans an interquartile range of 9.7 - 7.3 = 2.4.
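To show how these numbers come out of describe, here is a minimal sketch on a five-value stand-in series. The values are simply the minimum, quartiles, and maximum reported above, so the mean and standard deviation will not match the real 3142-row column; the series name is an assumption.

```python
import pandas as pd

# Stand-in series built from the reported five-number summary, not the real data.
pct_diabetic = pd.Series([3.8, 7.3, 8.4, 9.7, 17.9], name="% DIABETIC")

# describe() returns count, mean, std, min, 25%, 50%, 75%, and max.
stats = pct_diabetic.describe()
print(stats)

# Interquartile range: the spread of the middle half of the values.
iqr = stats["75%"] - stats["25%"]
print(round(iqr, 1))
```

The same two lines (describe, then the difference between the 75% and 25% entries) would apply unchanged to the full column.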

As the screenshot above shows, there is no correlation between the columns of this sheet, but a correlation might appear once the 3 Excel sheets are merged.
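One possible way to check this after merging is sketched below. It assumes each sheet has a shared county key (called FIPS here) and one percentage column; those names and all of the numbers are invented for illustration, so the resulting correlations carry no meaning for the real data.

```python
import pandas as pd

# Hypothetical miniature versions of the three sheets (names and values assumed).
diabetes = pd.DataFrame({"FIPS": [1, 2, 3, 4], "% DIABETIC": [7.0, 8.0, 9.0, 10.0]})
obesity = pd.DataFrame({"FIPS": [1, 2, 3, 4], "% OBESE": [14.0, 17.0, 16.0, 20.0]})
inactivity = pd.DataFrame({"FIPS": [1, 2, 3, 5], "% INACTIVE": [15.0, 14.0, 17.0, 16.0]})

# Inner joins keep only the counties present in all three sheets.
merged = diabetes.merge(obesity, on="FIPS").merge(inactivity, on="FIPS")

# Pairwise Pearson correlations between the three percentage columns.
corr = merged.drop(columns="FIPS").corr()
print(corr)
```

Because the joins are inner joins, counties missing from any one sheet drop out, so it is worth comparing the merged row count with the original 3142 rows.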
