Perform exploratory data analysis and provide key insights derived from the same backed with suitable graphs and plots.
Dataset Description: The dataset is based on the “Statlog Dataset” from the UCI Machine Learning Repository. The columns of the dataset and their meanings are as follows:
Age (numeric)
Sex (text: male, female)
Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
Housing (text: own, rent, or free)
Saving accounts (text: little, moderate, quite rich, rich)
Checking account (text: little, moderate, rich)
Credit amount (numeric, in Deutsche Mark)
Duration (numeric, in months)
Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)
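The column description above can be turned into a small schema check, which is one way to look for erroneous values later on. The following is only a sketch: it assumes the CSV headers match the column names listed above exactly, and the names EXPECTED_LEVELS, NUMERIC_COLUMNS and schema_problems are made up for illustration.

```python
import pandas as pd

# Allowed values per column, transcribed from the dataset description.
# Assumption: the CSV headers are spelled exactly like these keys.
EXPECTED_LEVELS = {
    "Sex": ["male", "female"],
    "Job": [0, 1, 2, 3],
    "Housing": ["own", "rent", "free"],
    "Saving accounts": ["little", "moderate", "quite rich", "rich"],
    "Checking account": ["little", "moderate", "rich"],
    "Purpose": [
        "car", "furniture/equipment", "radio/TV", "domestic appliances",
        "repairs", "education", "business", "vacation/others",
    ],
}
NUMERIC_COLUMNS = ["Age", "Credit amount", "Duration"]


def schema_problems(df):
    """Return a list of readable messages for columns that deviate from the description."""
    problems = []
    # Every described column should be present.
    for col in list(EXPECTED_LEVELS) + NUMERIC_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    # Coded and text columns should only contain the listed values
    # (missing entries are ignored here and handled separately).
    for col, levels in EXPECTED_LEVELS.items():
        if col in df.columns:
            seen = set(df[col].dropna().unique().tolist())
            unexpected = seen - set(levels)
            if unexpected:
                problems.append(f"{col}: unexpected values {sorted(unexpected, key=str)}")
    # Age, amount and duration should be numeric and strictly positive.
    for col in NUMERIC_COLUMNS:
        if col in df.columns:
            if not pd.api.types.is_numeric_dtype(df[col]):
                problems.append(f"{col}: not numeric")
            elif (df[col].dropna() <= 0).any():
                problems.append(f"{col}: non-positive values")
    return problems
```

An empty list means the frame agrees with the description. Any message it returns is a prompt to look at the rows concerned, not proof that they are wrong.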
Assignment questions:
-
Load the dataset into pandas and get a peek at the underlying data in the dataframe.
-
Provide the following information about the dataframe:
Dimensions of the dataframe
Information about the schema
Statistical metrics of each column
-
Carry out the following data pre-processing steps only where necessary, and give the reason for each step you take:
Missing values
Erroneous/wrong values
Skewed data
Outliers
-
Perform exploratory data analysis and provide key insights derived from the same backed with suitable graphs and plots.
A few hints to get you started:
Distribution of numerical variables
Distribution of categorical variables
Numerical vs Categorical plots
Numerical vs Numerical plots
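As a starting point, the four questions could be approached along the lines of the sketch below. It is not a model answer. The function names, the output file name, the skew threshold of 1.0, the "unknown" fill value, the median imputation and the 1.5 * IQR rule are all assumptions chosen for illustration, and each should be justified (or replaced) based on what the data actually shows. The column names are assumed to match the description above.

```python
import numpy as np
import pandas as pd


def load(path):
    """Question 1: read the CSV into pandas and peek at the first rows."""
    df = pd.read_csv(path)
    print(df.head())
    return df


def describe_frame(df):
    """Question 2: dimensions, schema, and per-column statistics."""
    print("rows, columns:", df.shape)
    df.info()
    print(df.describe(include="all").T)


def preprocess(df, skew_threshold=1.0):
    """Question 3: one possible treatment; returns a modified copy."""
    out = df.copy()
    # Missing values: median for numeric columns, an explicit "unknown"
    # category for text columns (assumption; dropping rows is an alternative).
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna("unknown")
    # Skewed data: add a log-transformed amount only if the skew is strong.
    if out["Credit amount"].skew() > skew_threshold:
        out["Log credit amount"] = np.log1p(out["Credit amount"])
    # Outliers: flag with the 1.5 * IQR rule instead of deleting rows.
    q1, q3 = out["Credit amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = out["Credit amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    out["Amount outlier"] = ~inside
    return out


def plot_overview(df, path="eda_overview.png"):
    """Question 4: one plot per hint, saved to a single image file."""
    # Imported here so the other steps run without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, figsize=(11, 8))
    # Distribution of a numerical variable.
    axes[0, 0].hist(df["Credit amount"], bins=30)
    axes[0, 0].set_title("Credit amount")
    # Distribution of a categorical variable.
    counts = df["Purpose"].value_counts()
    axes[0, 1].barh(list(counts.index), counts.to_numpy())
    axes[0, 1].set_title("Purpose")
    # Numerical vs categorical.
    groups = df.groupby("Housing")["Credit amount"]
    axes[1, 0].boxplot([g.to_numpy() for _, g in groups])
    axes[1, 0].set_xticklabels([name for name, _ in groups])
    axes[1, 0].set_title("Credit amount by housing")
    # Numerical vs numerical.
    axes[1, 1].scatter(df["Duration"], df["Credit amount"], s=10, alpha=0.5)
    axes[1, 1].set_xlabel("Duration (months)")
    axes[1, 1].set_ylabel("Credit amount")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

A typical run would call load with the path to your copy of the CSV, then describe_frame, preprocess and plot_overview in that order. Plot the pre-processed frame, since the histogram cannot be drawn while the amount column still contains missing values.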