A polling institute wants to estimate an individual’s income from their personal data (see einkommen.train). To this end, 30,000 individuals were interviewed about the features summarized below. Not all features are available for every individual. Crucially, income is known for only 5,000 of the interviewees.
- Data Integration
- Feature Representation
- EDA Pairplot
- Correlation of Numeric Attributes
- Missing Value Representation
- Data Cleaning: convert categorical variables to numerical
- Check missing values
- Feature Selection
- Model Selection and Evaluation
  - Logistic Regression
  - Random Forest
  - Neural Network
  - GaussianNB
  - DecisionTreeClassifier
  - SVM
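
The cleaning and model-comparison steps in the outline above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code: the data is synthetic, the column names (`age`, `hours_per_week`, `occupation`, `income_high`) are hypothetical stand-ins for the real features in einkommen.train, and the choices of median/most-frequent imputation, one-hot encoding, and 5-fold cross-validated accuracy are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "hours_per_week": rng.integers(10, 60, n).astype(float),
    "occupation": pd.Series(rng.choice(["clerical", "technical", "sales"], n), dtype=object),
})
# Binary income label loosely tied to the numeric features.
score = 0.04 * df["age"] + 0.05 * df["hours_per_week"] + rng.normal(0, 1, n)
df["income_high"] = (score > score.median()).astype(float)

# Simulate missing feature values and a label known for only part of the rows.
df.loc[rng.random(n) < 0.1, "age"] = np.nan
df.loc[rng.random(n) < 0.1, "occupation"] = np.nan
df.loc[rng.random(n) < 0.5, "income_high"] = np.nan

# Only rows with a known label can be used for supervised training.
labeled = df[df["income_high"].notna()]
unlabeled = df[df["income_high"].isna()]
X = labeled.drop(columns="income_high")
y = labeled["income_high"].astype(int)

numeric = ["age", "hours_per_week"]
categorical = ["occupation"]

# Impute missing values, scale numeric columns, one-hot encode categoricals.
preprocessor = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
    ],
    sparse_threshold=0.0,  # always return a dense array (GaussianNB needs one)
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

# Mean 5-fold cross-validated accuracy per model, on labeled rows only.
results = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocessor), ("model", model)])
    results[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()

for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {acc:.3f}")
```

Putting the preprocessing inside the pipeline means the imputer and encoder are refit on each training fold, so no information from the held-out fold leaks into the evaluation. The `unlabeled` rows are set aside here; the selected model would be fit on all labeled rows before predicting their income.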