Ensuring the data is consistent
To ensure the data is consistent, we must check the names of the columns in the DataFrame:
column_names = [cols for cols in df]
print(column_names)
['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status']
Next, we must get all the column names that don’t contain underscores:
num_underscore_present_columns = [cols for cols in column_names if '_' not in cols]
num_underscore_present_columns
['Gender', 'Married', 'Dependents', 'Education', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount']
Since some columns have two uppercase letters in their names, we must add the underscore before...
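As a rough illustration of this renaming step, the sketch below inserts an underscore wherever a lowercase letter is immediately followed by an uppercase one. This is only one possible approach, under the assumption that the goal is names such as Applicant_Income; the helper name add_underscore is hypothetical and does not come from the original text.

```python
import re

# Column names copied from the output shown earlier in this section.
column_names = ['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
                'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome',
                'LoanAmount', 'Loan_Amount_Term', 'Credit_History',
                'Property_Area', 'Loan_Status']


def add_underscore(name: str) -> str:
    """Hypothetical helper: insert '_' at each lowercase-to-uppercase boundary.

    Names that already contain underscores, or that have only one uppercase
    letter, are returned unchanged.
    """
    return re.sub(r"(?<=[a-z])(?=[A-Z])", "_", name)


renamed = [add_underscore(name) for name in column_names]
print(renamed)
```

Assuming df is the DataFrame used above, the same helper could be applied to it with df.rename(columns=add_underscore), since rename accepts a function as the column mapper.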