Challenge: Improve H2OXGBoost Regression Model

Tune an H2O XGBoost regression model that predicts loan interest rates.

Problem statement

A financial institution wants to predict the interest rate on personal loans from factors such as the applicant's income and credit history, the loan amount and term, and the debt-to-income ratio. It has collected these factors for past loans and wants a regression model that accurately predicts the interest rate on future loans. Its current model, built with H2OXGBoostEstimator, scores an RMSE of about 3.95 on the test dataset.
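RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted interest rates. It is therefore expressed in the same units as the target. Here is a minimal standard-library sketch of the metric. The rates are made-up toy values and are not taken from the dataset. In H2O itself you would typically read the same number from the model's performance object, for example `model.model_performance(test).rmse()`, where `model` and `test` are assumed to already exist.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy interest rates (percent) and predictions, invented for illustration only.
actual = [10.0, 12.0, 15.0]
predicted = [11.0, 12.0, 13.0]
print(round(rmse(actual, predicted), 3))  # → 1.291
```

Squaring the errors makes large misses count more heavily, so lowering RMSE from 3.95 to below 3.80 mostly comes from reducing the model's biggest prediction errors.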

Your task is to find values for the given parameters that bring the test RMSE below 3.80. Use only the given parameters; do not add any others.
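One way to approach the tuning is an exhaustive grid search: try every combination of candidate values and keep the one with the lowest RMSE. The sketch uses only the standard library. The parameter names (`max_depth`, `learn_rate`, `ntrees`) are real H2OXGBoostEstimator parameters, but they are assumptions here and may not be the parameters this challenge gives you. The `evaluate` function is a hypothetical stand-in for training the model and scoring it. It is replaced by a toy formula so the code runs without an H2O cluster.

```python
import itertools

# Hypothetical candidate values. Swap in the parameters the challenge
# actually provides; these names and values are assumptions.
param_grid = {
    "max_depth": [3, 5, 7],
    "learn_rate": [0.05, 0.1],
    "ntrees": [100, 300],
}

def evaluate(params):
    """Hypothetical stand-in for: train H2OXGBoostEstimator(**params) and
    return its RMSE. A toy formula is used so the sketch runs without H2O."""
    return (3.6
            + 0.05 * abs(params["max_depth"] - 5)
            + abs(params["learn_rate"] - 0.05)
            + 10 / params["ntrees"])

def grid_search(grid, score_fn):
    """Score every combination in the grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:  # lower RMSE is better
            best_params, best_score = params, score
    return best_params, best_score

best, score = grid_search(param_grid, evaluate)
print(best, round(score, 3), score < 3.80)
```

H2O also provides a built-in `H2OGridSearch` class for the same purpose. Choosing values on a separate validation split and checking the test RMSE only at the end avoids tuning the model to the test set.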

Data description

The financial institution has provided a dataset of past loans, which includes the following variables:

  • emp_length: The applicant’s number of years at their current job, rounded down to the nearest year. Values are capped at 10.

  • state: The two-letter code for the state where the applicant resides.

  • homeownership: The applicant’s status as a homeowner, renter, or other.

  • annual_income: The annual income reported by the applicant.

  • verified_income: The method used to verify the applicant’s income.

  • debt_to_income: The applicant’s debt-to-income ratio.

  • delinq_2y: The number of delinquencies on lines of credit in the past two years.

  • months_since_last_delinq: The number of months since the applicant’s last delinquency.

  • earliest_credit_line: The year the applicant opened their earliest line of credit.

  • inquiries_last_12m: The number of credit inquiries on the applicant’s record in the past 12 months.

  • total_credit_lines: The total number of credit lines in the applicant’s credit history.

  • open_credit_lines: The number of open credit lines in the applicant’s credit history.

  • total_credit_limit: The total available credit on all credit lines, excluding mortgages.

  • total_credit_utilized: The total amount of credit used, excluding mortgages.

  • num_collections_last_12m: The number of debts that have gone to collections in the past 12 months, excluding medical collections.

  • num_historical_failed_to_pay: The number of public records of derogatory accounts against the applicant.

  • months_since_90d_late: The number of months since the applicant was 90 days late on a payment.

  • current_accounts_delinq: The current number of delinquent accounts the applicant has.

  • total_collection_amount_ever: The total amount of debt the applicant has had sent to collections.

  • current_installment_accounts: The number of installment accounts the applicant currently has.

  • accounts_opened_24m: The number of new credit lines opened in the past 24 months.

  • months_since_last_credit_inquiry: The number of months since the last credit inquiry on the applicant’s record.

  • num_satisfactory_accounts: The number of satisfactory accounts the applicant currently has.

  • num_accounts_120d_past_due: The number of current accounts that are 120 days past due.

  • num_accounts_30d_past_due: The number of current accounts that are 30 days past due.

  • num_active_debit_accounts: The number of active bank card accounts the applicant currently has.

  • total_debit_limit: The total limit on all bank card accounts the applicant currently has.

  • num_total_cc_accounts: The total number of credit card accounts in the applicant’s credit history.

  • num_open_cc_accounts: The number of open credit card accounts the applicant currently has.

  • num_cc_carrying_balance: The number of credit card accounts currently carrying a balance.

  • num_mort_accounts: The number of mortgage accounts the applicant has.

  • account_never_delinq_percent: The percentage of credit lines on the applicant’s record that have never been delinquent.

  • public_record_bankrupt: The number of bankruptcies listed in the public record for the applicant.

  • loan_purpose: The purpose of the loan.

  • application_type: The type of application (individual or joint).

  • loan_amount: The amount of the loan the applicant received.

  • term: The term length in months of the loan the applicant received.

  • interest_rate: The interest rate on the loan (target variable).
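In H2O's Python API, `train()` takes the predictor column names as `x` and the response column name as `y`. For example, `model.train(x=predictors, y="interest_rate", training_frame=train)`, where `model` and `train` are assumed to already exist. This sketch builds those lists in plain Python from the column names listed above. Which columns are categorical is an assumption based on the descriptions, not on inspecting the data.

```python
# Column names copied from the data description.
COLUMNS = [
    "emp_length", "state", "homeownership", "annual_income", "verified_income",
    "debt_to_income", "delinq_2y", "months_since_last_delinq",
    "earliest_credit_line", "inquiries_last_12m", "total_credit_lines",
    "open_credit_lines", "total_credit_limit", "total_credit_utilized",
    "num_collections_last_12m", "num_historical_failed_to_pay",
    "months_since_90d_late", "current_accounts_delinq",
    "total_collection_amount_ever", "current_installment_accounts",
    "accounts_opened_24m", "months_since_last_credit_inquiry",
    "num_satisfactory_accounts", "num_accounts_120d_past_due",
    "num_accounts_30d_past_due", "num_active_debit_accounts",
    "total_debit_limit", "num_total_cc_accounts", "num_open_cc_accounts",
    "num_cc_carrying_balance", "num_mort_accounts",
    "account_never_delinq_percent", "public_record_bankrupt", "loan_purpose",
    "application_type", "loan_amount", "term", "interest_rate",
]
TARGET = "interest_rate"

# Assumed categorical: their descriptions name codes or categories, not counts or amounts.
CATEGORICAL = ["state", "homeownership", "verified_income",
               "loan_purpose", "application_type"]

predictors = [c for c in COLUMNS if c != TARGET]         # candidate x for train()
numeric = [c for c in predictors if c not in CATEGORICAL]
print(len(predictors), len(CATEGORICAL), len(numeric))
```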

