# Project 4: Defeat Learners

## DTLearner.py

*class* `DTLearner.DTLearner`(*leaf_size=1*, *verbose=False*)

This is a decision tree learner object that is implemented incorrectly. You should replace this DTLearner with your own correct DTLearner from Project 3.

- Parameters
**leaf_size** (*int*) – The maximum number of samples to be aggregated at a leaf; defaults to 1.

**verbose** (*bool*) – If `verbose` is True, your code can print out information for debugging. If `verbose` is False, your code should not generate ANY output. When we test your code, `verbose` will be False.

`add_evidence`(*data_x*, *data_y*)

Add training data to the learner.

- Parameters
**data_x** (*numpy.ndarray*) – A set of feature values used to train the learner.

**data_y** (*numpy.ndarray*) – The value we are attempting to predict given the X data.

`author`()

- Returns
The GT username of the student

- Return type
str

`query`(*points*)

Estimate a set of test points given the model we built.

- Parameters
**points** (*numpy.ndarray*) – A numpy array with each row corresponding to a specific query.

- Returns
The predicted result of the input data according to the trained model

- Return type
numpy.ndarray
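
To make the `add_evidence` / `query` contract concrete, here is a minimal sketch of a regression tree that follows the interface described above. It is not the Project 3 reference solution: the class name `DTLearnerSketch`, the choice of split feature (highest absolute correlation with Y), the median split value, and the flat array layout of the tree are all assumptions made for illustration.

```python
import numpy as np


class DTLearnerSketch:
    """Illustrative regression tree; NOT the course's reference implementation."""

    def __init__(self, leaf_size=1, verbose=False):
        self.leaf_size = leaf_size
        self.verbose = verbose
        # Each row: [feature index (-1 marks a leaf), split or leaf value,
        #            offset to left child, offset to right child]
        self.tree = None

    def _leaf(self, y):
        return np.array([[-1, y.mean(), np.nan, np.nan]])

    def _build(self, x, y):
        # Stop when few enough samples remain or all targets agree.
        if x.shape[0] <= self.leaf_size or np.all(y == y[0]):
            return self._leaf(y)
        # Assumption: split on the feature most correlated with y.
        corr = np.zeros(x.shape[1])
        for j in range(x.shape[1]):
            if np.std(x[:, j]) > 0:
                corr[j] = abs(np.corrcoef(x[:, j], y)[0, 1])
        feat = int(np.argmax(corr))
        split = np.median(x[:, feat])  # assumption: median split value
        left = x[:, feat] <= split
        # If the split cannot separate the rows, aggregate them in a leaf.
        if left.all() or not left.any():
            return self._leaf(y)
        left_tree = self._build(x[left], y[left])
        right_tree = self._build(x[~left], y[~left])
        root = np.array([[feat, split, 1, left_tree.shape[0] + 1]])
        return np.vstack((root, left_tree, right_tree))

    def add_evidence(self, data_x, data_y):
        self.tree = self._build(np.asarray(data_x, float), np.asarray(data_y, float))
        if self.verbose:  # stay silent unless verbose is True
            print(f"built tree with {self.tree.shape[0]} nodes")

    def query(self, points):
        points = np.asarray(points, float)
        out = np.empty(points.shape[0])
        for i, p in enumerate(points):
            node = 0
            while self.tree[node, 0] != -1:
                feat, split = int(self.tree[node, 0]), self.tree[node, 1]
                offset = self.tree[node, 2] if p[feat] <= split else self.tree[node, 3]
                node += int(offset)
            out[i] = self.tree[node, 1]
        return out
```

With `leaf_size=1` and distinct feature values, querying the training rows reproduces the training targets; a larger `leaf_size` averages more samples per leaf.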

## gen_data.py

`author`()

- Returns
The GT username of the student

- Return type
str

`best_4_dt`(*seed=1489683273*)

Returns data that performs significantly better with DTLearner than LinRegLearner.

The data set should include from 2 to 10 columns in X, and one column in Y.

The data should contain from 10 (minimum) to 1000 (maximum) rows.

- Parameters
**seed** (*int*) – The random seed for your data generation.

- Returns
Returns data that performs significantly better with DTLearner than LinRegLearner.

- Return type
numpy.ndarray
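
One way to approach `best_4_dt` is to make Y a strongly non-linear function of X, so that a straight-line fit explains little of the variance. The sketch below is only an illustration under assumptions: the function name `best_4_dt_sketch`, the 500 x 2 shape, the symmetric parabola, and returning X and Y as a pair (to match the separate `x` and `y` arguments of `compare_os_rmse`) are not taken from the assignment.

```python
import numpy as np


def best_4_dt_sketch(seed=1489683273):
    """Hypothetical generator: Y is a symmetric bowl, which a line fits poorly."""
    rng = np.random.default_rng(seed)  # same seed -> same data
    # 500 rows and 2 columns, inside the stated 10-1000 row and 2-10 column limits.
    x = rng.uniform(-1.0, 1.0, size=(500, 2))
    # Symmetric around zero, so Y has almost no linear relationship with X.
    y = x[:, 0] ** 2 + x[:, 1] ** 2
    return x, y
```

Because the generator is driven entirely by `seed`, repeated calls with the same seed return identical arrays, and different seeds return different ones.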

`best_4_lin_reg`(*seed=1489683273*)

Returns data that performs significantly better with LinRegLearner than DTLearner.

The data set should include from 2 to 10 columns in X, and one column in Y.

The data should contain from 10 (minimum) to 1000 (maximum) rows.

- Parameters
**seed** (*int*) – The random seed for your data generation.

- Returns
Returns data that performs significantly better with LinRegLearner than DTLearner.

- Return type
numpy.ndarray
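
For `best_4_lin_reg`, the opposite idea applies: if Y is a linear combination of the X columns plus a little noise, a linear model can recover it almost exactly, while a tree can only approximate it with piecewise-constant steps. The sketch below is an illustration under assumptions: the name `best_4_lin_reg_sketch`, the weights, the noise level, the 500 x 3 shape, and the `(x, y)` return pair are all hypothetical.

```python
import numpy as np


def best_4_lin_reg_sketch(seed=1489683273):
    """Hypothetical generator: Y is linear in X plus small Gaussian noise."""
    rng = np.random.default_rng(seed)  # same seed -> same data
    # 500 rows and 3 columns, inside the stated 10-1000 row and 2-10 column limits.
    x = rng.uniform(-10.0, 10.0, size=(500, 3))
    weights = np.array([2.0, -1.0, 0.5])  # arbitrary illustrative slopes
    y = x @ weights + 3.0 + rng.normal(0.0, 0.1, size=500)
    return x, y
```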

## LinRegLearner.py

*class* `LinRegLearner.LinRegLearner`(*verbose=False*)

This is a Linear Regression Learner. It is implemented correctly.

- Parameters
**verbose** (*bool*) – If `verbose` is True, your code can print out information for debugging. If `verbose` is False, your code should not generate ANY output. When we test your code, `verbose` will be False.

`add_evidence`(*data_x*, *data_y*)

Add training data to the learner.

- Parameters
**data_x** (*numpy.ndarray*) – A set of feature values used to train the learner.

**data_y** (*numpy.ndarray*) – The value we are attempting to predict given the X data.

`author`()

- Returns
The GT username of the student

- Return type
str

`query`(*points*)

Estimate a set of test points given the model we built.

- Parameters
**points** (*numpy.ndarray*) – A numpy array with each row corresponding to a specific query.

- Returns
The predicted result of the input data according to the trained model

- Return type
numpy.ndarray
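
As a rough picture of what a learner with this interface might do internally, the sketch below fits ordinary least squares with an intercept using `numpy.linalg.lstsq`. The class name `LinRegLearnerSketch` and the `coefs` attribute are assumptions; the provided `LinRegLearner.py` may be written differently.

```python
import numpy as np


class LinRegLearnerSketch:
    """Illustrative least-squares learner; not the provided LinRegLearner."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.coefs = None  # slopes followed by the intercept

    def add_evidence(self, data_x, data_y):
        # Append a column of ones so the last coefficient acts as the intercept.
        design = np.column_stack((data_x, np.ones(data_x.shape[0])))
        self.coefs, *_ = np.linalg.lstsq(design, data_y, rcond=None)
        if self.verbose:  # stay silent unless verbose is True
            print(f"fitted coefficients: {self.coefs}")

    def query(self, points):
        # One prediction per row of points.
        return points @ self.coefs[:-1] + self.coefs[-1]
```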

## testbest4.py

`compare_os_rmse`(*learner1*, *learner2*, *x*, *y*)

Compares the out-of-sample root mean squared error of your LinRegLearner and DTLearner.

- Parameters
**learner1** (*LinRegLearner.LinRegLearner*) – An instance of LinRegLearner.

**learner2** (*DTLearner.DTLearner*) – An instance of DTLearner.

**x** (*numpy.ndarray*) – X data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg.

**y** (*numpy.ndarray*) – Y data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg.

- Returns
The root mean squared error of each learner

- Return type
tuple
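
The comparison described above can be sketched as follows: train both learners on one part of the data, query them on the held-out part, and report each learner's root mean squared error. The function name `compare_os_rmse_sketch`, the `train_frac` parameter, the simple first-rows/last-rows split, and the `MeanLearner` stand-in are assumptions for illustration; the actual `testbest4.py` may split the data differently.

```python
import math

import numpy as np


class MeanLearner:
    """Hypothetical stand-in learner: always predicts the training mean."""

    def add_evidence(self, data_x, data_y):
        self.mean = data_y.mean()

    def query(self, points):
        return np.full(points.shape[0], self.mean)


def compare_os_rmse_sketch(learner1, learner2, x, y, train_frac=0.6):
    """Return the out-of-sample RMSE of each learner as a tuple."""
    # Assumption: rows are already in random order, so the first rows train
    # the learners and the remaining rows are held out for testing.
    train_rows = int(math.floor(train_frac * x.shape[0]))
    train_x, train_y = x[:train_rows], y[:train_rows]
    test_x, test_y = x[train_rows:], y[train_rows:]

    def os_rmse(learner):
        learner.add_evidence(train_x, train_y)
        pred = learner.query(test_x)
        return math.sqrt(((test_y - pred) ** 2).sum() / test_y.shape[0])

    return os_rmse(learner1), os_rmse(learner2)
```

Any two objects that expose `add_evidence` and `query` can be passed in, so the same function works for a LinRegLearner and a DTLearner instance.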

`test_code`()

Performs a test of your code and prints the results.