Project 4: Defeat Learners
DTLearner.py
- class DTLearner.DTLearner(leaf_size=1, verbose=False)
- 
This is a decision tree learner object that is implemented incorrectly. You should replace this DTLearner with your own correct DTLearner from Project 3.
- Parameters
- 
- leaf_size (int) – The maximum number of samples to be aggregated at a leaf, defaults to 1.
- verbose (bool) – If “verbose” is True, your code can print out information for debugging.
 If verbose = False your code should not generate ANY output. When we test your code, verbose will be False.
 
 - add_evidence(data_x, data_y)
- 
Add training data to the learner.
- Parameters
- 
- data_x (numpy.ndarray) – A set of feature values used to train the learner
- data_y (numpy.ndarray) – The value we are attempting to predict given the X data
 
 
 - 
- Returns
- 
The GT username of the student 
- Return type
- 
str 
 
 - query(points)
- 
Estimate a set of test points given the model we built.
- Parameters
- 
points (numpy.ndarray) – A numpy array with each row corresponding to a specific query. 
- Returns
- 
The predicted result of the input data according to the trained model 
- Return type
- 
numpy.ndarray 
 
 
gen_data.py
- 
- Returns
- 
The GT username of the student 
- Return type
- 
str 
 
- best_4_dt(seed=1489683273)
- 
Returns data that performs significantly better with DTLearner than LinRegLearner.
 The data set should include from 2 to 10 columns in X, and one column in Y.
 The data should contain from 10 (minimum) to 1000 (maximum) rows.
- Parameters
- 
seed (int) – The random seed for your data generation. 
- Returns
- 
Returns data that performs significantly better with DTLearner than LinRegLearner. 
- Return type
- 
numpy.ndarray 
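One way to meet the best_4_dt requirement is to make Y a piecewise-constant function of X, which a tree can represent with a few splits but a straight line cannot. The following is only a sketch under assumptions: the name best_4_dt_sketch, the 500 x 4 shape, the target levels, and returning X and Y together as a pair are illustrative choices, not taken from the project files.

```python
import numpy as np


def best_4_dt_sketch(seed=1489683273):
    """Hypothetical generator: Y is a step function of X (illustrative only)."""
    rng = np.random.default_rng(seed)  # same seed, same data
    # 500 rows and 4 columns fall inside the stated 10-1000 row, 2-10 column limits.
    x = rng.uniform(-1.0, 1.0, size=(500, 4))
    # The signs of the first two features select +10 or -10 (an XOR-like pattern),
    # so Y has almost no linear relationship with any single column.
    y = np.where(x[:, 0] > 0, 10.0, -10.0) * np.where(x[:, 1] > 0, 1.0, -1.0)
    return x, y
```

Seeding the generator inside the function keeps the output identical across calls with the same seed.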
- best_4_lin_reg(seed=1489683273)
- 
Returns data that performs significantly better with LinRegLearner than DTLearner. 
 The data set should include from 2 to 10 columns in X, and one column in Y.
 The data should contain from 10 (minimum) to 1000 (maximum) rows.
- Parameters
- 
seed (int) – The random seed for your data generation. 
- Returns
- 
Returns data that performs significantly better with LinRegLearner than DTLearner. 
- Return type
- 
numpy.ndarray 
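For best_4_lin_reg, the opposite construction applies: if Y is an exact linear combination of the X columns, linear regression can fit it almost perfectly, while a tree can only approximate the slope with constant leaves. A minimal sketch, assuming a hypothetical function name, arbitrary weights, and an X, Y pair as the return value:

```python
import numpy as np


def best_4_lin_reg_sketch(seed=1489683273):
    """Hypothetical generator: Y is an exact linear function of X (illustrative only)."""
    rng = np.random.default_rng(seed)  # same seed, same data
    # 500 rows and 4 columns fall inside the stated 10-1000 row, 2-10 column limits.
    x = rng.uniform(-1.0, 1.0, size=(500, 4))
    weights = np.array([3.0, -2.0, 0.5, 4.0])  # arbitrary illustrative coefficients
    y = x @ weights + 1.0  # linear target with intercept 1.0 and no noise
    return x, y
```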
 
LinRegLearner.py
- class LinRegLearner.LinRegLearner(verbose=False)
- 
This is a Linear Regression Learner. It is implemented correctly.
- Parameters
- verbose (bool) – If “verbose” is True, your code can print out information for debugging.
 If verbose = False your code should not generate ANY output. When we test your code, verbose will be False.
 
- add_evidence(data_x, data_y)
- 
Add training data to the learner.
- Parameters
- 
- data_x (numpy.ndarray) – A set of feature values used to train the learner
- data_y (numpy.ndarray) – The value we are attempting to predict given the X data
 
 
- 
- Returns
- 
The GT username of the student 
- Return type
- 
str 
 
- query(points)
- 
Estimate a set of test points given the model we built.
- Parameters
- 
points (numpy.ndarray) – A numpy array with each row corresponding to a specific query. 
- Returns
- 
The predicted result of the input data according to the trained model 
- Return type
- 
numpy.ndarray 
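The add_evidence/query interface described above can be illustrated with a small stand-in class. This is not the provided LinRegLearner; it is a sketch that assumes an ordinary least-squares fit with an intercept, and the class name LinRegSketch is hypothetical.

```python
import numpy as np


class LinRegSketch:
    """Hypothetical stand-in with the same add_evidence/query interface."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.coefs = None  # slope per feature, followed by the intercept

    def add_evidence(self, data_x, data_y):
        # Append a column of ones so the last coefficient acts as the intercept.
        design = np.column_stack([data_x, np.ones(data_x.shape[0])])
        self.coefs = np.linalg.lstsq(design, data_y, rcond=None)[0]
        if self.verbose:  # stay silent unless verbose is True
            print(self.coefs)

    def query(self, points):
        # One prediction per row of points.
        return points @ self.coefs[:-1] + self.coefs[-1]
```

A call to query returns a one-dimensional array with one prediction per row of points, matching the numpy.ndarray return type listed above.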
 
testbest4.py
- compare_os_rmse(learner1, learner2, x, y)
- 
Compares the out-of-sample root mean squared error of your LinRegLearner and DTLearner.
- Parameters
- 
- learner1 (LinRegLearner.LinRegLearner) – An instance of LinRegLearner
- learner2 (DTLearner.DTLearner) – An instance of DTLearner
- x (numpy.ndarray) – X data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg
- y (numpy.ndarray) – Y data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg
 
- Returns
- 
The root mean squared error of each learner 
- Return type
- 
tuple 
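An out-of-sample comparison of this kind can be sketched as follows: hold out part of the rows, train each learner on the rest, and compute the root mean squared error on the held-out rows. The 60/40 split, the seeded shuffle, and the names compare_os_rmse_sketch and ConstantLearner are assumptions made for illustration; the actual testbest4.py may split the data differently.

```python
import numpy as np


def compare_os_rmse_sketch(learner1, learner2, x, y, train_fraction=0.6, seed=0):
    """Return (rmse1, rmse2) measured on rows that neither learner trained on."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(x.shape[0])  # shuffle row indices
    n_train = int(train_fraction * x.shape[0])
    train, test = order[:n_train], order[n_train:]

    rmses = []
    for learner in (learner1, learner2):
        learner.add_evidence(x[train], y[train])
        pred = learner.query(x[test])
        rmses.append(float(np.sqrt(np.mean((y[test] - pred) ** 2))))
    return tuple(rmses)


class ConstantLearner:
    """Hypothetical stand-in learner that always predicts one fixed value."""

    def __init__(self, value):
        self.value = value

    def add_evidence(self, data_x, data_y):
        pass  # nothing to learn

    def query(self, points):
        return np.full(points.shape[0], self.value)
```

Any object that provides add_evidence and query can be passed in, so the same function works with a LinRegLearner and a DTLearner instance.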
 
- test_code()
- 
Performs a test of your code and prints the results 
