Project 4: Defeat Learners

DTLearner.py

class DTLearner.DTLearner(leaf_size=1, verbose=False)

This is a decision tree learner object that is implemented incorrectly. You should replace this DTLearner with
your own correct DTLearner from Project 3.

Parameters
  • leaf_size (int) – The maximum number of samples to be aggregated at a leaf, defaults to 1.
  • verbose (bool) – If verbose is True, your code can print debugging information. If verbose is False, your code
    must not generate ANY output. When we test your code, verbose will be False.

add_evidence(data_x, data_y)

Add training data to learner

Parameters
  • data_x (numpy.ndarray) – A set of feature values used to train the learner
  • data_y (numpy.ndarray) – The value we are attempting to predict given the X data

author()

Returns

The GT username of the student

Return type

str

query(points)

Estimate a set of test points given the model we built.

Parameters

points (numpy.ndarray) – A numpy array with each row corresponding to a specific query.

Returns

The predicted result of the input data according to the trained model

Return type

numpy.ndarray
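
The interface above (leaf_size, add_evidence, query, author) can be illustrated with a minimal regression-tree sketch. This is not the Project 3 reference solution: the class name SketchDTLearner, the choice to split on the feature with the highest absolute correlation with Y, and the median split value are all assumptions made for illustration.

```python
import numpy as np


class SketchDTLearner:
    """Illustrative regression tree; not the Project 3 reference solution."""

    def __init__(self, leaf_size=1, verbose=False):
        self.leaf_size = leaf_size
        self.verbose = verbose
        self.tree = None

    def author(self):
        return "gtusername"  # placeholder; substitute your own GT username

    def add_evidence(self, data_x, data_y):
        x = np.asarray(data_x, dtype=float)
        y = np.asarray(data_y, dtype=float)
        self.tree = self._build(x, y)
        if self.verbose:  # stay silent unless verbose is requested
            print(f"trained on {x.shape[0]} rows")

    def _build(self, x, y):
        # Aggregate into a leaf when few rows remain or all targets agree.
        if x.shape[0] <= self.leaf_size or np.all(y == y[0]):
            return ("leaf", float(np.mean(y)))
        # Assumption: split on the feature most correlated with y.
        best, best_corr = 0, -1.0
        for i in range(x.shape[1]):
            if np.std(x[:, i]) == 0:
                continue  # a constant column carries no signal
            corr = abs(np.corrcoef(x[:, i], y)[0, 1])
            if corr > best_corr:
                best, best_corr = i, corr
        split_val = float(np.median(x[:, best]))
        left = x[:, best] <= split_val
        if left.all() or not left.any():
            # The split would not separate the rows, so stop here.
            return ("leaf", float(np.mean(y)))
        return ("node", best, split_val,
                self._build(x[left], y[left]),
                self._build(x[~left], y[~left]))

    def _predict_one(self, point):
        # Walk from the root to a leaf, following the split at each node.
        node = self.tree
        while node[0] == "node":
            _, feature, split_val, left, right = node
            node = left if point[feature] <= split_val else right
        return node[1]

    def query(self, points):
        pts = np.asarray(points, dtype=float)
        return np.array([self._predict_one(p) for p in pts])
```

A larger leaf_size makes the tree stop splitting earlier, so each leaf averages more training rows.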

gen_data.py

author()

Returns

The GT username of the student

Return type

str

best_4_dt(seed=1489683273)

Returns data that performs significantly better with DTLearner than LinRegLearner.
The data set should include from 2 to 10 columns in X, and one column in Y.
The data should contain from 10 (minimum) to 1000 (maximum) rows.

Parameters

seed (int) – The random seed for your data generation.

Returns

Data that performs significantly better with DTLearner than with LinRegLearner.

Return type

numpy.ndarray
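
One way to approach best_4_dt is to make Y a piecewise-constant function of X, which a tree can approximate with splits but a single linear fit cannot. The sketch below is only an illustration: the function name best_4_dt_sketch, the square-wave target, the row and column counts, and the assumption that the function returns an (X, Y) pair are not taken from the assignment, and whether the result is "significantly better" still has to be checked against your own learners.

```python
import numpy as np


def best_4_dt_sketch(seed=1489683273):
    """Hypothetical generator: Y is a square wave of the first X column."""
    rng = np.random.default_rng(seed)  # seeded so repeated calls match
    x = rng.uniform(0.0, 1.0, size=(500, 2))  # 500 rows, 2 columns
    # Y alternates 0, 1, 0, 1 across four equal-width bands of x[:, 0],
    # so there is no single linear trend for a regression to follow.
    y = np.floor(4.0 * x[:, 0]) % 2
    return x, y
```

Using the seed for every random draw keeps the data identical across calls with the same seed.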

best_4_lin_reg(seed=1489683273)

Returns data that performs significantly better with LinRegLearner than DTLearner.
The data set should include from 2 to 10 columns in X, and one column in Y.
The data should contain from 10 (minimum) to 1000 (maximum) rows.

Parameters

seed (int) – The random seed for your data generation.

Returns

Data that performs significantly better with LinRegLearner than with DTLearner.

Return type

numpy.ndarray
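
For best_4_lin_reg, a natural starting point is data where Y is a linear combination of the X columns plus a little noise, since a tree can only approximate a smooth slope with constant steps. In this sketch the function name best_4_lin_reg_sketch, the coefficients, the noise level, the shape, and the (X, Y) return pair are all illustrative assumptions.

```python
import numpy as np


def best_4_lin_reg_sketch(seed=1489683273):
    """Hypothetical generator: Y is a noisy linear function of X."""
    rng = np.random.default_rng(seed)  # seeded so repeated calls match
    x = rng.uniform(-10.0, 10.0, size=(500, 4))  # 500 rows, 4 columns
    coeffs = np.array([2.0, -3.0, 0.5, 4.0])  # arbitrary illustrative slopes
    noise = rng.normal(0.0, 0.01, size=x.shape[0])
    y = x @ coeffs + 1.0 + noise  # linear signal, intercept, small noise
    return x, y
```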

LinRegLearner.py

class LinRegLearner.LinRegLearner(verbose=False)

This is a Linear Regression Learner. It is implemented correctly.

Parameters

verbose (bool) – If verbose is True, your code can print debugging information. If verbose is False, your code
must not generate ANY output. When we test your code, verbose will be False.

add_evidence(data_x, data_y)

Add training data to learner

Parameters
  • data_x (numpy.ndarray) – A set of feature values used to train the learner
  • data_y (numpy.ndarray) – The value we are attempting to predict given the X data

author()

Returns

The GT username of the student

Return type

str

query(points)

Estimate a set of test points given the model we built.

Parameters

points (numpy.ndarray) – A numpy array with each row corresponding to a specific query.

Returns

The predicted result of the input data according to the trained model

Return type

numpy.ndarray
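
As a rough picture of what a learner with this interface does, the sketch below fits ordinary least squares with an intercept using numpy.linalg.lstsq. The class name SketchLinRegLearner and its internals are assumptions for illustration; the provided LinRegLearner may be implemented differently.

```python
import numpy as np


class SketchLinRegLearner:
    """Illustrative least-squares learner; not the provided LinRegLearner."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.coefs = None

    def add_evidence(self, data_x, data_y):
        x = np.asarray(data_x, dtype=float)
        y = np.asarray(data_y, dtype=float)
        # Append a column of ones so the last coefficient is the intercept.
        design = np.column_stack([x, np.ones(x.shape[0])])
        self.coefs, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
        if self.verbose:  # stay silent unless verbose is requested
            print(f"coefficients: {self.coefs}")

    def query(self, points):
        pts = np.asarray(points, dtype=float)
        # Slopes times features, plus the intercept.
        return pts @ self.coefs[:-1] + self.coefs[-1]
```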

testbest4.py

compare_os_rmse(learner1, learner2, x, y)

Compares the out-of-sample root mean squared error of your LinRegLearner and DTLearner.

Parameters
  • learner1 (LinRegLearner.LinRegLearner) – An instance of LinRegLearner
  • learner2 (DTLearner.DTLearner) – An instance of DTLearner
  • x (numpy.ndarray) – X data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg
  • y (numpy.ndarray) – Y data generated from either gen_data.best_4_dt or gen_data.best_4_lin_reg

Returns

The root mean squared error of each learner

Return type

tuple
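
The comparison can be pictured as: hold out part of the data, train both learners on the rest, and compute each learner's RMSE on the held-out rows. The sketch below assumes a 60/40 split taken in row order, which is not stated in this documentation, and uses a hypothetical MeanLearner stand-in so the example runs on its own; the names compare_os_rmse_sketch and train_frac are likewise illustrative.

```python
import math

import numpy as np


class MeanLearner:
    """Hypothetical stand-in learner that always predicts the training mean."""

    def add_evidence(self, data_x, data_y):
        self.mean = float(np.mean(data_y))

    def query(self, points):
        return np.full(len(points), self.mean)


def compare_os_rmse_sketch(learner1, learner2, x, y, train_frac=0.6):
    """Return the out-of-sample RMSE of each learner as a tuple."""
    # Assumption: the first train_frac of the rows are used for training.
    train_rows = int(math.floor(train_frac * x.shape[0]))
    train_x, test_x = x[:train_rows], x[train_rows:]
    train_y, test_y = y[:train_rows], y[train_rows:]
    rmses = []
    for learner in (learner1, learner2):
        learner.add_evidence(train_x, train_y)
        pred = learner.query(test_x)
        rmses.append(float(np.sqrt(np.mean((test_y - pred) ** 2))))
    return tuple(rmses)
```

Any object with add_evidence and query methods can be passed in, so the same function works for a LinRegLearner and a DTLearner instance.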

test_code()

Performs a test of your code and prints the results