{"id":1329,"date":"2021-01-14T00:12:50","date_gmt":"2021-01-14T00:12:50","guid":{"rendered":"http:\/\/lucylabs.gatech.edu\/ml4t\/?page_id=1329"},"modified":"2021-01-14T09:33:05","modified_gmt":"2021-01-14T09:33:05","slug":"project-4","status":"publish","type":"page","link":"https:\/\/lucylabs.gatech.edu\/ml4t\/spring2021\/project-4\/","title":{"rendered":"Project 4"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.4.5&#8243; header_font=&#8221;|700||on|||||&#8221;]<\/p>\n<h1 style=\"text-align: center;\">Project 4: Defeat Learners<\/h1>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.8.0&#8243; header_2_font=&#8221;|||on|on||||&#8221; hover_enabled=&#8221;0&#8243; sticky_enabled=&#8221;0&#8243;]<\/p>\n<h2>Due Date<\/h2>\n<p><span>02\/21\/2021 11:59 PM <\/span><a rel=\"nofollow noopener noreferrer\" class=\"external text\" target=\"_blank\" href=\"https:\/\/www.timeanddate.com\/time\/zones\/aoe\">Anywhere on Earth time<\/a><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.8.0&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Revisions<\/h2>\n<p><span>This assignment is subject to change up until 3 weeks prior to the due date. 
We do not anticipate changes; any changes will be logged in this section.<\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.8.0&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Overview<\/h2>\n<h3><span style=\"color: #ff0000;\">This assignment counts towards 5% of your overall grade.<\/span><\/h3>\n<p>For this homework, you will generate data that you believe will work better for one learner than another. This will test your understanding of the strengths and weaknesses of various learners. The two learners you should aim your datasets at are:<\/p>\n<ul>\n<li>A decision tree learner with leaf_size = 1 (DTLearner). Note that, for testing purposes, we will use our implementation of DTLearner.<\/li>\n<li>The LinRegLearner provided as part of the repo.<\/li>\n<\/ul>\n<p>Your data generators should use a random number generator as part of the generation process. We will pass your generators a random number seed. Whenever the seed is the same, you should return exactly the same data set. 
Different seeds should result in different data sets.<\/p>\n<p>The provided grading script may be similar to, but will not include all of the instructor tests.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.8.0&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Template<\/h2>\n<p>Instructions:<\/p>\n<ul>\n<li>Download the appropriate zip file <a href=\"https:\/\/www.dropbox.com\/s\/ansg7n9vqcnr6mh\/defeat_learners_2021Spring.zip?dl=1\" target=\"_blank\" rel=\"noopener noreferrer\">File:Defeat_Learners_2021Spring.zip<\/a><\/li>\n<li>You should see the following files and directory\n<ul>\n<li><tt>defeat_learners\/<\/tt> the assignment directory<\/li>\n<li><tt>defeat_learners\/gen_data.py<\/tt> An implementation of the code you are supposed to provide: It includes two functions that return a data set and a third function that returns a user ID. Note that the data sets those functions return DO NOT satisfy the requirements for the homework. But they do show you how you can generate a data set.<\/li>\n<li><tt>defeat_learners\/LinRegLearner.py<\/tt> Our friendly, working, correct, linear regression learner. It is used by the grading script. Do not rely on local changes you make to this file, as you may only submit <tt>gen_data.py<\/tt>.<\/li>\n<li><tt>defeat_learners\/DTLearner.py<\/tt> A working, but INCORRECT, Decision Tree learner. Replace it with your working, correct DTLearner.<\/li>\n<li><tt>defeat_learners\/testbest4.py<\/tt> Code that calls the two data set generating functions and tests them against the two learners. 
Useful for debugging.<\/li>\n<li><tt>defeat_learners\/grade_best4.py<\/tt> The grading script; for more details see here: <a href=\"http:\/\/lucylabs.gatech.edu\/ml4t\/spring2021\/software-setup\/\" target=\"_blank\" rel=\"noopener noreferrer\">ML4T_Software_Setup<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.4.5&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Tasks<\/h2>\n<h3><span class=\"mw-headline\" id=\"Implement_Dataset_Functions\">Implement Dataset Functions<\/span><\/h3>\n<p>Create a Python program called gen_data.py that implements two functions. The two functions should be named as follows, and support the following API:<\/p>\n<p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;WDEsIFkxID0gYmVzdF80X2xpbl9yZWcoc2VlZCA9IDUpClgyLCBZMiA9IGJlc3RfNF9kdChzZWVkID0gNSk=&#8221; language=&#8221;python&#8221; _builder_version=&#8221;4.5.6&#8243;]WDEsIFkxID0gYmVzdF80X2xpbl9yZWcoc2VlZCA9IDUpClgyLCBZMiA9IGJlc3RfNF9kdChzZWVkID0gNSk=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.5.6&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<ul>\n<li><b>seed<\/b> Your data generation should use a random number generator as part of its data generation process. We will pass your generators a random number seed. Whenever the seed is the same you should return exactly the same data set. Different seeds should result in different data sets.<\/li>\n<\/ul>\n<h3><span class=\"mw-headline\" id=\"Linear_Regression_Dataset\">Linear Regression Dataset<\/span><\/h3>\n<p>best_4_lin_reg() should return data that performs significantly better (see rubric) with LinRegLearner than DTLearner.<\/p>\n<p>Each data set should include from 2 to 10 columns in X, and one column in Y. 
The data should contain from 10 (minimum) to 1000 (maximum) rows.<\/p>\n<h3><span class=\"mw-headline\" id=\"Decision_Tree_Dataset\">Decision Tree Dataset<\/span><\/h3>\n<p>best_4_dt() should return data that performs significantly better with DTLearner than LinRegLearner.<\/p>\n<p>Each data set should include from 2 to 10 columns in X, and one column in Y. The data should contain from 10 (minimum) to 1000 (maximum) rows.<\/p>\n<h3><span class=\"mw-headline\" id=\"Implement_author.28.29_.28Up_to_10_point_penalty.29\">Implement author() (Up to 10 point penalty)<\/span><\/h3>\n<p>You should implement a function called <tt>author()<\/tt> that returns your Georgia Tech user ID as a string. This is the ID you use to log into Canvas. It is not your 9 digit student number. Here is an example of how you might implement author():<\/p>\n<p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;ZGVmIGF1dGhvcigpOgogICAgICAgIHJldHVybiAndGIzNCcgIyByZXBsYWNlIHRiMzQgd2l0aCB5b3VyIEdlb3JnaWEgVGVjaCB1c2VybmFtZS4=&#8221; language=&#8221;python&#8221; _builder_version=&#8221;4.5.6&#8243;]ZGVmIGF1dGhvcigpOgogICAgICAgIHJldHVybiAndGIzNCcgIyByZXBsYWNlIHRiMzQgd2l0aCB5b3VyIEdlb3JnaWEgVGVjaCB1c2VybmFtZS4=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.5&#8243;]<\/p>\n<p><span>Implementing this method correctly does not provide any points, but there will be a penalty for not implementing it.<\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.8.0&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>What To Turn In<\/h2>\n<p>Be sure to follow these instructions diligently!<\/p>\n<h3>Gradescope:<\/h3>\n<ul>\n<li>(SUBMISSION) Project 4: Defeat Learners\n<ul>\n<li>Your code as <tt>gen_data.py<\/tt><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>We WILL NOT use your DTLearner, or LinRegLearner, so do not submit 
them.<\/p>\n<p>Do not submit any other files.<\/p>\n<p>You are only allowed 3 submissions to <strong>(SUBMISSION) Project 4: Defeat Learners<\/strong>\u00a0but\u00a0unlimited resubmissions are allowed on <strong>(TESTING) Project 4: Defeat Learners.<\/strong><\/p>\n<p>Note that Gradescope does <strong>not<\/strong> grade your assignment live; instead, it pre-validates that it will run against our batch autograder that we will run after the deadline. There will be <strong>no<\/strong> credit given for coding assignments that do not pass this pre-validation.<\/p>\n<p>Refer to the\u00a0<a href=\"http:\/\/lucylabs.gatech.edu\/ml4t\/spring2021\/gradescope\/\" target=\"_blank\" rel=\"noopener noreferrer\">Gradescope Instructions<\/a>\u00a0for more information.<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.5.6&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Rubric<\/h2>\n<h3>Report<\/h3>\n<p>No report<\/p>\n<h3>Code<\/h3>\n<p>See Auto-Grader<\/p>\n<h3>Auto-Grader [100 points]<\/h3>\n<p>Deductions:<\/p>\n<ul>\n<li>Does either dataset returned contain fewer or more than the allowed number of samples? (-20 points each)<\/li>\n<li>Does either dataset returned contain fewer or more than the allowed number of dimensions in X? (-20 points each)<\/li>\n<li>When the seed is the same does the best_4_lin_reg dataset generator return the same data? (-20 points otherwise)<\/li>\n<li>When the seed is the same does the best_4_dt dataset generator return the same data? (-20 points otherwise)<\/li>\n<li>When the seed is different does the best_4_lin_reg dataset generator return different data? (-20 points otherwise)<\/li>\n<li>When the seed is different does the best_4_dt dataset generator return different data? (-20 points otherwise)<\/li>\n<li>Is the author() method implemented? 
(-10 points if not)<\/li>\n<li>Does the code attempt to import a learner? (-10 points if so)<\/li>\n<\/ul>\n<p>For best_4_lin_reg (1 test case):<\/p>\n<ul>\n<li>We will call best_4_lin_reg 15 times, and select the 10 best datasets. For each successful test +5 points (total of 50 points)<\/li>\n<li>For each test case we will randomly select 60% of the data for training and 40% for testing.<\/li>\n<li>Success for each case is defined as: RMSE LinReg &lt; RMSE DT * 0.9<\/li>\n<\/ul>\n<p>For best_4_dt (1 test case):<\/p>\n<ul>\n<li>We will call best_4_dt 15 times, and select the 10 best datasets. For each successful test +5 points (total of 50 points)<\/li>\n<li>For each test case we will randomly select 60% of the data for training and 40% for testing.<\/li>\n<li>Success for each case is defined as: RMSE DT &lt; RMSE LinReg * 0.9<\/li>\n<\/ul>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.4.5&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.5&#8243;][et_pb_text _builder_version=&#8221;4.5.6&#8243; header_2_font=&#8221;|||on|on||||&#8221;]<\/p>\n<h2>Required, Allowed &amp; Prohibited<\/h2>\n<p>Required:<\/p>\n<ul>\n<li>No reading of data from files.<\/li>\n<li>Your project must be coded in Python 3.6.x.<\/li>\n<li>Your code must be submitted to Gradescope in the appropriate Gradescope assignment.<\/li>\n<li>Your code must run in less than 5 seconds per test case.<\/li>\n<li>The code you submit should NOT include any data reading routines. You should generate all of your data within your functions.<\/li>\n<li>The code you submit should NOT generate any output: No prints, no charts, etc.<\/li>\n<li>Reference any code used in the \u201cAllowed\u201d section in your code. 
At minimum it should have the link\/filename\/video name of where it came from.<\/li>\n<\/ul>\n<p>Allowed:<\/p>\n<ul>\n<li>You can develop your code on your personal machine, but it must also run successfully on Gradescope.<\/li>\n<li>Your code may use standard Python libraries.<\/li>\n<li>You may use the NumPy, SciPy, matplotlib and Pandas libraries. Be sure you are using the correct versions.<\/li>\n<li>Code provided by the instructor, or allowed by the instructor to be shared.<\/li>\n<li>Cheese.<\/li>\n<\/ul>\n<p>Prohibited:<\/p>\n<ul>\n<li>Any reading of data files.<\/li>\n<li>Any libraries not listed in the &#8220;allowed&#8221; section above.<\/li>\n<li>Any code you did not write yourself.<\/li>\n<li>Any classes (other than Random) that create their own instance variables for later use (e.g., learners like kdtree).<\/li>\n<li>Code that includes any data reading routines.<\/li>\n<li>Code that generates any output: No prints, no charts, etc.<\/li>\n<li>Absolute import statements of the <strong>current<\/strong> project folder such as <code>from defeat_learners import XXXX<\/code> or <code>import defeat_learners.XXXX<\/code><\/li>\n<li>Extra directories (created manually or by code)<\/li>\n<li>Extra files not listed in &#8220;WHAT TO TURN IN&#8221;<\/li>\n<li>Ducks and wood.<\/li>\n<\/ul>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Project 4: Defeat Learners. Due Date: 02\/21\/2021 11:59 PM Anywhere on Earth time. Revisions: This assignment is subject to change up until 3 weeks prior to the due date. We do not anticipate changes; any changes will be logged in this section. Overview: This assignment counts towards 5% of your overall grade. 
For this homework, you will [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":1316,"menu_order":8,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-1329","page","type-page","status-publish","hentry"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/comments?post=1329"}],"version-history":[{"count":6,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1329\/revisions"}],"predecessor-version":[{"id":1570,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1329\/revisions\/1570"}],"up":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1316"}],"wp:attachment":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/media?parent=1329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}