{"id":1333,"date":"2021-01-14T00:24:29","date_gmt":"2021-01-14T00:24:29","guid":{"rendered":"http:\/\/lucylabs.gatech.edu\/ml4t\/?page_id=1333"},"modified":"2021-01-14T02:16:30","modified_gmt":"2021-01-14T02:16:30","slug":"project-7","status":"publish","type":"page","link":"https:\/\/lucylabs.gatech.edu\/ml4t\/spring2021\/code-documentation\/project-7\/","title":{"rendered":"Project 7"},"content":{"rendered":"\n[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;3.22&#8243;][et_pb_row _builder_version=&#8221;3.25&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;3.25&#8243; custom_padding=&#8221;|||&#8221; custom_padding__hover=&#8221;|||&#8221;][et_pb_text _builder_version=&#8221;4.5.6&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; hover_enabled=&#8221;0&#8243;]<div class=\"document\">\n<div class=\"documentwrapper\">\n<div class=\"bodywrapper\">\n<div class=\"body\" role=\"main\">\n<div class=\"section\" id=\"module-QLearner\">\n<h1 style=\"text-align: center;\">Project 7: Q-Learning Robot Documentation<\/h1>\n<h2 style=\"text-align: left;\"><span style=\"text-decoration: underline;\"><span class=\"target\">QLearner.py<\/span><\/span><\/h2>\n<dl class=\"class\">\n<dt id=\"QLearner.QLearner\"><em class=\"property\">class <\/em><code class=\"sig-prename descclassname\">QLearner.<\/code><code class=\"sig-name descname\">QLearner<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">num_states=100<\/em>, <em class=\"sig-param\">num_actions=4<\/em>, <em class=\"sig-param\">alpha=0.2<\/em>, <em class=\"sig-param\">gamma=0.9<\/em>, <em 
class=\"sig-param\">rar=0.5<\/em>, <em class=\"sig-param\">radr=0.99<\/em>, <em class=\"sig-param\">dyna=0<\/em>, <em class=\"sig-param\">verbose=False<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>This is a Q-learner object.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>num_states<\/strong> (<em>int<\/em>) \u2013 The number of states to consider.<\/li>\n<li><strong>num_actions<\/strong> (<em>int<\/em>) \u2013 The number of actions available.<\/li>\n<li><strong>alpha<\/strong> (<em>float<\/em>) \u2013 The learning rate used in the update rule. Should range between 0.0 and 1.0, with 0.2 as a typical value.<\/li>\n<li><strong>gamma<\/strong> (<em>float<\/em>) \u2013 The discount rate used in the update rule. Should range between 0.0 and 1.0, with 0.9 as a typical value.<\/li>\n<li><strong>rar<\/strong> (<em>float<\/em>) \u2013 Random action rate: the probability of selecting a random action at each step. Should range from 0.0 (no random actions) to 1.0 (always random action), with 0.5 as a typical value.<\/li>\n<li><strong>radr<\/strong> (<em>float<\/em>) \u2013 Random action decay rate: after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay). Typically 0.99.<\/li>\n<li><strong>dyna<\/strong> (<em>int<\/em>) \u2013 The number of Dyna updates for each regular update. 
When Dyna is used, 200 is a typical value.<\/li>\n<li><strong>verbose<\/strong> (<em>bool<\/em>) \u2013 If \u201cverbose\u201d is True, your code can print out information for debugging.<\/li>\n<\/ul>\n<\/dd>\n<\/dl>\n<dl class=\"method\">\n<dt id=\"QLearner.QLearner.query\"><code class=\"sig-name descname\">query<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">s_prime<\/em>, <em class=\"sig-param\">r<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Update the Q-table and return an action.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>s_prime<\/strong> (<em>int<\/em>) \u2013 The new state<\/li>\n<li><strong>r<\/strong> (<em>float<\/em>) \u2013 The immediate reward<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The selected action<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"method\">\n<dt id=\"QLearner.QLearner.querysetstate\"><code class=\"sig-name descname\">querysetstate<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">s<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Set the state without updating the Q-table, and return an action.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>s<\/strong> (<em>int<\/em>) \u2013 The new state<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The selected action<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<p><span class=\"target\" id=\"module-testqlearner\"><\/span><\/p>\n<h2 style=\"text-align: left;\"><span style=\"text-decoration: underline;\"><span 
class=\"target\">testqlearner.py<\/span><\/span><\/h2>\n<dl class=\"function\">\n<dt id=\"testqlearner.discretize\"><code class=\"sig-name descname\">discretize<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">pos<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Convert the location to a single integer.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>pos<\/strong> (<em>int<\/em><em>, <\/em><em>int<\/em>) \u2013 The position to discretize<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The discretized position<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.getgoalpos\"><code class=\"sig-name descname\">getgoalpos<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Find where the goal is in the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The position of the goal<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int)<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.getrobotpos\"><code class=\"sig-name descname\">getrobotpos<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Find where the robot is in the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array 
that stores the map<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The position of the robot<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int)<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.movebot\"><code class=\"sig-name descname\">movebot<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em>, <em class=\"sig-param\">oldpos<\/em>, <em class=\"sig-param\">a<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Move the robot and report the reward.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map<\/li>\n<li><strong>oldpos<\/strong> (<em>int<\/em><em>, <\/em><em>int<\/em>) \u2013 The old position of the robot<\/li>\n<li><strong>a<\/strong> (<em>int<\/em>) \u2013 The action to take<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The new position of the robot and the reward<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int), int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.printmap\"><code class=\"sig-name descname\">printmap<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Print out the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.test\"><code class=\"sig-name descname\">test<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">map<\/em>, <em class=\"sig-param\">epochs<\/em>, <em class=\"sig-param\">learner<\/em>, <em 
class=\"sig-param\">verbose<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Test the learner on the map for the given number of epochs.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>map<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map<\/li>\n<li><strong>epochs<\/strong> (<em>int<\/em>) \u2013 The number of epochs to run; each epoch is one trip to the goal<\/li>\n<li><strong>learner<\/strong> (<em>QLearner<\/em>) \u2013 The QLearner object<\/li>\n<li><strong>verbose<\/strong> (<em>bool<\/em>) \u2013 If \u201cverbose\u201d is True, your code can print out information for debugging.<br \/> If verbose is False, your code should not generate ANY output. When we test your code, verbose will be False.<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The total reward<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>np.float64<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"footer\">\u00a92020, ML4T Staff |\u00a0Powered by <a href=\"http:\/\/sphinx-doc.org\/\">Sphinx 2.2.0<\/a> &amp; <a href=\"https:\/\/github.com\/bitprophet\/alabaster\">Alabaster 0.7.12<\/a><\/div>\n[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]\n","protected":false},"excerpt":{"rendered":"<p>Project 7: Q-Learning Robot Documentation QLearner.py class QLearner.QLearner(num_states=100, num_actions=4, alpha=0.2, gamma=0.9, rar=0.5, radr=0.99, dyna=0, verbose=False) This is a Q-learner object. Parameters num_states (int) \u2013 The number of states to consider. num_actions (int) \u2013 The number of actions available. alpha (float) \u2013 The learning rate used in the update rule. 
Should range between 0.0 and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":1342,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-1333","page","type-page","status-publish","hentry"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1333","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/comments?post=1333"}],"version-history":[{"count":1,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1333\/revisions"}],"predecessor-version":[{"id":1367,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1333\/revisions\/1367"}],"up":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/1342"}],"wp:attachment":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/media?parent=1333"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}