{"id":3729,"date":"2023-05-15T20:07:04","date_gmt":"2023-05-15T20:07:04","guid":{"rendered":"https:\/\/lucylabs.gatech.edu\/ml4t\/?page_id=3729"},"modified":"2023-05-15T20:11:08","modified_gmt":"2023-05-15T20:11:08","slug":"project-7-documentation","status":"publish","type":"page","link":"https:\/\/lucylabs.gatech.edu\/ml4t\/summer2023\/project-7-documentation\/","title":{"rendered":"PROJECT 7: Q-LEARNING ROBOT"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_row _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;|||&#8221; global_colors_info=&#8221;{}&#8221; custom_padding__hover=&#8221;|||&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_text _builder_version=&#8221;4.16&#8243; _module_preset=&#8221;0eb20e7f-3c16-46bb-9826-a50862b0674a&#8221; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;]<\/p>\n<div class=\"document\">\n<div class=\"documentwrapper\">\n<div class=\"bodywrapper\">\n<div class=\"body\" role=\"main\">\n<div class=\"section\" id=\"module-QLearner\">\n<h1 style=\"text-align: center;\">Project 7: Q-Learning Robot Documentation<\/h1>\n<h2 style=\"text-align: left;\"><span style=\"text-decoration: underline;\"><span 
class=\"target\">QLearner.py<\/span><\/span><\/h2>\n<dl class=\"class\">\n<dt id=\"QLearner.QLearner\"><em class=\"property\">class <\/em><code class=\"sig-prename descclassname\">QLearner.<\/code><code class=\"sig-name descname\">QLearner<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">num_states=100<\/em>, <em class=\"sig-param\">num_actions=4<\/em>, <em class=\"sig-param\">alpha=0.2<\/em>, <em class=\"sig-param\">gamma=0.9<\/em>, <em class=\"sig-param\">rar=0.5<\/em>, <em class=\"sig-param\">radr=0.99<\/em>, <em class=\"sig-param\">dyna=0<\/em>, <em class=\"sig-param\">verbose=False<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>This is a Q-learner object.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>num_states<\/strong> (<em>int<\/em>) \u2013 The number of states to consider.<\/li>\n<li><strong>num_actions<\/strong> (<em>int<\/em>) \u2013 The number of actions available.<\/li>\n<li><strong>alpha<\/strong> (<em>float<\/em>) \u2013 The learning rate used in the update rule. Should range between 0.0 and 1.0, with 0.2 as a typical value.<\/li>\n<li><strong>gamma<\/strong> (<em>float<\/em>) \u2013 The discount rate used in the update rule. Should range between 0.0 and 1.0, with 0.9 as a typical value.<\/li>\n<li><strong>rar<\/strong> (<em>float<\/em>) \u2013 Random action rate: the probability of selecting a random action at each step. Should range between 0.0 (no random actions) and 1.0 (always a random action), with 0.5 as a typical value.<\/li>\n<li><strong>radr<\/strong> (<em>float<\/em>) \u2013 Random action decay rate: after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay). Typically 0.99.<\/li>\n<li><strong>dyna<\/strong> (<em>int<\/em>) \u2013 The number of Dyna updates performed for each regular update. 
When Dyna is used, 200 is a typical value.<\/li>\n<li><strong>verbose<\/strong> (<em>bool<\/em>) \u2013 If \u201cverbose\u201d is True, your code can print out information for debugging.<\/li>\n<\/ul>\n<\/dd>\n<\/dl>\n<dl class=\"method\">\n<dt id=\"QLearner.QLearner.query\"><code class=\"sig-name descname\">query<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">s_prime<\/em>, <em class=\"sig-param\">r<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Update the Q-table using the most recent experience and return the next action.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>s_prime<\/strong> (<em>int<\/em>) \u2013 The new state.<\/li>\n<li><strong>r<\/strong> (<em>float<\/em>) \u2013 The immediate reward.<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The selected action.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"method\">\n<dt id=\"QLearner.QLearner.querysetstate\"><code class=\"sig-name descname\">querysetstate<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">s<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Set the current state and return an action without updating the Q-table.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>s<\/strong> (<em>int<\/em>) \u2013 The new state.<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The selected action.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<p><span class=\"target\" id=\"module-testqlearner\"><\/span><\/p>\n<h2 style=\"text-align: left;\"><span style=\"text-decoration: underline;\"><span 
class=\"target\">testqlearner.py<\/span><\/span><\/h2>\n<dl class=\"function\">\n<dt id=\"testqlearner.discretize\"><code class=\"sig-name descname\">discretize<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">pos<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Convert the location to a single integer.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>pos<\/strong> (<em>tuple(int, int)<\/em>) \u2013 The position to discretize.<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The discretized position.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.getgoalpos\"><code class=\"sig-name descname\">getgoalpos<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Find the position of the goal in the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map.<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The position of the goal.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int)<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.getrobotpos\"><code class=\"sig-name descname\">getrobotpos<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Find the position of the robot in the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array 
that stores the map.<\/p>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The position of the robot.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int)<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.movebot\"><code class=\"sig-name descname\">movebot<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em>, <em class=\"sig-param\">oldpos<\/em>, <em class=\"sig-param\">a<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Move the robot and report the reward.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map.<\/li>\n<li><strong>oldpos<\/strong> (<em>tuple(int, int)<\/em>) \u2013 The old position of the robot.<\/li>\n<li><strong>a<\/strong> (<em>int<\/em>) \u2013 The action to take.<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The new position of the robot and the reward.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>tuple(int, int), int<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.printmap\"><code class=\"sig-name descname\">printmap<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">data<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Print the map.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<p><strong>data<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map.<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<dl class=\"function\">\n<dt id=\"testqlearner.test\"><code class=\"sig-name descname\">test<\/code><span class=\"sig-paren\">(<\/span><em class=\"sig-param\">map<\/em>, <em class=\"sig-param\">epochs<\/em>, <em class=\"sig-param\">learner<\/em>, <em 
class=\"sig-param\">verbose<\/em><span class=\"sig-paren\">)<\/span><\/dt>\n<dd>\n<p>Test the Q-learner by running it on the map for the given number of epochs.<\/p>\n<dl class=\"field-list simple\">\n<dt class=\"field-odd\">Parameters<\/dt>\n<dd class=\"field-odd\">\n<ul class=\"simple\">\n<li><strong>map<\/strong> (<em>array<\/em>) \u2013 2D array that stores the map.<\/li>\n<li><strong>epochs<\/strong> (<em>int<\/em>) \u2013 The number of epochs; each epoch involves one trip to the goal.<\/li>\n<li><strong>learner<\/strong> (<em>QLearner<\/em>) \u2013 The QLearner object.<\/li>\n<li><strong>verbose<\/strong> (<em>bool<\/em>) \u2013 If \u201cverbose\u201d is True, your code can print out information for debugging.<br \/> If verbose is False, your code should not generate ANY output. When we test your code, verbose will be False.<\/li>\n<\/ul>\n<\/dd>\n<dt class=\"field-even\">Returns<\/dt>\n<dd class=\"field-even\">\n<p>The total reward.<\/p>\n<\/dd>\n<dt class=\"field-odd\">Return type<\/dt>\n<dd class=\"field-odd\">\n<p>np.float64<\/p>\n<\/dd>\n<\/dl>\n<\/dd>\n<\/dl>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><!-- \/divi:html --> <!-- divi:html --><\/p>\n<div class=\"clearer\"><\/div>\n<p><!-- \/divi:html --> <!-- divi:html --><\/p>\n<div class=\"footer\">\u00a92020, ML4T Staff |\u00a0Powered by <a href=\"http:\/\/sphinx-doc.org\/\">Sphinx 2.2.0<\/a> &amp; <a href=\"https:\/\/github.com\/bitprophet\/alabaster\">Alabaster 0.7.12<\/a><\/div>\n<p><!-- \/divi:html --><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Project 7: Q-Learning Robot Documentation QLearner.py class QLearner.QLearner(num_states=100, num_actions=4, alpha=0.2, gamma=0.9, rar=0.5, radr=0.99, dyna=0, verbose=False) This is a Q-learner object. Parameters num_states (int) \u2013 The number of states to consider. num_actions (int) \u2013 The number of actions available. alpha (float) \u2013 The learning rate used in the update rule. 
Should range between 0.0 and [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":3602,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-3729","page","type-page","status-publish","hentry"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/3729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/comments?post=3729"}],"version-history":[{"count":2,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/3729\/revisions"}],"predecessor-version":[{"id":3743,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/3729\/revisions\/3743"}],"up":[{"embeddable":true,"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/pages\/3602"}],"wp:attachment":[{"href":"https:\/\/lucylabs.gatech.edu\/ml4t\/wp-json\/wp\/v2\/media?parent=3729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}