Project 7: Q-Learning Robot Documentation
QLearner.py
class QLearner.QLearner(num_states=100, num_actions=4, alpha=0.2, gamma=0.9, rar=0.5, radr=0.99, dyna=0, verbose=False)
This is a Q learner object.
- Parameters
-
- num_states (int) – The number of states to consider.
- num_actions (int) – The number of actions available.
- alpha (float) – The learning rate used in the update rule. Should range between 0.0 and 1.0 with 0.2 as a typical value.
- gamma (float) – The discount rate used in the update rule. Should range between 0.0 and 1.0 with 0.9 as a typical value.
- rar (float) – Random action rate: the probability of selecting a random action at each step. Should range between 0.0 (no random actions) and 1.0 (always a random action), with 0.5 as a typical value.
- radr (float) – Random action decay rate: after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay), with 0.99 as a typical value.
- dyna (int) – The number of Dyna updates for each regular update. When Dyna is used, 200 is a typical value.
- verbose (bool) – If “verbose” is True, your code can print out information for debugging.
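To make the interaction between rar and radr concrete, the short sketch below applies the decay rule from the radr description (rar = rar * radr after each update) a fixed number of times. The helper name rar_after is hypothetical and is not part of QLearner.py.

```python
def rar_after(rar, radr, n_updates):
    """Return the random action rate after n_updates applications of rar = rar * radr."""
    for _ in range(n_updates):
        rar = rar * radr
    return rar


# With the typical values rar=0.5 and radr=0.99, exploration shrinks gradually.
for n in (0, 10, 100, 500):
    print(n, round(rar_after(0.5, 0.99, n), 4))
```

With radr=1.0 the rate never changes, and with radr=0.0 it drops to zero after the first update, matching the two extremes listed above.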
query(s_prime, r)
Update the Q-table and return an action.
- Parameters
-
- s_prime (int) – The new state
- r (float) – The immediate reward
- Returns
-
The selected action
- Return type
-
int
querysetstate(s)
Update the state without updating the Q-table, and return the selected action.
- Parameters
-
s (int) – The new state
- Returns
-
The selected action
- Return type
-
int
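The two methods above can be sketched together as a minimal tabular learner. This is an illustration under stated assumptions, not the reference implementation: the class name SketchQLearner, the zero-initialized table, the seed argument, and the private helper _choose are all hypothetical, the update shown is the standard Q-learning rule, and Dyna is left out.

```python
import numpy as np


class SketchQLearner:
    """Hypothetical tabular Q-learner mirroring the documented interface (no Dyna)."""

    def __init__(self, num_states=100, num_actions=4, alpha=0.2, gamma=0.9,
                 rar=0.5, radr=0.99, seed=0):
        self.num_actions = num_actions
        self.alpha, self.gamma = alpha, gamma
        self.rar, self.radr = rar, radr
        self.Q = np.zeros((num_states, num_actions))  # assumed zero initialization
        self.rng = np.random.default_rng(seed)  # seed is an added convenience
        self.s, self.a = 0, 0

    def _choose(self, s):
        # With probability rar pick a random action, otherwise the greedy one.
        if self.rng.random() < self.rar:
            return int(self.rng.integers(self.num_actions))
        return int(np.argmax(self.Q[s]))

    def querysetstate(self, s):
        # Record the state and pick an action; the Q-table is left untouched.
        self.s = s
        self.a = self._choose(s)
        return self.a

    def query(self, s_prime, r):
        # Standard Q-learning update for the previous (state, action) pair.
        target = r + self.gamma * np.max(self.Q[s_prime])
        self.Q[self.s, self.a] = ((1 - self.alpha) * self.Q[self.s, self.a]
                                  + self.alpha * target)
        # Choose the next action, then decay the random action rate.
        self.s = s_prime
        self.a = self._choose(s_prime)
        self.rar *= self.radr
        return self.a
```

A caller would typically invoke querysetstate once to set the starting state, then call query after every move with the new state and the reward just received.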
testqlearner.py
discretize(pos)
Convert the location to a single integer.
- Parameters
-
pos (int, int) – the position to discretize
- Returns
-
the discretized position
- Return type
-
int
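One common way to turn a (row, column) pair into a single integer is row-major indexing, sketched below. The function name discretize_sketch and the num_cols parameter are hypothetical; the default of 10 columns is an assumption chosen so that a 10 x 10 map yields the 100 states used as the QLearner default, and the real discretize may use a different mapping.

```python
def discretize_sketch(pos, num_cols=10):
    """Map a (row, col) position to a single integer state index."""
    row, col = pos
    # Row-major layout: every cell gets a distinct index as long as col < num_cols.
    return row * num_cols + col


print(discretize_sketch((2, 3)))
```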
getgoalpos(data)
Find where the goal is in the map.
- Parameters
-
data (array) – 2D array that stores the map
- Returns
-
the position of the goal
- Return type
-
tuple(int, int)
getrobotpos(data)
Find where the robot is in the map.
- Parameters
-
data (array) – 2D array that stores the map
- Returns
-
the position of the robot
- Return type
-
tuple(int, int)
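Both getgoalpos and getrobotpos amount to searching the 2D map for a particular cell value. The sketch below shows one way to do that with NumPy. The helper name find_cell and the cell codes ROBOT = 2 and GOAL = 3 are assumptions made for illustration; this page does not state how the map encodes its cells.

```python
import numpy as np

ROBOT, GOAL = 2, 3  # assumed cell codes; the real map encoding may differ


def find_cell(data, code):
    """Return (row, col) of the first cell equal to code, or None if absent."""
    hits = np.argwhere(np.asarray(data) == code)
    if hits.size == 0:
        return None
    row, col = hits[0]
    return int(row), int(col)


grid = np.zeros((3, 4), dtype=int)
grid[1, 2] = ROBOT
grid[2, 0] = GOAL
print(find_cell(grid, ROBOT), find_cell(grid, GOAL))
```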
movebot(data, oldpos, a)
Move the robot and report the reward.
- Parameters
-
- data (array) – 2D array that stores the map
- oldpos (int, int) – old position of the robot
- a (int) – the action to take
- Returns
-
the new position of the robot and the reward
- Return type
-
tuple(int, int), int
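As a rough illustration of what a move-and-reward step can look like, the sketch below applies an action to a position, keeps the robot in place when the move is blocked, and returns the new position with a reward. Everything specific here is an assumption rather than documented behavior: the name movebot_sketch, the action encoding (0 = north, 1 = east, 2 = south, 3 = west), the cell codes, and the reward values of -1 per step and +1 at the goal.

```python
import numpy as np

OBSTACLE, GOAL = 1, 3  # assumed cell codes
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # assumed: N, E, S, W


def movebot_sketch(data, oldpos, a):
    """Return (newpos, reward) after attempting action a from oldpos."""
    data = np.asarray(data)
    d_row, d_col = MOVES[a]
    row, col = oldpos[0] + d_row, oldpos[1] + d_col
    # Stay in place when the move leaves the map or hits an obstacle.
    off_map = not (0 <= row < data.shape[0] and 0 <= col < data.shape[1])
    if off_map or data[row, col] == OBSTACLE:
        row, col = oldpos
    reward = 1 if data[row, col] == GOAL else -1  # assumed reward values
    return (int(row), int(col)), reward
```

Returning the position and the reward together matches the documented return type, so a caller can unpack both in one assignment.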
printmap(data)
Print the map.
- Parameters
-
data (array) – 2D array that stores the map
test(map, epochs, learner, verbose)
Test the learner on the map.
- Parameters
-
- map (array) – 2D array that stores the map
- epochs (int) – each epoch involves one trip to the goal
- learner (QLearner) – the qlearner object
- verbose (bool) – If “verbose” is True, your code can print out information for debugging. If verbose is False, your code should not generate ANY output. When we test your code, verbose will be False.
- Returns
-
the total reward
- Return type
-
np.float64
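The epoch structure described above (one trip to the goal per epoch, with the learner queried after every move) can be sketched as a small driver loop. To stay self-contained, this sketch replaces the 2D map with a toy one-dimensional corridor and uses a stand-in learner that acts randomly. The names RandomLearner, step, and run_epochs, the max_steps cap, and the reward values are all hypothetical; only the querysetstate/query calling pattern comes from the interface documented on this page.

```python
import random


class RandomLearner:
    """Stand-in learner exposing the documented querysetstate/query interface."""

    def __init__(self, num_actions=2, seed=0):
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def querysetstate(self, s):
        return self.rng.randrange(self.num_actions)

    def query(self, s_prime, r):
        return self.rng.randrange(self.num_actions)


def step(state, action, goal):
    """Toy corridor: action 0 moves left, action 1 moves right; -1 per step, +1 at goal."""
    new_state = max(0, state + (1 if action == 1 else -1))
    new_state = min(new_state, goal)
    return new_state, (1.0 if new_state == goal else -1.0)


def run_epochs(learner, epochs, start=0, goal=3, max_steps=1000):
    """Return the total reward of each epoch (one trip toward the goal per epoch)."""
    totals = []
    for _ in range(epochs):
        state, total = start, 0.0
        action = learner.querysetstate(state)  # set the start state, no Q update
        for _ in range(max_steps):
            state, reward = step(state, action, goal)
            action = learner.query(state, reward)  # learn and pick next action
            total += reward
            if state == goal:
                break
        totals.append(total)
    return totals


print(run_epochs(RandomLearner(), epochs=3))
```

The max_steps cap is a design choice for the sketch: it guarantees the loop ends even if a poorly performing learner never reaches the goal.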