Project 7: Q-Learning Robot Documentation
QLearner.py
- class QLearner.QLearner(num_states=100, num_actions=4, alpha=0.2, gamma=0.9, rar=0.5, radr=0.99, dyna=0, verbose=False)
- 
This is a Q learner object. - Parameters
- 
- num_states (int) – The number of states to consider.
- num_actions (int) – The number of actions available..
- alpha (float) – The learning rate used in the update rule. Should range between 0.0 and 1.0 with 0.2 as a typical value.
- gamma (float) – The discount rate used in the update rule. Should range between 0.0 and 1.0 with 0.9 as a typical value.
- rar (float) – Random action rate: the probability of selecting a random action at each step. Should range between 0.0 (no random actions) to 1.0 (always random action) with 0.5 as a typical value.
- radr (float) – Random action decay rate, after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay). Typically 0.99.
- dyna (int) – The number of dyna updates for each regular update. When Dyna is used, 200 is a typical value.
- verbose (bool) – If “verbose” is True, your code can print out information for debugging.
 
 - query(s_prime, r)
- 
Update the Q table and return an action - Parameters
- 
- s_prime (int) – The new state
- r (float) – The immediate reward
 
- Returns
- 
The selected action 
- Return type
- 
int 
 
 - querysetstate(s)
- 
Update the state without updating the Q-table - Parameters
- 
s (int) – The new state 
- Returns
- 
The selected action 
- Return type
- 
int 
 
 
testqlearner.py
- discretize(pos)
- 
convert the location to a single integer - Parameters
- 
pos (int, int) – the position to discretize 
- Returns
- 
the discretized position 
- Return type
- 
int 
 
- getgoalpos(data)
- 
find where the goal is in the map - Parameters
- 
data (array) – 2D array that stores the map 
- Returns
- 
the position of the goal 
- Return type
- 
tuple(int, int) 
 
- getrobotpos(data)
- 
Finds where the robot is in the map - Parameters
- 
data (array) – 2D array that stores the map 
- Returns
- 
the position of the robot 
- Return type
- 
int, int 
 
- movebot(data, oldpos, a)
- 
move the robot and report reward - Parameters
- 
- data (array) – 2D array that stores the map
- oldpos (int, int) – old position of the robot
- a (int) – the action to take
 
- Returns
- 
the new position of the robot and the reward 
- Return type
- 
tuple(int, int), int 
 
- printmap(data)
- 
Prints out the map - Parameters
- 
data (array) – 2D array that stores the map 
 
- test(map, epochs, learner, verbose)
- 
function to test the code - Parameters
- 
- map (array) – 2D array that stores the map
- epochs (int) – each epoch involves one trip to the goal
- learner (QLearner) – the qlearner object
- verbose (bool) – If “verbose” is True, your code can print out information for debugging.
 If verbose = False your code should not generate ANY output. When we test your code, verbose will be False.
 
- Returns
- 
the total reward 
- Return type
- 
np.float64 
 
