Project 7: Q-Learning Robot Documentation

QLearner.py

class QLearner.QLearner(num_states=100, num_actions=4, alpha=0.2, gamma=0.9, rar=0.5, radr=0.99, dyna=0, verbose=False)

This is a Q-learner object.

Parameters
  • num_states (int) – The number of states to consider.
  • num_actions (int) – The number of actions available.
  • alpha (float) – The learning rate used in the update rule. Should range between 0.0 and 1.0, with 0.2 as a typical value.
  • gamma (float) – The discount rate used in the update rule. Should range between 0.0 and 1.0, with 0.9 as a typical value.
  • rar (float) – Random action rate: the probability of selecting a random action at each step. Should range from 0.0 (no random actions) to 1.0 (always a random action), with 0.5 as a typical value.
  • radr (float) – Random action decay rate: after each update, rar = rar * radr. Ranges between 0.0 (immediate decay to 0) and 1.0 (no decay). Typically 0.99.
  • dyna (int) – The number of Dyna updates for each regular update; 0 disables Dyna. When Dyna is used, 200 is a typical value.
  • verbose (bool) – If “verbose” is True, your code can print out information for debugging.
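
Because radr multiplies rar after every update, rar after n updates equals the initial rar times radr**n. The short sketch below uses the typical values rar=0.5 and radr=0.99 and counts how many updates pass before the random action rate falls below 0.01; the threshold and the helper name updates_until_below are illustrative assumptions, not part of the project API.

```python
def updates_until_below(rar: float, radr: float, threshold: float) -> int:
    """Count updates until rar drops below threshold, applying rar = rar * radr each time.

    Hypothetical helper for illustrating the radr decay rule.
    """
    if not 0.0 <= radr < 1.0:
        raise ValueError("radr must be in [0.0, 1.0) for rar to decay")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    n = 0
    while rar >= threshold:
        rar *= radr  # the decay rule applied after each update
        n += 1
    return n


# Typical values from the parameter list; 0.01 is an arbitrary illustrative threshold.
print(updates_until_below(rar=0.5, radr=0.99, threshold=0.01))  # 390
```

In other words, with the typical settings the learner is still exploring noticeably for a few hundred updates before it becomes almost purely greedy.
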

query(s_prime, r)

Update the Q-table with the latest experience and return the next action.

Parameters
  • s_prime (int) – The new state.
  • r (float) – The immediate reward.
Returns

The selected action

Return type

int
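
Assuming query uses the standard tabular Q-learning rule, the experience tuple <s, a, s_prime, r> (where s and a are the state and action from the previous call) updates one table entry: Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * max over a' of Q[s_prime, a']). The function q_update below is a hypothetical, minimal NumPy sketch of just that rule, not the project's required implementation.

```python
import numpy as np


def q_update(Q: np.ndarray, s: int, a: int, s_prime: int, r: float,
             alpha: float = 0.2, gamma: float = 0.9) -> None:
    """Apply one tabular Q-learning update in place for experience <s, a, s_prime, r>."""
    target = r + gamma * np.max(Q[s_prime])  # immediate reward plus discounted best future value
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target


Q = np.zeros((100, 4))
Q[7] = [0.0, 2.0, 1.0, 0.0]  # pretend state 7 already has some value estimates
q_update(Q, s=6, a=1, s_prime=7, r=-1.0)
print(Q[6, 1])  # approximately 0.16, i.e. 0.2 * (-1.0 + 0.9 * 2.0)
```

Only Q[6, 1] changes; the row for s_prime is read but never written.
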

querysetstate(s)

Set the current state and return an action, without updating the Q-table.

Parameters

s (int) – The new state

Returns

The selected action

Return type

int
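
Taken together, querysetstate and query imply a learner that remembers its last state and action between calls. The class below is a minimal sketch under that assumption: epsilon-greedy action selection driven by rar, rar decayed by radr after each query, and, when dyna > 0, replay of hallucinated experiences from a simple model that stores the most recently observed (s_prime, r) for each (s, a). That deterministic model is a simplification chosen for brevity (tabular Dyna-Q in the style of Sutton and Barto), not necessarily the model the project expects; the seed argument, the model dictionary, and the private method names are assumptions.

```python
import numpy as np


class QLearner:
    """Minimal tabular Q-learner with optional Dyna-Q planning (illustrative sketch)."""

    def __init__(self, num_states=100, num_actions=4, alpha=0.2, gamma=0.9,
                 rar=0.5, radr=0.99, dyna=0, verbose=False, seed=None):
        # `seed` is a hypothetical extra argument, added only to make runs reproducible.
        self.num_actions = num_actions
        self.alpha = alpha
        self.gamma = gamma
        self.rar = rar
        self.radr = radr
        self.dyna = dyna
        self.verbose = verbose
        self.rng = np.random.default_rng(seed)
        self.Q = np.zeros((num_states, num_actions))
        self.model = {}  # (s, a) -> (s_prime, r): last observed outcome, used by Dyna
        self.s = 0
        self.a = 0

    def _choose_action(self, s):
        # Epsilon-greedy: random action with probability rar, otherwise the greedy action.
        if self.rng.random() < self.rar:
            return int(self.rng.integers(self.num_actions))
        return int(np.argmax(self.Q[s]))

    def _update(self, s, a, s_prime, r):
        # Standard Q-learning update for the experience tuple <s, a, s_prime, r>.
        target = r + self.gamma * np.max(self.Q[s_prime])
        self.Q[s, a] = (1.0 - self.alpha) * self.Q[s, a] + self.alpha * target

    def querysetstate(self, s):
        """Set the current state and pick an action without touching the Q-table."""
        self.s = s
        self.a = self._choose_action(s)
        return self.a

    def query(self, s_prime, r):
        """Learn from <self.s, self.a, s_prime, r>, then pick and return the next action."""
        self._update(self.s, self.a, s_prime, r)
        if self.dyna > 0:
            self.model[(self.s, self.a)] = (s_prime, r)
            keys = list(self.model.keys())
            for _ in range(self.dyna):
                # Replay a randomly chosen, previously seen (s, a) pair from the model.
                hs, ha = keys[self.rng.integers(len(keys))]
                hs_prime, hr = self.model[(hs, ha)]
                self._update(hs, ha, hs_prime, hr)
        self.s = s_prime
        self.a = self._choose_action(s_prime)
        self.rar *= self.radr  # decay the random action rate after each update
        if self.verbose:
            print(f"s={s_prime}, a={self.a}, r={r}")
        return self.a
```

A typical episode calls querysetstate once with the start state and then query once per move. Keeping querysetstate free of Q-table writes means it can also be used to follow an already learned policy.
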

testqlearner.py

discretize(pos)

Convert a 2D position to a single integer state.

Parameters

pos (int, int) – the position to discretize

Returns

the discretized position

Return type

int
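
With the default num_states=100, one natural encoding is row-major order on a 10x10 map: state = row * 10 + col. The sketch below assumes that layout and a (row, col) ordering of pos; the grid width and the num_cols parameter are assumptions, since the documentation does not fix them.

```python
def discretize(pos, num_cols=10):
    """Map a (row, col) position to one integer state in row-major order.

    num_cols is a hypothetical parameter; the default of 10 assumes a 10x10 map,
    which matches the default num_states=100.
    """
    row, col = pos
    if row < 0 or not 0 <= col < num_cols:
        raise ValueError(f"position {pos} is outside a grid {num_cols} columns wide")
    return row * num_cols + col


print(discretize((0, 0)))  # 0
print(discretize((3, 7)))  # 37
print(discretize((9, 9)))  # 99
```

Row-major encoding is a bijection between cells and states, so every cell gets a distinct row of the Q-table.
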

getgoalpos(data)

Find where the goal is in the map.

Parameters

data (array) – 2D array that stores the map

Returns

the position of the goal

Return type

tuple(int, int)

getrobotpos(data)

Find where the robot is in the map.

Parameters

data (array) – 2D array that stores the map

Returns

the position of the robot

Return type

tuple(int, int)
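
getgoalpos and getrobotpos both reduce to locating the cell that holds a particular marker value. The sketch below assumes the map marks the robot's start with 2 and the goal with 3, and returns the first match in row-major order; the marker values and the shared helper find_marker are assumptions for illustration.

```python
import numpy as np

ROBOT = 2  # assumed marker value for the robot's start cell
GOAL = 3   # assumed marker value for the goal cell


def find_marker(data, marker):
    """Return the (row, col) of the first cell equal to marker, scanning row by row."""
    matches = np.argwhere(np.asarray(data) == marker)
    if len(matches) == 0:
        raise ValueError(f"marker {marker} not found in map")
    row, col = matches[0]
    return int(row), int(col)


def getgoalpos(data):
    return find_marker(data, GOAL)


def getrobotpos(data):
    return find_marker(data, ROBOT)


grid = np.array([[0, 0, 3],
                 [0, 1, 0],
                 [2, 0, 0]])
print(getgoalpos(grid))   # (0, 2)
print(getrobotpos(grid))  # (2, 0)
```
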

movebot(data, oldpos, a)

Move the robot and report the reward.

Parameters
  • data (array) – 2D array that stores the map
  • oldpos (int, int) – old position of the robot
  • a (int) – the action to take
Returns

the new position of the robot and the reward

Return type

tuple(int, int), int
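
movebot maps each action to a one-cell step. The sketch below assumes actions 0 to 3 mean north, east, south, and west; that stepping off the map or into an obstacle (marker 1) leaves the robot where it is; and a reward of -1 per move and +1 on reaching the goal (marker 3). All of these encodings are assumptions for illustration.

```python
import numpy as np

OBSTACLE = 1  # assumed marker values
GOAL = 3
# Assumed action encoding as (row delta, col delta): north, east, south, west.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def movebot(data, oldpos, a):
    """Apply action a from oldpos and return (newpos, reward) under the assumed rules."""
    rows, cols = np.asarray(data).shape
    dr, dc = MOVES[a]
    row, col = oldpos[0] + dr, oldpos[1] + dc
    # Stay put if the move leaves the map or runs into an obstacle.
    if not (0 <= row < rows and 0 <= col < cols) or data[row][col] == OBSTACLE:
        row, col = oldpos
    reward = 1 if data[row][col] == GOAL else -1
    return (row, col), reward


grid = np.array([[0, 0, 3],
                 [0, 1, 0],
                 [2, 0, 0]])
print(movebot(grid, (2, 0), 0))  # ((1, 0), -1): moved north
print(movebot(grid, (1, 0), 1))  # ((1, 0), -1): blocked by the obstacle
print(movebot(grid, (0, 1), 1))  # ((0, 2), 1): reached the goal
```

Charging -1 for a blocked move as well as a normal one means bumping into walls is never free, which nudges the learner toward shorter paths.
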

printmap(data)

Print the map.

Parameters

data (array) – 2D array that stores the map
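
printmap only needs one character per cell value. The sketch below assumes the markers 0 (empty), 1 (obstacle), 2 (robot), and 3 (goal); the display symbols, the border, and the helper render_map (which returns the text so it can be checked) are arbitrary, hypothetical choices.

```python
import numpy as np

# Assumed cell markers mapped to arbitrary display characters; unknown values show as "?".
SYMBOLS = {0: " ", 1: "O", 2: "*", 3: "X"}


def render_map(data):
    """Build a bordered text picture of the map, one line per row."""
    data = np.asarray(data)
    border = "-" * (data.shape[1] + 2)
    lines = [border]
    for row in data:
        lines.append("|" + "".join(SYMBOLS.get(int(v), "?") for v in row) + "|")
    lines.append(border)
    return "\n".join(lines)


def printmap(data):
    """Print the map to stdout."""
    print(render_map(data))


printmap([[0, 0, 3],
          [0, 1, 0],
          [2, 0, 0]])
```
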

test(map, epochs, learner, verbose)

Run the learner on the map for the given number of epochs and total the rewards.

Parameters
  • map (array) – 2D array that stores the map
  • epochs (int) – the number of epochs; each epoch is one trip to the goal
  • learner (QLearner) – the QLearner object
  • verbose (bool) – If “verbose” is True, your code can print out information for debugging.
    If verbose is False, your code should not generate ANY output. When we test your code, verbose will be False.
Returns

the total reward

Return type

np.float64