Machine Learning for Trading Course

Fall 2023 Syllabus

Overview

This course introduces students to the real-world challenges of implementing machine learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, Q-Learning, KNN, and regression trees, and how to apply them to actual stock trading situations.

This course is composed of three mini-courses:

A set of course notes and example code can be found here: [[1]]

Video Content

The official video content for this course is available on Ed Lessons and also for free at Udacity.

Important note

This course ramps up in difficulty towards the end. The projects in the final third of the course are challenging, so be prepared.

Instructor information

Tucker Balch, Ph.D.
Professor, Interactive Computing at Georgia Tech
CS 7646 Course Designer
CS 7646 Instructor: Spring 2016, Fall 2016, Spring 2017, Summer 2017 (online), Fall 2017, Spring 2018, Summer 2018, Fall 2018
CIOS reviews Media:2017SummerCIOS.pdf

Maria Hybinette
Associate Professor, Computer Science, University of Georgia
CS 4646 Instructor: Summer 2018

David Byrd
Research Scientist, Interactive Media Technology Center at Georgia Tech
CS 7646 On-Campus Instructor: Summer 2016, Summer 2017, Spring 2018, Fall 2019
CS 7646 Head TA: Spring 2016, Fall 2016, Fall 2017

David Joyner
CS 7646 Online Instructor: Spring 2019, Summer 2019, Fall 2019, Spring 2020, Summer 2020, Fall 2020, Spring 2021, Summer 2021, Fall 2021, Spring 2022, Summer 2022, Fall 2022, Spring 2023, Summer 2023, Fall 2023

Joshua Fox
CS 7646 Co-Instructor: Fall 2020, Spring 2021, Summer 2021, Fall 2021, Spring 2022, Summer 2022, Fall 2022, Spring 2023, Summer 2023, Fall 2023
CS 7646 Head TA: Fall 2019, Spring 2020, Summer 2020

Textbooks, Software & Other Resources

We will use the following textbooks:

  • Python for Finance by Yves Hilpisch O’Reilly Digital (optional). You can log in with your GA Tech email for free access.
  • What Hedge Funds Really Do by Romero and Balch amazon.com (required)
  • Machine Learning by Tom Mitchell (optional)
    • Buy it at: amazon.com
    • Buy a paperback version. See purchase options on the CMU book website.
    • Buy a paperback international version. We are not certain about the reliability of this company: international
  • Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani statlearning.com (required)
  • Probabilistic Machine Learning: An Introduction by Kevin Murphy github.io (required)
  • Foundations of Deep Reinforcement Learning: Theory and Practice in Python by Graesser and Keng O’Reilly Digital. You can log in with your GA Tech email for free access. (optional)

Software:

Other resources:

Prerequisites/Co-requisites

All types of students are welcome! The machine learning topics might be a review for CS students, while the finance topics might be a review for finance students. However, even if you have experience with these topics, you will find that we approach them differently than you may have seen before, with a particular eye towards implementation for trading.

If you answer “no” to any of the following questions, it may be beneficial to refresh your knowledge of the prerequisite material before taking CS 7646:

  • Do you have a working knowledge of basic statistics, including probability distributions (such as normal and uniform) and the calculation of, and differences between, mean, median, and mode?
  • Do you understand the difference between geometric mean and arithmetic mean?
  • Do you have strong programming skills?
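As a quick self-check for the first two questions, the sketch below uses Python's standard `statistics` module to compute a mean, median, and mode, and then contrasts the arithmetic and geometric mean of two returns. The sample values and the +50%/-50% returns are hypothetical, chosen only to make the difference visible.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [1, 2, 2, 3, 7]
print("mean:", statistics.mean(values))      # 3
print("median:", statistics.median(values))  # 2
print("mode:", statistics.mode(values))      # 2

# Hypothetical returns: +50% in one period, -50% in the next.
returns = [0.50, -0.50]

# The arithmetic mean simply averages the rates and ignores compounding.
arith = statistics.mean(returns)

# The geometric mean averages the growth factors (1 + r) multiplicatively,
# then converts the result back to a per-period rate.
geo = statistics.geometric_mean([1 + r for r in returns]) - 1

print(f"arithmetic mean return: {arith:.4f}")  # 0.0000
print(f"geometric mean return:  {geo:.4f}")    # -0.1340
```

The arithmetic mean says the average return is 0%, yet an investment that gains 50% and then loses 50% ends at 1.5 × 0.5 = 75% of its starting value. The geometric mean (about -13.4% per period) reflects that compounded loss, which is why it matters when you evaluate portfolio returns.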

Who this course is for: The course is intended for people with strong software programming experience and introductory-level knowledge of investment practice. A primary prerequisite is an interest in and excitement about the stock market.

The software we’ll use: To complete the programming assignments you will need a development environment that you’re comfortable with. We use Unix, but you can also work in Windows and Mac OS environments. You must download and install a set of Python modules on your computer (including NumPy, SciPy, and Pandas).

How to install the software: ML4T Software Setup
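After installing, a short script like the one below can confirm that the modules named above import cleanly and report their versions. This is only a sketch of a local sanity check; the function name `check_modules` is hypothetical, and the exact packages and versions the course requires are defined by the ML4T Software Setup page, not by this script.

```python
import importlib

# Modules named in this syllabus; the setup page may list additional ones.
REQUIRED = ["numpy", "scipy", "pandas"]


def check_modules(names):
    """Map each module name to its __version__, or to None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Some modules do not define __version__; report "unknown" for those.
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_modules(REQUIRED).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name}: {status}")
```

Comparing the printed versions against the ones listed on the setup page helps you catch mismatches before your code runs on Gradescope.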

Logistics

  • All course content is delivered via Canvas. To access it, go to Canvas, click this course, and then click Start Here to get started!
  • We will use Canvas for ALL report submissions: To access it, go to Canvas, click this course, click assignments.
  • We will use Gradescope for ALL code submissions: To access it, go to Canvas, click this course, click Gradescope.
  • We will use Ed Discussion for interaction and discussion. To access it, go to Canvas, click this course, click Ed Discussion.

 

Grading

  • A: 90% and above
  • B: 80% and above
  • C: 70% and above
  • D: 60% and above
  • F: below 60%

Students taking the course Pass/Fail must earn at least a 75% to pass.
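The cutoffs above can be expressed as a small lookup, sketched below. The function names are hypothetical, and the syllabus does not say whether overall percentages are rounded before the cutoffs are applied, so this sketch assumes no rounding (for example, 89.9% maps to a B).

```python
# Letter-grade cutoffs from the grading scale, checked from highest to lowest.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]

# Minimum overall percentage for Pass/Fail students.
PASS_FAIL_MINIMUM = 75.0


def letter_grade(percent: float) -> str:
    """Map an overall percentage to a letter grade (assumes no rounding)."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


def passes_pass_fail(percent: float) -> bool:
    """Return True if a Pass/Fail student's overall percentage meets the minimum."""
    return percent >= PASS_FAIL_MINIMUM


if __name__ == "__main__":
    for score in (95.0, 89.9, 75.0, 74.9, 59.9):
        print(score, letter_grade(score), passes_pass_fail(score))
```

Note that the Pass/Fail threshold (75%) sits between the C and B cutoffs, so a Pass/Fail student needs more than the minimum C-level score to pass.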

We do not encourage “audit” students. If you are in the course on audit status, you must earn at least a “B” on the midterm.

See semester syllabus for assignment weights.

Minimum technical requirements

  • Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the Metro versions). A connection of 2+ Mbps is recommended; the minimum download speed is 0.768 Mbps.
  • Hardware: A computer with at least 4GB of RAM and CPU speed of at least 2.5GHz.
  • For code development and testing, any of these three configurations will work:
    • PC: Windows XP or higher with latest updates installed
    • Mac: OS X 10.6 or higher with latest updates installed
    • Linux: Any recent distribution that has the supported browsers installed
  • For online test taking (Honorlock) you will need Chrome browser with the Honorlock Chrome extension and one of:
    • PC: Windows 10
    • Mac: OS X 10.13 or higher
    • ChromeOS
    • Linux is NOT supported.

Office hours

Most of our Teaching Assistants will hold weekly office hours using Hangouts, Webex, or another teleconferencing tool. Office hours are not recorded and are intended for more individually focused help and conversations. If anything that comes up during office hours is relevant to the entire class, it will be shared via Ed Discussion.

A schedule of office hours will be made available via Ed Discussion early in the semester.

Plagiarism

In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.

If we discover that you have submitted assignment material created by another student, either from a previous semester or in the current session, you will be assigned a 0 for the relevant project.

Class Policies

  • For Pass/Fail students: Your overall grade must be 75% or higher to get a passing grade.
  • Official communication is by email: We use Ed Discussion for discussions, but it is not an official communication channel.
  • Student responsibilities: Be aware of the deadlines posted on the schedule. Start work on projects even if they are not open on Canvas.
  • Grade contest period: After a project grade is released, you have 7 days to contest the grade. After that time, projects will not be re-evaluated. You must identify a specific issue and give a compelling argument for why your grade is incorrect. Example of a compelling argument: “The TA took 10 points off because I was missing a chart, but the chart is visible on page 5.” Example of an argument that is not compelling: “I think I should have gotten more points, please regrade my project.”
  • Grade contest process: Instructions will be released before Project 1 grades are released.
  • Late policy: See CS7646 Fall 2023 – Late_Work
  • Exam scheduling: Exams will be held on specific days at specific times. If there is an emergency or other issue that requires changing the date of an exam for you, you will need to have it approved by the Dean of Students. You can apply for that here: http://www.deanofstudents.gatech.edu (under Resources -> Class Absences)
  • Each project for this course has its own page on this wiki. Each page includes a list of specific deliverables and usually a rubric. Be sure to double-check your submission against both so you don’t miss anything.
  • We require that your code run properly on Gradescope.
  • If a problem exists with your submitted code, we will not consider reassessing it if it has not been tested as described above.
  • Most projects will be accompanied by template code and grading code that you can use to test your project. Your code must pass the grading checks we provide, but the final batch tests may be more rigorous. Be sure to examine the rubrics in the project description to confirm that your code meets them.
  • Once you are satisfied with your code, submit the EXACT same working code via Gradescope.
  • It is a good idea to submit a working version of your code early (before the deadline) in case a problem arises with your internet connection or Canvas.
  • The latest timestamp on any part of your submission will be used as the submission time for your whole project. Accordingly, do not resubmit anything after the deadline, or the entire project will be considered late.
  • After the submission deadline, we will test your code on one of our servers configured identically to the ones available for testing on Gradescope.
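The timestamp rule above means that a single late resubmission of any part makes the whole project late. A minimal sketch of that rule follows; the function names, dates, and the use of UTC are hypothetical and chosen only for illustration, since this page does not state the time zone of the deadlines.

```python
from datetime import datetime, timezone


def submission_time(part_timestamps):
    """The whole project's submission time is the latest timestamp among its parts."""
    return max(part_timestamps)


def is_late(part_timestamps, deadline):
    """A project is late if its latest part was submitted after the deadline."""
    return submission_time(part_timestamps) > deadline


if __name__ == "__main__":
    # Hypothetical deadline and submission times (UTC chosen arbitrarily).
    deadline = datetime(2023, 9, 10, 23, 59, tzinfo=timezone.utc)
    report = datetime(2023, 9, 10, 20, 0, tzinfo=timezone.utc)
    code = datetime(2023, 9, 10, 22, 30, tzinfo=timezone.utc)
    resubmitted_code = datetime(2023, 9, 11, 0, 5, tzinfo=timezone.utc)

    print(is_late([report, code], deadline))                    # False
    print(is_late([report, code, resubmitted_code], deadline))  # True
```

In the second call, the report and the original code were both on time, but the resubmitted code moves the submission time for the entire project past the deadline.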