ML4T Local Environment

Overview

The assignments in this class are in Python (version 3.6) and rely heavily on a few important libraries. These libraries are under active development, which unfortunately means that there can be compatibility issues between versions. This page describes a local environment that mirrors the one used during testing. A local development environment is required to develop and test the code that satisfies each project’s requirements. We strongly recommend establishing a local Linux project environment as described below.

Should you decide to define and use a non-Linux environment, please keep in mind that your code MUST run correctly on Gradescope, which uses a Linux environment like the one described below. If your code fails to run on Gradescope, stating that “it works on my machine” is not a valid excuse and you will receive no credit.

Note: This course began using Python 3.6 in Fall 2019. Be careful to avoid using course Wiki pages, environment configuration instructions, and project templates from prior terms. Please use the material associated with this term only. 

REQUIREMENTS 

  • Dual Operating System Environment: 
    • Windows or macOS for Exams 
    • Linux for Projects. We strongly recommend Ubuntu LTS or MX Linux. Linux can be installed in one of several ways, including dual-boot, bootable USB, Docker, Vagrant, Windows Subsystem for Linux 2 (WSL2), and Virtual Machines (VMs) (e.g., VMware Workstation Player, VirtualBox, or Parallels).

While the course does not provide a local development environment, we will demonstrate how to create a local environment using Linux and a Virtual Machine in the course discussion forum. 

  • Remote (Private) Backup 
    • Georgia Tech GitHub, GitHub (set to private), Dropbox, OneDrive, etc.

Backups are primarily for your security, so that if something should happen that results in a loss of code (e.g., broken or stolen laptop, accidentally deleted files) you can recover your work and be up and running quickly. For example, a VM can be installed with the code from a remote backup in approximately one hour. 

LOCAL ENVIRONMENT SETUP 

To ensure that your local environment is compatible with the Gradescope environment that will be available for remote testing and used to execute and grade project submissions, we recommend that the local environment use the exact same library and package versions. Below is a list of each package and version number, provided in the Conda Environment format. 

LOCAL FILESYSTEM SETUP 

If you are familiar with conda, you can use it to create an environment for this class that matches those version numbers. Here is an outline:

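  1. Install Miniconda or Anaconda (if it is not already installed). Save the above YML fragment as environment.yml.
  2. Create an environment for this class:

    conda env create -f environment.yml

  3. Activate the new environment, using the environment name defined in environment.yml:

    conda activate <environment name>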
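
Once the environment is activated, a short Python check can confirm that the interpreter and the active conda environment are the ones you expect. This is a minimal sketch, not part of the course tooling: the function name environment_report is hypothetical, and the check relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys

# The course targets Python 3.6, as stated on this page.
EXPECTED_PYTHON = (3, 6)


def environment_report(version_info=None, environ=None):
    """Return a dict describing the interpreter and the active conda environment.

    Hypothetical helper for illustration; both arguments default to the
    running interpreter and the current process environment.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    major_minor = (version_info[0], version_info[1])
    return {
        "python": "{}.{}".format(*major_minor),
        "python_ok": major_minor == EXPECTED_PYTHON,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # it is absent (None here) when no conda environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    report = environment_report()
    print("Python version:", report["python"])
    print("Matches Python 3.6:", report["python_ok"])
    print("Active conda environment:", report["conda_env"])
```

If the reported environment is None or the version is not 3.6, the class environment is probably not activated in the current shell.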

Please see the ML4T_Software Setup page for information on how to set up, run, and check out the code scaffolding for the projects. This scaffolding, which is installed once, is required for each project. Once installed, you should have a directory that looks like the image below:

LOCAL PROJECT FILES SETUP

Please use the project templates associated with each project. Each template will create a new directory at the same level as the data and grading folders. These templates can be downloaded all at once at the start of the term, or individually before beginning work on each project.
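
As a rough sanity check of this layout, the sketch below reports whether util.py and the data and grading folders are present in the main folder. The names come from this page; the helper missing_environment_items is hypothetical, and the script assumes it is run from the main folder. It only reads the directory and does not modify anything.

```python
from pathlib import Path

# Names taken from this page: util.py plus the data and grading folders.
REQUIRED_ITEMS = ("util.py", "data", "grading")


def missing_environment_items(root):
    """Return the required items that are absent from the main folder.

    Hypothetical helper for illustration; it only checks for existence.
    """
    root = Path(root)
    return [name for name in REQUIRED_ITEMS if not (root / name).exists()]


if __name__ == "__main__":
    # Assumption: the current working directory is the main course folder.
    missing = missing_environment_items(".")
    if missing:
        print("Missing from the main folder:", ", ".join(missing))
    else:
        print("util.py, data, and grading are all present.")
```

Each project directory created from a template should then sit alongside data and grading, not inside them.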

IDEs

We recommend using an IDE, which will aid development and debugging. In past terms, students have primarily used PyCharm, Spyder, and Visual Studio. Some students have also used Jupyter Notebook.

Note: Assignments will require Python .py files. Jupyter Notebook files will not be accepted.

ADVISORIES

  • Apple M1-Based Computers: We are unaware of good virtualization (e.g., VMware, VirtualBox) or container (e.g., Docker) solutions for Apple’s Mx-based machines. Students using Mx-based Macs are encouraged to move to an Intel-based platform or follow the “M1 Mac Conda” adjustment at the bottom of this page.
  • Using Windows or macOS for projects: While these instructions should work for Windows and macOS, we have observed differences that can prevent projects from executing locally. While some students successfully completed the projects using Windows or macOS, others could not overcome some of the challenges and moved to Linux mid-course. Since switching OSes mid-term can be stressful, we strongly recommend using Linux for projects.
  • Honorlock: Honorlock (the software we use for exam proctoring) may run on Linux; however, the vendor does not support or recommend its use. Should a problem arise during testing, they will not work with you to attempt to resolve the issue.
  • Virtual Machine OS Storage: If using a Virtual Machine, the storage used by the VM on which the Linux Operating System is installed should be on the local machine rather than cloud storage (e.g., OneDrive, Google Drive).
  • Local Environment: The util.py file and the data and grading folders in the main folder are considered part of the environment. They must not be moved, copied, or edited.
  • Datasets: We use a specific, static dataset for this course, which we will provide. If you download your own data from Google, Yahoo, or elsewhere, you will get incorrect answers on the assignments.
  • Use Python 3.6: While the course videos provide examples that use Python 2.x, this course uses Python 3.6 for all projects and exams. Please note that there are some language differences between Python 2 and Python 3. There are also some differences in library/package versions (see Chapter 1 of Pandas for Everyone for differences in Pandas indexing). Also, be aware that later Python versions (e.g., 3.7, 3.8, 3.9) include language features that are not present in 3.6. Code that uses them could pass in a local environment running a later version but fail when submitted to Gradescope (which uses Python 3.6).
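
To illustrate the last advisory, the sketch below shows two features added after Python 3.6 (the assignment expression ":=" from Python 3.8 and the dict union operator "|" from Python 3.9) next to equivalents that run on 3.6. The function names are hypothetical and chosen only for this example; the newer syntax appears in comments so that the block itself runs on 3.6.

```python
import re


def first_number(text):
    """Return the first integer found in text, or None if there is none."""
    # Python 3.8+ only (fails with SyntaxError on 3.6):
    #     if (match := re.search(r"\d+", text)) is not None:
    match = re.search(r"\d+", text)
    if match is not None:
        return int(match.group())
    return None


def merge_settings(defaults, overrides):
    """Return a new dict with overrides applied on top of defaults."""
    # Python 3.9+ only (raises TypeError on 3.6):
    #     merged = defaults | overrides
    merged = dict(defaults)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    print(first_number("lookback of 14 days"))
    print(merge_settings({"window": 20, "verbose": False}, {"verbose": True}))
```

Developing inside the Python 3.6 environment is the simplest safeguard, since unsupported syntax then fails locally instead of on Gradescope.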

KNOWN LOCAL ENVIRONMENT CHALLENGES AND WORKAROUNDS

Matplotlib on Mac

If you are using a Mac and attempting to plot charts raises an exception whose stack trace mentions libtk and tkinter, try changing the Matplotlib backend.
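
The snippet originally suggested here is not shown on this page, so the following is only a sketch of one way to switch backends. It assumes the non-interactive "Agg" backend is acceptable because charts are saved to files, not displayed in a window, and it selects that backend through the MPLBACKEND environment variable, which Matplotlib reads when it is first imported. The helper save_line_chart is hypothetical.

```python
import os

# Assumption: "Agg" is chosen because it renders to files and does not
# depend on Tk. It cannot open interactive windows.
# MPLBACKEND must be set before Matplotlib is imported for the first time.
os.environ["MPLBACKEND"] = "Agg"


def save_line_chart(values, path):
    """Plot values as a line chart and write it to path (requires Matplotlib)."""
    import matplotlib.pyplot as plt  # imported only after MPLBACKEND is set

    fig, ax = plt.subplots()
    ax.plot(values)
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory once it is saved
    return path
```

For example, save_line_chart([1, 2, 3], "chart.png") writes a PNG without touching tkinter. Calling matplotlib.use("Agg") before importing matplotlib.pyplot is an equivalent way to make the same choice in code.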

“Freeze_Support” or “AttributeError: Can’t pickle local object” error messages on Windows 10

Some students have modified the local test scripts to remove the lines that produce this message. Since those changes reduce the functionality of the local test script, this approach should be used with caution. We recommend moving to Linux and confirming that the Conda environment is activated.
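
For background on the second message: on Windows, Python's multiprocessing module starts child processes with the "spawn" method, which pickles the function the child is asked to run, whereas Python 3.6 on Linux defaults to "fork" and does not need to. Functions defined inside another function cannot be pickled. The sketch below reproduces that limitation with the pickle module alone; whether the local test scripts hit it in exactly this way is an assumption, and all names are hypothetical.

```python
import pickle


def top_level_worker():
    """Defined at module level, so pickle can refer to it by name."""
    return "done"


def make_local_worker():
    """Return a function defined inside another function (a 'local object')."""
    def local_worker():
        return "done"
    return local_worker


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Depending on the Python version, a local object fails with one
        # of these two exception types.
        return False


if __name__ == "__main__":
    print("top-level function picklable:", can_pickle(top_level_worker))
    print("local function picklable:", can_pickle(make_local_worker()))
```

This is why the same script can run on Linux and fail on Windows without any change to the code.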

“ConnectionRefusedError: [Errno 111] Connection refused” message using PyCharm

This error has been observed in specific instances when running code within PyCharm. The workaround is to execute the code from the terminal or from a different IDE.

M1 Mac Conda Setup (Student Supplied)

Copy the environment.yml from the instructions to your local machine and request Intel (Rosetta) packages by creating the environment with the following command:

    CONDA_SUBDIR=osx-64 conda env create -f environment.yml

This should resolve the ResolvePackageNotFound issues.

For background, Anaconda only recently started supporting the ARM64 architecture. Now that Anaconda (and miniforge3) support this architecture natively, they use ARM64-compiled packages exclusively. The Python project announced that Python 3.7 and earlier would not be released for this architecture, so conda can only resolve Python 3.8 and later by default.

Setting the environment variable CONDA_SUBDIR=osx-64 forces conda to install Intel packages. These run under emulation, which has a performance cost.

This approach was attempted with both miniforge3-4.10.3-10 and anaconda3-2022.05.
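
One way to confirm that an environment created this way really uses Intel packages is to ask the interpreter which architecture it was built for. This is a small sketch with a hypothetical helper name; it relies on platform.machine(), which reports "x86_64" for an Intel build (including one running under Rosetta) and "arm64" for a native Apple silicon build.

```python
import platform


def describe_architecture(machine=None):
    """Describe the architecture of the running Python interpreter.

    Hypothetical helper for illustration; pass a machine string to check a
    specific value, or leave it as None to inspect the current interpreter.
    """
    machine = platform.machine() if machine is None else machine
    if machine == "x86_64":
        return "Intel (x86_64) build; on Apple silicon it runs under emulation"
    if machine in ("arm64", "aarch64"):
        return "native ARM64 build"
    return "other architecture: " + machine


if __name__ == "__main__":
    print(describe_architecture())
```

Run it with the class environment activated: an Intel result on an M1 Mac indicates that the CONDA_SUBDIR=osx-64 setting took effect.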