Introduction to Machine Learning

Author

Nick Ulle

Published

April 14, 2024

Overview

This collection of workshops provides an introduction to machine learning. The collection has two standalone parts:

  • Overview of Machine Learning (one 2-hour session): This is a non-technical workshop that emphasizes building vocabulary and gaining an intuitive understanding of machine learning concepts and methods. Start here if you’re new to machine learning and want to get a sense of what it’s about and whether it’s relevant to you. There’s no code in this workshop and only a little (high-school level) math. This workshop is also good preparation for the Machine Learning in R series.

    After completing this workshop, learners should be able to:

    • Define the following terms: observation, feature, machine learning, supervised learning, unsupervised learning, regression, classification, clustering, training set, validation set, test set, cross-validation, overfitting, underfitting, model bias, model variance, bias-variance tradeoff, ensemble model.
    • Explain the difference between supervised and unsupervised learning.
    • Explain the difference between regression and classification.
    • List and briefly describe popular machine learning methods.
    • Give an example of an ensemble model.
    • Explain what cross-validation is used for and give an overview of the procedure.
    • Assess whether machine learning might be helpful for a given research problem and, if so, which methods are suitable.
    Important

    The slide deck is the only material for this workshop.

  • Machine Learning in R (two 2-hour sessions): This is a hands-on, technical introduction to using machine learning methods in R. The two sessions cover, with worked examples, supervised learning (emphasis on classification), model evaluation, unsupervised learning (emphasis on clustering), and dimension reduction. The sessions also provide advice for navigating R’s fractured machine learning package landscape. Intermediate familiarity with R programming (equivalent to completing DataLab’s R Basics workshop series) is required.

    After completing this series, learners should be able to:

    • Build and train a classification model on their data.
    • Use cross-validation to estimate accuracy and tune hyperparameters for classification models.
    • Identify strategies to improve results from classification models.
    • Explain the tradeoffs between popular clustering algorithms.
    • Run a clustering algorithm on their data.