MPH/SWDT 5139: Machine learning using health data
(Fall 2024)

Role: Teaching Assistant

Office hours:

  • Fridays, 4-5 p.m., 05 Brown Hall or Zoom (refer to the syllabus for the link)
  • If you are unable to make it to my office hours, please email me. I am generally free from 4 p.m. to 12 a.m., Mondays through Saturdays. Sunday is the Lord's Day; requests for office hours on Sundays will not receive a response.
  • Office hours are a great opportunity to (1) ask about specific problems you have encountered in your homework assignments, (2) seek clarification on ML concepts that were not clear to you (including the detailed mathematical proofs behind each model, which I will gladly go through step by step), and (3) inquire about how you might incorporate AI into your current research.
  • Please do not come to office hours without having attempted the assignments and expect me to complete all the questions for you. A more efficient way of doing that is to use this cool technology called ChatGPT. Also, please do not come to office hours to debate with me about how AI will take over the world and destroy humanity, or how ChatGPT is filthy, without knowing how a transformer model works. If you wish to do that, send me an email, and we can chat over coffee. Coffee is expected to be on you.

Course description: This graduate-level course aims to equip students who have a background in healthcare or the social sciences with the machine learning skills needed to become well-rounded applied data scientists. In today’s big data world, we witness many people creating fanciful LinkedIn job titles, such as ‘behavioral data scientist’ or ‘e-commerce data scientist’. In reality, however, all they do is run 20 million linear regressions using the same line of R code they wrote 10 years ago. This course teaches popular machine learning (ML) models using Python and their applications to health data. The topics include (1) Python programming basics (e.g., coding with Python, Python modules such as NumPy, Pandas, Matplotlib, and Scikit-learn); (2) classification ML models (e.g., logistic regression, KNN, Naive Bayes); (3) regression ML models (e.g., linear regression, lasso regression); (4) ML model training and validation; (5) support vector machines and decision trees; (6) ensemble methods; (7) dimensionality reduction; and (8) unsupervised learning techniques (e.g., HDBSCAN, k-means).