
Topic

Materials

15 Jan 
Syllabus
Group Activity: What is data science? 

17 Jan 
Lecture 1 – Data Science
 What is data science?
 What is machine learning?
 DS workflow
 Data Representation

 slides: Introduction
 worksheet 1
 [DSFS] Ch1
 [DSFS] Ch11 (pp. 141–142)
 Modeling
 What is Machine Learning?
 [PDSH] Preface (pp. xi–xii)
 What is Data Science?
 Why Python?
 [PDSH] Ch5 (pp. 331–342)
 What is Machine Learning?
 9 Data Science Problems

22 Jan 
Lab 1 – Plant Species Classification
 Data Exploration with Python
 NumPy Arrays

 materials: Lab1
 [DSFS] Ch2 Python
 [PDSH] Ch1
 IPython (all about notebooks, skip stuff about the shell)
 [PDSH] Ch2 NumPy (pp. 33–63, 78–85)
 Data Types in Python
 Basics of NumPy Arrays
 Computation on NumPy Arrays
 Aggregations
 Fancy Indexing

24 Jan 
Lecture 2 – Exploratory Data Analysis
 Data Types
 Data Representation
 Dataset Statistics
 Visualization

 slides: EDA
 worksheet 2
 [DSFS] Ch3 Visualizing Data
 [DSFS] Ch10 (pp. 121–132)
 Exploring your Data
 Cleaning and Munging
 Manipulating Data
 [PDSH] Ch4

29 Jan 
Lab 2 – Analyzing the MoMA Data
 EDA Process
 Posing Data Questions
 Answering Data Questions
 Pandas DataFrames

 materials: Lab2
 [PDSH] Ch3 Pandas (pp. 97–114)
 Pandas Objects
 Data Indexing and Selection
 more answers

31 Jan 
Lecture 3 – Sentiment Analysis
 Working with Text Data
 Scraping Data from the Web
 Sentiment Prediction
 Error Rate and Accuracy

 slides: Sentiment Analysis
 worksheet 3
 [DSFS] Ch4
 [DSFS] Ch9
 Reading Files (pp. 105–108)
 Using APIs (pp. 114–117)
 Example: Twitter APIs (pp. 117–120)
 [DSFS] Ch20 (pp. 239–244)
 Word Clouds
 n-gram Models

5 Feb 
Lab 3 – Analyzing Movie Reviews
 Rule-Based Sentiment Prediction
 Sentiment Classifier
 Evaluation and Model Comparison

 materials: Lab3
 [PDSH] Ch5 (pp. 343–359)
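A minimal plain-Python sketch of a rule-based sentiment predictor and the accuracy/error-rate evaluation from Lecture 3 and Lab 3. This is not the lab's code: the word lists, function names and toy reviews are hypothetical, chosen only for illustration.

```python
# Hypothetical sentiment lexicons; the lab's actual word lists are not given here.
POSITIVE = {"good", "great", "excellent", "fun", "loved"}
NEGATIVE = {"bad", "boring", "awful", "terrible", "hated"}

def rule_based_sentiment(review: str) -> int:
    """Return 1 (positive) if positive words outnumber negative ones, else 0.

    Naive whitespace tokenization: punctuation is not stripped.
    """
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0  # ties count as negative

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labeled reviews (made up): 1 = positive, 0 = negative.
reviews = ["a great and fun movie", "boring plot and awful acting", "good but boring"]
labels = [1, 0, 1]
preds = [rule_based_sentiment(r) for r in reviews]
acc = accuracy(labels, preds)
error_rate = 1 - acc  # error rate and accuracy always sum to 1
```

The third review shows a typical weakness of word counting: one positive and one negative word cancel out, so the tie rule decides the label.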

7 Feb 
Lecture 4 – Regression
 Least-Squares Method
 Linear vs Polynomial Regression
 Model Complexity
 RMSE and MAE
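A minimal sketch of the closed-form least-squares fit for one input feature (y = w·x + b) and the two error metrics, RMSE and MAE. It uses plain Python rather than the Lab 4 code; function names and data are illustrative assumptions.

```python
import math

def fit_least_squares(xs, ys):
    """Closed-form least-squares fit of y = w*x + b for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    w = cov / var
    b = y_mean - w * x_mean  # the fitted line passes through (x_mean, y_mean)
    return w, b

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (made up), roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]
w, b = fit_least_squares(xs, ys)   # w = 1.98, b = 0.05 (up to rounding)
preds = [w * x + b for x in xs]
```

RMSE squares the residuals before averaging, so it penalizes large errors more heavily than MAE, and RMSE ≥ MAE always holds.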


12 Feb 
Lab 4 – Predicting Housing Prices
 Implement 1D Linear Regression
 Data Exploration
 Evaluation via RMSE

 materials: Lab4
 [DSFS] Ch14 Multiple Regression (pp. 179–183)
 The Model
 Least Squares Model
 Fitting the Model
 Interpreting the Model
 Goodness of Fit

14 Feb 
Lecture 5 – Logistic Regression
 Decision Boundary
 Probabilistic Classifier
 Sigmoid/Logistic Function
 Likelihood
 Confusion Matrix

 slides: see lecture below
 [DSFS] Ch16 Logistic Regression
 [PDSH] Ch5 Scikit-Learn
 Classification on Digits (p. 357)
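A minimal plain-Python sketch of the logistic regression pieces listed for Lecture 5: the sigmoid, the probabilistic prediction, the 0.5-threshold decision rule, and the log-likelihood that training maximizes. The names and toy parameters are hypothetical. The code only evaluates a given model; it does not fit one.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for negative z this avoids overflow in exp(-z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """P(y = 1 | x) for the linear score w·x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def predict(w, b, x, threshold=0.5):
    """Hard 0/1 label; with threshold 0.5 this is the sign of w·x + b."""
    return 1 if predict_proba(w, b, x) >= threshold else 0

def log_likelihood(w, b, X, y):
    """Sum of log P(y_i | x_i) over the data; higher means a better fit."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = predict_proba(w, b, x_i)
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total

# Toy 1-D example (made-up parameters): boundary where 2.0*x - 1.0 = 0, i.e. x = 0.5
X = [[0.0], [1.0]]
y = [0, 1]
ll = log_likelihood([2.0], -1.0, X, y)
```

Because sigmoid(z) ≥ 0.5 exactly when z ≥ 0, the decision boundary is the set where w·x + b = 0. That is a line, or a hyperplane in higher dimensions.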

19 Feb 
Lab 5 – Detecting Breast Cancer
 Implement Logistic Regression
 Evaluation using Confusion Matrix

 materials: Lab5
 [DSFS] Ch16 Logistic Regression
 [DSFS] Ch11
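A minimal sketch of building a binary confusion matrix and reading accuracy and error rate off it. This is plain Python, not the Lab 5 code. Which label counts as "positive" is a choice; here 1 is assumed to be the positive class, and the labels are made up.

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary task with labels 0/1, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Made-up labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)   # correct predictions on the diagonal
error_rate = (cm["FP"] + cm["FN"]) / len(y_true)  # off-diagonal mistakes
```

Unlike a single accuracy number, the matrix separates the two kinds of mistakes (FP vs. FN). This matters when one kind is much more costly, as in a medical setting.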

21 Feb 
Lecture 5 – Logistic Regression Revisited
 Model Parameters and Decision Boundary
 Likelihood
Lecture 6 – Evaluation and Learning Principles
 Noise
 Overfitting
 Model Selection
 Learning Curve
 Sampling Bias

 slides: Learning Principles
 worksheet 6
 [DSFS] Ch11 (pp. 142–147)
 Overfitting and Underfitting
 Correctness
 [PDSH] Ch5
 Model Validation [ignore everything on cross-validation] (pp. 359–361)
 Selecting the Best Model (p. 363)
 Learning Curves (pp. 370–373)
 Basis Function Regression (pp. 392–396)
 Regularization (pp. 396–398)

26 Feb 
Lab 5 – Detecting Breast Cancer
 Applying LR to Breast Cancer Prediction
 Evaluation using Confusion Matrix

 materials:
 cf. Lab5 above
 add this code to a new cell at the very end

28 Feb 
no class 

5 Mar 
Study for Midterm Exam
 Review
 Discuss questions/confusion with peers, TAs, instructor
What to study?

How to study?
 Revisit Worksheets
 do problems again (yes, rewrite your answers!)
 practice helps to remember the stuff
 the more practice, the better you will remember
 Revisit Labs
 ignore coding
 focus on Write up! problems and conceptual stuff
 Practice Retrieval of knowledge
 quiz your neighbor
 explain concepts to your neighbor
 or yourself (aloud!)
 …believe me, it helps you retain your knowledge!
 Create Note Cards or Summary Sheets
 encoding concepts in your own writing helps you learn and retain your knowledge!

7 Mar 
Midterm EXAM in Crow 204
closed book – no notes – no crib/cheat sheet 
…CAUTION: room change!!!! 
12 Mar
14 Mar 
Spring Break 

19 Mar 
Lab 6 – Ethical Thinking for Data Science
 Why are ethics important in DS?
 Example Scenarios


21 Mar 
Lecture 7 – Clustering
 Clustering Problem
 Similarity Measures
 k-means Algorithm

 slides: Clustering
 worksheet 7
 [DSFS] Ch19 Clustering (pp. 225–232)
 The Idea
 The Model
 Example: Meetups
 Choosing k
 Example: Clustering Colors
 [PDSH] Ch5 (pp. 462–479)
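A minimal plain-Python sketch of the k-means algorithm (Lloyd's iteration) with Euclidean distance. It alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. It is not the Lab 7 code; the function names, random initialization and toy points are assumptions. The SSE helper is one common quantity to plot against k when choosing k.

```python
import math
import random

def kmeans(points, k, n_iters=100, seed=0):
    """Cluster a list of coordinate tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k randomly chosen data points
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def sum_squared_errors(points, centroids, labels):
    """Total squared distance of points to their assigned centroids."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Two well-separated toy groups (made up)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
sse = sum_squared_errors(points, centroids, labels)
```

The result can depend on the random initialization, so in practice k-means is often run with several seeds, keeping the run with the lowest SSE.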

26 Mar 
Lab 7 – Clustering
 Explore the k-means Algorithm
 Choose k
 Application

 materials: Lab7
 reading: cf. Lecture 7

28 Mar 
Lecture 8 – Similarity-based Learning
 k-Nearest Neighbor Model
 Cross-Validation
 Input Transformations

 slides: kNN (annotated)
 worksheet 8
 [DSFS] Ch12 kNN
 [PDSH] Ch5 Hyperparameters and Model Validation
 Thinking about Model Validation [cross-validation] (pp. 359–362)
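A minimal plain-Python sketch of a k-nearest-neighbor classifier (majority vote, Euclidean distance) evaluated with n-fold cross-validation. Each fold is held out once as the test set while the remaining points act as training data. The function names, the stride-based fold split and the toy data are assumptions for illustration, not the Lab 8 code.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def k_fold_accuracy(X, y, n_folds=3, k=3):
    """Mean kNN accuracy over n_folds folds (assumes the rows are already shuffled)."""
    fold_accs = []
    for f in range(n_folds):
        test_idx = set(range(f, len(X), n_folds))  # every n_folds-th point, offset by f
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        fold_accs.append(correct / len(test_idx))
    return sum(fold_accs) / n_folds

# Toy 2-D data (made up): two separated groups
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
cv_acc = k_fold_accuracy(X, y, n_folds=2, k=3)
```

To pick the hyperparameter k, one would compute the cross-validated accuracy for several values of k and keep the best, rather than scoring on the training data itself.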

2 Apr 
Lab 8 – kNN
 Explore the kNN Algorithm
 Data Scaling
 Data Standardization

 materials: Lab8
 reading: cf. Lecture 8
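A minimal sketch of the two input transformations named for Lab 8, min-max scaling and standardization (z-scores), using Python's standard `statistics` module. kNN compares raw distances, so a feature measured on a large scale can otherwise dominate the others. The function names and toy values are hypothetical. Both functions assume the feature column is not constant (otherwise they divide by zero).

```python
import statistics

def min_max_scale(values):
    """Rescale a feature column linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """z-scores: subtract the mean and divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

feature = [2.0, 4.0, 6.0, 8.0]   # toy feature column (made up)
scaled = min_max_scale(feature)   # 0.0, 1/3, 2/3, 1.0
z = standardize(feature)          # mean 0, standard deviation 1
```

When evaluating a model, the min/max or mean/standard deviation should be computed on the training data only and then reused to transform the test data.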

4 Apr 
Lecture 9 – Feature Engineering
 Feature selection
 Feature learning
 Quick Intro to
 Decision Trees
 Random Forests
 Neural Networks


9 Apr 
Lab 9 – Feature Learning
 Explore a pretrained NN
 Feature Learning
 kNN Retrieval

 materials: Lab9
 reading: cf. Lecture 9

11 Apr 
Lecture 10 – Data Engineering
 More Insights into Neural Networks
 Data Augmentation
 Outlier Detection


16 Apr 
Lab 10 – Gesture Recognition
 materials: Lab10
 Image Augmentation with

18 Apr 
Lab 10 – Wrap-up
 Train our NN
 Evaluating the trained NN
Lecture 11 – Topic Models 

23 Apr 
Lab 11 – Organizing Text Data
 Topic Model
 Wikipedia Data
 Text Features
 LDA
Course evaluation: HERE
 Let’s avoid sampling bias! To do so, we need everyone to fill out the evals! Thanks for taking the time.
 Incentive: this will count as the graded part of the lab quiz for today’s lab.

 materials: Lab11
 updated tmv.zip
 [DSFS]
 Ch9 Getting Data
 Scraping the Web (pp. 108–110)
 Ch20 Natural Language Processing
 Topic Modelling (pp. 247–252)
 [optional] intuitive and code explanations

25 Apr 
Semester Review
Pilot Offering Feedback Form
Towards Data Science: What to Study Next? 

8 May 
Final EXAM

