CSE217a Calendar (SP19)

This is an inactive course webpage.

All course information and content for SP2020 will be available on Canvas.

You can find a list of topics we plan to cover in this course on this (tentative) roadmap.

_______

Topic

Materials

15 Jan Syllabus

Group Activity: What is data science?

17 Jan Lecture 1 – Data Science

  • What is data science?
  • What is machine learning?
  • DS workflow
  • Data Representation
  • slides: Introduction
  • worksheet 1
  • [DSFS] Ch1
    • What is Data Science?
  • [DSFS] Ch11 (p141-142)
    • Modeling
    • What is Machine Learning?
  • [PDSH] Preface (p xi-xii)
    • What is Data Science?
    • Why Python?
  • [PDSH] Ch5 (p331-342)
    • What is Machine Learning?
  • 9 Data Science Problems
22 Jan Lab 1 – Plant Species Classification

  • Data Exploration with Python
  • NumPy Arrays
  • materials: Lab1
  • [DSFS] Ch2 Python
    • The Basics (p15-26)
  • [PDSH] Ch1
    • IPython (read the parts about notebooks; skip the parts about the shell)
  • [PDSH] Ch2 NumPy  (p33-63, p78-85)
    • Data Types in Python
    • Basics of NumPy Arrays
    • Computation on NumPy Arrays
    • Aggregations
    • Fancy Indexing
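A minimal sketch of the NumPy operations listed above: vectorized computation, aggregations along an axis, and fancy indexing. The small feature matrix is made up for illustration and is not the Lab 1 plant data.

```python
import numpy as np

# Made-up feature matrix: 4 samples (rows) x 2 features (columns)
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9],
              [5.9, 3.0]])

# Computation on arrays: element-wise arithmetic without a Python loop
ratio = X[:, 0] / X[:, 1]

# Aggregations: axis=0 reduces over the rows, giving one value per column
col_means = X.mean(axis=0)
col_max = X.max(axis=0)

# Fancy indexing: pass a list of integer indices to pick rows 0 and 2
subset = X[[0, 2]]

print(ratio)
print(col_means, col_max)
print(subset.shape)  # (2, 2)
```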
24 Jan Lecture 2 – Exploratory Data Analysis

  • Data Types
  • Data Representation
  • Dataset Statistics
  • Visualization
  • slides: EDA
  • worksheet 2
  • [DSFS] Ch3 Visualizing Data
  • [DSFS] Ch10 (p121-132)
    • Exploring your Data
    • Cleaning and Munging
    • Manipulating Data
  • [PDSH] Ch4
    • General Matplotlib Tips (p217-221)
    • Scatter plots (p233-237)
    • Histograms (p245-247)
29 Jan Lab 2 – Analyzing the MoMA Data

  • EDA Process
  • Posing Data Questions
  • Answering Data Questions
  • Pandas DataFrames
  • materials: Lab2
  • [PDSH] Ch3 Pandas (p97-114)
    • Pandas Objects
    • Data Indexing and Selection
  • more answers
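A minimal sketch of the pose-a-question, answer-it-with-a-DataFrame loop. The column names (`Title`, `Artist`, `Year`, `Department`) and the rows are hypothetical stand-ins, not the actual schema of the MoMA data used in Lab 2.

```python
import pandas as pd

# Hypothetical toy table standing in for a museum collection dataset
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Artist": ["X", "Y", "X", "Z", "Y"],
    "Year": [1950, 1962, 1971, 1962, 1988],
    "Department": ["Painting", "Drawing", "Painting", "Photography", "Painting"],
})

# Question 1: how many works does each department hold?
per_department = df["Department"].value_counts()

# Question 2: which works were made before 1970? (Boolean mask with .loc)
early = df.loc[df["Year"] < 1970, ["Title", "Artist", "Year"]]

# Question 3: what is the average year of the works in the Painting department?
painting_mean_year = df.loc[df["Department"] == "Painting", "Year"].mean()

print(per_department)
print(early)
print(painting_mean_year)
```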
31 Jan Lecture 3 – Sentiment Analysis

  • Working with Text Data
  • Scraping Data from the Web
  • Sentiment Prediction
  • Error Rate and Accuracy
  • slides: Sentiment Analysis
  • worksheet 3
  • [DSFS] Ch4
    • Vectors (p49-53)
  • [DSFS] Ch9
    • Reading Files (p105-108)
    • Using APIs (p114-117) 
    • Example: Twitter APIs (p117-120)
  • [DSFS] Ch20 (p239-244)
    • Word Clouds
    • n-gram Models 
5 Feb Lab 3 – Analyzing Movie Reviews

  • Rule-Based Sentiment Prediction
  • Sentiment Classifier
  • Evaluation and Model Comparison
  • materials: Lab3
  • [PDSH] Ch5 (p343-359)
    • Introducing Scikit-Learn
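A minimal sketch of a rule-based (word-list) sentiment predictor, evaluated by accuracy and error rate (error rate = 1 - accuracy). The word lists and reviews are made up, and the lab's actual rules may differ; the third review shows how a plain word-counting rule misses negation.

```python
# Hypothetical sentiment word lists (illustrative only)
POSITIVE = {"great", "good", "wonderful", "fun", "loved"}
NEGATIVE = {"bad", "boring", "awful", "terrible", "hated"}

def predict_sentiment(review: str) -> int:
    """Return 1 (positive) if positive words outnumber negative ones, else 0."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0

def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

reviews = [
    "a great and fun movie",
    "boring plot and awful acting",
    "not good at all",  # negation fools the word-counting rule
    "i loved it",
]
labels = [1, 0, 0, 1]

preds = [predict_sentiment(r) for r in reviews]
acc = accuracy(labels, preds)
error_rate = 1 - acc
print(preds, acc, error_rate)  # [1, 0, 1, 1] 0.75 0.25
```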
7 Feb Lecture 4 – Regression

  • Least-Squares Method
  • Linear vs Polynomial Regression
  • Model Complexity
  • RMSE and MAE
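For a single input variable, the least-squares line y ≈ w·x + b has a closed-form solution: w = Σ (xᵢ − mean(x))(yᵢ − mean(y)) / Σ (xᵢ − mean(x))², and b = mean(y) − w·mean(x). A minimal NumPy sketch on made-up data that fits the line and reports RMSE (which weights large errors more heavily) and MAE:

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares fit of y ≈ w*x + b for 1D inputs."""
    x_mean, y_mean = x.mean(), y.mean()
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - w * x_mean
    return w, b

def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large errors more heavily."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the errors."""
    return np.mean(np.abs(y_true - y_pred))

# Made-up data on y = 2x + 1, except the last point (10 instead of 9)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 10.0])

w, b = fit_line(x, y)
y_hat = w * x + b
print(w, b, rmse(y, y_hat), mae(y, y_hat))
```

The same line can be cross-checked with `np.polyfit(x, y, 1)`, which returns the slope and intercept in that order.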
12 Feb Lab 4 – Predicting Housing Prices

  • Implement 1D Linear Regression
  • Data Exploration
  • Evaluation via RMSE
  • materials: Lab4
  • [DSFS] Ch14 Multiple Regression (p179-183)
    • The Model
    • Least Squares Model
    • Fitting the Model
    • Interpreting the Model
    • Goodness of Fit 
14 Feb  Lecture 5 – Logistic Regression

  • Decision Boundary
  • Probabilistic Classifier
  • Sigmoid/Logistic Function
  • Likelihood
  • Confusion Matrix
  • slides: see lecture below
  • [DSFS] Ch16 Logistic Regression
  • [PDSH] Ch5 Scikit-Learn
    • Classification on Digits (p357)
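A minimal NumPy sketch of logistic regression as a probabilistic classifier: the sigmoid (logistic) function turns the linear score w·x + b into P(y = 1 | x), the parameters are chosen to maximize the log-likelihood (here with plain batch gradient ascent), and thresholding the probability at 0.5 puts the decision boundary where w·x + b = 0. The learning rate, iteration count, and toy data are arbitrary choices for illustration, not values from the course.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, b, X, y):
    """Sum over samples of y*log(p) + (1 - y)*log(1 - p), with p = P(y=1 | x)."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # guard against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximize the average log-likelihood by batch gradient ascent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        # Gradient of the average log-likelihood with respect to w and b
        w += lr * X.T @ (y - p) / len(y)
        b += lr * np.sum(y - p) / len(y)
    return w, b

def predict(w, b, X, threshold=0.5):
    """Class 1 when P(y=1 | x) >= threshold; at 0.5 the boundary is X @ w + b = 0."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Made-up 1D data: class 0 on the left, class 1 on the right, with some overlap
X = np.array([[-1.75], [-1.25], [-0.75], [-0.25], [0.25], [0.75], [1.25], [1.75]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

w, b = fit_logistic(X, y)
print("decision boundary at x =", -b / w[0])
print("log-likelihood:", log_likelihood(w, b, X, y))
print("predictions:", predict(w, b, X))
```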
19 Feb Lab 5 – Detecting Breast Cancer

  • Implement Logistic Regression
  • Evaluation using Confusion Matrix
  •  materials: Lab5
  • [DSFS] Ch16 Logistic Regression
  • [DSFS] Ch11
    • Correctness (p145-147)
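A minimal sketch of a binary confusion matrix and some metrics derived from it. Rows are the true classes and columns the predicted classes, the same layout scikit-learn's `confusion_matrix` uses for labels 0/1. The labels are made up; 1 stands for the positive class (for example, malignant).

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix [[TN, FP], [FN, TP]] for binary labels 0/1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return np.array([[tn, fp], [fn, tp]])

# Made-up labels: 1 = positive (e.g. malignant), 0 = negative (e.g. benign)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of the predicted positives, how many are real
recall = tp / (tp + fn)     # of the real positives, how many were found
print(cm)
print(accuracy, precision, recall)
```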
21 Feb Lecture 5 – Logistic Regression Revisited

  • Model Parameters and Decision Boundary
  • Likelihood

Lecture 6 – Evaluation and Learning Principles

  • Noise
  • Overfitting
  • Model Selection
  • Learning Curve
  • Sampling Bias

  • slides: Learning Principles
  • worksheet 6
  • [DSFS] Ch11 (p142-147)
    • Overfitting and Underfitting
    • Correctness
  • [PDSH] Ch5
    • Model Validation [ignore everything on cross-validation] (p359-361)
    • Selecting the Best Model (p363)
    • Learning Curves (p370-373)
    • Basis Function Regression (p392-396)
    • Regularization (p396-398)
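A minimal sketch of model selection with a single held-out validation set (no cross-validation): polynomials of increasing degree are fit to noisy samples of a made-up function, and the degree with the lowest validation RMSE is kept. Training error can only shrink as the degree grows, while validation error typically rises again once the model starts fitting the noise, which is the overfitting pattern listed above. The target function, noise level, split, and degree range are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: noisy samples of sin(2*pi*x) on [0, 1]
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

# Hold out half of the data for validation
idx = rng.permutation(x.size)
train, val = idx[:20], idx[20:]

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

train_err, val_err = [], []
degrees = range(1, 10)
for d in degrees:
    coefs = np.polyfit(x[train], y[train], d)  # least-squares polynomial fit
    train_err.append(rmse(y[train], np.polyval(coefs, x[train])))
    val_err.append(rmse(y[val], np.polyval(coefs, x[val])))

best = list(degrees)[int(np.argmin(val_err))]
for d, tr, va in zip(degrees, train_err, val_err):
    print(f"degree {d}: train RMSE {tr:.3f}, validation RMSE {va:.3f}")
print("selected degree:", best)
```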
26 Feb Lab 5 – Detecting Breast Cancer

  • Applying LR to Breast Cancer Prediction
  • Evaluation using Confusion Matrix
  •  materials:
    • cf. Lab5 above
    • add this code to a new cell at the very end
28 Feb No class
5 Mar Study for Midterm Exam

  • Review
  • Discuss questions and points of confusion with peers, TAs, and the instructor

What to study?

How to study?

  • Revisit Worksheets
    • redo the problems (yes, rewrite your answers!)
    • practice helps you remember the material
    • the more you practice, the better you will remember
  • Revisit Labs
    • skip the coding
    • focus on the Write up! problems and the conceptual material
  • Practice retrieving your knowledge
    • quiz your neighbor
    • explain concepts to your neighbor
    • or yourself (aloud!)
    • …believe me, it helps you retain your knowledge!
  • Create Note Cards or Summary Sheets
    • encoding concepts in your own writing helps you learn and retain your knowledge!
7 Mar Midterm EXAM in Crow 204

closed book – no notes – no crib/cheat sheet

CAUTION: room change!
12 Mar, 14 Mar Spring Break
19 Mar Lab 6 – Ethical Thinking for Data Science

  • Why are ethics important in DS?
  • Example Scenarios
21 Mar Lecture 7 – Clustering

  • Clustering Problem
  • Similarity Measures
  • k-means Algorithm
  • slides: Clustering
  • worksheet 7
  • [DSFS] Ch19 Clustering (p225-232)
    • The Idea
    • The Model
    • Example: Meetups
    • Choosing k
    • Example: Clustering Colors
  • [PDSH] Ch5 (p462-479)
    • k-Means Clustering 
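A minimal NumPy sketch of the k-means algorithm (Lloyd's iteration) with Euclidean distance as the similarity measure, plus the total within-cluster squared error that the "Choosing k" reading plots against k to look for an elbow. The random initialization, the empty-cluster rule, and the two-blob data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

def within_cluster_sse(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))

# Made-up data: two well-separated 2D blobs of 20 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)

# Choosing k: look for the "elbow" where the error stops dropping sharply
for k in range(1, 6):
    lab, cen = kmeans(X, k)
    print(k, within_cluster_sse(X, lab, cen))
```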
26 Mar Lab 7 – Clustering

  • Explore the k-Means Algorithm
  • Choose k
  • Application
  • materials: Lab7
  • reading: cf. Lecture 7
28 Mar Lecture 8 – Similarity-based Learning

  • k-Nearest Neighbor Model
  • Cross-Validation
  • Input Transformations
  • slides: kNN (annotated)
  • worksheet 8
  • [DSFS] Ch12 k-NN
  • [PDSH] Ch5 Hyperparameters and Model Validation
    • Thinking about Model Validation [cross-validation] (p359-362)
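A minimal NumPy sketch of a k-nearest-neighbor classifier (majority vote among the k closest training points by Euclidean distance) together with k-fold cross-validation used to compare values of the hyperparameter k. The fold count, candidate k values, and the two-class toy data are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict each query's label by majority vote among its k nearest training points."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean k-NN accuracy over n_folds train/validation splits."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        preds = knn_predict(X[train], y[train], X[val], k=k)
        scores.append(np.mean(preds == y[val]))
    return float(np.mean(scores))

# Made-up 2D data: two overlapping Gaussian classes of 30 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# Use cross-validation to choose the hyperparameter k
for k in (1, 3, 5, 7, 9):
    print(k, cross_val_accuracy(X, y, k))
```

Odd values of k avoid tied votes when there are only two classes.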
2 Apr Lab 8 – k-NN

  • Explore the k-NN Algorithm
  • Data Scaling
  • Data Standardization
  • materials: Lab8
  • reading: cf. Lecture 8
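Because k-NN compares points by distance, a feature measured in large units (say, dollars) can swamp one measured in small units (say, years). A minimal sketch of two common fixes, assuming "scaling" means min-max rescaling to [0, 1] and "standardization" means converting each feature to z-scores; the lab may define them slightly differently. The data are made up, and neither function handles a constant column (it would divide by zero).

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def standardize(X):
    """Shift each column to mean 0 and scale it to standard deviation 1 (z-scores)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Made-up features on very different scales: (age in years, income in dollars)
X = np.array([[25, 40_000],
              [35, 60_000],
              [45, 80_000],
              [55, 120_000]])

print(min_max_scale(X))
print(standardize(X))
```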
4 Apr Lecture 9 – Feature Engineering

  • Feature selection
  • Feature learning
  • Quick Intro to
    • Decision Trees
    • Random Forests
    • Neural Networks
  • slides: Feature Engineering (annotated)
  • worksheet 9
  • [DSFS]
    • Ch10 Dimensionality Reduction (p134-139)
    • Ch17 What is a Decision Tree? (p201-203)
    • Ch17 Random Forest (p211)
  • [PDSH]  Ch5 Feature Engineering (p375-381)
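The dimensionality-reduction reading covers principal component analysis (PCA). A minimal sketch that computes it with NumPy's SVD, which may differ from how the reading computes it: center the data, take the top right-singular vectors as the principal directions, and project onto them. The 3D data are made up so that they vary mostly along a single direction.

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto its top principal components (computed via SVD)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_var = S ** 2 / (len(X) - 1)
    return X_centered @ components.T, components, explained_var[:n_components]

rng = np.random.default_rng(0)
# Made-up 3D data that mostly varies along the direction (1, 2, -1)
t = rng.normal(0, 3, 100)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.1, (100, 3))

Z, comps, var = pca(X, n_components=1)
print(comps)     # first principal direction (its sign is arbitrary)
print(var)       # variance captured along that direction
print(Z.shape)   # (100, 1)
```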
9 Apr Lab 9 – Feature Learning

  • Explore a pre-trained NN
  • Feature Learning
  • kNN Retrieval
  • materials: Lab9
  • reading: cf. Lecture 9
11 Apr Lecture 10 – Data Engineering

  • More Insights into Neural Networks
  • Data Augmentation
  • Outlier Detection
16 Apr Lab 10 – Gesture Recognition
18 Apr  Lab 10 – Wrap-up

  • Train our NN
  • Evaluate the trained NN

Lecture 11 – Topic Models

23 Apr Lab 11 – Organizing Text Data

  • Topic Model
  • Wikipedia Data
  • Text Features
  • LDA

Course evaluation: HERE

  • Let’s avoid sampling bias! To do so, we need everyone to fill out the evals! Thanks for taking the time.
  • Incentive: this will count as the graded part of the lab quiz for today’s lab.
25 Apr Semester Review

Pilot Offering Feedback Form

Towards Data Science: What to Study Next? 

8 May  Final EXAM

  • 6-7pm in Crow 201