**This is an inactive course webpage**

Announcements

  • Thanks for a great semester!
  • We care! Thanks for taking the time to fill out the course evals! We got a response rate of 84.5%, which is great!
  • Regrades – we cannot take any regrades after the deadlines listed below, since grades are due on THU May 9th (this is a school deadline we cannot influence).
    • implementation project and written hw
      • deadline: WED May 8th 11:59pm (midnight)
      • office hours: TUE May 7: 2-4pm (Trevor) in Jolley 431
      • via Piazza (use grades tag)
    • application project
      • deadline: WED May 8th 11:59pm (midnight)
      • Brad can be reached via the cse517a Piazza – use tag application_project
      • office hours: WED May 8th 6-7pm (Brad) Jolley 5th floor table in the back
    • final exam
      • deadline: THU May 9th 2pm
      • office hours: THU May 9: 1-2pm (MN) in Jolley 222

Instructor: Marion Neumann
Office: Jolley Hall Room 222
Office Hours: TUE 11:30am-12:30pm
Contact: please use Piazza!

Assistants and TAs:
Trevor – Head TA (*) and (***)
Brad Flynn – Application Project Coordinator (**)
Zachary (***), Jerry, Adrien

* manages all grades on Gradescope/Canvas –> use Piazza tag grades
** contact for the application project –> use Piazza tag application_project
*** manages the autograder –> use Piazza tag autograder

TA Office hours:

  • Wednesday 10am-12pm (Trevor) in McDonnell 362
  • Thursday 5:30-7:30pm (Zach) in Jolley 517
  • Friday 10am-12pm (Jerry) in Jolley 517
  • Monday 2:30pm-4:30pm (Adrien) in Jolley 517

This course assumes a profound understanding of the fundamentals of machine learning (including the theoretical foundations and principles of ML as well as hands-on implementation experience). CSE517a covers advanced topics at the frontier of the field in depth. Topics to be covered include kernel methods (support vector machines, Gaussian processes), neural networks (deep learning), and unsupervised learning. Depending on developments in the field, the course will also cover some advanced topics, which may include learning from structured data, active learning, and practical machine learning (feature selection, dimensionality reduction). For more information, check out the Roadmap.

Prerequisites: CSE 247, CSE 417T (enforced), ESE 326, Math 233, Math 309, and profound experience in Matlab/Octave or Python.

This class counts towards the Certificate in Data Mining and Machine Learning as a required course.

Syllabus

Lectures
Lectures will be held every TUE & THU 10-11:30am in Louderman 458.

Homework Assignments
There will be two kinds of homework assignments with potentially overlapping assignment times:

implementation projects (project) 

  • should be worked on in groups of 2 students
  • once assigned you have 1-2 weeks time
  • will be auto-graded
  • submit via SVN repository
  • 4-5 assignments (each weighted equally)
  • contribute 40% towards your total course performance

written homework assignments (thw)

  • should be worked on in groups of up to (not more than) 2 students
  • submit via Gradescope (….coming soon…)
  • ca. 3-4 assignments (each weighted equally)
  • contribute 10% towards your total course performance

Homeworks will be assigned concurrently with the lecture sessions covering the respective materials. Due dates and submission instructions will be indicated on the course webpage under homework assignments. It is every student’s responsibility to meet the submission requirements and deadlines. We cannot accept late submissions or submissions that do not follow the submission instructions, for any reason (see also Late Policy below).

Each homework assignment will be graded and the total grade achieved for all homework assignments (no drops, no make-ups) will contribute 50% towards your total course performance.

Regrade Requests
Any regrade requests and claims of missing scores will have to be made within one week of the grade announcement. We will not take any regrade requests after this one-week period for any reason. Grade announcements will be made on Piazza and grading comments will be provided in your SVN repository or via Gradescope. All grades will be maintained on Canvas. It is the student’s responsibility to verify that all grades on Canvas are accurate. Regrade requests should be submitted exclusively via

  • Piazza using the project and autograder tags for implementation projects
  • Gradescope for written homework assignments

Application Project
The application project will run the entire semester with milestone due dates at the end of every calendar month.

  • have to be worked on in groups of 3 to 5 students (not more and not less)
  • no automatic extension
  • provide instructor/TAs with read access to private team repository (bitbucket/github)
  • the number of required tasks depends on the group size (each task is weighted equally)
  • contributes 20% towards your total course performance

Midterm and Final Exams
There will be one written midterm exam and one written final exam contributing 20% and 10% respectively towards your total course performance. Dates are

  • Midterm: March 19 (in-class)
  • Final: May 7 2019 6-8pm (scheduled by university)

Non-curricular Activities
We cannot offer accommodations for examinations and deadlines because of non-curricular activities outside your Wash U commitments. This includes job interviews or flying home early. I understand that you may decide to miss a scheduled exam date for these reasons, but you will need to weigh the consequences when making such a decision.

Grading Summary (this information is still tentative…)
20%  application project
40%  implementation projects (project)
10%  written assignments (thw)
20%  midterm exam
10%  final exam

It is not possible to achieve a higher percentage on any individual grade component than listed above through bonus or extra credit problems.

Final course grades will be assigned using the following straight scale:

Letter Grade  Cutoff Percentage
A   93%
A-  90%
B+  87%
B   83%
B-  80%
C+  77%
C   73%
C-  70%
D+  67%
D   63%
D-  60%
F   < 60%

The passing grade is C- or better (70%).
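
As an illustration of how the weights and the straight scale combine, here is a minimal sketch in Python. It is not the official grade computation: the function and dictionary names are made up for this example, it assumes each component score is already a percentage between 0 and 100, and it assumes no rounding is applied before comparing against the cutoffs (the syllabus does not state a rounding policy).

```python
# Weights from the Grading Summary (fractions of the total course performance).
WEIGHTS = {
    "application_project": 0.20,
    "implementation_projects": 0.40,
    "written_assignments": 0.10,
    "midterm_exam": 0.20,
    "final_exam": 0.10,
}

# Cutoffs from the straight scale, highest first.
CUTOFFS = [
    ("A", 93), ("A-", 90), ("B+", 87), ("B", 83), ("B-", 80),
    ("C+", 77), ("C", 73), ("C-", 70), ("D+", 67), ("D", 63), ("D-", 60),
]


def course_percentage(scores):
    """Weighted total in percent.

    Each component is capped at 100, since bonus points cannot push a
    component above its listed weight.
    """
    return sum(WEIGHTS[name] * min(scores[name], 100.0) for name in WEIGHTS)


def letter_grade(percentage):
    """Map a total percentage to a letter using the straight scale."""
    for letter, cutoff in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example_scores = {
    "application_project": 90.0,
    "implementation_projects": 85.0,
    "written_assignments": 95.0,
    "midterm_exam": 80.0,
    "final_exam": 75.0,
}
total = course_percentage(example_scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these hypothetical scores the weighted total lands in the B range. Note that a total just below a cutoff (for example 92.9%) stays in the lower bracket under the no-rounding assumption.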

Late Policy
Your homework assignments must be turned in on time. There are absolutely no makeup quizzes or assignments for any reason. You get an automatic 3 day extension on every written homework and implementation project.
WARNING: there is absolutely NO extension to this extension for ANY reason!

Collaboration Policy
You are encouraged to discuss the course material with other students. Discussing the material and the general form of solutions to the labs is a key part of the class. Since, for many of the assignments, there is no single “right” answer, talking to other students and to the TAs is a good thing. However, everything that you turn in should be your own work, unless we tell you otherwise. If you talk about assignments with another student, then you need to explicitly tell us on the hand-in. You are not allowed to copy answers or parts of answers from anyone else, or from material you find on the Internet. This will be considered willful cheating, and will be dealt with according to the official collaboration policy:

Academic Integrity
Unless explicitly instructed otherwise, everything that you turn in for this course must be your own work. If you willfully misrepresent someone else’s work as your own, you are guilty of cheating. Cheating, in any form, will not be tolerated in this class.

Check out these questions and answers in the CSE FAQ.

There is zero tolerance of Academic Dishonesty. I will be actively searching for academic dishonesty on all homework assignments, quizzes, and exams. If you are guilty of cheating on any assignment or exam, you will receive an F in the course and be referred to the School of Engineering Discipline Committee. In severe cases, this can lead to expulsion from the University, as well as possible deportation for international students. If you copy from anyone in the class, both parties will be penalized, regardless of which direction the information flowed. This is your only warning.

Please refer to the University Undergraduate Academic Integrity Policy for more information (holds for undergraduate and master’s students). The policy for PhD students can be found here. If you suspect that you may be entering an ambiguous situation, it is your responsibility to clarify this before the professor or TAs detect it. If in doubt, please ask.

Providing/Posting Solutions
Providing your course work (written or code) in any form to others is a violation of the academic integrity policy. If you provide your solutions to someone else in the course or post them publicly online, you are guilty of violating our academic integrity policy. Such a case will be treated the same way as described above, and prosecution can also take place after you have finished the course or even graduated from Wash U.

Mental Health
Mental Health Services professional staff members work with students to resolve personal and interpersonal difficulties, many of which can affect the academic experience. These include conflicts with or worry about friends or family, concerns about eating or drinking patterns, and feelings of anxiety and depression. See: http://shs.wustl.edu/MentalHealth

Accommodations based upon sexual assault
The University is committed to offering reasonable academic accommodations to students who are victims of sexual assault. Students are eligible for accommodation regardless of whether they seek criminal or disciplinary action. Depending on the specific nature of the allegation, such measures may include but are not limited to: implementation of a no-contact order, course/classroom assignment changes, and other academic support services and accommodations. If you need to request such accommodations, please direct your request to Kim Webb (kim_webb@wustl.edu), Director of the Relationship and Sexual Violence Prevention Center. Ms. Webb is a confidential resource; however, requests for accommodations will be shared with the appropriate University administration and faculty. The University will maintain as confidential any accommodations or protective measures provided to an individual student so long as it does not impair the ability to provide such measures.

If a student comes to me to discuss or disclose an instance of sexual assault, sex discrimination, sexual harassment, dating violence, domestic violence or stalking, or if I otherwise observe or become aware of such an allegation, I will keep the information as private as I can, but as a faculty member of Washington University, I am required to immediately report it to my Department Chair or Dean or directly to Ms. Jessica Kennedy, the University’s Title IX Coordinator. If you would like to speak with the Title IX Coordinator directly, Ms. Kennedy can be reached at (314) 935-3118, jwkennedy@wustl.edu, or by visiting the Title IX office in Umrath Hall. Additionally, you can report incidents or complaints to Tamara King, Associate Dean for Students and Director of Student Conduct, or by contacting WUPD at (314) 935-5555 or your local law enforcement agency. See: Title IX

You can also speak confidentially and learn more about available resources at the Relationship and Sexual Violence Prevention Center by calling (314) 935-8761 or visiting the 4th floor of Seigle Hall. See: RSVP Center

Bias Reporting 
The University has a process through which students, faculty, staff and community members who have experienced or witnessed incidents of bias, prejudice or discrimination against a student can report their experiences to the University’s Bias Report and Support System (BRSS) team. See: http://brss.wustl.edu

Center for Diversity and Inclusion (CDI):
The Center for Diversity and Inclusion (CDI) supports and advocates for undergraduate, graduate, and professional school students from underrepresented and/or marginalized populations, creates collaborative partnerships with campus and community partners, and promotes dialogue and social change. One of the CDI’s strategic priorities is to cultivate and foster a supportive campus climate for students of all backgrounds, cultures and identities.
See: diversityinclusion.wustl.edu/

Course calendar and reading

(*) indicates optional more advanced reading for the interested student

_______

Topic

Reading

15 Jan

17 Jan

Course Overview, Syllabus

Structural Risk Minimization

  • lecture notes (.tex, .pdf)
    • HERE is a folder with the images (remember to update this folder as we add new illustrations)
  • FCML: Ch1, Linear Modelling
  • ESL: 3.4.3, 10.6
22 Jan  Optimization

  • GD (brief recap)
  • Newton
  • SGD
  • momentum method
  • LFD: 3.3.2, Gradient Descent
  • FCML: Comments 4.1 & 2.6
24 Jan Estimating Probabilities from Data

  • Coin Flipping
  • MLE
  • MAP
  • FCML: 2.1-2.6, Random Variables and Probability
  • FCML: 3.1-3.7, Coin Game
29 Jan  MLE and MAP for discriminative ML

  • Linear Regression
  • Ridge Regression
  • Logistic Regression
  • FCML: 2.8, MLE
  • FCML: 3.8, 4.2-4.3, MAP
  • FCML: 5.2.2, Logistic Regression
31 Jan   Squared Euclidean Distances

  • Use-cases
  • Matrix Equations
  • Efficient Computation
5 Feb

7 Feb

 Naive Bayes

  • Generative ML
  • Categorical features
  • Multinomial features
  • Continuous features (Gaussian NB)
  • Missing features
  • FCML: 5.2.1, Bayes Classifier and NB
  • Ch 3 of Tom Mitchell’s ML book
12 Feb

14 Feb

 Performance Evaluation

  • Performance Measures
    • Regression
    • Classification
  • Statistical Tests
  • Cross-Validation
  • Re-sampling
  • lecture notes (.tex, .pdf, slides)
  • FCML: 5.4, Performance
  • ESL: 7.10, Cross-Validation
19 Feb  RBF Networks

  • Radial basis functions
  • Kernel regression
  • RBF network
  • lecture notes (.tex, .pdf)
  • LFD: eCh6.3-6.3.2, RBF Networks
21 Feb

26 Feb

 Kernels

  • Valid Kernels
  • Kernel Construction
  • Kernel Machines
  • LFD
    • 3.4, Non-linear Transformation
    • 8.3, Kernel Trick & Kernel SVM
  • FCML: 5.3.2, SVM and Kernel Methods
28 Feb  Recitation Session

  • features for text data and text classification
  • discussion of written hw1 and hw2 (selected problems)
For questions contact Zach and Trevor.

Mar 5

Mar 7

 Gaussian Processes

  • GPR via Weight-space View
  • Definition: GP
  • GPR via Function-space View
    • Noise-free Observations
    • Noisy Observations
    • Noisy Predictions
  • GPR Algorithm
  • Hyperparameter Learning
  • lecture notes (.tex, .pdf)
  • FCML: 8.1, Non-Parametric Models 
  • FCML: 8.2, GP Regression 
  • GPML: 2.2-3, Function-space View
  • (*) GPML: 2.1, Weight-space View
  • (*) GPML: 4.2, Covariance functions
  • (*) GPML: 5, Hyperparameter Learning & Model selection
Mar 12

Mar 14

 Spring Break
Mar 19 MIDTERM EXAM

  • in-class
You can bring a cheat sheet with the following specifications:

  • one US-letter sized page
  • double-sided
  • handwritten
Mar 21 Wrap-up: Kernel Methods and GPs

  • GP Hyperparameter Training
  • Multi-class Classification (for SVM, GPs)
    • 1-vs-all
    • Platt scaling
    • 1-vs-1
    • (*) Multi-class Logistic Regression
  • Application: Traffic Prediction
  • FCML: 8.4, Hyperparameter Optimisation
  • GPML: 5.1, 5.3-4  Hyperparameter Learning & Model selection 
Mar 26

Mar 28

 Clustering 

  • Unsupervised Learning
  • k-Means
  • Kernel k-Means
  • GMMs
  • lecture notes (.tex, .pdf)
  • FCML: 6., Intro
  • FCML: 6.2, k-Means, Kernel k-Means
  • FCML: 6.3.1-7, Mixture Models 
Apr 2

Apr 4

Dimensionality Reduction

  • PCA/SVD/MDS
  • Non-linear Dimensionality Reduction
  • Data Preprocessing
  • Feature Engineering
  • lecture notes (.tex, .pdf)
  • FCML: 7.1-7.2, PCA
  • LFD:
    • eCH 9.1, Input Preprocessing
    • eCH 9.2, PCA, SVD
  • slides
Apr 9

Apr 11

Neural Networks – Basics

  • Examples
  • Feed-Forward NNs
  • Back-Propagation
Apr 16

Apr 18

Learning (Deep) NNs

  • Architecture
  • Regularization
  • Optimization
  • Initialization
  • Pre-training

 

Apr 23 Beyond FFNNs

  • Autoencoders
  • Recurrent NNs
  • [optional] Convolutional NNs
Apr 25  Semi-Supervised Learning 

  • self-training & co-training
  • GMMs
  • graph-based SSL
  • [optional] generative models
  • [optional] S3VM
  • (*) ESL: 17.1-17.3.1, Undirected Graphical Models
7 May Final EXAM

  • 6-7pm (starts 6pm sharp!)
    • be there by 5:50pm
  • in Louderman 458
You can bring a cheat sheet with the following specifications:

  • one US-letter sized page
  • double-sided
  • handwritten
Homework assignments

Implementation Projects

All implementation projects have to be submitted via SVN repository commit (using the specified filenames and folder locations).

  • 04/11  project4
    • complimentary autograder deadlines (allow 1 full day (24hrs) for us to get back the result)
      • TUE 04/16/2019 11:59pm
      • FRI 04/19/2019 11:59pm
      • MON 04/22/2019 11:59pm
      • THU 04/25/2019 11:59pm (additional)
    • due THU 04/25/2019 11:59pm
    • due SUN 04/28/2019 11:59pm (extended)
  • 03/07  project3
    • complimentary autograder deadlines (allow 1 full day (24hrs) for us to get back the result)
      • TUE 03/12/2019 11:59pm
      • FRI 03/15/2019 11:59pm
      • TUE 03/19/2019 11:59pm
      • THU 03/21/2019 11:59pm
      • SUN 03/24/2019 11:59pm (additional)
    • due THU 03/21/2019 11:59pm
    • due TUE 03/26/2019 11:59pm (extended)
  • 02/14 project2
    • complimentary autograder deadlines (allow 1 full day (24hrs) for us to get back the result)
      • SUN 02/17/2019 11:59pm
      • FRI 02/22/2019 11:59pm
      • TUE 02/26/2019 11:59pm
    • due TUE 02/26/2019 11:59pm
  • 01/22 project1
    • download the data from HERE (do NOT add this folder to your SVN repository at any time)
    • complimentary autograder deadlines (allow 1 full day (24hrs) for us to get back the result)
      • THU 01/31/2019 11:59pm
      • SUN 02/03/2019 11:59pm
      • TUE 02/05/2019 11:59pm
      • FRI 02/08/2019 11:59pm
    • due FRI 02/08/2019 11:59pm (extended from TUE 02/05/2019 11:59pm)

Written Homework 

All written homework (thw) must be submitted via Gradescope (will be available shortly). Follow the submission instructions below (violations will result in a score penalty).

  • 04/04 thw4
  • 02/28 thw3
  • 02/07 thw2
  • 01/17 thw1
  • SUBMISSION INSTRUCTIONS
    • Find a tutorial on submitting a PDF to Gradescope HERE or watch this video.
    • You need to match your pages to the problems in Gradescope.
      • If pages are not matched correctly -10% for every incorrectly/non-matched problem.
    • Gradescope group submission is required for groups!
      • -10% if your group’s submission does not list both team members in Gradescope
      • Find a tutorial on how to add a group member to your submission in the second half of this video.
Resources & HowTos

Python

We will be using Python, NumPy, SciPy, and Matplotlib for the implementation and application projects. All those packages are included in the Anaconda distribution. Follow these instructions to get everything installed.

VERSIONS

It’s recommended to go with the newest versions included in Anaconda. If you have an up and running Python installation (and are capable of managing dependencies yourself), feel free to use any of the following Python versions: 3.4, 3.5, or 3.6 and the respective compatible versions of the packages listed above.

JUPYTER NOTEBOOKS 

Jupyter notebooks (included in the Anaconda package) might be useful to explore demo code and also for developing your application project solutions. HERE is some more information on how to get started with Jupyter.

PYTHON TUTORIALS AND RESOURCES

SVN

We will be using SVN repositories to distribute and collect implementation project assignments. Please see this tutorial about accessing your repository and how to submit your work.

If you wish to access your files from your own computer, you will need to install Tortoise (Windows) or SmartSVN (Windows, Mac, Linux) or use SVN via the terminal (Mac, Linux).

Gradescope

We will be using Gradescope for submission and grading of all written work. Find a tutorial on submitting a PDF to Gradescope HERE.

Course Books

  • Main course book: A First Course in Machine Learning, 2nd edition (FCML) by Rogers and Girolami (We will use this book for readings, mathematical derivations, and written homework problems.)
  • From CSE417t: Learning from Data (LFD) by Abu-Mostafa, Magdon-Ismail, and Lin (Keep your copy from CSE417t around. You might need it again. This book is a terrific resource!)
  • Matrix Cookbook: for anything about matrix equations and derivatives, etc.
  • Practical reference: Python Data Science Handbook by VanderPlas (This will be useful for the application project.)
  • Useful reference book: The Elements of Statistical Learning (ESL) by Hastie, Tibshirani, Friedman (This book is freely available online, so do not hesitate to consult it for additional information.)

MATLAB or Octave

We might use MATLAB (license required) or Octave (it’s free) in the course. The choice is up to you! All provided course materials will run on either one. Here is a short article about the differences.

Octave

If you decided to use Octave, get it from HERE.

MATLAB

If you decided to use MATLAB, you have the following options:

MATLAB: Student Edition
The School of Engineering has a campus-wide license that allows all Engineering students to install MATLAB on their personally owned computers at no charge. This software is available to students who are enrolled in an Engineering class. You should have received an email with license information and installation instructions; if not, please email support@seas.wustl.edu.
Non-engineering students are not required to purchase the MATLAB software (as it is available in the computer labs), but if you wish to acquire the student edition of MATLAB go here.

Accessing MATLAB via Remote Desktop
To access MATLAB remotely from your computer you must use Remote Desktop. For Windows, open the Start Menu and type “Remote Desktop” into the text box. The application should appear in the list. For Mac users, you will need to download this application.

Once you have started the remote desktop application, type oasis.cec.wustl.edu as the address you would like to connect to. After the connection has been made, you will need to login with your WUSTL KEY. Enter your username with “ACCOUNTS\” in front of it, like this:

ACCOUNTS\wustlkey

Once you have logged in, MATLAB can then be accessed from the Start Menu as in the CEC labs.

If you are attempting to use Remote Desktop from off campus, you may be required to use the VPN.

Accessing MATLAB via Linux Lab
You can also use the Linux Lab to run Matlab, by going to https://linuxlab.seas.wustl.edu and selecting Submit Job, and then starting a “Matlab” session.