CSE 427s Home (FL18)

This is an inactive course webpage. Find the one for your current semester.



  • Please fill in the course evaluation. Thanks for the feedback!
  • Office hours
    • TA office hours run until FRI Dec 14th 
    • [canceled] My office hours on TUE Dec 18th, 3-4pm, are canceled.
  • All grading issues will have to be resolved by TUE Dec 18th.
    • Use Piazza (final_grading tag) for all communication on grading issues, grade clarifications, or regrade requests (only if not possible via GS).
    • No emails about grading issues. Thanks.
  • Fill in the demographics survey for my research study.
    • Thanks for taking the time.
    • More information on how to participate in or withdraw from the study can be found HERE.
  • Check out the final project guide for the grading rubric and other useful instructions.

Lectures and Labs

Instructor: Marion Neumann
Office: Jolley Hall Room 222
Contact: Please use Piazza!
Office Hours: TUE 3-4pm or individual appointment (request via email – allow for 2-3 days to reply and schedule)
Please avoid drop-ins without an appointment outside my office hours.

Head TA: Jonathan C (takes care of all grading issues – contact via Piazza or in his office hours)

TAs: Alexis, Eric T, Erik M, Guanlan, Jonathan S, Steven, Yachen, Yushu

TA Office Hours:

MON 3-5pm (Yushu) in Jolley 224
WED 12-3pm (Erik, Jonathan, Adam) in Urbauer 215
THU 2-4pm (Aumesh, Lan) in Jolley 224
FRI 10am-12pm (Steven, Yachen) in Urbauer 216
SUN 1-3pm (Eric, Jonny) in Jolley 408

This course provides a comprehensive introduction to applied parallel computing using the MapReduce programming model, which facilitates large-scale data management and processing. There will be an emphasis on hands-on experience working with the Hadoop architecture, an open-source software framework written in Java for distributed storage and processing of very large data sets on computer clusters. Further, we will derive and discuss various algorithms to tackle big data applications and make use of related big data analysis tools from the Hadoop ecosystem, such as Pig, Hive, Impala, and Apache Spark, to solve problems faced by enterprises today. Check the Roadmap for more detailed information.

Prerequisites: CSE 131 (solid background in programming with Java), CSE 247, and CSE 330 (basic knowledge of relational databases (RDBMS), SQL, and AWS). Use this prerequisite checklist if you are not sure.

This class counts towards the Certificate in Data Mining and Machine Learning as an applications course.

This class uses materials from the Cloudera Developer Training for MapReduce, the Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop, and the Cloudera Developer Training for Apache Spark, which are made available to Washington University through the Cloudera Academic Partnership program. Further contents are based on the “Mining of Massive Data Sets” book and class taught at Stanford by Jure Leskovec.


Course Calendar and Reading

Homework Assignments

Grades on Canvas

Resources and HowTos


Please ask any questions related to the course materials and homework problems on Piazza. Other students might have the same questions or be able to provide a quick answer.
Any public posting of (partial or full) solutions to homework problems (written or in the form of source code or pseudocode) will result in a grade of zero for that particular problem for ALL students in the course.