CSE 427s – Cloud Computing with Big Data Applications

This is an inactive course webpage. Find the one for your current semester.



  • Thanks for the semester!
  • Thanks for filling out the course evaluation!


  • Resolve Grade Issues: TUE 5/9 11:30am-12:30pm in Jolley 222 (Marion)
    • Any grade issues need to be resolved by TUE 5/9 5pm!
  • Check out these tips and tricks for setting up, backing up, and troubleshooting your VM.

This course provides a comprehensive introduction to applied parallel computing using the MapReduce programming model, which facilitates large-scale data management and processing. There will be an emphasis on hands-on experience with the Hadoop architecture, an open-source software framework written in Java for distributed storage and processing of very large data sets on computer clusters. Further, we will derive and discuss various algorithms for big data applications and use related big data analysis tools from the Hadoop ecosystem, such as Pig, Hive, Impala, and Apache Spark, to solve problems that enterprises face today. Check the Roadmap for more detailed information.

Prerequisites: CSE 247, CSE 131 (solid background in programming with Java), and CSE 330 (basic knowledge of relational database management systems (RDBMS), SQL, and AWS).

This class counts towards the Certificate in Data Mining and Machine Learning as an applications course.

The content of this class is derived largely from the Cloudera Developer Training for MapReduce, the Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop, and the Cloudera Developer Training for Apache Spark, which are made available to Washington University through the Cloudera Academic Partnership program. Further materials are adapted from the book “Mining of Massive Datasets” and the corresponding class taught at Stanford by Jure Leskovec.

Instructor: Marion Neumann
Office: Jolley Hall Room 222
Office Hours: TUE 11:30am-12:30pm (or by individual appointment*; please avoid dropping in without an appointment)

*Request individual appointments via email and allow 2-3 days for a reply and scheduling.

TA Office Hours:
MON 1-3pm in Jolley 224 (Josh)
TUE 4:30-6:30pm in Jolley 224 (Yu)
THU 3-5pm in Jolley 224 (Ziyi)
FRI 10am-12pm in Jolley 224 (Jude)
SAT 10am-12pm in Whitaker 216 (Jonathan)

Lectures: TUE/THU 10-11:30am in Hillman 70
Lab sessions will occasionally replace the lectures and take place in Urbauer 214. All lab sessions will be announced in advance in class and on the course calendar.

Piazza: Please ask any questions related to the course materials and homework problems on Piazza. Other students might have the same questions or may be able to provide a quick answer. Posting (partial) solutions to problems (written, or in the form of source code or pseudocode) will result in a grade of zero for that particular problem for ALL students. Sign up here: piazza.com/wustl/spring2017/cse427s



Course Calendar and Reading

Homework Assignments

Grades on BB

Resources and HowTos