CSE 427s – Cloud Computing with Big Data Applications

This is an inactive course webpage. Find the one for your current semester.

This course provides a comprehensive introduction to applied parallel computing using the MapReduce programming model, which facilitates large-scale data management and processing. There will be an emphasis on hands-on experience working with Hadoop, an open-source software framework written in Java for distributed storage and processing of very large data sets on computer clusters. Further, we will make use of related big data technologies from the Hadoop ecosystem of tools, such as Hive, Impala, and Pig, in developing analytics and solving problems faced by enterprises today.

Prerequisite: CSE 241

The content of this class is derived largely from the Cloudera Developer Training for Apache Hadoop and Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop, which are made available to Washington University through the Cloudera Academic Partnership program. Further materials are adapted from the “Mining Massive Data Sets” class taught at Stanford by Jure Leskovec.

Instructor: Marion Neumann
Office: Jolley Hall Room 403

Office Hours: Thursday 11am-1pm

Please ask any questions related to the course materials and homework problems on Piazza. Other students might have the same questions or be able to provide a quick answer. Posting solutions to assignments (in the form of source code or pseudocode) will result in a grade of zero for that particular problem or assignment for ALL students.

***NEW: Extra credit will be given for Piazza answers and solution descriptions marked as good answers by an instructor! 1% per answer, counting towards your homework assignment scores.

Course Calendar

Homework Assignments

TA Office Hours


Grades on Blackboard

Resources and HowTos