**This is an inactive course webpage**

Lectures and Labs

  • Lecture: THU 4-5:20pm in Wilson 214
  • Lab – Section 1: TUE 1-2:20pm in Eads 016 
  • Lab – Section 2: TUE 4-5:20pm in January 110

Instructor: Marion Neumann
Office: Jolley Hall Room 222
Contact: Please use Piazza!
Office Hours: TUE 3-4pm or individual appointment (request via email – allow for 1-2 days to reply and schedule)
Please avoid dropping in without an appointment outside of office hours.

Head TA: Jonathan C (takes care of all grading issues – contact via Piazza or in his office hours)

TAs: Alexis, Arushee, Jordie, Kevin, Lorenzo, Patrick, Steven, Wentao, Zhibo

TA Office Hours:

  • Monday, Sever 300, 2:30-4pm: Alexis, Jonathan
  • Wednesday, Lopata 201, 2:30-4:30pm: Lorenzo, Patrick
  • Friday, Lopata 201, 10am-2pm: Wentao, Jordie, Steven
  • Sunday, Rudolph 201, 9-11am: Kevin, Zhibo

This course provides a comprehensive introduction to applied parallel computing using the MapReduce programming model, which facilitates large-scale data management and processing. There will be an emphasis on hands-on experience with the Hadoop architecture, an open-source software framework written in Java for distributed storage and processing of very large data sets on computer clusters. Further, we will derive and discuss various algorithms to tackle big data applications and use related big data analysis tools from the Hadoop ecosystem, such as Pig, Hive, Impala, and Apache Spark, to solve problems faced by enterprises today. Check the Roadmap for more detailed information.

Prerequisites: CSE 131 (solid background in programming with Java), CSE 247, and CSE 330 (basic knowledge of relational databases (RDBMS), SQL, and AWS). Use this prerequisite checklist if you are not sure.

This class counts towards the Certificate in Data Mining and Machine Learning as an applications course.

This class uses materials from the Cloudera Developer Training for MapReduce, the Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop, and the Cloudera Developer Training for Apache Spark, which are made available to Washington University through the Cloudera Academic Partnership program. Further content is based on the “Mining of Massive Data Sets” book and class taught at Stanford by Jure Leskovec.


Lectures and Labs
Lectures will be held every THU. Labs will be held on TUE. Lab sessions may be replaced by lectures. Any change in locations will be announced on the course webpage.

Course Content
Find a list of topics to be covered on the course Roadmap!

Homework Assignments 
We will have weekly homework assignments that can be worked on in groups of two students. They will be assigned after each lecture on THU and will be due one week later at 4pm. Each homework assignment will be graded, and its score counts towards the total grade. It is every student’s responsibility to meet the submission requirements and deadlines. Late submissions will not be accepted under any circumstances (see also: Late Policy below). Submissions that do not follow the instructions provided on the assignment or the homework website will receive a score penalty. All homework assignments will be weighted equally, and the total grade will contribute 40% towards your total course performance.

Regrade Requests
Any regrade requests and claims of missing scores for any graded work must be made within one week of the grade announcement. We will not consider any regrade requests made after this one-week period, under any circumstances. Grade announcements and grading comments will be provided via Gradescope. Grades will be maintained on Canvas.

Makeup Homework
There will be one additional makeup homework assigned in the last week of classes that can be used to replace the lowest homework assignment score.

In-class Exam
There will be one written in-class exam contributing 30% towards your total course performance. Date:

  • exam: THU 21 Nov 2019 4-5:20pm in the lecture
  • no final exam

Final Project 
There will be a final project assigned in the 2nd half of the course. Due date:

  • FRI 13 Dec 2019 at 6pm (no extension)

The final project contributes 20% towards your total course performance. It will be graded on a 0-100% scale, and grades will be assigned based on both its implementation component and its conceptual component (a report motivating, describing, and analyzing the project, together with a discussion of its experimental results and impact).

Non-curricular Activities
We cannot offer accommodations on examinations or deadlines for non-curricular activities outside your Wash U commitments. This includes job interviews and flying home early. I understand that you may decide to miss a scheduled exam date for these reasons, but you will need to weigh the consequences when making such a decision.

Lab and Active Learning Quizzes
Quizzes will be given in lectures and labs to encourage your own thinking and enhance your learning process. We will use Socrative (http://www.socrative.com) to distribute and record quizzes. Students will need to bring a WiFi-enabled device (laptop, tablet, smartphone, …).

  • Lab quizzes will be graded and contribute 10% towards the final grade for the course. The two lowest recap quiz scores will be dropped.
  • In-class Active Learning quizzes will be recorded for participation. Participation above 70% may result in a grade bump for borderline final course scores (less than 1% away from a cutoff). Active Learning quizzes only count if your answers are meaningful; quizzes with empty or nonsense answers do not count towards participation.
  • There are no make-ups for missed quizzes.

Grading Summary
40%  homework assignments
10%  lab quizzes
30%  in-class exam
20%  final project (implementation component and conceptual component)

It is not possible to achieve a higher percentage on any individual grade component than listed above through bonus or extra credit problems.

Final course grades will be assigned using the following straight scale:

Letter Grade    Cutoff Percentage
A               >= 93%
A-              >= 90%
B+              >= 87%
B               >= 83%
B-              >= 80%
C+              >= 77%
C               >= 73%
C-              >= 70%
D+              >= 67%
D               >= 63%
D-              >= 60%
F               <  60%

Late Policy
Your homework assignments must be turned in on time. Beyond the automatic extension described here, we cannot accept late submissions or submissions that do not follow the instructions on the assignment. It is your responsibility to follow the submission instructions exactly. You get an automatic three-day extension on every homework deadline (this does not include the final project deadline).
WARNING: There is absolutely NO extension beyond this automatic extension, for ANY reason!

Collaboration Policy
You are encouraged to discuss the course materials with other students. Discussing the material, and the general form of solutions to the labs, is a key part of the class. Since, for many of the assignments, there is no single “right” answer, talking to other students and to the TAs is a good thing. However, everything that you turn in should be your own work, unless we tell you otherwise. If you talk about assignment solutions with another student, you need to explicitly tell us on the hand-in. You are not allowed to copy answers/code or parts of answers/code from anyone else, or from material you find on the internet. This will be considered willful cheating and will be dealt with according to the official collaboration policy stated below:

Academic Integrity
Unless explicitly instructed otherwise, everything that you turn in for this course must be your own work. If you willfully misrepresent someone else’s work as your own, you are guilty of cheating. Cheating, in any form, will not be tolerated in this class.

Check out these questions and answers in the CSE FAQ.

There is zero tolerance for Academic Dishonesty. I will be actively searching for academic dishonesty on all homework assignments, quizzes, and exams. If you are guilty of cheating on any assignment or exam, you will receive an F (failing grade) in the course and be referred to the School of Engineering Discipline Committee. In severe cases, this can lead to expulsion from the University, as well as possible deportation for international students. If you copy from anyone in the class, or anyone who has previously taken this or other classes at Washington University covering the same topics, both parties will be penalized, regardless of which direction the information flowed. This is your only warning.

Please refer to the University Undergraduate Academic Integrity Policy, for more information. If you suspect that you may be entering an ambiguous situation, it is your responsibility to clarify it before the professor or TAs detect it. If in doubt, please ask us.

Mental Health
Mental Health Services professional staff members work with students to resolve personal and interpersonal difficulties, many of which can affect the academic experience. These include conflicts with or worry about friends or family, concerns about eating or drinking patterns, and feelings of anxiety and depression. See: http://shs.wustl.edu/MentalHealth

If you have any problems with the workload of this class, please come and talk to me. The earlier we talk the better.

Accommodations based upon sexual assault
The University is committed to offering reasonable academic accommodations to students who are victims of sexual assault. Students are eligible for accommodation regardless of whether they seek criminal or disciplinary action. Depending on the specific nature of the allegation, such measures may include but are not limited to: implementation of a no-contact order, course/classroom assignment changes, and other academic support services and accommodations. If you need to request such accommodations, please direct your request to Kim Webb (kim_webb@wustl.edu), Director of the Relationship and Sexual Violence Prevention Center. Ms. Webb is a confidential resource; however, requests for accommodations will be shared with the appropriate University administration and faculty. The University will maintain as confidential any accommodations or protective measures provided to an individual student so long as it does not impair the ability to provide such measures.

If a student comes to me to discuss or disclose an instance of sexual assault, sex discrimination, sexual harassment, dating violence, domestic violence or stalking, or if I otherwise observe or become aware of such an allegation, I will keep the information as private as I can, but as a faculty member of Washington University, I am required to immediately report it to my Department Chair or Dean or directly to Ms. Jessica Kennedy, the University’s Title IX Coordinator. If you would like to speak with the Title IX Coordinator directly, Ms. Kennedy can be reached at (314) 935-3118, jwkennedy@wustl.edu, or by visiting her office in the Women’s Building. Additionally, you can report incidents or complaints to Tamara King, Associate Dean for Students and Director of Student Conduct, or by contacting WUPD at (314) 935-5555 or your local law enforcement agency.

You can also speak confidentially and learn more about available resources at the Relationship and Sexual Violence Prevention Center by calling (314) 935-8761 or visiting the 4th floor of Seigle Hall.

Bias Reporting 
The University has a process through which students, faculty, staff and community members who have experienced or witnessed incidents of bias, prejudice or discrimination against a student can report their experiences to the University’s Bias Report and Support System (BRSS) team. See: http://brss.wustl.edu


Part I – MapReduce (about 5 weeks)

  1. Big Data and Big Data Analysis
  2. Cloud Computing
    • Distributed Storage
    • HDFS
    • Distributed Computation
    • MapReduce Programming Paradigm
  3. Hadoop MapReduce
    • Running a MapReduce Job
    • Job Execution
    • Writing MapReduce Programs
      • Implementing Mappers, Reducers, and Drivers in Java
      • Reusing Objects and Map-only Jobs
    • Job Configuration
  4. MapReducing Algorithms
    • Sorting and Searching
    • Inverted Index
    • Secondary Sort
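
The map/shuffle/reduce data flow at the heart of Part I can be sketched in a few lines of pure Python. This is an in-process illustration only (real Hadoop jobs run these phases distributed across a cluster, with mappers and reducers written in Java); the word-count example and input lines are made up for illustration.

```python
from collections import defaultdict

def mapper(_, line):
    # Map: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    # Reduce: sum all partial counts for one key.
    yield (word, sum(counts))

def run_mapreduce(records, mapper, reducer):
    # Shuffle: group all mapper outputs by key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    # Reduce: one call per key, keys visited in sorted order (as Hadoop does).
    result = {}
    for k in sorted(groups):
        for out_k, out_v in reducer(k, groups[k]):
            result[out_k] = out_v
    return result

lines = [(0, "the quick brown fox"), (1, "the lazy dog the end")]
print(run_mapreduce(lines, mapper, reducer))
```

The same mapper/reducer pair, unchanged, would scale to terabytes under Hadoop, because the framework owns the shuffle step entirely.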

Part II – Big Data Analysis and Applications (about 8 weeks)

  1. Recommendation Engines
    • Introduction (Top-N list, Frequently Bought Together)
    • Content-based Recommendation
    • Collaborative Filtering
    • Final Project Topic: Netflix Recommendation Challenge
  2. [Data Analysis with Pig]
    • Basics: Pig Latin, Loading Data, Data Types
    • Example Data Analysis Task: ETL
    • Multi-Dataset Operations
  3. Data Analysis with Hive
    • Introduction: Hive vs Traditional Databases
    • HiveQL Syntax and Built-in Functions
    • Hive Data Management
    • Final Project Topic: Text Processing with Hive
  4. Data Analysis with Spark
    • RDDs
    • Interactive Spark Shell and PySpark
    • Example Application: ETL
    • Spark Applications – Python Programs using Spark
    • Example Iterative Algorithm: PageRank
    • Final Project Topic: Geolocation Clustering with Spark
  5. Applications
  6. [Introduction to Impala]
    • Interactive Data Analysis with Impala
  7. Discussion: [Impala vs] Hive vs [Pig vs] Spark vs MapReduce vs RDBMS

Note: [ ] indicate optional topics

More Big Data Applications (we might touch upon)

  • Finding Similar Items
    • Document Retrieval
    • Big input/feature spaces
    • Locality Sensitive Hashing
  • Social Network Analysis
    • Social Networks as Graphs
    • Clustering/Community Detection/Partitioning
    • Finding Triangles using MapReduce
    • Beyond Social Networks: Introduction to Graph-based Machine Learning
Course calendar and reading




27 Aug Syllabus, Course Overview
29 Aug Introduction

  • What is Big Data?
  • Big Data Characteristics
  • Big Data Applications
  • Processing Patterns
  • HTDG 4th ed:
    • Ch1 pp 3-14, Data & Meet Hadoop
    • Ch2 pp 19-22, Weather Example
3 Sept  Lab1 – Distributed Storage

  • Distributed File Systems
  • HDFS
  • HTDG 4th ed
    • Ch3 pp 43-47, HDFS
    • Ch3 pp 51-52, Basic FS Operations
5 Sept Cloud Computing

  • History
  • Basic Concept
  • Main Components

MR1:  MapReduce (MR)

  • Distributed Computing
  • MapReduce Data Flow
  • HTDG 4th ed
    • Ch2 pp 22-24, MapReduce
10 Sept Lab2 – MR2: Running a MR Job

  • MapReduce Processes
  • WordCount Example
  • Java Implementation
  • Job Submission
  • HTDG 4th ed
    • Ch2 pp 24-30, Java MapReduce
    • Ch6 pp 160-168, Running a MR Job
12 Sept MR3: Job Execution

  • MR Job Execution
  • YARN
  • Data Locality
  • Shuffle and Sort
  • Combiners


  • HTDG 4th ed
    • Ch2 pp 30-37, Scaling Out & Combiner
    • Ch4 pp 79-81, YARN 
    • Ch4 pp 85-86, Scheduling in YARN 
    • Ch7 pp 197-200, Shuffle and Sort
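
The effect of a combiner can be sketched without a cluster: it pre-aggregates mapper output locally, so fewer (key, value) pairs cross the network during shuffle and sort. For word count the combiner can be the same function as the reducer because summing is associative and commutative; that property is a requirement for reusing a reducer as a combiner, not a coincidence. The input split below is made-up illustration data.

```python
from collections import defaultdict

def map_partition(lines):
    # Raw mapper output for one input split: one (word, 1) per occurrence.
    return [(w, 1) for line in lines for w in line.split()]

def combine(pairs):
    # Local (per-mapper) aggregation, applied before the shuffle.
    local = defaultdict(int)
    for k, v in pairs:
        local[k] += v
    return list(local.items())

split = ["to be or not to be", "to see or not to see"]
raw = map_partition(split)
combined = combine(raw)
print(len(raw), "pairs shuffled without a combiner,",
      len(combined), "with one")   # 12 vs 5 here
```

The aggregated totals are identical either way; only the volume of intermediate data changes.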
17 Sept Lab3 – MR4: Writing & Testing a MR Program

  • Serialization
  • Driver
  • Map-only Job
  • Reuse Objects
  • Testing Locally
  • Notes on using Eclipse
  • Hadoop 2.6.0 CDH 5.14.0 API
  • HTDG 4th ed
    • Ch2 pp 26-27, Driver
    • Ch5 pp 109-115, Serialization
    • Ch8 pp 220-223,232-236, Input Formats
    • Ch6 p 141, Developing MR Programs
19 Sept MR3: Job Execution cont.

  • Combiners

MR5: Job Configuration

  • ToolRunner
  • Passing Parameters
  • Distributed Cache
  • HTDG 4th ed
    • Ch6 pp 141-148, Configuration API
    • Ch6 pp 148-152, ToolRunner
    • Ch9 pp 273-279, Distributed Cache
24 Sept Lab4 – MR6: Optimizing MR Programs

  • Partitioner
  • Use Case: Global Sort
  • [optional] Log File Analysis
  • HTDG 4th ed
    • Ch9 pp 255-262, Sorting
26 Sept Application: Recommendation Systems

  • Long Tail
  • Recommendation Tasks

RS1: Collaborative Filtering

  • Utility Matrix
  • Similarity Measures
  • MMDS Ch9
    • Ch9.1: A Model for RS
    • Ch9.3: Collaborative Filtering
    • Ch9.5: Netflix Challenge
  • WIRED article:  The Long Tail
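
One similarity measure from this lecture can be sketched directly on a sparse utility matrix: cosine similarity between two users' rating vectors, treating missing ratings as 0 (one common convention discussed in MMDS Ch9). The users and ratings below are made-up illustration data.

```python
from math import sqrt

# Sparse utility matrix: user -> {item: rating}; absent entries count as 0.
utility = {
    "alice": {"m1": 4, "m2": 5},
    "bob":   {"m1": 4, "m3": 2},
    "carol": {"m3": 5},
}

def cosine(u, v):
    # Dot product only needs the items both users rated.
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(round(cosine(utility["alice"], utility["bob"]), 3))
```

Users with no items in common (alice and carol here) get similarity 0, which is why the choice of similarity measure and the handling of missing entries matter so much for collaborative filtering.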
1 Oct MR7: Custom Key/Value Types

  • Writables & WritableComparables

Lab 5 –  Use Case: Secondary Sort 

  • Custom WritableComparable
  • Custom Partitioner
  • Custom SortComparator
  • Custom GroupComparator
  • HTDG 4th ed
    • Ch5 pp 109-126, Serialization
    • Ch9 pp 262-266, Secondary Sort
  • Data Algorithms: pp 1-12, Secondary Sort
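
The secondary-sort pattern from this lab can be simulated in-process: Hadoop achieves sorted values per reduce group via a composite key plus a custom partitioner (natural key), sort comparator (full composite key), and group comparator (natural key). The records below are made-up (month, temperature) illustration data.

```python
from itertools import groupby

records = [("2019-09", 31), ("2019-08", 27), ("2019-09", 18), ("2019-08", 35)]

# Sort comparator: order by the composite key (natural key, value).
ordered = sorted(records, key=lambda kv: (kv[0], kv[1]))

# Group comparator: group only by the natural key;
# values then arrive at each "reducer call" already sorted.
grouped = {k: [v for _, v in g]
           for k, g in groupby(ordered, key=lambda kv: kv[0])}
print(grouped)
```

In Hadoop the framework sorts and groups for you across machines; the point of the custom comparators is to make that machinery produce exactly this grouping.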
3 Oct RS1: Collaborative Filtering contd.

  • MR Program
  • Challenges
  • No Lecture QUIZ
  • slides: see previous lecture
  • MMDS Ch9
    • Ch9.3: Collaborative Filtering
8 Oct MR8: Practical Development

  • Incremental Development
  • Debugging
  • [optional] Unit Testing, Logging

Lab 6 – RS2: Top-N-List  Recommendations

  •  Top-N-List in MapReduce
10 Oct RS3: Co-occurrence based Recommendation

  • Frequently bought together
  • Customers who bought this item also bought…
  • Communication Cost
  • Pairs and Stripes
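
The two co-occurrence counting patterns from this lecture can be contrasted in a few lines. "Pairs" emits one ((a, b), 1) record per co-occurrence; "stripes" emits one associative array per word, trading mapper memory for far fewer emitted records (and hence lower communication cost). The market basket below is made-up illustration data.

```python
from collections import defaultdict
from itertools import permutations

basket = ["bread", "milk", "eggs"]

# Pairs: one key-value record per ordered co-occurrence.
pairs = defaultdict(int)
for a, b in permutations(basket, 2):
    pairs[(a, b)] += 1

# Stripes: one stripe (dict of neighbor counts) per word.
stripes = defaultdict(lambda: defaultdict(int))
for a, b in permutations(basket, 2):
    stripes[a][b] += 1

print(len(pairs), "pair records vs", len(stripes), "stripes")   # 6 vs 3
```

Both encode the same counts; stripes also make the combiner more effective, since partial stripes for the same word can be merged locally.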
15 Oct  FALL BREAK – no lab
17 Oct MR9: MR Workflows & Beyond  MR

  • MR Workflows
  • DAGs
  • Database Operations
  • Latency


Database Operations on HDFS

  • Selection, Projection
  • Union, Intersection, Difference
  • Grouping & Aggregation
  • Joins
  • Quick Intro to Hive & Pig
  • slides:  MR Workflows
  • HTDG 4th ed,
    • Ch6 pp 177-179: MR Workflows
  • MMDS Ch2.3.3-8 Relational-Algebra Operations
  • HTDG 4th ed,
    • Ch9 pp 268-273: Joins
    • Ch17 pp 471-484: Hive (Intro, An Example, The Metastore, Comparison with Traditional Databases)
  • [optional] HTDG 4th ed,
    • Ch16 pp 423-431: Pig (Intro, An Example, Comparison with Databases)
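
The reduce-side join among the database operations above can be sketched in-process: mappers tag each record with its source table, the shuffle groups records by join key, and the reducer crosses the two sides. The two tiny tables are made-up illustration data.

```python
from collections import defaultdict

customers = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "mug")]

# Map phase: emit (join_key, (table_tag, payload)) for both tables.
tagged = [(cid, ("C", name)) for cid, name in customers] + \
         [(cid, ("O", item)) for cid, item in orders]

# Shuffle: group by join key.
groups = defaultdict(list)
for key, value in tagged:
    groups[key].append(value)

# Reduce phase: for each key, cross the customer side with the order side.
joined = []
for key, values in sorted(groups.items()):
    names = [p for tag, p in values if tag == "C"]
    items = [p for tag, p in values if tag == "O"]
    joined.extend((name, item) for name in names for item in items)

print(joined)
```

This is exactly the plan Hive generates for an equi-join when no map-side optimization applies, which is why join key skew shows up as one slow reducer.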
22 Oct Lab 7 – Data Management with Hive

  • Hive Syntax, Data Types, and Basic Operations
  • Hive Metastore
  • Creating Tables
  • Querying Tables
  • Complex Field Types
  • Hive Reference Slides: Hive1
  • HTDG 4th ed,
    • Ch9 pp 268-273: Joins
    • Ch17 pp 471-484: Hive (Intro, An Example, The Metastore, Comparison with Traditional Databases)
24 Oct SP1: Spark

  • Introduction
  • RDDs
  • Spark Shell
  • Lazy Execution
  • Pair RDDs
  • MapReduce in Spark
29 Oct Lab 8 – Flume & Spark Shell

  • Data Ingest
    • Sqoop (hw9)
    • Flume (Lab 8)
  • Using the Spark Shell
  • Creating Pair RDDs
  • HTDG 4th ed, Ch16
    • pp 381-384: Intro to Flume
  • HTDG 4th ed, Ch15
    • pp 401: Intro to Sqoop
31 Oct SP2: More Spark

  • Writing Spark Programs
  • Spark Job Execution

Application: PageRank

FINAL PROJECT Introduction

  • HTDG 4th ed, Ch19
    • pp 565-570: Jobs & Stages
    • pp 571-574: Spark on YARN
5 Nov SP3 – RDD Persistence

  • RDD Lineage
  • RDD Persistence
  • Checkpointing

Lab 9 PageRank

FINAL PROJECT milestone 1

  • HTDG 4th ed, Ch19
    • pp 560-561: Persistence
7 Nov Application: Text Mining & Sentiment Analysis

  • TF-IDF
  • word co-occurrence
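
TF-IDF as covered in this lecture can be sketched in a few lines. This uses the common tf = raw count, idf = log(N / df) convention (other weightings exist); the two documents are made-up illustration data.

```python
from math import log

docs = {
    "d1": "big data big insight".split(),
    "d2": "big cluster".split(),
}

# Document frequency: in how many documents does each term appear?
N = len(docs)
df = {}
for words in docs.values():
    for w in set(words):
        df[w] = df.get(w, 0) + 1

def tfidf(term, doc_id):
    tf = docs[doc_id].count(term)
    return tf * log(N / df[term])

print(round(tfidf("data", "d1"), 3))
```

Note that "big" scores 0 in every document because it appears in all of them: terms common to the whole corpus carry no discriminating weight, which is the entire point of the idf factor.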
12 Nov Lab 10 – PageRank for Real

  • Write the PageRank Spark application
  • Analyze the application
  • Draw the DAG
  • Compute rankings for a subset of the real webgraph

FINAL PROJECT milestone 2

  • slides: cf. lecture Oct 31 and Lab9
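
The iterative PageRank computation behind Labs 9/10 can be sketched without Spark: each page distributes its rank evenly over its out-links, and the contributions are recombined with a damping factor (0.85 is the conventional choice). The three-page link graph below is made-up illustration data; a real Spark job would express the loop over pair RDDs.

```python
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
damping = 0.85
ranks = {page: 1.0 / len(links) for page in links}

for _ in range(30):  # fixed iteration count; real jobs test for convergence
    # Each page sends rank/out_degree to every page it links to.
    contribs = {page: 0.0 for page in links}
    for page, outs in links.items():
        share = ranks[page] / len(outs)
        for out in outs:
            contribs[out] += share
    # Recombine with the damping factor.
    ranks = {p: (1 - damping) / len(links) + damping * c
             for p, c in contribs.items()}

print({p: round(r, 3) for p, r in sorted(ranks.items())})
```

Because `ranks` is rebuilt every iteration from the static `links` structure, this loop maps naturally onto the RDD-lineage/persistence discussion: the link RDD is worth caching, the rank RDD is recomputed each round.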
14 Nov Model Parameters, Choices, and Evaluation

  • recommendations
  • prediction/classification
  • clustering

FINAL PROJECT milestone 3

  • MMDS Ch12
    • 12.1.1 Training Sets
    • 12.1.4 Testing (Model Evaluation)
19 Nov EXAM Review and Questions

EXAM Preparation

  • open discussion based on your questions
  • work on previous exam problems in groups
  • We will not present solutions to previous exams.
  • If you have worked on the problems and you have any questions or got stuck, I am happy to help and discuss!
21 Nov ***In-Class EXAM***
26 Nov  Lab 11 – Cloud Execution on EMR

  • Amazon Elastic MapReduce (EMR)
  • Create S3 bucket
  • Launch EC2 instances
  • Launch Hadoop cluster

FINAL PROJECT milestone 4

EMR links for final projects
28 Nov  THANKSGIVING – no class
3 Dec Lab 12 – Work on Project

  • coordinate next steps with your team members
  • start project report template
  • ask us clarifying questions
Exam regrade requests may be made in the lab sessions.
5 Dec Application: Large-scale Classification or

Application: Large-Scale Social Network Analysis

  • MMDS
    • Ch12 Large-scale Machine Learning
    • Ch10 Mining Social Network Graphs
13 Dec FINAL PROJECT due at 6pm – no extension!
Homework assignments

All homework submissions must be made via Gradescope. Sign-up will be managed via Canvas.


Submission requirements (violations will result in a penalty on the hw grade):

Find a tutorial on submitting a PDF to Gradescope HERE or watch this video.

  1. Match Pages: In Gradescope every page needs to be matched to the problems it contains.
    • -10% penalty of the assignment score, if the pages are not (or incorrectly) matched.
  2. [tentative] Include wustlkey: Each page needs to include one wustlkey indicating the SVN repository used for code submission.
    • required for individual and group submissions
    • no credit for code submission if wustlkey is not provided on the submission page for the respective problem
    • regrade requests will be accepted, but a penalty of -10% of the problem score will be applied
  3. Gradescope Group Submission: In Gradescope both group members need to be added to the submission.
    • -5% penalty for all team members if your group’s submission is not a Gradescope group submission listing both team members
    • Find a tutorial on how to add a group member to your submission in the second half of this video.
  • 08/29 hw1
    • due: THU 09/05/2019 at 4pm
    • submit pdf via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 09/05 hw2
    • due: THU 09/12/2019 at 4pm
    • submit pdf via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 09/12 hw3
    • due: THU 09/19/2019 at 4pm
    • submit pdf via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 09/19 hw4
    • due: THU 09/26/2019 at 4pm
    • submit pdf to Homework 4 assignment via Gradescope
    • submit zip to Homework 4 Code assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 09/26 hw5
    • due: THU 10/03/2019 at 4pm
    • submit pdf to Homework 5 assignment via Gradescope
    • submit zip to Homework 5 Code assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 10/03 hw6
    • due: THU 10/10/2019 at 4pm
    • submit pdf to Homework 6 assignment via Gradescope
    • submit zip to Homework 6 Code assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 10/10 hw7
    • due: THU 10/24/2019 at 4pm (2 weeks!!!)
    • submit pdf to Homework 7 assignment via Gradescope
    • submit zip to Homework 7 Code assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 10/24 hw8
    • due: THU 10/31/2019 at 4pm
    • submit pdf to Homework 8 assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
  • 10/31 hw9
    • due: THU 11/07/2019 at 4pm
    • submit pdf to Homework 9 assignment via Gradescope
    • submit zip to Homework 9 Code assignment via Gradescope
    • submit hw reflection via SVN repository commit
    • submit hw rating using this link
Resources and HowTos


  • Mining of Massive Data Sets by Jure Leskovec, Anand Rajaraman, Jeff Ullman (available for free online http://mmds.org)
  • Hadoop: The Definitive Guide (4th edition) by Tom White
  • Data Analytics with Hadoop – An Introduction for Data Scientists by Benjamin Bengfort, Jenny Kim

Optional Book: 

  • Data Algorithms: Recipes for Scaling Up with Hadoop and Spark by Mahmoud Parsian

Cloudera Course VM

We will use a pre-configured virtual machine in the course.

  1. Download and install a virtualization program to run the virtual machine.
    • VirtualBox (recommended for all platforms: Mac, Linux, Windows)
    • VMWare (possible for Windows OS – limited instructor and TA support)
  2. Download the VM matching your virtualization software from HERE.
    • System requirements: to run the VM on your laptop you need at least 4 GB of RAM, which is the minimum recommended memory as indicated here.
    • If your laptop/computer does not support these requirements, please contact me asap! We can provide you with a rental laptop for the semester.
  3. Set up the VM.
    • Here is a tutorial for VirtualBox.
    • Here is a tutorial for VMWare.
  4. Check out these VM trouble-shooting tips whenever you run into issues with your VM!
  5. Working with the VM and optional set up.
    • The username for the CentOS operating system running in the VM is cloudera, and the password (if you should need it) is cloudera as well.
    • Attention Windows users: make friends with the Linux terminal and consider this cheat sheet for useful shell commands.
    • Optional: to install software on the CentOS system running in your VM, use the yum package manager in the terminal
      • e.g., sudo yum install htop (if you want to install htop)
      • e.g., sudo yum install subversion (if you want to install svn)
      • here is a tutorial on yum
    • Optional but useful: set up a shared folder with your host machine: here are the instructions.


We will use Gradescope for all homework grading. Find a tutorial on submitting a PDF to Gradescope HERE. To sign up use entry code TBA.


We will be using SVN to distribute code stubs and data, as well as to collect code solutions. Please see this tutorial about accessing your repository.

The path to your SVN repository is:


You need to substitute your own wustlkey (e.g. m.neumann) in place of <wustlkey>, your course number (e.g. 427s) in place of XXX, and the respective abbreviation for the semester and year (e.g. fl18 for fall 2018).

If you wish to access your files from your own computer, you can use SVN via the terminal (Mac, Linux) on your host machine, or you will need to install TortoiseSVN (Windows) or SmartSVN (Windows, Mac, Linux) on your host OS.

Verifying your repository commits
To verify that your work was committed successfully, enter the URL (https://svn.seas.wustl.edu/repositories/<wustlkey>/cseXXX_fl18) of your repository in a web browser. You will see all the files that are currently in the repository (mind browser caching).

AWS Account

Towards the end of the semester we will be using AWS to execute our programs on a “real” cloud. Follow these instructions to create an account and get educational credit (only possible if you have not applied for educational credit before).


Notes on how to use Eclipse can be found here. Notes on how to use Eclipse to test MapReduce programs locally are here.


A reference about regular expressions can be found here.

Please ask any questions related to the course materials and homework problems on Piazza. Other students might have the same questions or are able to provide a quick answer.
Any public postings of (partial or full) solutions to homework problems (written or in form of source or pseudo code) will result in a grade of zero for that particular problem for ALL students in the course.