Part 1: Intro to DFS (20mins)

Let’s briefly talk about data storage for cloud computing, covering distributed file systems (DFS) in general and the Hadoop Distributed File System (HDFS) in particular. We are especially interested in how a distributed file system stores data.
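
To make the storage idea concrete before the lab, here is a minimal Python sketch of how a DFS such as HDFS cuts a file into fixed-size blocks and keeps several copies of each block on different machines. The 128 MB block size and replication factor of 3 are the usual HDFS defaults. The function names, the DataNode names, and the round-robin placement are assumptions made for illustration only; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the size in bytes of each block of a file of `file_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Toy round-robin placement (NOT the real HDFS policy).

    Each block is assigned to `replication` distinct DataNodes.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # Hypothetical 300 MB file on a hypothetical four-node cluster.
    blocks = split_into_blocks(300 * MB)
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for block, replicas in place_replicas(len(blocks), nodes).items():
        print(f"block {block}: {blocks[block] // MB} MB on {replicas}")
```

The point to take away is that a single file ends up spread over several machines, and losing one machine does not lose any block as long as another replica survives.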

Part 2: Reading & Lab Quiz (20mins)

Skim through the lab slides. You will learn about:

  • some of the essential properties of a DFS
  • some Hadoop cluster terminology
  • the basic concepts of HDFS and the most important shell commands for interacting with it

Finish the quiz before working on the lab. You can find the quiz in the Gradescope course for your lab section.

Now, you are prepared to do the lab!

Part 3: Distributed Storage with HDFS (30mins)

Now, let’s put some data into HDFS in your VM. How about Shakespeare’s works? You will also learn how to navigate the distributed file system using the command line and the Hue file browser.
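
As a rough preview of the command-line part, the Python sketch below builds the kind of `hdfs dfs` commands typically used to create a directory, upload a file, list it, and peek at its contents. The `hdfs dfs` subcommands (`-mkdir -p`, `-put`, `-ls`, `-tail`) are standard, but the helper names, the local file name, and the HDFS directory are hypothetical; the lab instructions remain the authoritative source for the exact steps.

```python
import os
import posixpath
import shutil
import subprocess
from typing import List


def hdfs_commands(local_file: str, hdfs_dir: str) -> List[List[str]]:
    """Build (but do not run) the commands to upload and inspect a file."""
    # Path of the uploaded file inside HDFS (HDFS paths use forward slashes).
    target = posixpath.join(hdfs_dir, os.path.basename(local_file))
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],       # create the directory
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],   # copy local file into HDFS
        ["hdfs", "dfs", "-ls", hdfs_dir],                # list the directory
        ["hdfs", "dfs", "-tail", target],                # show the end of the file
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands if the hdfs client is installed, else just print them."""
    if shutil.which("hdfs") is None:
        print("hdfs not found on PATH; the commands would be:")
        for cmd in commands:
            print("  " + " ".join(cmd))
        return
    for cmd in commands:
        # check=True stops at the first failing command.
        subprocess.run(cmd, check=True)
```

Inside the VM, a call such as `run_all(hdfs_commands("shakespeare.txt", "/user/student/shakespeare"))` would execute the four steps in order, where both arguments are placeholder names. Note that `-put` fails if the target file already exists in HDFS, so a second run stops at that step.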

Download the step-by-step lab instructions HERE.