Lab 7 – Data Management with Hive

Part I: Introduction to Hive (15min)

The essential reading for this lab is the Hive1 slides. Let’s go through the introduction together. The slides also cover

  • basic syntax,
  • data types,
  • joins, and
  • built-in functions,

as well as how to

  • create tables,
  • load data,
  • set up external tables, and
  • save query results.
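A minimal sketch of what these slide topics look like in HiveQL: a managed table, loading a local file, an external table, a join using built-in functions, and two ways of saving query results. It assumes a `hive` CLI on your PATH (some clusters use `beeline` instead), and every table name, column, and path in it is a hypothetical placeholder rather than something defined in the lab. The shell wrapper only writes the script to a file and runs it with `hive -f` if Hive is installed.

```shell
#!/bin/sh
# Minimal sketch (hypothetical names and paths): write a HiveQL script that
# covers the slide topics, then run it with the Hive CLI only if one is installed.

HQL_FILE=hive_basics_sketch.hql

cat > "$HQL_FILE" <<'EOF'
-- Managed table stored as comma-delimited text (hypothetical schema).
CREATE TABLE IF NOT EXISTS customers (
  cust_id INT,
  name    STRING,
  city    STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load a local file. Hive copies it into the warehouse directory of the table.
LOAD DATA LOCAL INPATH '/tmp/customers.csv' INTO TABLE customers;

-- External table over files already in HDFS (hypothetical location).
-- Dropping it later removes only the metadata, not the files.
CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
  order_id INT,
  cust_id  INT,
  total    DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/training/orders';

-- Join the two tables, use built-in functions, and save the result
-- as a new table (CREATE TABLE ... AS SELECT).
CREATE TABLE customer_totals AS
SELECT c.name,
       round(SUM(o.total), 2) AS total_spent
FROM customers c
JOIN orders_ext o ON (c.cust_id = o.cust_id)
GROUP BY c.name;

-- Alternatively, write query results to a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/customer_totals_out'
SELECT * FROM customer_totals;
EOF

# Execute only when a Hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; wrote $HQL_FILE without running it"
fi
```

Keeping the statements in a `.hql` file rather than typing them into the interactive shell makes it easy to rerun the whole sequence after fixing a mistake.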

Take 10 minutes to skim these slides and keep them at hand as a reference while you work through the lab.

Part II: Let’s do some Hive (40min)

You will need to complete the lab to answer some of the quiz questions below. Before you start, run the following command from the ~/training_materials/dev1/scripts directory:

sh ./dev1_toggle_services.sh

The goal of this lab is to practice using the following common techniques for creating, populating, and querying Hive tables:

  • Table creation and population
  • Creating and switching databases
  • Querying tables
  • Modifying table information: altering and dropping tables
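The techniques listed above correspond to a handful of HiveQL statements. The following is a minimal sketch, assuming a `hive` CLI on your PATH; the database `lab7`, the table `products`, and its columns are hypothetical, and the lab instructions define the actual names you should use. Note that `INSERT ... VALUES` requires Hive 0.14 or later.

```shell
#!/bin/sh
# Minimal sketch (hypothetical database, table, and columns): create and switch
# databases, populate and query a table, then alter and drop it.

HQL_FILE=hive_manage_sketch.hql

cat > "$HQL_FILE" <<'EOF'
-- Create a database and make it the current one.
CREATE DATABASE IF NOT EXISTS lab7;
USE lab7;
SHOW TABLES;

-- Create a small table.
CREATE TABLE IF NOT EXISTS products (
  prod_id INT,
  name    STRING,
  price   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Populate it with literal rows (needs Hive 0.14 or later).
INSERT INTO TABLE products VALUES (1, 'widget', 2.50), (2, 'gadget', 10.00);

-- Query it.
SELECT name, price FROM products WHERE price > 5 ORDER BY price DESC LIMIT 10;

-- Alter it: rename the table, then add a column.
ALTER TABLE products RENAME TO items;
ALTER TABLE items ADD COLUMNS (category STRING);
DESCRIBE items;

-- Drop it. For a managed table this deletes the data files as well.
DROP TABLE IF EXISTS items;
EOF

# Execute only when a Hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; wrote $HQL_FILE without running it"
fi
```

The `IF NOT EXISTS` and `IF EXISTS` clauses make the script safe to rerun while you experiment.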

You will also create and query a table containing each of the complex field types Hive offers:

  • array,
  • map, and
  • struct.
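A minimal sketch of a table with all three complex types, assuming a `hive` CLI on your PATH. The table `employees`, its columns, and the two sample rows are hypothetical. In delimited text files, array elements, map entries, and struct members all share the `COLLECTION ITEMS` delimiter, while map keys are separated from their values by the `MAP KEYS` delimiter. The query shows the access syntax for each type: a 0-based index for arrays, a key for maps, and dot notation for structs.

```shell
#!/bin/sh
# Minimal sketch (hypothetical table, columns, and data): one table with
# ARRAY, MAP, and STRUCT columns, plus a query that reads each of them.

HQL_FILE=hive_complex_types_sketch.hql
DATA_FILE="$PWD/employees_sample.txt"

# Sample input: fields split by '|', collection items by ',', map keys by ':'.
printf '%s\n' \
  'Alice|555-1234,555-9876|tax:0.2,insurance:0.05|1 Main St,Springfield,12345' \
  'Bob|555-4444|tax:0.25|9 Elm Rd,Shelbyville,54321' > "$DATA_FILE"

# Unquoted EOF so that $DATA_FILE is expanded inside the generated script.
cat > "$HQL_FILE" <<EOF
CREATE TABLE IF NOT EXISTS employees (
  name       STRING,
  phones     ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  address    STRUCT<street:STRING, city:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

LOAD DATA LOCAL INPATH '$DATA_FILE' INTO TABLE employees;

-- Array elements by 0-based index, map values by key, struct fields by name.
SELECT name,
       phones[0]         AS first_phone,
       size(phones)      AS num_phones,
       deductions['tax'] AS tax_rate,
       address.city      AS city
FROM employees;
EOF

# Execute only when a Hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL_FILE"
else
  echo "hive not found; wrote $HQL_FILE without running it"
fi
```

If a column comes back as NULL, compare the delimiters in the sample file against the `ROW FORMAT` clause first, since a mismatch there is the most common cause.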

Find the step-by-step lab instructions HERE.

Part III: Quiz (15min)

This quiz asks questions about Hive basics and about some of the results you obtained in the previous part.

IMPORTANT: When answering the questions on the practical part of the lab, select “I did not get to this part” if you did not implement the respective queries. This helps us see how far you got, and we will give partial credit for honest answers.