Course Book
- Data Analytics with Hadoop – An Introduction for Data Scientists by Benjamin Bengfort, Jenny Kim
- electronic copy through the Wash U library for viewing online
Additional Books
- Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman (available for free online at http://mmds.org)
- [optional] Hadoop: The Definitive Guide (4th edition) by Tom White
- electronic copy through the Wash U library for viewing online
Cloudera Course VM
In this course, we will use a pre-configured virtual machine that runs Hadoop.
- Download and install a virtualization program to run the virtual machine.
- VirtualBox (recommended for all platforms: Mac, Linux, Windows)
- VMware (possible for Windows)
- Download the VM matching your virtualization software from HERE.
- System requirements: To run the VM, your laptop needs at least 4GB of RAM, which is the minimum recommended memory as indicated here. If your laptop/computer does not meet this requirement, please contact me as soon as possible.
- Set up the VM.
- Do Lab0 (requires a working VM).
- Check out these VM troubleshooting tips whenever you run into issues with your VM!
- Working with the VM and optional setup.
- The username for the CentOS operating system running in the VM is cloudera, and the password (should you need it) is cloudera as well.
- Attention Windows users: make friends with the Linux terminal and consider this cheat sheet for useful shell commands.
- Optional: to install software on CentOS running in your VM, use the command-line package manager yum in the terminal
- e.g., sudo yum install htop (if you want to install htop)
- e.g., sudo yum install subversion (if you want to install svn)
- here is a tutorial on yum
- Optional but useful: set up a shared folder with your host machine: here are the instructions.
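As a minimal sketch of the optional yum step above, the following shell function checks whether a command is already available before installing its package. The function name install_if_missing is hypothetical and not part of the course materials; the sketch assumes the CentOS VM with sudo and yum available.

```shell
# Hypothetical helper (not from the course materials): install a package
# with yum only if the given command is not already on the PATH.
#   $1 = command name to look for (e.g. htop, svn)
#   $2 = yum package name (optional; defaults to the command name)
install_if_missing() {
    cmd="$1"
    pkg="${2:-$1}"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd is already installed"
    else
        echo "installing $pkg"
        # -y answers "yes" to yum's confirmation prompt.
        sudo yum install -y "$pkg"
    fi
}
```

For example, install_if_missing htop covers the htop case, and install_if_missing svn subversion covers the case where the command name (svn) differs from the package name (subversion).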
AWS Account
We will be using AWS to execute our programs on a “real” cloud. Follow these instructions to create an account and get educational credit (only possible if you have not applied for educational credit before).
Gradescope
We will use Gradescope for all homework grading. Find a tutorial on submitting a PDF to Gradescope HERE. To sign up, use entry code TBA.
Regex
A reference about regular expressions can be found here.
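As a small illustration of what a regular expression looks like in practice, the sketch below uses grep -E in the terminal to pull dates of the form YYYY-MM-DD out of a few lines of text. The sample lines and the pattern are made up for illustration and are not taken from the course materials.

```shell
# Made-up sample input (hypothetical log lines, for illustration only).
sample='2016-01-19 job started
no date on this line
2016-01-20 job finished'

# -E enables extended regular expressions; -o prints only the matched text.
# [0-9]{4} matches exactly four digits, [0-9]{2} exactly two digits.
dates=$(printf '%s\n' "$sample" | grep -E -o '[0-9]{4}-[0-9]{2}-[0-9]{2}')
printf '%s\n' "$dates"
```

Only the two lines that contain a date produce a match; the middle line is skipped because nothing in it fits the pattern.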