VM Trouble-Shooting

Here are a couple of issues you might run into. First, note that whenever you run into an issue, restarting your VM (restarting means powering it off, not saving its state) might resolve it.

  0. VM TAKES FOREVER TO BOOT
  1. No internet connection
  2. Cannot check out SVN repository inside VM
  3. CANNOT COMMIT TO OR UPDATE SVN REPOSITORY INSIDE VM
  4. COMMON VM ISSUES: Connection refused
  5. Common VM Issues: Namenode in safemode
  6. COMMON VM ISSUES: PROJECTS do not show in Eclipse
  7. Hue File browser not working
  8. NO SPARK LOGS
  9. WINDOWS: VM DOES NOT BOOT

(0) VM TAKES FOREVER TO BOOT

We recommend not powering off a working VM but saving its state instead. The next time you start the VM, it resumes in exactly that state, with no need to boot the OS. This should be much faster!
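Saving the state can also be done from a terminal on the host. The following is a minimal sketch, assuming VirtualBox with its VBoxManage command-line tool on the PATH; the function names and the VM name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: save and resume a VirtualBox VM from the host command line.
# Assumption: VBoxManage is installed; override VBOXMANAGE if it lives elsewhere.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Save the current state of the running VM named in $1.
save_vm_state() {
    local vm_name="$1"
    "$VBOXMANAGE" controlvm "$vm_name" savestate
}

# Start the VM named in $1; a saved VM resumes from its saved state.
start_vm() {
    local vm_name="$1"
    "$VBOXMANAGE" startvm "$vm_name"
}

# Usage (hypothetical VM name):
#   save_vm_state "my-course-vm"
#   start_vm "my-course-vm"
```

Running VBoxManage list vms on the host shows the names you can pass to these functions.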

(1) No internet connection

Once the VM is up and running, you might want to connect to the internet. Click on the network icon in the upper right and select Auto Ethernet.

If this does not work, double-check your network settings. The network should be set to NAT.

If it says anything but NAT under Network, set it to NAT by going to Settings –> Network and picking NAT in the drop-down box.
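The same check and change can be made from the host's terminal. This is a sketch only, assuming VirtualBox's VBoxManage tool and that the VM's first adapter is the one in use; the function names are hypothetical, and modifyvm only works while the VM is powered off.

```shell
#!/usr/bin/env bash
# Sketch: inspect and set the network mode of adapter 1 with VBoxManage.
# Assumption: VBoxManage is installed; override VBOXMANAGE if needed.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Print the attachment mode of adapter 1 for the VM named in $1,
# e.g. nic1="nat" or nic1="bridged".
show_nic1() {
    "$VBOXMANAGE" showvminfo "$1" --machinereadable | grep '^nic1='
}

# Attach adapter 1 of the VM named in $1 to NAT and mark the cable as connected.
# The VM must be powered off for this to succeed.
set_nic1_nat() {
    "$VBOXMANAGE" modifyvm "$1" --nic1 nat --cableconnected1 on
}

# Usage (hypothetical VM name):
#   show_nic1 "my-course-vm"
#   set_nic1_nat "my-course-vm"
```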

If this still doesn’t work and you have a Windows host computer, try the following:

  1. Go to Control Panel > Network and Internet > Network Connections
    • Right click on the box for Wi-Fi and press Add to Bridge
  2. Go to VirtualBox > Settings > Network and ensure that Enable Network Adapter is selected in the Adapter 1 tab. Pick NAT from the dropdown for ‘Attached To’.
  3. Under Advanced Settings
    • set Promiscuous Mode to Allow VMs
    • select Cable Connected

(2) Cannot check out SVN repository inside VM

Make sure you follow lab0 first. One reason for this issue might be that your VM has no internet connection –> see (1).

(3) CANNOT COMMIT TO OR UPDATE SVN REPOSITORY INSIDE VM

Most likely your VM has no internet connection –> see (1).

(4) COMMON VM ISSUES: Connection refused

Try restarting ssh with the following command (or restart your VM):

sudo service ssh restart

If this does not work, try restarting the NameNode:

sudo service hadoop-hdfs-namenode restart

If this fails as well, try the following:

  1. Go to /var/log/hadoop-hdfs/
  2. Examine your hadoop-hdfs-namenode-quickstart.cloudera.log file.
    • Using a text editor, go to the bottom of the log and scroll up, or execute
grep fsimage hadoop-hdfs-namenode-quickstart.cloudera.log
    • We are looking for the following error:
ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_000000000000000XXXX, cpktTxId=000000000000000XXXX) java.io.IOException: Premature EOF from inputStream
    • If you find this error, it means that fsimage_000000000000000XXXX is a corrupted file that you need to delete for your NameNode to start correctly. (XXXX is a sequence of digits that varies from person to person; it can also be more than 4 digits.)
  3. Next, navigate to the directory /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
  4. Check whether the size of this file is 0 bytes (execute ll in the terminal; the 5th column shows the size in bytes. Alternatively, right-click on the file that you think is corrupted). If the file has 0 bytes, delete it by executing
sudo rm fsimage_000000000000000XXXX
  5. Now all you have to do is start your NameNode:
sudo service hadoop-hdfs-namenode start
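The log search and the size check from the steps above can be sketched as two small helper functions. The function names are hypothetical, and the sketch deliberately only lists candidate files; deleting one stays a manual step so that you can confirm it is the file named in the log.

```shell
#!/usr/bin/env bash
# Sketch: locate the FSImage error in a NameNode log and list empty fsimage files.

# Print the lines of the NameNode log given as $1 that report a failed image load.
show_fsimage_errors() {
    grep 'Failed to load image' "$1"
}

# List zero-byte fsimage files directly inside the directory given as $1.
# Nothing is deleted here.
find_empty_fsimage() {
    local name_dir="$1"
    find "$name_dir" -maxdepth 1 -type f -name 'fsimage_*' -size 0
}

# Usage with the paths from the steps above (reading them may require sudo):
#   show_fsimage_errors /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.log
#   find_empty_fsimage /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
```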

Your NameNode should now be running. To check that it and all of your other nodes are running, use the following:

for service in /etc/init.d/hadoop-hdfs-*; do $service status; done;

Let me know if this worked for you! If it didn’t, or you have a different error, post the error from your log file on the ongoing Piazza post.

(5) Common VM Issues: Namenode in safemode

When you encounter this error:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/training/.staging/job_1473719038910_0001. Name node is in safe mode.

Execute this command:

hdfs dfsadmin -safemode leave
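Before forcing the NameNode out of safe mode, it can be useful to confirm that it really is in safe mode. The sketch below uses the dfsadmin -safemode get subcommand for that; the function name is hypothetical, and depending on your setup the hdfs command may have to be run as the HDFS superuser.

```shell
#!/usr/bin/env bash
# Sketch: leave HDFS safe mode only if the NameNode reports that it is on.
# Assumption: the hdfs command is on the PATH; override HDFS if needed.
HDFS="${HDFS:-hdfs}"

leave_safemode_if_on() {
    local status
    # "get" prints a line such as "Safe mode is ON" or "Safe mode is OFF".
    status="$("$HDFS" dfsadmin -safemode get)"
    echo "$status"
    case "$status" in
        *"Safe mode is ON"*) "$HDFS" dfsadmin -safemode leave ;;
    esac
}

# Usage:
#   leave_safemode_if_on
```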

(6) COMMON VM ISSUES: PROJECTS do not show in Eclipse

If your VM (including Eclipse) does not shut down correctly, you might encounter an empty workspace when you open it again. Follow these instructions to fix this issue (from Stack Overflow):

In Eclipse, open File > Import > General > Existing Projects into Workspace. Now you can browse to the workspace directory so that all projects will be listed. Then untick the Copy projects into workspace option (it is selected by default) and hit Finish.

(7) Hue File browser not working

Execute this command to (re)start Hue:

~/scripts/analyst/toggle_services.sh

(8) NO SPARK LOGS

Find the spark-defaults.conf file in

/etc/spark/conf.list

Open it in a text editor with sudo and add these lines:

spark.eventLog.dir               hdfs:///user/spark/applicationHistory
spark.eventLog.enabled           true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.yarn.historyServer.address localhost:18088
spark.executor.memory            400M
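If you prefer to add the settings from the terminal, the following sketch appends a property only when the file does not already define it, so running it twice does not create duplicates. The function name ensure_spark_prop is hypothetical, and the file passed in must be writable by the current user (for example, edit a copy and put it back with sudo cp).

```shell
#!/usr/bin/env bash
# Sketch: add a property to a spark-defaults.conf style file if it is missing.

# $1 = properties file, $2 = property name, $3 = property value
ensure_spark_prop() {
    local conf_file="$1" key="$2" value="$3"
    # A line defines the key if its first field is the key ("key value")
    # or if the line starts with "key=" ("key=value").
    if awk -v k="$key" '$1 == k || index($0, k "=") == 1 { found = 1 } END { exit !found }' "$conf_file"; then
        echo "already set: $key"
    else
        printf '%s %s\n' "$key" "$value" >> "$conf_file"
        echo "added: $key"
    fi
}

# Usage on a writable copy of the file:
#   ensure_spark_prop ./spark-defaults.conf spark.eventLog.enabled true
#   ensure_spark_prop ./spark-defaults.conf spark.eventLog.dir hdfs:///user/spark/applicationHistory
```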

Then execute your job and go to the YARN Resource Manager Web UI. Find your application and click on History at the very right of its row. This should open the history page for your application.

(9) WINDOWS: VM DOES NOT BOOT

If you get a black screen when booting the VM on a Windows Pro edition host, open "Turn Windows features on or off" and disable Hyper-V.