
Getting started with Hadoop

Getting started with the Hadoop ecosystem is quite different from getting your hands dirty with most other libraries. I plan to document here a few important points from my learning, for the benefit of others.

Local Installation

I’ve chosen to try the Hortonworks Sandbox for the following reasons. You can also install the sandbox provided by Cloudera (known as the Cloudera QuickStart VM) instead.

  • it bundles a few additional packages such as Hue and HCatalog
  • it runs on 32-bit and 64-bit hosts (Windows XP, Windows 7, Windows 8 and Mac OS X)

Prerequisites

  • Minimum 4 GB RAM; 8 GB is required to run Ambari and HBase
  • Virtualization enabled on BIOS
NOTE

I had to re-image my laptop when I tried to enable virtualization from the BIOS, so make sure to back up all important data and proceed at your own discretion. Instructions on enabling virtualization are provided below.

Installation

Enable virtualization in BIOS

1. Press Esc during system start-up to bring up the screen below

[Screenshot: screen1]

2. Press F10 to enter the BIOS and then select System Configuration

[Screenshot: screen2]

3. Select Device Configurations

[Screenshot: screen3]

4. Enable the two settings shown below

[Screenshot: screen4]

5. Download Oracle VirtualBox from here
6. Download Hortonworks Sandbox VM from here
7. Follow the rest of the instructions from here

Credentials

  • root/hadoop
  • hue/hadoop

Connectivity

  1. Web UI: http://127.0.0.1:8888
  2. SSH: 127.0.0.1, port 2222
  3. SCP: 127.0.0.1, port 2222
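The connectivity details above can be exercised from a terminal. A minimal sketch, assuming the sandbox’s default port forwarding to 127.0.0.1 and the root credentials listed earlier:

```shell
# Log in to the sandbox over the forwarded SSH port (password: hadoop)
ssh root@127.0.0.1 -p 2222

# Copy a local file onto the sandbox; note that scp uses an uppercase -P
# for the port, unlike ssh
scp -P 2222 somefile.txt root@127.0.0.1:/root/
```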


Running HBase Java applications on Hortonworks Hadoop Sandbox 2.x with YARN

Running a Java HBase client in the Hortonworks Sandbox was neither easy nor intuitive, as HBase is not enabled by default.

Code to smoke test connectivity

The program below acts as a smoke test for connectivity to HBase:


import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSmokeTest
{
    public static void main(String[] args) throws IOException
    {
        // Loads hbase-site.xml from the classpath, if present
        Configuration conf = HBaseConfiguration.create();

        System.out.println("---> zookeeper.znode.parent = " + conf.get("zookeeper.znode.parent"));
        System.out.println("---> hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
        System.out.println("---> HBaseConfiguration = " + conf);

        // 'ambarismoketest' is a table created on the sandbox by Ambari's own smoke test
        HTable hTable = new HTable(conf, "ambarismoketest");

        try
        {
            System.out.println("---> Table name = " + Bytes.toString(hTable.getTableName()));
        }
        finally
        {
            hTable.close();
        }
    }
}

Deploy

  1. Jar up the generated class (Maven is of great help here). Say the name of the jar is hadoop-examples-0.0.1-SNAPSHOT.jar
  2. SCP the jar to /usr/lib/hadoop on the sandbox
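The deploy steps above can be sketched as follows, assuming a standard Maven project layout (the jar name comes from your pom.xml) and the sandbox port forwarding described in the previous post:

```shell
# Build the jar from the project root
mvn clean package

# Copy it onto the sandbox over the forwarded SCP port
scp -P 2222 target/hadoop-examples-0.0.1-SNAPSHOT.jar root@127.0.0.1:/usr/lib/hadoop/
```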

Run

  1. SSH to 127.0.0.1 on port 2222
  2. cd /usr/lib/hadoop
  3. Run the command: yarn jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.hbase.HBaseSmokeTest

Issues

Issue 1

YARN doesn’t have the HBase jar files or the HBase configuration (i.e. hbase-site.xml) on its classpath.

The first error you encounter will be a NoClassDefFoundError for one of the HBase API classes (e.g. org.apache.hadoop.hbase.util.Bytes). The issue is that the HBase-related jars are not on YARN’s classpath by default. Follow these steps to fix it:

NoClassDefFoundError: /org/apache/hadoop/hbase/HBaseConfiguration

Solution

  1. Open /etc/hadoop/conf/hadoop-env.sh in an editor
  2. Find the line that exports HADOOP_CLASSPATH
  3. Above that line, define two new environment variables: HBASE_LIBS=/usr/lib/hbase/lib/ and HBASE_CONF=/etc/hbase/conf/
  4. Append both variables to HADOOP_CLASSPATH as shown below
  5. export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}:${HBASE_LIBS}:${HBASE_CONF}

Adding HBASE_CONF to the classpath is required so that hbase-site.xml is in the classpath.
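One quick way to verify the change took effect (my own check, not from the original steps) is to print the resolved classpath and look for the HBase entries:

```shell
# 'hadoop classpath' prints the classpath the hadoop/yarn commands will use;
# after the edit it should include /usr/lib/hbase/lib/ and /etc/hbase/conf/
hadoop classpath | tr ':' '\n' | grep -i hbase
```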

Issue 2

ClusterId read in ZooKeeper is null.

Re-running the program after fixing issue 1 results in the following error in the log file (oddly, logged at INFO level):

13/12/11 09:45:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x207f5580

13/12/11 09:45:33 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x207f5580 connecting to ZooKeeper ensemble=localhost:2181
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x142e28373f3000c, negotiated timeout = 40000
13/12/11 09:45:33 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null

Solution

HBase clients discover the running HBase cluster using the following two properties:

  1. hbase.zookeeper.quorum: used to connect to the ZooKeeper ensemble
  2. zookeeper.znode.parent: tells which znode holds the data (including the address of the HMaster) for the cluster

The value of zookeeper.znode.parent in $HBASE_CONF/hbase-site.xml is specified as /hbase-unsecure, which is correct, but for some reason (I’m still trying to figure this out) the value actually picked up is /hbase. For now I’ve overridden it programmatically in the client by adding the following line to the program:

conf.set("zookeeper.znode.parent", "/hbase-unsecure");

Redeploy the jar and rerun the program.
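If you want to confirm which parent znode actually exists before hard-coding the override, HBase ships a ZooKeeper shell you can run on the sandbox. This is just a diagnostic sketch; the exact znode names depend on your sandbox version:

```shell
# List the top-level znodes in ZooKeeper; on the Hortonworks Sandbox
# you should see hbase-unsecure rather than hbase
hbase zkcli ls /
```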
