Getting started with Hadoop

Getting started with the Hadoop ecosystem is quite different from getting your hands dirty with most other libraries. I plan to document a few important points from my learning here for the benefit of others.

Local Installation

I’ve chosen to try the Hortonworks Sandbox for the following reasons. You can also install the sandbox provided by Cloudera (known as the Cloudera QuickStart VM) instead.

  • It lets me try out a few additional packages such as Hue and HCatalog.
  • It runs on 32-bit and 64-bit operating systems (Windows XP, Windows 7, Windows 8 and Mac OS X).

Prerequisites

  • Minimum 4 GB RAM; 8 GB is required to run Ambari and HBase
  • Virtualization enabled on BIOS
NOTE

I had to re-image my laptop when I tried to enable virtualization from the BIOS. So make sure to back up all important data first, and proceed at your own discretion. Instructions for enabling virtualization are provided below.

Installation

Enable virtualization in BIOS

1. Press Esc during system start-up to bring up the screen below

screen1

2. Press F10 to enter BIOS and then select System Configuration

screen2
3. Select Device Configurations

screen3

4. Enable the two settings shown below

screen4

5. Download Oracle Virtual Box from here
6. Download Hortonworks Sandbox VM from here
7. Follow the rest of the instructions from here

Credentials

  • root/hadoop
  • hue/hadoop

Connectivity

  1. http://127.0.0.1:8888
  2. SSH: 127.0.0.1 2222
  3. SCP: 127.0.0.1 2222
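
Before digging into SSH or browser problems, it can help to confirm that the forwarded ports are reachable at all. The following is a minimal sketch using only the Java standard library; the class name SandboxPortCheck and the 2 second timeout are my own choices for illustration, and the ports are the ones listed above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SandboxPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // 8888 = sandbox web page, 2222 = forwarded SSH/SCP port (from the list above)
        int[] ports = {8888, 2222};
        for (int port : ports) {
            System.out.println("127.0.0.1:" + port + " open = " + isPortOpen("127.0.0.1", port, 2000));
        }
    }
}
```

If a port reports as closed, the VM is probably not running yet or the port forwarding in VirtualBox is not set up as expected.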

URL’s


Running HBase Java applications on Hortonworks Hadoop Sandbox 2.x with YARN

Running a Java HBase client in the Hortonworks Sandbox was not easy or intuitive, because HBase is not enabled by default.

Code to smoke test connectivity

The below program acts as a smoke test for the connectivity to HBase


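package com.aravind.learning.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSmokeTest
{
    public static void main(String[] args) throws IOException
    {
        // Loads hbase-default.xml and, if it is on the classpath, hbase-site.xml
        Configuration conf = HBaseConfiguration.create();

        // Print the two properties the client uses to locate the cluster
        System.out.println("---> zookeeper.znode.parent = " + conf.get("zookeeper.znode.parent"));
        System.out.println("---> hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
        System.out.println("---> HBaseConfiguration = " + conf);

        // Opening the table forces a connection to ZooKeeper and HBase
        HTable hTable = new HTable(conf, "ambarismoketest");

        try
        {
            System.out.println("---> Table name = " + Bytes.toString(hTable.getTableName()));
        }
        finally
        {
            hTable.close();
        }
    }
}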

Deploy

  1. Jar up the generated class (Maven is of great help here). Say the name of the jar is hadoop-examples-0.0.1-SNAPSHOT.jar
  2. SCP the jar to /usr/lib/hadoop

Run

  1. SSH to 127.0.0.1 on port 2222
  2. cd /usr/lib/hadoop
  3. Run the command: yarn jar ./hadoop-examples-0.0.1-SNAPSHOT.jar com.aravind.learning.hadoop.hbase.HBaseSmokeTest

Issues

Issue 1

YARN doesn’t have HBase related jar files and HBase configuration (i.e. hbase-site.xml) in the classpath.

The first error you encounter will be a NoClassDefFoundError on one of the HBase API classes (e.g. org.apache.hadoop.hbase.util.Bytes). The HBase jars are not on YARN’s classpath by default. Follow the steps under Solution to fix this.

NoClassDefFoundError: /org/apache/hadoop/hbase/HBaseConfiguration

Solution

  1. Open /etc/hadoop/conf/hadoop-env.sh in edit mode
  2. Go to the location of export HADOOP_CLASSPATH
  3. Create two new env properties above that location of HADOOP_CLASSPATH: HBASE_LIBS=/usr/lib/hbase/lib/ and HBASE_CONF=/etc/hbase/conf/
  4. Add these two env properties to the HADOOP_CLASSPATH as shown below
  5. export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}${JAVA_JDBC_LIBS}:${MAPREDUCE_LIBS}:${HBASE_LIBS}:${HBASE_CONF}

Adding HBASE_CONF to the classpath is required so that hbase-site.xml is in the classpath.
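
To verify the classpath change without going through a full HBase connection, a small diagnostic can be run with the same yarn jar command. This is only a sketch using the Java standard library; the class name HBaseClasspathCheck and its helper methods are hypothetical names I made up for illustration, while the HBase class name and hbase-site.xml come from the steps above.

```java
public class HBaseClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    public static boolean isClassAvailable(String className) {
        try {
            // 'false' = do not run static initializers, we only care about visibility
            Class.forName(className, false, HBaseClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns true if the named resource (e.g. a config file) is visible on the classpath. */
    public static boolean isResourceAvailable(String resourceName) {
        return HBaseClasspathCheck.class.getClassLoader().getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        String hbaseClass = "org.apache.hadoop.hbase.HBaseConfiguration";
        System.out.println(hbaseClass + " on classpath = " + isClassAvailable(hbaseClass));
        System.out.println("hbase-site.xml on classpath = " + isResourceAvailable("hbase-site.xml"));
    }
}
```

If the first line prints false, the HBase jars are still missing from the classpath; if the second prints false, the HBase configuration directory is missing and the client will fall back to default settings.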

Issue 2

ClusterId read in ZooKeeper is null.

Re-running the program after fixing issue 1 results in the following error in the log file (oddly, logged at INFO level):

13/12/11 09:45:33 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x207f5580

13/12/11 09:45:33 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x207f5580 connecting to ZooKeeper ensemble=localhost:2181
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Socket connection established to localhost.localdomain/127.0.0.1:2181, initiating session
13/12/11 09:45:33 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x142e28373f3000c, negotiated timeout = 40000
13/12/11 09:45:33 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null

Solution

The HBase clients will discover the running HBase cluster using the following two properties:

  1. hbase.zookeeper.quorum: used to connect to the ZooKeeper cluster
  2. zookeeper.znode.parent: tells which znode keeps the data (and the address of the HMaster) for the cluster

The value of zookeeper.znode.parent in HBASE_CONF/hbase-site.xml is specified as /hbase-unsecure, which is correct, but for some reason (I am still trying to figure this out) the value being printed is /hbase. So for now I’ve overridden it programmatically in the client by adding the following line to the program:

conf.set("zookeeper.znode.parent", "/hbase-unsecure");

Redeploy the jar and rerun the program.
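
While investigating why the printed value differs from the file, it can be useful to read the property straight out of hbase-site.xml, independent of the HBase client. Below is a minimal sketch using only the JDK's XML parser; it assumes the usual Hadoop layout of configuration/property/name/value elements and the /etc/hbase/conf/ location mentioned earlier. The class name SiteXmlProperty is hypothetical.

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlProperty {

    /** Returns the value of the named property in a Hadoop-style *-site.xml file, or null if absent. */
    public static String read(File siteXml, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if (propertyName.equals(childText(property, "name"))) {
                return childText(property, "value");
            }
        }
        return null;
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Default path is an assumption based on HBASE_CONF=/etc/hbase/conf/
        File siteXml = new File(args.length > 0 ? args[0] : "/etc/hbase/conf/hbase-site.xml");
        System.out.println("zookeeper.znode.parent in file = " + read(siteXml, "zookeeper.znode.parent"));
    }
}
```

If the file says /hbase-unsecure but the client prints /hbase, one thing worth checking is whether hbase-site.xml is really visible on the client's classpath, since /hbase is the HBase default.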


Indexing in Neo4j

By default, Neo4j uses Lucene as its index implementation. So anyone planning to use Neo4j’s indexing feature should learn the basics of inverted indexes and Lucene. The tutorials below are a good start:

  1. Inverted Index
  2. Creating a simple inverted index
  3. How to implement search engine
  4. Lucene: Getting started

You have two options for using indexes in Neo4j:

Auto-indexing

Pros

  1. If you enable this, only two indexes are created: one for the nodes and one for the relationships.
  2. Updates to the nodes or relationships are automatically handled by Neo4j.

Cons

  1. Two large indexes for your entire db instance. Some use cases require finer-grained control.

Custom indexing

Pros

  1. Control WHAT you want to index. You can choose to index only a few types of your nodes and relationships, and only a few properties of those.
  2. Control WHERE you want to index. You can have a separate index for each type of node/relationship. For example, in a movie-related db, you can have users indexed in a “users” index and movies indexed in a “movies” index.
  3. Might take less space and might perform better.

Cons

  1. You are responsible for making sure to cascade the insert, delete and modify operations to the index as well.

XmlBeans: removing namespace attributes from the generated xml


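import javax.xml.namespace.QName;

import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;

// Strips namespace declarations and namespace-qualified names from everything under root.
public static void removeNamespaces(XmlObject root)
{
    XmlCursor cursor = root.newCursor();
    cursor.toNextToken();
    while (cursor.hasNextToken())
    {
        if (cursor.isNamespace())
        {
            // No toNextToken() here: after removeXml() the cursor sits on whatever followed the removed token
            cursor.removeXml();
        }
        else
        {
            if (cursor.isStart() || cursor.isAttr())
            {
                // Rename the element/attribute to its local part only, dropping the namespace
                String localName = cursor.getName().getLocalPart();
                cursor.setName(new QName(localName));
            }
            cursor.toNextToken();
        }
    }
    cursor.dispose();
}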

 


JMock – Lessons learned

  • Expectations are asserted after the run of each test method
  • Expectations are additive, so remember to call mock.reset() if you are planning to set a new expectation in each test method
  • If you are using JUnit 4 or above, you do not need to extend MockObjectTestCase. You can use @RunWith(JMock.class) instead. But that means you cannot use other JUnit 4 runners such as @RunWith(Parameterized.class). You can look into the XJ4 (extensions for JUnit 4) project at http://code.google.com/p/peachjean/wiki/XJ4 to see how it can help

JMock – mocking the same interface multiple times

A subject can have multiple addresses, which I wanted to mock, but the attempt below throws the following error:
java.lang.IllegalArgumentException: a mock with name tokenizedUnitedKingdomAddress already exists

UnitedKingdomConsumerSubject primarySubj = context.mock(UnitedKingdomConsumerSubject.class);
TokenizedUnitedKingdomAddress currAddr = context.mock(TokenizedUnitedKingdomAddress.class);
TokenizedUnitedKingdomAddress prevAddr = context.mock(TokenizedUnitedKingdomAddress.class);
TokenizedUnitedKingdomAddress prevAddr2 = context.mock(TokenizedUnitedKingdomAddress.class);
 
primarySubj.setAddress(CURRENT, currAddr);
primarySubj.setAddress(FORMER, prevAddr);
primarySubj.setAddress(SECOND_FORMER, prevAddr2);

It seems we have to use the overloaded method public <T> T mock(Class<T> typeToMock, String name), which gives each mock a unique name. The following fixed the error:

TokenizedUnitedKingdomAddress currAddr = context.mock(TokenizedUnitedKingdomAddress.class, "current address");
TokenizedUnitedKingdomAddress prevAddr = context.mock(TokenizedUnitedKingdomAddress.class, "former address");
TokenizedUnitedKingdomAddress prevAddr2 = context.mock(TokenizedUnitedKingdomAddress.class, "2nd former address");

Handling Exceptions in Mule

Proper exception handling is a requirement of any integration application. Handling errors with Mule can be challenging for a beginner, so here I present various ways of dealing with exceptions, to the best of my knowledge.

Note: check the hello example that comes bundled with Mule. The example below is based on it, with a little more explanation.

Problem:

Send a friendly error message back to the user in case of any business exceptions

Solution:

In request-response MEP (Message Exchange Pattern), the service provider has to send a meaningful error message back to the service consumer when a business exception occurs. The suggested way of dealing with this scenario is to use Exception based filtering (payload-type-filter) in Mule.

MEP: Request-Response

Flow: The idea is to expose the existing BusinessService class over HTTP so that it can receive requests from a browser. I have written a new BusinessServiceUMO class that acts as a mediator between the requestor and BusinessService. If BusinessService throws an exception, BusinessServiceUMO catches it and returns the exception as the new payload. I have also configured a payload type filter on the outbound endpoint of BusinessServiceUMO to route exceptions to the UserErrorHandler service. UserErrorHandler just returns the message contained in the exception, and Mule sends this message as the response to the user via the response transformer. All this becomes clearer when we go through the mule-config.

Usage:

Important things to be noted w.r.t this example:

  • BusinessServiceUMO returning the exception itself as the payload
  • synchronous="true" on all the participating inbound endpoints, enabling the Request-Response MEP (so that the response is sent back to the caller)
  • use of payload-type-filter
  • use of responseTransformer-refs="ExceptionToString" on the UserErrorHandler

Components:
BusinessService: Class implementing the business logic. This class can throw two kinds of exceptions: ValidationException and BusinessException. This class has no mule specific logic.

BusinessServiceUMO: receives input events from Mule and delegates the processing to the BusinessService.process(String req) method. This class has the logic specific to Mule. Look at how each of the exceptions is handled. For the payload-type-filter to work, the onEvent method should return the exception as the payload.


public Object onEvent(final Object req)
{
 Object payload = null;

 try
 {
  payload = service.process((String) req);
 }
 catch (BusinessException be)
 {
  payload = be;
 }
 catch (ValidationException ve)
 {
  payload = ve;
 }

 return payload;
}

UserErrorHandler: The important thing to note here is responseTransformer-refs="ExceptionToString". All the transformer does is return the message contained in the exception. Mule then returns this message to the browser as the response.
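
The overall pattern (return the exception as the payload, route on the payload type, then turn the exception into its message) can be sketched in plain Java without any Mule classes. Everything below is a stand-in for illustration only: the class ExceptionPayloadSketch, the process rules and the messages are made up, and in the real setup the routing is done by Mule's payload-type-filter and the conversion by the ExceptionToString transformer.

```java
public class ExceptionPayloadSketch {

    public static class BusinessException extends Exception {
        public BusinessException(String message) { super(message); }
    }

    public static class ValidationException extends Exception {
        public ValidationException(String message) { super(message); }
    }

    // Stand-in for BusinessService.process: the rules here are invented for the example
    public static String process(String req) throws BusinessException, ValidationException {
        if (req == null || req.isEmpty()) {
            throw new ValidationException("Name must not be empty");
        }
        if ("fail".equals(req)) {
            throw new BusinessException("Could not process the request");
        }
        return "Hello " + req;
    }

    // Mirrors onEvent: an exception becomes the payload instead of propagating
    public static Object onEvent(Object req) {
        try {
            return process((String) req);
        } catch (BusinessException | ValidationException e) {
            return e;
        }
    }

    // Mirrors the payload type filter plus the exception-to-string step
    public static String respond(Object payload) {
        if (payload instanceof Exception) {
            return ((Exception) payload).getMessage();
        }
        return String.valueOf(payload);
    }

    public static String handle(String req) {
        return respond(onEvent(req));
    }

    public static void main(String[] args) {
        System.out.println(handle("Mule"));
        System.out.println(handle(""));
        System.out.println(handle("fail"));
    }
}
```

The point of the design is that the caller always gets a readable string back, whether the business call succeeded or failed.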

mule-config:


<spring:beans>
    <spring:bean id="businessService" class="com.aravind.mule.errorhandling.BusinessService" />
    <spring:bean id="businessUMO" class="com.aravind.mule.errorhandling.BusinessServiceUMO">
        <spring:property name="service" ref="businessService"></spring:property>
    </spring:bean>
</spring:beans>

<!-- Global Transformers -->
<custom-transformer name="HttpRequestToString" class="org.mule.example.hello.HttpRequestToNameString" />
<custom-transformer name="ExceptionToString" class="com.aravind.mule.errorhandling.ExceptionToString" />

<model name="my-service-model">
    <service name="BusinessService">
        <inbound>
            <http:inbound-endpoint address="http://localhost:9988"
                    transformer-refs="HttpRequestToString" synchronous="true">
                <not-filter>
                    <wildcard-filter pattern="/favicon.ico" />
                </not-filter>
            </http:inbound-endpoint>
        </inbound>
        <component>
            <spring-object bean="businessUMO" />
        </component>
        <outbound>
            <filtering-router>
                <vm:outbound-endpoint path="userErrorHandler" />
                <payload-type-filter expectedType="java.lang.Exception" />
            </filtering-router>
        </outbound>
    </service>

    <!-- User error handling returns an error message to the end user -->
    <service name="UserErrorHandler">
        <inbound>
            <vm:inbound-endpoint path="userErrorHandler"
                    responseTransformer-refs="ExceptionToString" synchronous="true" />
        </inbound>
    </service>
</model>

Dada Harir’s Well in Gujarat, 1880s.

– My Country, My People


Installing cURL and SSL on Windows 7

Download and Install

  1. Download and unzip 64-bit cURL with SSL.
  2. Download the latest bundle of Certificate Authority public keys from mozilla.org.
  3. Rename this file from cacert.pem to curl-ca-bundle.crt
  4. Make sure both cURL and the certificate bundle are in a directory listed in the PATH environment variable.

Test: curl -L https://www.google.com


Scientific reason on visiting Temples

I’m not into idol worship (and hence refrain from visiting temples as much as I can) and believe that the real essence of visiting a temple is different from the usual explanations (excuses) I hear. I’ve been searching the internet for a while and fortunately found a scientific explanation behind temple construction, Prasadam, Teertham etc., which I am copying and pasting verbatim. I fully disclose that the following material is someone else’s research, which I am (shamelessly) copying to my blog for my own future reference and for the benefit of others. If someone knows the original source of this material, kindly share it with me so that I can give full credit to them.

———————————————————————————————————————————-

There are thousands of temples all over India in different size, shape and locations but not all of them are considered to be built the Vedic way. Generally, a temple should be located at a place where earth’s magnetic wave path passes through densely. It can be in the outskirts of a town/village or city, or in middle of the dwelling place, or on a hilltop. The essence of visiting a temple is discussed here.

Now, these temples are located strategically at a place where the positive energy is abundantly available from the magnetic and electric wave distributions of north/south pole thrust. The main idol is placed in the core center of the temple, known as Garbhagriha or Moolasthanam. In fact, the temple structure is built after the idol has been placed. This Moolasthanam is where earth’s magnetic waves are found to be maximum. We know that there are some copper plates, inscribed with Vedic scripts, buried beneath the Main Idol. What are they really? No, they are not God’s / priests’ flash cards when they forget the shlokas. The copper plate absorbs earth’s magnetic waves and radiates it to the surroundings. Thus a person regularly visiting a temple and walking clockwise around the Main Idol receives the beamed magnetic waves and his body absorbs it. This is a very slow process and a regular visit will let him absorb more of this positive energy. Scientifically, it is the positive energy that we all require to have a healthy life.

Further, the Sanctum is closed on three sides. This increases the effect of all energies. The lamp that is lit radiates heat energy and also provides light inside the sanctum to the priests or poojaris performing the pooja. The ringing of the bells and the chanting of prayers takes a worshipper into trance, thus not letting his mind waver. When done in groups, this helps people forget personal problems for a while and relieve their stress. The fragrance from the flowers, the burning of camphor give out the chemical energy further aiding in a different good aura. The effect of all these energies is supplemented by the positive energy from the idol, the copper plates and utensils in the Moolasthanam / Garbagraham.

Theertham, the “holy” water used during the pooja to wash the idol is not plain water cleaning the dust off an idol. It is a concoction of Cardamom, Karpura (Benzoin), zaffron / saffron, Tulsi (Holy Basil), Clove, etc. Washing the idol is to charge the water with the magnetic radiations thus increasing its medicinal values. Three spoons of this holy water is distributed to devotees. Again, this water is mainly a source of magneto-therapy. Besides, the clove essence protects one from tooth decay, the saffron & Tulsi leafs protects one from common cold and cough, cardamom and Pachha Karpuram (benzoin), act as mouth fresheners. It is proved that Theertham is a very good blood purifier, as it is highly energized. Hence it is given as prasadam to the devotees. This way, one can claim to remain healthy by regularly visiting the Temples. This is why our elders used to suggest us to offer prayers at the temple so that you will be cured of many ailments. They were not always superstitious. Yes, in a few cases they did go overboard when due to ignorance they hoped many serious diseases could be cured at temples by deities.

When people go to a temple for the Deepaaraadhana, and when the doors open up, the positive energy gushes out onto the persons who are there. The water that is sprinkled onto the assemblages passes on the energy to all. This also explains why men are not allowed to wear shirts at a few temples and women are requested to wear more ornaments during temple visits. It is through these jewels (metal) that positive energy is absorbed by the women. Also, it is a practice to leave newly purchased jewels at an idol’s feet and then wear them with the idol’s blessings. This act is now justified after reading this article. This act of “seeking divine blessings” before using any new article, like books or pens or automobiles may have stemmed from this through mere observation.

Energy lost in a day’s work is regained through a temple visit and one is refreshed slightly. The positive energy that is spread out in the entire temple and especially around where the main idol is placed, are simply absorbed by one’s body and mind. Did you know, every Vaishnava(Vishnu devotees), “must” visit a Vishnu temple twice every day in their location. Our practices are NOT some hard and fast rules framed by 1 man and his followers or God’s words in somebody’s dreams. All the rituals, all the practices are, in reality, well researched, studied and scientifically backed thesis which form the ways of nature to lead a good healthy life.

The scientific and research part of the practices are well camouflaged as “elder’s instructions” or “granny’s teaching’s” which should be obeyed as a mark of respect so as to once again, avoid stress to the mediocre brains.

———————————————————————————————————————————-

My Country, My People

Click this link to visit the 17th century paintings of Ramayana preserved by the British Library. On the opened window, select Ramayana and then select 2nd option of using Silverlight.
