Saturday, October 18, 2008

Compiling kernel > 2.6.13

If you see anything like this --

modprobe: FATAL: could not load /lib/modules.dep: No such file or directory
mount: unknown filesystem type 'devfs'
FATAL: could not load /lib/modules/2.6.14/modules.dep: No such file or directory
umount: devfs: not mounted
/scripts/ext3-add-journal.sh: 27: arith: syntax error: "0X"
modprobe: FATAL: could not load /lib/modules/2.6.14/modules.dep: No such file or directory
mount: unknown filesystem type 'devfs'
/sbin/init: 426: arith: syntax error: "0X"
Kernel panic - not syncing: Attempted to kill init!


You are using mkinitrd to build the initrd image. Use mkinitramfs instead.

It works! Also make sure cramfs support is compiled in (*) in your kernel config.
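
A minimal sketch of generating the image (the 2.6.14 version string and output path are assumptions -- use whatever kernel you just built):

$ sudo mkinitramfs -o /boot/initrd.img-2.6.14 2.6.14
# then point your bootloader entry at the new initrd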

Monday, October 13, 2008

Installing Hadoop on Ubuntu

I have a cluster of 4 machines and am set to install Hadoop on it.
Machines:
ds16: master
ds11: slave
ds04: slave
ds14: slave
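
These hostnames need to resolve on every node; a sketch of the /etc/hosts entries (the 10.0.0.x addresses are placeholders, not the real ones -- substitute whatever your network uses):

# /etc/hosts on each node (placeholder IPs)
10.0.0.16   ds16
10.0.0.11   ds11
10.0.0.4    ds04
10.0.0.14   ds14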


Am loosely following sangmi's blog and the following links:
http://sangpall.blogspot.com/2008/09/installing-hadoop.html
http://wiki.apache.org/hadoop/GettingStartedWithHadoop
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

Step 1: check that you have ssh, rsync and Java on all machines
For Java:
sudo apt-get install sun-java5-jdk
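
If ssh or rsync are missing, the stock Ubuntu packages should do (package names assumed from the standard repositories):

sudo apt-get install openssh-server openssh-client rsync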

Step 2:
Download Hadoop and set this in conf/hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
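
For completeness, a rough sketch of unpacking the release before editing that file (the 0.18.1 version here is an assumption -- use whichever release you downloaded):

$ tar xzf hadoop-0.18.1.tar.gz
$ cd hadoop-0.18.1
$ vi conf/hadoop-env.sh   # set JAVA_HOME as above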

Step 3:
Now you can run the standalone example as is.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

[One problem I faced - when I ssh'd into my machines, I got this output:

id: cannot find name for group ID 521

So I added a group with gid 521, which solved the problem:
sudo addgroup --gid 521 test

Also disabled StrictHostKeyChecking in /etc/ssh/ssh_config (snippet below) -- it was causing problems since ~ is mounted on NFS]
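
For reference, roughly what that ssh_config change looks like (scoping it to the ds* hosts is my choice here; a plain Host * block works too):

# /etc/ssh/ssh_config
Host ds*
    StrictHostKeyChecking no
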
Till now we have run Hadoop on a single node. Let's move to a cluster now.

Cluster Operations:
Step 4:
Configure conf/hadoop-site.xml on the namenode as follows


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ds16:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://ds16:54311</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/export/pathaka/hadoop/dfs</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/pathaka/hadoop/tmp/</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>



Edit conf/masters and conf/slaves on ds16 as shown



$ cat conf/masters
ds16

$ cat conf/slaves
ds16
ds04
ds11
ds14




[Used http://www.palfrader.org/code2html/code2html.html for converting code to html .. it's nice]

[Make sure there are no leading spaces in the file -- otherwise you get a weird XML exception:
[Fatal Error] hadoop-site.xml:2:6: The processing instruction target matching "[xX][mM][lL]" is not allowed.]
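
A quick sanity check I'd use to catch that (nothing Hadoop-specific, just looking at the first bytes of the file):

$ head -c 5 conf/hadoop-site.xml
<?xml
# if you see a space or newline instead, strip it from the top of the file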

Step 5:

$ mkdir /export/pathaka/hadoop/dfs
# This is the dfs file tree

$ mkdir /export/pathaka/hadoop/tmp

On the master (ds16), run
$ bin/hadoop namenode -format
# This formats the dfs folder
# Do this only the first time, and only on the master


On ds16 (the master), run
$ bin/start-dfs.sh
$ bin/start-mapred.sh
# master and jobtracker are the same machine
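
To confirm the daemons actually came up, jps (it ships with the Sun JDK) should list them on each node; on ds16 I'd expect something like:

$ jps
# NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker
# (ds16 appears in both masters and slaves, so it runs the worker daemons too)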

Stopping
$ bin/stop-mapred.sh
$ bin/stop-dfs.sh