Thursday, June 11, 2015

Setting up Ganglia on Ubuntu nodes


Setting up Ganglia on an Ubuntu Hadoop cluster:

Ganglia's web front end depends on apache2, the PHP command-line client and the Apache PHP module, so install those first:
sudo apt-get install apache2 php5 php5-cli libapache2-mod-php5
Then verify in a browser that apache2 is running at http://localhost:80
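
If you would rather check from a terminal than a browser, here is a minimal sketch. It assumes curl is installed; is_http_up is a helper name made up for illustration and is not part of Apache or Ganglia.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# when the status line carries a 2xx or 3xx code.
is_http_up() {
  line=$(head -n 1 | tr -d '\r')
  code=$(printf '%s\n' "$line" | awk '{print $2}')
  case "$code" in
    2??|3??) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage (needs a running apache2):
#   curl -sI http://localhost:80 | is_http_up && echo "apache2 is up"
```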

Ganglia master: choose a low-utilization, low-consequence node, e.g. slave6.

sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend

This installs the monitor daemon (gmond), rrdtool, the metadata daemon (gmetad) and the web UI.

Modify the web context file if needed and copy it to Apache:
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf
Modify /etc/ganglia/gmond.conf to your specs (we are using a unicast configuration). I commented out the multicast attributes and replaced the default IPs with my Ganglia master IP:

globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30
}

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "Hadoop Ganglia Monitor" 
  owner = "hduser"     
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  host = 10.77.201.104
  port = 8649
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8649
}


/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
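
The only host-specific value in this file is the send-channel address, so the edit can be scripted. A rough sketch, assuming a gmond.conf laid out like the one above; set_gmond_host is a hypothetical helper, and it prints the rewritten file instead of editing it in place so you can review the result first.

```shell
# Hypothetical helper: print CONF with every "host = ..." line pointed at IP.
# Lines such as "host_dmax = 0" or "host {" are left alone.
set_gmond_host() {
  conf=$1
  ip=$2
  sed "s/^\([[:space:]]*host[[:space:]]*=[[:space:]]*\).*/\1$ip/" "$conf"
}

# Usage:
#   set_gmond_host /etc/ganglia/gmond.conf 10.77.201.104 > /tmp/gmond.conf.new
```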
 

Modify /etc/ganglia/gmetad.conf to state the data collecting node for Ganglia:

data_source "Hadoop Cluster" 10.77.201.104
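
A data_source line may also list more than one address, each with an explicit port. As a small sketch, the helper below builds such a line; gmetad_data_source is a hypothetical name, and it assumes gmond's port 8649 unless GMOND_PORT is set.

```shell
# Hypothetical helper: build a gmetad data_source line for one or more hosts.
gmetad_data_source() {
  name=$1
  shift
  line="data_source \"$name\""
  for host in "$@"; do
    line="$line $host:${GMOND_PORT:-8649}"
  done
  printf '%s\n' "$line"
}

# Usage:
#   gmetad_data_source "Hadoop Cluster" 10.77.201.104
```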


Starting and stopping:

Restart gmetad
sudo service gmetad restart


Restart ganglia monitor in master node

sudo service ganglia-monitor restart

Ganglia Clusters


Install ganglia monitor on nodes
sudo apt-get install ganglia-monitor

Modify /etc/ganglia/gmond.conf so that each node sends its data to the receiver, i.e. the data source configured on the master:

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will
 * NOT be wrapped inside of a <CLUSTER> tag. */
cluster {
  name = "Hadoop Ganglia Monitor" 
  owner = "hduser"     
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  host = 10.77.201.104
  port = 8649
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
}

Starting and stopping:
sudo service ganglia-monitor start
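
With more than a couple of nodes it can help to generate the per-node commands first and review them before running anything. A dry-run sketch; node_setup_commands is a hypothetical helper, slave1 and slave2 are placeholder host names, and it assumes passwordless ssh and that gmond.conf is copied to each node separately.

```shell
# Hypothetical helper: print (not run) the install and restart command per node.
node_setup_commands() {
  for node in "$@"; do
    printf 'ssh %s "sudo apt-get install -y ganglia-monitor && sudo service ganglia-monitor restart"\n' "$node"
  done
}

# Usage (placeholder host names); pipe to sh once the output looks right:
#   node_setup_commands slave1 slave2
```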
Hadoop ganglia


Issues:
Accessing the Ganglia URL returned this error in the HTML page:
There was an error collecting ganglia data (127.0.0.1:8652): fsockopen error: Connection refused.

The problem is a file permissions issue: gmetad runs as user nobody and needs to own its rrds directory.

Solution:

chown -R nobody:root /var/lib/ganglia/rrds
Then restart the daemons.
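
To confirm the ownership before and after the chown, a tiny sketch like this may help; dir_owner is a hypothetical helper that simply reads the owner column from ls.

```shell
# Hypothetical helper: print the owning user of a directory.
dir_owner() {
  ls -ld "$1" | awk '{print $3}'
}

# Usage:
#   [ "$(dir_owner /var/lib/ganglia/rrds)" = nobody ] || echo "rrds is not owned by nobody"
```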


MONITORING
send datagram

master node:
ps -ef | grep -v grep | grep gm
Results
ganglia  21116     1  0 12:58 ?        00:00:00 /usr/sbin/gmond --pid-file=/var/run/ganglia-monitor.pid
nobody   21127     1  0 12:58 ?        00:00:01 /usr/sbin/gmetad --pid-file=/var/run/gmetad.pid


network

sudo netstat -plane | egrep 'gmon|gme'
results:
tcp        0      0 0.0.0.0:8649            0.0.0.0:*               LISTEN      999        87732663    21116/gmond     
tcp        0      0 0.0.0.0:8651            0.0.0.0:*               LISTEN      65534      87729072    21127/gmetad    
tcp        0      0 0.0.0.0:8652            0.0.0.0:*               LISTEN      65534      87729073    21127/gmetad    
udp        0      0 0.0.0.0:8649            0.0.0.0:*                           999        87732662    21116/gmond     
udp        0      0 192.168.179.103:60243   192.168.179.103:8649    ESTABLISHED 999        87732666    21116/gmond     
unix  3      [ ]         STREAM     CONNECTED     87785728 21116/gmond         
unix  3      [ ]         STREAM     CONNECTED     83863311 21127/gmetad        
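
The same netstat output can be reduced to just the listening TCP ports of one daemon, which makes a missing 8652 (the port the web front end complained about) easy to spot. A sketch, assuming netstat output shaped like the listing above; listening_ports is a hypothetical helper.

```shell
# Hypothetical helper: read netstat lines on stdin and print the TCP ports
# on which the named program is listening, one per line.
listening_ports() {
  awk -v proc="$1" '
    $1 == "tcp" && $6 == "LISTEN" && $NF ~ ("/" proc "$") {
      n = split($4, parts, ":")
      print parts[n]
    }' | sort -n
}

# Usage:
#   sudo netstat -plane | listening_ports gmetad
```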

Monday, June 8, 2015

Managing a maven artifact repository with Artifactory

Planning

This post assumes that you know how to use Maven at an elementary level and have an idea of how to deploy web applications using Tomcat or another JEE app server. Our examples use Subversion as the source code version management tool, so you should have some familiarity with branching.

Our goals:
  1. Reduce the disk space usage in our source code repository (Subversion).
    1. Set up a local repository to manage all third-party libraries.
  2. Standardize building, testing and releasing work product.
    1. Scripts provide a stable methodology for building, testing and releasing.
  3. A secure environment in which to execute 1) and 2).
    1. Artifactory is a secure external repository management tool.
    2. Maven has plugins for source code version control; third-party repository and FTP plugins; build plugins, including an Ant plugin; JUnit plugins for testing; and release plugins.

Infrastructure management is still required to completely close all security holes:
  • A shared local repository (Artifactory) should be located on an accessible server. It should be accessible to the development, integration, testing and production build environments. Some organizations have stricter security requirements than others, but generally read-only local/intranet access is sufficient.
  • Work product should be separated out into separate pom files. See section III.

I. Setting up maven



II. Setting up Artifactory



2. Installing in JBoss 6 didn’t work, so I installed in Tomcat 5 using JDK6.
Set the environment variables for Artifactory and JDK6, then start Tomcat:
     $> cd <install dir>/apache-tomcat-5.5.28/bin/
     $> export JAVA_HOME='/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home'
     $> export JAVA_OPTS="$JAVA_OPTS -Dartifactory.home=/Users/kaniu_n/artifactory-2.5.1.1"
     $> ./startup.sh
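
A mistyped artifactory.home is easy to miss until Tomcat is already up, so it may be worth checking the directory before exporting JAVA_OPTS. A sketch; artifactory_java_opts is a hypothetical helper, not something shipped with Artifactory.

```shell
# Hypothetical helper: print the -Dartifactory.home option, or fail when the
# directory does not exist.
artifactory_java_opts() {
  home=$1
  if [ ! -d "$home" ]; then
    echo "artifactory home not found: $home" >&2
    return 1
  fi
  printf '%s\n' "-Dartifactory.home=$home"
}

# Usage:
#   opts=$(artifactory_java_opts /Users/kaniu_n/artifactory-2.5.1.1) \
#     && export JAVA_OPTS="$JAVA_OPTS $opts"
```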

3. The maven repositories:

Artifactory manages multiple pre-configured remote Maven repositories. It also caches, i.e. it stores downloaded libraries locally to reduce network IO. Artifactory administration allows for adding new repositories or modifying existing ones, local or remote.
Therefore, Artifactory becomes your library ‘repository’ instead of the many remote repositories.

4. pom.xml example:

<project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                            http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>

   <groupId>com.ritho.maven</groupId>
   <artifactId>simple-parent</artifactId>
   <packaging>pom</packaging>
   <version>1.0</version>
   <name> Parent Project</name>

    <url>http://maven.ritho.com</url>

    <repositories>
        <repository>
            <id>central</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>snapshots</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>central</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
        <pluginRepository>
            <id>snapshots</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </pluginRepository>
    </pluginRepositories>
        ..
        ..
        ..
 
   
Note that the urls point to the deployed Artifactory instance.
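
A quick way to see which URLs a pom actually references is to pull out its url elements. A sketch that assumes each url element sits on a single line, as in the example; pom_urls is a hypothetical helper, and it also lists the project's own url.

```shell
# Hypothetical helper: print the distinct <url> values found in a pom file.
pom_urls() {
  sed -n 's:.*<url>\(.*\)</url>.*:\1:p' "$1" | sort -u
}

# Usage:
#   pom_urls pom.xml
```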

III. Build, test and Release Management using Maven.


Each project should have a pom. Related projects should have a parent pom file and a separate release pom file. In the following example, I have four Eclipse projects:
    -Exam/ has dependencies on DataObjects/
    -ParentProject/ contains the main pom for all projects.
    -Release/ is the end point for all packaged entities.
Here is the Eclipse project layout:


This example uses implied classpaths for compiling the projects with dependencies. This is done by referencing the parent project, which is ‘aware’ of all the relevant modules in the workspace. The goal is to build, test or package from the parent pom.xml file. The <module> elements let mvn work out the project dependencies and then derive the compilation sequence of all artifacts. By always running mvn from the parent pom, you ensure that changes in every module are reflected in the final artifact of the compilation sequence.
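
To make "always build from the parent" hard to get wrong, the mvn call can be wrapped. A sketch; build_from_parent is a hypothetical helper, and the MVN variable is only an assumption that lets you swap in another mvn binary.

```shell
# Hypothetical helper: run Maven goals from the directory holding the parent pom.
build_from_parent() {
  parent_dir=$1
  shift
  if [ ! -f "$parent_dir/pom.xml" ]; then
    echo "no pom.xml in $parent_dir" >&2
    return 1
  fi
  # Subshell, so the caller's working directory is left untouched.
  ( cd "$parent_dir" && "${MVN:-mvn}" "$@" )
}

# Usage, from the workspace directory:
#   build_from_parent ParentProject clean package
```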

Caveats:
All projects will be compiled using the parent JDK. Therefore all projects have to be up to date with the current JDK, i.e. code that only raised warnings under JDK5 may fail to compile under JDK6.
Building does not auto-clean the build directories, so do a ‘clean’ before rebuilding/testing/packaging, i.e.:
>mvn clean
Or
>mvn clean package



Here are my examples:
1. Parent pom

<project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                            http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ritho.maven</groupId>
    <artifactId>simple-parent</artifactId>
    <packaging>pom</packaging>
    <version>1.0</version>
    <name> Parent Project</name>
    <url>http://maven.ritho.com</url>
    <repositories>
        <repository>
            <id>central</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>snapshots</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>central</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
        <pluginRepository>
            <id>snapshots</id>
            <url>http://localhost:8080/artifactory/repo</url>
            <releases>
                <enabled>false</enabled>
            </releases>
        </pluginRepository>
    </pluginRepositories>
   <modules>
       <module>../Exam</module>
       <module>../DataObjects</module>
   </modules>
   <build>
       <pluginManagement>
           <plugins>
               <plugin>
                   <groupId>org.apache.maven.plugins</groupId>
                   <artifactId>maven-compiler-plugin</artifactId>
                   <configuration>
                       <source>1.5</source>
                       <target>1.5</target>
                   </configuration>
               </plugin>
           </plugins>
       </pluginManagement>
   </build>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.5</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>


Some notes: This pom targets JDK5, but you may alter it for JDK6 as below:
       <pluginManagement>
           <plugins>
               <plugin>
                   <groupId>org.apache.maven.plugins</groupId>
                   <artifactId>maven-compiler-plugin</artifactId>
                   <configuration>
                       <source>1.6</source>
                       <target>1.6</target>
                   </configuration>
               </plugin>
           </plugins>
       </pluginManagement>

Make sure Maven is running on JDK6 or higher by setting JAVA_HOME; in a Mac OS X terminal:

$ export JAVA_HOME='/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home'
$ mvn -e package
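
To double-check which JDK JAVA_HOME really points at, the version banner can be reduced to its leading two numbers. A sketch; java_major_minor is a hypothetical helper that assumes the usual version "x.y.z" banner that java -version prints on stderr.

```shell
# Hypothetical helper: extract e.g. 1.6 from: java version "1.6.0_65"
java_major_minor() {
  sed -n 's/.*version "\([0-9]*\.[0-9]*\).*/\1/p' | head -n 1
}

# Usage:
#   "$JAVA_HOME/bin/java" -version 2>&1 | java_major_minor
```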


2. DataObjects pom
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>com.dataobjects</groupId>
    <artifactId>dataobjects</artifactId>
    <parent>
        <groupId>com.ritho.maven</groupId>
        <artifactId>simple-parent</artifactId>
        <version>1.0</version>
        <relativePath>../ParentProject</relativePath>
    </parent>
    <packaging>jar</packaging>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <directory>${project.parent.relativePath}/maven/${project.artifactId}</directory>
    </build>
</project>



3. Exam pom
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com</groupId>
    <artifactId>exam</artifactId>
    <parent>
        <groupId>com.ritho.maven</groupId>
        <artifactId>simple-parent</artifactId>
        <version>1.0</version>
        <relativePath>../ParentProject</relativePath>
    </parent>
    <packaging>jar</packaging>
    <dependencies>
        <dependency>
            <groupId>com.dataobjects</groupId>
            <artifactId>dataobjects</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src</sourceDirectory>
        <testSourceDirectory>test</testSourceDirectory>
        <directory>${project.parent.relativePath}/maven/${project.artifactId}</directory>
    </build>
</project>



4. Distribution pom
This pom can be the parent pom with additional plugins enabled, e.g. for reporting, distribution management and remote connection.
i. Reporting lets you autogenerate HTML that describes the distribution product of the pom file. Example:
<reporting>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-report-plugin</artifactId>
                <version>2.12</version>
                <configuration>
                    <outputDirectory>${project.build.directory}/${project.version}/surefire</outputDirectory>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-site-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                <outputDirectory>${project.build.directory}/${project.version}/site</outputDirectory>
                </configuration>
            </plugin>
        </plugins>
</reporting>

Note structure of the project below containing site and site.apt directories:
site.xml configures the template and its look and feel.
index.apt configures the content using Almost Plain Text notation.
site.xml
<?xml version="1.0" encoding="ISO-8859-1"?>

<project name="Maven" xmlns="http://maven.apache.org/DECORATION/1.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/DECORATION/1.0.0 http://maven.apache.org/xsd/decoration-1.0.0.xsd">  
   <bannerLeft>
       <name>Ritho HBase Utility for V-1.0.+</name>
       <src>https://sites.google.com/site/rithotechcorp/_/rsrc/1332167517940/config/customLogo.gif?revision=3</src>
       <href>http://maven.ritho.com/</href>
   </bannerLeft>
   <skin>
       <groupId>org.apache.maven.skins</groupId>
       <artifactId>maven-fluido-skin</artifactId>
       <version>1.2.1</version>
   </skin>
   <custom>
       <fluidoSkin>
           <sideBarEnabled>true</sideBarEnabled>
           <googlePlusOne />
       </fluidoSkin>
   </custom>
   <body>
       <links>
           <item name="Labs @ Ritho" href="http://labs.ritho.com/"/>
       </links>
       <menu name="Sources and Libraries">
           <item name="Download Directory" href="../../opensource/com/ritho/hbase/util//rithohbaseutility/" />
           <item name="Maven Descriptor" href="../../opensource/com/ritho/hbase/util/rithohbaseutility/maven-metadata.xml" />
       </menu>
       <menu ref="reports"/>
   </body>
</project>


site.apt
 <<Ritho HBase Utility:>>
 
 Ritho HBase Utility library is a utility for interacting with HBase versions 1.0.+ .

<<Benefits:>>
 This utility simplifies back-end calls to only a few lines of code reducing  developer mistakes. Some level of connection management \
 can be handled for you but the most significant advantage is simplified syntax.
 
<<Usage:>>
 This testcase outlines the most common way to use this utility. Each testcase is self explanatory. You would need to set up \
 configuration for each testcases:

----
package test
class test{
    //some code ...
}
----

Maven's site generation consumes this APT source to produce decorated HTML encapsulating your text.

ii. Distribution management and remote connection
This setting points to a remote or local FTP site where you can submit your product. Here you can specify the site and repository locations.

    <distributionManagement>
        <site>
            <id>ritho-ftp-repository</id>
            <url>ftp://ritho.com/downloads/site</url>
        </site>
        <repository>
            <id>ritho-ftp-repository</id>
            <url>ftp://ritho.com/downloads/opensource</url>
        </repository>
    </distributionManagement>

Distribution management requires an extension. Here is an FTP wagon example:
    <build>
        <extensions>
            <!-- Enabling the use of FTP -->
            <extension>
                <groupId>org.apache.maven.wagon</groupId>
                <artifactId>wagon-ftp</artifactId>
                <version>1.0-beta-6</version>
            </extension>
        </extensions>
        ..
        ..

The assumption is that your remote FTP server requires authentication. Note that distribution management identifies the target remote repository (ritho-ftp-repository). settings.xml contains a username and password for this repository; the server id there must match the repository id. settings.xml resides in the local/execution environment at ~/.m2/settings.xml
<settings>
 ...
 <servers>
   <server>
     <id>ritho-ftp-repository</id>
     <username>username</username>
     <password>password</password>
   </server>
 </servers>
 ...
</settings>
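
Since settings.xml holds the password in plain text, it may be worth restricting the file to your own user. A small sketch; secure_settings is a hypothetical helper, and its default path assumes the usual per-user Maven directory.

```shell
# Hypothetical helper: make a Maven settings file readable by its owner only.
secure_settings() {
  file=${1:-$HOME/.m2/settings.xml}
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  chmod 600 "$file"
}

# Usage:
#   secure_settings            # defaults to ~/.m2/settings.xml
```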
