Wednesday 31 December 2014

Taking Backup of MySQL database

MySQL is a very popular and widely used open source database, and most software engineers have used it at least once. Keeping a backup of a database, or migrating tables and databases from one database server to another, is a very important task. In MySQL this task can be performed in a very simple way.
MySQL provides a command-line utility called mysqldump to perform this operation.
For more details about mysqldump, type the following command.
>mysqldump --help

For taking the backup of a complete database, type the following command.
>mysqldump --databases [DB1 Name] ....[DBn Name] -u <username> -h <hostname> -p

This will print the entire database dump to the console. To store the output in a file, redirect it as follows.
>mysqldump --databases [DB1 Name] ....[DBn Name] -u <username> -h <hostname> -p > dump.sql

Here the contents will be written to a local file named dump.sql. A single > is used so that an existing dump.sql is overwritten rather than appended to.

If you want to back up all the contents of a MySQL server, you can back up all databases at once.
The command to perform this operation is given below.
>mysqldump --all-databases -u <username> -h <hostname> -p > dump.sql

In the above commands, <username> is the MySQL username and <hostname> is the hostname or IP address of the machine where the MySQL server is running. The -p option makes mysqldump prompt for the password.
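If the backup has to be scripted, the same commands can be assembled from Python. The following is only a minimal sketch: build_mysqldump_command and run_backup are hypothetical helper names, and it uses mysqldump's --result-file option instead of shell redirection to write the dump file.

```python
import subprocess

def build_mysqldump_command(user, host, output_file, databases=None):
    """Build the mysqldump argument list.

    With databases=None the whole server is dumped (--all-databases).
    """
    command = ["mysqldump", "-u", user, "-h", host, "-p",
               "--result-file=" + output_file]
    if databases:
        # Every name after --databases is treated as a database name
        command += ["--databases"] + list(databases)
    else:
        command.append("--all-databases")
    return command

def run_backup(user, host, output_file, databases=None):
    """Run mysqldump; because of -p it prompts for the password."""
    subprocess.check_call(
        build_mysqldump_command(user, host, output_file, databases))
```

For example, run_backup("root", "localhost", "dump.sql", ["db1", "db2"]) would dump the two named databases, assuming mysqldump is installed and available on the PATH.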

Now your database is backed up. The next step is to load this dump file into another MySQL instance.

This is a very simple task.
  • Copy the dump file to the machine where you are planning to perform the following operations
  • Login to mysql instance with valid credentials
  • In the mysql prompt, type the following command.
mysql> source <path to the dump file>

This will load the contents of the dump file into the new MySQL instance.
If you want to know more about the dump file, open it in a text editor and read the contents. It contains the DDL and DML statements that recreate your databases and tables.

Changing the Hostname of Redhat and CentOS machines

Sometimes we need to change the hostname of a machine. A hostname is simply the name assigned to a machine, just like a person's name. Changing it requires root access.
There are several ways to change the hostname. One method is described below.

In Linux, almost every configuration is stored in a file.
The running hostname is exposed in the following file
/proc/sys/kernel/hostname

If we replace the value in this file with another name, the hostname will be changed.

So to change the hostname of a machine, write the desired hostname to the /proc/sys/kernel/hostname file as root. Note that a value written here lasts only until the next reboot.
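To confirm the change, the value can be read back from the same file. This is a minimal Python sketch; read_kernel_hostname is a hypothetical helper name, and the default path is the file mentioned above.

```python
def read_kernel_hostname(path="/proc/sys/kernel/hostname"):
    """Return the hostname stored in the given file, without the trailing newline."""
    with open(path) as handle:
        return handle.read().strip()
```

On a Linux machine, print(read_kernel_hostname()) should show the new name right after the file has been updated.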

Linux commands to get IP address and Hostname of a machine

Finding the IP address and hostname of a machine is a very common requirement.

The command to get the IP address of a Linux machine is
ifconfig

A sample output is shown below.
eth0      Link encap:Ethernet  HWaddr 0A:90:74:4A:62:42
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::890:74ff:fe4a:6242/64 Scope:Link
          RX packets:13139146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12665241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10433927585 (9.7 GiB)  TX bytes:16516355937 (15.3 GiB)

lo        Link encap:Local Loopback
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:40162017 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40162017 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6184024837 (5.7 GiB)  TX bytes:6184024837 (5.7 GiB)

The command to get the hostname of a Linux machine is
hostname

Python code to find the md5 checksum of a file

Checksum calculation is a very important step wherever we transfer files or data. The simplest way to verify that a file reached its destination intact is to compare the checksums of the source and target files. A checksum can be calculated in several ways. One is to calculate it over the entire file as a single block. Another is multipart checksum calculation, where we calculate the checksum of multiple small chunks of the file and then calculate an aggregated checksum from them.
Here I explain the simplest way of calculating the checksum of a file, using the hashlib library in Python.
Suppose I have a zip file located in the location /home/coder/ The checksum of the file can be calculated as follows.

import hashlib
file_name = '/home/coder/'
# Open in binary mode so that the raw bytes of the file are hashed
with open(file_name, 'rb') as f:
    checksum = hashlib.md5(f.read()).hexdigest()
print(checksum)

One common mistake I have seen is passing the file name directly without opening the file.
Eg: hashlib.md5(file_name).hexdigest()

In Python 2 this will also return a checksum, but it is the checksum of the file name string, not of the contents of the file (Python 3 raises a TypeError for a str argument). So always calculate the checksum from the file contents, as follows
checksum = hashlib.md5(open(file_name, 'rb').read()).hexdigest()

This will return the correct checksum.

In Linux, you can also calculate the MD5 checksum using the md5sum command-line utility.
> md5sum file_name
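Reading the whole file with read() keeps the entire contents in memory, which can be a problem for large files. A minimal sketch of an alternative is given below: it feeds the file to hashlib in fixed-size chunks. Note that this is not the multipart (aggregated) checksum mentioned earlier; it produces the same digest as hashing the file as a single block. The function name md5_of_file and the 1 MB default chunk size are my own choices.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned value should match what md5sum prints for the same file.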

Python Code Snippet to get the hostname and IP address of a machine

The following Python code snippet returns the IP address and hostname of a machine.

import socket
ip_address = socket.gethostbyname(socket.gethostname())
print(ip_address)
host_name = socket.getfqdn()
print(host_name)

The IP address may not be correct in all cases. If the /etc/hosts file of the Unix machine maps the hostname to a loopback address, that loopback address will be returned as the IP address. So to be safe, it is better to rely on the fully qualified hostname method.
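The caveat above can be made visible in code. The following is a minimal sketch, with host_info as a hypothetical helper name: it returns the fully qualified hostname, the resolved IP address, and a flag telling whether that address is a loopback one (starting with 127.). It also handles the case where the hostname cannot be resolved at all.

```python
import socket

def host_info():
    """Return (fully qualified hostname, IP address or None, is_loopback)."""
    host_name = socket.getfqdn()
    try:
        ip_address = socket.gethostbyname(socket.gethostname())
    except socket.gaierror:
        # The hostname could not be resolved to an address
        ip_address = None
    is_loopback = ip_address is not None and ip_address.startswith("127.")
    return host_name, ip_address, is_loopback
```

If is_loopback comes back as True, the IP address is most likely coming from an /etc/hosts entry and should not be trusted as the machine's network address.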

Saturday 29 November 2014

Python program to check whether a number is Odd or Even

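This is a very basic program in Python for checking whether a given number is odd or even.

__author__ = 'coder'

def OddEven(number):
    # Report whether the given number is zero, even or odd
    try:
        remainder = number % 2
        if number == 0:
            print("Number is Zero")
        elif remainder == 0:
            print("Number is Even")
        else:
            print("Number is Odd")
    except TypeError:
        # Raised when the input is not a number
        print("Error while processing")

if __name__ == '__main__':
    OddEven(7)  # sample value, replace with any number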

Wednesday 19 November 2014

Hadoop Interview Questions

1) What is the name of Hadoop's file system?
Ans: HDFS

2) What is the full form of HDFS?
Ans: Hadoop Distributed File System

3) What is the processing layer of Hadoop?
Ans: MapReduce

4) In which language is the Hadoop framework written?
Ans: Java

5) What is the licensing cost of Hadoop?
Ans: Hadoop is an open source technology, so it is free.

6) Who is known as the father of Hadoop?
Ans: Doug Cutting

7) How does Hadoop differ from other data processing technologies?
Ans: Hadoop is a framework that has a distributed storage layer as well as a distributed processing layer. The basic idea behind Hadoop is to bring the processing down to where the data is stored. Hadoop scales horizontally, so high-end server-grade hardware is not required; commodity hardware is enough.

8) Is Hadoop good for real-time processing?
Ans: Not directly. Hadoop is a batch processing framework, so on its own it can't be used for real-time processing. But it can work along with other technologies to produce real-time outputs.

9) Is Hadoop a replacement for RDBMS?
Ans: No. Hadoop is not suitable for processing small or medium amounts of data. Since Hadoop is a batch processing framework, it will not give fast responses. What Hadoop offers is the ability to keep working as the data grows: for data too large for other data processing technologies to handle, Hadoop performs well.

10) If Hadoop is open source and free, who maintains and enhances it?
Ans: Hadoop is an Apache project. People all over the world contribute to it and add enhancements. Many companies that use Hadoop also contribute to it.

11) Why did Hadoop become very popular?
Ans: Extracting hidden insights from data has become a very important activity in almost every organisation. The more data there is, the more reliable the insights tend to be. Nowadays the usage of the internet and social media is very high, so from that data alone we can analyse people to some extent. Similarly, we can analyse almost anything using historical data. This is one reason. Real-time monitoring and decision making have also become very important. This is another factor. If we go for a licensed tool or product, the licensing cost itself will be very high, whereas Hadoop is open source and free. Hadoop runs on commodity hardware, so the cost of the infrastructure is also low. All this made Hadoop a hot cake in the market.

12) What do you mean by a pseudo-distributed Hadoop cluster?
Ans: If all the Hadoop daemons are running on a single node, it is called pseudo-distributed mode. This is not used for production. It is only for development and learning purposes.

Creating Random File of any size in linux

Sometimes we need a random file of a specific size for testing, for example to measure file transfer performance. There are several ways to create such files.

Method 1
If you just want a file with some specific size, you can use the following command.

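dd if=/dev/urandom of=dummyfile.txt bs=1M count=2048

Here bs is the block size and count is the number of blocks, so 1 MB * 2048 = 2 GB of random data will be generated. If you make the count 4096, a 4 GB random file will be generated. A small block size is used because a single very large block (such as bs=2G) can be read short from /dev/urandom, giving a file smaller than expected.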
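The same kind of file can also be produced from Python, which is handy when the test itself is a Python script. This is a minimal sketch: create_random_file is a hypothetical helper name, and it writes the data in 1 MB chunks (my choice) so that memory usage stays small.

```python
import os

def create_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            # The last chunk may be smaller than chunk_size
            block = os.urandom(min(chunk_size, remaining))
            handle.write(block)
            remaining -= len(block)
    return path
```

For example, create_random_file("dummyfile.txt", 2 * 1024 ** 3) would create a 2 GB file.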

Method 2
If you are concerned about the schema, you can generate the data using a simple approach.
First create a file with a sample data set of 1 or 2 records. Let us call that file A.txt

Then using a simple shell script, we can make it very big.

Depending upon the size requirement, you can increase the value of limit to any number.
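The idea behind Method 2 can be sketched in Python as follows. This is only an illustration of the approach, not the shell script itself: grow_file is a hypothetical helper name, and limit is interpreted here as the number of times the sample records are repeated, which is an assumption.

```python
def grow_file(sample_path, output_path, limit):
    """Write the contents of sample_path to output_path `limit` times."""
    with open(sample_path, "rb") as sample:
        records = sample.read()
    # Make sure repeated copies do not run together on one line
    if records and not records.endswith(b"\n"):
        records += b"\n"
    with open(output_path, "wb") as output:
        for _ in range(limit):
            output.write(records)
    return output_path
```

For example, grow_file("A.txt", "big.txt", 1000000) repeats the sample records one million times, so the schema of every record stays the same.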

Method 3
Use any online or offline data generation tool. This is required only if you need random data with a specific schema and discrete values.
Some useful links are listed below


Downloading a file from linux command line

While working with Linux, we often need to download files. If the Linux machine has no GUI, most people download the file on Windows and then transfer it to the Linux environment.
This is not required if your Linux machine has internet access. Linux provides a lot of features; we only have to learn them and use them properly.
Get the proper download URL and execute the following command.

wget  <download url>
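If wget is not installed, or the download has to happen inside a script, a similar result can be obtained with Python's standard library. This is a minimal sketch; download is a hypothetical helper name, and the caller has to supply the destination file name.

```python
import shutil
import urllib.request

def download(url, destination):
    """Stream the resource at url into the local file destination."""
    with urllib.request.urlopen(url) as response, \
            open(destination, "wb") as out:
        # copyfileobj copies in blocks, so large files are not held in memory
        shutil.copyfileobj(response, out)
    return destination
```

It is called as download("<download url>", "<local file name>").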

Eg: If you want to download tomcat, the command will be as shown below.


Various File Editors in Linux

When we talk about Linux, most people think of a black command line. Working with that command line is not as difficult as most people think. The most basic operation on the Linux command line is creating or editing a file, which is needed for all configuration files. For this we require a file editor. Here are some popular file editors in Linux.

1) vi
2) vim
3) nano
4) gedit  (This is a desktop editor)
5) gvim
6) emacs

Setting Java home in Linux and Windows Environments

Java is a very popular programming language. Most people in the world use Java directly or indirectly. Very often we need to set the JAVA_HOME environment variable. This is a very basic activity, but I am posting it here because it may help someone. When I started my career, I also searched the internet for the same.

Setting up JAVA_HOME in LINUX environments

1) First we have to install Java. Java can be downloaded from the Oracle website. Based on your operating system architecture (32 or 64 bit) and requirement, download the proper Java installable.

2) If it is a tarball, extract it and keep the folder in the /opt directory
3) Go to the home folder of Java and type pwd. Suppose the result is /opt/jdk1.7.0
4) Open the following file and add the entries as below.

For Ubuntu:
open the /etc/bash.bashrc file or the ~/.bashrc file

For CentOS and Redhat:
open the /etc/bashrc file or the ~/.bashrc file

Add the following lines to the file

export JAVA_HOME=<path to java home directory>
export PATH=$JAVA_HOME/bin:$PATH

Save the file and exit.
Then reload the file you edited using the source command, for example
source ~/.bashrc

Setting up JAVA_HOME in Windows environments

1) Download the JDK from the Oracle website and install it on your machine
2) Click on the Start button
3) Right click on My Computer and select Properties
4) Click on Advanced system settings
5) Click on Environment Variables
6) Depending on the use, choose either System variables or User variables. The scope of user variables is limited to the particular user account, whereas system variables are accessible to all users.
7) Click on New and give the variable name as JAVA_HOME and the variable value as the complete path to the Java installation folder on your Windows machine.
      Eg: C:\Program Files\Java\jdk1.7.0_60
8) Edit the Path variable and append the following entry to the end.
      %JAVA_HOME%\bin
      Don't forget to put a semicolon delimiter between the existing values and the newly added value.

Now your JAVA_HOME is set.. :)
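A quick way to verify the setting on either operating system is a small Python check, run from a newly opened terminal. This is a minimal sketch; check_java_home is a hypothetical helper name, and it only checks that a java executable exists under the bin folder of JAVA_HOME.

```python
import os

def check_java_home(environ=None):
    """Return the path of the java executable under JAVA_HOME, or None."""
    environ = os.environ if environ is None else environ
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return None
    # The executable is java.exe on Windows and java elsewhere
    name = "java.exe" if os.name == "nt" else "java"
    candidate = os.path.join(java_home, "bin", name)
    return candidate if os.path.isfile(candidate) else None
```

If print(check_java_home()) prints None, either JAVA_HOME is not set in the current session or it does not point to a Java installation folder.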
