Wednesday 31 December 2014

Python code to find the md5 checksum of a file

Checksum calculation is an unavoidable and very important step in places where we transfer files/data. The simplest way to ensure whether a file reached the destination properly or not is by comparing the checksum of source and target files. Checksum can be calculated in several ways. One is by calculating the checksum by keeping the entire file as a single block. Another way is multipart checksum calculation, where we calculate the checksum of multiple small chunks in the file and finally calculating the aggregated checksum.
Here I am explaining about the calculation of checksum of a file using the simplest way. I am using the hashlib library in python for calculating the checksum.
Suppose I have a zip file located in the location /home/coder/data.zip. The checksum of the file can be calculated as follows.

import hashlib
file_name = ‘/home/coder/data.zip’
checksum = hashlib.md5(open(file_name).read()).hexdigest()
print checksum

One common mistake I have seen among people is passing the file name directly without opening the file
Eg: hashlib.md5(file_name).hexdigest()

This will also return a checksum. But it will be calculating the checksum of the file name, not the checksum calculated based on the contents of the file.  So always use the checksum calculation as follows

hashlib.md5(open(file_name).read()).hexdigest()

This will return the exact checksum. 

In linux, you can calculate the md5sum using a commandline utility also.
> md5sum file_name

No comments:

Post a Comment

How to check the memory utilization of cluster nodes in a Kubernetes Cluster ?

 The memory and CPU utilization of a Kubernetes cluster can be checked by using the following command. kubectl top nodes The above command...