Bar Plot Using Python Matplotlib
In my previous blog post I explained how to make line plots using Matplotlib. In this post, I cover bar plots using Python Matplotlib. I am using a sample HR analytics dataset for demonstration; you can download it from Kaggle (https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset#WA_Fn-UseC_-HR-Employee-Attrition.csv).
First, let's import all the libraries we will use.
```python
import pandas as pd
import numpy as np
from pandas import DataFrame as df
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb
```
To draw a bar chart, use the function matplotlib.pyplot.bar(). You have to pass at least two arguments: the first gives the positions (or category names) along the x-axis, and the second gives the bar heights along the y-axis. You can also pass optional arguments such as color, which sets the color of the bar faces; it can be a single color string or a list of strings, one per bar. Call pyplot.grid() if you want visible grid lines, and pyplot.title() to show a title on your graph. Finally, call pyplot.show() to display the graph. Make sure pyplot.show() comes last: customizations you make after it will not appear in the graph it has already displayed.
```python
plt.bar(["Juicer", "Blender", "Toaster", "Oven", "Speaker"], [12, 21, 53, 35, 81],
        color=['Red', 'Blue', 'Green', 'DodgerBlue', 'tomato'])  # one color per bar
plt.grid(True)
plt.title("Units Sold Last Month")
plt.show()
```
This will produce the following output:
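For comparison, here is a minimal sketch of the same chart using Matplotlib's object-oriented interface (plt.subplots() and Axes methods) instead of the pyplot state functions. It also shows that bar() returns a container holding one rectangle per bar, whose height is the value you passed. Selecting the non-interactive "Agg" backend is an assumption so the snippet runs without a display; drop that line and call plt.show() in a notebook or desktop session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs anywhere
import matplotlib.pyplot as plt

products = ["Juicer", "Blender", "Toaster", "Oven", "Speaker"]
units = [12, 21, 53, 35, 81]
colors = ["red", "blue", "green", "dodgerblue", "tomato"]

fig, ax = plt.subplots()
# Axes.bar takes the same x and height arguments as pyplot.bar
bars = ax.bar(products, units, color=colors)
ax.grid(True)
ax.set_title("Units Sold Last Month")

# One Rectangle per category; its height equals the value passed in
heights = [float(rect.get_height()) for rect in bars]
print(heights)  # [12.0, 21.0, 53.0, 35.0, 81.0]
# In an interactive session, call plt.show() here to display the chart
```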
If you need horizontal bars, you can use barh() instead of bar().
```python
plt.barh(["Juicer", "Blender", "Toaster", "Oven", "Speaker"], [12, 21, 53, 35, 81],
         color=['Red', 'Blue', 'Green', 'DodgerBlue', 'Tomato'])
plt.grid(True)
plt.title("Units Sold Last Month")
plt.show()
```
The above code will generate something like this:
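With barh() the roles of the arguments swap: the first argument gives the category positions along the y-axis and the second gives the bar lengths, which Matplotlib calls width. The sketch below, again assuming the headless "Agg" backend, also sorts the products by units sold before plotting. Since barh() draws the first category at the bottom of the y-axis, sorting in ascending order puts the best seller at the top; the sorting step is an optional extra, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs anywhere
import matplotlib.pyplot as plt

products = ["Juicer", "Blender", "Toaster", "Oven", "Speaker"]
units = [12, 21, 53, 35, 81]

# Sort ascending by units so the longest bar ends up at the top,
# because barh() places the first category at the bottom of the y-axis
pairs = sorted(zip(units, products))
sorted_units = [u for u, _ in pairs]
sorted_products = [p for _, p in pairs]

fig, ax = plt.subplots()
# barh(y, width): y gives the category positions, width the bar lengths
bars = ax.barh(sorted_products, sorted_units, color="tomato")
ax.grid(True)
ax.set_title("Units Sold Last Month")

widths = [float(rect.get_width()) for rect in bars]
print(widths)  # [12.0, 21.0, 35.0, 53.0, 81.0]
```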
Now let's use our dataset to get some useful plots.
```python
data = pd.read_excel('hrData.xls')
data.head()
```
Below is a snapshot of the data. Although there are many columns, I have shown only a few.
To show employee strength in the various departments, first calculate the number of employees in each department, store the counts in a list, and finally plot them in a graph.
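To look at only a few columns yourself, select them by name before calling head(). The sketch below builds a small hypothetical DataFrame in place of hrData.xls: the EmpNumber, EmpDepartment and PerformanceRating column names come from this post, while the Age column, the department names and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for hrData.xls; the real file has many more rows and columns
data = pd.DataFrame({
    "EmpNumber": ["E1001", "E1002", "E1003", "E1004", "E1005", "E1006"],
    "EmpDepartment": ["Sales", "Sales", "Development", "Finance",
                      "Development", "Sales"],
    "PerformanceRating": [3, 4, 3, 2, 3, 3],
    "Age": [32, 47, 40, 41, 60, 27],  # hypothetical extra column
})

print(data.shape)          # (rows, columns)
print(list(data.columns))  # every column name

# Keep only the columns of interest, then preview the first rows
subset = data[["EmpNumber", "EmpDepartment", "PerformanceRating"]]
print(subset.head())
```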
```python
dplist = data.EmpDepartment.unique()   # department names
y_data = []                            # number of employees in each department
for a in dplist:
    y_data.append(len(data[data.EmpDepartment == a]))
x_data = np.arange(len(dplist))        # one bar position per department
plt.bar(x_data, y_data, color="green")
plt.xlabel('Department', fontsize=10)
plt.xticks(x_data, dplist, rotation=75, fontsize=10)  # label bars with department names
plt.title("Employee Strength in Various Departments")
plt.grid(True)
plt.show()
```
This will generate something like this:
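The counting loop can also be written as a single pandas call. In the sketch below (with a hypothetical one-column sample in place of the real data), value_counts() counts every department at once, and reindex(dplist) puts the counts in the same order that unique() returned the names, so the result matches y_data and can be plotted exactly as above.

```python
import pandas as pd

# Hypothetical sample standing in for the HR data (only the column we need)
data = pd.DataFrame({
    "EmpDepartment": ["Sales", "Development", "Sales", "Finance",
                      "Development", "Sales"],
})

# Loop version, as in the post
dplist = data.EmpDepartment.unique()
y_data = []
for a in dplist:
    y_data.append(len(data[data.EmpDepartment == a]))

# Vectorised version: count all departments in one call, then
# reorder the counts to follow the order returned by unique()
counts = data.EmpDepartment.value_counts().reindex(dplist)

print(y_data)           # [3, 2, 1]
print(counts.tolist())  # [3, 2, 1]
```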
You can also do the same with the plot() method of pandas DataFrame objects, combined with groupby(). The advantage is that pandas does the counting for you, so you don't have to write the loop yourself.
```python
# Count employees per department and plot the counts as bars
ax = (pd.DataFrame(data.groupby('EmpDepartment').EmpNumber.count())).plot(
    kind='bar', grid=True, legend=False, figsize=[10, 5], color='#339933',
    title='Employee Strength in Various Departments')
ax.set_ylabel('Number of Employees')
plt.show()
```
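Selecting one column after groupby() and calling count() already returns a pandas Series indexed by department, and a Series has the same plot() method, so wrapping it in pd.DataFrame is optional. A minimal sketch with hypothetical sample data and the headless "Agg" backend (both assumptions for running it as a script):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data using the post's column names
data = pd.DataFrame({
    "EmpNumber": ["E1", "E2", "E3", "E4", "E5", "E6"],
    "EmpDepartment": ["Sales", "Development", "Sales", "Finance",
                      "Development", "Sales"],
})

# A Series indexed by department (groupby sorts the keys alphabetically)
counts = data.groupby("EmpDepartment").EmpNumber.count()
ax = counts.plot(kind="bar", grid=True, figsize=(10, 5), color="#339933",
                 title="Employee Strength in Various Departments")
ax.set_ylabel("Number of Employees")

print(counts.to_dict())  # {'Development': 2, 'Finance': 1, 'Sales': 3}
```

Note that count() skips missing EmpNumber values; if you want to count rows regardless of missing values, groupby('EmpDepartment').size() does that.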
Let's see another example, where I group by two columns to see the number of employees with each performance rating in each department.
```python
# One bar per (department, rating) pair
ax = (pd.DataFrame(data.groupby(['EmpDepartment', 'PerformanceRating']).EmpNumber.count())).plot(
    kind='bar', grid=True, legend=False, figsize=[10, 5], color='#339933',
    title='Number of Employees Rated 2,3,4 Across Departments')
ax.set_ylabel('Number of Employees')
plt.show()
```
This will produce the following plot:
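In that plot every (department, rating) pair is a separate bar on one long axis. An alternative layout, sketched here with hypothetical sample data and the headless "Agg" backend, is to unstack() the ratings into columns: pandas then draws one group of bars per department, with one colored bar and legend entry per rating. fill_value=0 turns combinations that do not occur into zero counts.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data using the post's column names
data = pd.DataFrame({
    "EmpNumber": ["E1", "E2", "E3", "E4", "E5", "E6", "E7"],
    "EmpDepartment": ["Sales", "Sales", "Development", "Finance",
                      "Development", "Sales", "Finance"],
    "PerformanceRating": [3, 4, 3, 2, 3, 3, 3],
})

# Count per (department, rating), then move ratings from the index into columns
table = (data.groupby(["EmpDepartment", "PerformanceRating"]).EmpNumber
             .count()
             .unstack(fill_value=0))

# Rows become groups on the x-axis; each rating column becomes one bar per group
ax = table.plot(kind="bar", grid=True, figsize=(10, 5),
                title="Number of Employees by Rating Across Departments")
ax.set_ylabel("Number of Employees")
ax.legend(title="PerformanceRating")
```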