How to Work with Pandas in Python

When we talk about data science, we usually mean analyzing data through summarization, visualizations, sophisticated algorithms that learn patterns in data (machine learning), and other fancy tools. When we discuss the term with software developers, we also hear a lot about Python, the popular programming language.
But why is Python so popular and special in the data science world? There are many reasons, and an important one is the Python ecosystem of libraries that makes data science feel natural in Python.
One of these libraries is pandas, which every data scientist in the world uses, has used, or has at least heard of (if you are a data scientist who has never used pandas, scream in the comments).
Pandas is an essential part of the ecosystem: many other data science tools are built on top of it or provide specific functionality for pandas.
This guide introduces pandas for developers and aims to cover the what, why, and how of pandas’ most commonly used features.
Before we get started, if you want to access the full source code for this project to follow along, you can download the project’s source code from GitHub.
What Is Pandas For?
In the past couple of years, pandas has established itself as one of the best libraries for data analysis. Such is its importance and power that one of the first things data scientists do when receiving a data set is to load it into pandas to begin understanding it.
Pandas provides everything they need for cleaning, transforming, and analyzing data. It offers so many features for analysts that entire books have been written about them.
Here are some of the things you can do with pandas:
- Describe: get information about the data set, calculate statistical values, and answer immediate questions like averages, medians, min, max, correlations, distribution, and more.
- Clean: remove duplicates, replace empty values, and filter rows and columns.
- Transform: calculate new values, rename columns, and otherwise mutate your data.
- Visualize: build state-of-the-art visualizations with matplotlib, seaborn, or others directly from your pandas dataset.
- Store: load and save data from and to files like CSV and JSON, or connect directly to databases.
Install and Import
Pandas is an external library, and as such, requires installation in your project. The library name is pandas, and you can install it using your favorite package manager for Python. In my case, I’m using pipenv as I’m not a big fan of conda, but the process is much the same either way.
If you checked out my code from GitHub, simply run the following command, as all dependencies are listed in the Pipfile:
pipenv install
If you want to follow along with this tutorial from a new project, you can simply install pandas by running:
pipenv install pandas
And I highly recommend using a Jupyter notebook, as I did, for experimenting and testing on the data, so you will have to install that as well by running:
pipenv install notebook
If you need more information about how to use Jupyter notebooks, you can check this guide.
The last thing before starting is to import the library.
import pandas as pd
It’s not mandatory to provide an alias, but it’s very common to use pd; it keeps the code short, and it’s how everybody does it, so let’s follow along.
Basics of Pandas: Series and DataFrames
Pandas has two basic building blocks for all its functionality: Series and DataFrames. In simple terms, a Series is a single column, and a DataFrame is a multidimensional dataset made up of multiple Series.

Series and DataFrames
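If you want to get a feel for these two objects before loading any files, here is a minimal sketch that builds them by hand (the column names and values are invented purely for illustration):
import pandas as pd
# A Series is a single labeled column of values
ages = pd.Series([22, 38, 26], name="Age")
# A DataFrame is a collection of Series sharing the same row index
people = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [22, 38, 26]})
people.head()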
Reading Data
Though you can manually create DataFrames in pandas, what makes it so easy to use is its ability to load data directly from multiple sources such as CSV, JSON, or databases.
From a CSV file
Reading CSV files is as simple as one line of code with pandas.
df_titanic = pd.read_csv("titanic.csv")
df_titanic.head()
| | PassengerId | Survived | Pclass | … | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | … | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | … | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | … | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | … | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | … | 8.0500 | NaN | S |
Importing a CSV file is as simple as calling the read_csv function with the file name. The second line of code calls the head function, which prints the first five rows of data. More on that later.
From a JSON file
Reading JSON files is no more complicated than reading a CSV; it’s a matter of a simple function call:
df_json = pd.read_json("sample.json")
df_json.head()
| | Product | Price |
---|---|---|
0 | Desktop Computer | 900 |
1 | Tablet | 360 |
2 | iPhone | 700 |
3 | Laptop | 1460 |
From a SQL database
Reading data from a SQL database requires some additional steps compared to reading CSV or JSON files. This is because pandas doesn’t support databases natively but instead relies on third-party libraries to establish the connection.
Once the connection to the database is ready, you can work directly with pandas using the read_sql_query function.
For our scenario, we will read data from SQLite, but other database engines are supported as well, such as MySQL, Postgres, etc.
Note: If you are running this code on your own and not using the tutorial’s provided project, make sure you have pysqlite3 installed before continuing.
As discussed, first, we need to establish a connection to the database.
import sqlite3
con = sqlite3.connect("chinook.db")
Then, we can run SQL queries directly from pandas to retrieve information. If you need help with SQL, I can recommend the complete SQL bootcamp course on Udemy.
In our case, we will just fetch data from one table.
df_sql = pd.read_sql_query("SELECT * FROM albums", con)
df_sql.head()
| | AlbumId | Title | ArtistId |
---|---|---|---|
0 | 1 | For Those About To Rock We Salute You | 1 |
1 | 2 | Balls to the Wall | 2 |
2 | 3 | Restless and Wild | 2 |
3 | 4 | Let There Be Rock | 1 |
4 | 5 | Big Ones | 3 |
Customization
So far, we have read data using the default settings, which work great for most cases. However, there may be circumstances in which you need to change how the data is read and parsed. Pandas’ reading functions are comprehensive and offer tons of customization options. You can read about them in the official docs.
One important and highly used parameter is index_col, which allows you to specify the column or columns used as the index of the DataFrame. This is very useful when slicing or selecting data, as we will do later in the guide.
df_drinks = pd.read_csv("data/drinks.csv", index_col="country")
df_drinks.head()
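read_csv accepts many more options than index_col. As a quick sketch (applied to the same drinks file, though which options you need depends entirely on your data), you can load only certain columns or limit how many rows are read:
df_drinks_partial = pd.read_csv(
    "data/drinks.csv",
    index_col="country",                   # index by country, as above
    usecols=["country", "beer_servings"],  # load only these two columns
    nrows=50,                              # read just the first 50 rows
)
df_drinks_partial.head()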
Saving Data in Pandas
Sometimes we make changes to our datasets to facilitate the analysis. However, those changes are lost unless we save them. Pandas provides simple methods to store a DataFrame in a file or database.
df.to_csv('saved.csv')
df.to_json('saved.json')
df.to_sql('saved', con)
Note that we can save the data in any of these formats regardless of the format we originally loaded it from.
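The writer functions also take options. For example, here is a small sketch that skips the numeric index when writing a CSV and appends to an existing database table (the file and table names here are hypothetical):
df_titanic.to_csv('titanic_copy.csv', index=False)  # don't write the row index column
df_sql.to_sql('albums_copy', con, if_exists='append', index=False)  # append instead of failing if the table exists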
Describing Your Data
When you first load your data set, you want to understand its structure and start making sense of the data. In this section, we will cover the basic pandas functions that help you do exactly that.
The first function we will cover is head, which we already introduced. It prints the first n rows of data and is handy for exploring the columns and types of a dataset for the first time.
You can run this function with no arguments, in which case it will default to 5 rows:
df_titanic.head()
Or you can pass the number of rows you would like to see:
df_titanic.head(10)
| | PassengerId | Survived | Pclass | … | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | … | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | … | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | … | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | … | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | … | 8.0500 | NaN | S |
5 | 6 | 0 | 3 | … | 8.4583 | NaN | Q |
6 | 7 | 0 | 1 | … | 51.8625 | E46 | S |
7 | 8 | 0 | 3 | … | 21.0750 | NaN | S |
8 | 9 | 1 | 3 | … | 11.1333 | NaN | S |
9 | 10 | 1 | 2 | … | 30.0708 | NaN | C |
If you want to see the very last rows, you can use the tail function. As with head, you can specify how many rows you would like to see:
df_titanic.tail()
| | PassengerId | Survived | Pclass | … | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|
886 | 887 | 0 | 2 | … | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | … | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | … | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | … | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | … | 7.7500 | NaN | Q |
Another interesting function you will probably need right after loading your data is info, which provides essential information about your dataset, such as the number of rows and columns, the number of non-null values, data types, and memory usage.
df_titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
If you just want to know how many rows and columns your data has, you can check the shape attribute of the DataFrame:
df_titanic.shape
(note that shape is an attribute and not a function)
(891, 12)
If you are looking to understand your dataset’s values better, the describe function will provide a statistical summary of each column, with information like count, mean, standard deviation, etc.
df_titanic.describe()
| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
Manipulating DataFrames
DataFrames expose hundreds of methods for working on data sets, way too many to know them all. However, some operations come up in nearly every data analysis task, and those are the methods we will review here, as they are crucial for every data scientist to understand.
Removing duplicates
Duplicates can distort our results, so it’s essential to deal with them. Pandas offers utilities to handle duplicates, like drop_duplicates, which automatically removes all duplicate rows from the DataFrame.
df_movies = pd.read_csv('movies.csv')
df_movies.shape
df_movies = df_movies.drop_duplicates()
df_movies.shape
Note that for our example, we loaded a new file that contains two duplicates, and here are the results:
(77, 8)
(75, 8)
Keep in mind that drop_duplicates won’t affect the original DataFrame (by default); instead, it returns a new DataFrame with unique values. You can alternatively pass the argument inplace=True to modify the original instead.
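drop_duplicates also lets you control what counts as a duplicate. Here is a sketch (the Title column is hypothetical; adapt it to your dataset):
df_movies.drop_duplicates(subset=['Title'], inplace=True)  # rows with the same Title count as duplicates
df_unique = df_movies.drop_duplicates(keep='last')  # keep the last occurrence instead of the first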
Renaming columns
There are several ways to rename DataFrame columns in pandas, but one of the easiest and most useful is the rename function:
df_tmp = df_titanic.rename(columns = {"PassengerId" : "PassId"})
df_tmp.head()
| | PassId | Survived | Pclass | … | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | … | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | … | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | … | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | … | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | … | 8.0500 | NaN | S |
Like drop_duplicates, rename returns a new DataFrame with the updates. rename also supports renaming more than one column at a time:
df_tmp = df_titanic.rename(columns = {"PassengerId" : "PassId", "Pclass": "PassClass"})
df_tmp.head()
| | PassId | Survived | PassClass | … | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | … | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | … | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | … | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | … | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | … | 8.0500 | NaN | S |
Working with missing values
Missing data can be a big problem in real-life scenarios. Fortunately, pandas is designed to detect missing or NA (Not Available) values. To detect the null values, we use the isnull() function:
df_titanic.isnull()
| | PassengerId | Survived | Pclass | … | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|
0 | False | False | False | … | False | False | True | False |
1 | False | False | False | … | False | False | False | False |
2 | False | False | False | … | False | False | True | False |
3 | False | False | False | … | False | False | False | False |
4 | False | False | False | … | False | False | True | False |
.. | … | … | … | … | … | … | … | … |
886 | False | False | False | … | False | False | True | False |
887 | False | False | False | … | False | False | False | False |
888 | False | False | False | … | False | False | True | False |
889 | False | False | False | … | False | False | False | False |
890 | False | False | False | … | False | False | True | False |
The isnull function returns a new DataFrame with the same shape as the original, filled with boolean values (True or False) indicating whether each cell is null. This can be very helpful for performing calculations or replacing empty values.
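For example, because True is counted as 1, chaining isnull with sum gives the number of missing values per column, a quick sanity check (a sketch on the same Titanic data; per the info() output above, Age, Cabin, and Embarked are the columns with missing values):
df_titanic.isnull().sum()  # number of missing values in each column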
Sometimes we can’t work with missing values, so it’s best to simply remove the entire rows. You can achieve that with the dropna function.
df_titanic.shape
tmp = df_titanic.dropna()
tmp.shape
Which results in:
(891, 12)
(183, 12)
Judging by the results, most of our records contain at least one missing value. It wouldn’t be wise in this scenario to simply drop all these rows from the DataFrame.
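A gentler alternative, sketched below rather than taken from the project, is to drop rows only when specific columns are missing, or to fill the gaps instead of dropping anything:
tmp_drop = df_titanic.dropna(subset=['Age'])  # drop rows only when Age is missing
tmp_fill = df_titanic.copy()
tmp_fill['Age'] = tmp_fill['Age'].fillna(tmp_fill['Age'].mean())  # keep all rows, fill missing ages with the column mean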
Slicing, selecting, extracting data
So far, we have been working with entire datasets or performing basic filtering like removing empty rows. But how can we intentionally select data from the DataFrames?
There are two ways to access or slice a pandas DataFrame: by columns or by rows.
By column
Extracting data by column from a DataFrame is super easy. You simply use [] with the Series (column) name as follows:
ds_survived = df_titanic['Survived']
ds_survived
type(ds_survived)
In which case it would output:
0 0
1 1
2 1
3 1
4 0
..
886 0
887 1
888 0
889 1
890 0
Name: Survived, Length: 891, dtype: int64
pandas.core.series.Series
Note that the result is a Series and not a DataFrame. Because some methods and properties differ between the two, we sometimes want a DataFrame instead of a Series, and we can get one by using the [[]] notation:
df_survived = df_titanic[['Survived']]
type(df_survived)
Which outputs:
pandas.core.frame.DataFrame
By row
When selecting data by row, we have more options than by column. We can access rows by index or search the dataset by querying or applying conditionals.
We will start with the loc property, which allows access to a group of rows and columns by label(s) or a boolean array.
Note that to use loc with labels, your DataFrame needs to be indexed, as we did for the drinks DataFrame above.
df_drinks.loc['Argentina']
Additionally, you need to consider that looking up by index requires an exact match and is case sensitive, so in our case Argentina is correct, but Arg or argentina will raise an exception.
The output for the code is:
beer_servings 193
spirit_servings 25
wine_servings 221
total_litres_of_pure_alcohol 8.3
continent SA
Name: Argentina, dtype: object
Interestingly enough, you can also access multiple rows at the same time using the same slicing techniques we use for lists, for example:
df_drinks.loc['Argentina': 'Austria']
This will return a new DataFrame with all the rows between Argentina and Austria, inclusive.
country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|
Argentina | 193 | 25 | 221 | 8.3 | SA |
Armenia | 21 | 179 | 11 | 3.8 | EU |
Australia | 261 | 72 | 212 | 10.4 | OC |
Austria | 279 | 75 | 191 | 9.7 | EU |
How do you filter data without labels? You can use positional indices with the iloc property, which is equivalent to slicing a Python list.
df_drinks.iloc[6:10]
This will produce the same result as above, selecting from index 6 (Argentina) up to index 10-1 (Austria), since the end of the range is exclusive.
It is also possible to specify a step, the same way as we do with lists.
df_drinks.iloc[6:10:2]
Sometimes you need to show only the rows that meet some condition, like displaying the countries with more than 320 beer servings. For that, you can use conditional selections.
Here is that example in code:
df_drinks.loc[df_drinks['beer_servings'] > 320]
Let’s break it down. df_drinks['beer_servings'] > 320 is a special conditional, as it applies to an entire Series. The result is a new boolean Series representing the outcome of the conditional for each row in the original DataFrame.
Then, by indexing df_drinks[] with that boolean Series, we get back a new DataFrame where all the rows whose value in the conditional Series is False have been filtered out.
I wasn’t too surprised about the results, except for the case of Venezuela and Namibia.
country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
---|---|---|---|---|---|
Czech Republic | 361 | 170 | 134 | 11.8 | EU |
Gabon | 347 | 98 | 59 | 8.9 | AF |
Germany | 346 | 117 | 175 | 11.3 | EU |
Lithuania | 343 | 244 | 56 | 12.9 | EU |
Namibia | 376 | 3 | 1 | 6.8 | AF |
Poland | 343 | 215 | 56 | 10.9 | EU |
Venezuela | 333 | 100 | 3 | 7.7 | SA |
Note: in some other tutorials you may find the same code but leaving .loc aside, e.g.:
df_drinks[df_drinks['beer_servings'] > 320]
And though in this case that code works, there is a corner case in which it won’t, so it’s better to be explicit than implicit and use .loc. You can read more about this on this StackOverflow thread.
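You can also combine several conditions by chaining boolean Series with & (and) and | (or), wrapping each condition in parentheses. A quick sketch using the same drinks columns:
df_drinks.loc[(df_drinks['beer_servings'] > 320) & (df_drinks['continent'] == 'EU')]  # heavy beer drinkers in Europe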
Statistics
Statistics and math are among the most important skills a data scientist should have in their toolbelt. Fortunately, pandas makes it easy to perform statistical calculations on our dataset.
Aggregate Statistics
This section will cover how to perform some statistical calculations on the data set using functions such as mean, groupby, and count.
Let’s start with the mean() function, which returns the mean of each numeric column:
df_drinks.mean()
-------------------------
Output
-------------------------
beer_servings 106.160622
spirit_servings 80.994819
wine_servings 49.450777
total_litres_of_pure_alcohol 4.717098
dtype: float64
The mean() function returned the mean values of only the numerical columns, which makes sense since you can’t compute statistics on text.
We can also perform this calculation at the Series level, in which case we get back a single number.
df_drinks["beer_servings"].mean()
-------------------------
Output
-------------------------
106.16062176165804
Let’s say you want to calculate the mean of a column for each group in your data; for that, you have to use the pandas groupby() function:
df_drinks.groupby("continent")["beer_servings"].mean()
-------------------------
Output
-------------------------
continent
AF 61.471698
AS 37.045455
EU 193.777778
OC 89.687500
SA 175.083333
Name: beer_servings, dtype: float64
By using groupby we can perform calculations per group of data, for example, the mean of beer servings for each continent.
As with mean, you can calculate the median, min, max, sum, and much more.
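For instance, here is a small sketch that computes several of those statistics at once with agg, using the same grouping as before:
df_drinks.groupby('continent')['beer_servings'].agg(['mean', 'min', 'max'])  # three statistics per continent in one call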
Another popular operation is counting the number of rows/columns, which can be done using count.
df_drinks.count(axis = 0)
-------------------------
Output
-------------------------
beer_servings 193
spirit_servings 193
wine_servings 193
total_litres_of_pure_alcohol 193
continent 170
dtype: int64
By specifying axis=0 (the default value), we count the number of non-null values in each column. However, we can also invert the axis (axis=1) and count the number of non-null values in each row.
df_drinks.count(axis = 1)
-------------------------
Output
-------------------------
country
Afghanistan 5
Albania 5
Algeria 5
Andorra 5
Angola 5
..
Venezuela 5
Vietnam 5
Yemen 5
Zambia 5
Zimbabwe 5
Length: 193, dtype: int64
Common functions
This section will discuss some of the most common statistical functions in pandas, a must-have in any data scientist’s arsenal.
Let’s start with the cov function, which calculates the pairwise covariance of the columns. See the code below:
df_drinks.cov()
| | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
---|---|---|---|---|
beer_servings | 10229.927191 | 4096.953962 | 4249.458468 | 318.992031 |
spirit_servings | 4096.953962 | 7794.119765 | 1370.601306 | 218.184985 |
wine_servings | 4249.458468 | 1370.601306 | 6351.707200 | 200.762044 |
total_litres_of_pure_alcohol | 318.992031 | 218.184985 | 200.762044 | 14.237779 |
The cov function calculated the pairwise covariance values of all the numeric columns.
Another commonly used pandas function is pct_change, which calculates the percentage change of each value compared to the previous element.
ind = pd.date_range('01/01/2020', periods = 7, freq ='W')
df_tmp = pd.DataFrame({"A":[4, 14, 3, 6, 2, 55, 33],
"B":[5, 2, 54, 34, 2, 32, 56],
"C":[20, 20, 17, 31, 8, 5, 3],
"D":[14, 3, 6, 2, 3, 4, 12]}, index = ind)
df_tmp.pct_change()
| | A | B | C | D |
---|---|---|---|---|
2020-01-05 | NaN | NaN | NaN | NaN |
2020-01-12 | 2.500000 | -0.600000 | 0.000000 | -0.785714 |
2020-01-19 | -0.785714 | 26.000000 | -0.150000 | 1.000000 |
2020-01-26 | 1.000000 | -0.370370 | 0.823529 | -0.666667 |
2020-02-02 | -0.666667 | -0.941176 | -0.741935 | 0.500000 |
2020-02-09 | 26.500000 | 15.000000 | -0.375000 | 0.333333 |
2020-02-16 | -0.400000 | 0.750000 | -0.400000 | 2.000000 |
First, we created a date index using the date_range() function and a DataFrame containing arbitrary values for demonstration purposes, and finally we applied the percentage change function to that data.
Let’s now see how to calculate the relationship between two values, known as the correlation. The function for this is corr(), and it calculates the relationship between all the columns in your DataFrame, excluding null values, not-available values, and non-numeric data such as text.
df_tmp = pd.DataFrame({"A":[6, 8, 3, 4],
"B":[51, 2, 6, 3],
"C":[7, 2, 9, 5],
"D":[5, 4, 4, 22]})
df_tmp.corr()
| | A | B | C | D |
---|---|---|---|---|
A | 1.000000 | 0.167894 | -0.792908 | -0.369624 |
B | 0.167894 | 1.000000 | 0.345789 | -0.300948 |
C | -0.792908 | 0.345789 | 1.000000 | -0.154586 |
D | -0.369624 | -0.300948 | -0.154586 | 1.000000 |
We’ve again used a small DataFrame of arbitrary values to calculate the correlation between the columns.
Ranking your data based on a specific criterion, like ranking the movies dataset by revenue, is another very common operation when working with datasets. The example below uses the pandas rank function:
df_movies['revenue_rank'] = df_movies["Worldwide Gross"].rank()
df_movies.head()
We’ve created a new column called revenue_rank with the ranking assigned by the rank function based on the Worldwide Gross column.
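By default, rank assigns 1 to the smallest value, so the highest-grossing movie ends up with the largest rank. If you prefer the opposite order, here is a sketch building on the same column:
df_movies['revenue_rank'] = df_movies['Worldwide Gross'].rank(ascending=False)  # rank 1 = highest gross
df_movies.sort_values('revenue_rank').head()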
Plotting
Pandas can’t plot by itself, but it provides helper methods that use plotting engines such as matplotlib or seaborn to do the work.
For our guide, we will use matplotlib. If you are running the code on your own, make sure that you have the library installed. If you are using the code from GitHub, it is already a project dependency.
Scatter plot
A scatter plot is a diagram that displays points based on two variables. Luckily, pandas makes it easy to create one with a single call to the plot() function:
colors = df_titanic["Survived"].replace(1, 'green').replace(0, 'red')
df_titanic.plot(x='Fare', y="Age", c=colors, colormap="Reds", kind="scatter")

Sample scatter plot
The chart shows the survival status spread by age and fare, painting in red those who did not survive and the rest in green. To set the colors, it is necessary to generate a color mapping depending on the survival status; a slightly cleaner way of doing this is sketched below.
The plot function does all the heavy lifting of rendering the chart.
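As a side note, one arguably cleaner way to build that color Series (a sketch, not the approach used above) is to map the survival flag directly to a color name:
colors = df_titanic['Survived'].map({0: 'red', 1: 'green'})  # 0 → red, 1 → green
df_titanic.plot(x='Fare', y='Age', c=colors, kind='scatter')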
Line plot
A line chart is a series of values connected by a line, and it is one of the most widely used chart types. Pandas can display this kind of plot very easily too. See the following code:
df_titanic.loc[df_titanic["Survived"] == 1].groupby("Age")['Survived'].count().plot()

Sample line plot
A line plot showing the number of survivors by age.
Bar plot
The bar plot is one of the most common charts in the data visualization field and essentially represents categorical data with rectangular bars. Pandas offers this kind of chart as well. See the following code:
df_titanic["Survived"].value_counts().plot(kind="bar")

Sample bar plot
You can also build histograms.
df_titanic.hist(bins=10,figsize=(9,7),grid=False)

Sample histogram plot
Or much more complex charts. Here is an interesting article on building charts for this particular dataset.
Conclusion
Data scientists first need to explore, clean, and transform their data before moving on to visualization. Pandas makes it easy to perform these actions with just a few simple commands, and it can even be used to plot the data instead of calling other libraries such as matplotlib and seaborn directly. Still, you may find yourself needing to learn these visualization libraries for more complex charts.
You can learn more about how to use pandas, plotting, and much more in the Data Science Bootcamp on Udemy.
Thanks for reading!
If you liked what you saw, please support my work!

Juan Cruz Martinez
Juan has made it his mission to help aspiring developers unlock their full potential. With over two decades of hands-on programming experience, he understands the challenges and rewards of learning to code. By providing accessible and engaging educational content, Juan has cultivated a community of learners who share their passion for coding. Leveraging his expertise and empathetic teaching approach, Juan has successfully guided countless students on their journey to becoming skilled developers, transforming lives through the power of technology.