Hands-on Pandas(8): Cleaning Data

In an ideal world, all the data you need is available in the right format and with complete content. In the real world, you will probably need to scrape data from many different, often incomplete sources. That’s why it’s important to learn how to clean your data before analyzing it or feeding it into an ML algorithm. Data cleaning might not be the...
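
As a taste of what cleaning looks like in practice, here is a minimal sketch using three standard pandas methods (`drop_duplicates`, `fillna`, `dropna`) on a small made-up DataFrame. The data and the choice of filling with the column mean are illustrative assumptions, not necessarily what the full article does.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicated row and missing temperatures
raw = pd.DataFrame({
    "city": ["Lima", "Quito", "Quito", "Bogota"],
    "temp": [18.5, np.nan, np.nan, 14.0],
})

# Remove exact duplicate rows (pandas treats NaN values as equal here)
clean = raw.drop_duplicates()

# Fill the remaining gaps in "temp" with the mean of the known values
clean = clean.fillna({"temp": clean["temp"].mean()})

# Alternative strategy: discard every row that has a missing value
dropped = raw.dropna()
```

Filling versus dropping is a judgment call: filling keeps rows but invents values, while dropping keeps only observed data at the cost of losing rows.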

Hands-on Pandas(7): Loading data from files

Data analysis usually starts by loading data into the structures of your library or tool of choice. Almost always, this data will come from a database, the web, or a collection of files. The files that contain your data can come in many different formats: comma-separated values in a text file, JSON files, Excel files, or files with values s...
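
A minimal sketch of loading two of those formats with `pd.read_csv` and `pd.read_json`. To keep it self-contained, it assumes in-memory text wrapped in `io.StringIO` instead of real files on disk; both readers also accept a file path.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file (hypothetical contents)
csv_text = """name,age,city
Ana,34,Madrid
Luis,28,Sevilla
"""

# read_csv accepts a path or any file-like object; the first line becomes the header
df_csv = pd.read_csv(io.StringIO(csv_text))

# The same kind of data as a JSON array of records
json_text = '[{"name": "Ana", "age": 34}, {"name": "Luis", "age": 28}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")
```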

Hands-on Pandas(6): Descriptive Statistics

Pandas provides many options for calculating descriptive statistics and other reduction operations with a single function call. You might want to calculate these values as part of an ML/data analysis pipeline, or simply to get a better understanding of the data you are dealing with. Most of these operations are similar to Num...
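
To illustrate the "single function call" point, here is a short sketch on a made-up DataFrame of quarterly figures (the data is an assumption). It shows how the `axis` argument controls which direction a reduction runs in.

```python
import pandas as pd

# Hypothetical sales figures for illustration
df = pd.DataFrame(
    {"q1": [10, 20, 30], "q2": [5, 15, 40]},
    index=["north", "south", "east"],
)

col_sums = df.sum()               # reduces down the rows: one value per column
row_means = df.mean(axis=1)       # reduces across columns: one value per row
best_region = df["q2"].idxmax()   # index label where the maximum occurs
summary = df.describe()           # count, mean, std, min, quartiles, max per column
```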

Hands-on Pandas(5): Mapping, apply and applymap

In this article, we will learn about mapping and the apply and applymap functions. This technique will help you manipulate your data in very convenient ways, and is another important addition to your toolbox. As always, we will explore the topic with examples that will help you understand what’s going on. Great, let’s get started! Mapping M...
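
A compact sketch contrasting the three kinds of mapping on a hypothetical two-column DataFrame. Note that `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1, so the element-wise step below picks whichever name the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: element-wise on a single Series, using a dict or a function
labels = df["a"].map({1: "one", 2: "two", 3: "three"})

# DataFrame.apply: calls the function once per column (default) or per row (axis=1)
col_ranges = df.apply(lambda col: col.max() - col.min())
row_totals = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise over the whole DataFrame: DataFrame.map on pandas >= 2.1, applymap before
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)
```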

Hands-on Pandas(4): Arithmetics with DataFrames and Series

Arithmetic operations are some of the most fundamental (and important) things you can do with series and dataframes. In this article, we will learn how to perform basic operations using both series and dataframes. We are interested in the following scenarios: Operations between series with the same index. Operations between dataframes wit...
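
The key idea behind pandas arithmetic is index alignment. The sketch below, on made-up Series and DataFrames, shows what happens when labels only partly overlap, how `fill_value` changes that, and how a DataFrame combines with a Series.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Operands are aligned on index labels; a label present on only one side yields NaN
total = s1 + s2                            # a: NaN, b: 12.0, c: 23.0, d: NaN

# fill_value stands in for the missing side before the operation runs
total_filled = s1.add(s2, fill_value=0)    # a: 1.0, b: 12.0, c: 23.0, d: 30.0

# DataFrame minus Series: the Series index is matched against the columns,
# and the operation is repeated for every row
df = pd.DataFrame({"x": [1, 4], "y": [2, 6]})
centered = df - df.iloc[0]                 # subtract the first row from each row
```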

Hands-on Pandas(3): Reindexing and Deletion

Today we will deal with two techniques we need to cover before moving to more advanced Pandas topics: Reindexing and element deletion. It will be a bit shorter than the first two articles in the series, but that doesn’t mean it’s not important. Both techniques are very useful, and you will probably use them in your day-to-day work if you become...
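
A brief sketch of both techniques on hypothetical data: `reindex` conforms an object to a new set of labels, and `drop` removes labels. Both return new objects and leave the original untouched.

```python
import pandas as pd

s = pd.Series([4.5, 7.2, -5.3], index=["d", "b", "a"])

# reindex reorders to the given labels; labels not in s ("c") become NaN
reordered = s.reindex(["a", "b", "c", "d"])

# ...unless a fill_value is provided for them
filled = s.reindex(["a", "b", "c", "d"], fill_value=0)

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["r1", "r2", "r3"])

# drop removes row labels by default; use columns= to remove columns instead
without_r2 = df.drop("r2")
without_y = df.drop(columns="y")
```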

Hands-on Pandas(2): Selection, Filtering, loc and iloc

In the last article, we learned about the two basic pandas data structures: Series and DataFrames. We also built a couple of them on our own and learned the basics of indexing and selection. Today we will learn a bit more about selecting and filtering elements from Pandas data structures. This might seem like an incredibly basic topic, but it’s...
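
The distinction between `loc` and `iloc` is easiest to see side by side. Here is a small sketch on a made-up price table, including the slicing difference that often trips people up and a boolean filter.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [3.5, 1.2, 8.0], "qty": [10, 0, 4]},
    index=["apple", "banana", "cherry"],
)

by_label = df.loc["banana", "price"]    # label-based lookup: 1.2
by_position = df.iloc[2, 1]             # position-based: third row, second column -> 4
label_slice = df.loc["apple":"banana"]  # label slices INCLUDE the end label (2 rows)
pos_slice = df.iloc[0:1]                # position slices EXCLUDE the end (1 row)
in_stock = df[df["qty"] > 0]            # boolean filtering keeps apple and cherry
```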

Hands-on Pandas(1): Series and Dataframes

In a previous series we covered the fundamentals of NumPy; now it’s time to deal with another important tool frequently used in data analysis: Pandas. Pandas is a library for data manipulation and analysis that lets you work with heterogeneous data in tabular form (in contrast to NumPy, which is designed to work with homogeneous numerical data in array...
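
A minimal sketch of the two structures with made-up values. It highlights the contrast with NumPy: each DataFrame column keeps its own dtype, so text, integers, and booleans can live side by side.

```python
import pandas as pd

# Series: a one-dimensional labeled array with a single dtype
s = pd.Series([0.25, 0.5, 0.75], index=["a", "b", "c"])

# DataFrame: a table whose columns may each hold a different dtype
df = pd.DataFrame({
    "name": ["Ana", "Luis"],
    "age": [34, 28],
    "member": [True, False],
})

column_types = df.dtypes   # one dtype per column (text, integer, boolean)
```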