Hello There Reader, It's Been a Minute
It’s been a while since I last posted on this blog (almost 4 years!), as life has been demanding a lot of attention. I may be able to start writing with a bit more frequency, or at least that’s what I hope.
So, what have I been up to? Well, a bunch of things.
I started a consulting business!
I finally decided to take the plunge and start my o...
It's Fine, Nobody Can Remember Everything
A couple of days ago I had a conversation with a friend who is learning to program.
We were talking about the difficulty of remembering what each concept means and what every keyword does. The conversation eventually led to this question:
Ok, but when will I stop needing the docs?
I (and probably most people) had the same feeling when I was learn...
On Abstraction and Coupling
This article is about the second group of concepts I wanted to talk about after re-reading Clean Architecture.
I want to try something different this time: Instead of elaborating each idea in long, continuous prose, I’ll just list them as separate chunks.
So, here it goes:
We already know that tight coupling is bad. It binds together softw...
On Shape and Behavior
I recently started re-reading Bob Martin’s Clean Architecture and found two other ideas I wanted to share. One of them (the topic of this article) is the dual nature of the way software developers provide value through code.
When you implement (or modify) a feature in your system you are creating value by altering or expanding its behavior. Mos...
Domain-Driven Design
This article is a summary of what I consider to be the most important concepts of the book Domain-Driven Design, by Eric Evans. I tried to condense its key ideas into a single article for anyone interested in the topic. I attempted to pack in as much information as possible, but it was not an easy task: The book is a very condensed work...
Hands-on Pandas(11): The apply function
We have already covered most of the fundamentals of working with data using the Pandas library. There is one more topic I’d like to discuss before concluding the series: The Apply function.
In the previous article, we learned how to create subgroups of data using the groupby function. This is quite useful when you want to gain a better understa...
Hands-on Pandas(10): Group Operations using groupby
Sometimes you need to perform operations on subsets of data. Your rows might have attributes in common or somehow form logical groups based on other properties. Finding the average, maximum, count, or standard deviation of values within groups of data is a common task, and Pandas makes it easy to accomplish.
...
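As a rough sketch of the kind of group operation described above (the column names and values are made up for illustration and are not taken from the article), a single groupby call can compute several per-group statistics at once:

```python
import pandas as pd

# Hypothetical data: "region" and "amount" are illustrative names only.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "amount": [10.0, 20.0, 5.0, 15.0, 25.0],
})

# Split the rows by region, then compute several aggregates for each group.
summary = sales.groupby("region")["amount"].agg(["mean", "max", "count", "std"])
print(summary)
```

The result has one row per group and one column per aggregate, which is usually a convenient shape for further analysis.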
Hands-on Pandas(9): Merging Dataframes
Merge/join operations in Pandas let you gather information from many tables into a single dataframe for further processing or analysis. This is another important skill that you will probably use a lot when working with data.
If you have some experience with relational databases, you will recognize the analogy with table joins. In this ...
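To make the table-join analogy concrete, here is a minimal sketch. The two tables and their column names are hypothetical, chosen only to show how a shared key column combines rows from both sides:

```python
import pandas as pd

# Hypothetical tables: all names and values are illustrative.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})
orders = pd.DataFrame({
    "order_id": [100, 101, 102],
    "customer_id": [1, 1, 3],
    "total": [9.5, 20.0, 7.25],
})

# Inner join on the shared key, similar to SQL's INNER JOIN ... ON customer_id.
merged = orders.merge(customers, on="customer_id", how="inner")
print(merged)
```

Customer 2 has no orders, so it does not appear in the inner join; passing how="left" or how="outer" would change which unmatched rows are kept.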
Hands-on Pandas(8): Cleaning Data
In an ideal world, all the data you need is available in the right format and with complete content.
In the real world, you will probably need to scrape data from lots of different and incomplete sources. That’s why it’s important to learn how to clean your data before analyzing it or feeding it into a ML algorithm.
Data cleaning might not be the...
Hands-on Pandas(7): Loading data from files
Data analysis usually starts by loading data into the structures of your library/tools of choice. Almost always this data will either come from a database, the web, or a collection of files.
The files that contain your data can come in many different formats: Comma-separated values in a text file, JSON files, Excel files, or files with values s...
Hands-on Pandas(6): Descriptive Statistics
Pandas provides many options for calculating descriptive statistics and other reduction operations with just a simple function call. You might want to calculate these values as part of a ML/Data Analysis pipeline, or just because you want to get a better understanding of the data you are dealing with.
Most of these operations are similar to Num...
Hands-on Pandas(5): Mapping, apply and applymap
In this article, we will learn about mapping and the apply and applymap functions.
These techniques will help you manipulate your data in very convenient ways, and they are another important addition to your toolbox.
As always, we will explore the topic with examples that will help you understand what’s going on.
Great, let’s get started!
Mapping
M...
Hands-on Pandas(4): Arithmetics with DataFrames and Series
Arithmetic operations are some of the most fundamental (and important) things you can do with series and dataframes. In this article, we will learn how to perform basic operations using both series and dataframes.
We are interested in the following scenarios:
Operations between series with the same index.
Operations between dataframes wit...
Hands-on Pandas(3): Reindexing and Deletion
Today we will deal with two techniques we need to cover before moving to more advanced Pandas topics: Reindexing and element deletion.
It will be a bit shorter than the first two articles in the series, but that doesn’t mean it’s not important. Both techniques are very useful, and you will probably use them in your day-to-day work if you become...
Hands-on Pandas(2): Selection, Filtering, loc and iloc
In the last article, we learned about the two basic pandas data structures: Series and DataFrames. We also built a couple of them on our own and learned the basics of indexing and selection.
Today we will learn a bit more about selecting and filtering elements from Pandas data structures. This might seem like an incredibly basic topic, but it’s...
Hands-on Pandas(1): Series and Dataframes
In a previous series we covered the fundamentals of NumPy; now it’s time to deal with another important tool frequently used in data analysis: Pandas.
Pandas is a library for data manipulation and analysis that lets you manipulate heterogeneous data in tabular form (in contrast to NumPy, designed to work with homogeneous numerical data in array...
Hands-on NumPy(VI): Linear Algebra
Linear algebra has many useful applications in science and engineering. If you are doing scientific computing, it’s very likely that sooner or later you will need to use linear algebra to solve problems.
If your linear algebra is a bit rusty, you can take a look at Khan Academy’s linear algebra path; it’s free and it does a great job at explain...
Hands-on NumPy(V): Reductions/Aggregations
Reductions (or aggregations) are a family of NumPy functions that operate over an array returning a result with fewer dimensions.
Many of these functions perform typical statistical operations on arrays, while others perform dimensionality reductions.
In this article, we will learn about some of the most common aggregations, but before we get ...
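As a small illustration of what "a result with fewer dimensions" means (the array below is an arbitrary example, not one from the article), reducing along an axis removes that axis from the result:

```python
import numpy as np

# Arbitrary 2-D example array with shape (2, 3).
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()             # reduce over every axis: a single scalar
col_sums = arr.sum(axis=0)    # collapse the rows: one sum per column, shape (3,)
row_means = arr.mean(axis=1)  # collapse the columns: one mean per row, shape (2,)

print(total, col_sums, row_means)
```

In each case the output has fewer dimensions than the 2-D input: a scalar for the full reduction and a 1-D array for the per-axis ones.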