# Book Review: Machine Learning with Python Cookbook

Source: PbPython.com

## Introduction

This article is a review of Chris Albon’s book, Machine Learning with Python Cookbook. This book is in the tradition of other O’Reilly “cookbook” series in that it contains short “recipes” for dealing with common machine learning scenarios in python. It covers the full spectrum of tasks from simple data wrangling and pre-processing to more complex machine learning model development and deep learning implementations.

Since this is such a fast-moving and broad topic, it is nice to get a new book that covers the latest topics and presents them in a compact but very useful format. Bottom line, I enjoyed reading this book and think it will be a useful resource to have on my python bookshelf. Read on for some more details about the book and who will benefit most from reading it.

## Where does this book fit?

As data science, machine learning and AI have become more and more popular, there has been a proliferation of books that try to cover these topics in differing manners. Some books go very deep into the math and theory behind the various machine learning algorithms. Others try to cover a lot of content but do not provide a quick reference resource with code examples for solving real-world problems. Machine Learning with Python Cookbook fills this code-heavy niche with lots of examples. There are very few paragraphs with math equations or details behind the implementation of machine learning algorithms. Instead, Chris Albon breaks the topics down into bite-sized chunks that each solve a very specific problem. Each of the nearly 200 recipes follows a similar format:

- Problem definition
- Solution
- Discussion (optional)
- Additional resources (optional)

In most cases, the problem definition is as simple as “You want to multiply two matrices” or “You need to visualize a model created by a decision tree learning algorithm.” This organization makes it convenient to scan the table of contents and find the relevant section with ease.

Each solution is fully self-contained and can be copied and pasted into a standalone script or jupyter notebook and executed. In addition, each code sample includes all the necessary imports as well as sample data sets (e.g. Iris, Titanic, MNIST). The samples are all around 12-20 lines of code with comments included so they are easy to dissect and understand.

In some cases, there is further discussion about the approach as well as hints and tips related to the solutions. In many cases, topics like performance for larger and more complex data sets are discussed and options are presented for managing those situations. Finally, the author also includes links to more details that might be useful when you need to dive into the problem in more depth.
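To make the format concrete, here is a minimal sketch in the spirit of the “You want to multiply two matrices” problem. It is my own illustration of the recipe style (imports, tiny inline data, commented steps), not code taken from the book, and the names `matrix_a` and `matrix_b` are arbitrary.

```python
# Load library
import numpy as np

# Create two small matrices to work with
matrix_a = np.array([[1, 1],
                     [1, 2]])
matrix_b = np.array([[1, 3],
                     [1, 2]])

# Compute the matrix product (not the element-wise product)
product = matrix_a @ matrix_b

# Show the result
print(product)
```

The `@` operator is equivalent to `np.matmul` for 2-D arrays; `matrix_a * matrix_b` would instead multiply element by element.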

## Who should read it?

The author is very clear that this book is not an introduction to python or machine learning. Since the recipes are short, the actual python code is fairly simple. There’s no need to understand complex python data structures or programming constructs outside of lists and dictionaries. You should know how to install python libraries such as numpy, pandas and scikit-learn.

More importantly, you should have at least some experience using these libraries to load and manipulate data. I also highly recommend that you have done some work with building predictive models with scikit-learn. A lot of the value I gained from this book was related to learning solutions to problems I encountered in my own work.

Finally, some basic understanding of supervised and unsupervised machine learning algorithms is going to be really helpful. For example, if you do not know the types of problems where you would use linear vs. logistic regression or why you might need to use dimensionality reduction, then this book (especially chapters 9 and higher) might not make sense.

## How should you read it?

Because the book is a cookbook, it’s not necessary to read it from page 1 through page 340. However, I do think it is best to skim through it in order to understand what content is available. For instance, I felt very comfortable with the content in chapter 2 (Loading Data) and chapter 3 (Data Wrangling) so I skimmed the content. For other chapters, I felt like I got a lot more out of reading the examples in depth since I did not have as much experience with those topics.

Ultimately though, this is a resource that is meant to sit beside your computer and provide a quick lookup for a specific problem. With that goal in mind, it achieves its aim admirably.

## Chapter Overview

The book only has 340 pages of content but it is broken down into 21 chapters. In my opinion, this is a good structure because each chapter provides a concise introduction to a topic and specific code examples that solve common problems. The chapters start with basic numpy functions, then move to more complex pandas and scikit-learn functions and close out with some keras examples. Here’s a list of each chapter along with its primary focus:

1. Vectors, Matrices and Arrays [numpy]
2. Loading Data [scikit-learn, pandas]
3. Data Wrangling [pandas]
4. Handling Numerical Data [pandas, scikit-learn]
5. Handling Categorical Data [pandas, scikit-learn]
6. Handling Text [NLTK, scikit-learn]
7. Handling Dates and Times [pandas]
8. Handling Images [OpenCV, matplotlib]
9. Dimensionality Reduction Using Feature Extraction [scikit-learn]
10. Dimensionality Reduction Using Feature Selection [scikit-learn]
11. Model Evaluation [scikit-learn]
12. Model Selection [scikit-learn]
13. Linear Regression [scikit-learn]
14. Trees and Forests [scikit-learn]
15. K-Nearest Neighbors [scikit-learn]
16. Logistic Regression [scikit-learn]
17. Support Vector Machines [scikit-learn]
18. Naive Bayes [scikit-learn]
19. Clustering [scikit-learn]
20. Neural Networks [keras]
21. Saving and Loading Trained Models [scikit-learn, keras]

To illustrate how the chapters work, let’s look at chapter 15, which covers K-Nearest Neighbors (KNN). In this case, the introduction recipe (15.0) gives a concise summary of KNN and why it is a popular tool.

Now that we remember what KNN is used for, we’re likely going to want to apply it to our data. First, we will want “to find an observation’s k nearest observations (neighbors).” Recipe 15.1 contains specific code as well as some more detail around the various algorithm parameters we can tweak such as the distance metrics (Euclidean, Manhattan or Minkowski).
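As a rough idea of what this kind of lookup involves, here is a minimal sketch using scikit-learn’s `NearestNeighbors` on the iris data. This is my own illustration rather than the book’s recipe; the choice of two neighbors, the Euclidean metric and the made-up `new_observation` are all assumptions for the example.

```python
# Load libraries
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load the iris features and standardize them
iris = datasets.load_iris()
features = StandardScaler().fit_transform(iris.data)

# Fit a neighbor index; metric could also be "manhattan" or "minkowski"
nearest_neighbors = NearestNeighbors(n_neighbors=2, metric="euclidean")
nearest_neighbors.fit(features)

# A hypothetical observation, already expressed in standardized units
new_observation = [[1, 1, 1, 1]]

# Find the distances to, and row indices of, its two nearest neighbors
distances, indices = nearest_neighbors.kneighbors(new_observation)
print(indices, distances)
```

Swapping the `metric` argument is all it takes to compare distance measures on the same data.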

Next, recipe 15.2 shows how to take some unknown data and predict its class based on neighbors. This recipe uses the iris data set but also includes important caveats about scaling data when using KNN.
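The scaling caveat is easier to see in code. Below is a minimal sketch, written by me and not copied from the book, that standardizes the iris features before fitting a `KNeighborsClassifier`. The value `n_neighbors=5` and the two sample measurements are assumptions chosen purely for illustration.

```python
# Load libraries
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Standardize features so no single measurement dominates the distance
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Train a KNN classifier with 5 neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X_std, y)

# New observations must be scaled with the same fitted scaler
new_observations = scaler.transform([[5.1, 3.5, 1.4, 0.2],
                                     [6.7, 3.0, 5.2, 2.3]])

# Predict the class of each observation
predictions = knn.predict(new_observations)
print(predictions)
```

Because KNN relies on distances, a feature measured on a larger scale would otherwise carry more weight than the others, which is why the same `scaler` is reused for the new observations.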

Recipe 15.3 then moves on to cover a common challenge with KNN, specifically how do you select the best value for k? This recipe uses scikit-learn’s `Pipeline` class and `GridSearchCV` to conduct a cross-validation of KNN classifiers with different values of k. The code is simple to comprehend and easy to extend to your own data sources.
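For readers who have not combined these two tools before, here is a minimal sketch of the general pattern. It is my own example, not the book’s code; the step names `scaler` and `knn`, the candidate range of 1 to 10 and the 5-fold setting are assumptions.

```python
# Load libraries
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Chain scaling and the classifier so scaling is refit inside each fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values of k, addressed as <step name>__<parameter name>
candidate_ks = list(range(1, 11))
search_space = {"knn__n_neighbors": candidate_ks}

# Run a 5-fold cross-validated grid search over the candidates
search = GridSearchCV(pipeline, search_space, cv=5).fit(X, y)

# Report the value of k with the best mean cross-validation score
best_k = search.best_params_["knn__n_neighbors"]
print("Best k:", best_k)
```

Putting the scaler inside the pipeline means each cross-validation fold is standardized using only its own training portion, which avoids leaking information from the held-out data.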

The point is that each chapter can be consumed at the individual recipe level or read more broadly to understand the concept in more detail. I really like this approach because so many topics are covered at a quick pace. If I feel the need to dive into the mathematical rationale for an approach, I can use these recipes as a jumping-off point for further review.

## Additional Considerations

The only criticism I can offer is that I wish more topics were covered. Two specific areas I would have liked to learn about are ensemble methods and xgboost.

In some cases, it might be useful to understand some of the additional libraries in the python ecosystem. From an NLP perspective, I know that NLTK is the standard but I have heard good things about spaCy as well, so I would be curious where it fits in this space. The neural network space is changing rapidly so I think keras was a good choice but it might be interesting to learn about some of the other options like PyTorch.

I am sure there are a lot of other potential topics that were considered so I can imagine it was really tough to decide what was in and out of scope. All of my suggestions are based on topics that sprang to my mind and are meant only as potential ideas for another edition (if that is the plan).

Originally, I had some concerns about using the basic data sets (Titanic, Iris, etc.) in most examples. However, now that I have reflected on it, I like that the examples are so self-contained and think it would be much more difficult to create such a great resource if there needed to be more explanation of the data.

Also, it would be nice if the code examples were available online so you could do some quick copying and pasting instead of typing it all in by hand. This may already be available, so if I find it, I’ll be sure to update this review.

The final comment I have is related to the price of the book. The current US list price is $59.99 which may seem steep for a 340 page book. However, I think the book is worth it and encourage those interested to purchase it. The content is great and I see it being very useful to those using pandas + scikit-learn on a frequent basis. It is clear that Chris knows what he is talking about and he explains the details well. I predict that this book will become well broken in as I frequently refer to it.

A second reason it is important to purchase books like this is so that authors and publishers know that the python community values this type of content. I cannot imagine how long it took Chris to write this book. I can only guess that the royalties will probably not afford him an early retirement any time soon! Still, I do want to make sure he gets at least some compensation for this valuable resource and want to provide encouragement to him for a job well done.

## Conclusion

Overall, the Machine Learning with Python Cookbook is an extremely useful book which is aptly described in the tagline as “Practical Solutions From Preprocessing to Deep Learning.” Chris has done a fabulous job of collecting a lot of the most common machine learning problems and summarizing solutions. I definitely encourage those of you using any of the libraries mentioned here to pick up this book. I have added this book to my recommended resources page so please check it out and see if any of the other recommendations might be useful. Also, let me know if you find this review useful.