How pandas infers data types when parsing CSV files

Source: Rushter.com

I have always wondered how pandas infers data types and why it sometimes takes a lot of memory when reading large CSV files. Well, it is time to understand how it works.

This article describes the default C-based CSV parsing engine in pandas.

First off, the read_csv function has a low_memory parameter that is set to True by default. Instead of processing the whole file in a single pass, the parser splits the CSV into chunks whose size is limited by the number of lines. A special heuristic determines the number of lines: 2**20 / number_of_c ...
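To make the effect of chunking concrete, here is a minimal pure-Python sketch. It is not the pandas implementation: it splits rows into chunks of a fixed number of lines and infers a type for each column separately in every chunk. The names `infer_type`, `chunks`, `infer_per_chunk` and the `chunk_lines` values are hypothetical, and the type rules (int, then float, else str) are deliberately simplified. pandas computes its real chunk size with the heuristic mentioned above.

```python
from typing import Iterator, List


def infer_type(values: List[str]) -> str:
    """Return the narrowest type ('int', 'float' or 'str') that fits every value.

    Hypothetical, simplified rules: no NA handling, booleans or dates.
    """
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
        except ValueError:
            continue  # some value does not fit; try the next, wider type
        return name
    return "str"


def chunks(rows: List[List[str]], chunk_lines: int) -> Iterator[List[List[str]]]:
    """Yield consecutive slices of at most chunk_lines rows."""
    for start in range(0, len(rows), chunk_lines):
        yield rows[start:start + chunk_lines]


def infer_per_chunk(rows: List[List[str]], chunk_lines: int) -> List[List[str]]:
    """Infer a type for every column, independently in each chunk."""
    result = []
    for chunk in chunks(rows, chunk_lines):
        columns = list(zip(*chunk))  # transpose rows into columns
        result.append([infer_type(list(col)) for col in columns])
    return result


if __name__ == "__main__":
    # Hypothetical data: column 0 looks numeric until the last row.
    rows = [["1", "a"], ["2", "b"], ["3", "c"], ["x", "d"]]
    print(infer_per_chunk(rows, chunk_lines=2))
    # [['int', 'str'], ['str', 'str']]
    print(infer_per_chunk(rows, chunk_lines=4))
    # [['str', 'str']]
```

With two-line chunks, the first column is inferred as `int` in one chunk and `str` in the next, while a single pass over all four lines sees only `str`. Any parser that infers types chunk by chunk has to reconcile such per-chunk results afterwards.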