How pandas infers data types when parsing CSV files – Source Rushter.com
I have always wondered how pandas infers data types and why reading large CSV files sometimes takes so much memory. Well, it is time to understand how it works.
This article describes the default C-based CSV parsing engine in pandas.
First off, there is a low_memory parameter in the read_csv function that is set to True by default. Instead of processing the whole file in a single pass, it splits the CSV into chunks whose size is limited by the number of lines. A special heuristic determines the number of lines: 2**20 / number_of_c ...
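To make the parameter concrete, here is a minimal sketch that reads the same small CSV with low_memory left at its default and with it switched off, then prints the inferred dtypes. The sample data and the variable names are made up for illustration. On a file this small both calls should infer the same types, so this only shows where the switch lives, not the memory difference.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for this illustration.
csv_text = "a,b\n1,1.5\n2,2.5\n3,3.5\n"

# low_memory=True is the default; it is spelled out here for clarity.
chunked = pd.read_csv(io.StringIO(csv_text), low_memory=True)

# low_memory=False asks the parser to process the file in a single pass.
single_pass = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Inspect the data types pandas inferred for each column.
print(chunked.dtypes)
print(single_pass.dtypes)
```

On real data, the two settings are worth comparing on a large file with many rows, since that is where chunked parsing comes into play.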