Pandas, the data analysis library for Python, has finally reached a 1.0 release candidate. Pandas 1.0 removes a great deal of deprecated functionality and requires Python 3.6.1 or later.
Pandas was created for working easily with data in structured formats, such as tables, matrices, and time series data. Pandas eclipses much of the functionality of R’s dataframes, and works well with other scientific computing libraries in the Python world.
With Pandas 1.0, the creators of Pandas introduce a slew of breaking changes that have been in the works for some time now. Here is a rundown of the most significant ones, and how to handle them going forward.
Pandas requires Python 3.6.1 or higher
The biggest change in Pandas 1.0 is dropping support for all versions of Python earlier than Python 3.6.1. Pandas dropped support for Python 2 and committed exclusively to Python 3 as of 2019, so this is mostly a refinement of an existing policy.
The project also has a new support policy for future versions of Pandas. Any drop of support for a version of Python will be rolled out in major new versions of Pandas (2.0, 3.0, etc.). Minor releases will deprecate features, but not remove them; major releases will remove features.
Pandas’s new NA value
Earlier versions of Pandas used different values to represent missing data, depending on the type of the container: one for datetime types, one for objects, and so on. All of these are being merged into a single missing-data value called NA (exposed as pd.NA). Right now, NA is used only by a few data types (the nullable integer, boolean, and dedicated string types), and it is considered experimental, so it should not yet be used in production.
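As a rough illustration of how NA behaves, here is a minimal sketch that assumes a Pandas 1.0 or later installation and uses the nullable integer dtype ("Int64"). The variable names are illustrative only, and because the feature is experimental, the details may change.

```python
import pandas as pd

# A nullable integer column: the missing entry is stored as pd.NA, not NaN.
ints = pd.Series([1, None, 3], dtype="Int64")
missing = ints.iloc[1]

# Comparisons with NA propagate "unknown" instead of returning False.
comparison = pd.NA == 1

# Boolean logic follows three-valued rules: unknown OR True is still True.
either = pd.NA | True

# Reductions skip NA by default, as they do with NaN.
total = ints.sum()

print(missing, comparison, either, total)
```

Note that the integer column keeps its integer type even with a missing entry, where older versions of Pandas would have converted it to floating point to hold NaN.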
Because of the number of changes in Pandas 1.0, some of Pandas’s APIs are now backwards-incompatible, including the behavior of many commonly used elements.
Many of these incompatibilities will raise warnings, but it’s best to test existing Pandas scripts side by side with their Pandas 1.0 counterparts to see how they operate.
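One way to approach a side-by-side test is to save a script's output under the older Pandas and compare it with the output under Pandas 1.0. The sketch below is only an assumption about how that might be organized: build_report is a hypothetical stand-in for your own logic, and the CSV baseline is one of several possible formats. It relies on pd.testing.assert_frame_equal, which raises AssertionError when two DataFrames differ.

```python
import os
import tempfile

import pandas as pd


def build_report(df):
    # Hypothetical stand-in for the logic in one of your existing scripts.
    return df.groupby("team", as_index=False)["score"].sum()


def save_baseline(result, path):
    # Run this step in the environment with the older Pandas.
    result.to_csv(path, index=False)


def matches_baseline(result, path):
    # Run this step in the Pandas 1.0 environment.
    baseline = pd.read_csv(path)
    try:
        pd.testing.assert_frame_equal(result, baseline)
    except AssertionError:
        return False
    return True


# For demonstration, both steps run here in a single environment.
data = pd.DataFrame({"team": ["a", "b", "a"], "score": [1, 2, 3]})
baseline_path = os.path.join(tempfile.mkdtemp(), "baseline.csv")
save_baseline(build_report(data), baseline_path)
same = matches_baseline(build_report(data), baseline_path)
print("matches baseline:", same)
```

In practice, the baseline file would be written once from the old environment and then read from the new one.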
Deprecated features in Pandas 1.0
Pandas’s documentation lists all of the features to be deprecated but not removed in Pandas 1.0. Some of them have simply been renamed or reorganized, such as the testing module, while others change the use of certain function parameters. In a couple of cases, such as with Index.item(), features have been rescued from deprecation and will continue to be available.
If you’re using a version of Pandas earlier than 0.25, the creators of Pandas recommend migrating to Pandas 0.25 first, making sure all Pandas-dependent code behaves as expected, then migrating to Pandas 1.0. This ensures that any code that uses deprecated functionality will be flagged.
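To make sure flagged code does not go unnoticed during the intermediate step, deprecation warnings can be promoted to errors with Python's standard warnings module. The following is a minimal sketch; call_strictly and legacy_step are hypothetical names, and legacy_step merely simulates a call into a deprecated feature. Pandas generally reports deprecations as FutureWarning, so both that and DeprecationWarning are escalated here.

```python
import warnings


def call_strictly(func, *args, **kwargs):
    # Escalate deprecation-style warnings to exceptions for this call only.
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args, **kwargs)


def legacy_step():
    # Hypothetical stand-in for code that calls a deprecated Pandas feature.
    warnings.warn("this feature is deprecated", FutureWarning)
    return "done"


try:
    call_strictly(legacy_step)
    flagged = False
except FutureWarning:
    flagged = True

print("deprecated usage flagged:", flagged)
```

The same effect is available for a whole script, without code changes, by running the interpreter with the -W error::FutureWarning option.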
Features removed in Pandas 1.0
Some key Pandas features have been removed altogether in Pandas 1.0:
- Matplotlib unit registration. This is to prevent Matplotlib from being affected when you import Pandas.
- Many other features that were previously deprecated.
This is another reason to test the Pandas 1.0 release candidate side by side with your existing Pandas installation and ensure your scripts behave as intended.
Installing Pandas 1.0
Pandas 1.0 can be installed with the Pip package manager by typing pip install pandas. Pandas 1.0 is also available as part of the Anaconda Python distribution for scientific computing.
In all cases, it’s best to install Pandas in a virtual environment, especially if you want to run Pandas 1.0 scripts side by side with their earlier-version counterparts.
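When several environments sit side by side, it is easy to lose track of which one is active. A short check like the following, offered as a sketch with illustrative variable names, reports the Python and Pandas versions the current interpreter actually sees.

```python
import sys

import pandas as pd

# Pandas 1.0 needs Python 3.6.1 or later.
python_ok = sys.version_info >= (3, 6, 1)

# The first component of the version string is the major release number.
pandas_major = int(pd.__version__.split(".")[0])

print("Python", sys.version.split()[0], "| Pandas", pd.__version__)
print("Python new enough for Pandas 1.0:", python_ok)
print("Pandas major version:", pandas_major)
```

Running it once in each virtual environment confirms that the old and new installations are kept separate.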