Smart digital content platform

Migrating to Python 3.7 and Django 3.2

During the last few weeks of February 2022, we migrated our product to Python 3.7 and Django 3.2. It was a hard road, but it made us grow professionally and gave us deep knowledge of these technologies that we would like to share with you.

Alejandro Villegas

Purpose and starting point

Athento is a complex product that has been under development and evolution for several years. Until now, Athento’s technology stack was based on Django 1.11 and Python 2.7, versions of these technologies that we needed to renew. In addition to the support issues we faced when using technologies that were already a few years old, we had limitations in terms of the functionality we could offer to our customers: many of the packages and libraries we wanted to use to implement cool features required newer versions of these technologies. Although we often managed to find a workaround, the time had come: we had to upgrade.

Our starting point was a product built on these technologies, with a hundred installations. We had to upgrade the technology stack with the least possible impact on the more than 10K users who work with Athento daily. That meant more than 230,000 lines of code to migrate, 80+ Django apps, and an additional challenge: we had to keep supporting our customers, applying fixes, and implementing some new features in the old version of the product.

Analysis and start

Our first steps were to evaluate the strategy to follow, the upgrades we were going to apply, and their order. None of us had done an upgrade of this caliber before. It was clear to us that any plan we might devise at this point was going to be subject to change and that we needed to be flexible and patient in trying new ideas. Our first question was: Python first, Django first, or both at the same time?

We answered this question by studying compatibilities. We found that Django 1.11 supported Python versions up to 3.7, and that Python 3.7 in turn allowed us to raise our Django version up to 3.2. While not the most modern versions of these technologies, they were a vast improvement over the versions we had, probably with less effort than migrating to newer ones would take. In addition, the candidate versions of Python and Django had been in use long enough to give us peace of mind. So we decided to migrate the entire project to Python 3.7 with Django 1.11 and, in a second step, to migrate from Django 1.11 to Django 3.2.
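The compatibility reasoning can be sketched as a set intersection. The support table below is copied from the Django release notes for the 1.11 and 3.2 series; the names DJANGO_PYTHON_SUPPORT and bridge_versions are made up for this illustration and are not part of Athento or Django.

```python
# Python versions supported by each Django release series, as (major, minor)
# tuples. Values come from the Django 1.11 and 3.2 release notes.
DJANGO_PYTHON_SUPPORT = {
    "1.11": {(2, 7), (3, 4), (3, 5), (3, 6), (3, 7)},
    "3.2": {(3, 6), (3, 7), (3, 8), (3, 9), (3, 10)},
}


def bridge_versions(old_django, new_django, table=DJANGO_PYTHON_SUPPORT):
    """Return the Python versions that can run both Django series, sorted."""
    return sorted(table[old_django] & table[new_django])


if __name__ == "__main__":
    for major, minor in bridge_versions("1.11", "3.2"):
        print(f"Python {major}.{minor} runs both Django 1.11 and Django 3.2")
```

Any version in the result lets the Python upgrade and the Django upgrade happen as two separate steps, which is the property the plan relied on.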

Upgrading to Python 3.7

Upgrading the Python version from 2.7 to 3.7 involved two main tasks: updating the code and updating all the pip requirements of the project.

To update the code we had to do a complete analysis of Athento with the help of tools like 2to3, which analyze the code and suggest the syntax changes required to convert from Python 2 to Python 3.

After analyzing the results returned by the tool, we decided against automatic conversion and instead reviewed the suggested changes, applying them only when they made sense. Some changes did not seem reasonable to us: calls to print ended up with double parentheses, and the results of certain iterators or Django QuerySets were wrapped in lists (which can hurt performance when the original evaluation was lazy). As Athento consists of more than 2000 Python 2 source files, we divided the task among the whole team to finish it as soon as possible. The most common changes we had to make to the code were:

  • Change imports, since in-project imports work differently in Python 3: implicit relative imports are gone, so the import path must start with the package name.
  • The unicode type is gone: str is now always Unicode text, and binary data lives in the separate bytes type.
  • Some types have lost methods or changed their behavior (e.g. the has_key() method of dictionaries is gone, and keys() returns a view instead of a list).
  • When catching an exception, you used to be able to separate the exception type and the exception object with a comma; now you have to use the keyword ‘as’.
  • The encode and decode methods of strings behave differently.
  • Some packages disappear (like commands, replaced by subprocess) or change their structure (like urllib).
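As a rough illustration of the changes listed above, here is what the Python 3 side of each one typically looks like. The variable names and sample values are invented for the example; only standard-library calls are used.

```python
import subprocess
import sys
from urllib.parse import urlencode  # Python 2: from urllib import urlencode

# Implicit relative imports are gone: inside a package called `myapp`
# (hypothetical), `import utils` becomes `from myapp import utils`.

settings = {"lang": "es", "debug": "1"}

# dict.has_key() is gone; use the `in` operator instead.
has_lang = "lang" in settings

# keys() returns a view, not a list, so it cannot be indexed directly.
first_key = sorted(settings.keys())[0]

# `except ValueError, exc` is a syntax error in Python 3; use `as`.
try:
    int("not a number")
except ValueError as exc:
    error_message = str(exc)

# str is always Unicode text and bytes is a separate type, so every
# conversion between the two has to be explicit.
raw = "Año".encode("utf-8")
text = raw.decode("utf-8")

# The `commands` module was removed; subprocess covers the same ground.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()

# urllib was split into submodules such as urllib.parse and urllib.request.
query = urlencode(settings)
```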

The external packages that we installed with pip presented another challenge. First of all, certain packages were not available for Python 3, or the versions we had were not compatible. Also, since Athento is a complex product, upgrading package versions sometimes caused incompatibilities between them, which we had to work around. The lessons learned are the following:

  • It is advisable to remove all requirement packages that are not needed. This is also a good security practice. However, this requires a deep knowledge of the application (which in our case is quite large).
  • You have to check the versions of each package for compatibility. To do this you should visit sites like PyPI, readthedocs, or the repositories of each package.
  • Sometimes we found dependencies between packages: version X of one package required another package at a version later than Y, but we were installing that package at an incompatible version. Our usual procedure when faced with these failures was:
    • Remove the failed dependency from the requirements.
    • Install it with pip without indicating its version (pip will look for an appropriate one).
    • Check in the documentation whether that version is compatible and has no breaking changes with respect to the one we originally had.
    • Uninstall it with pip, and add the new version to the requirements.
  • On other occasions, there were no Python 3 versions of some packages, but we could access the Git repository that contains them. In these cases, we were able to make a fork and rewrite the code to adapt it to Python 3.
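The "install unpinned, then pin what pip chose" step can be supported by a small helper. This is only a sketch: resolved_pin and declared_dependencies are hypothetical names, and it assumes importlib.metadata, which ships with Python 3.8+ (on Python 3.7 the importlib_metadata backport offers the same functions).

```python
from importlib import metadata  # Python 3.8+; backport: importlib_metadata


def resolved_pin(package, version_of=metadata.version):
    """Return a 'name==version' line for the installed version of a package.

    Returns None when the package is not installed.
    """
    try:
        return f"{package}=={version_of(package)}"
    except metadata.PackageNotFoundError:
        return None


def declared_dependencies(package, requires_of=metadata.requires):
    """Return the requirement strings a package declares (possibly empty)."""
    return list(requires_of(package) or [])


if __name__ == "__main__":
    # "pip" is used here only because it is usually installed.
    pin = resolved_pin("pip")
    print(pin)
    if pin is not None:
        for requirement in declared_dependencies("pip"):
            print("  needs:", requirement)
```

The resulting line can be pasted back into the requirements file once the documentation check has been done, and the declared dependencies help explain why pip rejected the original pin.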

Despite all our efforts and the changes we made, there were several packages that, although compatible with Python 3.7, were still incompatible with Django 1.11, and we had to plan the Django migration earlier than expected. It is also worth pointing out a flaw in our procedure that we could have approached differently: once most of the requirements were installed, part of the team continued fixing the remaining requirements while others tested the functionality that could already work. Perhaps it would have been more appropriate to start the testing phase once the complete migration to the new version of Django was finished.

Migrating to Django 3.2

The Django upgrade was, in general, lighter than the Python upgrade. Django-related requirements had to be reviewed again, as some of them had since been incorporated natively into Django and others required updating.

A couple of cases that created some uncertainty were the Hijack package, used to implement user impersonation, and other packages that extend the Django administration site. These features are very important within Athento, as our project and support teams use them widely to debug problems quickly or to ease the rollout of the platform for our customers. In the end, we found a suitable version of Hijack and chose Grappelli as the administration interface.

Tests and corrections

Once the code problems were solved, the versions were updated, and the project could start, testing had to begin. In this first phase, the whole team got involved, testing each and every feature of Athento. Although the project’s syntax was now valid Python 3, some semantics had changed and several things did not work well.

Most of the encode and decode calls on strings that we had changed during the code migration caused unwanted effects: the conversion from bytes to str and vice versa sometimes returned the wrong type of data, and certain sections of the code had to be rewritten. Relatedly, opening files in binary or text mode caused several problems. We had to study the io package of Python 3.7 closely and rewrite part of the code using the BytesIO and TextIOWrapper classes, among others.
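A minimal sketch of the kind of rewrite this involves, using only the standard io module. The sample payload and variable names are invented for the example and do not come from Athento’s code.

```python
import io

# Bytes as they might arrive from an upload or from a file opened in "rb" mode.
payload = "título;páginas\ninforme;12\n".encode("utf-8")

# BytesIO gives the bytes a file-like interface...
binary_stream = io.BytesIO(payload)

# ...and TextIOWrapper decodes them on the fly, so code that expects text
# lines receives str instead of bytes.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")
rows = [line.rstrip("\n").split(";") for line in text_stream]

# Going the other way: write text, then hand the encoded bytes on.
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="utf-8", newline="")
writer.write("informe;12\n")
writer.flush()  # push the buffered text into the underlying BytesIO
encoded = out.getvalue()
```

Naming the encoding explicitly at each boundary is what keeps bytes and str from mixing silently further down the call chain.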

It should also be noted that while the whole team was doing tests and corrections on this new version of Athento, we had to keep adding functionality and providing support to users of the previous version, in Python 2. This was a challenge for the day-to-day work of the development team since we had to prepare the database and our entire development environment to support both versions, and sometimes we encountered certain problems. One of them was the Django cache, which could cause us problems when switching from one version to another. We solved this by clearing this cache after each change:

from django.core.cache import cache; cache.clear()

Another common problem is bringing into the new Python 3 version all the hotfixes and new features that are added to the Python 2 project. To do this we apply the following procedure: add the Python 2 version of the project as a git remote, place ourselves in the main branch of the Python 3 project, and merge the changes. This is a costly process, as there are often conflicts, or code that is not compatible with Python 3 and needs to be rewritten. This process will remain part of our workflow until Athento’s Python 2 version is deprecated and all our customers have migrated to the new version.

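# PY2_REMOTE and PY2_REPO_URL are placeholders for the name and URL of the Python 2 remote
$ git remote add PY2_REMOTE PY2_REPO_URL
$ git fetch PY2_REMOTE
$ git switch -c merge/merge-py2-py3
$ git merge PY2_REMOTE/fast-track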

Installation on a new server

Our next objective, once all the fixes had been applied, was to install Athento on a new clean server. The idea was to make it easier for system colleagues to install and configure environments in the new version and, at the same time, to allow support and project teams to test the product functionalities in the updated version.

Starting from what already existed for the previous version, we reviewed and updated the required OS-level packages. The most significant change is that many of the installed packages come in different variants depending on whether Python 2 or Python 3 is used. As an example, for zbar we had to change from python-zbar to python3-zbar.

As mentioned above, deploying the new version of Athento allowed the rest of the teams within the company to test the product in an environment as close as possible to production. For a few days, we iterated between tests and fixes, following the established testing protocol.

Migrating Athento servers

Once Athento was working correctly in its Python 3 version, and having already placed this version in several new servers for different customers, we had to proceed with the upgrade of servers that have Athento with Python 2.7 and Django 1.11 to the new versions. To do so, we established the following protocol with the help of the company’s systems team:

  • A backup of the product, in its old version, is performed.
  • The new product is installed.
  • Symbolic links, server configuration, database configuration, etc. are switched from the old product to the new one.
  • Any customizations that the product may have (usually made by the project team) are incorporated into the new installation.
  • Migrations of Django models from the new project are performed (some may have changed). We will talk about this in more detail below.

Migrations are usually generated by Django and have to be applied because some models may have changed their implementation, especially those models handled by external dependencies of the project. We encountered several problems in this regard. For example, when generating new migrations and applying them, we got errors because the models already existed in the database (remember, we must use the database already installed so that the customer does not lose any data).

In the end, we settled on a couple of strategies, to be chosen depending on the state of each server. On the one hand, we can use the --fake flag when applying migrations. This way, they are recorded as applied, although they do not change the data model stored in the database. Then the application is tested and any inconsistent models are fixed. The alternative is to generate migrations from the old version of the product and copy them to the new one. We then generate them again in the new version, so the new migrations are added after the previous ones and leave the database models consistent. The problem with this approach is that migrations generated in the Python 2 version of the product may contain code that is not executable in Python 3 and has to be corrected by hand. Which strategy to follow depends heavily on the individual server.
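For the second strategy, a quick pre-check can flag copied migration files that Python 3 cannot even parse. This is a sketch under assumptions: unparsable_migrations is a hypothetical helper, it assumes migrations live in directories named "migrations", and it catches only syntax errors, not code that parses but fails at run time.

```python
import ast
from pathlib import Path


def unparsable_migrations(project_root):
    """Yield (path, error) for migration files that Python 3 cannot parse."""
    for path in sorted(Path(project_root).rglob("migrations/*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as error:
            yield path, error
```

Running it over the project root before applying migrations gives a short list of files to fix by hand. Files that pass still need the usual testing.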

Continuous improvement work

After all this adventure and learning process, we continue to upgrade our customers to the new version of the product, to which we will be adding new functionality. As a product team, we will continue to bring into this new version all the fixes that are added to the previous version. In addition, we will continue to collaborate with the project, support, and systems teams in moving more and more customers to this version of the product. While all this is happening, we will be adding new features to Athento to make it a more complete, useful, fast, and reliable tool. Python 3 will also allow us to delve deeper into technologies such as Machine Learning, another of the great incentives we had for undertaking this migration.

We hope that this experience will help the community and all those who are about to embark on a journey like the one we have undertaken.