Summary

What have we learned?

  1. Porting any non-trivial amount of code from Python 2 to Python 3 is going to be a pain. There’s no way around it. It’s hard.
  2. The automated 2to3 tool is helpful as far as it goes, but it will only do the easy parts — function renames, module renames, syntax changes. It’s an impressive piece of engineering, but in the end it’s just an intelligent search-and-replace bot.
  3. The #1 porting problem in this library was the difference between strings and bytes. In this case that seems obvious, since the whole point of the chardet library is to convert a stream of bytes into a string. But “a stream of bytes” comes up more often than you might think. Reading a file in “binary” mode? You’ll get a stream of bytes. Fetching a web page? Calling a web api? They return a stream of bytes, too.
  4. You need to understand your program. Thoroughly. Preferably because you wrote it, but at the very least, you need to be comfortable with all its quirks and musty corners. The bugs are everywhere.
  5. Test cases are essential. Don’t port anything without them. The only reason I have any confidence that chardet works in Python 3 is that I started with a test suite that exercised all major code paths. If you don’t have any tests, write some tests before you start porting to Python 3. If you have a few tests, write more. If you have a lot of tests, then the real fun can begin.

Get hands-on with 1400+ tech skills courses.