Diving In
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. “Is this string UTF-8?” is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. If you want to take a string and turn it into a sequence of bytes in a particular character encoding, Python 3 can help you with that. If you want to take a sequence of bytes and turn it into a string, Python 3 can help you with that too. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
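To make the distinction concrete, here is a minimal sketch using the built-in `str.encode()` and `bytes.decode()` methods. The sample word and the variable names are assumptions chosen for illustration; they do not come from the original text.

```python
# A str is a sequence of Unicode characters; it has no encoding attached.
text = "caf\u00e9"  # illustrative sample: "cafe" with an acute accent on the e
print(len(text))    # 4 (characters)

# Encoding turns characters into bytes. The bytes depend on the encoding chosen.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")
print(utf8_bytes)                          # b'caf\xc3\xa9'
print(cp1252_bytes)                        # b'caf\xe9'
print(len(utf8_bytes), len(cp1252_bytes))  # 5 4

# Decoding turns bytes back into a string, given the encoding they were written in.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)                  # True
```

The same four characters become five bytes in UTF-8 and four bytes in CP-1252, which is why asking whether the string itself "is UTF-8" has no answer: the encoding belongs to the bytes, not to the string.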