Server Encoding and Client Encoding

Learn what is available for encoding in PostgreSQL.

Encoding

An encoding is a particular representation of characters in bits and bytes. In the ASCII encoding, the letter A is encoded as the 7-bits byte 1000001, 65 in decimal, or 41 in hexadecimal. All those numbers are going to be written the same way on disk, and the letter A too.

Database encoding

The SQL_ASCII encoding is a trap we need to avoid falling into. To know which encoding our database is using, run the psql command \l. Type the following command after logging in to the database to see the output.

\l

The encoding here is UTF-8, which is the best choice these days, and we can see that the collation and ctype are English based in the UTF-8 encoding, which is good for our installation. We can, of course, pick something else.

The non-encoding SQL_ASCII accepts any data we throw at it, whereas the UTF-8 encoding (and some others) do check for valid input. The use of SQL_ASCII is not advised because we will not be able to retrieve data in any encoding and will lose data. Migrating away from SQL_ASCII to a proper encoding such as UTF-8 is possible but lossyInvolving or causing some loss of data and complex.

How PostgreSQL deals with encoding

We can also have a UTF-8 encoded database and use a legacy application (or programming language) that doesn’t know how to handle Unicode properly. In that case, we can ask PostgreSQL to convert all and any data on the fly between the server-side encoding and our favorite client-side encoding, thanks to the client_encoding setting.

show client_encoding;

Here again, we use UTF-8 client side, which allows handling the accentuated French characters we saw previously.

 client_encoding 
═════════════════
 UTF8
(1 row)

Be aware that not all combinations of server encoding and client encoding make sense. While it’s possible for PostgreSQL to communicate with our application using the latin1 encoding on the client side, if the server-side dataset includes texts in incompatible encodings, PostgreSQL will issue an error. Such texts might be written using non-Latin scripts such as Cyrillic, Chinese, Japanese, Arabic, or other languages.

Emacs facilityEmacs

Now, of course, we can’t have that data sent to us in latin1:

yesql# set client_encoding to latin1;
SET
yesql# select * from hello where language ~ 'Georgian';
ERROR:  character with byte sequence 0xe1 0x83 0xa5 in encoding "UTF8" ⏎
has no equivalent in encoding "LATIN1"
yesql# reset client_encoding;
RESET

Note: Please check it in the playground given in the last section.

So, if it’s possible, we’ll use UTF-8 encoding, which will be much simpler. It must be noted that Unicode encoding makes comparing and sorting text a rather costly operation. That said, being fast and wrong is not an option, so we’re still going to use Unicode text!

Try it yourself

The service for PostgreSQL has been started in the playground. Use the following command to list the databases and their details:

Get hands-on with 1300+ tech skills courses.