DIY: UTF-8 Validation

Solve the interview question "UTF-8 Validation" in this lesson.

Problem description

Given an integer array data, return whether it is a valid UTF-8 encoding.

A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

  • For a 1-byte character, the first bit of the byte is 0, and the remaining 7 bits hold its Unicode code point.
  • For an n-byte character (n from 2 to 4), the first n bits of the first byte are all 1s and bit n + 1 is 0. The first byte is followed by n - 1 continuation bytes, each with 10 as its two most significant bits.
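The two rules above can be expressed as bit-mask tests. The following is a minimal sketch, not the lesson's reference solution; the helper names sequenceLength and isContinuation are hypothetical and chosen only for illustration.

```cpp
// Hypothetical helper: returns the total number of bytes announced by a
// lead byte (1 to 4), or 0 if the byte cannot start a character.
int sequenceLength(int value) {
    int byte = value & 0xFF;               // keep only the low 8 bits
    if ((byte & 0x80) == 0x00) return 1;   // 0xxxxxxx
    if ((byte & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((byte & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((byte & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0;                              // 10xxxxxx or 11111xxx
}

// Hypothetical helper: true if the byte has the continuation form 10xxxxxx.
bool isContinuation(int value) {
    return (value & 0xC0) == 0x80;
}
```

Each mask keeps just the leading bits that the rule talks about, so one comparison decides which pattern a byte matches.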

This is how the UTF-8 encoding represents characters in specific ranges:

Char. number range (hexadecimal)  |  UTF-8 octet sequence (binary)
----------------------------------+--------------------------------------
0000 0000 - 0000 007F             |  0xxxxxxx
0000 0080 - 0000 07FF             |  110xxxxx 10xxxxxx
0000 0800 - 0000 FFFF             |  1110xxxx 10xxxxxx 10xxxxxx
0001 0000 - 0010 FFFF             |  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
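To see how the x positions in the table are filled, here is a rough sketch of the reverse direction: encoding a single code point into bytes. The function name encodeCodePoint is hypothetical, the sketch assumes the code point is at most 0x10FFFF, and it is not needed to solve the exercise.

```cpp
#include <vector>

// Hypothetical helper: encodes one code point using the layouts in the table.
// Assumes cp <= 0x10FFFF; no other checks are performed.
std::vector<int> encodeCodePoint(unsigned cp) {
    auto b = [](unsigned v) { return static_cast<int>(v); };
    if (cp <= 0x7F) {
        return { b(cp) };                                   // 0xxxxxxx
    }
    if (cp <= 0x7FF) {
        return { b(0xC0 | (cp >> 6)),                       // 110xxxxx
                 b(0x80 | (cp & 0x3F)) };                   // 10xxxxxx
    }
    if (cp <= 0xFFFF) {
        return { b(0xE0 | (cp >> 12)),                      // 1110xxxx
                 b(0x80 | ((cp >> 6) & 0x3F)),              // 10xxxxxx
                 b(0x80 | (cp & 0x3F)) };                   // 10xxxxxx
    }
    return { b(0xF0 | (cp >> 18)),                          // 11110xxx
             b(0x80 | ((cp >> 12) & 0x3F)),                 // 10xxxxxx
             b(0x80 | ((cp >> 6) & 0x3F)),                  // 10xxxxxx
             b(0x80 | (cp & 0x3F)) };                       // 10xxxxxx
}
```

For instance, code point 0x196 falls in the second row and encodes to the two bytes 198 and 150.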

Note: The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
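One way to honor this note is to mask every integer before inspecting it. A minimal sketch, with the hypothetical helper name lowByte:

```cpp
// Hypothetical helper: extracts the single byte that an input integer carries
// by discarding everything above the least significant 8 bits.
int lowByte(int value) {
    return value & 0xFF;
}
```

With this mask, 454 (binary 1 11000110) is treated the same as 198 (binary 11000110).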

Input

The input is a vector of integers, data. The following are two example inputs to the function:

// Example - 1
data = [198, 150, 9, 8]

// Example - 2
data = [255, 129, 129, 129, 129, 129, 129, 129]

Output

For the above inputs, the outputs will be:

// Example - 1
true

// Example - 2
false
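Writing the bytes in binary makes these results easier to follow. The sketch below uses std::bitset from the C++ standard library; the helper name toBinary is hypothetical.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: renders the low 8 bits of a value as a binary string.
std::string toBinary(int value) {
    return std::bitset<8>(static_cast<unsigned long long>(value & 0xFF)).to_string();
}
```

In Example 1, 198 is 11000110 (a 2-byte lead), 150 is 10010110 (a continuation byte), and 9 and 8 are 00001001 and 00001000 (two 1-byte characters), so the data is valid. In Example 2, 255 is 11111111, which matches none of the four lead-byte patterns, so the data is invalid.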

Coding exercise

For this coding exercise, you have to implement the validUtf8(data) function, where data represents a vector of integers. The function returns true or false depending on whether the given vector is a valid UTF-8 encoding.
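If you want something to compare your attempt against, here is one possible sketch, not necessarily the course's reference solution. It assumes the signature takes a const std::vector<int>& and checks only the structural rules stated in this lesson. It does not reject overlong encodings, surrogates, or code points above 0x10FFFF.

```cpp
#include <vector>

// One possible approach: scan left to right, tracking how many continuation
// bytes the current character still needs.
bool validUtf8(const std::vector<int>& data) {
    int remaining = 0;                       // continuation bytes still expected
    for (int value : data) {
        int byte = value & 0xFF;             // only the low 8 bits carry data
        if (remaining > 0) {
            if ((byte & 0xC0) != 0x80) {     // must look like 10xxxxxx
                return false;
            }
            --remaining;
        } else if ((byte & 0x80) == 0x00) {  // 0xxxxxxx: 1-byte character
            remaining = 0;
        } else if ((byte & 0xE0) == 0xC0) {  // 110xxxxx: 2-byte character
            remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) {  // 1110xxxx: 3-byte character
            remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) {  // 11110xxx: 4-byte character
            remaining = 3;
        } else {
            return false;                    // stray 10xxxxxx or 11111xxx
        }
    }
    return remaining == 0;                   // no character may be cut short
}
```

The final check matters: an input such as [240, 159, 152] starts a 4-byte character but ends one byte early, so it must be rejected.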
