Positions, Lines, and Columns

Learn how to get the line or column number of a character.

When we look at a cursor's position in a text editor, it is usually displayed as a line and a column (e.g., line 7, column 62). Strings, however, are represented in memory as a contiguous array of bytes. The question then becomes: how do we map a two-dimensional line and column onto a one-dimensional array of bytes?

Using the unpack function

We can unpack a string value in the following manner:

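<?php
$text = "one\ntwo\nthree four\nfive";
// 'C*' unpacks every byte as an unsigned char.
// The resulting array is keyed starting at 1, not 0.
$bytes = unpack('C*', $text);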

This returns the following list of bytes in decimal form:

array:23 [
     1 => 111
     2 => 110
     3 => 101
     4 => 10
     5 => 116
     6 => 119
     7 => 111
     8 => 10
     9 => 116
    10 => 104
    11 => 114
    12 => 101
    13 => 101
    14 => 32
    15 => 102
    16 => 111
    17 => 117
    18 => 114
    19 => 10
    20 => 102
    21 => 105
    22 => 118
    23 => 101
]
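
As a small illustration of how this byte list can be read, the sketch below scans it for the value 10 (the line feed character, "\n") and records where each one occurs. The variable name $newlineOffsets is only illustrative. Because unpack keys its result from 1, the sketch subtracts 1 to get zero-based string offsets.

```php
<?php
$text = "one\ntwo\nthree four\nfive";
$bytes = unpack('C*', $text);

// Collect the zero-based offset of every line feed (decimal 10).
$newlineOffsets = [];
foreach ($bytes as $index => $byte) {
    if ($byte === 10) {
        // unpack() keys start at 1, string offsets start at 0.
        $newlineOffsets[] = $index - 1;
    }
}

print_r($newlineOffsets); // offsets 3, 7, and 18
```

Each recorded offset marks the end of a line, so the number of offsets that come before a given position tells us how many lines precede it.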

Implementation of the getLineNumber method

Looking at the output above, we can glean a few clues about how to convert an arbitrary string position into its line and column numbers. For instance, we know that the decimal value 10 corresponds to the newline character ("\n"). To get the line number for a position within the string, we can count how many times this character appears before that position. The following code example demonstrates a simple way we could convert any zero-based string position into a line number:

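<?php
$text = "one\ntwo\nthree four\nfive";

// $position is a zero-based character position within $value.
function getLineNumber($value, $position) {
    // Count the newlines that appear before $position, then
    // add one to turn the zero-based count into a line number.
    return mb_substr_count(
        mb_substr($value, 0, $position),
        "\n"
    ) + 1;
}

// Get the line number of the first "o".
// Returns 1
getLineNumber($text, 0);

// Get the line number of the first "t".
// Returns 2
getLineNumber($text, 4);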

This initial implementation extracts the part of the original string that comes before the desired position and then counts the newline characters in that substring. The count is zero-based, so we add one to get a human-friendly line number. This technique works well on smaller strings, but every call copies everything before the position into a new substring, which can become quite memory intensive on huge strings. We will explore ways to address this later.
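
One possible way to avoid the intermediate substring, shown here only as a sketch and not necessarily the approach taken later, is to let substr_count restrict its search through its offset and length arguments. The helper name getLineNumberByByteOffset is hypothetical, and it assumes the position is a byte offset rather than a character position; the two are identical for ASCII text such as our sample string.

```php
<?php
// Hypothetical helper (not part of the lesson's code): counts the
// newlines before a byte offset without extracting a substring first.
function getLineNumberByByteOffset(string $value, int $byteOffset): int
{
    // Clamp the offset so an out-of-range value cannot make
    // substr_count() fail.
    $byteOffset = max(0, min($byteOffset, strlen($value)));

    // The third and fourth arguments limit the search to the first
    // $byteOffset bytes of $value.
    return substr_count($value, "\n", 0, $byteOffset) + 1;
}

$text = "one\ntwo\nthree four\nfive";

echo getLineNumberByByteOffset($text, 0), "\n";  // 1
echo getLineNumberByByteOffset($text, 4), "\n";  // 2
echo getLineNumberByByteOffset($text, 19), "\n"; // 4
```

Counting bytes is safe for UTF-8 input because the newline byte never appears inside a multibyte character, but callers would need to pass byte offsets (for example, from strpos) instead of the character positions that mb_substr expects.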

Implementation of the getColumnNumber method

We can employ a similar strategy ...