CSV Parsing: The Property
Learn to design the right properties and generators for our example application.
CSV format
CSV is a loose format that nobody really implements the same way. This can be quite confusing even though RFC 4180 tries to provide a simple specification:
- Each record is on a separate line, separated by CRLF (a \r followed by a \n).
- The last record of the file may or may not have a CRLF after it. This is optional.
- The first line of the file may be a header line, ending with a CRLF. In our case, the problem description includes a header, which will be assumed to always be there.
- Commas go between fields of a record.
- Any spaces are considered to be part of a field. The example in the problem description doesn’t respect that, since it adds a space after each comma even though it’s clearly not part of the field value.
- Double quotes (") can be used to wrap a given field. Fields that contain line breaks (CRLF), double quotes, or commas must be wrapped in double quotes.
- All records in a document contain the same number of fields.
- A double quote within a double-quoted field can be escaped by preceding it with another double quote ("a""b" means a"b).
- Field values or header names can be empty.
- Valid characters for records include only the following special and alphanumeric characters:
! #$%&'()*+-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]`^_abcdefghijklmnopqrstuvwxyz{|}~
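To make the rules above concrete, here is a minimal Python sketch of an encoder that follows them. The names `TEXTDATA`, `encode_field`, `encode_record`, and `encode` are hypothetical and chosen only for illustration; they are not the implementation this lesson builds, and the sketch covers only the rules listed above.

```python
# Unquoted characters allowed by the list above:
# printable ASCII (space through ~) minus the double quote and the comma.
TEXTDATA = {chr(c) for c in range(0x20, 0x7F)} - {'"', ','}


def encode_field(value: str) -> str:
    """Wrap a field in double quotes only when the rules require it."""
    if any(ch in value for ch in ('"', ',', '\r', '\n')):
        # An embedded double quote is escaped by doubling it.
        return '"' + value.replace('"', '""') + '"'
    # Spaces are part of the field, so nothing is trimmed or padded.
    return value


def encode_record(fields) -> str:
    """Join the encoded fields of one record with commas."""
    return ",".join(encode_field(f) for f in fields)


def encode(header, rows) -> str:
    """Encode a header line plus records, separated by CRLF."""
    for row in rows:
        if len(row) != len(header):
            raise ValueError("all records must have the same number of fields")
    lines = [encode_record(header)] + [encode_record(r) for r in rows]
    # The CRLF after the last record is optional; this sketch omits it.
    return "\r\n".join(lines)
```

For example, `encode_field('a"b')` returns `"a""b"`, while `encode_field(' a')` returns the value unchanged, leading space included. An encoder like this pairs naturally with a decoder in a round-trip property: decoding the encoded output should give back the original fields.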
This means the official CSV specs won’t let us have employees whose names don’t fit that pattern. We can always extend the tests later for better customizations, but for now we’ll implement this ...