CSV Parsing: The CSV Parser
Take a look at the implementation of the CSV parser in our example application.
The CSV parser
We can now move on to implementing a CSV parser. Here is a possible implementation:
defmodule Bday.Csv do
  def encode([]), do: ""
  def encode(maps) do
    keys = Enum.map_join(Map.keys(hd(maps)), ",", &escape(&1))
    vals =
      for map <- maps, do: Enum.map_join(Map.values(map), ",", &escape(&1))
    to_string([keys, "\r\n", Enum.join(vals, "\r\n")])
  end

  def decode(""), do: []
  def decode(csv) do
    {headers, rest} = decode_header(csv, [])
    rows = decode_rows(rest)
    for row <- rows, do: Map.new(Enum.zip(headers, row))
  end
end
First, there’s the public interface with two functions: encode/1 and decode/1.
The functions are fairly straightforward, delegating the more complex operations to private helper functions. Let’s start by looking at those helping with encoding:
defp escape(field) do
  if escapable(field) do
    ~s|"| <> do_escape(field) <> ~s|"|
  else
    field
  end
end

defp escapable(string) do
  String.contains?(string, [~s|"|, ",", "\r", "\n"])
end

defp do_escape(""), do: ""
defp do_escape(~s|"| <> str), do: ~s|""| <> do_escape(str)
defp do_escape(<<char>> <> rest), do: <<char>> <> do_escape(rest)
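To try the escaping helpers on their own, the sketch below copies them into a hypothetical module named EscapeSketch (not part of the example application) and makes escape/1 public so it can be called directly. The sample inputs are made up for illustration.

```elixir
# Hypothetical demo module: the escaping helpers from above, copied
# verbatim, with escape/1 made public so it can be called from outside.
defmodule EscapeSketch do
  def escape(field) do
    if escapable(field) do
      ~s|"| <> do_escape(field) <> ~s|"|
    else
      field
    end
  end

  # A field needs quoting if it contains a quote, a comma, or a line break.
  defp escapable(string) do
    String.contains?(string, [~s|"|, ",", "\r", "\n"])
  end

  # Walk the string one byte at a time, doubling every double quote.
  defp do_escape(""), do: ""
  defp do_escape(~s|"| <> str), do: ~s|""| <> do_escape(str)
  defp do_escape(<<char>> <> rest), do: <<char>> <> do_escape(rest)
end

IO.puts(EscapeSketch.escape("plain"))       # prints: plain
IO.puts(EscapeSketch.escape("a,b"))         # prints: "a,b"
IO.puts(EscapeSketch.escape(~s|say "hi"|))  # prints: "say ""hi"""
```

Fields without special characters pass through untouched, so only the values that would otherwise be ambiguous pay the cost of quoting.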
If a string is judged to need escaping (according to escapable/1), then the string is wrapped in double quotes (") and all double quotes inside of it are escaped with another double quote. With this, encoding is covered. Next, there are decoding’s private functions:
defp decode_header(string, acc) do
  case decode_name(string) do
    {:ok, name, rest} -> decode_header(rest, [name | acc])
    {:done, name, rest} -> {[name | acc], rest}
  end
end

defp decode_rows(string) do
  case decode_row(string, []) do
    {row, ""} -> [row]
    {row, rest} -> [row | decode_rows(rest)]
  end
end

defp decode_row(string, acc) do
  case decode_field(string) do
    {:ok, field, rest} -> decode_row(rest, [field | acc])
    {:done, field, rest} -> {[field | acc], rest}
  end
end

defp decode_name(~s|"| <> rest), do: decode_quoted(rest)
defp decode_name(string), do: decode_unquoted(string)

defp decode_field(~s|"| <> rest), do: decode_quoted(rest)
defp decode_field(string), do: decode_unquoted(string)
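One detail worth noticing in the code above: decode_header/2 and decode_row/2 both prepend to their accumulator, so column names and fields come back in reverse order. Since decode/1 pairs them up with Enum.zip/2, this appears to be harmless as long as both lists are reversed the same way. The sketch below illustrates that with made-up values; ZipSketch and to_map/2 are hypothetical names used only for this demonstration.

```elixir
# Hypothetical helper that mirrors the last line of decode/1:
# pair each header with the field at the same position.
defmodule ZipSketch do
  def to_map(headers, row), do: Map.new(Enum.zip(headers, row))
end

# For a file with the header line "name,age" and the row "Alice,42",
# the accumulators would hold the values in reverse reading order.
headers = ["age", "name"]
row = ["42", "Alice"]

IO.inspect(ZipSketch.to_map(headers, row))
# => %{"age" => "42", "name" => "Alice"}
```

Because maps are unordered, the resulting map is the same one that in-order lists would have produced.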
Decoding is done by fetching the headers, then fetching all of the rows. A header line is parsed by reading each column name one at a time, and a row is parsed by reading each field one at a time. At the end we can see that both fields and names are actually implemented as quoted or unquoted strings:
defp decode_quoted(string), do: decode_quoted(string, "")

defp decode_quoted(~s|"|, acc), do: {:done, acc, ""}
defp decode_quoted(~s|"\r\n| <> rest, acc), do: {:done, acc, rest}
defp decode_quoted(~s|",| <> rest, acc), do: {:ok, acc, rest}
defp decode_quoted(~s|""| <> rest, acc) do
  decode_quoted(rest, acc <> ~s|"|)
end
defp decode_quoted(<<char>> <> rest, acc) do
  decode_quoted(rest, acc <> <<char>>)
end

defp decode_unquoted(string), do: decode_unquoted(string, "")

defp decode_unquoted("", acc), do: {:done, acc, ""}
defp decode_unquoted("\r\n" <> rest, acc), do: {:done, acc, rest}
defp decode_unquoted("," <> rest, acc), do: {:ok, acc, rest}
defp decode_unquoted(<<char>> <> rest, acc) do
  decode_unquoted(rest, acc <> <<char>>)
end
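To make the return values of these two readers concrete, here is a small sketch that copies them into a hypothetical module named FieldSketch and exposes the one-argument versions publicly. The inputs are invented for illustration. Note that decode_quoted/1 expects the text after the opening quote, since decode_name/1 and decode_field/1 strip that quote before calling it.

```elixir
# Hypothetical demo module: the two field readers from above, copied
# verbatim, with the arity-1 entry points made public.
defmodule FieldSketch do
  def decode_quoted(string), do: decode_quoted(string, "")

  defp decode_quoted(~s|"|, acc), do: {:done, acc, ""}
  defp decode_quoted(~s|"\r\n| <> rest, acc), do: {:done, acc, rest}
  defp decode_quoted(~s|",| <> rest, acc), do: {:ok, acc, rest}
  defp decode_quoted(~s|""| <> rest, acc) do
    decode_quoted(rest, acc <> ~s|"|)
  end
  defp decode_quoted(<<char>> <> rest, acc) do
    decode_quoted(rest, acc <> <<char>>)
  end

  def decode_unquoted(string), do: decode_unquoted(string, "")

  defp decode_unquoted("", acc), do: {:done, acc, ""}
  defp decode_unquoted("\r\n" <> rest, acc), do: {:done, acc, rest}
  defp decode_unquoted("," <> rest, acc), do: {:ok, acc, rest}
  defp decode_unquoted(<<char>> <> rest, acc) do
    decode_unquoted(rest, acc <> <<char>>)
  end
end

# A comma ends the field but not the line, so the tag is :ok.
IO.inspect(FieldSketch.decode_unquoted("ab,cd\r\nxy"))
# => {:ok, "ab", "cd\r\nxy"}

# A CRLF (or the end of input) ends the line, so the tag is :done.
IO.inspect(FieldSketch.decode_unquoted("cd\r\nxy"))
# => {:done, "cd", "xy"}

# Inside a quoted field, a doubled quote decodes to a single quote.
IO.inspect(FieldSketch.decode_quoted(~s|a""b",c|))
# => {:ok, "a\"b", "c"}
```

The :ok and :done tags are what let decode_header/2 and decode_row/2 decide whether to keep reading fields or to finish the current line.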
Both functions that read ...