...

/

Stream-based Dataflow Programming

Stream-based Dataflow Programming

Understand how streams allow us to formulate dataflow programs that work with signals of infinite elements.

Limitations of list-based dataflow programming

The functional paradigm allows us to formulate programs in the dataflow style by composing function lists, such as map, filter, and fold. Unfortunately, list-based dataflow programming in OCaml has a severe limitation—lists are finite.

To illustrate, suppose we implement a function called first_prime_greater_equal. This function returns the first prime number greater than or equal to a given n. We can attempt to formulate a dataflow program to solve this problem with the following steps:

  • Enumerate integers greater than or equal n
  • Filter the list, keeping only prime numbers
  • Take the head of the resulting list

In other words, we attempt to formulate the following dataflow diagram:

To differentiate an infinite list from a finite one, we will write infinitely many elements in triangle brackets, e.g., <100; 101; 102; ...>.

Unfortunately, as elegant as the dataflow diagram is, we can not formulate it. The enumerate and filter components need to work with signals of infinitely many elements. However, OCaml lists can only hold a finite number of elements. To see why, let’s try to define a list of infinite natural numbers: <0; 1; 2; ...> with the following recursive function:

let rec naturals_from n = n :: naturals_from (n + 1)

We applythat function to 0 to obtain the sequence of naturals ( 0, 1, 2,...).

let naturals = naturals_from 0
Press + to interact
(** [naturals_from n] is a list of naturals greater than or equal [n]. *)
let rec naturals_from n = n :: naturals_from (n + 1)
(** [naturals] is a list containing all natural numbers. *)
let naturals = naturals_from 0

If we run the code above, we’ll see a stack overflow exception. This is because when we try to apply naturals_from 0, it calls naturals_from 1, which calls naturals_from 2, and so forth. This continues until the stack limit is reached. In other words, because OCaml is a strict language, the hd :: tl list constructor always evaluates the two arguments, hd and tl, regardless of whether their values are needed or not.

To circumvent the problem of infinite elements, we can define a first_prime_between function that returns the first prime number between a and b. Since the [a, b] range is finite, we can formulate a list-based dataflow program to solve this problem.

We can implement first_prime_between in OCaml as follows.

Press + to interact
(** [enumerate_integers a b] returns a list of integers between [a] and [b]. *)
let rec enumerate_integers a b = if a > b then [] else a :: enumerate_integers (a + 1) b
(** [is_prime n] is [true] if [n] is a prime or [false] otherwise. *)
let rec is_prime n =
if n < 2 then false else if n = 2 then true else
let rec aux m = if n mod m = 0 then false
else if m * m > n then true else aux (m + 1) in aux 2
(** [first_prime_between a b] is the first prime number between [a] and [b]. *)
let first_prime_between a b = enumerate_integers a b |> List.filter is_prime |> List.hd
(** Printing the result of 'first_prime_between 100 1000'. *)
let _ = print_int (first_prime_between 100 1000)

Although this function works, it’s incredibly inefficient. It enumerates all integers from 100 to 1000, and filter checks every single number to see if it’s prime or not. It would be much more efficient to incrementally check whether a candidate is a prime number starting from a (100 in this example). Once a prime number is found, the execution stops.

In an imperative programming language ...