Changing Probabilities

Learn how changing probabilities works and how to use the technique in making custom generators.

Why is changing probabilities required?

To understand why and when changing probabilities is useful, let’s take a look at an example. Let’s imagine that we have a generator that produces ISO Latin1 strings by filtering string() with the io_lib:printable_latin1_list function, which rejects any string that contains a character outside the printable latin1 range:

latin1_string() ->
    ?SUCHTHAT(S, string(), io_lib:printable_latin1_list(S)).

The same approach works for Unicode. Ensuring that nothing goes out of range looks like the following:

unicode_string() ->
    ?SUCHTHAT(S, string(), io_lib:printable_unicode_list(S)).

With transforms, filters, and resizes, we can get pretty far in terms of retargeting our generators to do what we want. The latin1 generator shows something interesting, though. The default string() generator has a large search space, and therefore filtering out the unwanted data can be expensive. On the other hand, most Unicode characters can’t be represented within latin1, and transforming the generated strings themselves would also be expensive: How would we map emojis to latin1 characters?

We can’t solve this problem efficiently with transforms and filters alone, and for this specific issue, resizing wouldn’t be of much help either. Instead, we’ll have to build our own generators while controlling probabilities to make them do what we want.
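To get a feel for how much work the filter throws away, we can measure how often a plain string() sample already passes the latin1 check. The following is only a rough sketch: the module and function names (filter_cost, latin1_hit_rate) are hypothetical, and it assumes PropEr is on the code path. Note that proper_gen:pick/1 uses a small default size, so short and empty strings, which pass trivially, are overrepresented compared to a real test run.

```erlang
-module(filter_cost).
-export([latin1_hit_rate/1]).

%% Draw N samples from PropEr's default string() generator and return the
%% fraction that io_lib:printable_latin1_list/1 accepts.
%% (Hypothetical helper, for illustration only.)
latin1_hit_rate(N) when is_integer(N), N > 0 ->
    Hits = length([ok || _ <- lists:seq(1, N),
                         begin
                             {ok, S} = proper_gen:pick(proper_types:string()),
                             io_lib:printable_latin1_list(S)
                         end]),
    Hits / N.
```

Calling filter_cost:latin1_hit_rate(1000) in the shell returns a number between 0 and 1. Every sample that fails the check is one that ?SUCHTHAT would have had to discard and regenerate.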

Changing probabilities

The last fundamental building block that really gives us control over data generation is the ability to tweak the probabilities of how data is generated. By default, the generators provided either cover a very large space, like string(), number(), or binary(), or a rather narrow one, such as boolean() or range(X,Y).

Using ?LET() allows us to transform all of the data, and ?SUCHTHAT() allows us to remove some of it. But it’s difficult to achieve a middle ground between the two. When we truly need a custom solution, probabilistic generators can help.
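As a quick reminder of the two extremes, here is a small sketch that builds even integers both ways. The module and function names are hypothetical and chosen only for this illustration.

```erlang
-module(even_gen).
-include_lib("proper/include/proper.hrl").
-export([even_by_transform/0, even_by_filter/0]).

%% ?LET transforms every generated value: any integer N becomes 2*N,
%% so nothing is thrown away.
even_by_transform() ->
    ?LET(N, integer(), N * 2).

%% ?SUCHTHAT removes values: odd integers are rejected and regenerated.
even_by_filter() ->
    ?SUCHTHAT(N, integer(), N rem 2 =:= 0).
```

Both generators only ever yield even numbers, but neither lets us say "mostly small values, occasionally a large one." That kind of steering is what probabilistic generators are for.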

We already had a look at oneof(ListOfGenerators) in the Collecting lesson, where it helped us get more repeatable keys with the following generator:

key() -> oneof([range(1,10), integer()]).

This shows how two distinct generators can be used together to help build and steer things in the direction we want. The oneof(ListOfGenerators) generator is simple and useful, but the more interesting generator is frequency(), which takes a list of {Weight, Generator} pairs and so lets us control the probability of each generator it contains.
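To make the difference concrete, here is the key() generator from above next to a weighted variant. This is a sketch: the module name key_gen, the function weighted_key(), and the 3-to-1 weights are assumptions picked for illustration, not values from the lesson.

```erlang
-module(key_gen).
-include_lib("proper/include/proper.hrl").
-export([key/0, weighted_key/0]).

%% From the text: either generator may be chosen for each value.
key() ->
    oneof([range(1, 10), integer()]).

%% Hypothetical variant: with weights 3 and 1, the small range is chosen
%% about three times out of four, so keys repeat more often.
weighted_key() ->
    frequency([{3, range(1, 10)},
               {1, integer()}]).
```

The weights are relative, so {3, G1} and {1, G2} behave the same as {75, G1} and {25, G2}.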

Let’s take strings as an example since they were already causing us problems. Just using string() tended to yield a lot of control characters, extremely variable codepoints, and very few latin1 or ASCII characters. Let’s look at how frequency() can help us with this problem.
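A minimal sketch of a text-like generator built with frequency() could look like the following. The module and function names match the shell commands used in this lesson, but the character classes and weights shown here are illustrative assumptions and can be tuned freely.

```erlang
-module(prop_generators).
-include_lib("proper/include/proper.hrl").
-export([text_like/0]).

%% A list of characters where lowercase letters dominate, spaces are
%% common, and line breaks, punctuation, and digits are rare.
%% The weights are assumptions chosen for illustration.
text_like() ->
    list(frequency([{80, range($a, $z)},                % letters
                    {10, $\s},                          % whitespace
                    {1,  $\n},                          % line break
                    {1,  oneof([$., $-, $!, $?, $,])},  % punctuation
                    {1,  range($0, $9)}])).             % digits
```

Because every character comes from a class we picked ourselves, nothing has to be filtered out afterward, and all of the output is already within ASCII and therefore within latin1.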

Run the following commands in the Erlang shell and compare the results.

  • Run proper_gen:pick(proper_types:string()). to see PropEr’s default string generation.
  • Run proper_gen:pick(prop_generators:text_like()). to see our generator in action.
  • Run proper_gen:pick(proper_types:resize(79, prop_generators:text_like())). to run the same generator with the size parameter fixed at 79.
