This directory contains tests for serialized objects from the regex-automata crate. Currently, only two such objects are supported: dense and sparse DFAs.

The idea behind these tests is to commit some serialized objects and then run basic tests against them: deserialize each object, run searches with it, and check that the results are correct. We also make sure these tests are run under Miri, since deserialization is one of the biggest places where undefined behavior might occur in this crate (at the time of writing).

The main thing we're testing is that the current code can still deserialize old objects correctly. Generally speaking, compatibility extends to semver-compatible releases of this crate. Beyond that, no promises are made, although in practice callers can at least depend on an error occurring when they try to deserialize an incompatible object. (The serialized format always includes a version number, and incompatible changes increment that version number, so that an error occurs whenever an unsupported version is detected.)
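
To make the version check concrete, here is a minimal sketch of the general technique in Rust. It is a toy format, not the actual layout used by regex-automata: the 4-byte little-endian header, the SUPPORTED_VERSION constant, the DeserializeError type and the serialize/deserialize functions are all hypothetical names made up for illustration.

```rust
/// Hypothetical format version understood by this toy reader.
const SUPPORTED_VERSION: u32 = 2;

/// Hypothetical error type for the toy format.
#[derive(Debug, PartialEq, Eq)]
enum DeserializeError {
    /// The buffer is shorter than the 4-byte version header.
    TooShort,
    /// The header names a version this reader does not support.
    UnsupportedVersion { expected: u32, found: u32 },
}

/// Write a toy object: a little-endian version number, then the payload.
fn serialize(version: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the version header first, and only then hand back the payload.
fn deserialize(bytes: &[u8]) -> Result<&[u8], DeserializeError> {
    if bytes.len() < 4 {
        return Err(DeserializeError::TooShort);
    }
    let (head, payload) = bytes.split_at(4);
    let found = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if found != SUPPORTED_VERSION {
        return Err(DeserializeError::UnsupportedVersion {
            expected: SUPPORTED_VERSION,
            found,
        });
    }
    Ok(payload)
}
```

In this sketch, an object written with version 1 is rejected with UnsupportedVersion instead of being misread, which is the kind of guarantee described above for incompatible releases.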

To generate the dense DFAs, I used this command:

$ regex-cli generate serialize dense regex \
    MULTI_PATTERN_V2 \
    tests/gen/dense/ \
    --rustfmt \
    --safe \
    --starts-for-each-pattern \
    --specialize-start-states \
    --start-kind both \
    --unicode-word-boundary \
    --minimize \
    '\b[a-zA-Z]+\b' \
    '(?m)^\S+$' \
    '(?Rm)^\S+$'

And to generate the sparse DFAs, I used the same command as above, but with s/dense/sparse/g:

$ regex-cli generate serialize sparse regex \
    MULTI_PATTERN_V2 \
    tests/gen/sparse/ \
    --rustfmt \
    --safe \
    --starts-for-each-pattern \
    --specialize-start-states \
    --start-kind both \
    --unicode-word-boundary \
    --minimize \
    '\b[a-zA-Z]+\b' \
    '(?m)^\S+$' \
    '(?Rm)^\S+$'

The idea is to try to enable as many of the DFA's options as possible in order to test that serialization works for all of them.

Arguably we should increase test coverage here, but this is a start. Note in particular that these tests do not need to check that serialization and deserialization round-trip correctly on their own. Indeed, the normal regex test suite has a test that does a serialization round trip for every test supported by DFAs, so that already has very good coverage. What we're interested in testing here is our compatibility promise: do DFAs generated with an older revision of the code still deserialize correctly?