Generating and storing a list of strings does allow that list to be reused later, but the I/O involved turned out to be quite costly. Generation can run at up to 500 MiB/s, and even with on-the-fly compression the output cannot be written to an HDD fast enough to keep up. Compression is also necessary just to store this amount of data: generation reached 2 TB of raw data with a word length of just three, which is still 600 GB compressed. At the same time, compression makes working with that data a lot harder. So instead, generation and search are combined here into a single step. The intermediate result of the generation is therefore lost, but the overall pipeline is much faster.
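To illustrate the idea of fusing generation and search, here is a minimal Python sketch. It is not the project's actual implementation: the names `generate_candidates` and `search_stream`, the way candidates are built (concatenating `length` entries from a small vocabulary), and the match predicate are all assumptions made for the example. The point is only that each candidate is tested as soon as it is produced and then discarded, so nothing is written to disk.

```python
from itertools import product
from typing import Callable, Iterable, Iterator


def generate_candidates(words: Iterable[str], length: int) -> Iterator[str]:
    """Lazily yield every concatenation of `length` words (hypothetical generator)."""
    vocabulary = list(words)
    for combo in product(vocabulary, repeat=length):
        # Each string exists only while it is being handed to the consumer.
        yield "".join(combo)


def search_stream(
    candidates: Iterable[str], matches: Callable[[str], bool]
) -> Iterator[str]:
    """Test candidates as they arrive; non-matching ones are simply dropped."""
    return (candidate for candidate in candidates if matches(candidate))


if __name__ == "__main__":
    # Toy vocabulary and predicate, chosen purely for illustration.
    candidates = generate_candidates(["a", "b", "c"], length=3)
    for hit in search_stream(candidates, lambda s: s.startswith("ab")):
        print(hit)
```

Because both functions are lazy, memory use stays constant regardless of how many candidates are generated. The trade-off is the one described above: a second search with a different predicate has to regenerate everything.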