Guide
Parallelisation strategy
The code is designed based on the following assumptions:
Input data are stored in individual 2D slices. Reading part of a single slice requires reading the whole slice into memory, and this is an expensive operation.
Writing a single chunk of output data is an expensive operation.
Reading a single chunk of output data is a cheap operation.
If we have input slices of shape (nx, ny) and an output chunk shape of (nc, nc, nc), it makes sense to split the conversion into individual shards that have shape (nx, ny, nc).
This means each input slice belongs to exactly one shard (a shard covers nc consecutive slices), so each shard can be written without interfering with the other shards.
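The slice-to-shard bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the stack-to-chunk implementation: the function names `shard_index` and `shard_slices` are hypothetical, and it assumes slices are numbered from 0 along the third axis, with the last shard allowed to be shorter when the number of slices is not a multiple of nc.

```python
def shard_index(z: int, nc: int) -> int:
    """Return the index of the shard that contains input slice z."""
    return z // nc


def shard_slices(shard: int, nc: int, nz: int) -> range:
    """Return the input slice indices covered by one shard.

    The final shard is truncated if nz is not a multiple of nc.
    """
    start = shard * nc
    return range(start, min(start + nc, nz))


# Hypothetical sizes: 10 input slices and a chunk depth of 4.
nz, nc = 10, 4
n_shards = -(-nz // nc)  # ceiling division

for shard in range(n_shards):
    print(shard, list(shard_slices(shard, nc, nz)))
```

Because every slice index falls in exactly one of these ranges, separate processes can each take one shard without writing to the same output chunks.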
Third-party multi-threading
stack-to-chunk turns off third-party multi-threading in blosc when running.
This allows the n_processes argument to be respected when set to 1, and
prevents issues when stack-to-chunk uses a larger number of parallel processes.
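For reference, one way to switch off Blosc's internal threads from user code is the module-level `use_threads` setting in numcodecs. The sketch below assumes that numcodecs provides the Blosc codec in use. It is not necessarily how stack-to-chunk does this internally, and the helper name `disable_blosc_threads` is hypothetical.

```python
def disable_blosc_threads() -> bool:
    """Turn off Blosc's internal threading, if numcodecs is installed.

    Returns True if the setting was applied, False if numcodecs
    could not be imported.
    """
    try:
        from numcodecs import blosc
    except ImportError:
        return False
    # Module-level switch in numcodecs: False makes Blosc compress and
    # decompress in the calling thread only.
    blosc.use_threads = False
    return True


if __name__ == "__main__":
    print("blosc threads disabled:", disable_blosc_threads())
```

With Blosc restricted to a single thread, the number of busy cores is determined by the number of worker processes alone.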
Zarr group layout
The zarr groups produced by stack-to-chunk contain zarr arrays that are labelled 0, 1, 2, 3, and so on.
The array at 0 is the full-resolution dataset, and the array at \(i\) is downsampled by a factor of \(2^{i}\) relative to it.
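As a rough illustration of this layout, the sketch below lists the downsampling factor for the first few levels and an approximate array shape for each. The helper names are hypothetical, the full-resolution shape is made up, and two things are assumptions rather than documented stack-to-chunk behaviour: that the factor applies along every axis, and that odd sizes round up.

```python
import math


def downsample_factor(level: int) -> int:
    """Downsampling factor of the array stored at a given level."""
    return 2**level


def approx_level_shape(full_shape: tuple[int, ...], level: int) -> tuple[int, ...]:
    """Approximate shape at a level, assuming every axis is downsampled
    by the same factor and sizes are rounded up (both are assumptions)."""
    factor = downsample_factor(level)
    return tuple(math.ceil(n / factor) for n in full_shape)


# Hypothetical full-resolution shape.
full_shape = (1000, 800, 600)

for level in range(4):
    print(level, downsample_factor(level), approx_level_shape(full_shape, level))
```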