Tutorial#

This page walks through converting a set of 2D image files into a 3D chunked zarr dataset.

To run this tutorial, you will need to install stack-to-chunk, matplotlib, scikit-image (imported as skimage), and tifffile.

import pathlib
import sys
import tempfile

import matplotlib.pyplot as plt
import skimage.color
import skimage.data
import tifffile
import zarr
from dask.array.image import imread
from loguru import logger
from pydantic_zarr.v3 import ArraySpec

import stack_to_chunk

Generating sample data#

We’ll start by generating a set of sample data to downsample. To do this we’ll just save 35 copies of a grayscale cat to a temporary directory.

data_2d = skimage.color.rgb2gray(skimage.data.cat())
temp_dir = tempfile.TemporaryDirectory()
temp_dir_path = pathlib.Path(temp_dir.name)
slice_dir = temp_dir_path / "slices"
slice_dir.mkdir()

for i in range(35):
    tifffile.imwrite(slice_dir / f"{str(i).zfill(3)}.tif", data_2d.T)

plt.imshow(data_2d, cmap="gray")
<matplotlib.image.AxesImage object at 0x7d685833d670>

Setting up input#

stack-to-chunk takes a 3D dask array as input. dask provides an interface to lazily load each slice as and when it’s needed. So although the dask array we create looks and behaves like an array, no data is actually read in from the TIFF files at this point.

This also makes stack-to-chunk flexible - as long as you can put your 2D images into a 3D dask array, they can be used with stack-to-chunk.

For this tutorial, dask.array.image.imread provides a convenient way for us to read in all our TIFF files:

images = imread(str(slice_dir / "*.tif")).T
print(images)
dask.array<transpose, shape=(300, 451, 35), dtype=float64, chunksize=(300, 451, 1), chunktype=numpy.ndarray>

A few things to note here:
  • We have a single 3D dask array, with the array axes being the x, y, z axes of the image.

  • The chunk size of the dask array is (nx, ny, 1), i.e. each individual slice (corresponding to an individual file on disk) is a chunk in the dask array.
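To make the chunk layout concrete, here is a small pure-Python sketch (no dask required) that counts the chunks along each axis for the shape and chunk size printed above. The helper name chunks_per_axis is hypothetical and is not part of dask or stack-to-chunk.

```python
import math


def chunks_per_axis(shape, chunksize):
    """Number of chunks along each axis, using ceiling division."""
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunksize))


shape = (300, 451, 35)     # (nx, ny, nz), taken from the printed dask array
chunksize = (300, 451, 1)  # one full slice per chunk

grid = chunks_per_axis(shape, chunksize)
n_chunks = math.prod(grid)
print(grid, n_chunks)  # (1, 1, 35) 35
```

One chunk per slice means that reading a single z index only touches a single TIFF file.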

Running stack-to-chunk#

The starting point for running stack-to-chunk is creating a MultiScaleGroup. This represents a local zarr group that will contain the output multi-scale dataset.

Once we’ve created it, we can print the levels property to see which levels the group contains. No image data has been copied into the group at this point.

We’ll also enable logging here, so we can see the progress messages that stack-to-chunk emits:

logger.enable("stack_to_chunk")
logger.add(sys.stdout, level="INFO")

group = stack_to_chunk.MultiScaleGroup(
    temp_dir_path / "chunked.ome.zarr",
    name="my_zarr_group",
    spatial_unit="centimeter",
    voxel_size=(3, 4, 5),
    array_spec=ArraySpec.from_zarr(
        zarr.empty(images.shape, chunks=(16, 16, 16), dimension_names=("z", "y", "x"))
    ),
)
print(group.levels)
2026-05-03 07:02:34.048 | INFO     | stack_to_chunk.main:_add_sharding_codec:174 - Adding sharding codec with shard shape (304, 464, 16)
[0]
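The shard shape in the log message, (304, 464, 16), is consistent with rounding the x and y extents of the data up to the next multiple of the 16-voxel chunk size, with a single chunk along z. The sketch below only reproduces that arithmetic. It is inferred from the logged numbers rather than taken from the stack-to-chunk source, and round_up is a hypothetical helper.

```python
def round_up(n, multiple):
    """Round n up to the nearest multiple of `multiple`."""
    return -(-n // multiple) * multiple


nx, ny = 300, 451  # x and y extents of the input array
chunk = 16         # chunk edge length passed to zarr.empty above

# Assumption: a shard spans the full, padded x-y plane and one chunk in z.
shard_guess = (round_up(nx, chunk), round_up(ny, chunk), chunk)
print(shard_guess)  # (304, 464, 16)
```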

The first step in creating new data in the group is to make a copy of the data slices without any downsampling. Before doing this, let’s do a quick check of how much memory each process will use when we run stack-to-chunk:

bytes_per_process = stack_to_chunk.memory_per_slab_process(images, chunk_size=16)
print(f"Each process will use {bytes_per_process / 1e6:.1f} MB")
Each process will use 17.3 MB
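The reported figure matches the size of one slab of 16 full slices held in memory as float64 values. The arithmetic below is a back-of-the-envelope check based on the array shape and dtype printed earlier. It is not a description of how memory_per_slab_process is implemented.

```python
nx, ny = 300, 451     # slice dimensions, from the dask array shape
slices_per_slab = 16  # matches chunk_size=16
bytes_per_value = 8   # float64

slab_bytes = nx * ny * slices_per_slab * bytes_per_value
print(f"{slab_bytes / 1e6:.1f} MB")  # 17.3 MB
```

For real datasets the same estimate gives a rough idea of how many processes fit in the available RAM.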

And finally, let’s create our first data copy:

group.add_full_res_data(images, n_processes=1)
2026-05-03 07:02:34.062 | INFO     | stack_to_chunk.main:add_full_res_data:291 - Setting up copy to zarr...
2026-05-03 07:02:34.064 | INFO     | stack_to_chunk.main:add_full_res_data:296 - Each process will read ~17.32 MB into memory
2026-05-03 07:02:34.072 | INFO     | stack_to_chunk.main:add_full_res_data:314 - Starting full resolution copy to zarr...
2026-05-03 07:02:34.073 | INFO     | stack_to_chunk._array_helpers:_copy_slab:28 - Reading z=0 -> 15
2026-05-03 07:02:34.073 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=0
2026-05-03 07:02:34.082 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=1
2026-05-03 07:02:34.085 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=2
2026-05-03 07:02:34.089 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=3
2026-05-03 07:02:34.092 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=4
2026-05-03 07:02:34.095 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=5
2026-05-03 07:02:34.099 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=6
2026-05-03 07:02:34.102 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=7
2026-05-03 07:02:34.105 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=8
2026-05-03 07:02:34.108 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=9
2026-05-03 07:02:34.111 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=10
2026-05-03 07:02:34.115 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=11
2026-05-03 07:02:34.118 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=12
2026-05-03 07:02:34.121 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=13
2026-05-03 07:02:34.124 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=14
2026-05-03 07:02:34.127 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=15
2026-05-03 07:02:34.131 | INFO     | stack_to_chunk._array_helpers:_copy_slab:34 - Writing z=0 -> 15
2026-05-03 07:02:34.177 | INFO     | stack_to_chunk._array_helpers:_copy_slab:38 - Finished copying z=0 -> 15
2026-05-03 07:02:34.178 | INFO     | stack_to_chunk._array_helpers:_copy_slab:28 - Reading z=16 -> 31
2026-05-03 07:02:34.178 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=16
2026-05-03 07:02:34.184 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=17
2026-05-03 07:02:34.188 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=18
2026-05-03 07:02:34.191 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=19
2026-05-03 07:02:34.195 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=20
2026-05-03 07:02:34.198 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=21
2026-05-03 07:02:34.201 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=22
2026-05-03 07:02:34.205 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=23
2026-05-03 07:02:34.208 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=24
2026-05-03 07:02:34.211 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=25
2026-05-03 07:02:34.215 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=26
2026-05-03 07:02:34.218 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=27
2026-05-03 07:02:34.221 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=28
2026-05-03 07:02:34.224 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=29
2026-05-03 07:02:34.227 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=30
2026-05-03 07:02:34.231 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=31
2026-05-03 07:02:34.234 | INFO     | stack_to_chunk._array_helpers:_copy_slab:34 - Writing z=16 -> 31
2026-05-03 07:02:34.263 | INFO     | stack_to_chunk._array_helpers:_copy_slab:38 - Finished copying z=16 -> 31
2026-05-03 07:02:34.263 | INFO     | stack_to_chunk._array_helpers:_copy_slab:28 - Reading z=32 -> 34
2026-05-03 07:02:34.263 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=32
2026-05-03 07:02:34.267 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=33
2026-05-03 07:02:34.270 | INFO     | stack_to_chunk._array_helpers:_copy_slab:31 - Reading z=34
2026-05-03 07:02:34.273 | INFO     | stack_to_chunk._array_helpers:_copy_slab:34 - Writing z=32 -> 34
2026-05-03 07:02:34.307 | INFO     | stack_to_chunk._array_helpers:_copy_slab:38 - Finished copying z=32 -> 34
2026-05-03 07:02:34.307 | INFO     | stack_to_chunk.main:add_full_res_data:322 - Finished full resolution copy to zarr.
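The log shows the 35 slices being copied in slabs of up to 16 slices: z=0 -> 15, z=16 -> 31, and a shorter final slab z=32 -> 34. A minimal sketch of that partitioning follows. The helper slab_ranges is hypothetical and only mirrors the ranges seen in the log.

```python
def slab_ranges(n_slices, slab_size):
    """Inclusive (start, stop) z-ranges covering n_slices in steps of slab_size."""
    return [
        (start, min(start + slab_size, n_slices) - 1)
        for start in range(0, n_slices, slab_size)
    ]


ranges = slab_ranges(35, 16)
print(ranges)  # [(0, 15), (16, 31), (32, 34)]
```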

The levels property can be inspected to show we’ve added the first level. Each level is downsampled by a factor of 2**level, so level 0 is downsampled by a factor of 1, which is just a copy of the original data (as expected).

print(group.levels)
[0]
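As a quick illustration of the 2**level rule, the sketch below tabulates the downsampling factor for levels 0 to 3, together with the physical size each voxel would cover. The voxel-size column assumes the same factor applies along every axis. It is derived from the voxel_size=(3, 4, 5) passed to MultiScaleGroup, not read back from the group's metadata.

```python
base_voxel_size = (3, 4, 5)  # centimeters, as passed to MultiScaleGroup

# Downsampling factor for each level: 2**level
factors = {level: 2**level for level in range(4)}

# Assumption: voxels grow by the same factor along x, y and z.
voxel_sizes = {
    level: tuple(v * factor for v in base_voxel_size)
    for level, factor in factors.items()
}

for level in factors:
    print(level, factors[level], voxel_sizes[level])
```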

Now let’s add some downsampling levels:

group.add_downsample_level(1, n_processes=1)
group.add_downsample_level(2, n_processes=1)
group.add_downsample_level(3, n_processes=1)
print(group.levels)
2026-05-03 07:02:34.310 | INFO     | stack_to_chunk.main:add_downsample_level:362 - Downsampling to level 1 with n_processes=1
2026-05-03 07:02:34.316 | INFO     | stack_to_chunk.main:add_downsample_level:423 - Starting downsampling from level 0 > 1...
2026-05-03 07:02:34.316 | INFO     | stack_to_chunk.main:add_downsample_level:428 - Launching 8 jobs
[Parallel(n_jobs=1)]: Done 1 out of 8 | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done 4 out of 8 | elapsed:    0.1s
[Parallel(n_jobs=1)]: Done 7 out of 8 | elapsed:    0.2s
[Parallel(n_jobs=1)]: Done 8 out of 8 | elapsed:    0.2s finished
2026-05-03 07:02:34.481 | INFO     | stack_to_chunk.main:add_downsample_level:433 - Finished downsampling from level 0 > 1
2026-05-03 07:02:34.481 | INFO     | stack_to_chunk.main:add_downsample_level:362 - Downsampling to level 2 with n_processes=1
2026-05-03 07:02:34.489 | INFO     | stack_to_chunk.main:add_downsample_level:423 - Starting downsampling from level 1 > 2...
2026-05-03 07:02:34.489 | INFO     | stack_to_chunk.main:add_downsample_level:428 - Launching 4 jobs
[Parallel(n_jobs=1)]: Done 1 out of 4 | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed:    0.0s finished
2026-05-03 07:02:34.533 | INFO     | stack_to_chunk.main:add_downsample_level:433 - Finished downsampling from level 1 > 2
2026-05-03 07:02:34.533 | INFO     | stack_to_chunk.main:add_downsample_level:362 - Downsampling to level 3 with n_processes=1
2026-05-03 07:02:34.541 | INFO     | stack_to_chunk.main:add_downsample_level:423 - Starting downsampling from level 2 > 3...
2026-05-03 07:02:34.541 | INFO     | stack_to_chunk.main:add_downsample_level:428 - Launching 4 jobs
[Parallel(n_jobs=1)]: Done 1 out of 4 | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed:    0.0s finished
2026-05-03 07:02:34.573 | INFO     | stack_to_chunk.main:add_downsample_level:433 - Finished downsampling from level 2 > 3
[0, 1, 2, 3]

The downsampled data can be accessed as zarr.Array objects by indexing group. As an example, let’s plot the third downsampled level:

plt.imshow(group[3][:, :, 0], cmap="gray")
<matplotlib.image.AxesImage object at 0x7d685840f800>

Cleanup#

Finally, we need to clean up the temporary directory we made earlier.
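For a tempfile.TemporaryDirectory object such as temp_dir, the standard-library way to remove the directory and everything in it is the cleanup() method. The sketch below demonstrates this on a separate throwaway directory (demo_dir is a name used only for this illustration). In this tutorial the equivalent call would be temp_dir.cleanup().

```python
import pathlib
import tempfile

# Create a scratch directory and put a file in it.
demo_dir = tempfile.TemporaryDirectory()
demo_path = pathlib.Path(demo_dir.name)
(demo_path / "example.txt").write_text("scratch data")

# Remove the directory and its contents.
demo_dir.cleanup()
print(demo_path.exists())  # False
```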

