Add CLI for converting v2 metadata to v3 #3257
…sting a zarr version greater than 3
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3257 +/- ##
==========================================
+ Coverage 94.62% 94.69% +0.06%
==========================================
Files 78 80 +2
Lines 8690 8817 +127
==========================================
+ Hits 8223 8349 +126
- Misses 467 468 +1
this is awesome! A few high-level suggestions:
Thanks for the comments @d-v-b !
Attempting to clean up on failure is a good idea, but that can also fail. It would still be possible for the program to be interrupted while
There might actually be situations where people want the v2 and v3 metadata in the same hierarchy, and this is not against the rules of the spec. So I'd say that creating v3 metadata should be a separate (and precedent) operation from deleting the v2 metadata.
i.e., taking a v2 hierarchy from foo.zarr/bar, and creating a v3 hierarchy at foo.zarr/bar. Thus I think
i.e., resaving the hierarchy somewhere else, should also be supported. For now users would be responsible for copying chunks in this case.
that seems right, but instead of
And a broader request: I am sure that while working on this kind of low-level tool, you find tasks that should be direct, but are made difficult by design choices in zarr-python, or the lack of low-level features. If you have the patience for it, please keep track of the stuff we could change in the core library to make this kind of tool easier to write, and feel free to open issues to track these features.
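To illustrate the suggestion above that deleting v2 metadata should be a separate step from creating v3 metadata, here is a minimal sketch using only the Python standard library. It is not the code in this PR: `remove_v2_metadata` and its `dry_run` parameter are hypothetical names, and the sketch assumes a hierarchy on the local filesystem where v2 nodes carry `.zarray`, `.zgroup` and `.zattrs` files and v3 nodes carry `zarr.json`.

```python
from pathlib import Path

# Per-node metadata file names (assumed local-filesystem layout).
V2_METADATA = (".zarray", ".zgroup", ".zattrs")
V3_METADATA = "zarr.json"


def remove_v2_metadata(root: Path, dry_run: bool = True) -> list[Path]:
    """Hypothetical helper: find v2 metadata files that are safe to delete.

    A v2 file is only considered when a zarr.json already sits next to it,
    so a node whose conversion has not happened yet is never touched.
    With dry_run=True (the default) nothing is deleted; the candidate
    paths are only returned.
    """
    selected: list[Path] = []
    # rglob also matches a zarr.json directly inside `root`.
    for v3_file in sorted(root.rglob(V3_METADATA)):
        node = v3_file.parent
        for name in V2_METADATA:
            candidate = node / name
            if candidate.is_file():
                selected.append(candidate)
                if not dry_run:
                    candidate.unlink()
    return selected
```

Because deletion is keyed on the presence of `zarr.json`, an interrupted conversion leaves the unconverted nodes readable as v2, and the removal step can simply be re-run later.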
Ok - that all sounds good. How about the following for an updated CLI: The base command is
I think it's probably worth also keeping the
This design would make the in-place conversion the default, and writing the hierarchy to a new path would be an opt-in kind of thing. Speaking for myself, I think I would expect the convert-to-somewhere-else mode to be the default, and the in-place conversion to be opt-in. It might be helpful to sample opinions from other folks here (cc @zarr-developers/python-core-devs). In any case this is a minor thing.
That makes perfect sense. Since "clear" is kind of ambiguous on its own, and since the
also, in case anyone wants to try this tool out, here's a 1-liner with uvx:
For #1798
Adds a CLI using `typer` to convert v2 metadata (`.zarray` / `.zattrs` ...) to v3 metadata `zarr.json`.
To test, you will need to install the new optional cli dependency e.g. `pip install -e ".[remote,cli]"`. This should make the `zarr-converter` command available e.g. try:
`convert` adds `zarr.json` files to every group / array, leaving the v2 metadata as-is. A zarr with both sets of metadata can still be opened with `zarr.open`, but will give a UserWarning: `Both zarr.json (Zarr format 3) and .zarray (Zarr format 2) metadata objects exist... Zarr v3 will be used.` This can be avoided by passing `zarr_format=3` to `zarr.open`, or by using the `clear` command to remove the v2 metadata.
`clear` can also remove v3 metadata. This is useful if the conversion fails part way through e.g. if one of the arrays uses a codec with no v3 equivalent.
All code for the cli is in `src/zarr/core/metadata/converter/cli.py`, with the actual conversion functions in `src/zarr/core/metadata/converter/converter_v2_v3.py`. These functions can be called directly, for those who don't want to use the CLI (although currently they are part of `/core` which is considered private API, so it may be best to move them elsewhere in the package).
Some points to consider:
- `set_path` from `test_dtype_registry.py` and `test_codec_entrypoints.py`, as they were causing the CLI tests to fail if they were run after. This seems to be due to the `lazy_load_list` of the numcodecs codecs registries being cleared, meaning they were no longer available in my code which finds the `numcodecs.zarr3` equivalent of a numcodecs codec.
TODO:
- `docs/user-guide/*.rst`
- `changes/`
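For readers unfamiliar with what the conversion involves, here is a rough standard-library sketch of mapping one `.zarray` document to a `zarr.json` document. It is not the code in `converter_v2_v3.py`: `convert_array_metadata` and `V2_TO_V3_DTYPE` are hypothetical names, the dtype table is deliberately partial, and the sketch assumes an uncompressed, C-ordered, little-endian array. Anything outside those assumptions raises, which mirrors the failure mode mentioned above for codecs with no v3 equivalent.

```python
from typing import Optional

# Partial, illustrative mapping from v2 NumPy-style dtype strings
# to v3 data type names.
V2_TO_V3_DTYPE = {
    "|b1": "bool",
    "|i1": "int8", "|u1": "uint8",
    "<i2": "int16", "<i4": "int32", "<i8": "int64",
    "<u2": "uint16", "<u4": "uint32", "<u8": "uint64",
    "<f4": "float32", "<f8": "float64",
}


def convert_array_metadata(zarray: dict, zattrs: Optional[dict] = None) -> dict:
    """Hypothetical helper: build a v3 array document from parsed v2 metadata."""
    if zarray.get("zarr_format") != 2:
        raise ValueError("expected Zarr format 2 array metadata")
    if zarray.get("compressor") is not None or zarray.get("filters"):
        # A real converter would map each codec; this sketch refuses instead.
        raise NotImplementedError("compressors and filters are not handled here")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("only C order is handled here")
    if zarray["dtype"] not in V2_TO_V3_DTYPE:
        raise NotImplementedError(f"no mapping for dtype {zarray['dtype']!r}")
    if zarray.get("fill_value") is None:
        # v3 requires a concrete fill value; choosing one is out of scope here.
        raise NotImplementedError("null fill_value is not handled here")

    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(zarray["shape"]),
        "data_type": V2_TO_V3_DTYPE[zarray["dtype"]],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": list(zarray["chunks"])},
        },
        # The "v2" encoding keeps the existing chunk keys valid in place.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        "fill_value": zarray["fill_value"],
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        # v2 keeps attributes in .zattrs; v3 embeds them in zarr.json.
        "attributes": dict(zattrs or {}),
    }
```

Keeping the `v2` chunk key encoding is what lets the chunks stay where they are, so only metadata files need to be written.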