2025-03-27 18:30:12 +00:00

---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
---

Qubed

:maxdepth: 1
quickstart.md
background.md
algorithms.md
fiab.md
cmd.md

Qubed provides a data structure called a Qube, which represents sets of data identified by multiple key-value pairs as a tree of datacubes. To understand what that means, go to Background; to start using the library right away, skip straight to the Quickstart.

Here's a real-world dataset from the Climate DT:

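import requests
from qubed import Qube
# Fetch the example Climate DT qube (serialised as JSON) and load it
url = "https://github.com/ecmwf/qubed/raw/refs/heads/main/tests/example_qubes/climate_dt.json"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on an HTTP error
climate_dt = Qube.from_json(response.json())
climate_dt.html(depth=1)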

Click the arrows to expand and drill down deeper into the data. Any particular dataset is uniquely identified by a set of key-value pairs:

# Print only the first identifier, then stop
for identifier in climate_dt.leaves():
    print(identifier)
    break

Here's an idea of the set of values each key can take:

axes = climate_dt.axes()
for key, values in axes.items():
    print(f"{key} : {sorted(values)[:10]}")
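
To make the link between axes and identifiers concrete, here is a minimal sketch in plain Python. It does not use the qubed API, and the axis names and values are invented for illustration rather than taken from `climate_dt`. It enumerates every identifier of a fully dense datacube, which is exactly the set of combinations that a sparse dataset only partly contains.

```python
from itertools import product

# Hypothetical axes, invented for this sketch (not read from climate_dt)
axes = {
    "class": ["d1"],
    "model": ["icon", "ifs-nemo"],
    "param": ["164", "167", "169"],
}


def dense_identifiers(axes):
    """Yield every key-value combination of a fully dense datacube."""
    keys = list(axes)
    for combo in product(*(axes[k] for k in keys)):
        yield dict(zip(keys, combo))


identifiers = list(dense_identifiers(axes))
print(len(identifiers))  # 1 * 2 * 3 = 6 combinations
print(identifiers[0])
```

In a dense datacube the number of identifiers is the product of the axis sizes, so the axes alone describe it completely. A sparse dataset needs extra structure to say which combinations actually exist.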

This dataset isn't dense: you can't choose an arbitrary combination of the key-value pairs above. It does, however, contain many dense datacubes, so it makes sense to store and process the set as a tree of dense datacubes, which we call a Qube. For a sense of scale, this dataset contains about 200 million distinct datasets, yet the tree representing it has only a few thousand unique nodes.

print(f"""
Distinct datasets: {climate_dt.n_leaves},
Number of nodes in the tree: {climate_dt.n_nodes}
""")
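
The gap between the two numbers can be reproduced with a toy tree. The sketch below is plain Python with a hypothetical `Node` class, not qubed's implementation, and the counting rules are assumptions made for illustration that may differ from how qubed counts. Each node holds one key with several values, and a leaf count multiplies through the levels while the node count only adds.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: one key, a set of values, and subtrees."""
    key: str
    values: tuple
    children: list = field(default_factory=list)


def n_leaves(node):
    # A childless node contributes one leaf per value; otherwise each
    # value is combined with every leaf found beneath the node.
    below = sum(n_leaves(c) for c in node.children) if node.children else 1
    return len(node.values) * below


def n_nodes(node):
    # Every node counts once, however many values it holds.
    return 1 + sum(n_nodes(c) for c in node.children)


# Two dense datacubes with different shapes under a shared root
tree = Node("class", ("d1",), [
    Node("model", ("icon",), [
        Node("param", tuple(range(100)), [Node("step", tuple(range(24)))]),
    ]),
    Node("model", ("ifs-nemo",), [
        Node("param", tuple(range(50)), [Node("step", tuple(range(48)))]),
    ]),
])

print(n_leaves(tree), n_nodes(tree))
```

Here seven nodes describe 4800 distinct identifiers, because each dense sub-cube is stored once as a chain of nodes rather than once per combination.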
