---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
---
# Qubed
```{toctree}
:maxdepth: 1
background.md
quickstart.md
api.md
development.md
algorithms.md
fiab.md
```
Qubed provides a data structure called a Qube, which represents sets of data identified by multiple key-value pairs as a tree of datacubes. To understand what that means, read [Background](background.md); to start using the library right away, skip straight to the [Quickstart](quickstart.md).
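As a rough mental model, here is a minimal plain-Python sketch that is independent of qubed's actual implementation. Each node holds one key and a tuple of values, and every root-to-leaf path describes a dense datacube, namely the Cartesian product of the values along that path. The `Node` class, the `expand` function, and the keys and values used are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical toy node: one key, a tuple of values, and child nodes."""
    key: str
    values: tuple[str, ...]
    children: list["Node"] = field(default_factory=list)


def expand(nodes, prefix=None):
    """Yield every identifier (a dict of key -> value) represented by the tree."""
    prefix = prefix or {}
    for node in nodes:
        for value in node.values:
            identifier = {**prefix, node.key: value}
            if node.children:
                yield from expand(node.children, identifier)
            else:
                yield identifier


# Two dense datacubes that share the "class" key (illustrative values only):
#   class=d1, expver=0001, param in {tas, pr}
#   class=d1, expver=0002, param in {tas}
tree = [
    Node("class", ("d1",), [
        Node("expver", ("0001",), [Node("param", ("tas", "pr"))]),
        Node("expver", ("0002",), [Node("param", ("tas",))]),
    ])
]

for identifier in expand(tree):
    print(identifier)
```

Running it prints three identifiers: two from the first datacube (`param` equal to `tas` and `pr`) and one from the second. The shared `class` node is stored once even though it appears in every identifier.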
Here's a real-world dataset from the [Climate DT](https://destine.ecmwf.int/climate-change-adaptation-digital-twin-climate-dt/):
```{code-cell} python3
import requests
from qubed import Qube
climate_dt = Qube.from_json(requests.get("https://github.com/ecmwf/qubed/raw/refs/heads/main/tests/example_qubes/climate_dt.json").json())
climate_dt.html(depth=1)
```
Click the arrows to expand and drill down deeper into the data. Each individual dataset is uniquely identified by a set of key-value pairs:
```{code-cell} python3
# Print the identifier of the first leaf, then stop
for identifier in climate_dt.leaves():
    print(identifier)
    break
```
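Since an identifier is a *set* of key-value pairs, the order in which the keys appear carries no meaning. A minimal sketch of that idea in plain Python, independent of qubed and using hand-written identifiers, stores each one as a `frozenset` of its items so it can be hashed and looked up. The `as_key` helper is hypothetical.

```python
# Hypothetical identifiers, written out by hand for illustration
identifiers = [
    {"class": "d1", "expver": "0001", "param": "tas"},
    {"class": "d1", "expver": "0001", "param": "pr"},
]


def as_key(identifier: dict[str, str]) -> frozenset[tuple[str, str]]:
    """Turn a dict of key-value pairs into an order-independent, hashable identity."""
    return frozenset(identifier.items())


catalogue = {as_key(i) for i in identifiers}

# Key order does not matter: the same pairs identify the same dataset
query = {"param": "tas", "expver": "0001", "class": "d1"}
print(as_key(query) in catalogue)  # True
print(as_key({"class": "d1", "expver": "0002", "param": "tas"}) in catalogue)  # False
```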
Here are the first few values (sorted, at most ten) that each key can take:
```{code-cell} python3
axes = climate_dt.axes()
# Show at most the first ten sorted values for each key
for key, values in axes.items():
    print(f"{key} : {sorted(values)[:10]}")
```
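Conceptually, the axes are the union, per key, of the values seen across all identifiers. The sketch below computes that from a plain list of made-up identifiers; the `axes_of` function is hypothetical and not necessarily how `Qube.axes()` computes its result.

```python
from collections import defaultdict


def axes_of(identifiers):
    """For every key, collect the set of values it takes across all identifiers."""
    axes = defaultdict(set)
    for identifier in identifiers:
        for key, value in identifier.items():
            axes[key].add(value)
    return dict(axes)


# Made-up identifiers for illustration (not taken from the Climate DT)
identifiers = [
    {"class": "d1", "expver": "0001", "param": "tas"},
    {"class": "d1", "expver": "0001", "param": "pr"},
    {"class": "d1", "expver": "0002", "param": "tas"},
]

for key, values in axes_of(identifiers).items():
    print(f"{key} : {sorted(values)}")
```

The Cartesian product of these axes has 1 × 2 × 2 = 4 combinations, but only 3 identifiers exist (there is no `expver=0002, param=pr`), so the axes alone do not describe which combinations are actually present.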
This dataset isn't dense: you can't choose an arbitrary combination of the key-value pairs above. It does, however, contain many dense datacubes, so it makes sense to store and process the set as a tree of dense datacubes, which is what we call a Qube. For a sense of scale, this dataset contains about 200 million distinct datasets but only a few thousand unique nodes.
```{code-cell} python3
print(f"Distinct datasets: {climate_dt.n_leaves}")
print(f"Number of nodes in the tree: {climate_dt.n_nodes}")
```
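The gap between these two numbers comes from storing dense products compactly. In the toy model below (plain Python; the `Node` class, the counting functions, and the keys are hypothetical and not qubed's implementation), a node holding *n* values multiplies the number of identifiers beneath it by *n* while adding only one node to the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical toy node: one key, a tuple of values, and child nodes."""
    key: str
    values: tuple[str, ...]
    children: list["Node"] = field(default_factory=list)


def n_leaves(node: Node) -> int:
    """Identifiers in this subtree: len(values) times the identifiers below."""
    below = sum(n_leaves(child) for child in node.children) if node.children else 1
    return len(node.values) * below


def n_nodes(node: Node) -> int:
    """Number of nodes in the subtree rooted at this node."""
    return 1 + sum(n_nodes(child) for child in node.children)


dates = tuple(f"2020-01-{d:02d}" for d in range(1, 32))  # 31 values
params = tuple(f"p{i}" for i in range(50))               # 50 values
levels = tuple(str(lev) for lev in range(1, 11))         # 10 values

# Two dense datacubes of very different sizes under different experiments
forest = [
    Node("experiment", ("hist",), [
        Node("date", dates, [Node("param", params, [Node("level", levels)])]),
    ]),
    Node("experiment", ("ssp3",), [
        Node("date", dates[:10], [Node("param", params[:5], [Node("level", levels[:3])])]),
    ]),
]

print("Distinct identifiers:", sum(n_leaves(r) for r in forest))  # 15650
print("Nodes in the tree:", sum(n_nodes(r) for r in forest))      # 8
```

Here two dense datacubes describe 15,650 identifiers (31 × 50 × 10 + 10 × 5 × 3) with just 8 nodes. In this model the ratio grows with the size of each cube, which is the effect the Climate DT numbers above show at a much larger scale.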