Add more docs

Tom 2025-02-13 16:04:20 +00:00
parent dca1e628df
commit adeccec4e5
6 changed files with 86 additions and 5 deletions


@ -1,4 +1,12 @@
# Qube Algorithms
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
---
# Under the Hood
## Set Operations
@ -55,16 +63,17 @@ This structure means that node.values can take different types, the two most use
Qube.fused_set_operations can dispatch on the two types given in order to efficiently compute set/set, set/range and range/range intersection operations.
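To illustrate the idea of dispatching on the two value types, here is a minimal sketch of the intersection case only. It is not the actual `Qube.fused_set_operations` implementation: the function name `intersect_values` is hypothetical, enum-like values are modelled as plain Python sets, and range values are modelled as built-in `range` objects with step 1.

```python
def intersect_values(a, b):
    """Intersect two value containers, each either a set or a step-1 range.

    Hypothetical helper for illustration; not part of the qubed API.
    """
    if isinstance(a, range) and isinstance(b, range):
        # range/range: contiguous ranges intersect in O(1) via their endpoints.
        lo = max(a.start, b.start)
        hi = min(a.stop, b.stop)
        return range(lo, max(lo, hi))  # empty range when they do not overlap
    if isinstance(a, range):
        a, b = b, a  # normalise so that `a` is the set and `b` is the range
    if isinstance(b, range):
        # set/range: membership in an integer range is an O(1) test.
        return {v for v in a if v in b}
    # set/set: the built-in set intersection.
    return a & b


print(intersect_values(range(0, 10), range(5, 20)))  # range/range
print(intersect_values({1, 7, 12}, range(5, 10)))    # set/range
print(intersect_values({1, 2}, {2, 3}))              # set/set
```

The point of the dispatch is that the range/range case never has to enumerate individual values, which matters when a range covers many values.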
### Performance considerations
This algorithm is quadratic in the number of matching keys. If a level has a huge number of nodes with key 'date' and range types (which can happen because range types are currently restricted to being contiguous), we could end up with a quadratic slowdown.
There are some ways this can be sped up:
* Once we know that any of just_A, intersection or just_B is empty, we can discard it. Only quite pathological inputs (many sparse enums with a lot of overlap) would actually produce quadratically many non-empty terms.
* For ranges intersected with ranges, we could speed the algorithm up significantly by sorting the ranges and walking the two lists in tandem, which reduces it to linear in the number of ranges.
* If we have N_A and N_B nodes to compare between the two trees, we have N_A*N_B comparisons to do. However, at the end of the day we are just trying to determine for each value whether it is in A, B or both. If N_A*N_B >> M, where M is the number of values, we might be able to switch to an alternative algorithm.
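The sorted tandem walk for range/range intersection could look roughly like the sketch below. It is an illustration under stated assumptions, not code from qubed: the function name `intersect_sorted_ranges` is hypothetical, ranges are modelled as half-open `(start, stop)` integer pairs, and the ranges within each input list are assumed not to overlap one another.

```python
def intersect_sorted_ranges(a, b):
    """Intersect two lists of half-open (start, stop) ranges.

    Hypothetical helper; assumes the ranges within each list are disjoint.
    Sorting costs O(n log n); the walk itself is linear in len(a) + len(b).
    """
    a, b = sorted(a), sorted(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))  # non-empty overlap
        # Advance whichever range ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


ranges_a = [(20, 25), (0, 5), (10, 15)]
ranges_b = [(3, 12), (14, 22)]
print(intersect_sorted_ranges(ranges_a, ranges_b))
```

Each step of the loop advances one of the two indices, so the walk does at most `len(a) + len(b)` iterations instead of comparing every pair.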
## Compression

docs/api.md Normal file

@ -0,0 +1,14 @@
# API
## Set Operations
```{code-cell} python3
from qubed import Qube

A = Qube.from_dict({
    "a=1": {"b": {1, 2, 3}, "c": {1}},
    "a=2": {"b": {1, 2, 3}, "c": {1}},
})
A
```

docs/development.md Normal file

@ -0,0 +1,8 @@
# Development
To build the develop branch from source, install a Rust toolchain and maturin (`pip install maturin`), then run:
```
git clone -b develop git@github.com:ecmwf/qubed.git
cd qubed
maturin develop
```


@ -10,6 +10,10 @@ jupytext:
# Qubed
```{toctree}
:maxdepth: 1
quickstart.md
api.md
development.md
algorithms.md
```

docs/quickstart.md Normal file

@ -0,0 +1,45 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
---
# Quickstart
## Installation
```
pip install qubed
```
## Usage
Make an uncompressed qube:
```{code-cell} python3
from qubed import Qube

q = Qube.from_dict({
    "class=od": {
        "expver=0001": {"param=1": {}, "param=2": {}},
        "expver=0002": {"param=1": {}, "param=2": {}},
    },
    "class=rd": {
        "expver=0001": {"param=1": {}, "param=2": {}, "param=3": {}},
        "expver=0002": {"param=1": {}, "param=2": {}},
    },
})
q
```
Compress the qube:
```{code-cell} python3
q.compress()
```
Load some example qubes:
```{code-cell} python3
### Set Operations

File diff suppressed because one or more lines are too long