kub-dataset: Full Reference

This page is the complete reference for kub-dataset. For a shorter, task-oriented entry point, start at the kub-dataset page.

The kub-dataset command-line tool lets you pack, unpack, inspect, migrate, and enrich CEMDB location datasets, and synchronize them with remote Data Management Platforms (DMPs). It supports multiple backends, including Girder and CKAN, can generate weather/IAQ input data from Open-Meteo, and can generate DVC tracking files for version control.

1. Synopsis

kub-dataset [-h] [-v] {pack,unpack,list,init,status,login,logout,whoami,push,pull,delete,list-versions,summary,tag-reference,list-reference,migrate,migrate-layout,migrate-org-layout,migrate-simulator,manifest-show,manifest-regenerate,manifest-verify,manifest-validate,fix,policy,policy-validate,generate,list-dmps,list-locations,list-locations-ckan,list-versions-ckan,pull-ckan,push-ckan,pack-simulator,unpack-simulator,push-simulator,pull-simulator,list-simulators,list-simulator-versions,delete-simulator,copy-version,rename-version,rename-location,set-current,duplicate,push-component,pull-component,list-components,...} ...

1.1. Location Dataset Commands

kub-dataset pack <location> [options]
kub-dataset unpack <archive> [options]
kub-dataset list <archive>
kub-dataset push <location> --version <version> --api-key <key> [--dmp <dmp>]
kub-dataset pull <location> [--version <version>] [--dmp <dmp>] [--dvc]
kub-dataset delete <location> [--version <version>] --api-key <key> [--dmp <dmp>]
kub-dataset list-versions <location> [--dmp <dmp>]
kub-dataset list-locations [--dmp <dmps>] [--show-versions]
kub-dataset summary [<location> ...] [--all] [--version <version>] [--format {table,json}]
kub-dataset list-dmps
kub-dataset generate {meteo|iaq|both} <location> --start <date|datetime> --end <date|datetime> [--version <version>]
kub-dataset manifest-show <location> [--version <version>]
kub-dataset manifest-regenerate <location> [--version <version>]
kub-dataset manifest-verify <location> [--version <version>]
kub-dataset manifest-validate [manifest|location] [--version <version>] [--cemdb-root <path>] [--strict]
kub-dataset fix <location> [--version <version>] [--policy-file <path>] [--dry-run]
kub-dataset policy list
kub-dataset policy show [policy_ref] [--template <name>|--file <path>]
kub-dataset policy validate [policy_ref] [--template <name>|--file <path>] [-v]
kub-dataset policy apply <location> [--version <version>] [--template <name>|--file <path>] [--dry-run]
kub-dataset policy-validate <policy-file> [-v]
kub-dataset migrate [<location>] [--execute]
kub-dataset migrate-layout [<location>|--all]
kub-dataset migrate-org-layout [<location>|--all] [--organization <org>|--org-map <json>]
kub-dataset copy-version <location> <source-version> <target-version>
kub-dataset rename-version <location> <old-version> <new-version>
kub-dataset rename-location <old-name> <new-name>
kub-dataset set-current <location> <version>
kub-dataset duplicate <location> [<source-version>]
kub-dataset tag-reference <location> <run-id> [--type <single|ensemble>]
kub-dataset list-reference <location> [--type <single|ensemble>]
kub-dataset push-component <location> --version <version> --component <geo|config|inputs|preprocessing|all> --api-key <key>
kub-dataset pull-component <location> --version <version> --component <geo|config|inputs|preprocessing>
kub-dataset list-components <location> --version <version>

1.2. Simulator Dataset Commands

kub-dataset pack-simulator --version <version> [--lod <lod>] [--single]
kub-dataset unpack-simulator <archive> [options]
kub-dataset push-simulator --version <version> [--lod <lod>] --api-key <key>
kub-dataset pull-simulator --version <version> [--lod <lod>] [--dvc]
kub-dataset list-simulators [--local] [--remote] [--version <version>]
kub-dataset delete-simulator --version <version> [--lod <lod>] --api-key <key>

1.3. DVC Commands

kub-dataset init [--path <path>]
kub-dataset status [--path <path>]

1.4. Authentication Commands

kub-dataset login --provider <provider-id>
kub-dataset logout [--provider <provider-id>|--all]
kub-dataset whoami

2. Features

  • Pack/Unpack: Create and extract dataset archives (excluding simulation outputs)

  • Multi-DMP Support: Push/pull location datasets to/from Girder and CKAN

  • Input Generation: Generate hourly weather and IAQ inputs from Open-Meteo

  • Simulator Datasets: Manage FMU building models with LOD-based organization (Girder only)

  • DVC Integration: Initialize DVC, track datasets, and check status

  • Unified Listing: List all datasets across multiple DMPs

  • Version Tracking: Track dataset versions with metadata in .dvc files

3. Description

CEMDB locations contain both input data and simulation outputs. For distribution or CI testing, you typically only need the input data:

  • weather/ - Weather data files

  • air-quality/ - Air quality data files (when available)

  • preprocessing/ - Preprocessed mesh and metadata

  • geo/ - GIS data, mesh files, enrichment data

  • scenarios/ - HVAC scenario configurations

The kub-dataset tool creates archives that exclude:

  • feelppdb/ - Simulation database outputs

  • simulations/ - Simulation results
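
One way to confirm that an archive follows this rule is to scan its file listing for entries under the excluded directories. The sketch below is a hypothetical helper, not part of kub-dataset: find_output_entries (a name chosen for illustration) reads one path per line on standard input, in the <location>/<dir>/... form printed by kub-dataset list, and prints any path that has feelppdb or simulations as a directory component. It says nothing about how kub-dataset itself applies the exclusion.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kub-dataset): print archive entries that
# sit under an excluded output directory. Reads one path per line on stdin.
find_output_entries() {
    # Match feelppdb/ or simulations/ as a whole path component at any depth.
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '(^|/)(feelppdb|simulations)(/|$)' || true
}

# Usage (assumes kub-dataset is installed and the archive exists):
#   kub-dataset list kernante_input.zip | find_output_entries
```

An empty result means no simulation outputs were found in the listing.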

4. Data Management Platforms (DMPs)

kub-dataset supports multiple data management platforms:

DMP ID            Type      Description
---------------------------------------------------------------------------------
girder-unistra    Girder    University of Strasbourg Girder instance (default)
ckan-hidalgo2     CKAN      HiDALGO2 CKAN data portal

Use --dmp <dmp-id> to specify which platform to use for push/pull operations.

5. Simulator Datasets

Simulator datasets contain pre-compiled FMU (Functional Mock-up Unit) building models used for energy simulations. Unlike location datasets, simulators are:

  • Girder-only: Not available on CKAN

  • Version-based: Organized by version folders (e.g., v0.1.0)

  • LOD-organized: Split into Level of Detail directories (lod0, lod1) and support directories (others)

5.1. Local Structure

cemdb/simulators/
└── v0.1.0/                         # Version folder
    ├── manifest.json               # Version manifest (describes all directories)
    ├── lod0/                       # LOD 0 building models
    │   ├── App4Walls1Floor1Roof.fmu
    │   ├── App4Walls2Floor1RoofBoiler.fmu
    │   └── ... (60 FMU files)
    ├── lod0.metadata.json          # LOD 0 metadata (building model details)
    ├── others/                     # Support FMU files
    │   └── Sun.fmu
    └── others.metadata.json        # Support files metadata

5.2. Girder Structure

On Girder, simulators are stored as items with individual FMU files:

UrbanBuilding/cemdb/simulators/
└── v0.1.0/
    ├── manifest.json               # Manifest item
    ├── lod0                        # Item containing all LOD0 FMU files
    └── others                      # Item containing support FMU files

5.3. Manifest File

The manifest.json describes the version contents:

{
    "version": "0.1.0",
    "description": "Dataset of physical building models in FMU format",
    "creationDate": "2025-10-22",
    "itemType": "Simulators",
    "maxNumberOfFloors": 10,
    "totalNumberOfBuildingModels": 60,
    "directories": [
        {
            "name": "lod0",
            "type": "lod",
            "lod": 0,
            "description": "Level of Detail 0 building models",
            "count": 60
        },
        {
            "name": "others",
            "type": "support",
            "description": "Supporting FMU files (Sun model, etc.)",
            "files": ["Sun.fmu"]
        }
    ]
}
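
In the example, the manifest's version field (0.1.0) corresponds to the version folder v0.1.0. The following sketch reads that field and prints the folder name. manifest_version_dir is a hypothetical helper, and it assumes the manifest is laid out one key per line as shown above; for anything less regular, a real JSON parser is the safer choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kub-dataset): read the "version" field of
# a simulator manifest.json and print the matching folder name (0.1.0 -> v0.1.0).
# Assumes one key per line, as in the example manifest.
manifest_version_dir() {
    local manifest version
    manifest=$1
    # Take the first line of the form:  "version": "X.Y.Z",
    version=$(sed -n 's/^[[:space:]]*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$manifest" | head -n 1)
    [ -n "$version" ] || return 1
    # Tolerate a version that already carries the "v" prefix.
    printf 'v%s\n' "${version#v}"
}

# Usage:
#   manifest_version_dir cemdb/simulators/v0.1.0/manifest.json
```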

6. Commands

6.1. DVC Commands

6.1.1. init

Initialize DVC for dataset tracking in the repository.

kub-dataset init [--path <path>]
Options
--path <path>

Path to initialize DVC in. Default: current directory

Example
# Initialize DVC in current directory
kub-dataset init

# Initialize in specific path
kub-dataset init --path /path/to/repo

# With verbose output
kub-dataset -v init

Output:

Initialized DVC in /path/to/repo
Next steps:
  git add .dvc .dvcignore
  git commit -m 'Initialize DVC'

6.1.2. status

Show DVC status and tracked datasets with version information.

kub-dataset status [--path <path>]
Options
--path <path>

Path to check status in. Default: current directory

Example
# Check status
kub-dataset status

# With verbose output
kub-dataset -v status

Output:

DVC Status:
  Initialized: True
  No changes

Tracked datasets:
  - kernante (version: 0.98.0, source: girder-unistra)
  - paris-6km-with-storeys-nd (version: latest, source: girder-unistra)

6.2. Authentication Commands

6.2.1. login

Authenticate with a configured Keycloak provider for DMP operations.

kub-dataset login --provider <provider-id> [--username <user> --password <pass>] [--timeout <seconds>]
Example
# Interactive browser/device login
kub-dataset login --provider hidalgo2

# Non-interactive login
kub-dataset login --provider hidalgo2 --username "$USER" --password "$PASSWORD"

6.2.2. logout

Remove stored credentials for one provider or all providers.

kub-dataset logout [--provider <provider-id>|--all]
Example
# Logout from one provider
kub-dataset logout --provider hidalgo2

# Logout from all providers
kub-dataset logout --all

6.2.3. whoami

Display active authenticated sessions and mapped DMP identities.

kub-dataset whoami [-v]

6.3. Location Dataset Commands

6.3.1. pack

Pack a CEMDB location into a zip archive.

kub-dataset pack <location> [options]
Arguments
location

Name of the location to pack (e.g., kernante, strasbourg)

Options
-o, --output <path>

Output zip file path. Default: <location>_input.zip in current directory

--cemdb-root <path>

Path to cemdb/locations directory. Default: cemdb/locations

--normalize

Auto-normalize invalid dataset names (convert dashes to underscores).

Example
# Pack kernante location
kub-dataset pack kernante --cemdb-root cemdb/locations

# Pack with custom output path
kub-dataset pack kernante -o /tmp/kernante_dataset.zip --cemdb-root cemdb/locations

# Verbose mode showing each file added
kub-dataset -v pack kernante
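
To preview what --normalize would do to a name, the documented rule (dashes become underscores) can be reproduced in a couple of lines. normalize_name is a hypothetical helper for illustration; any other checks kub-dataset applies when it decides a name is invalid are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kub-dataset): apply the documented
# --normalize rule, converting dashes to underscores.
normalize_name() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Usage:
#   normalize_name berlin-district
```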

6.3.2. unpack

Unpack a dataset archive to the CEMDB locations directory.

kub-dataset unpack <archive> [options]
Arguments
archive

Path to the zip file to unpack

Options
--cemdb-root <path>

Path to cemdb/locations directory where the location will be extracted. Default: cemdb/locations

-f, --force

Overwrite existing location if present

Example
# Unpack to default cemdb/locations
kub-dataset unpack kernante_input.zip

# Unpack with custom destination
kub-dataset unpack kernante_input.zip --cemdb-root /data/cemdb/locations

# Force overwrite existing location
kub-dataset unpack kernante_input.zip --force

6.3.3. list

List contents of a dataset archive.

kub-dataset list <archive>
Example
kub-dataset list kernante_input.zip

Output:

kernante/.dataregistry.json
kernante/geo/.dataregistry.json
kernante/geo/gis.json
kernante/preprocessing/preprocessing.json
kernante/scenarios/scenarios.json
kernante/weather/weather-data-0.hourly-variables.csv
...

6.3.4. list-dmps

List all registered Data Management Platforms.

kub-dataset list-dmps [--format {table,json}]
Options
--format {table,json}

Output format. Default: table

Example
kub-dataset list-dmps

Output:

ID                   Name                      Type       Description
------------------------------------------------------------------------------------------
girder-unistra       Girder Unistra            girder     University of Strasbourg Girder instance
ckan-hidalgo2        CKAN HiDALGO2             ckan       HiDALGO2 CKAN data portal
------------------------------------------------------------------------------------------
Total: 2 registered DMPs

6.3.5. list-locations

List all available locations and versions from registered DMPs.

kub-dataset list-locations [--dmp <dmp-ids>] [--format {table,json}] [--show-versions]
Options
--dmp <dmp-ids>

Comma-separated list of DMP IDs to query. Default: all registered DMPs

--api-key <key>

API key (optional for public datasets)

--format {table,json}

Output format. Default: table

--show-versions

Show all available versions for each location (expanded view)

Example
# List all locations from all DMPs
kub-dataset list-locations

# List only from Girder
kub-dataset list-locations --dmp girder-unistra

# List from multiple specific DMPs
kub-dataset list-locations --dmp girder-unistra,ckan-hidalgo2

# Show all versions (expanded view)
kub-dataset list-locations --show-versions

# JSON output for scripting
kub-dataset list-locations --format json

Output (table format):

DMP               Location                            Latest      Versions    Total Size
=======================================================================================
girder-unistra    kernante                            0.98.0             1      0.17 MB
                  lingolsheim                         0.97.0             1      1.24 MB
                  ub_uap_coupling_lingolsheim_district 0.97.0            1     12.45 MB
                  (3 locations)                                                13.86 MB
---------------------------------------------------------------------------------------
ckan-hidalgo2     berlin-district                     1.0.0              2      8.34 MB
                  (1 locations)                                                 8.34 MB
---------------------------------------------------------------------------------------
Total: 4 locations (girder-unistra: 3, ckan-hidalgo2: 1) - 22.20 MB
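
Since --dmp takes a comma-separated list, a script can check the IDs before making any network call. The sketch below is a hypothetical pre-flight check, not part of kub-dataset. It hard-codes the two IDs listed in this reference; kub-dataset list-dmps remains the authoritative source for what is registered.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of kub-dataset): verify that every
# ID in a comma-separated --dmp value is one of the IDs listed in this page.
check_dmp_ids() {
    local id rest
    rest=$1
    while [ -n "$rest" ]; do
        id=${rest%%,*}                  # text before the first comma
        case "$id" in
            girder-unistra|ckan-hidalgo2) ;;
            *) echo "unknown DMP ID: $id" >&2; return 1 ;;
        esac
        # Drop the ID just checked, plus its comma if one follows.
        case "$rest" in
            *,*) rest=${rest#*,} ;;
            *) rest="" ;;
        esac
    done
}

# Usage:
#   check_dmp_ids "girder-unistra,ckan-hidalgo2" && \
#       kub-dataset list-locations --dmp girder-unistra,ckan-hidalgo2
```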

6.3.6. push

Push a location dataset to a DMP (Girder or CKAN).

kub-dataset push <location> --version <version> --api-key <key> [--dmp <dmp-id>] [options]
Arguments
location

Name of the location to push (e.g., kernante)

Options
--version, -V <version> (required)

Dataset version (e.g., 0.98.0)

--api-key <key> (required)

API key for authentication

--dmp <dmp-id>

DMP to push to. Default: girder-unistra

--cemdb-root <path>

Path to cemdb/locations directory. Default: cemdb/locations

--normalize

Auto-normalize invalid dataset names (convert dashes to underscores).

--create-package

Create CKAN package if it doesn’t exist. Default: True (CKAN only)

--no-create-package

Fail if CKAN package doesn’t exist instead of creating it (CKAN only)

Example
# Push to Girder (default)
kub-dataset push kernante --version 0.98.0 --api-key "$GIRDER_API_KEY"

# Push to CKAN
kub-dataset push kernante --version 0.98.0 --dmp ckan-hidalgo2 --api-key "$CKAN_API_KEY"

# Verbose mode
kub-dataset -v push kernante --version 1.0.0 --api-key "$GIRDER_API_KEY"

6.3.7. pull

Pull a location dataset from a DMP with optional DVC tracking.

kub-dataset pull <location> [--version <version>] [--dmp <dmp-id>] [--dvc] [options]
Arguments
location

Name of the location to pull (e.g., kernante)

Options
--version, -V <version>

Dataset version (e.g., 0.98.0). If omitted, pulls the latest version.

--dmp <dmp-id>

DMP to pull from. Default: girder-unistra

--api-key <key>

API key (optional for public datasets)

--cemdb-root <path>

Path to cemdb/locations directory. Default: cemdb/locations

-f, --force

Overwrite existing location if present

--dvc

Generate DVC tracking file (.dvc) and update .gitignore after pull. This enables reproducible data pipelines.

--normalize

Auto-normalize invalid dataset names (convert dashes to underscores).

Example
# Pull latest from Girder (default)
kub-dataset pull kernante --api-key "$GIRDER_API_KEY"

# Pull specific version
kub-dataset pull kernante --version 0.98.0 --api-key "$GIRDER_API_KEY"

# Pull from CKAN
kub-dataset pull berlin-district --dmp ckan-hidalgo2 --api-key "$CKAN_API_KEY"

# Pull with DVC tracking
kub-dataset pull kernante --dvc --api-key "$GIRDER_API_KEY"

# Force overwrite and generate DVC tracking
kub-dataset pull kernante --version 0.98.0 --force --dvc --api-key "$GIRDER_API_KEY"

6.3.8. delete

Delete a location dataset from a DMP (Girder or CKAN).

kub-dataset delete <location> [--version <version>] --api-key <key> [--dmp <dmp-id>]
Arguments
location

Name of the location to delete (e.g., kernante)

Options
--version, -V <version>

Specific version to delete. If omitted, deletes ALL versions.

--dmp <dmp-id>

DMP to delete from. Default: girder-unistra

--api-key <key> (required)

API key for authentication

--yes, -y

Skip confirmation prompt

Example
# Delete specific version from Girder
kub-dataset delete kernante --version 0.98.0 --api-key "$GIRDER_API_KEY" --yes

# Delete all versions (with confirmation)
kub-dataset delete kernante --api-key "$GIRDER_API_KEY"

# Delete from CKAN
kub-dataset delete berlin-district --version 1.0.0 --dmp ckan-hidalgo2 --api-key "$CKAN_API_KEY" --yes
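
Because omitting --version deletes every version of a location, automated scripts (especially ones that pass --yes) may want to refuse to run without it. The guard below is a hypothetical wrapper-side check, not a kub-dataset feature; it only inspects the argument list for the documented --version and -V spellings.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of kub-dataset): succeed only when the
# argument list contains --version or -V.
has_version_flag() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            --version|-V) return 0 ;;
        esac
    done
    return 1
}

# Usage inside a wrapper script:
#   has_version_flag "$@" || { echo "refusing to delete all versions" >&2; exit 1; }
#   kub-dataset delete "$@"
```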

6.3.9. list-versions

List available versions of a location dataset on a DMP.

kub-dataset list-versions <location> [--dmp <dmp-id>] [options]
Arguments
location

Name of the location (e.g., kernante)

Options
--dmp <dmp-id>

DMP to query. Default: girder-unistra

--api-key <key>

API key (optional for public datasets)

Example
# List versions on Girder
kub-dataset list-versions kernante

# List versions on CKAN
kub-dataset list-versions berlin-district --dmp ckan-hidalgo2

Output:

Available versions for kernante on girder-unistra:
  0.98.0           178.8 KB  kernante_input-v0.98.0.zip
  0.97.0           165.2 KB  kernante_input-v0.97.0.zip

6.3.10. summary

Summarize local datasets in cemdb/locations.

This command inspects local locations and reports:

  • resolved version (current/latest or --version)

  • manifest and config availability

  • detected building count

  • detected preprocessing partitions

  • available plans and simulation run counts

kub-dataset summary [<location> ...] [--all] [--version <version>] [--format {table,json}] [--compact]
Arguments
location

Optional location names. If omitted, all local locations are summarized.

Options
--all

Include all local locations (merged with explicit location arguments).

--version <version>

Inspect a specific version (e.g. 0.1.0 or v0.1.0).

--cemdb-root <path>

Path to cemdb/locations. Default: cemdb/locations

--format {table,json}

Output format. Default: table

--compact

Force compact table output even for a single location.

Examples
# Summarize one local dataset
kub-dataset summary kernante --cemdb-root cemdb/locations

# Summarize all local datasets
kub-dataset summary --all --cemdb-root cemdb/locations

# Summarize specific version for multiple datasets in JSON
kub-dataset summary kernante strasbourg --version 0.1.0 --format json

Note: kub-case-summary is deprecated; use kub-dataset summary for versioned cemdb/locations datasets.

6.3.11. generate

Generate location input data from Open-Meteo and write it into the target dataset version directory.

Weather and IAQ outputs are hourly time series.

kub-dataset generate meteo <location> --start <date|datetime> --end <date|datetime> [options]
kub-dataset generate iaq <location> --start <date|datetime> --end <date|datetime> [options]
kub-dataset generate both <location> --start <date|datetime> --end <date|datetime> [options]
Options
--version <version>

Version to target. If omitted, uses current/latest detected version.

--cemdb-root <path>

Path to cemdb/locations. Default: cemdb/locations

--start, --end <date|datetime>

Time window to generate. Supports YYYY-MM-DD and ISO datetime.

--timezone <tz>

Timezone for request and local date interpretation (e.g. Europe/Paris, UTC). Default: auto

--force

Overwrite existing generated files.

--update-manifest / --no-update-manifest

Update manifest.json and checksums after generation (default: enabled).

--output <filename> (for meteo/iaq)

Output filename inside weather/ or air-quality/.

--weather-output <filename>, --iaq-output <filename> (for both)

Separate output filenames for each generated stream.

Notes
  • generate meteo alias: generate weather

  • generate both alias: generate all

  • With naive datetimes (e.g. 2023-01-01T12:00:00, which carry no UTC offset), pass an explicit --timezone.

Example
# Generate one year of weather
kub-dataset generate meteo strasbourg \
  --version 0.1.0 \
  --start 2023-01-01 \
  --end 2023-12-31 \
  --timezone Europe/Paris \
  --cemdb-root "$PWD/cemdb"

# Generate IAQ with default output name
kub-dataset generate iaq strasbourg \
  --version 0.1.0 \
  --start 2023-01-01 \
  --end 2023-12-31 \
  --timezone Europe/Paris \
  --cemdb-root "$PWD/cemdb"

# Generate both streams in one command
kub-dataset generate both strasbourg \
  --version 0.1.0 \
  --start 2023-01-01 \
  --end 2023-12-31 \
  --timezone Europe/Paris \
  --cemdb-root "$PWD/cemdb"

6.3.12. tag-reference

Tag a simulation run as a validation reference (single or ensemble).

kub-dataset tag-reference <location> <run-id> [--type <single|ensemble>] [--description <text>] [--expected-energy <kWh>]

6.3.13. list-reference

List reference-tagged runs for a location.

kub-dataset list-reference <location> [--type <single|ensemble>] [--format {table,json}]

6.3.14. migrate

Migrate dataset folders from flat layout to versioned layout.

kub-dataset migrate [<location>] [--default-version <X.Y.Z>] [--execute]

6.3.15. migrate-layout

Migrate location data layout from legacy input-dataset/ to geo/.

kub-dataset migrate-layout [<location>|--all] [--dry-run]

6.3.16. migrate-org-layout

Migrate local datasets to organization-aware layout: cemdb/<org>/locations/<location>.

kub-dataset migrate-org-layout [<location>|--all] [--organization <org>|--org-map <json>] [--apply]

6.3.17. migrate-simulator

Migrate simulator manifest structure to current schema conventions.

kub-dataset migrate-simulator [<version>] [--dry-run]

6.3.18. manifest-show

Display manifest metadata for a location version.

kub-dataset manifest-show <location> [--version <version>] [--format {table,json}]

6.3.19. manifest-regenerate

Regenerate manifest.json for a location version (optionally with checksums).

kub-dataset manifest-regenerate <location> [--version <version>] [--no-checksums]

6.3.20. manifest-verify

Verify manifest schema and optionally file checksums.

kub-dataset manifest-verify <location> [--version <version>] [--verify-checksums]

6.3.21. manifest-validate

Validate manifest refs, artifacts, and SHA256 checksums. You can pass an explicit manifest path or a location name (optionally with --version). This is the canonical CLI entry point for manifest validation.

kub-dataset manifest-validate [manifest|location] [--version <version>] [--cemdb-root <path>] [--strict]

6.3.22. fix

Auto-fix common dataset issues and write normalized policy/config artifacts for a location version.

kub-dataset fix <location> [--version <version>] [--cemdb-root <path>] [--policy-file <path>] [--dry-run]
Options
--version <version>

Target version. If omitted, uses current or latest local version.

--policy-file <path>

Optional filter policy JSON file (volume, ground_area, precomputed) to apply while fixing.

--filter-volume-min, --filter-volume-max, --filter-ground-area-min, --filter-ground-area-max

Override filter thresholds directly on CLI.

--no-manifest-regenerate

Skip manifest regeneration after applying fixes.

--dry-run

Print planned actions without writing files.

Example
# Preview fixes
kub-dataset fix kernante --version 0.98.0 --dry-run

# Apply fixes with explicit policy file
kub-dataset fix kernante --version 0.98.0 --policy-file cemdb/locations/kernante/v0.98.0/policy/filter_policy.json

6.3.23. policy

Manage filter policy templates and dataset policy artifacts.

kub-dataset policy list
kub-dataset policy show [policy_ref] [--template <name>|--file <path>]
kub-dataset policy validate [policy_ref] [--template <name>|--file <path>] [-v]
kub-dataset policy apply <location> [--version <version>] [--template <name>|--file <path>] [--dry-run]
Examples
# List built-in policy templates
kub-dataset policy list

# Show normalized template payload
kub-dataset policy show legacy-default

# Validate a policy file
kub-dataset policy validate --file cemdb/locations/kernante/v0.98.0/policy/filter_policy.json -v

# Apply a template to a location version
kub-dataset policy apply kernante --version 0.98.0 --template legacy-default

6.3.24. policy-validate

Compatibility alias for kub-dataset policy validate. Prefer kub-dataset policy validate in new scripts.

kub-dataset policy-validate <policy-file> [-v]

6.3.25. copy-version

Copy one location version to another. Alias: cp.

kub-dataset copy-version <location> <source-version> <target-version> [--set-current]

6.3.26. rename-version

Rename a location version folder. Alias: mv.

kub-dataset rename-version <location> <old-version> <new-version>

6.3.27. rename-location

Rename a location and update cem.location.name in config files. Alias: rl.

kub-dataset rename-location <old-name> <new-name>

6.3.28. set-current

Set the current symlink to a specific version. Alias: sc.

kub-dataset set-current <location> <version>

6.3.29. duplicate

Duplicate a version (or current version) with patch increment. Alias: dup.

kub-dataset duplicate <location> [<source-version>] [--set-current]
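
As an illustration of what a patch increment means for an X.Y.Z version, the sketch below computes the next patch version. bump_patch is a hypothetical helper, not the tool's implementation; it assumes plain numeric X.Y.Z versions (optionally prefixed with "v") and does not model how duplicate behaves when the target version already exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kub-dataset): print the next patch version
# for an X.Y.Z version string, keeping an optional leading "v".
bump_patch() {
    local input prefix version patch
    input=$1
    prefix=""
    version=${input#v}
    if [ "$version" != "$input" ]; then
        prefix="v"
    fi
    case "$version" in
        *[!0-9.]*|*.*.*.*) return 1 ;;   # stray characters or too many parts
        [0-9]*.[0-9]*.[0-9]*) ;;         # looks like X.Y.Z
        *) return 1 ;;
    esac
    patch=${version##*.}
    case "$patch" in
        0[0-9]*) return 1 ;;             # leading zeros would be read as octal
    esac
    printf '%s%s.%s\n' "$prefix" "${version%.*}" "$((patch + 1))"
}

# Usage:
#   bump_patch 0.98.0
```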

6.3.30. push-component

Push a single component (browsable item mode, Girder).

kub-dataset push-component <location> --version <version> --component <geo|config|inputs|preprocessing|all> --api-key <key>

6.3.31. pull-component

Pull a single component for a location version (Girder).

kub-dataset pull-component <location> --version <version> --component <geo|config|inputs|preprocessing>

6.3.32. list-components

List available remote components for a location version (Girder).

kub-dataset list-components <location> --version <version> [--api-key <key>]

6.3.33. list-locations-ckan

Legacy compatibility command (deprecated). Use list-locations --dmp ckan-hidalgo2.

kub-dataset list-locations-ckan [--format {table,json}]

6.3.34. list-versions-ckan

Legacy compatibility command (deprecated). Use list-versions <location> --dmp ckan-hidalgo2.

kub-dataset list-versions-ckan <location>

6.3.35. pull-ckan

Legacy compatibility command (deprecated). Use pull <location> --dmp ckan-hidalgo2.

kub-dataset pull-ckan <location> [--version <version>] [--api-key <key>] [--force]

6.3.36. push-ckan

Legacy compatibility command (deprecated). Use push <location> --dmp ckan-hidalgo2.

kub-dataset push-ckan <location> --version <version> --api-key <key>

6.4. Simulator Dataset Commands

Simulator datasets are stored on Girder only and follow a version-based structure with LOD directories.

6.4.1. pack-simulator

Pack simulator directory(ies) into zip file(s).

kub-dataset pack-simulator --version <version> [--lod <lod>] [--single] [options]
Options
--version, -V <version> (required)

Version to pack (e.g., 0.1.0)

--lod, -L <lod>

Directory to pack: LOD number (0, 1, 2) or name (others). If omitted, packs ALL directories (one zip per directory).

--single, -s

Create a single zip containing all directories and manifest.json.

-o, --output <path>

Output zip file path. Default: <dir_name>.zip or simulators-vX.Y.Z.zip with --single

--cemdb-root <path>

Path to cemdb/simulators directory. Default: cemdb/simulators

Example
# Pack all directories (one zip per directory)
kub-dataset pack-simulator --version 0.1.0

# Pack specific LOD directory
kub-dataset pack-simulator --version 0.1.0 --lod 0

# Pack the 'others' directory
kub-dataset pack-simulator --version 0.1.0 --lod others

# Create single unified zip (includes manifest.json)
kub-dataset pack-simulator --version 0.1.0 --single

# Custom output path
kub-dataset pack-simulator --version 0.1.0 --single -o /tmp/simulators.zip

Output:

# With --single
Created: /path/to/simulators-v0.1.0.zip

# Without --single (all directories)
Packing all directories for version 0.1.0: [0, 'others']
  lod0: simulators_lod0-v0.1.0.zip
  others: simulators_others-v0.1.0.zip
Created 2 zip file(s)
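
The per-directory archive names in the sample output follow the pattern simulators_<dir>-vX.Y.Z.zip, where a numeric --lod value N maps to the directory lodN and a name such as others is used as-is. The sketch below reproduces that mapping as a hypothetical helper; it is inferred from the sample output above, not from the tool's source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kub-dataset): derive the per-directory
# archive name from a --lod value and a version, following the sample output.
simulator_archive_name() {
    local lod version dir
    lod=$1
    version=${2#v}                     # accept 0.1.0 or v0.1.0
    case "$lod" in
        ''|*[!0-9]*) dir=$lod ;;       # a name such as "others"
        *) dir="lod$lod" ;;            # a LOD number such as 0
    esac
    printf 'simulators_%s-v%s.zip\n' "$dir" "$version"
}

# Usage:
#   simulator_archive_name 0 0.1.0
#   simulator_archive_name others 0.1.0
```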

6.4.2. unpack-simulator

Unpack a simulator archive to the cemdb/simulators directory. Automatically detects the archive format and routes to the appropriate unpacking logic.

kub-dataset unpack-simulator <archive> [options]
Arguments
archive

Path to the zip file to unpack. Supports two formats:

  • Unified version zip: simulators-vX.Y.Z.zip - Contains all directories and manifest.json

  • Individual directory zip: simulators_<dir>-vX.Y.Z.zip - Contains a single directory (e.g., simulators_lod0-v0.1.0.zip, simulators_others-v0.1.0.zip)

Options
--cemdb-root <path>

Path to cemdb/simulators directory. Default: cemdb/simulators

-f, --force

Overwrite existing directory if present

Example
# Unpack a unified version zip (extracts everything including manifest.json)
kub-dataset unpack-simulator simulators-v0.1.0.zip

# Unpack a single LOD directory
kub-dataset unpack-simulator simulators_lod0-v0.1.0.zip

# Unpack the 'others' directory
kub-dataset unpack-simulator simulators_others-v0.1.0.zip

# Force overwrite
kub-dataset unpack-simulator simulators_lod0-v0.1.0.zip --force

Output:

# Unified zip
Unpacked to: cemdb/simulators/v0.1.0 (version: 0.1.0)

# Individual directory zip
Unpacked to: cemdb/simulators/v0.1.0/lod0 (version: 0.1.0, directory: lod0)

Note: When unpacking individual directory zips, manifest.json is NOT included (it exists only in unified zips). To get the manifest, either use the unified zip or pull the manifest from Girder separately.
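
The two archive formats can be told apart by file name alone, which is handy when a script needs to know in advance whether manifest.json will be present. The sketch below is a hypothetical classifier based only on the naming patterns listed under Arguments; how unpack-simulator itself detects the format is not described here and may differ (for example, by inspecting the archive contents).

```shell
#!/usr/bin/env bash
# Hypothetical classifier (not part of kub-dataset): identify a simulator
# archive from its file name. Prints a one-line description, or fails if the
# name matches neither documented pattern.
classify_simulator_archive() {
    local name rest dir version
    name=${1##*/}                                   # strip any directory part
    case "$name" in
        simulators-v*.zip)                          # unified version zip
            version=${name#simulators-v}
            version=${version%.zip}
            printf 'unified version=%s\n' "$version"
            ;;
        simulators_*-v*.zip)                        # individual directory zip
            rest=${name#simulators_}
            dir=${rest%%-v*}
            version=${rest#*-v}
            version=${version%.zip}
            printf 'directory=%s version=%s\n' "$dir" "$version"
            ;;
        *)
            return 1
            ;;
    esac
}

# Usage:
#   classify_simulator_archive simulators_lod0-v0.1.0.zip
```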

6.4.3. push-simulator

Push simulator directory(ies) to Girder.

kub-dataset push-simulator --version <version> [--lod <lod>] --api-key <key> [options]
Options
--version, -V <version> (required)

Version to push (e.g., 0.1.0)

--lod, -L <lod>

Directory to push: LOD number (0, 1, 2) or name (others). If omitted, pushes ALL directories and manifest.json.

--api-key <key> (required)

Girder API key for authentication

--cemdb-root <path>

Path to cemdb/simulators directory. Default: cemdb/simulators

Example
# Push all directories (including manifest.json)
kub-dataset push-simulator --version 0.1.0 --api-key "$GIRDER_API_KEY"

# Push specific LOD only
kub-dataset push-simulator --version 0.1.0 --lod 0 --api-key "$GIRDER_API_KEY"

# Push 'others' directory
kub-dataset push-simulator --version 0.1.0 --lod others --api-key "$GIRDER_API_KEY"

# With verbose output
kub-dataset -v push-simulator --version 0.1.0 --api-key "$GIRDER_API_KEY"

Output:

Pushing all directories for version 0.1.0: [0, 'others']
  lod0: lod0
  others: others
  manifest: manifest.json
Uploaded 3 item(s) to Girder for version 0.1.0

6.4.4. pull-simulator

Pull simulator directories from Girder.

kub-dataset pull-simulator --version <version> [--lod <lod>] [--dvc] [options]
Options
--version, -V <version> (required)

Version to pull (e.g., 0.1.0)

--lod, -L <lod>

Directory to pull: LOD number (0, 1, 2) or name (others). If not specified, pulls all directories.

--api-key <key>

Girder API key (optional for public datasets)

--cemdb-root <path>

Path to cemdb/simulators directory. Default: cemdb/simulators

-f, --force

Overwrite existing directory if present

--dvc

Generate DVC tracking file (.dvc) and update .gitignore after pull.

Example
# Pull all directories for a version
kub-dataset pull-simulator --version 0.1.0

# Pull specific LOD
kub-dataset pull-simulator --version 0.1.0 --lod 0

# Pull with DVC tracking
kub-dataset pull-simulator --version 0.1.0 --dvc

# Force overwrite
kub-dataset pull-simulator --version 0.1.0 --force --dvc

Output:

Pulling all directories for version 0.1.0: ['lod0', 'others']
  Pulled: lod0 -> cemdb/simulators/v0.1.0/lod0
  Pulled: others -> cemdb/simulators/v0.1.0/others
Pulled 2 directory(ies) for version 0.1.0

6.4.5. list-simulators

List available simulator versions (local and/or Girder).

kub-dataset list-simulators [--local] [--remote] [--version <version>] [options]
Options
--version, -V <version>

Filter by specific version (e.g., 0.1.0)

--local

List local versions only (from cemdb/simulators/)

--remote

List remote versions only (from Girder)

--api-key <key>

Girder API key (optional for public datasets)

--format {table,json}

Output format. Default: table

--cemdb-root <path>

Path to cemdb/simulators directory. Default: cemdb/simulators

Example
# List all (local and remote)
kub-dataset list-simulators

# List local versions only
kub-dataset list-simulators --local

# List remote versions only
kub-dataset list-simulators --remote

# Filter by version
kub-dataset list-simulators --version 0.1.0

# JSON output
kub-dataset list-simulators --format json

Output (table format):

Source    Version    Items                           Total Size
=================================================================
local     v0.1.0     lod0, others                    209.5 MB
remote    v0.1.0     lod0, manifest.json, others     209.5 MB
-----------------------------------------------------------------

6.4.6. list-simulator-versions

List available simulator versions on Girder (alias for list-simulators --remote).

kub-dataset list-simulator-versions [--version <version>] [--api-key <key>]
Options
--version, -V <version>

Filter by specific version

--api-key <key>

Girder API key (optional for public datasets)

6.4.7. delete-simulator

Delete a simulator version or specific item from Girder.

kub-dataset delete-simulator --version <version> [--lod <lod>] --api-key <key>
Options
--version, -V <version> (required)

Version to delete from (e.g., 0.1.0)

--lod, -L <lod>

Specific item to delete (e.g., 0, others). If omitted, deletes the entire version.

--api-key <key> (required)

Girder API key for authentication

--yes, -y

Skip confirmation prompt

Example
# Delete entire version (all items and version folder)
kub-dataset delete-simulator --version 0.1.0 --api-key $GIRDER_API_KEY --yes

# Delete specific LOD only
kub-dataset delete-simulator --version 0.1.0 --lod 0 --api-key $GIRDER_API_KEY --yes

# Delete 'others' directory
kub-dataset delete-simulator --version 0.1.0 --lod others --api-key $GIRDER_API_KEY --yes

7. DVC Integration

kub-dataset provides built-in DVC (Data Version Control) support for reproducible data pipelines.

7.1. Quick Start

# 1. Initialize DVC (one-time setup)
kub-dataset init

# 2. Pull dataset with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc

# 3. Check tracked datasets
kub-dataset status

# 4. Commit to git
git add .dvc .dvcignore cemdb/locations/kernante.dvc cemdb/locations/.gitignore
git commit -m "Track kernante v0.98.0 with DVC"

7.2. How It Works

The --dvc flag enables DVC tracking for pulled datasets by:

  1. Computing MD5 checksums of the unpacked dataset

  2. Generating a .dvc tracking file with version metadata

  3. Adding the data directory to .gitignore
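
The three steps can be sketched with the Python standard library. This is an illustration only: summarize_directory and ignore_directory are hypothetical helpers, and the aggregate hash computed here is an assumption that may differ from the checksum DVC or kub-dataset actually records.

```python
import hashlib
import tempfile
from pathlib import Path


def summarize_directory(root: Path) -> dict:
    """Return an aggregate MD5, total size, and file count for a directory tree."""
    aggregate = hashlib.md5()
    size = 0
    nfiles = 0
    # Sort paths so the result does not depend on filesystem ordering.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        file_md5 = hashlib.md5(data).hexdigest()
        rel = path.relative_to(root).as_posix()
        aggregate.update(f"{rel}:{file_md5}\n".encode())
        size += len(data)
        nfiles += 1
    return {"md5": aggregate.hexdigest(), "size": size, "nfiles": nfiles}


def ignore_directory(gitignore: Path, name: str) -> None:
    """Append "/<name>" to a .gitignore unless it is already listed."""
    entry = f"/{name}"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry not in lines:
        gitignore.write_text("\n".join(lines + [entry]) + "\n")


# Exercise both helpers on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    locations = Path(tmp)
    dataset = locations / "demo"
    (dataset / "geo").mkdir(parents=True)
    (dataset / "geo" / "gis.json").write_text("{}")
    (dataset / "scenarios.json").write_text("[]")
    summary = summarize_directory(dataset)
    ignore_directory(locations / ".gitignore", "demo")
    gitignore_text = (locations / ".gitignore").read_text()

print(summary["nfiles"], summary["size"])  # 2 4
```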

7.3. Generated .dvc File

When using --dvc, a tracking file is created:

# DVC tracking file for kernante
# Generated by kub-dataset
# DO NOT EDIT - managed by kub-dataset pull --dvc
outs:
- md5: a1b2c3d4e5f6...
  path: kernante
  hash: md5
  size: 183456
  nfiles: 42
meta:
  source: girder-unistra
  version: 0.98.0
  tool: kub-dataset

The meta section stores:

  • source: The DMP where the dataset was pulled from

  • version: The exact version of the dataset

  • tool: Identifies kub-dataset as the tracking tool
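
A script that needs the pinned version can read it back from the meta section. The sketch below uses a small line-based reader so that it has no dependencies; read_dvc_meta is a hypothetical helper that only handles the flat layout shown above, and a YAML library would be the more robust choice.

```python
def read_dvc_meta(text: str) -> dict:
    """Extract the key/value pairs under the top-level "meta:" key."""
    meta = {}
    in_meta = False
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith((" ", "-")):
            # A top-level key either opens or closes the meta block.
            in_meta = line.strip() == "meta:"
            continue
        if in_meta and ":" in line:
            key, _, value = line.strip().partition(":")
            meta[key] = value.strip()
    return meta


SAMPLE = """\
# DVC tracking file for kernante
outs:
- md5: a1b2c3d4e5f6...
  path: kernante
meta:
  source: girder-unistra
  version: 0.98.0
  tool: kub-dataset
"""

meta = read_dvc_meta(SAMPLE)
print(meta["version"], meta["source"])  # 0.98.0 girder-unistra
```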

7.4. Complete DVC Workflow

# 1. Initialize DVC in the repository (one-time)
kub-dataset init
git add .dvc .dvcignore
git commit -m "Initialize DVC"

# 2. Pull dataset with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc

# 3. Verify tracking
kub-dataset status
# Output:
# Tracked datasets:
#   - kernante (version: 0.98.0, source: girder-unistra)

# 4. Commit the .dvc file to git
git add cemdb/locations/kernante.dvc cemdb/locations/.gitignore
git commit -m "Track kernante v0.98.0"

# 5. Later, anyone can reproduce the exact dataset
kub-dataset pull kernante --version 0.98.0 --dvc --force

7.5. Updating Dataset Version

# Update to a new version
kub-dataset pull kernante --version 1.0.0 --dvc --force

# Check the updated version
kub-dataset status
# Output:
# Tracked datasets:
#   - kernante (version: 1.0.0, source: girder-unistra)

# Commit the change
git add cemdb/locations/kernante.dvc
git commit -m "Update kernante to v1.0.0"

8. Global Options

-v, --verbose

Print verbose output showing detailed progress:

[pull] DMP: girder-unistra (girder)
[pull] Location: kernante
[pull] Version: 0.98.0
[pull] Target: cemdb/locations
Connecting to Girder at https://girder.math.unistra.fr/api/v1...
Found dataset: kernante_input-v0.98.0.zip
Downloading item: kernante_input-v0.98.0.zip...
Unpacking to cemdb/locations...
  Extracted: kernante/.dataregistry.json
  Extracted: kernante/geo/gis.json
...

-h, --help

Show help message and exit

9. Archive Structure

Archives created by kub-dataset pack preserve the location directory structure:

<location>/
├── .dataregistry.json
├── kub-cem-sim-config.cfg
├── geo/
│   ├── .dataregistry.json
│   ├── gis.json
│   ├── mesh_0_Lod0.msh
│   ├── mesh_1_Lod1.msh
│   ├── enriched.jsonl.gz
│   └── geosetup/
│       └── geographicdata_setup.json
├── preprocessing/
│   ├── preprocessing.json
│   └── partitioning/
│       ├── np_1/
│       ├── np_2/
│       ├── np_4/
│       └── ...
├── scenarios/
│   ├── scenarios.json
│   └── scenarioHVAC_*.csv
├── weather/
│   ├── weather-data-0.hourly-variables.csv
│   └── weather-data.json
└── air-quality/                              # Optional
    └── air-quality-data-0.hourly-variables.csv
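
Because the archive keeps this layout, a quick sanity check can be written with the standard zipfile module. The sketch below is not part of kub-dataset: missing_entries is a hypothetical helper, and the EXPECTED list is an assumption chosen for illustration, not a documented set of required files.

```python
import io
import zipfile

# Assumption: these entries are treated as required for illustration only.
EXPECTED = (
    ".dataregistry.json",
    "geo/gis.json",
    "scenarios/scenarios.json",
    "weather/weather-data.json",
)


def missing_entries(archive, location: str, expected=EXPECTED) -> list:
    """Return the expected paths that are absent under <location>/ in the archive."""
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    return [rel for rel in expected if f"{location}/{rel}" not in names]


# Build a small in-memory archive to exercise the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo/.dataregistry.json", "{}")
    zf.writestr("demo/geo/gis.json", "{}")
    zf.writestr("demo/scenarios/scenarios.json", "[]")
buffer.seek(0)

missing = missing_entries(buffer, "demo")
print(missing)  # ['weather/weather-data.json']
```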

10. Use Cases

10.1. CI/CD Pipeline Integration

Pull datasets automatically in CI workflows:

# GitHub Actions example
- name: Install kub-dataset
  run: pip install feelpp-ktirio-urban-building

- name: Pull dataset
  run: |
    kub-dataset pull kernante \
      --cemdb-root ./cemdb/locations \
      --api-key "${{ secrets.GIRDER_API_KEY }}" \
      --force -v

10.2. Setting Up a New Development Environment

# Clone repository
git clone https://github.com/feelpp/ktirio-urban-building.git
cd ktirio-urban-building

# List available datasets
kub-dataset list-locations

# Pull required datasets with DVC tracking
kub-dataset pull kernante --dvc --force
kub-dataset pull lingolsheim --dvc --force

# Verify downloaded data
ls -la cemdb/locations/

10.3. Publishing a New Dataset Version

# Verify local dataset is ready
ls cemdb/locations/kernante/

# Push new version to Girder
kub-dataset -v push kernante \
  --version 1.0.0 \
  --api-key $GIRDER_API_KEY \
  --cemdb-root cemdb/locations

# Verify it's available
kub-dataset list-versions kernante

10.4. Migrating Datasets Between DMPs

# Pull from Girder
kub-dataset pull kernante \
  --dmp girder-unistra \
  --api-key $GIRDER_API_KEY \
  --cemdb-root ./temp

# Push to CKAN
kub-dataset push kernante \
  --version 1.0.0 \
  --dmp ckan-hidalgo2 \
  --api-key $CKAN_API_KEY \
  --cemdb-root ./temp

10.5. Reproducible Research Workflow

Use DVC tracking for reproducible experiments:

# Initial setup: pull with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc

# Commit the .dvc file
git add cemdb/locations/kernante.dvc
git commit -m "Track kernante v0.98.0"

# Later: reproduce the exact dataset
# (read version from .dvc file's meta section)
grep version cemdb/locations/kernante.dvc

10.6. Creating Test Datasets for Distribution

# Pack location for distribution
kub-dataset -v pack kernante \
  --cemdb-root cemdb/locations \
  -o kernante_test_input.zip

# Verify archive contents
kub-dataset list kernante_test_input.zip | head -20

# Share or upload the archive
# Recipients can unpack with:
# kub-dataset unpack kernante_test_input.zip --force

10.7. Querying Multiple DMPs

# Get overview of all available data
kub-dataset list-locations --show-versions

# Export to JSON for processing
kub-dataset list-locations --format json > datasets.json

# Query specific DMPs with verbose output
kub-dataset -v list-locations --dmp girder-unistra

10.8. Working with Simulator Datasets

10.8.1. Setting Up Simulators for Development

# List available simulator versions
kub-dataset list-simulators

# Pull simulators with DVC tracking
kub-dataset pull-simulator --version 0.1.0 --dvc --force

# Verify downloaded files
ls cemdb/simulators/v0.1.0/

10.8.2. Publishing New Simulator Version

# Prepare local simulator files
ls cemdb/simulators/v0.2.0/
# manifest.json, lod0/, lod1/, others/

# Push all directories to Girder
kub-dataset push-simulator --version 0.2.0 --api-key $GIRDER_API_KEY

# Verify upload
kub-dataset list-simulators --remote

10.8.3. CI/CD with Simulators

Pull simulators in CI pipelines:

# GitHub Actions example
- name: Pull simulators
  run: |
    kub-dataset pull-simulator \
      --version 0.1.0 \
      --force \
      --api-key "${{ secrets.GIRDER_API_KEY }}"

10.8.4. Creating Simulator Distribution Package

# Create single zip with all FMUs and manifest
kub-dataset pack-simulator --version 0.1.0 --single

# Result: simulators-v0.1.0.zip containing:
# - manifest.json
# - lod0/*.fmu (60 files)
# - lod0.metadata.json
# - others/Sun.fmu
# - others.metadata.json

10.8.5. Updating Simulator Version

# Check current version
kub-dataset list-simulators --local

# Pull new version
kub-dataset pull-simulator --version 0.2.0 --dvc --force

# Commit DVC tracking
git add cemdb/simulators/v0.2.0/lod0.dvc
git add cemdb/simulators/v0.2.0/others.dvc
git commit -m "Update simulators to v0.2.0"

11. Environment Variables

The following environment variables can be used:

GIRDER_API_KEY

Girder API key for authentication (can be passed to --api-key)

CKAN_API_KEY

CKAN API key for authentication
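
A wrapper script can read the key from the environment and pass it through --api-key explicitly. The sketch below only builds the argument list and does not run anything; build_pull_command is a hypothetical helper, and leaving out --api-key when the variable is unset is an assumption about public datasets.

```python
import os
import shlex


def build_pull_command(location, version=None, environ=None):
    """Build a kub-dataset pull command, adding --api-key when GIRDER_API_KEY is set."""
    environ = os.environ if environ is None else environ
    cmd = ["kub-dataset", "pull", location]
    if version:
        cmd += ["--version", version]
    api_key = environ.get("GIRDER_API_KEY")
    if api_key:
        cmd += ["--api-key", api_key]
    return cmd


# A plain dict stands in for os.environ so the example is deterministic.
cmd = build_pull_command("kernante", "0.98.0", environ={"GIRDER_API_KEY": "demo-key"})
print(shlex.join(cmd))  # kub-dataset pull kernante --version 0.98.0 --api-key demo-key
```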

12. Python API

kub-dataset can also be used programmatically.

12.1. Location Dataset API

from feelpp.ktirio.ub.dataset import (
    pack_dataset,
    unpack_dataset,
    pull_dataset_girder,
    pull_dataset_ckan,
    push_dataset_girder,
    delete_dataset_girder,
    list_locations_all,
    list_versions_girder,
    generate_dvc_file,
    init_dvc,
    status_dvc,
)

# Initialize DVC
init_dvc(verbose=True)

# Pack a location
zip_path = pack_dataset("kernante", cemdb_root="cemdb/locations")

# Pull from Girder
location_path = pull_dataset_girder(
    "kernante",
    version="0.98.0",
    api_key="your-api-key",
    cemdb_root="cemdb/locations",
)

# Generate DVC tracking
dvc_path = generate_dvc_file(
    location_path,
    dmp_id="girder-unistra",
    version="0.98.0",
)

# Check DVC status
result = status_dvc()
print(f"Initialized: {result['initialized']}")
print(f"Status: {result['status']}")

# List all locations
locations = list_locations_all(verbose=True)
for loc in locations:
    print(f"{loc['dmp']}: {loc['name']} v{loc['latest_version']}")

12.2. Simulator Dataset API

from feelpp.ktirio.ub.dataset import (
    # Local operations (pack, unpack, list)
    pack_simulator_lod,
    pack_simulator_version,
    unpack_simulator_lod,
    unpack_simulator_version,
    list_simulator_lods_local,
    list_simulator_versions_local,
    # DMP operations (Girder only)
    push_simulator_lod_girder,
    push_simulator_manifest_girder,
    pull_simulator_lod_girder,
    list_simulator_lods_girder,
    delete_simulator_lod_girder,
)

# Pack a specific LOD
zip_path = pack_simulator_lod(
    version="0.1.0",
    lod=0,  # Can also be 'others'
    cemdb_root="cemdb/simulators",
)

# Pack entire version (single zip with manifest)
zip_path = pack_simulator_version(
    version="0.1.0",
    cemdb_root="cemdb/simulators",
)

# Unpack a single directory zip (lod0 or others)
dir_path, version, dir_name = unpack_simulator_lod(
    "simulators_lod0-v0.1.0.zip",
    cemdb_root="cemdb/simulators",
)
print(f"Unpacked {dir_name} to {dir_path}")

# Unpack a unified version zip (all directories + manifest)
version_path, version = unpack_simulator_version(
    "simulators-v0.1.0.zip",
    cemdb_root="cemdb/simulators",
)
print(f"Unpacked version {version} to {version_path}")

# List local versions
versions = list_simulator_versions_local(cemdb_root="cemdb/simulators")
print(f"Local versions: {versions}")

# List directories in a version
dirs = list_simulator_lods_local("0.1.0", cemdb_root="cemdb/simulators")
print(f"Directories: {dirs}")  # [0, 'others']

# Pull from Girder
dir_path, version, dir_name = pull_simulator_lod_girder(
    lod=0,
    version="0.1.0",
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)
print(f"Pulled {dir_name} to {dir_path}")

# Push to Girder
result = push_simulator_lod_girder(
    version="0.1.0",
    lod=0,
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)
print(f"Uploaded: {result['item_name']}")

# Push manifest
result = push_simulator_manifest_girder(
    version="0.1.0",
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)

# List remote versions
remote_data = list_simulator_lods_girder(verbose=True)
for version_info in remote_data:
    print(f"v{version_info['version']}: {version_info['item_count']} items")
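
The archive names that appear on this page (kernante_input-v0.98.0.zip, simulators_lod0-v0.1.0.zip, simulators-v0.1.0.zip) all end in -v<version>.zip. A hedged sketch for splitting such names follows; the pattern is inferred from these three examples rather than from a documented naming rule, and split_archive_name is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumption: "<stem>-v<dotted version>.zip", inferred from the names on this page.
ARCHIVE_RE = re.compile(r"^(?P<stem>.+)-v(?P<version>\d+(?:\.\d+)*)\.zip$")


def split_archive_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (stem, version) for a matching archive name, else None."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        return None
    return match.group("stem"), match.group("version")


for name in ("kernante_input-v0.98.0.zip",
             "simulators_lod0-v0.1.0.zip",
             "simulators-v0.1.0.zip"):
    print(name, "->", split_archive_name(name))
```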

13. Exit Codes

Code  Description
0     Success
1     Error (file not found, permission denied, invalid archive, network error, etc.)
2     Invalid command-line arguments
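
Scripts that call kub-dataset can branch on these codes. The sketch below uses subprocess from the standard library; describe_exit and run_and_describe are hypothetical helpers, and a stand-in child process replaces the real command so that the example runs anywhere.

```python
import subprocess
import sys

# Meanings taken from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "error",
    2: "invalid command-line arguments",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_and_describe(argv):
    """Run a command and return (exit code, description)."""
    completed = subprocess.run(argv)
    return completed.returncode, describe_exit(completed.returncode)


# Stand-in child that exits with code 2; replace argv with a kub-dataset call.
code, meaning = run_and_describe([sys.executable, "-c", "import sys; sys.exit(2)"])
print(code, meaning)  # 2 invalid command-line arguments
```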

14. Additional Submodules

The feelpp.ktirio.ub.dataset module includes additional submodules for dataset generation and external data sources:

14.1. generators

Dataset generation utilities for creating reference test cases.

from feelpp.ktirio.ub.dataset.generators import bestest

# Generate BESTEST reference cases
bestest.generate_bestest_dataset(output_dir="cemdb/locations/bestest")

See also: kub-bestest CLI tool for generating ASHRAE 140-2020 BESTEST reference cases.

14.2. sources

External data sources for dataset preparation and simulation support.

from feelpp.ktirio.ub.dataset.sources import openmeteo, events

# Fetch IAQ data for simulation
from feelpp.ktirio.ub.dataset.sources.openmeteo import fetch_iaq_for_simulation

# Fetch historical weather for event scenarios
from feelpp.ktirio.ub.dataset.sources.events import fetch_event_to_location

fetch_event_to_location(
    location_root="cemdb/locations/notre_dame/v0.1.0",
    event_name="fire-2019-04-15",
    lat=48.853, lon=2.349,
    start="2019-04-15", end="2019-04-16",
)

See also: kub-event-weather CLI tool for fetching historical weather data.

15. See Also