kub-dataset: Full Reference
This page is the complete reference for kub-dataset.
For a shorter, task-oriented entry point, start at kub-dataset.
The kub-dataset command-line tool lets you pack, unpack, inspect, migrate, and enrich CEMDB location datasets, and synchronize them with remote Data Management Platforms (DMPs).
It supports multiple backends including Girder and CKAN, can generate weather/IAQ input data from Open-Meteo, and can generate DVC tracking files for version control.
1. Synopsis
kub-dataset [-h] [-v] {pack,unpack,list,init,status,login,logout,whoami,push,pull,delete,list-versions,summary,tag-reference,list-reference,migrate,migrate-layout,migrate-org-layout,migrate-simulator,manifest-show,manifest-regenerate,manifest-verify,manifest-validate,fix,policy,policy-validate,generate,list-dmps,list-locations,list-locations-ckan,list-versions-ckan,pull-ckan,push-ckan,pack-simulator,unpack-simulator,push-simulator,pull-simulator,list-simulators,list-simulator-versions,delete-simulator,copy-version,rename-version,rename-location,set-current,duplicate,push-component,pull-component,list-components,...} ...
1.1. Location Dataset Commands
kub-dataset pack <location> [options]
kub-dataset unpack <archive> [options]
kub-dataset list <archive>
kub-dataset push <location> --version <version> --api-key <key> [--dmp <dmp>]
kub-dataset pull <location> [--version <version>] [--dmp <dmp>] [--dvc]
kub-dataset delete <location> [--version <version>] --api-key <key>
kub-dataset list-versions <location> [--dmp <dmp>]
kub-dataset list-locations [--dmp <dmps>] [--show-versions]
kub-dataset summary [<location> ...] [--all] [--version <version>] [--format {table,json}]
kub-dataset list-dmps
kub-dataset generate {meteo|iaq|both} <location> --start <date|datetime> --end <date|datetime> [--version <version>]
kub-dataset manifest-show <location> [--version <version>]
kub-dataset manifest-regenerate <location> [--version <version>]
kub-dataset manifest-verify <location> [--version <version>]
kub-dataset manifest-validate [manifest|location] [--version <version>] [--cemdb-root <path>] [--strict]
kub-dataset fix <location> [--version <version>] [--policy-file <path>] [--dry-run]
kub-dataset policy list
kub-dataset policy show [policy_ref] [--template <name>|--file <path>]
kub-dataset policy validate [policy_ref] [--template <name>|--file <path>] [-v]
kub-dataset policy apply <location> [--version <version>] [--template <name>|--file <path>] [--dry-run]
kub-dataset policy-validate <policy-file> [-v]
kub-dataset migrate [<location>] [--execute]
kub-dataset migrate-layout [<location>|--all]
kub-dataset migrate-org-layout [<location>|--all] [--organization <org>|--org-map <json>]
kub-dataset copy-version <location> <source-version> <target-version>
kub-dataset rename-version <location> <old-version> <new-version>
kub-dataset rename-location <old-name> <new-name>
kub-dataset set-current <location> <version>
kub-dataset duplicate <location> [<source-version>]
kub-dataset tag-reference <location> <run-id> [--type <single|ensemble>]
kub-dataset list-reference <location> [--type <single|ensemble>]
kub-dataset push-component <location> --version <version> --component <geo|config|inputs|preprocessing|all> --api-key <key>
kub-dataset pull-component <location> --version <version> --component <geo|config|inputs|preprocessing>
kub-dataset list-components <location> --version <version>
1.2. Simulator Dataset Commands
kub-dataset pack-simulator --version <version> [--lod <lod>] [--single]
kub-dataset unpack-simulator <archive> [options]
kub-dataset push-simulator --version <version> [--lod <lod>] --api-key <key>
kub-dataset pull-simulator --version <version> [--lod <lod>] [--dvc]
kub-dataset list-simulators [--local] [--remote] [--version <version>]
kub-dataset delete-simulator --version <version> [--lod <lod>] --api-key <key>
2. Features
- Pack/Unpack: Create and extract dataset archives (excludes simulation outputs)
- Multi-DMP Support: Push/pull location datasets to/from Girder and CKAN
- Input Generation: Generate hourly weather and IAQ inputs from Open-Meteo
- Simulator Datasets: Manage FMU building models with LOD-based organization (Girder only)
- DVC Integration: Initialize DVC, track datasets, and check status
- Unified Listing: List all datasets across multiple DMPs
- Version Tracking: Track dataset versions with metadata in .dvc files
3. Description
CEMDB locations contain both input data and simulation outputs. For distribution or CI testing, you typically only need the input data:
- weather/ - Weather data files
- air-quality/ - Air quality data files (when available)
- preprocessing/ - Preprocessed mesh and metadata
- geo/ - GIS data, mesh files, enrichment data
- scenarios/ - HVAC scenario configurations
The kub-dataset tool creates archives that exclude:
- feelppdb/ - Simulation database outputs
- simulations/ - Simulation results
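The exclusion rule above can be illustrated with a short Python sketch. This is not kub-dataset's implementation: the function name `pack_inputs` is hypothetical, and treating `feelppdb` and `simulations` as excluded at any depth is an assumption made for illustration.

```python
import zipfile
from pathlib import Path

# Output directories named in the list above.
# Assumption: they are excluded wherever they appear in the tree.
EXCLUDED_DIRS = {"feelppdb", "simulations"}


def pack_inputs(location_dir: Path, archive_path: Path) -> list[str]:
    """Zip a location directory, skipping simulation outputs.

    Returns the sorted archive member names that were written.
    The archive should be created outside location_dir.
    """
    location_dir = Path(location_dir)
    written = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(location_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(location_dir)
            # Skip files that live under an excluded directory.
            if EXCLUDED_DIRS.intersection(rel.parts[:-1]):
                continue
            # Store members under "<location>/..." to keep the layout.
            name = (Path(location_dir.name) / rel).as_posix()
            zf.write(path, name)
            written.append(name)
    return sorted(written)
```

In practice `kub-dataset pack` does this work; the sketch only makes the input/output split concrete.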
4. Data Management Platforms (DMPs)
kub-dataset supports multiple data management platforms:
| DMP ID | Type | Description |
|---|---|---|
| girder-unistra | Girder | University of Strasbourg Girder instance (default) |
| ckan-hidalgo2 | CKAN | HiDALGO2 CKAN data portal |
Use --dmp <dmp-id> to specify which platform to use for push/pull operations.
5. Simulator Datasets
Simulator datasets contain pre-compiled FMU (Functional Mock-up Unit) building models used for energy simulations. Unlike location datasets, simulators are:
- Girder-only: Not available on CKAN
- Version-based: Organized by version folders (e.g., v0.1.0)
- LOD-organized: Contains Level of Detail directories (lod0, lod1) and support directories (others)
5.1. Local Structure
cemdb/simulators/
└── v0.1.0/                      # Version folder
    ├── manifest.json            # Version manifest (describes all directories)
    ├── lod0/                    # LOD 0 building models
    │   ├── App4Walls1Floor1Roof.fmu
    │   ├── App4Walls2Floor1RoofBoiler.fmu
    │   └── ... (60 FMU files)
    ├── lod0.metadata.json       # LOD 0 metadata (building model details)
    ├── others/                  # Support FMU files
    │   └── Sun.fmu
    └── others.metadata.json     # Support files metadata
5.2. Girder Structure
On Girder, simulators are stored as items with individual FMU files:
UrbanBuilding/cemdb/simulators/
└── v0.1.0/
    ├── manifest.json            # Manifest item
    ├── lod0                     # Item containing all LOD0 FMU files
    └── others                   # Item containing support FMU files
5.3. Manifest File
The manifest.json describes the version contents:
{
  "version": "0.1.0",
  "description": "Dataset of physical building models in FMU format",
  "creationDate": "2025-10-22",
  "itemType": "Simulators",
  "maxNumberOfFloors": 10,
  "totalNumberOfBuildingModels": 60,
  "directories": [
    {
      "name": "lod0",
      "type": "lod",
      "lod": 0,
      "description": "Level of Detail 0 building models",
      "count": 60
    },
    {
      "name": "others",
      "type": "support",
      "description": "Supporting FMU files (Sun model, etc.)",
      "files": ["Sun.fmu"]
    }
  ]
}
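A manifest like the one above can be cross-checked against the files on disk. The following Python sketch is illustrative only: `check_manifest` is a hypothetical helper, and it assumes that `count` means the number of `.fmu` files directly inside the directory and that `files` lists names relative to that directory.

```python
import json
from pathlib import Path


def check_manifest(version_dir: Path) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    version_dir = Path(version_dir)
    manifest = json.loads((version_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest.get("directories", []):
        name = entry["name"]
        directory = version_dir / name
        if not directory.is_dir():
            problems.append(f"{name}: directory missing")
            continue
        # Assumption: "count" is the number of .fmu files in the directory.
        if "count" in entry:
            found = len(list(directory.glob("*.fmu")))
            if found != entry["count"]:
                problems.append(f"{name}: expected {entry['count']} FMU files, found {found}")
        # Assumption: "files" lists file names relative to the directory.
        for filename in entry.get("files", []):
            if not (directory / filename).is_file():
                problems.append(f"{name}: missing file {filename}")
    return problems
```

Calling `check_manifest(Path("cemdb/simulators/v0.1.0"))` would return an empty list when the local version folder matches its manifest under these assumptions.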
6. Commands
6.1. DVC Commands
6.2. Authentication Commands
6.2.1. login
Authenticate with a configured Keycloak provider for DMP operations.
kub-dataset login --provider <provider-id> [--username <user> --password <pass>] [--timeout <seconds>]
6.3. Location Dataset Commands
6.3.1. pack
Pack a CEMDB location into a zip archive.
kub-dataset pack <location> [options]
6.3.2. unpack
Unpack a dataset archive to the CEMDB locations directory.
kub-dataset unpack <archive> [options]
6.3.4. list-dmps
List all registered Data Management Platforms.
kub-dataset list-dmps [--format {table,json}]
Example
kub-dataset list-dmps
Output:
ID              Name            Type    Description
------------------------------------------------------------------------------------------
girder-unistra  Girder Unistra  girder  University of Strasbourg Girder instance
ckan-hidalgo2   CKAN HiDALGO2   ckan    HiDALGO2 CKAN data portal
------------------------------------------------------------------------------------------
Total: 2 registered DMPs
6.3.5. list-locations
List all available locations and versions from registered DMPs.
kub-dataset list-locations [--dmp <dmp-ids>] [--format {table,json}] [--show-versions]
Options
--dmp <dmp-ids>
    Comma-separated list of DMP IDs to query. Default: all registered DMPs
--api-key <key>
    API key (optional for public datasets)
--format {table,json}
    Output format. Default: table
--show-versions
    Show all available versions for each location (expanded view)
Example
# List all locations from all DMPs
kub-dataset list-locations
# List only from Girder
kub-dataset list-locations --dmp girder-unistra
# List from multiple specific DMPs
kub-dataset list-locations --dmp girder-unistra,ckan-hidalgo2
# Show all versions (expanded view)
kub-dataset list-locations --show-versions
# JSON output for scripting
kub-dataset list-locations --format json
Output (table format):
DMP             Location                              Latest   Versions  Total Size
=======================================================================================
girder-unistra  kernante                              0.98.0   1         0.17 MB
                lingolsheim                           0.97.0   1         1.24 MB
                ub_uap_coupling_lingolsheim_district  0.97.0   1         12.45 MB
                (3 locations)                                            13.86 MB
---------------------------------------------------------------------------------------
ckan-hidalgo2   berlin-district                       1.0.0    2         8.34 MB
                (1 locations)                                            8.34 MB
---------------------------------------------------------------------------------------
Total: 4 locations (girder-unistra: 3, ckan-hidalgo2: 1) - 22.20 MB
6.3.6. push
Push a location dataset to a DMP (Girder or CKAN).
kub-dataset push <location> --version <version> --api-key <key> [--dmp <dmp-id>] [options]
Options
--version, -V <version> (required)
    Dataset version (e.g., 0.98.0)
--api-key <key> (required)
    API key for authentication
--dmp <dmp-id>
    DMP to push to. Default: girder-unistra
--cemdb-root <path>
    Path to cemdb/locations directory. Default: cemdb/locations
--normalize
    Auto-normalize invalid dataset names (convert dashes to underscores).
--create-package
    Create CKAN package if it doesn’t exist. Default: True (CKAN only)
--no-create-package
    Fail if CKAN package doesn’t exist instead of creating it (CKAN only)
6.3.7. pull
Pull a location dataset from a DMP with optional DVC tracking.
kub-dataset pull <location> [--version <version>] [--dmp <dmp-id>] [--dvc] [options]
Options
--version, -V <version>
    Dataset version (e.g., 0.98.0). If omitted, pulls the latest version.
--dmp <dmp-id>
    DMP to pull from. Default: girder-unistra
--api-key <key>
    API key (optional for public datasets)
--cemdb-root <path>
    Path to cemdb/locations directory. Default: cemdb/locations
-f, --force
    Overwrite existing location if present
--dvc
    Generate DVC tracking file (.dvc) and update .gitignore after pull. This enables reproducible data pipelines.
--normalize
    Auto-normalize invalid dataset names (convert dashes to underscores).
Example
# Pull latest from Girder (default)
kub-dataset pull kernante --api-key $GIRDER_API_KEY
# Pull specific version
kub-dataset pull kernante --version 0.98.0 --api-key $GIRDER_API_KEY
# Pull from CKAN
kub-dataset pull berlin-district --dmp ckan-hidalgo2 --api-key $CKAN_API_KEY
# Pull with DVC tracking
kub-dataset pull kernante --dvc --api-key $GIRDER_API_KEY
# Force overwrite and generate DVC tracking
kub-dataset pull kernante --version 0.98.0 --force --dvc --api-key $GIRDER_API_KEY
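When --version is omitted, pull takes the latest version. How kub-dataset ranks versions is not described on this page; the Python sketch below shows one plausible approach, comparing the numeric components rather than the raw strings. The helper names `version_key` and `latest_version` are hypothetical.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '0.98.0' or 'v0.98.0' into (0, 98, 0) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest_version(versions: list[str]) -> str:
    """Pick the highest version by numeric components, not string order."""
    if not versions:
        raise ValueError("no versions available")
    return max(versions, key=version_key)
```

Numeric comparison matters once a component reaches three digits: plain string ordering would rank 0.98.0 above 0.100.0.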
6.3.8. delete
Delete a location dataset from a DMP (Girder or CKAN).
kub-dataset delete <location> [--version <version>] --api-key <key> [--dmp <dmp-id>]
Options
--version, -V <version>
    Specific version to delete. If omitted, deletes ALL versions.
--dmp <dmp-id>
    DMP to delete from. Default: girder-unistra
--api-key <key> (required)
    API key for authentication
--yes, -y
    Skip confirmation prompt
Example
# Delete specific version from Girder
kub-dataset delete kernante --version 0.98.0 --api-key $GIRDER_API_KEY --yes
# Delete all versions (with confirmation)
kub-dataset delete kernante --api-key $GIRDER_API_KEY
# Delete from CKAN
kub-dataset delete berlin-district --version 1.0.0 --dmp ckan-hidalgo2 --api-key $CKAN_API_KEY --yes
6.3.9. list-versions
List available versions of a location dataset on a DMP.
kub-dataset list-versions <location> [--dmp <dmp-id>] [options]
6.3.10. summary
Summarize local datasets in cemdb/locations.
This command inspects local locations and reports:
- resolved version (current/latest or --version)
- manifest and config availability
- detected building count
- detected preprocessing partitions
- available plans and simulation run counts
kub-dataset summary [<location> ...] [--all] [--version <version>] [--format {table,json}] [--compact]
Options
--all
    Include all local locations (merged with explicit location arguments).
--version <version>
    Inspect a specific version (e.g. 0.1.0 or v0.1.0).
--cemdb-root <path>
    Path to cemdb/locations. Default: cemdb/locations
--format {table,json}
    Output format. Default: table
--compact
    Force compact table output even for a single location.
Examples
# Summarize one local dataset
kub-dataset summary kernante --cemdb-root cemdb/locations
# Summarize all local datasets
kub-dataset summary --all --cemdb-root cemdb/locations
# Summarize specific version for multiple datasets in JSON
kub-dataset summary kernante strasbourg --version 0.1.0 --format json
Note: kub-case-summary is deprecated; use kub-dataset summary for versioned cemdb/locations datasets.
6.3.11. generate
Generate location input data from Open-Meteo and write it into the target dataset version directory.
Weather and IAQ outputs are hourly time series.
kub-dataset generate meteo <location> --start <date|datetime> --end <date|datetime> [options]
kub-dataset generate iaq <location> --start <date|datetime> --end <date|datetime> [options]
kub-dataset generate both <location> --start <date|datetime> --end <date|datetime> [options]
Options
--version <version>
    Version to target. If omitted, uses current/latest detected version.
--cemdb-root <path>
    Path to cemdb/locations. Default: cemdb/locations
--start, --end <date|datetime>
    Time window to generate. Supports YYYY-MM-DD and ISO datetime.
--timezone <tz>
    Timezone for request and local date interpretation (e.g. Europe/Paris, UTC). Default: auto
--force
    Overwrite existing generated files.
--update-manifest/--no-update-manifest
    Update manifest.json and checksums after generation (default: enabled).
--output <filename> (for meteo/iaq)
    Output filename inside weather/ or air-quality/.
--weather-output <filename>, --iaq-output <filename> (for both)
    Separate output filenames for each generated stream.
Notes
- generate meteo alias: generate weather
- generate both alias: generate all
- With naive datetimes (2023-01-01T12:00:00), use explicit --timezone.
Example
# Generate one year of weather
kub-dataset generate meteo strasbourg \
--version 0.1.0 \
--start 2023-01-01 \
--end 2023-12-31 \
--timezone Europe/Paris \
--cemdb-root "$PWD/cemdb"
# Generate IAQ with default output name
kub-dataset generate iaq strasbourg \
--version 0.1.0 \
--start 2023-01-01 \
--end 2023-12-31 \
--timezone Europe/Paris \
--cemdb-root "$PWD/cemdb"
# Generate both streams in one command
kub-dataset generate both strasbourg \
--version 0.1.0 \
--start 2023-01-01 \
--end 2023-12-31 \
--timezone Europe/Paris \
--cemdb-root "$PWD/cemdb"
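The note on naive datetimes can be made concrete with a small pre-flight check in Python. This is a sketch under stated assumptions, not the tool's own parsing: `parse_bound` and `check_window` are hypothetical helpers, and the check only covers the two documented input shapes (YYYY-MM-DD and ISO datetime).

```python
from datetime import datetime


def parse_bound(text: str) -> tuple[datetime, bool]:
    """Parse 'YYYY-MM-DD' or an ISO datetime.

    Returns (value, is_naive). A naive value carries no UTC offset,
    so it needs an explicit timezone to be interpreted.
    """
    value = datetime.fromisoformat(text)
    return value, value.tzinfo is None


def check_window(start: str, end: str) -> None:
    """Raise ValueError if the window is reversed or mixes naive and aware bounds."""
    (s, s_naive), (e, e_naive) = parse_bound(start), parse_bound(end)
    if s_naive != e_naive:
        raise ValueError("mix of naive and offset-aware bounds")
    if e < s:
        raise ValueError("--end is before --start")
```

A script could call `check_window("2023-01-01", "2023-12-31")` before invoking `kub-dataset generate`, and pass `--timezone` whenever `parse_bound` reports a naive value.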
6.3.12. tag-reference
Tag a simulation run as a validation reference (single or ensemble).
kub-dataset tag-reference <location> <run-id> [--type <single|ensemble>] [--description <text>] [--expected-energy <kWh>]
6.3.13. list-reference
List reference-tagged runs for a location.
kub-dataset list-reference <location> [--type <single|ensemble>] [--format {table,json}]
6.3.14. migrate
Migrate dataset folders from flat layout to versioned layout.
kub-dataset migrate [<location>] [--default-version <X.Y.Z>] [--execute]
6.3.15. migrate-layout
Migrate location data layout from legacy input-dataset/ to geo/.
kub-dataset migrate-layout [<location>|--all] [--dry-run]
6.3.16. migrate-org-layout
Migrate local datasets to organization-aware layout:
cemdb/<org>/locations/<location>.
kub-dataset migrate-org-layout [<location>|--all] [--organization <org>|--org-map <json>] [--apply]
6.3.17. migrate-simulator
Migrate simulator manifest structure to current schema conventions.
kub-dataset migrate-simulator [<version>] [--dry-run]
6.3.18. manifest-show
Display manifest metadata for a location version.
kub-dataset manifest-show <location> [--version <version>] [--format {table,json}]
6.3.19. manifest-regenerate
Regenerate manifest.json for a location version (optionally with checksums).
kub-dataset manifest-regenerate <location> [--version <version>] [--no-checksums]
6.3.20. manifest-verify
Verify manifest schema and optionally file checksums.
kub-dataset manifest-verify <location> [--version <version>] [--verify-checksums]
6.3.21. manifest-validate
Validate manifest refs, artifacts, and SHA256 checksums.
You can pass an explicit manifest path or a location name (optionally with --version).
This is the canonical CLI entry for manifest validation.
kub-dataset manifest-validate [manifest|location] [--version <version>] [--cemdb-root <path>] [--strict]
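The SHA256 check that manifest-validate performs can be pictured with the Python sketch below. The location manifest layout is not shown on this page, so the input here is a plain mapping from relative path to expected hex digest; that mapping and the helper names `sha256_of` and `verify_checksums` are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large meshes do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(root: Path, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    root = Path(root)
    failures = []
    for rel, want in sorted(expected.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != want.lower():
            failures.append(rel)
    return failures
```

An empty return value means every listed file exists and matches its recorded digest.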
6.3.22. fix
Auto-fix common dataset issues and write normalized policy/config artifacts for a location version.
kub-dataset fix <location> [--version <version>] [--cemdb-root <path>] [--policy-file <path>] [--dry-run]
Options
--version <version>
    Target version. If omitted, uses current or latest local version.
--policy-file <path>
    Optional filter policy JSON file (volume, ground_area, precomputed) to apply while fixing.
--filter-volume-min, --filter-volume-max, --filter-ground-area-min, --filter-ground-area-max
    Override filter thresholds directly on CLI.
--no-manifest-regenerate
    Skip manifest regeneration after applying fixes.
--dry-run
    Print planned actions without writing files.
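To give a rough idea of what min/max filter thresholds do, here is a Python sketch. Everything in it is an assumption made for illustration: the record fields `volume` and `ground_area` mirror the policy key names but the real building schema is not shown here, bounds are treated as inclusive, and `within` and `apply_filters` are hypothetical helpers rather than kub-dataset functions.

```python
from typing import Optional


def within(value: float, minimum: Optional[float], maximum: Optional[float]) -> bool:
    """True if value lies inside the optional inclusive [minimum, maximum] bounds."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def apply_filters(buildings, volume_min=None, volume_max=None,
                  ground_area_min=None, ground_area_max=None):
    """Keep buildings whose volume and ground_area fall inside the bounds."""
    return [
        b for b in buildings
        if within(b["volume"], volume_min, volume_max)
        and within(b["ground_area"], ground_area_min, ground_area_max)
    ]
```

Running `kub-dataset fix` with --dry-run first remains the reliable way to see what the tool itself would change.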
6.3.23. policy
Manage filter policy templates and dataset policy artifacts.
kub-dataset policy list
kub-dataset policy show [policy_ref] [--template <name>|--file <path>]
kub-dataset policy validate [policy_ref] [--template <name>|--file <path>] [-v]
kub-dataset policy apply <location> [--version <version>] [--template <name>|--file <path>] [--dry-run]
Examples
# List built-in policy templates
kub-dataset policy list
# Show normalized template payload
kub-dataset policy show legacy-default
# Validate a policy file
kub-dataset policy validate --file cemdb/locations/kernante/v0.98.0/policy/filter_policy.json -v
# Apply a template to a location version
kub-dataset policy apply kernante --version 0.98.0 --template legacy-default
6.3.24. policy-validate
Compatibility alias for kub-dataset policy validate.
Prefer kub-dataset policy validate in new scripts.
kub-dataset policy-validate <policy-file> [-v]
6.3.25. copy-version
Copy one location version to another.
Alias: cp.
kub-dataset copy-version <location> <source-version> <target-version> [--set-current]
6.3.26. rename-version
Rename a location version folder.
Alias: mv.
kub-dataset rename-version <location> <old-version> <new-version>
6.3.27. rename-location
Rename a location and update cem.location.name in config files.
Alias: rl.
kub-dataset rename-location <old-name> <new-name>
6.3.28. set-current
Set the current symlink to a specific version.
Alias: sc.
kub-dataset set-current <location> <version>
6.3.29. duplicate
Duplicate a version (or current version) with patch increment.
Alias: dup.
kub-dataset duplicate <location> [<source-version>] [--set-current]
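A patch increment turns 0.98.0 into 0.98.1. The Python sketch below shows that arithmetic under the assumption of a three-part X.Y.Z version with an optional leading v; `bump_patch` is a hypothetical helper, and how kub-dataset behaves when the target version already exists is not covered here.

```python
def bump_patch(version: str) -> str:
    """Increment the patch component; a leading 'v' is preserved."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(p) for p in version[len(prefix):].split("."))
    return f"{prefix}{major}.{minor}.{patch + 1}"
```

Because the components are parsed as integers, 0.1.9 becomes 0.1.10 rather than rolling over to 0.2.0.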
6.3.30. push-component
Push a single component (browsable item mode, Girder).
kub-dataset push-component <location> --version <version> --component <geo|config|inputs|preprocessing|all> --api-key <key>
6.3.31. pull-component
Pull a single component for a location version (Girder).
kub-dataset pull-component <location> --version <version> --component <geo|config|inputs|preprocessing>
6.3.32. list-components
List available remote components for a location version (Girder).
kub-dataset list-components <location> --version <version> [--api-key <key>]
6.3.33. list-locations-ckan
Legacy compatibility command (deprecated).
Use list-locations --dmp ckan-hidalgo2.
kub-dataset list-locations-ckan [--format {table,json}]
6.3.34. list-versions-ckan
Legacy compatibility command (deprecated).
Use list-versions <location> --dmp ckan-hidalgo2.
kub-dataset list-versions-ckan <location>
6.4. Simulator Dataset Commands
Simulator datasets are stored on Girder only and follow a version-based structure with LOD directories.
6.4.1. pack-simulator
Pack simulator directory(ies) into zip file(s).
kub-dataset pack-simulator --version <version> [--lod <lod>] [--single] [options]
Options
--version, -V <version> (required)
    Version to pack (e.g., 0.1.0)
--lod, -L <lod>
    Directory to pack: LOD number (0, 1, 2) or name (others). If omitted, packs ALL directories (one zip per directory).
--single, -s
    Create a single zip containing all directories and manifest.json.
-o, --output <path>
    Output zip file path. Default: <dir_name>.zip or simulators-vX.Y.Z.zip with --single
--cemdb-root <path>
    Path to cemdb/simulators directory. Default: cemdb/simulators
Example
# Pack all directories (one zip per directory)
kub-dataset pack-simulator --version 0.1.0
# Pack specific LOD directory
kub-dataset pack-simulator --version 0.1.0 --lod 0
# Pack the 'others' directory
kub-dataset pack-simulator --version 0.1.0 --lod others
# Create single unified zip (includes manifest.json)
kub-dataset pack-simulator --version 0.1.0 --single
# Custom output path
kub-dataset pack-simulator --version 0.1.0 --single -o /tmp/simulators.zip
Output:
# With --single
Created: /path/to/simulators-v0.1.0.zip
# Without --single (all directories)
Packing all directories for version 0.1.0: [0, 'others']
lod0: simulators_lod0-v0.1.0.zip
others: simulators_others-v0.1.0.zip
Created 2 zip file(s)
6.4.2. unpack-simulator
Unpack a simulator archive to the cemdb/simulators directory. Automatically detects the archive format and routes to the appropriate unpacking logic.
kub-dataset unpack-simulator <archive> [options]
Arguments
archive
    Path to the zip file to unpack. Supports two formats:
    - Unified version zip: simulators-vX.Y.Z.zip - Contains all directories and manifest.json
    - Individual directory zip: simulators_<dir>-vX.Y.Z.zip - Contains a single directory (e.g., simulators_lod0-v0.1.0.zip, simulators_others-v0.1.0.zip)
Options
--cemdb-root <path>
    Path to cemdb/simulators directory. Default: cemdb/simulators
-f, --force
    Overwrite existing directory if present
Example
# Unpack a unified version zip (extracts everything including manifest.json)
kub-dataset unpack-simulator simulators-v0.1.0.zip
# Unpack a single LOD directory
kub-dataset unpack-simulator simulators_lod0-v0.1.0.zip
# Unpack the 'others' directory
kub-dataset unpack-simulator simulators_others-v0.1.0.zip
# Force overwrite
kub-dataset unpack-simulator simulators_lod0-v0.1.0.zip --force
Output
# Unified zip
Unpacked to: cemdb/simulators/v0.1.0 (version: 0.1.0)
# Individual directory zip
Unpacked to: cemdb/simulators/v0.1.0/lod0 (version: 0.1.0, directory: lod0)
Note: When unpacking individual directory zips, manifest.json is NOT included (it only exists in unified zips). To get the manifest, either use the unified zip or pull the manifest from Girder separately.
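The two archive naming patterns listed under Arguments, and the target paths shown in the Output block, suggest how format detection could work. The Python sketch below classifies a file name by regular expression; it is an illustration only, with `classify_archive` as a hypothetical helper, and the real tool may inspect the archive contents instead of the name.

```python
import re

# Patterns taken from the two documented archive names.
UNIFIED = re.compile(r"^simulators-v(?P<version>\d+\.\d+\.\d+)\.zip$")
SINGLE_DIR = re.compile(
    r"^simulators_(?P<directory>[A-Za-z0-9]+)-v(?P<version>\d+\.\d+\.\d+)\.zip$"
)


def classify_archive(filename: str, root: str = "cemdb/simulators") -> dict:
    """Classify an archive name and derive the directory it would unpack to."""
    m = UNIFIED.match(filename)
    if m:
        version = m["version"]
        return {"kind": "unified", "version": version, "directory": None,
                "target": f"{root}/v{version}"}
    m = SINGLE_DIR.match(filename)
    if m:
        version, directory = m["version"], m["directory"]
        return {"kind": "directory", "version": version, "directory": directory,
                "target": f"{root}/v{version}/{directory}"}
    raise ValueError(f"unrecognized simulator archive name: {filename}")
```

For example, `classify_archive("simulators_lod0-v0.1.0.zip")["target"]` evaluates to `cemdb/simulators/v0.1.0/lod0`, matching the path in the Output block.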
6.4.3. push-simulator
Push simulator directory(ies) to Girder.
kub-dataset push-simulator --version <version> [--lod <lod>] --api-key <key> [options]
Options
--version, -V <version> (required)
    Version to push (e.g., 0.1.0)
--lod, -L <lod>
    Directory to push: LOD number (0, 1, 2) or name (others). If omitted, pushes ALL directories and manifest.json.
--api-key <key> (required)
    Girder API key for authentication
--cemdb-root <path>
    Path to cemdb/simulators directory. Default: cemdb/simulators
Example
# Push all directories (including manifest.json)
kub-dataset push-simulator --version 0.1.0 --api-key $GIRDER_API_KEY
# Push specific LOD only
kub-dataset push-simulator --version 0.1.0 --lod 0 --api-key $GIRDER_API_KEY
# Push 'others' directory
kub-dataset push-simulator --version 0.1.0 --lod others --api-key $GIRDER_API_KEY
# With verbose output
kub-dataset -v push-simulator --version 0.1.0 --api-key $GIRDER_API_KEY
Output:
Pushing all directories for version 0.1.0: [0, 'others']
lod0: lod0
others: others
manifest: manifest.json
Uploaded 3 item(s) to Girder for version 0.1.0
6.4.4. pull-simulator
Pull simulator directories from Girder.
kub-dataset pull-simulator --version <version> [--lod <lod>] [--dvc] [options]
Options
--version, -V <version> (required)
    Version to pull (e.g., 0.1.0)
--lod, -L <lod>
    Directory to pull: LOD number (0, 1, 2) or name (others). If not specified, pulls all directories.
--api-key <key>
    Girder API key (optional for public datasets)
--cemdb-root <path>
    Path to cemdb/simulators directory. Default: cemdb/simulators
-f, --force
    Overwrite existing directory if present
--dvc
    Generate DVC tracking file (.dvc) and update .gitignore after pull.
Example
# Pull all directories for a version
kub-dataset pull-simulator --version 0.1.0
# Pull specific LOD
kub-dataset pull-simulator --version 0.1.0 --lod 0
# Pull with DVC tracking
kub-dataset pull-simulator --version 0.1.0 --dvc
# Force overwrite
kub-dataset pull-simulator --version 0.1.0 --force --dvc
Output:
Pulling all directories for version 0.1.0: ['lod0', 'others']
Pulled: lod0 -> cemdb/simulators/v0.1.0/lod0
Pulled: others -> cemdb/simulators/v0.1.0/others
Pulled 2 directory(ies) for version 0.1.0
6.4.5. list-simulators
List available simulator versions (local and/or Girder).
kub-dataset list-simulators [--local] [--remote] [--version <version>] [options]
Options
--version, -V <version>
    Filter by specific version (e.g., 0.1.0)
--local
    List local versions only (from cemdb/simulators/)
--remote
    List remote versions only (from Girder)
--api-key <key>
    Girder API key (optional for public datasets)
--format {table,json}
    Output format. Default: table
--cemdb-root <path>
    Path to cemdb/simulators directory. Default: cemdb/simulators
Example
# List all (local and remote)
kub-dataset list-simulators
# List local versions only
kub-dataset list-simulators --local
# List remote versions only
kub-dataset list-simulators --remote
# Filter by version
kub-dataset list-simulators --version 0.1.0
# JSON output
kub-dataset list-simulators --format json
Output (table format):
Source  Version  Items                        Total Size
=================================================================
local   v0.1.0   lod0, others                 209.5 MB
remote  v0.1.0   lod0, manifest.json, others  209.5 MB
-----------------------------------------------------------------
6.4.6. list-simulator-versions
List available simulator versions on Girder (alias for list-simulators --remote).
kub-dataset list-simulator-versions [--version <version>] [--api-key <key>]
6.4.7. delete-simulator
Delete a simulator version or specific item from Girder.
kub-dataset delete-simulator --version <version> [--lod <lod>] --api-key <key>
Options
--version, -V <version> (required)
    Version to delete from (e.g., 0.1.0)
--lod, -L <lod>
    Specific item to delete (e.g., 0, others). If omitted, deletes the entire version.
--api-key <key> (required)
    Girder API key for authentication
--yes, -y
    Skip confirmation prompt
Example
# Delete entire version (all items and version folder)
kub-dataset delete-simulator --version 0.1.0 --api-key $GIRDER_API_KEY --yes
# Delete specific LOD only
kub-dataset delete-simulator --version 0.1.0 --lod 0 --api-key $GIRDER_API_KEY --yes
# Delete 'others' directory
kub-dataset delete-simulator --version 0.1.0 --lod others --api-key $GIRDER_API_KEY --yes
7. DVC Integration
kub-dataset provides built-in DVC (Data Version Control) support for reproducible data pipelines.
7.1. Quick Start
# 1. Initialize DVC (one-time setup)
kub-dataset init
# 2. Pull dataset with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc
# 3. Check tracked datasets
kub-dataset status
# 4. Commit to git
git add .dvc .dvcignore cemdb/locations/kernante.dvc cemdb/locations/.gitignore
git commit -m "Track kernante v0.98.0 with DVC"
7.2. How It Works
The --dvc flag enables DVC tracking for pulled datasets by:
- Computing MD5 checksums of the unpacked dataset
- Generating a .dvc tracking file with version metadata
- Adding the data directory to .gitignore
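The checksum step can be sketched in Python as follows. This is a simplified illustration, not DVC's or kub-dataset's algorithm: DVC derives a directory-level hash in its own way, whereas `summarize_directory` (a hypothetical helper) only collects per-file MD5 digests together with the size and file-count totals that a tracking entry records.

```python
import hashlib
from pathlib import Path


def summarize_directory(root: Path) -> dict:
    """Collect per-file MD5 digests plus total size and file count."""
    root = Path(root)
    files = {}
    total = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        # Keys are POSIX-style paths relative to the dataset root.
        files[path.relative_to(root).as_posix()] = hashlib.md5(data).hexdigest()
        total += len(data)
    return {"files": files, "size": total, "nfiles": len(files)}
```

Comparing two such summaries taken at different times is a quick way to see which files changed between pulls.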
7.3. Generated .dvc File
When using --dvc, a tracking file is created:
# DVC tracking file for kernante
# Generated by kub-dataset
# DO NOT EDIT - managed by kub-dataset pull --dvc
outs:
- md5: a1b2c3d4e5f6...
  path: kernante
  hash: md5
  size: 183456
  nfiles: 42
meta:
  source: girder-unistra
  version: 0.98.0
  tool: kub-dataset
The meta section stores:
-
source: The DMP where the dataset was pulled from -
version: The exact version of the dataset -
tool: Identifies kub-dataset as the tracking tool
7.4. Complete DVC Workflow
# 1. Initialize DVC in the repository (one-time)
kub-dataset init
git add .dvc .dvcignore
git commit -m "Initialize DVC"
# 2. Pull dataset with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc
# 3. Verify tracking
kub-dataset status
# Output:
# Tracked datasets:
# - kernante (version: 0.98.0, source: girder-unistra)
# 4. Commit the .dvc file to git
git add cemdb/locations/kernante.dvc cemdb/locations/.gitignore
git commit -m "Track kernante v0.98.0"
# 5. Later, anyone can reproduce the exact dataset
kub-dataset pull kernante --version 0.98.0 --dvc --force
7.5. Updating Dataset Version
# Update to a new version
kub-dataset pull kernante --version 1.0.0 --dvc --force
# Check the updated version
kub-dataset status
# Output:
# Tracked datasets:
# - kernante (version: 1.0.0, source: girder-unistra)
# Commit the change
git add cemdb/locations/kernante.dvc
git commit -m "Update kernante to v1.0.0"
8. Global Options
-v, --verbose
    Print verbose output showing detailed progress:
    [pull] DMP: girder-unistra (girder)
    [pull] Location: kernante
    [pull] Version: 0.98.0
    [pull] Target: cemdb/locations
    Connecting to Girder at https://girder.math.unistra.fr/api/v1...
    Found dataset: kernante_input-v0.98.0.zip
    Downloading item: kernante_input-v0.98.0.zip...
    Unpacking to cemdb/locations...
    Extracted: kernante/.dataregistry.json
    Extracted: kernante/geo/gis.json
    ...
-h, --help
    Show help message and exit
9. Archive Structure
The created archive preserves the location directory structure:
<location>/
├── .dataregistry.json
├── kub-cem-sim-config.cfg
├── geo/
│   ├── .dataregistry.json
│   ├── gis.json
│   ├── mesh_0_Lod0.msh
│   ├── mesh_1_Lod1.msh
│   ├── enriched.jsonl.gz
│   └── geosetup/
│       └── geographicdata_setup.json
├── preprocessing/
│   ├── preprocessing.json
│   └── partitioning/
│       ├── np_1/
│       ├── np_2/
│       ├── np_4/
│       └── ...
├── scenarios/
│   ├── scenarios.json
│   └── scenarioHVAC_*.csv
├── weather/
│   ├── weather-data-0.hourly-variables.csv
│   └── weather-data.json
└── air-quality/                 # Optional
    └── air-quality-data-0.hourly-variables.csv
10. Use Cases
10.1. CI/CD Pipeline Integration
Pull datasets automatically in CI workflows:
# GitHub Actions example
- name: Install kub-dataset
  run: pip install feelpp-ktirio-urban-building
- name: Pull dataset
  run: |
    kub-dataset pull kernante \
      --cemdb-root ./cemdb/locations \
      --api-key "${{ secrets.GIRDER_API_KEY }}" \
      --force -v
10.2. Setting Up a New Development Environment
# Clone repository
git clone https://github.com/feelpp/ktirio-urban-building.git
cd ktirio-urban-building
# List available datasets
kub-dataset list-locations
# Pull required datasets with DVC tracking
kub-dataset pull kernante --dvc --force
kub-dataset pull lingolsheim --dvc --force
# Verify downloaded data
ls -la cemdb/locations/
10.3. Publishing a New Dataset Version
# Verify local dataset is ready
ls cemdb/locations/kernante/
# Push new version to Girder
kub-dataset -v push kernante \
--version 1.0.0 \
--api-key $GIRDER_API_KEY \
--cemdb-root cemdb/locations
# Verify it's available
kub-dataset list-versions kernante
10.4. Migrating Datasets Between DMPs
# Pull from Girder
kub-dataset pull kernante \
--dmp girder-unistra \
--api-key $GIRDER_API_KEY \
--cemdb-root ./temp
# Push to CKAN
kub-dataset push kernante \
--version 1.0.0 \
--dmp ckan-hidalgo2 \
--api-key $CKAN_API_KEY \
--cemdb-root ./temp
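When this migration is scripted, it helps to pin the same version on both sides so that the version pushed to the target is the one that was pulled. The Python sketch below builds the two argument lists using only flags documented on this page; `migration_commands` is a hypothetical helper and nothing is executed by it.

```python
def migration_commands(location, version, source_dmp, target_dmp,
                       workdir, source_key, target_key):
    """Build the pull and push argument lists for a DMP-to-DMP copy."""
    # Shared arguments keep the version and working directory identical.
    common = [location, "--version", version, "--cemdb-root", workdir]
    pull = ["kub-dataset", "pull", *common,
            "--dmp", source_dmp, "--api-key", source_key]
    push = ["kub-dataset", "push", *common,
            "--dmp", target_dmp, "--api-key", target_key]
    return [pull, push]
```

Each list can then be handed to `subprocess.run(cmd, check=True)` in order, with the API keys read from environment variables rather than hard-coded.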
10.5. Reproducible Research Workflow
Use DVC tracking for reproducible experiments:
# Initial setup: pull with DVC tracking
kub-dataset pull kernante --version 0.98.0 --dvc
# Commit the .dvc file
git add cemdb/locations/kernante.dvc
git commit -m "Track kernante v0.98.0"
# Later: reproduce the exact dataset
# (read the version from the .dvc file's meta section, then pull that version)
grep version cemdb/locations/kernante.dvc
kub-dataset pull kernante --version 0.98.0 --dvc
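When the version lookup has to happen inside a script, the same information can be read without a YAML dependency. This is a minimal sketch that assumes the .dvc file stores the version as an indented `version:` key under a top-level `meta:` block; the exact layout that kub-dataset writes is not shown on this page, and `read_meta_version` is a hypothetical helper name.

```python
def read_meta_version(dvc_text):
    """Return the `version` value found under the top-level `meta:` key.

    Assumes block-style YAML (one key per line, children indented).
    Returns None when no such key is found.
    """
    in_meta = False
    for line in dvc_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():
            # A new top-level entry starts; remember whether it is `meta:`.
            in_meta = line.split(":", 1)[0].strip() == "meta"
            continue
        if in_meta:
            key, _, value = stripped.partition(":")
            if key == "version":
                # Drop optional quotes around the value.
                return value.strip().strip("'\"")
    return None


def read_meta_version_from_file(path):
    """Convenience wrapper that reads the .dvc file from disk."""
    with open(path, encoding="utf-8") as handle:
        return read_meta_version(handle.read())
```

For files that use inline YAML mappings or nested metadata, a real YAML parser would be the safer choice.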
10.6. Creating Test Datasets for Distribution
# Pack location for distribution
kub-dataset -v pack kernante \
  --cemdb-root cemdb/locations \
  -o kernante_test_input.zip
# Verify archive contents
kub-dataset list kernante_test_input.zip | head -20
# Share or upload the archive
# Recipients can unpack with:
# kub-dataset unpack kernante_test_input.zip --force
10.7. Querying Multiple DMPs
# Get overview of all available data
kub-dataset list-locations --show-versions
# Export to JSON for processing
kub-dataset list-locations --format json > datasets.json
# Query specific DMPs with verbose output
kub-dataset -v list-locations --dmp girder-unistra
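The exported datasets.json can then be post-processed in a few lines. The schema of the JSON output is not documented on this page, so this sketch assumes a list of records carrying the same `dmp`, `name`, and `latest_version` keys used in the Python API example in section 12.1; adjust the key names if the actual output differs.

```python
import json
from collections import defaultdict


def group_by_dmp(records):
    """Group location records by DMP.

    Each record is assumed to be a dict with `dmp` and `name` keys and an
    optional `latest_version` key (an assumption about the JSON schema).
    Returns {dmp: [(name, latest_version), ...]} sorted by location name.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["dmp"]].append(
            (record["name"], record.get("latest_version"))
        )
    return {
        dmp: sorted(items, key=lambda item: item[0])
        for dmp, items in grouped.items()
    }


def load_and_group(path="datasets.json"):
    """Load the exported file and group its records by DMP."""
    with open(path, encoding="utf-8") as handle:
        return group_by_dmp(json.load(handle))
```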
10.8. Working with Simulator Datasets
10.8.1. Setting Up Simulators for Development
# List available simulator versions
kub-dataset list-simulators
# Pull simulators with DVC tracking
kub-dataset pull-simulator --version 0.1.0 --dvc --force
# Verify downloaded files
ls cemdb/simulators/v0.1.0/
10.8.2. Publishing New Simulator Version
# Prepare local simulator files
ls cemdb/simulators/v0.2.0/
# manifest.json, lod0/, lod1/, others/
# Push all directories to Girder
kub-dataset push-simulator --version 0.2.0 --api-key "$GIRDER_API_KEY"
# Verify upload
kub-dataset list-simulators --remote
10.8.3. CI/CD with Simulators
Pull simulators in CI pipelines:
# GitHub Actions example
- name: Pull simulators
  run: |
    kub-dataset pull-simulator \
      --version 0.1.0 \
      --force \
      --api-key "${{ secrets.GIRDER_API_KEY }}"
10.8.4. Creating Simulator Distribution Package
# Create single zip with all FMUs and manifest
kub-dataset pack-simulator --version 0.1.0 --single
# Result: simulators-v0.1.0.zip containing:
# - manifest.json
# - lod0/*.fmu (60 files)
# - lod0.metadata.json
# - others/Sun.fmu
# - others.metadata.json
10.8.5. Updating Simulator Version
# Check current version
kub-dataset list-simulators --local
# Pull new version
kub-dataset pull-simulator --version 0.2.0 --dvc --force
# Commit DVC tracking
git add cemdb/simulators/v0.2.0/lod0.dvc
git add cemdb/simulators/v0.2.0/others.dvc
git commit -m "Update simulators to v0.2.0"
11. Environment Variables
The following environment variables can be used:
GIRDER_API_KEY
    Girder API key for authentication (can be passed to --api-key)
CKAN_API_KEY
    CKAN API key for authentication
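In scripts that call the Python API, the key usually has to be looked up explicitly before it is passed as `api_key`. The helper below is a minimal sketch of that lookup, not part of kub-dataset: `resolve_api_key` and the backend labels "girder" and "ckan" are hypothetical names chosen for illustration, and only the two variable names come from the list above.

```python
import os

# Variable names from the list above; the backend labels are illustrative.
ENV_VAR_BY_BACKEND = {
    "girder": "GIRDER_API_KEY",
    "ckan": "CKAN_API_KEY",
}


def resolve_api_key(backend, explicit=None, environ=None):
    """Return an API key, preferring an explicit value over the environment.

    `environ` defaults to os.environ and can be replaced in tests.
    Raises RuntimeError when no key is available.
    """
    if explicit:
        return explicit
    env = os.environ if environ is None else environ
    variable = ENV_VAR_BY_BACKEND[backend]
    key = env.get(variable)
    if not key:
        raise RuntimeError(f"{variable} is not set and no API key was passed")
    return key
```

The result can be handed to any call that takes `api_key`, which keeps secrets out of source files.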
12. Python API
kub-dataset can also be used programmatically.
12.1. Location Dataset API
from feelpp.ktirio.ub.dataset import (
    pack_dataset,
    unpack_dataset,
    pull_dataset_girder,
    pull_dataset_ckan,
    push_dataset_girder,
    delete_dataset_girder,
    list_locations_all,
    list_versions_girder,
    generate_dvc_file,
    init_dvc,
    status_dvc,
)

# Initialize DVC
init_dvc(verbose=True)

# Pack a location
zip_path = pack_dataset("kernante", cemdb_root="cemdb/locations")

# Pull from Girder
location_path = pull_dataset_girder(
    "kernante",
    version="0.98.0",
    api_key="your-api-key",
    cemdb_root="cemdb/locations",
)

# Generate DVC tracking
dvc_path = generate_dvc_file(
    location_path,
    dmp_id="girder-unistra",
    version="0.98.0",
)

# Check DVC status
result = status_dvc()
print(f"Initialized: {result['initialized']}")
print(f"Status: {result['status']}")

# List all locations
locations = list_locations_all(verbose=True)
for loc in locations:
    print(f"{loc['dmp']}: {loc['name']} v{loc['latest_version']}")
12.2. Simulator Dataset API
from feelpp.ktirio.ub.dataset import (
    # Pack operations
    pack_simulator_lod,
    pack_simulator_version,
    unpack_simulator_lod,
    unpack_simulator_version,
    list_simulator_lods_local,
    list_simulator_versions_local,
    # DMP operations (Girder only)
    push_simulator_lod_girder,
    push_simulator_manifest_girder,
    pull_simulator_lod_girder,
    list_simulator_lods_girder,
    delete_simulator_lod_girder,
)

# Pack a specific LOD
zip_path = pack_simulator_lod(
    version="0.1.0",
    lod=0,  # Can also be 'others'
    cemdb_root="cemdb/simulators",
)

# Pack entire version (single zip with manifest)
zip_path = pack_simulator_version(
    version="0.1.0",
    cemdb_root="cemdb/simulators",
)

# Unpack a single directory zip (lod0 or others)
dir_path, version, dir_name = unpack_simulator_lod(
    "simulators_lod0-v0.1.0.zip",
    cemdb_root="cemdb/simulators",
)
print(f"Unpacked {dir_name} to {dir_path}")

# Unpack a unified version zip (all directories + manifest)
version_path, version = unpack_simulator_version(
    "simulators-v0.1.0.zip",
    cemdb_root="cemdb/simulators",
)
print(f"Unpacked version {version} to {version_path}")

# List local versions
versions = list_simulator_versions_local(cemdb_root="cemdb/simulators")
print(f"Local versions: {versions}")

# List directories in a version
dirs = list_simulator_lods_local("0.1.0", cemdb_root="cemdb/simulators")
print(f"Directories: {dirs}")  # [0, 'others']

# Pull from Girder
dir_path, version, dir_name = pull_simulator_lod_girder(
    lod=0,
    version="0.1.0",
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)
print(f"Pulled {dir_name} to {dir_path}")

# Push to Girder
result = push_simulator_lod_girder(
    version="0.1.0",
    lod=0,
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)
print(f"Uploaded: {result['item_name']}")

# Push manifest
result = push_simulator_manifest_girder(
    version="0.1.0",
    api_key="your-api-key",
    cemdb_root="cemdb/simulators",
)

# List remote versions
remote_data = list_simulator_lods_girder(verbose=True)
for version_info in remote_data:
    print(f"v{version_info['version']}: {version_info['item_count']} items")
13. Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | Error (file not found, permission denied, invalid archive, network error, etc.) |
| 2 | Invalid command-line arguments |
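Automation that wraps the CLI can branch on these codes. The following is a minimal sketch built on Python's standard subprocess module; `describe_exit` and `run_and_describe` are hypothetical helper names, and the short descriptions simply restate the table above.

```python
import subprocess

# Meanings restated from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "error",
    2: "invalid command-line arguments",
}


def describe_exit(code):
    """Map an exit code to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_and_describe(argv):
    """Run a command given as a list of arguments.

    Returns (exit_code, description). check=False keeps subprocess from
    raising on non-zero codes so the caller can decide what to do.
    """
    completed = subprocess.run(argv, check=False)
    return completed.returncode, describe_exit(completed.returncode)
```

A caller might run `run_and_describe(["kub-dataset", "list-versions", "kernante"])` and retry only when the code is 1, since code 2 indicates a problem with the arguments that a retry would not fix.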
14. Related Submodules
The feelpp.ktirio.ub.dataset module includes additional submodules for dataset generation and external data sources:
14.1. generators
Dataset generation utilities for creating reference test cases.
from feelpp.ktirio.ub.dataset.generators import bestest
# Generate BESTEST reference cases
bestest.generate_bestest_dataset(output_dir="cemdb/locations/bestest")
See also: kub-bestest CLI tool for generating ASHRAE 140-2020 BESTEST reference cases.
14.2. sources
External data sources for dataset preparation and simulation support.
from feelpp.ktirio.ub.dataset.sources import openmeteo, events
# Fetch IAQ data for simulation
from feelpp.ktirio.ub.dataset.sources.openmeteo import fetch_iaq_for_simulation
# Fetch historical weather for event scenarios
from feelpp.ktirio.ub.dataset.sources.events import fetch_event_to_location
fetch_event_to_location(
    location_root="cemdb/locations/notre_dame/v0.1.0",
    event_name="fire-2019-04-15",
    lat=48.853, lon=2.349,
    start="2019-04-15", end="2019-04-16",
)
See also: kub-event-weather CLI tool for fetching historical weather data.
15. See Also
- Data Layout: CEMDB directory structure
- Data Formats: File format specifications
- kub-bestest: BESTEST reference case generation
- kub-dashboard: Interactive data exploration
- kub-event-weather: Historical weather data for events
- DVC Documentation: Data Version Control