CEMDB Data Layout
1. Overview
The City Energy Model Database (CEMDB) uses a hierarchical directory structure to organize simulation inputs, resources, and outputs. This layout includes a clear separation between single and ensemble runs.
2. Core Design Principles
- Location-based organization for GIS, weather, and preprocessing data
- Run isolation via unique run identifiers
- Run type separation between single and ensemble simulations
- Resource sharing for FMU datasets
- Ensemble member isolation with per-member databases
- Ensemble aggregation stored at run level (not per member)
3. Directory Structure
3.1. Root Structure
cemdb/
|-- simulators/                     # Shared FMU datasets (read-only across all simulations)
|   |-- ideal/                      # Ideal heating system FMUs
|   |-- boiler/                     # Boiler heating system FMUs
|   `-- heatPump/                   # Heat pump system FMUs
`-- locations/                      # Location-specific data
    `-- <location>/                 # e.g., "kernante", "strasbourg", "paris"
        |-- gis/                    # GIS input data
        |-- weather/                # Weather data files
        |-- scenarios/              # Scenario definitions
        |-- preprocessing/          # Cached preprocessing data
        `-- simulations/            # Simulation outputs
            |-- single/
            |   `-- run_<timestamp>/
            |       |-- manifest.json
            |       |-- setup.json
            |       `-- database/
            `-- ensemble/
                `-- run_<timestamp>/
                    |-- manifest.json
                    |-- setup.json
                    |-- ensemble/   # Aggregated ensemble statistics
                    |-- sample_000/
                    |   `-- database/
                    |-- sample_001/
                    |   `-- database/
                    `-- database/   # Run-level logs/config
4. Key Directories
4.1. Simulators (cemdb/simulators/)
Purpose: Store shared FMU (Functional Mock-up Unit) datasets used by simulations.
Characteristics:
- Read-only during simulations
- Shared across all locations and runs
- Organized by heating system type
- FMU naming: App{Walls}{Floors}{Roof}{HeatingSystem}.fmu
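The naming pattern above can be illustrated with a small Python sketch. The component tokens ("W1", "F1", "R1", "Ideal") and the helper names fmu_filename and fmu_path are hypothetical; the actual token vocabulary is not given in this document.

```python
from pathlib import PurePosixPath


def fmu_filename(walls: str, floors: str, roof: str, heating_system: str) -> str:
    """Compose a file name following App{Walls}{Floors}{Roof}{HeatingSystem}.fmu."""
    return f"App{walls}{floors}{roof}{heating_system}.fmu"


def fmu_path(cemdb_root: PurePosixPath, system_dir: str, filename: str) -> PurePosixPath:
    """Locate an FMU under <cemdb_root>/simulators/<system_dir>/."""
    return cemdb_root / "simulators" / system_dir / filename


# Hypothetical tokens, used only to show how the pattern is filled in.
name = fmu_filename("W1", "F1", "R1", "Ideal")
path = fmu_path(PurePosixPath("/data/cemdb"), "ideal", name)
print(path)  # /data/cemdb/simulators/ideal/AppW1F1R1Ideal.fmu
```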
4.2. Location Root (cemdb/locations/<location>/)
Purpose: Group all data specific to a geographic location.
Subdirectories:
- gis/: GIS input files (building geometries, terrain, metadata)
- weather/: Weather data files
- scenarios/: Scenario definition files
- preprocessing/: Cached preprocessing results
- simulations/: All simulation outputs (single and ensemble)
4.3. Simulations Base (cemdb/locations/<location>/simulations/)
Simulations are separated by run type.
4.3.1. Single Simulations (simulations/single/)
Purpose: Standalone, single-run simulations.
single/
`-- run_<timestamp>/
    |-- manifest.json                 # Run manifest
    |-- setup.json                    # Run setup
    `-- database/
        |-- buildings/                # Per-building simulation outputs
        |-- visualization/            # Visualization exports
        |-- stats/                    # StatsEngine outputs
        |-- report/                   # Legacy reports and facts
        `-- timeseries/
            `-- building_outputs_ts.h5
Use cases: baseline simulations, single scenario testing, debugging.
4.3.2. Ensemble Simulations (simulations/ensemble/)
Purpose: Coordinated ensemble simulations with multiple parameter samples.
ensemble/
`-- run_<timestamp>/
    |-- manifest.json                 # Run manifest
    |-- setup.json                    # Run setup
    |-- ensemble/                     # Ensemble-level outputs
    |   |-- stats/
    |   |   |-- global_kpi_summary.parquet
    |   |   `-- building_outputs_ts.h5
    |-- sample_000/
    |   `-- database/
    |       |-- stats/
    |       |   `-- global_kpi_summary.parquet
    |       `-- report/
    |           `-- building_kpi_ts.part-r*.parquet
    `-- database/                     # Run-level logs/config
Use cases: uncertainty quantification, sensitivity analysis, parameter space exploration.
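As an illustration of how this layout can be traversed, the following sketch collects the per-sample KPI summaries of one ensemble run using only the standard library. It assumes member directories are named sample_<id> as in the tree above; the helper name sample_summaries is hypothetical and not part of any CEMDB API.

```python
import tempfile
from pathlib import Path


def sample_summaries(run_dir: Path) -> dict:
    """Map each sample ID to its per-sample KPI summary file, if that file exists."""
    found = {}
    for sample_dir in sorted(run_dir.glob("sample_*")):
        summary = sample_dir / "database" / "stats" / "global_kpi_summary.parquet"
        if summary.is_file():
            found[sample_dir.name.removeprefix("sample_")] = summary
    return found


# Build a throwaway run directory that mimics the ensemble tree.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run_2025-01-15_15-00-00"
    for sample in ("sample_000", "sample_001"):
        stats = run_dir / sample / "database" / "stats"
        stats.mkdir(parents=True)
        (stats / "global_kpi_summary.parquet").touch()
    (run_dir / "ensemble" / "stats").mkdir(parents=True)
    sample_ids = list(sample_summaries(run_dir))
    print(sample_ids)  # ['000', '001']
```

The run-level ensemble/ directory is skipped automatically because it does not match the sample_* pattern.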
5. RunType Classification
6. PathManager API
6.1. Run ID Generation
C++ API:
// Basic run ID (no type prefix)
std::string runId = PathManager::generateRunId();
// -> "run_2025-01-15_14-30-45"
// Type-prefixed run IDs
std::string singleRunId = PathManager::generateRunId(RunType::Single, true);
// -> "single_run_2025-01-15_14-30-45"
std::string ensembleRunId = PathManager::generateRunId(RunType::Ensemble, true);
// -> "ensemble_run_2025-01-15_14-30-45"
Python API:
from feelpp.ktirio.ub import core as ktirio_core
# Basic run ID (no type prefix)
run_id = ktirio_core.PathManager.generateRunId()
# -> "run_2025-01-15_14-30-45"
# Type-prefixed run IDs
single_run_id = ktirio_core.PathManager.generateRunId(
    ktirio_core.RunType.Single,
    includeTypePrefix=True
)
# -> "single_run_2025-01-15_14-30-45"
ensemble_run_id = ktirio_core.PathManager.generateRunId(
    ktirio_core.RunType.Ensemble,
    includeTypePrefix=True
)
# -> "ensemble_run_2025-01-15_14-30-45"
Note: Type prefixes are optional but useful when manually inspecting directories.
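The run ID format shown in the examples can be reproduced with a few lines of plain Python. This is only a stand-in to make the format explicit, not the PathManager implementation; the function name make_run_id is hypothetical, and whether generateRunId uses local time or UTC is not stated here.

```python
from datetime import datetime
from typing import Optional


def make_run_id(now: datetime, type_prefix: Optional[str] = None) -> str:
    """Format a run ID like run_2025-01-15_14-30-45, optionally with a type prefix."""
    run_id = now.strftime("run_%Y-%m-%d_%H-%M-%S")
    return f"{type_prefix}_{run_id}" if type_prefix else run_id


stamp = datetime(2025, 1, 15, 14, 30, 45)
basic_id = make_run_id(stamp)
prefixed_id = make_run_id(stamp, "ensemble")
print(basic_id)     # run_2025-01-15_14-30-45
print(prefixed_id)  # ensemble_run_2025-01-15_14-30-45
```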
6.2. Initialization
C++ API:
#include <ktirio/ub/pathmanager.hpp>
using namespace Feel::Ktirio::Ub;
// Single simulation
PathManager::instance().initialize(
    customRoot,
    locationName,
    runId,
    std::nullopt,
    RunType::Single
);
// Ensemble simulation (coordinator)
PathManager::instance().initialize(
    customRoot,
    locationName,
    PathManager::generateRunId(RunType::Ensemble),
    std::nullopt,
    RunType::Ensemble
);
// Ensemble simulation (member)
PathManager::instance().initialize(
    customRoot,
    locationName,
    runId,
    "000",
    RunType::Ensemble
);
Python API:
from feelpp.ktirio.ub import core as ktirio_core

pm = ktirio_core.PathManager.instance()

# Single simulation
pm.initialize(
    customRoot="/path/to/cemdb",
    locationName="kernante",
    runId=ktirio_core.PathManager.generateRunId(ktirio_core.RunType.Single),
    ensembleMemberId=None,
    runType=ktirio_core.RunType.Single
)

# Ensemble simulation (coordinator)
ensemble_run_id = ktirio_core.PathManager.generateRunId(
    ktirio_core.RunType.Ensemble,
    includeTypePrefix=True
)
pm.initialize(
    customRoot="/path/to/cemdb",
    locationName="kernante",
    runId=ensemble_run_id,
    ensembleMemberId=None,
    runType=ktirio_core.RunType.Ensemble
)

# Ensemble simulation (member): same run ID, plus the member identifier
pm.initialize(
    customRoot="/path/to/cemdb",
    locationName="kernante",
    runId=ensemble_run_id,
    ensembleMemberId="000",
    runType=ktirio_core.RunType.Ensemble
)
6.3. Path Getters
// Root paths
std::filesystem::path cemdbRoot() const;
std::filesystem::path simulatorsDir() const;
std::filesystem::path locationRoot() const;
// Input data paths
std::filesystem::path locationGisDir() const;
std::filesystem::path locationWeatherDir() const;
std::filesystem::path locationScenariosDir() const;
std::filesystem::path locationPreprocessingDir() const;
// Simulation paths (type-aware)
std::filesystem::path locationSimulationsDir() const;
std::filesystem::path currentRunDir() const;
std::filesystem::path databaseDir() const;
// Ensemble-specific paths
std::filesystem::path ensembleDir() const;
std::filesystem::path ensembleMemberDir(const std::string& memberId) const;
// Run type
RunType runType() const;
bool isEnsembleMode() const;
Path Behavior:
| Method | Single Run | Ensemble Coordinator | Ensemble Member |
|---|---|---|---|
| | | | |
| | | | |
| | | | |
| | | | |
| | N/A | | |
Note: ensembleDir() always returns the run-level ensemble directory, never a member-specific path.
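To make the mode-dependent behavior concrete, here is a pure-Python sketch that derives the three main directories from the layout in section 3.1. It is not the PathManager implementation: the function names are hypothetical, the sample_<id> member directory name is taken from the section 3.1 tree (the migration guide writes member_<N>), and the assumption that a member's database lives under its own directory follows the "per-member databases" design principle.

```python
from pathlib import PurePosixPath
from typing import Optional


def run_dir(root: str, location: str, run_type: str, run_id: str) -> PurePosixPath:
    """Run directory: <root>/locations/<location>/simulations/<run_type>/<run_id>."""
    return PurePosixPath(root) / "locations" / location / "simulations" / run_type / run_id


def database_dir(run: PurePosixPath, member_id: Optional[str] = None) -> PurePosixPath:
    """Database directory of a run, or of one ensemble member when member_id is given."""
    # Assumption: member directories are named sample_<id>, as in the section 3.1 tree.
    base = run / f"sample_{member_id}" if member_id is not None else run
    return base / "database"


def ensemble_dir(run: PurePosixPath) -> PurePosixPath:
    """Run-level ensemble directory; it never depends on the member ID."""
    return run / "ensemble"


single = run_dir("/data/cemdb", "kernante", "single", "run_2025-01-15_14-30-00")
ens = run_dir("/data/cemdb", "kernante", "ensemble", "run_2025-01-15_15-00-00")
print(database_dir(single))      # .../simulations/single/run_2025-01-15_14-30-00/database
print(database_dir(ens, "000"))  # .../simulations/ensemble/run_2025-01-15_15-00-00/sample_000/database
print(ensemble_dir(ens))         # .../simulations/ensemble/run_2025-01-15_15-00-00/ensemble
```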
7. File Formats
7.1. Building Output Files (database/buildings/*.json)
Per-building simulation results with timeseries and aggregated statistics.
{
  "buildingId": "building_001",
  "outputs": {
    "FinalEnergy": { "val": 125.4, "unit": "MJ" },
    "UsefullEnergy": { "val": 112.8, "unit": "MJ" },
    "IndoorTemperature": { "mean": 20.5, "std": 1.2, "unit": "degC" }
  }
}
7.2. Run Manifest (run_<id>/manifest.json)
Metadata about the simulation run, including configuration and artifacts.
{
  "schema_version": 4,
  "run_id": "run_2025-01-15_14-30-00",
  "location": "kernante",
  "run_type": "Single",
  "world_size": 1,
  "start_time": 0.0,
  "stop_time": 86400.0,
  "step_time": 3600.0,
  "backend": "json",
  "artifact_base": "cemdb/locations/kernante/simulations/single/run_2025-01-15_14-30-00",
  "artifacts": [
    { "dataset": "global_kpi_summary", "path": "database/stats/global_kpi_summary.parquet" },
    { "dataset": "report", "path": "database/report/report.json" }
  ],
  "setup_summary": {
    "buildings_total": 125,
    "buildings_simulated": 120,
    "buildings_skipped": 5,
    "solar_shading_enabled": true,
    "solar_shading_components": ["building", "terrain"],
    "ideal_flows_enabled": false,
    "outputs": {
      "hdf5": true,
      "csv": false,
      "visualization": true,
      "report": true,
      "building_reports": false
    }
  }
}
Note: world_size and artifact_base make it easy to resolve relative artifact paths outside the original runtime environment.
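Resolving artifacts from a manifest amounts to joining artifact_base with each relative path. The sketch below does this with the standard library on a trimmed copy of the example manifest; the helper name resolve_artifacts is hypothetical.

```python
import json
from pathlib import PurePosixPath

# Trimmed copy of the example manifest: only the fields needed for path resolution.
MANIFEST = """
{
  "artifact_base": "cemdb/locations/kernante/simulations/single/run_2025-01-15_14-30-00",
  "artifacts": [
    {"dataset": "global_kpi_summary", "path": "database/stats/global_kpi_summary.parquet"},
    {"dataset": "report", "path": "database/report/report.json"}
  ]
}
"""


def resolve_artifacts(manifest: dict) -> dict:
    """Map each dataset ID to artifact_base joined with its relative path."""
    base = PurePosixPath(manifest["artifact_base"])
    return {a["dataset"]: base / a["path"] for a in manifest["artifacts"]}


resolved = resolve_artifacts(json.loads(MANIFEST))
print(resolved["report"])
# cemdb/locations/kernante/simulations/single/run_2025-01-15_14-30-00/database/report/report.json
```

If the run directory has been copied elsewhere, the same function can be applied after replacing artifact_base with the new location.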
8. Manifest Schema Specification (v4)
The manifest v4 schema is the current standard for simulation result discovery and artifact management.
8.1. Schema Versions
| Version | Location | Description |
|---|---|---|
| v1 | | Legacy per-database manifest |
| v3 | | Intermediate schema with provenance |
| v4 | | Run-root manifest with unified artifact discovery |
8.2. v4 Schema Definition
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "KUB Manifest v4",
  "type": "object",
  "required": ["schema_version", "run_id", "run_type", "artifact_base", "artifacts"],
  "properties": {
    "schema_version": {
      "type": "integer",
      "const": 4,
      "description": "Schema version identifier"
    },
    "run_id": {
      "type": "string",
      "description": "Unique run identifier (e.g., run_2025-01-15_14-30-00)"
    },
    "run_type": {
      "type": "string",
      "enum": ["Single", "Ensemble"],
      "description": "Simulation run type"
    },
    "location": {
      "type": "string",
      "description": "Geographic location identifier"
    },
    "world_size": {
      "type": "integer",
      "minimum": 1,
      "description": "Number of MPI processes used"
    },
    "start_time": {
      "type": "number",
      "description": "Simulation start time (seconds)"
    },
    "stop_time": {
      "type": "number",
      "description": "Simulation stop time (seconds)"
    },
    "step_time": {
      "type": "number",
      "description": "Simulation time step (seconds)"
    },
    "backend": {
      "type": "string",
      "enum": ["json", "parquet"],
      "description": "Output backend type"
    },
    "artifact_base": {
      "type": "string",
      "description": "Base path for resolving relative artifact paths"
    },
    "repository_root": {
      "type": "string",
      "description": "Repository root path (for run-root manifests)"
    },
    "artifacts": {
      "type": "array",
      "items": { "$ref": "#/$defs/artifact" },
      "description": "List of output artifacts"
    },
    "setup_summary": {
      "$ref": "#/$defs/setup_summary",
      "description": "Simulation configuration summary"
    },
    "provenance": {
      "$ref": "#/$defs/provenance",
      "description": "Software version and build info"
    },
    "samples": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Ensemble sample identifiers (ensemble runs only)"
    }
  },
  "$defs": {
    "artifact": {
      "type": "object",
      "required": ["dataset", "path"],
      "properties": {
        "dataset": {
          "type": "string",
          "description": "Dataset identifier"
        },
        "path": {
          "type": "string",
          "description": "Relative path from artifact_base"
        },
        "type": {
          "type": "string",
          "enum": ["shared", "partitioned", "resource", "composite"],
          "description": "Artifact type for MPI-parallel outputs"
        },
        "schema_id": {
          "type": "string",
          "description": "Schema identifier (e.g., kub.global_kpi_summary)"
        },
        "schema_version": {
          "type": "string",
          "description": "Schema version (e.g., 1.0)"
        },
        "sha256": {
          "type": "string",
          "description": "SHA-256 hash of file content"
        },
        "files": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "path": { "type": "string" },
              "rank": { "type": "integer" },
              "sha256": { "type": "string" }
            }
          },
          "description": "Individual files for composite artifacts"
        }
      }
    },
    "setup_summary": {
      "type": "object",
      "properties": {
        "buildings_total": { "type": "integer" },
        "buildings_simulated": { "type": "integer" },
        "buildings_skipped": { "type": "integer" },
        "solar_shading_enabled": { "type": "boolean" },
        "solar_shading_components": {
          "type": "array",
          "items": { "type": "string" }
        },
        "ideal_flows_enabled": { "type": "boolean" },
        "outputs": {
          "type": "object",
          "properties": {
            "hdf5": { "type": "boolean" },
            "csv": { "type": "boolean" },
            "visualization": { "type": "boolean" },
            "report": { "type": "boolean" },
            "building_reports": { "type": "boolean" }
          }
        }
      }
    },
    "provenance": {
      "type": "object",
      "properties": {
        "schema_version": { "type": "integer" },
        "run_id": { "type": "string" },
        "run_type": { "type": "string" },
        "software": {
          "type": "object",
          "properties": {
            "name": { "type": "string" },
            "cem_version": { "type": "string" },
            "feelpp_version": { "type": "string" }
          }
        },
        "git": {
          "type": "object",
          "properties": {
            "commit": { "type": "string" },
            "branch": { "type": "string" },
            "dirty": { "type": "boolean" }
          }
        },
        "timestamp_utc": { "type": "string" },
        "hostname": { "type": "string" }
      }
    }
  }
}
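A quick pre-flight check of the required fields can be written without any third-party dependency. The sketch below covers only the top-level "required" list, the schema_version constant, the run_type enum, and the required artifact keys; it is a simplification for illustration, not a full JSON Schema validator, and the function name manifest_errors is hypothetical.

```python
REQUIRED = ("schema_version", "run_id", "run_type", "artifact_base", "artifacts")
RUN_TYPES = ("Single", "Ensemble")


def manifest_errors(manifest: dict) -> list:
    """Return human-readable problems; an empty list means the basic checks pass."""
    errors = [f"missing required field: {key}" for key in REQUIRED if key not in manifest]
    if "schema_version" in manifest and manifest["schema_version"] != 4:
        errors.append("schema_version must be 4")
    if "run_type" in manifest and manifest["run_type"] not in RUN_TYPES:
        errors.append(f"run_type must be one of {RUN_TYPES}")
    # Each artifact needs at least a dataset ID and a relative path.
    for index, artifact in enumerate(manifest.get("artifacts", [])):
        for key in ("dataset", "path"):
            if key not in artifact:
                errors.append(f"artifacts[{index}] missing {key}")
    return errors


valid = {
    "schema_version": 4,
    "run_id": "run_2025-01-15_14-30-00",
    "run_type": "Single",
    "artifact_base": "cemdb/locations/kernante/simulations/single/run_2025-01-15_14-30-00",
    "artifacts": [{"dataset": "report", "path": "database/report/report.json"}],
}
broken = {"run_id": "x", "run_type": "Batch", "artifacts": [{"dataset": "report"}]}
print(manifest_errors(valid))        # []
print(len(manifest_errors(broken)))  # 4
```

For complete validation against the schema above, a dedicated JSON Schema library is the more robust choice.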
8.3. Dataset Types
| Dataset ID | Description | Schema ID |
|---|---|---|
| | City-level KPI aggregates | |
| | Building-level KPI aggregates | |
| | Building KPI time series facts | |
| | Building geolocation index | |
| | Building outputs time series (HDF5) | |
| | Building metadata (HDF5) | |
| | Auto-deployed Jupyter notebook | N/A |
| | Ensemble-aggregated city KPIs | |
| | Ensemble-aggregated building KPIs | |
9. StatsEngine Configuration
The StatsEngine processes building-level facts and produces aggregated KPI summaries.
9.1. Report Specification (JSON)
{
  "window": {
    "mode": "full",
    "start_time": 0.0,
    "end_time": 86400.0
  },
  "levels": ["city", "district"],
  "kpis": ["FinalEnergy", "IndoorTemperature", "HeatingPower"],
  "building_level": true,
  "unit_validation_mode": "warn"
}
9.2. Report Specification (YAML)
window:
  mode: full
  start_time: 0.0
  end_time: 86400.0
levels:
  - city
  - district
kpis:
  - FinalEnergy
  - IndoorTemperature
  - HeatingPower
building_level: true
unit_validation_mode: warn
9.3. Configuration Options
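| Field | Type | Description |
|---|---|---|
| window.mode | string | Window mode (for example, full) |
| window.start_time | float | Window start time in seconds (optional) |
| window.end_time | float | Window end time in seconds (optional) |
| levels | array | Aggregation levels (for example, city, district) |
| kpis | array | KPI identifiers to include (empty = all) |
| building_level | bool | Generate per-building summaries |
| unit_validation_mode | string | Unit handling (for example, warn) |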
9.4. KPI Catalog
The KPI catalog defines available metrics and their computation methods.
{
  "kpis": [
    {
      "id": "FinalEnergy",
      "unit": "kWh",
      "reducer": "sum",
      "aggregations": ["mean", "sum", "min", "max"]
    },
    {
      "id": "IndoorTemperature",
      "unit": "degC",
      "reducer": "weighted_mean",
      "aggregations": ["mean", "std", "min", "max"]
    },
    {
      "id": "HeatingPower",
      "unit": "W",
      "reducer": "weighted_mean",
      "parameters": {
        "integration_method": "trapezoidal"
      },
      "aggregations": ["mean", "peak", "integrated"]
    }
  ]
}
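The relationship between a report specification and the KPI catalog can be sketched as follows. The rule that an empty kpis list selects every catalog entry comes from the configuration options ("empty = all"); raising an error for unknown identifiers is an assumption made for this illustration, and select_kpis is a hypothetical helper, not a StatsEngine function.

```python
def select_kpis(spec: dict, catalog: dict) -> list:
    """Return the catalog entries requested by a report spec (empty kpis = all)."""
    requested = spec.get("kpis", [])
    entries = catalog["kpis"]
    if not requested:
        return list(entries)
    by_id = {entry["id"]: entry for entry in entries}
    unknown = [kpi for kpi in requested if kpi not in by_id]
    if unknown:
        # Assumption: unknown identifiers are treated as a configuration error.
        raise KeyError(f"KPIs not in catalog: {unknown}")
    return [by_id[kpi] for kpi in requested]


catalog = {
    "kpis": [
        {"id": "FinalEnergy", "unit": "kWh", "reducer": "sum"},
        {"id": "IndoorTemperature", "unit": "degC", "reducer": "weighted_mean"},
        {"id": "HeatingPower", "unit": "W", "reducer": "weighted_mean"},
    ]
}
subset = [e["id"] for e in select_kpis({"kpis": ["FinalEnergy", "HeatingPower"]}, catalog)]
everything = [e["id"] for e in select_kpis({"kpis": []}, catalog)]
print(subset)      # ['FinalEnergy', 'HeatingPower']
print(everything)  # ['FinalEnergy', 'IndoorTemperature', 'HeatingPower']
```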
10. Parquet Schema Reference
10.1. Building KPI Time Series (kub.building_kpi_ts v1.0)
| Column | Type | Nullable | Description |
|---|---|---|---|
| | timestamp_s | No | Timestamp in seconds (UTC) |
| | int64 | No | Building identifier |
| | string | No | KPI identifier |
| | float64 | No | Observed value |
| | string | Yes | Unit string |
| | string | Yes | Run identifier |
| | string | Yes | Scenario identifier |
| | string | Yes | Ensemble sample identifier |
10.2. Global KPI Summary (kub.global_kpi_summary v1.0)
| Column | Type | Nullable | Description |
|---|---|---|---|
| | string | No | Aggregation level |
| | string | No | Entity identifier |
| | string | No | KPI identifier |
| | string | No | Metric name (mean, sum, std, etc.) |
| | float64 | No | Metric value |
| | string | Yes | Unit string |
| | int64 | Yes | Sample count |
| | timestamp_s | Yes | Window start |
| | timestamp_s | Yes | Window end |
| | string | Yes | EPC proxy mode |
10.3. Building KPI Summary (kub.building_kpi_summary v1.0)
| Column | Type | Nullable | Description |
|---|---|---|---|
| | string | Yes | Ensemble sample identifier |
| | int64 | No | Building identifier |
| | string | No | KPI identifier |
| | string | No | Metric name |
| | float64 | No | Metric value |
| | string | Yes | Unit string |
| | int64 | Yes | Sample count |
| | timestamp_s | Yes | Window start |
| | timestamp_s | Yes | Window end |
| | string | Yes | EPC/DPE class label |
| | string | Yes | EPC proxy mode |
10.4. Building Spatial Index (kub.building_spatial_index v1.0)
| Column | Type | Nullable | Description |
|---|---|---|---|
| | int64 | No | Building identifier |
| | float64 | No | Centroid longitude (EPSG:4326) |
| | float64 | No | Centroid latitude (EPSG:4326) |
| | string | No | District scheme identifier |
| | string | No | District identifier |
| | string | Yes | LAU identifier |
| | string | Yes | NUTS1 identifier |
| | string | Yes | NUTS2 identifier |
| | string | Yes | NUTS3 identifier |
10.5. Ensemble Summary (ensemble/stats/global_kpi_summary.parquet)
Aggregated statistics across all ensemble samples, produced by the StatsEngine ensemble aggregator.
The schema matches global_kpi_summary.parquet with metric values such as
ensemble_mean, ensemble_std, ensemble_min, ensemble_max, and ensemble_p05/p50/p95.
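To show what these ensemble metrics mean numerically, the sketch below computes them for one KPI from a list of per-sample values. It is an illustration only: the use of the sample standard deviation and of linear interpolation for percentiles are assumptions, since the aggregator's exact definitions are not given here, and the function names are hypothetical.

```python
import statistics


def percentile(ordered: list, q: float) -> float:
    """Percentile q in [0, 100] of an ascending list, using linear interpolation."""
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ensemble_metrics(values: list) -> dict:
    """Aggregate one KPI across ensemble samples into ensemble_* metrics."""
    ordered = sorted(values)
    return {
        "ensemble_mean": statistics.fmean(ordered),
        # Assumption: sample standard deviation (n - 1 denominator).
        "ensemble_std": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "ensemble_min": ordered[0],
        "ensemble_max": ordered[-1],
        "ensemble_p05": percentile(ordered, 5),
        "ensemble_p50": percentile(ordered, 50),
        "ensemble_p95": percentile(ordered, 95),
    }


# Five hypothetical per-sample values of a single KPI.
metrics = ensemble_metrics([100.0, 110.0, 120.0, 130.0, 140.0])
print(metrics["ensemble_mean"], metrics["ensemble_p50"])  # 120.0 120.0
```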
11. Migration Guide
11.1. Existing Data
Old structure (before v1.0):
simulations/
`-- run_<timestamp>/
    |-- database/
    `-- ensemble/
New structure (v1.0+):
simulations/
|-- single/
|   `-- run_<timestamp>/
|       `-- database/
`-- ensemble/
    `-- run_<timestamp>/
        |-- member_<N>/database/
        `-- ensemble/
Migration Steps:
- Identify run type from existing data
- Move single runs to simulations/single/
- Move ensemble runs to simulations/ensemble/
- Update scripts to use the RunType parameter
Migration Script (Python example):
import shutil
from pathlib import Path

cemdb_root = Path("/path/to/cemdb")
location = "kernante"
simulations = cemdb_root / "locations" / location / "simulations"

# Create the per-type parent directories if they do not exist yet.
(simulations / "single").mkdir(exist_ok=True)
(simulations / "ensemble").mkdir(exist_ok=True)

# Snapshot the listing first: moving directories while iterating a live glob is fragile.
for run_dir in sorted(simulations.glob("run_*")):
    # Only directories that contain a database/ folder are treated as runs.
    if not (run_dir / "database").exists():
        continue
    has_ensemble = (run_dir / "ensemble").exists()
    has_members = any(run_dir.glob("member_*"))
    run_type = "ensemble" if has_ensemble or has_members else "single"
    shutil.move(str(run_dir), str(simulations / run_type / run_dir.name))
12. Benefits of the Separated Structure
- Clarity: the run type is visible from the filesystem layout
- Organization: different run types are easier to manage and archive
- Performance: each run-type directory is smaller to list
- Policies: retention policies can differ per run type
- Tooling: tools can infer the run type from the path without parsing run metadata
- Debugging: runs are easier to locate during development
13. Common Workflows
13.1. Single Simulation
PathManager::instance().initialize(
    "/data/cemdb",
    "kernante",
    "run_2025-01-15_14-30-00",
    std::nullopt,
    RunType::Single
);
auto model = cityEnergyModel();
auto instance = model->newInstance(startTime, stopTime, stepTime);
instance->execute();
// Results saved to:
// /data/cemdb/locations/kernante/simulations/single/run_2025-01-15_14-30-00/database/
13.2. Ensemble Simulation
PathManager::instance().initialize(
    "/data/cemdb",
    "kernante",
    "run_2025-01-15_15-00-00",
    std::nullopt,
    RunType::Ensemble
);
auto model = cityEnergyModel();
EnsemblePlan plan = loadEnsemblePlan("uncertainty.json");
auto stats = model->executeEnsemble(plan, startTime, stopTime, stepTime);
// Results saved to:
// /data/cemdb/locations/kernante/simulations/ensemble/run_2025-01-15_15-00-00/
//   |-- member_000/database/
//   |-- member_001/database/
//   `-- ensemble/
13.3. Notebook Analysis
from feelpp.ktirio.ub import core as ktirio_core
from pathlib import Path
import json

pm = ktirio_core.PathManager.instance()
pm.initialize(
    customRoot="/data/cemdb",
    locationName="kernante",
    runId="run_2025-01-15_15-00-00",
    runType=ktirio_core.RunType.Ensemble
)
ensemble_dir = Path(pm.ensembleDir())
statistics_file = ensemble_dir / "statistics.json"
with open(statistics_file) as f:
    stats = json.load(f)
print(f"Mean FinalEnergy: {stats['outputs']['FinalEnergy']['mean']} MJ")
print(f"95% CI: [{stats['outputs']['FinalEnergy']['ci_lower']}, "
      f"{stats['outputs']['FinalEnergy']['ci_upper']}]")
14. Troubleshooting
RunType not found error
- Cause: older code not updated to use the RunType parameter
- Solution: add RunType::Single or RunType::Ensemble to initialize() calls
Ensemble statistics saved in wrong directory
- Cause: using currentRunDir() instead of ensembleDir()
- Solution: always use ensembleDir() for ensemble-level statistics
Cannot find simulation outputs
- Cause: looking in the old simulations/ structure
- Solution: use simulations/single/ or simulations/ensemble/
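When it is unclear where a run ended up, a small lookup that checks both run-type directories and then falls back to the pre-separation location can help. This is a standalone sketch using only the standard library; find_run is a hypothetical helper, not part of PathManager.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_run(simulations: Path, run_id: str) -> Optional[Path]:
    """Return the run directory under single/ or ensemble/, else the legacy location, else None."""
    for run_type in ("single", "ensemble"):
        candidate = simulations / run_type / run_id
        if candidate.is_dir():
            return candidate
    legacy = simulations / run_id  # pre-separation layout
    return legacy if legacy.is_dir() else None


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    simulations = Path(tmp) / "simulations"
    (simulations / "ensemble" / "run_2025-01-15_15-00-00").mkdir(parents=True)
    found = find_run(simulations, "run_2025-01-15_15-00-00")
    missing = find_run(simulations, "run_1999-01-01_00-00-00")
    found_type = found.parent.name if found else None
    print(found_type)  # ensemble
```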