kub-enrich-gis

1. Overview

kub-enrich-gis enriches building records from GIS data (geo/gis.json) using external national building databases. It supports:

  • enrich: fetch matches and produce an enrichment dataset (json, jsonl, or partitioned parquet)

  • inspect: inspect enrichment output and manifest statistics

  • materialize: apply high-confidence enrichment fields back into geo/gis.json

2. Installation

Install optional enrichment dependencies:

pip install "feelpp-ktirio-urban-building[enrichment]"

3. Synopsis

kub-enrich-gis [-h] [-v] {enrich,inspect,materialize} ...

Help text may display the internal program name ktirio-urban-building-enrich-gis; the installed CLI entry point is kub-enrich-gis.

4. Commands

4.1. enrich

Read input GIS buildings and generate enrichment output.

kub-enrich-gis enrich --input <gis.json> --output <path> [OPTIONS]

Options:

Option                              Description
--input, -i PATH                    Path to input GIS JSON file (required)
--output, -o PATH                   Output file or directory path (required)
--format, -f {json,jsonl,parquet}   Output format (default: jsonl)
--country ISO2                      Country code (default: FR)
--selector PATH                     JSON selector for buildings (default: building)
--compression {none,gzip,zstd}      Output compression (default: gzip)
--partition COLS                    Parquet partition columns, comma-separated (default: country,spatial_key)
--spatial {none,grid,h3}            Spatial partitioning mode (default: grid)
--grid-res FLOAT                    Grid resolution in degrees (default: 0.02)
--h3-res INT                        H3 resolution (default: 9)
--row-group INT                     Parquet row group size (default: 50000)
--include-raw                       Include raw API payloads in output
--cache PATH                        SQLite cache database path
--cache-ttl-days INT                Cache TTL (default: 30)
--rate FLOAT                        API rate limit in requests/second (default: 2.0)
--burst INT                         API burst limit (default: 5)
--rnb-base-url URL                  RNB API base URL
--bdnb-base-url URL                 BDNB API base URL
--bdnb-rnb-endpoint PATH            BDNB endpoint used for RNB lookup
--force                             Force operation (overwrite / bypass limits)
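
The caching and rate-limiting options are useful together when the same location is enriched more than once. The following is a minimal sketch, not part of the tool: enrich_location is a hypothetical helper, the cache file name cache.sqlite and the values 7, 1.0, and 2 are illustrative assumptions, and the directory layout is the one used in the Examples section. The KUB_ENRICH_GIS variable is also an assumption of this sketch; setting it to echo prints the command instead of running it.

```shell
# Hypothetical helper: enrich one location directory laid out as
# <root>/geo/gis.json -> <root>/preprocessing/enrichment/.
# The body runs in a subshell, so "root" does not leak into the caller.
enrich_location() (
  root=$1
  # Only options documented above are used; the values are illustrative.
  "${KUB_ENRICH_GIS:-kub-enrich-gis}" enrich \
    --input "$root/geo/gis.json" \
    --output "$root/preprocessing/enrichment/buildings.jsonl" \
    --format jsonl \
    --cache "$root/preprocessing/enrichment/cache.sqlite" \
    --cache-ttl-days 7 \
    --rate 1.0 \
    --burst 2
)
```

A call such as enrich_location cemdb/locations/arz/v0.1.0 would then run the enrich command for that location with a lower request rate than the default.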

4.2. inspect

Inspect an enrichment file or dataset directory.

kub-enrich-gis inspect <path> [-v]

Options:

Option          Description
path            Dataset/file to inspect
-v, --verbose   Print detailed metadata (including full manifest when available)

4.3. materialize

Apply enrichment results into GIS data.

kub-enrich-gis materialize --gis <geo/gis.json> --enrichment <path> [OPTIONS]

Options:

Option                  Description
--gis PATH              GIS file to update (required)
--enrichment PATH       Enrichment file or directory (json, jsonl, parquet) (required)
--output PATH           Optional output GIS path (default: in-place update)
--min-confidence FLOAT  Minimum confidence to apply (default: 0.80)
--override-existing     Allow overwriting existing GIS values
--no-manifest-update    Do not update location manifest enrichment metadata
--no-backup             Disable automatic .bak backup during in-place updates
--dry-run               Compute updates without writing files
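
Because --dry-run computes the updates without writing files, it can act as a gate before an in-place update. The sketch below chains the two steps under stated assumptions: materialize_checked and the KUB_ENRICH_GIS override are hypothetical names introduced here, and the sketch assumes that a successful dry run exits with code 0.

```shell
# Hypothetical helper: dry-run first, then apply in place only if the
# dry run succeeded. Arguments: <gis.json> <enrichment path> [min confidence].
materialize_checked() (
  gis=$1
  enrichment=$2
  min_conf=${3:-0.80}   # same value as the documented default
  cli=${KUB_ENRICH_GIS:-kub-enrich-gis}

  # Step 1: compute the updates without writing anything.
  "$cli" materialize --gis "$gis" --enrichment "$enrichment" \
    --min-confidence "$min_conf" --dry-run || exit $?

  # Step 2: apply in place. --no-backup is not passed, so the automatic
  # .bak backup stays enabled.
  "$cli" materialize --gis "$gis" --enrichment "$enrichment" \
    --min-confidence "$min_conf"
)
```

Setting KUB_ENRICH_GIS=echo prints both commands instead of running them, which is a convenient way to check the arguments.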

5. Examples

# 1) Enrich GIS buildings to JSONL
kub-enrich-gis enrich \
  --input cemdb/locations/arz/v0.1.0/geo/gis.json \
  --output cemdb/locations/arz/v0.1.0/preprocessing/enrichment/buildings.jsonl

# 2) Enrich to partitioned Parquet dataset
kub-enrich-gis enrich \
  --input cemdb/locations/arz/v0.1.0/geo/gis.json \
  --output cemdb/locations/arz/v0.1.0/preprocessing/enrichment/ \
  --format parquet \
  --partition country,spatial_key \
  --compression zstd

# 3) Inspect generated data
kub-enrich-gis inspect cemdb/locations/arz/v0.1.0/preprocessing/enrichment/

# 4) Preview materialization without writing
kub-enrich-gis materialize \
  --gis cemdb/locations/arz/v0.1.0/geo/gis.json \
  --enrichment cemdb/locations/arz/v0.1.0/preprocessing/enrichment/ \
  --min-confidence 0.85 \
  --dry-run

# 5) Apply high-confidence matches in place
kub-enrich-gis materialize \
  --gis cemdb/locations/arz/v0.1.0/geo/gis.json \
  --enrichment cemdb/locations/arz/v0.1.0/preprocessing/enrichment/ \
  --min-confidence 0.85

6. Exit Codes

  • 0: success

  • 1: error (missing dependencies, invalid input paths/options, pipeline/materialization failure)
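
Since every failure maps to exit code 1, a calling script only needs to tell zero from non-zero. A minimal sketch of such a check follows; run_step is a hypothetical helper name, not something the tool provides.

```shell
# Hypothetical helper: run a command, report the outcome, and pass a
# non-zero exit code on to the caller. The first argument is a label
# used only in the messages.
run_step() {
  label=$1
  shift
  if "$@"; then
    echo "ok: $label"
  else
    rc=$?
    echo "failed: $label (exit code $rc)" >&2
    return "$rc"
  fi
}
```

In a script, run_step inspect kub-enrich-gis inspect <path> || exit 1 would stop at the first failing step.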

7. See Also

  • kub-dataset for packaging and distribution of enriched datasets