Data Storage and Management Strategy Documentation
1. Introduction
This document outlines the strategy for storing and managing research data generated by the CoE Hidalgo2 pipeline. The data includes various components, such as GIS data, building geometries, solar masks, and associated metadata. Proper data organization and management are essential to ensure data integrity, accessibility, and usability.
2. Folder Structure
The data is organized into a hierarchical folder structure to facilitate efficient data retrieval and organization. The structure is as follows:
<city>/<address>/<date+time>/
    solarmasks/
    meshes/
    gis/
    models/
    simulations/
- <city>: Name of the city for which the data is collected.
- <address>: Address or central location within the city.
- <date+time>: Timestamped folder indicating when the data was collected.
Within each <date+time> folder, subfolders are used to categorize different types of data.
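The layout above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the root directory argument, and the timestamp format are assumptions, since the document does not specify how <date+time> is written.

```python
from datetime import datetime
from pathlib import Path

# Subfolder names taken from the folder structure described above.
SUBFOLDERS = ("solarmasks", "meshes", "gis", "models", "simulations")


def dataset_dir(root: Path, city: str, address: str, collected_at: datetime) -> Path:
    """Return <root>/<city>/<address>/<date+time> for one collection run."""
    # Assumed timestamp format; the document does not define one.
    stamp = collected_at.strftime("%Y-%m-%dT%H-%M-%S")
    return root / city / address / stamp


def create_dataset_dirs(root: Path, city: str, address: str, collected_at: datetime) -> Path:
    """Create the run folder and its category subfolders, then return the run folder."""
    run_dir = dataset_dir(root, city, address, collected_at)
    for name in SUBFOLDERS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir
```

Using hyphens rather than colons in the time part keeps the folder name valid on file systems that reject colons in names.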
3. Metadata JSON File
A JSON file is stored within each <date+time> folder to provide essential metadata associated with the data. The JSON file contains the following fields:
- city_name: Name of the city.
- radius_meters: Radius in meters around the specified address.
- creation_date_time: Timestamp indicating when the data was created.
- creator_name: Name of the person responsible for data generation.
- gis_resolution: Resolution of the GIS data used.
- building_definition_level: Level of definition of building geometries.
- total_buildings: Total number of buildings in the dataset.
- total_building_faces: Total number of faces of the buildings.
- has_vegetation: Boolean flag indicating the presence of vegetation data.
- has_mountains: Boolean flag indicating the presence of elevation (mountains) data.
- has_roads: Boolean flag indicating the presence of road data.
This metadata provides crucial context for understanding and utilizing the dataset effectively.
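As a rough sketch, the metadata fields listed above could be modeled and serialized like this. The field names come from the list; the Python types, the ISO 8601 timestamp string, the file name metadata.json, and the helper names are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DatasetMetadata:
    city_name: str
    radius_meters: float
    creation_date_time: str  # assumed to be an ISO 8601 string
    creator_name: str
    gis_resolution: str  # assumed type; the document gives no unit or format
    building_definition_level: str  # assumed type
    total_buildings: int
    total_building_faces: int
    has_vegetation: bool
    has_mountains: bool
    has_roads: bool


def write_metadata(run_dir: Path, metadata: DatasetMetadata,
                   filename: str = "metadata.json") -> Path:
    """Serialize the metadata to JSON inside the <date+time> folder."""
    path = run_dir / filename  # the file name is hypothetical
    path.write_text(json.dumps(asdict(metadata), indent=2), encoding="utf-8")
    return path


def read_metadata(path: Path) -> DatasetMetadata:
    """Load a metadata file; fails if a field is missing or unexpected."""
    return DatasetMetadata(**json.loads(path.read_text(encoding="utf-8")))
```

Reading the file back through the dataclass constructor gives a cheap consistency check: a missing or misspelled key raises a TypeError instead of passing silently.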
4. Data Storage Considerations
- Large files, such as mesh files, are stored using MongoDB’s GridFS, which splits each file into chunks and is therefore not bound by the 16 MB BSON document size limit.
- MongoDB is used as the database management system, offering scalability and flexibility for handling complex data structures.
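A mesh upload to GridFS might look like the sketch below, which uses the GridFSBucket class shipped with the pymongo driver. The helper names and the convention of mirroring the folder layout in the GridFS file name are assumptions, not something this document prescribes.

```python
from pathlib import Path
from typing import Any


def gridfs_filename(city: str, address: str, stamp: str, category: str, name: str) -> str:
    """Mirror the folder layout in the GridFS file name (an assumed convention)."""
    return "/".join((city, address, stamp, category, name))


def store_file(db: Any, local_path: Path, filename: str, metadata: dict) -> Any:
    """Upload one local file to GridFS and return the id of the stored file.

    `db` is expected to be a pymongo Database object.
    """
    # Imported here so the naming helper above works without pymongo installed.
    from gridfs import GridFSBucket

    bucket = GridFSBucket(db)
    with local_path.open("rb") as stream:
        # GridFS splits the stream into chunks; `metadata` is stored with the file.
        return bucket.upload_from_stream(filename, stream, metadata=metadata)
```

A call would pass a database handle obtained from pymongo's MongoClient, the path of the mesh file, a name built with gridfs_filename, and a small dictionary such as the city name and category for later filtering.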
5. Best Practices
- Implement data compression techniques to reduce storage requirements.
- Regularly monitor and optimize the database for performance.
- Ensure data security through access controls and encryption.
- Establish data backup and redundancy strategies to prevent data loss.
- Use appropriate indexing for efficient queries.
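To illustrate the compression practice listed above, the following sketch compresses a file with gzip from the Python standard library. The document does not name a compression technique, so gzip and the helper names are assumptions chosen for simplicity.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(src: Path) -> Path:
    """Write a gzip-compressed copy next to `src` and return its path."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        # Stream the copy so large files are never held fully in memory.
        shutil.copyfileobj(fin, fout)
    return dst


def read_compressed(path: Path) -> bytes:
    """Return the decompressed contents of a .gz file."""
    with gzip.open(path, "rb") as fin:
        return fin.read()
```

Text-based mesh and GIS formats tend to compress well because they repeat similar tokens; formats that are already compressed gain little from a second pass.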