Skip to content

Lunaris

FAIR Data Integration

Lunaris Case Study Infobox

  • Author: Tristan Kuehn (@tkkuehn)
  • Last updated: 2025-01-30
  • Mapping Type: Mapping Type
  • Status of this case study: Status

Mapping Canadian metadata records from their source schema to a unified schema for centralized discovery

Domain

The mappings apply to Canadian research (regardless of scientific domain) and government and government open data.

Purpose of the mapping

Discovery

Other purpose of the mapping

Harvest and aggregate Canadian metadata records at https://lunaris.ca

Type of mapped resources

Metadata records describing datasets from a variety of data sources are mapped to an internal metadata schema. Those sources represent their metadata using different schemas, some of which are sparsely documented. Fields from a source schema are therefore mapped to fields in the Lunaris schema, but in some cases these mappings are not direct and require some conditional logic. Geospatial fields indicating a region that is relevant to the dataset are a particular point of emphasis.

Our mappings are not isolated enough from the rest of the metadata harvesting code to be shareable concisely and the code is not currently open, but we aim to make those mappings public as this group's work progresses.

Tools used for creating the mapping

Source metadata is collected with API calls to our data sources, then Python code performs the mappings.

Type of mapping relations

  • Direct one-to-one synonymy (e.g. titles, full names)
  • Many-to-one mappings (e.g. file-level access conditions collated to a dataset-level description of access, addition of auxiliary description fields to an overall description field.

Examples (samples) of different types of mapping implementations

Here's an example of a many-to-one field mapping with some conditional logic as part of a broader mapping from the Dataverse API (example record) to Lunaris (post-mapping record).

# dataverse_record is the source record crawled from the Dataverse API
# We search the whole dataset for any file that is flagged as restricted
access = "Public"
if "files" in dataverse_record["latestVersion"]:
    for file in dataverse_record["latestVersion"]["files"]:
        if "restricted" in file and file["restricted"]:
            access = "Restricted"
            break