Data Enrichment

A high-level description of the types of data enrichment that 2050 Materials applies on top of the publicly available data it collects.

At 2050 Materials, we go beyond basic data collection. Our platform enriches raw product and material data to ensure completeness, consistency, and usability across real-world workflows. Below is an overview of the types of enrichment we apply to the data accessible via our API.

Data Enrichment Overview

The 2050 Materials platform applies a comprehensive enrichment pipeline to raw construction product data to ensure high data quality, completeness, and structural consistency. This page outlines the enrichment logic from a systems and integration perspective.

Our goal is to make environmental product data immediately usable in digital workflows across LCA automation, BIM integrations, procurement optimization, and reporting pipelines.

1. Fallback Values for Incomplete Data

Many construction products are published with missing or inconsistent data fields. To ensure operability of downstream calculations, our system estimates key properties using statistically derived fallback values.

Supported Fields

density (kg/m³) — for mass-volume conversions
grammage (kg/m²) — for area-based estimates
linear_density (kg/m) — for length-based products
mass_per_piece (kg/piece) — for discrete units
thickness, cross_sectional_area — for geometry calculations
thermal_conductivity, porosity, compression_strength, life_expectancy
GWP_A1-A3 and biogenic_CO2 (when EPDs are unavailable)

Fallbacks are generated for each (material_type, product_type) combination using statistical aggregates over non-null values (e.g., medians, subject to minimum count thresholds). They are updated monthly to reflect the latest corpus distributions and to reduce the propagation of stale estimates.

All estimated fields are marked with a metadata flag: [field]_estimated = True
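
As an illustration of the fallback logic described above, the following is a minimal sketch rather than the platform's actual implementation. It assumes products are plain dictionaries, uses the median as the aggregate, and uses a hypothetical `MIN_COUNT` threshold; the function names are also assumptions made for this example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical minimum number of non-null samples required before a
# fallback is published for a (material_type, product_type) group.
MIN_COUNT = 3


def build_fallbacks(products, field, min_count=MIN_COUNT):
    """Return {(material_type, product_type): median of non-null values}."""
    groups = defaultdict(list)
    for product in products:
        value = product.get(field)
        if value is not None:
            key = (product["material_type"], product["product_type"])
            groups[key].append(value)
    # Groups with too few samples get no fallback at all.
    return {
        key: median(values)
        for key, values in groups.items()
        if len(values) >= min_count
    }


def apply_fallback(product, field, fallbacks):
    """Return a copy of the product with a missing field filled and flagged."""
    enriched = dict(product)
    key = (product["material_type"], product["product_type"])
    if enriched.get(field) is None and key in fallbacks:
        enriched[field] = fallbacks[key]
        enriched[f"{field}_estimated"] = True
    else:
        enriched.setdefault(f"{field}_estimated", False)
    return enriched
```

With this shape, a consumer can check `density_estimated` before deciding whether a value came from the source data or from a statistical fallback.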

2. Product Classification Mapping

All product records are assigned to one or more classification systems for improved interoperability:

Supported Classification Systems

Uniclass 2015 (Products, Systems, Materials)
RICS NRM
2050 Materials internal taxonomy (see here)

Classification is applied using a hybrid pipeline:

Rule-based mappings for structured inputs
AI-assisted inference (using product name, manufacturer, description fields) when structured data is missing or ambiguous
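
The hybrid approach can be pictured as "rules first, inference second". The sketch below is only an illustration: the class labels, the `RULES` table, and the keyword scorer standing in for the AI-assisted step are all hypothetical and do not reflect real Uniclass or NRM codes or the platform's model.

```python
# Hypothetical rule table: (product_type, material_type) -> class label.
RULES = {
    ("insulation", "mineral wool"): "CLASS-INSULATION",
}

# Hypothetical keyword hints, a simple stand-in for the AI-assisted step.
KEYWORD_HINTS = {
    "CLASS-INSULATION": {"insulation", "wool"},
    "CLASS-CONCRETE": {"concrete", "cement"},
}


def classify(product):
    """Return (label, method), trying structured rules before text inference."""
    key = (product.get("product_type"), product.get("material_type"))
    if key in RULES:
        return RULES[key], "rule"

    # Fall back to free-text fields when structured data is missing.
    text = " ".join(
        str(product.get(field) or "")
        for field in ("name", "manufacturer", "description")
    ).lower()
    tokens = set(text.split())

    best_label, best_score = None, 0
    for label, hints in KEYWORD_HINTS.items():
        score = len(tokens & hints)
        if score > best_score:
            best_label, best_score = label, score

    return best_label, ("inferred" if best_label else "unclassified")
```

Returning the method alongside the label keeps it visible whether a classification came from a deterministic rule or from inference.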

3. Keyword Tagging & Semantic Metadata

Each product is enriched with a curated set of semantic tags used for:

Enhanced search (e.g. recyclable, natural material, VOC-free, acoustic, bio-based)
Filtered queries via API or UI
Compatibility with procurement/specification workflows
Product matching in the automated mapping endpoint, which relies significantly on these tags

Tags are periodically revised and versioned to reflect industry evolution and terminology updates.
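
To show how tags might be used once product records have been retrieved, here is a small client-side filtering sketch. The `tags` field name and the list-of-strings shape are assumptions made for this example, not a documented response format.

```python
def filter_by_tags(products, required_tags):
    """Keep products whose tags include every required tag (case-insensitive)."""
    required = {tag.strip().lower() for tag in required_tags}
    return [
        product
        for product in products
        if required <= {tag.strip().lower() for tag in product.get("tags", [])}
    ]
```

For example, `filter_by_tags(products, ["bio-based", "acoustic"])` would keep only the records that carry both tags.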

4. EPD Parsing and Structuring

When product-specific EPDs are available, our pipeline uses AI-driven document parsing to extract structured data directly from PDFs. This ensures that the wealth of information within EPDs is made machine-readable and analysis-ready.

Extracted EPD Fields

Product Identification & Company Information: Product name, descriptive name, detailed description, company name, contact information (email, address, website), Global Trade Item Number (GTIN), and EPD registration number.
Product Classification & Attributes: Product type, material type, applicable building types, manufacturing location, and various physical and performance attributes (density, grammage, linear density, mass per piece, thickness, cross-sectional area, fire performance, color, warranty, texture, steel grade, U-value, porosity, compression strength, impact strength, thermal conductivity, elasticity/plasticity, abrasion resistance, corrosion resistance, weathering resistance, solar heat gain coefficient, slip resistance, acoustic performance, maintenance, life expectancy).
MEP/HVAC Specifics: Concrete mix, consistence class, cooling capacity, heating capacity, rated heat power, refrigerant type, refrigerant charge, tank volume, flow rate, head, nominal thermal power, useful power, charging power (3-phase, 1-phase), and charging ampere.
Certificate Details: Type of certificate, date of issue, and expiry date.
Performance attributes (e.g., fire rating, durability, thermal resistance)
Standard references (EN 15804, ISO 14025, etc.)
Life Cycle Assessment (LCA) Data: Comprehensive LCA impact indicators across various modules (A1, A2, A3, A1A2A3, A4, A5, A4A5, B1-B7, Btotal, C1-C4, C2C3C4, Ctotal, D), including resource use (water, primary energy, secondary fuels, secondary materials), waste disposal, and environmental potentials (abiotic depletion, acidification, ozone depletion, eutrophication, tropospheric ozone formation, global warming potential, water deprivation, ecotoxicity, human toxicity, ionising radiation, particulate matter formation, soil quality).
Material Facts: Compliance standards, EPD operator, Product Category Rules (PCR) name, declared unit and quantity, mass per declared unit, and EPD language.
Detailed Material Composition: Breakdown of constituent materials, including percentages by weight, and specific details on biogenic carbon content and recycled/bio-based content.
Packaging Information: Specific materials used in packaging, total packaging weight, and notes on packaging reuse systems.
LCA Study Context: Information on LCA consultant, software/database versions used, data collection period, geographical scope, and key modeling assumptions.
Detailed Scenario Parameters: Granular details about transport, installation, use (e.g., carbonation), demolition, and end-of-life assumptions.
Product Variations & Scaling Factors: Identifiers for product variations, explicit per-declared-unit multipliers, and additional technical data for each variant.
Module Scenarios: Documentation of different end-of-life or other life cycle module scenarios presented in the EPD, including descriptive names and associated EPD module labels.
Unit Transformations: Records of unit conversions performed during data extraction to align with required schema units.

All parsed values are stored in a structured format and exposed via the API in the get_products endpoint response.

Structured EPDs are versioned and associated with traceable source metadata.

5. Internal Material Statistics (IMDB)

We maintain an Internal Materials Database (IMDB) derived from the statistical aggregation of product facts. It is used for:

Generic material profiles (for early-stage design tools)
Benchmarking and validation
Fallback value generation (see section 1, Fallback Values for Incomplete Data)

The IMDB is updated in tandem with enrichment tasks, ensuring current and statistically representative reference data.

6. QA, Signals, and Model-Driven Validation

Data updates and inserts trigger internal QA signals:

Auto-corrections of dependent fields (e.g. mass_per_unit from density × volume)
Auto-propagation of classification changes
Detection and resolution of missing or inconsistent technical fields
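
The dependent-field correction mentioned above (mass_per_unit from density × volume) could look roughly like the following. This is a sketch under assumptions: records are dictionaries, `volume` is a hypothetical field holding the volume per unit in m³, and density is in kg/m³.

```python
def recompute_mass_per_unit(record):
    """Return a copy of the record with mass_per_unit derived when possible."""
    updated = dict(record)
    density = record.get("density")  # kg/m³
    volume = record.get("volume")    # m³ per unit (hypothetical field)
    if density is not None and volume is not None:
        updated["mass_per_unit"] = density * volume  # kg per unit
    return updated
```

If either input is missing, the record is returned unchanged, so an existing mass_per_unit is never overwritten with a partial result.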

In addition, we use:

AI classifiers to resolve ambiguous or unclassified records
Statistical outlier detection to flag anomalies for review
Manual override tools (via QA dashboards) for expert validation
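
The source does not say which outlier method is used. As one possible illustration, the sketch below flags values with a modified z-score based on the median absolute deviation (MAD), using the commonly cited cutoff of 3.5. The function name and threshold are assumptions made for this example.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All values are (nearly) identical: nothing can be scored reliably.
        return []
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]
```

Flagged values would then be candidates for manual review rather than being corrected automatically.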

Each record includes a data quality summary with validation flags and enrichment provenance.

7. Update Schedule

Enrichment processes run on a monthly or weekly cadence, depending on the process:

Fallbacks: recalculated for all (material_type, product_type) combinations
Scaling factors: updated based on material-specific densities and units
Biogenic corrections: refreshed for accuracy based on new inputs
EPD parsing: runs continuously as new documents are ingested

All updates are atomic and version-tracked. Integrators can optionally subscribe to change logs via webhook.

Summary

2050 Materials delivers enriched product data that’s:

Statistically complete (even when input data is sparse)
Structurally interoperable (mapped across multiple taxonomies)
Semantically searchable (with contextual tags and attributes)
Machine-readable and analysis-ready (parsed EPDs, normalized units)
Traceable, versioned, and QA’d (with full enrichment metadata)

This ensures the data you pull via our API is not only technically correct but also ready for real use in carbon estimation, procurement automation, early design feedback loops, and beyond.

For more information or to request enrichment coverage for a new material category, contact: [email protected]
