The DDI4 data management description aims to account for the ingestion and production of new data types (registry data, health data, big data, spell data, event data, etc.) and for both the legacy and the new data management services that shape these data types over the course of the data lifecycle. The Data Management View describes the prospective and retrospective use of multiple data management platforms and architectures, including (1) ESB (Enterprise Service Bus) and SOA (Service Oriented Architecture); (2) PROCs and commands in statistical packages such as SAS, Stata and R; and, more recently, (3) iPaaS (integration Platform as a Service) in public clouds, private clouds and appliances, as practiced by various ETL (Extract, Transform and Load) platforms.

Use Cases:

Create repeatable processes across a data network. More specifically, document and share the specifications for a demographic and epidemiological surveillance DataPipeline across surveillance sites.

Produce a Data Management Plan (DMP) in the form of a DataPipeline so other researchers are able to replicate a study’s results.

Document the actual data management in a study as a DataPipeline. This could underpin workflow tools for researchers.

Use a DataPipeline description as the input to tools that trace the lineage of data during the data lifecycle of a Study.

Use a DataPipeline description and the GraphML it spawns to create workflow diagrams.

Specialize the GLBPM to support the production of a dataset of geotagged tweets from the US where a wave corresponds to a day. Create a DataPipeline that describes in detail how this dataset is produced.

Programmatically create a Data Management View for an Extract, Transform and Load (ETL) platform, using as input the instructions that authors create in the platform’s authoring environment.
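
As an illustration of the workflow-diagram use case above, the following minimal sketch serializes a list of steps and the data flows between them as GraphML, which diagramming tools can then lay out. The source does not specify how a DataPipeline description spawns GraphML, so the function name `pipeline_to_graphml`, its arguments and the three step names are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the GraphML format.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def pipeline_to_graphml(steps, flows):
    """Serialize steps (node ids) and flows (source, target pairs) as GraphML.

    Hypothetical helper: this is not part of DDI4 or of any ETL product.
    """
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root, f"{{{GRAPHML_NS}}}graph", id="pipeline", edgedefault="directed"
    )
    # One node per workflow step.
    for step in steps:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=step)
    # One directed edge per hand-off of data from one step to the next.
    for n, (source, target) in enumerate(flows):
        ET.SubElement(
            graph, f"{{{GRAPHML_NS}}}edge", id=f"e{n}", source=source, target=target
        )
    return ET.tostring(root, encoding="unicode")


# Assumed example: a three-step pipeline named after the ETL acronym.
xml_text = pipeline_to_graphml(
    ["extract", "transform", "load"],
    [("extract", "transform"), ("transform", "load")],
)
print(xml_text)
```

The resulting text can be saved with a .graphml extension and opened in a graph editor that reads GraphML. A real DataPipeline description would carry more information per step than a bare identifier.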

Target Audiences:

Researchers who are preparing a Data Management Plan (DMP).

Data networks migrating from an Enterprise Service Bus (ESB) / Service Oriented Architecture (SOA) platform to an integration Platform as a Service (iPaaS), whether virtual (cloud-based) or a physical appliance (e.g. ETL platforms).

Industry-specific or generic standards groups that wish to integrate fully developed information models with domain-specific business process models.

Search engines intent on exposing data lineage within a study.

General Documentation:

At one level, the Data Management View consists of a data pipeline that traverses a series of business activities drawn from business process models such as the GSBPM (Generic Statistical Business Process Model) for the production of statistics and the GLBPM (Generic Longitudinal Business Process Model) for the description of longitudinal studies. At another level, the Data Management View decomposes these business processes into a series of workflow steps. At both levels, components exchange data.
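
The two levels described above can be pictured with a small data structure. This is a sketch under stated assumptions rather than the DDI4 model itself: the class names `BusinessProcess` and `WorkflowStep`, their fields, the `exchanges` method and the dataset labels are all illustrative. Only the name DataPipeline and the idea that components exchange data at both levels come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class WorkflowStep:
    """Lower level: one unit of work with named inputs and outputs (assumed shape)."""
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class BusinessProcess:
    """Upper level: a business activity decomposed into workflow steps."""
    name: str
    steps: List[WorkflowStep] = field(default_factory=list)

    def outputs(self) -> Set[str]:
        # Every dataset produced by any step of this process.
        return {o for s in self.steps for o in s.outputs}

    def inputs(self) -> Set[str]:
        # Datasets the process needs from outside: consumed but not produced here.
        consumed = {i for s in self.steps for i in s.inputs}
        return consumed - self.outputs()


@dataclass
class DataPipeline:
    """An ordered series of business processes."""
    name: str
    processes: List[BusinessProcess] = field(default_factory=list)

    def exchanges(self) -> List[Tuple[str, str, str]]:
        """Return (producer, consumer, dataset) for data passed between processes."""
        found = []
        for i, producer in enumerate(self.processes):
            for consumer in self.processes[i + 1:]:
                for dataset in sorted(producer.outputs() & consumer.inputs()):
                    found.append((producer.name, consumer.name, dataset))
        return found


# Illustrative pipeline; process, step and dataset names are made up for the example.
pipeline = DataPipeline(
    name="example",
    processes=[
        BusinessProcess("Collect", [WorkflowStep("capture", [], ["raw_data"])]),
        BusinessProcess("Process", [
            WorkflowStep("clean", ["raw_data"], ["clean_data"]),
            WorkflowStep("derive", ["clean_data"], ["analysis_file"]),
        ]),
    ],
)
print(pipeline.exchanges())
```

Here `clean_data` passes between two steps inside a single process, so it is an exchange at the workflow-step level only, while `raw_data` crosses a process boundary and is reported by `exchanges`.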
