Conceptual

Conceptual covers all of the basic components of ISO/IEC 11179 as captured and represented in the GSIM Concepts Group (http://www1.unece.org/stat/platform/display/GSIMclick/Concepts+Group).
The Conceptual package will review all parts of the GSIM Concepts Group content and determine if and where adjustments need to be made to interact well with the overall mandate of DDI regarding support for work outside of the GSIM sphere and required levels of abstraction.

Namespace URI: 
http://ddialliance.org/ddi4/conceptual
Color of objects in the uml graph: 
218559
Include in build?: 
True
Weight: 
10

Comments

In our package there is no NodeSet/Level relationship. Instead there is a CodeList/Level relationship, with an explanation that this relationship can be extended to Statistical Classifications and Category Sets. Is there a reason why we didn’t model a NodeSet/Level relationship, in which case all the NodeSet subtypes would have a Level relationship by inheritance? I notice that NodeSet is abstract, but that shouldn’t matter, should it?
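A minimal sketch of the NodeSet/Level idea, in Python rather than UML. The class names come from the comment above; the `levels` attribute, the `add_level` and `kind` methods, and the choice to have every subtype extend NodeSet directly are illustrative assumptions, not the current model. It suggests that an abstract parent can carry the relationship and pass it on to its subtypes:

```python
from abc import ABC, abstractmethod


class Level:
    """A level within a node set (hypothetical minimal form)."""

    def __init__(self, name):
        self.name = name


class NodeSet(ABC):
    """Abstract parent: declares the Level relationship once."""

    def __init__(self):
        self.levels = []  # assumed attribute name for the relationship

    def add_level(self, level):
        self.levels.append(level)

    @abstractmethod
    def kind(self):
        """Concrete subtypes say what they are."""


class CodeList(NodeSet):
    def kind(self):
        return "CodeList"


class CategorySet(NodeSet):
    def kind(self):
        return "CategorySet"


class StatisticalClassification(NodeSet):
    def kind(self):
        return "StatisticalClassification"


# Every subtype gets the Level relationship by inheritance.
classification = StatisticalClassification()
classification.add_level(Level("Division"))

# NodeSet itself still cannot be instantiated, because it is abstract.
try:
    NodeSet()
    nodeset_is_instantiable = True
except TypeError:
    nodeset_is_instantiable = False
```

In this reading, the abstractness of NodeSet only prevents direct instantiation; it does not stop the subtypes from inheriting the relationship.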

Why does Statistical Classification extend CodeList? Shouldn’t it extend NodeSet?

It is true that Statistical Classification is like CodeList, except that it has constraints that CodeList doesn’t, such as mutual exclusivity of classification items at a Level. Would this argue against an extension relationship with CodeList?

It is the case that Variable "doesn't belong in" the Conceptual package. However, it is important to Conceptual because ConceptualVariable (GSIM's variable) is specialized by RepresentedVariable, which in turn is specialized by Variable (GSIM's InstanceVariable). That said, I can't find a "Variable" in the library.

It is important that all three constructs be represented because of the relationships they have with one another. Of special interest is the one-to-many relationship between RepresentedVariable and Variable (GSIM's InstanceVariable).

The difference between RepresentedVariables and InstanceVariables also shows in the fact that a RepresentedVariable has a UnitOfAnalysis and an InstanceVariable has a Population. UnitOfAnalysis identifies what is being measured in principle. Population identifies what was in fact measured in this instance.
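One possible reading of this cascade, sketched in Python. The three class names and the UnitOfAnalysis/Population properties come from the comments above; the field names, the use of plain strings, the `instantiate` helper and the sample values are assumptions made only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    concept: str  # assumed field: the concept being measured


@dataclass(frozen=True)
class RepresentedVariable(ConceptualVariable):
    unit_of_analysis: str  # what is being measured in principle

    def instantiate(self, population):
        """Hypothetical helper: derive an InstanceVariable for one study."""
        return InstanceVariable(self.concept, self.unit_of_analysis, population)


@dataclass(frozen=True)
class InstanceVariable(RepresentedVariable):
    population: str  # what was in fact measured in this instance


# One RepresentedVariable, many InstanceVariables (the one-to-many relationship).
age = RepresentedVariable("age", "person")
age_2010 = age.instantiate("residents surveyed in 2010")
age_2020 = age.instantiate("residents surveyed in 2020")
```

Both instance variables share the same unit of analysis but differ in population, which is the distinction the comment draws.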

- Compositions and aggregations are reversed: the diamonds should be on the whole rather than the part (e.g. Node -> Node Set, Node -> Level, etc).
- The semantics of some associations seem wrong: e.g. I think hasCategorySet -> CategorySet shouldn't be a composition since you don't need to delete the CategorySet once it's removed from the hasCategorySet attribute.

I moved Represented Variable from the Conceptual View to the Logical Data Description. This is consistent with DDI 3 and will facilitate maintenance because the Represented Variable and Instance Variable are very connected through the Value Domain. Also, it turns out that DataPoint and Datum are pivotal to the Physical Data Description so these objects have been moved from the Logical to the Physical Data Description.

In London we decided to have a set of standard properties (label, description, definition, language, comments and notes) to be used in a consistent way. That means, among other things, that we need to make use of the class hierarchy so that if an ancestor of a class has a given property we shouldn't add it to the subclass as well. However, that is not the case in the Conceptual Package: e.g. InstanceVariable is a subclass of Concept (three levels down) and yet it has its own Label and Definition properties again. This may be happening in other packages as well. We need to clean that up.
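A rough sketch of how such a clean-up could be checked mechanically. The toy model below is an assumption: it encodes only the chain mentioned in these comments (Concept, ConceptualVariable, RepresentedVariable, InstanceVariable), with lower-cased property names, and the function name `redeclared_properties` is hypothetical. A real check would read the classes and properties from the actual model:

```python
# Toy model: class name -> (parent class or None, properties declared on it).
MODEL = {
    "Concept": (None, ["label", "definition"]),
    "ConceptualVariable": ("Concept", []),
    "RepresentedVariable": ("ConceptualVariable", []),
    "InstanceVariable": ("RepresentedVariable", ["label", "definition"]),
}


def redeclared_properties(model):
    """Return {class: [properties already declared on an ancestor]}."""
    findings = {}
    for cls, (parent, props) in model.items():
        inherited = set()
        ancestor = parent
        # Walk up the hierarchy and collect everything the class inherits.
        while ancestor is not None:
            ancestor_parent, ancestor_props = model[ancestor]
            inherited.update(ancestor_props)
            ancestor = ancestor_parent
        duplicates = sorted(set(props) & inherited)
        if duplicates:
            findings[cls] = duplicates
    return findings


findings = redeclared_properties(MODEL)
```

With this toy input, `findings` flags InstanceVariable for redeclaring `definition` and `label`; the same walk could be run over every package.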

Since we now have the distinction between Universe and Population, I associated InstanceVariable with Population rather than Universe, because Population is the one associated to Unit and thus used at the study level (it has time and space constraints).

UnitType is associated to RepresentedVariable, but shouldn't it be associated to ConceptualVariable instead? I think RepresentedVariable should rather be associated to either Universe or Population, but definitely not UnitType, which is conceptual.

What's the reason for having a DataType object in the conceptual model? It's currently associated only to InstanceVariable and RepresentedVariable. Isn't it enough for Variables to have ValueDomains? It seems to me that's all we want to say in the conceptual model; data types should be part of the implementation/bindings.


The datatype is not inferable from the Value Domain. The datatype informs a user about what operations are available on the values under consideration. If we limit ourselves to the basic statistical datatypes - nominal, ordinal, interval, and ratio - here are some examples that illustrate:

1) Temperature can be measured in Fahrenheit, Celsius, and Kelvin. For the first two units of measure, one may take differences but not ratios. It is wrong to say 25 deg F is a third as warm as 75 deg F, and the same goes for Celsius. However, 75 deg F is 50 deg F warmer than 25 deg F, and differences are equally meaningful in Celsius and in Kelvin. In Kelvin, moreover, 25 K IS a third as warm as 75 K. Therefore, we say Celsius and Fahrenheit measures have an Interval datatype, whereas Kelvin measures are Ratio.
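The arithmetic behind the temperature example can be checked with a few lines of Python. The conversion formula is the standard Fahrenheit-to-Kelvin one; the variable and function names are just for illustration:

```python
def fahrenheit_to_kelvin(deg_f):
    """Standard conversion: K = (F - 32) * 5/9 + 273.15."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


cold_f, warm_f = 25.0, 75.0
cold_k = fahrenheit_to_kelvin(cold_f)
warm_k = fahrenheit_to_kelvin(warm_f)

# The ratio in Fahrenheit depends on an arbitrary zero point.
naive_ratio_f = warm_f / cold_f  # 3.0

# The same two temperatures on the Kelvin (Ratio) scale.
ratio_k = warm_k / cold_k  # roughly 1.10, nowhere near 3

# Differences survive the change of unit, up to the 5/9 scale factor.
diff_f = warm_f - cold_f  # 50.0
diff_k = warm_k - cold_k  # 50 * 5/9, roughly 27.78
```

So 75 deg F is only about 10% warmer than 25 deg F in the ratio sense, while the 50-degree difference carries over to Kelvin as a fixed multiple.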

2) Take the typical Sex Codes classification
<0, male>
<1, female>
Does the fact that 0 and 1 have an ordering mean anything here? You don't want it to, as codes are arbitrary, so you declare the Nominal datatype. This corresponds to the fact that male and female as categories have no ordering, and the declaration makes that explicit.

3) An example that is a little more confusing is educational attainment. To keep it simple, let the categories and codes be as follows:
<1, elementary>
<2, high school>
<3, college>
<4, post-graduate>
One could say these values (or categories) have an ordering, thus you would assign Ordinal to this. But, Ordinal with numeric codes suggests to people they can take averages, which leads to loads of confusion. This is not Interval data, and that means differences are not necessarily comparable. Taking averages requires comparable intervals, and doing so here leads to nonsense. Finally, it is an interesting question whether assigning Nominal to this might work just as well. The ordering is not so well-defined.

4) A preference scale on the other hand needs to be Ordinal, as preferences have an intrinsic order. As we said above, the codes are not important, so you could have the following (simple) scale:
<3, dislike>
<1, neutral>
<2, like>
The codes are arbitrary. Typically, though, we choose them as mnemonics to coincide with the available computation. But, look what will happen with averages here!
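To see what happens with averages, here is a small Python sketch using the scale above. The four sample responses and the alternative "in order" coding are made up for illustration:

```python
from statistics import median_low

# Codes exactly as listed in the scale above (arbitrary).
arbitrary_codes = {"dislike": 3, "neutral": 1, "like": 2}
# An alternative coding that happens to follow the order (an assumption).
ordered_codes = {"dislike": 1, "neutral": 2, "like": 3}

# Hypothetical responses.
responses = ["dislike", "neutral", "neutral", "like"]

# The mean of the codes changes with the coding, so it says nothing
# about the preferences themselves.
mean_arbitrary = sum(arbitrary_codes[r] for r in responses) / len(responses)  # 1.75
mean_ordered = sum(ordered_codes[r] for r in responses) / len(responses)  # 2.0

# An Ordinal datatype licenses order-based summaries such as the median,
# computed on the declared order rather than on the codes.
order = ["dislike", "neutral", "like"]
ranks = [order.index(r) for r in responses]
median_category = order[median_low(ranks)]  # "neutral"
```

The median depends only on the declared order of the categories, so it stays the same whichever codes are chosen.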

5) Currencies are often listed in units.hundredths. For example, 5 dollars and 22 cents is 5.22. This leads to the usage of Real datatypes, yet a Scaled datatype (see ISO/IEC 11404 for a detailed explanation) is far more appropriate.

There is a difference between the datatype in some application (often currencies can only be represented as Real numbers) and the intended one (Scaled in the case of currencies). So, the RV has the intended datatype specified, whereas the IV has the application one.
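A small Python illustration of that gap. Here `float` plays the role of the application's Real datatype, and `decimal.Decimal` is used as a stand-in for a Scaled datatype; that mapping is an assumption for the sake of the example, not something ISO/IEC 11404 prescribes:

```python
from decimal import Decimal

# Application datatype: binary floating point (Real).
real_total = 0.10 + 0.20
real_matches = real_total == 0.30  # False: binary rounding error

# Intended datatype: decimal with a fixed scale of hundredths (Scaled).
scaled_total = Decimal("0.10") + Decimal("0.20")
scaled_matches = scaled_total == Decimal("0.30")  # True

# 5 dollars and 22 cents, held exactly as a count of hundredths.
price = Decimal("5.22")
price_in_cents = int(price * 100)  # 522
```

Recording the intended datatype on the RV and the application one on the IV would make this kind of mismatch visible to a user.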

I see, it makes sense now.

Maybe we should then model the DataType object a little bit more to cover all those cases, because right now it has no definition, examples, or controlled vocabulary. In fact, its only property is a scheme (?) of type InternationalString.

InstanceVariable has a property of type StandardKeyValuePair. Having such generic and untyped containers breaks interoperability. Do we need it in InstanceVariable?

I just noticed that some associations, like parent and child in ConceptParentChild, are properly defined in the model but are not rendered in the diagram below.

Package Graph