Language resource management — Component Metadata Infrastructure (CMDI) — Part 1: The Component Metadata Model

ISO 24622-1:2015 describes a model that enables the flexible construction of interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this model can be used to describe resources at different levels of granularity (e.g. descriptions both on the collection level and on the level of individual resources).

Gestion des ressources langagières — Composante infrastructure de métadonnées (CMDI) — Partie 1: Composant modèle de métadonnées

Upravljanje z jezikovnimi viri - Infrastruktura komponentnih metapodatkov (CMDI) - 1. del: Model komponentnih metapodatkov

The scope of this part of ISO 24622 is the description of a model that enables the flexible construction of interoperable metadata schemas for language resources (LRs). Metadata schemas based on this model can be used to describe resources at different levels of granularity (e.g. descriptions both at the collection level and at the level of an individual resource).

General Information

Status: Published
Publication Date: 19-Jan-2015
Current Stage: 9093 - International Standard confirmed
Completion Date: 09-Jun-2020

Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24622-1:2018
01-September-2018
Upravljanje z jezikovnimi viri - Infrastruktura komponentnih metapodatkov (CMDI) - 1. del: Model komponentnih metapodatkov
Language resource management -- Component Metadata Infrastructure (CMDI) -- Part 1: The Component Metadata Model
Gestion des ressources langagières -- Composante infrastructure de métadonnées (CMDI) -- Partie 1: Composant modèle de métadonnées
This Slovenian standard is identical to: ISO 24622-1:2015
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24622-1:2018 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24622-1
First edition
2015-02-01
Language resource management —
Component Metadata Infrastructure
(CMDI) —
Part 1:
The Component Metadata Model
Gestion des ressources langagières — Composante infrastructure de
métadonnées (CMDI) —
Partie 1: Composant modèle de métadonnées
Reference number
ISO 24622-1:2015(E)
© ISO 2015

COPYRIGHT PROTECTED DOCUMENT
© ISO 2015
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Terms and definitions
3 Metadata schema availability and reuse
3.1 Overview
3.2 Metadata components and elements
4 Semantics in the component metadata model
4.1 Overview
4.2 Concept registries
4.3 Relation registries
5 Metadata component and profile - compatibility and versioning
6 Expressiveness of the component metadata model
Annex A (informative) Abbreviations
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on
the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers
to Trade (TBT), see the following URL: Foreword — Supplementary information.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
ISO 24622 consists of the following part, under the general title Language resource management —
Component metadata infrastructure (CMDI):
— Part 1: The component metadata model
A future part will address the component metadata specification language.

Introduction
Component Metadata (CMD) is an approach to metadata modelling and metadata creation. It is being
increasingly used these days to enable the metadata description of different types of Language
Resources (LRs) with different metadata schemas, while still trying to maintain syntactic and semantic
interoperability.
CMD is also the core of the Component Metadata Infrastructure (CMDI) [footnote 1] [1]: this infrastructure contains
not only the format specifications for this metadata modelling and creation approach, but also a set of
registries and tools for metadata modelling and creation work.
The advantages of having such a unified approach to metadata descriptions for LRs, an approach that
will be usable by many projects and initiatives, are obvious: firstly, there is a better chance of obtaining
interoperability between metadata descriptions from different sources, and secondly, it will be possible
to develop and share tools that work much more efficiently in this metadata framework.
The challenge of designing and organizing a comprehensive and unified approach to metadata
description for the very varied set of LR types, and one that also can satisfy a sufficiently large section
of the LR community, should not be underestimated. The landscape of metadata for LRs has been, and
continues to be, fragmented. Until recently, it was the practice in creating the metadata descriptions for
LRs to choose a specific metadata schema from a (small) existing set derived either from widespread
traditions or from other disciplines; for example, OLAC [2] is an adapted version of DCMI [3], which in turn
originates in the library world. Additionally, there are, for the purposes of LR metadata description,
specifically developed metadata schemas that can be limited in application to specific types of LR (e.g.
IMDI [4]), or they can be of a proprietary nature (cf. the catalogues of the LR agencies such as LDC [footnote 2] and
ELRA [footnote 3]). The result is a domain of LR metadata that is far from interoperable. Although some progress
has been made in developing dedicated bridges for “translating” metadata from one specific schema to
another and in providing a consolidated catalogue, this practice does not scale well since it depends on
specific translations for each pair of different metadata schemas.
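The scaling problem can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the sample schema counts are assumptions, not figures taken from this International Standard. It counts how many dedicated bridges are needed when every pair of schemas requires its own translation.

```python
from math import comb


def pairwise_bridges(schema_count: int, directed: bool = False) -> int:
    """Number of dedicated bridges needed to connect every pair of schemas.

    With directed=True, a separate translator is counted for each direction.
    """
    pairs = comb(schema_count, 2)  # n * (n - 1) / 2 unordered pairs
    return 2 * pairs if directed else pairs


if __name__ == "__main__":
    # Hypothetical schema counts, chosen only to show the quadratic growth.
    for n in (3, 5, 10, 20):
        print(n, pairwise_bridges(n), pairwise_bridges(n, directed=True))
```

The count grows quadratically with the number of schemas, which is the sense in which pairwise translation does not scale.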
For some recent projects, founding principles have included the unification and consolidation of practices
and the need to produce efficient and sufficiently specific metadata descriptions.
It follows that a number of international, European, and national projects and infrastructure initiatives
such as CLARIN [5] and META-SHARE [6] now share the CMD approach to metadata for LRs. This
International Standard will both standardize the fundamentals of this approach in order to achieve
interoperability based on solid documentation, and foster cooperation between the various initiatives
and projects that work on, and with, this International Standard.
The model description is the first part of an infrastructure that forms a complete package for the
creation of metadata schemas. As stated in the Foreword, the complete infrastructure standard contains,
in addition to this component metadata model specification (ISO 24622-1), one or more metadata
component specification languages (planned), and a number of recommended metadata components
and profiles (planned). Since this part of ISO 24622 specifies an abstract model, we will rely mainly on
UML [7] to describe it.
Figure 1 — Describing resources with metadata
1) Abbreviations are explained in Annex A.
2) Linguistic Data Consortium, http://www.ldc.upenn.edu/
3) European Language Resources Association, http://www.elra.info/

This part of ISO 24622 addresses the basic need to provide a model that makes it easy for metadata
modellers (e.g. researchers and resource description experts) to create new metadata schemas, which
can in turn be used either to describe new types of resources or to enable a more appropriate description
for resources in specific circumstances. The metadata schema is instantiated into metadata records [i.e.
the metadata descriptions that describe the actual resource(s)] (see Figure 1).
The context of this desire for flexible metadata modelling is that for scientific work there are usually
various requirements for the proper description of LRs, and these requirements can derive from the
specific needs of a project or from the facility or repository that will be used to store the resource for
future use. This variation requires a flexible framework that enables the easy creation of new metadata
schemas for different purposes, but is also a framework (i) in which the instantiations have a strictly
defined format so that at least syntactic correctness can be checked, and (ii) which provides explicit
semantics for the metadata schema elements for interpretation of the metadata record content.
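Both requirements can be illustrated with a minimal sketch. It is not the CMDI format or any specification language of this International Standard; the class name, the example schema, and the concept links under example.org are all hypothetical. Each element declares how often it may occur, optionally a closed vocabulary, and a link to a concept definition, so that a record can be checked for syntactic correctness and each value can be traced to explicit semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementSpec:
    """One metadata element of a hypothetical schema."""
    name: str
    concept_link: str                      # reference to a concept definition
    min_occurs: int = 1
    max_occurs: Optional[int] = 1          # None means unbounded
    closed_vocabulary: Optional[frozenset] = None


# Hypothetical schema; the concept links are placeholders, not real registry entries.
SCHEMA = [
    ElementSpec("title", "https://example.org/concepts/title"),
    ElementSpec("language", "https://example.org/concepts/language",
                min_occurs=1, max_occurs=None),
    ElementSpec("modality", "https://example.org/concepts/modality",
                min_occurs=0, max_occurs=1,
                closed_vocabulary=frozenset({"spoken", "written", "signed"})),
]


def validate(record: dict, schema: list) -> list:
    """Return a list of syntactic problems; an empty list means the record conforms."""
    problems = []
    known = {spec.name for spec in schema}
    for name in record:
        if name not in known:
            problems.append(f"unknown element: {name}")
    for spec in schema:
        values = record.get(spec.name, [])
        if len(values) < spec.min_occurs:
            problems.append(f"{spec.name}: too few occurrences")
        if spec.max_occurs is not None and len(values) > spec.max_occurs:
            problems.append(f"{spec.name}: too many occurrences")
        if spec.closed_vocabulary is not None:
            for value in values:
                if value not in spec.closed_vocabulary:
                    problems.append(f"{spec.name}: value not in vocabulary: {value}")
    return problems


if __name__ == "__main__":
    record = {"title": ["Field recordings"], "language": ["nld", "eng"],
              "modality": ["sung"]}
    print(validate(record, SCHEMA))
```

A record here is a plain mapping from element name to a list of values, which keeps the occurrence check a simple length comparison.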
The metadata descriptions generated by schemas compliant with this model will also be compliant
with other TC 37 International Standards, for example, those requiring that references to the described
resources and resource parts use ISO 24619:2011 PISA-compatible persistent identifiers (PIDs) [9].
The definition of a resource in this context is very broad. This part of ISO 24622 takes a pragmatic view:
for example, an image can be a resource in itself when it is associated with a PID and can be referenced
as such, or it can be part of a document where it lacks an identity of its own. In addition, a reference can
point to a part of this image. An individual resource can stand alone in one environment and be treated
as part of a collection in another environment. Also, metadata descriptions describe resources, but they,
too, are a resource in different contexts. This part of ISO 24622 needs to support all such cases, and the
model needs to provide descriptions at all levels of granularity.
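As a small illustration of the image case, the sketch below separates a reference into the resource it identifies and the part it points to. The URL is hypothetical, and a plain web URL stands in for a PID here; this part of ISO 24622 does not prescribe this syntax.

```python
from urllib.parse import urldefrag

# Hypothetical reference: an image resource plus a fragment naming one region of it.
reference = "https://example.org/resources/image-42#region-3"

resource_id, part_id = urldefrag(reference)

print(resource_id)  # the resource as a whole
print(part_id)      # the referenced part; empty string if the whole resource is meant
```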
This part of ISO 24622 takes two types of collections into account:
a) A complex resource may have been created as a collection originally and, versioning aside, it will
exist as such in a rather static published form. Its specification will be treated as an independent
entity by the responsible archiving institution that also provides a PID for such a collection. In the
context of this part of ISO 24622, the metadata for the collection is the collection specification. The
archiving institution is responsible for maintaining the metadata representing the collection.
b) In contrast, a different type of collection is one that was not planned and designed as a collection
by its creators or by the holding archive, but achieves its status as a federated resource based on
research that needs to be verifiable. Such collections, although purposefully constructed by the
researcher, may not have any significance outside the context of the research for which they were
created. Referring from the research documents to the collection may also become tedious if the
collection contains hundreds of individual resources. It follows that there is a need to capture these
types of collection with a metadata record that is associated with all its constituent resources and
appropriate metadata, but only as the incarnation of this collection. There is no natural responsible
party to maintain this metadata record. It is unlikely that the researcher who created the “virtual”
collection (VC) has any way of consistently maintaining and curating this metadata record in the
long term. There may be special registries maintained by digital archives or publishers where
researchers can register such virtual collections.
Both types of collection are identified with the PID that refers to the collection metadata.
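The two collection types can be sketched with one record structure. This is an illustrative model only: the class, the field names, and the handle-style identifiers are hypothetical. In both cases the collection is identified by the PID of its metadata record; what differs is whether a responsible maintainer exists and whether the members come from several repositories.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectionRecord:
    """Metadata record describing a collection (hypothetical structure)."""
    pid: str                          # PID of this metadata record; it identifies the collection
    title: str
    member_pids: List[str] = field(default_factory=list)
    maintainer: Optional[str] = None  # archiving institution, if there is one
    virtual: bool = False             # True for a researcher-assembled virtual collection


# Type a): published as a collection and maintained by an archive.
corpus = CollectionRecord(
    pid="hdl:0000/corpus-1",
    title="Published corpus",
    member_pids=["hdl:0000/rec-1", "hdl:0000/rec-2"],
    maintainer="Example Archive",
)

# Type b): assembled for one study from resources held in different places.
study_set = CollectionRecord(
    pid="hdl:1111/vc-7",
    title="Resources cited in a study",
    member_pids=["hdl:0000/rec-2", "hdl:2222/rec-9"],
    virtual=True,
)


def cite(collection: CollectionRecord) -> str:
    """A single reference to the collection, regardless of how many members it has."""
    return f"{collection.title} ({collection.pid}), {len(collection.member_pids)} resources"


if __name__ == "__main__":
    print(cite(corpus))
    print(cite(study_set))
```

Citing the record's PID once avoids listing every constituent resource in a research document, which is the practical motivation given for virtual collections.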

INTERNATIONAL STANDARD ISO 24622-1:2015(E)
Language resource management — Component Metadata
Infrastructure (CMDI) —
Part 1:
The Component Metadata Model
1 Scope
The scope of this part of ISO 24622 is to describe a model that enables the flexible construction of
interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this
model can be used to describe resources at different levels of granularity (e.g. descriptions both on the
collection level and on the level of individual resources).
2 Terms and definitions
2.1
archive
digital archive
repository (2.26) dedicated to the long-term preservation of the associated data
Note 1 to entry: The data in digital archives are also often available on-line. This highlights the need for
reliable PIDs (2.22).
2.2
cardinality
metadata component cardinality
metadata element cardinality
specification of the number of occurrences of a metadata component (2.14) or metadata element (2.12)
in an instantiation
2.3
citation
object containing information that directs a textual resource reader’s or user’s attention from one
resource to another
2.4
closed vocabulary
limited set of items that forms the mandatory value domain of a metadata element (2.12)
2.5
concept reference
concept link
reference to the definition of a concept in a concept registry (2.6)
2.6
concept registry
registry (2.25) for registering concepts enabling their identification with a unique identifier
2.7
collection
resource collection
grouping of multiple, different constituting elements, each of which is independent of the others and
may be accessed individually
Note 1 to entry: A collection can be a virtual collection if its constituent elements come from other different
(virtual) collections, and possibly if the elements are distributed over different repositories.
2.8
fragment identifier
identifier (2.9) used to reference a resource part (2.28) in a web context
[SOURCE: ISO 24619:2011]
2.9
identifier
digital identifier
compact sequence of characters associated with digital, non-digital, or abstract entities
[SOURCE: Adapted from ISO 24619:2011]
Note 1 to entry: Identifiers can apply to entities such as books, images, reports, metadata records, and events.
2.10
metadata record
metadata description
metadata
record (2.23) containing a description of a resource (2.27)
2.11
metadata schema
schema
specification of a format and structure for a metadata record (2.10)
Note 1 to entry: In the context of this part of ISO 24622, a machine-readable and verifiable format specification
usually defined by an XML schema language.
2.12
metadata element
resource property name that can be used in metadata and that can be given a value
Note 1 to entry: A metadata element is referred to as a metadata attribute in other communities.
EXAMPLE The DCMI elements [3].
2.13
metadata set
metadata element set
collection of metadata elements (2.12) used within a part
...

