Language resource management -- Lexical markup framework (LMF) -- Part 1: Core model

This document describes the core model of the lexical markup framework (LMF), a metamodel for
representing data in monolingual and multilingual lexical databases used with computer applications.
LMF provides mechanisms that allow the development and integration of a variety of electronic lexical
resource types.

General Information

Status: Published
Publication Date: 03-Sep-2019
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 01-Aug-2019
Due Date: 06-Oct-2019
Completion Date: 04-Sep-2019

Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24613-1:2019
01-October-2019
Language resource management -- Lexical markup framework (LMF) -- Part 1: Core model
Slovenian title: Upravljanje jezikovnih virov - Ogrodje za označevanje leksikonov (LMF) - 1. del: Jedrni model
French title: Gestion de ressources linguistiques -- Cadre de balisage lexical -- Partie 1: Modèle de base
This Slovenian standard is identical to: ISO 24613-1:2019
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24613-1:2019 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24613-1
First edition 2019-06
Language resource management -- Lexical markup framework (LMF) -- Part 1: Core model
Gestion des ressources linguistiques -- Cadre de balisage lexical (LMF) -- Partie 1: Modèle de base
Reference number: ISO 24613-1:2019(E)
© ISO 2019

COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Key standards used by LMF
4.1 Unicode
4.2 Language coding
4.3 Script coding
4.4 Unified modeling language (UML)
5 The LMF model
5.1 Introduction
5.2 Class inheritance and data category selection procedures
5.2.1 Class inheritance
5.2.2 LMF attributes
5.2.3 Data category selection (DCS)
5.2.4 User-defined data categories
5.3 LMF core package
5.3.1 General
5.3.2 LexicalResource class
5.3.3 GlobalInformation class
5.3.4 Lexicon class
5.3.5 LexiconInformation class
5.3.6 LexicalEntry class
5.3.7 Form class
5.3.8 OrthographicRepresentation class
5.3.9 GrammaticalInformation class
5.3.10 Sense class
5.3.11 Definition class
5.4 Cross reference (CrossREF) model
5.4.1 General
5.4.2 CrossREF and CrossREFConstraint classes
5.4.3 CrossREFConstraint class
5.5 Methods for data category selection and subclass creation
5.5.1 General
5.5.2 Generalization (typing)
5.5.3 Object instantiation
5.5.4 Design choices
5.5.5 Data categories for orthographic representation
5.5.6 Principles for model simplification
5.6 LMF extension use
5.6.1 General
5.6.2 Lexicon comparison
Annex A (informative) Data category examples
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
The document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee 4, Language resource management.
This first edition of ISO 24613-1, together with ISO 24613-2 to ISO 24613-6, cancels and replaces
ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
The content has been entirely revised and subdivided into parts. Part 1, Core model, contains the
body of the previous edition. New classes include LexiconInformation and GrammaticalInformation.
The Representation class has been renamed the OrthographicRepresentation class. In addition, the
OrthographicRepresentation subclasses, FormRepresentation and TextRepresentation, are no longer
part of the core model, providing it with greater modeling flexibility. The LexicalEntry class now
allows subclasses, providing improved extensibility and flexibility for modeling future parts. The
addition of the CrossREF class and associated metadata provides a formal model for cross-reference
design and implementation, closing a functional gap in the previous edition. A thoroughly revised
description of data category allocation mechanisms and their relationship to generalization by typing
provides a more incisive description of how these interdependent mechanisms enable flexible and
extensible designs.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
Optimizing the production, maintenance and extension of electronic lexical resources is one of the
crucial aspects impacting human language technologies (HLT) in general and natural language
processing (NLP) in particular, as well as human-oriented translation technologies. A second crucial
aspect involves optimizing the process leading to their integration in applications. Lexical markup
framework (LMF) is an abstract metamodel that provides a common, standardized framework for
the construction of computational lexicons. LMF ensures the encoding of linguistic information in a
way that enables reusability in different applications and for different tasks. LMF provides a common,
shared representation of lexical objects, including morphological, syntactic and semantic aspects.
The goals of LMF are to provide a common model for the creation and use of electronic lexical resources
ranging from small to large in scale, to manage the exchange of data between and among these
resources, and to facilitate the merging of large numbers of different individual electronic resources to
form extensive global electronic resources. The ultimate goal of LMF is to create a modular structure
that will facilitate true content interoperability across all aspects of electronic lexical resources.
LMF supports existing lexical resource models such as Genelex [3], the EAGLES International Standard
for Language Engineering (ISLE) [4], Multilingual ISLE Lexical Entry (MILE) models [10], Text Encoding
Initiative (TEI) guidelines [8], Ontolex [7], and the Language Base Exchange (LBX) serialization together
with the U.S. Government Wordscape On-Line Dictionary system [5].
LMF uses UML modeling processes [9]. The LMF core package describes the basic hierarchy of information
of a lexical entry, including information on the word form. The core package is supplemented by various
resources that are part of the definition of LMF. These resources include:
— specific data categories used by the variety of resource types associated with LMF, both those data
categories relevant to the metamodel itself, and those associated with the extensions to the core
package in additional LMF parts (see Annex A for data category examples);
— the constraints governing the relationship of these data categories to the metamodel and to its
extensions;
— standard procedures for expressing these categories and thus for anchoring them on the structural
skeleton of LMF and relating them to the respective extension models;
— the vocabularies used by LMF to express related informational objects for describing how to extend
LMF through linkage to a variety of specific resources (extensions) and methods for analysing and
designing such linked systems.
LMF parts are expressed in a framework that describes the reuse of the LMF core components (such as
structures, data categories, and vocabularies) in conjunction with the additional components required
for a specific resource.
The parts currently in or planned for the new organization of ISO 24613 include Part 1: Core model, Part
2: Machine readable dictionary (MRD) model, Part 3: Diachrony-etymology, Part 4: TEI serialization, Part
5: LBX serialization, and Part 6: Syntax and semantics.
The ISO 24613 series is designed to coordinate closely with ISO 16642 [2].

INTERNATIONAL STANDARD ISO 24613-1:2019(E)
Language resource management — Lexical markup
framework (LMF) —
Part 1:
Core model
1 Scope
This document describes the core model of the lexical markup framework (LMF), a metamodel for
representing data in monolingual and multilingual lexical databases used with computer applications.
LMF provides mechanisms that allow the development and integration of a variety of electronic lexical
resource types.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639 (all parts), Codes for the representation of names of languages
ISO 15924, Information and documentation — Codes for the representation of names of scripts
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
data category
DC
elementary descriptor used in a linguistic description or annotation scheme
3.2
word form
instance of a word, multiword expression, root, stem, or morpheme
3.3
grammatical feature
property associated with a word form (3.2) to describe one of its grammatical attributes
EXAMPLE /grammatical gender/
3.4
lemma
lemmatized form
canonical form
conventional word form (3.2) chosen to represent a lexeme (3.5)
Note 1 to entry: In many European languages, the lemma is usually the /singular/ for a noun if there is a variation
in /number/, the /masculine/ form if there is a variation in /gender/ and the /infinitive/ for all verbs. In some
languages, certain nouns are defective in the singular form, in which case the /plural/ is chosen. In Arabic, for
a verb, the lemma is sometimes considered as being the third person singular with the accomplished aspect; in
other approaches it is considered as being the root.
3.5
lexeme
abstract unit generally associated with a set of word forms (3.2) sharing a common meaning
[SOURCE: ISO 24613:2008, 3.25, modified – "forms" replaced with "word forms".]
3.6
lexical resource
lexical database
database consisting of one or several lexicons (3.7)
3.7
lexicon
resource comprising lexical entries for one or several languages
Note 1 to entry: A special language lexicon or a lexicon prepared for a specific NLP application can comprise a
specific subset of a language.
3.8
multiword expression
MWE
lexeme (3.5) made up of a sequence of two or more lexemes that has properties that may not be
predictable from the properties of the individual lexemes or their normal mode of combination
EXAMPLE “To kick the bucket”, an idiomatic expression which means to die rather than to hit a bucket with
one's foot. An idiomatic expression is a subtype of MWE whose properties are not predictable from the properties
of the individual lexemes.
Note 1 to entry: An MWE can be a compound, a fragment of a sentence, or a sentence. The group of lexemes
making up an MWE can be continuous or discontinuous. It is not always possible to mark an MWE with a part of
speech (3.11).
3.9
natural language processing
NLP
field covering knowledge and techniques involved in the processing of linguistic data by a computer
3.10
orthography
way of spelling or writing lexemes (3.5) that conforms to a conventionalized use
Note 1 to entry: Usually, the notion of orthography covers standardized spellings of alphabetic languages, such
as standard UK or US English, or reformed German spelling, as well as hieroglyphic or syllabic writing systems.
For the purpose of this standard, we also subsume variations such as transliterations of languages in non-native
scripts, stenographic renderings, or representations in the International Phonetic Alphabet under the notion of
orthography.
3.11
part of speech
lexical category
word class
category assigned to a lexeme (3.5) based on its grammatical properties
EXAMPLE Typical parts of speech for European languages include: noun, verb, adjective, adverb,
preposition, etc.
3.12
script
set of graphic characters used for the written form of one or more languages
EXAMPLE Hiragana, Katakana, Latin and Cyrillic.
Note 1 to entry: The description of scripts ranges from a high level classification such as hieroglyphic or syllabic
writing systems vs. alphabets to a more precise classification like Roman vs. Cyrillic. Scripts are defined by a list
of values taken from ISO 15924.
[SOURCE: ISO/IEC 10646:2017, 3.50, modified – Example and Note 1 to entry added]
4 Key standards used by LMF
4.1 Unicode
LMF is Unicode-compliant and presumes that all data are used according to the Unicode character
encodings.
4.2 Language coding
Language identifiers used in LMF-compliant resources shall conform to criteria specified in the ISO 639
series of standards. Some issues involving the combination of language and country codes, as well as
the coordination of different parts of ISO 639 have been addressed in external standards supported by
the technology community. The current edition of IETF Best Current Practice (BCP) 47 [6] should be
consulted.
4.3 Script coding
When the script code is not part of the language identifier, script identifiers shall conform to criteria
specified in ISO 15924.
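As an informal illustration of 4.2 and 4.3, the following Python sketch splits an identifier such as "sr-Latn" into a language subtag and an optional script subtag. The regular expression and the names LanguageTag and split_language_tag are assumptions made for this example only. The sketch covers just the simple language[-script] shape, not the full BCP 47 grammar, and it does not check the codes against the ISO 639 or ISO 15924 code lists.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption for illustration): a 2- or 3-letter
# language code, optionally followed by a 4-letter script code.
_TAG = re.compile(r"^(?P<lang>[A-Za-z]{2,3})(?:-(?P<script>[A-Za-z]{4}))?$")


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]


def split_language_tag(tag: str) -> LanguageTag:
    """Split 'lang' or 'lang-Script' into parts; raise ValueError otherwise."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language identifier: {tag!r}")
    script = match.group("script")
    # Conventional casing: lower-case language, title-case script.
    return LanguageTag(match.group("lang").lower(),
                       script.title() if script else None)
```

When the script is absent from the language identifier, the script field is None, which corresponds to the case in 4.3 where a separate script identifier would be needed.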
4.4 Unified modeling language (UML)
LMF complies with the specifications and modeling principles of UML as defined by the Object
Management Group (OMG) [9]. LMF uses a subset of UML that is relevant for linguistic description.
5 The LMF model
5.1 Introduction
LMF models are represented by UML classes, associations among the classes, and a set of data categories
that function as UML attribute-value pairs. The data categories are used to adorn the UML diagrams
that provide a high level view of the model. LMF specifications in the form of textual descriptions
describe the semantics of the modeling elements and provide more complete information about classes,
relationships, and extensions than can be included in UML diagrams.
In this process, lexicon developers shall use the classes that are specified in the LMF core package
(5.3), and classes that are defined in other LMF parts or classes derived from any of these referenced
classes using documented LMF processes for class inheritance. Developers shall define a data category
selection (DCS) as specified for LMF data category selection procedures (5.2.3 and 5.2.4).
5.2 Class inheritance and data category selection procedures
5.2.1 Class inheritance
LMF specifies constraints on which classes allow subclasses.
5.2.2 LMF attributes
UML models such as LMF are populated or further described by UML attributes, which provide
information about specific properties or characteristics associated with the model. All LMF attributes
are complex data categories. For a given class, all attributes are different. Each value of an attribute is
either a simple data category or a Unicode string. Each attribute has only one value.
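The constraints in 5.2.2 (attributes are distinct within a class, and each attribute has exactly one value) can be sketched in Python as follows. The class name AttributeSet and its methods are hypothetical, and modeling both simple data categories and Unicode strings as Python str is a simplifying assumption of this sketch, not something the standard prescribes.

```python
from typing import Dict


class AttributeSet:
    """Attributes of one class instance: unique names, one value each."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        # Each value is a simple data category or a Unicode string; both
        # are represented as Python str here (an assumption of this sketch).
        if not isinstance(value, str):
            raise TypeError("attribute value must be a string")
        # An attribute may appear only once, so it cannot get a second value.
        if name in self._values:
            raise ValueError(f"attribute {name!r} already has a value")
        self._values[name] = value

    def get(self, name: str) -> str:
        return self._values[name]
```

Rejecting a second assignment, rather than silently overwriting, keeps the "only one value" rule visible at the point where it would be violated.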
5.2.3 Data category selection (DCS)
In the broadest sense, a data category selection can comprise all the data categories used by a given
domain in the field of language resources. A DCS can also list and describe the set of data categories that
can be used in a given LMF lexicon. The DCS also describes constraints on how the data categories are
mapped to specific classes.
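One way to picture a DCS that constrains how data categories map to classes is a lookup table from class name to permitted data categories. In the Python sketch below, the table EXAMPLE_DCS, its particular mapping, and the function check_against_dcs are all hypothetical; only the class names and the data category names (/part of speech/, /grammatical gender/, /script/, /notation/) come from this document.

```python
from typing import Dict, List, Set

# Hypothetical DCS: which data categories may adorn which class.
EXAMPLE_DCS: Dict[str, Set[str]] = {
    "GrammaticalInformation": {"part of speech", "grammatical gender"},
    "OrthographicRepresentation": {"script", "notation"},
}


def check_against_dcs(class_name: str,
                      attributes: Dict[str, str],
                      dcs: Dict[str, Set[str]]) -> List[str]:
    """Return, sorted, the attribute names the DCS does not allow on class_name."""
    allowed = dcs.get(class_name, set())
    return sorted(name for name in attributes if name not in allowed)
```

An empty result means every attribute on the instance is covered by the DCS; a class that is absent from the table is treated here as allowing no data categories, which is a design choice of the sketch.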
5.2.4 User-defined data categories
Lexicon creators can define a set of new data categories to cover data category concepts that are needed
and that are not available.
5.3 LMF core package
5.3.1 General
The LMF core package is a metamodel that provides a flexible basis for building LMF models and
extensions, see Figure 1.
NOTE Each word in a class name begins with a capital letter with no intervening spaces or punctuation. This
practice is not required by UML, but generally conforms with most UML documentation.
Figure 1 — LMF core package
5.3.2 LexicalResource class
LexicalResource is a class representing the entire resource. LexicalResource occurs once and only once.
The LexicalResource instance is a container for one or more lexicons.
5.3.3 GlobalInformation class
GlobalInformation is a class representing administrative information and other general attributes.
There is a one-to-one aggregate association between the LexicalResource class and the GlobalInformation class
in that the latter describes the administrative information and general attributes of the entire resource.
The GlobalInformation class does not allow subclasses.
The GlobalInformation instance shall contain at least the following attributes:
— /language coding/ This attribute specifies which standard is used in order to code the language
names within the whole LexicalResource instance.
The GlobalInformation instance can contain the following attributes:
— /script coding/ This attribute specifies which standard is used in order to code the script names
within the whole LexicalResource instance;
— /character coding/ This attribute specifies which Unicode version is used within the whole
LexicalResource instance.
NOTE Other standards-related details can be specified on the GlobalInformation instance.
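The attribute requirements for GlobalInformation (one mandatory attribute, two optional ones) might be sketched as a Python dataclass. The field names below are assumed renderings of /language coding/, /script coding/ and /character coding/, and the example values used in any instance are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalInformation:
    # Required: the standard used to code language names.
    language_coding: str
    # Optional: the standard used to code script names.
    script_coding: Optional[str] = None
    # Optional: the Unicode version used within the resource.
    character_coding: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject an empty value for the one mandatory attribute.
        if not self.language_coding:
            raise ValueError("/language coding/ is required")
```

For example, GlobalInformation(language_coding="ISO 639-3", script_coding="ISO 15924") would describe a resource that uses those two standards for its language and script identifiers.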
5.3.4 Lexicon class
Lexicon is a class containing one or more lexical entries. The Lexicon class does not allow subclasses.
EXAMPLE A lexicon might include Spanish language entries in the domain of graphic arts; or a lexicon might
include Spanish language entries across multiple domains.
5.3.5 LexiconInformation class
LexiconInformation is a class representing administrative information and general attributes for a
given Lexicon. There is a one-to-one aggregate association between the LexiconInformation class and
the Lexicon class in that the former describes administrative information and attributes applicable to
the entire lexicon. The LexiconInformation class does not allow subclasses.
EXAMPLE The LexiconInformation class can manage /xml:lang/, /script/, and /notation/ attributes when
these are generally applicable to OrthographicRepresentation subclasses across the entire lexicon.
5.3.6 LexicalEntry class
The LexicalEntry class serves as a container for managing Form and Sense classes which have a close
conceptual relationship. The derivation of subclasses and new classes in future parts will require
modeling specifications that describe the associations between the forms and their related senses
whenever these diverge from the close conceptual relation defined in the core model. A LexicalEntry
instance can contain zero or more forms, and can have zero or more different senses. The
LexicalEntry class allows subclasses.
EXAMPLE A bilingual dictionary usually requires at least one Sense class, while a lexicon that explicitly
describes inflected forms of a language (for example, a lexicon containing all or a substantial portion of the
inflected forms of each lemma or root in the lexicon, known as extensional morphology) may not require a Sense class,
depending on the design goals of the developer.
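The containment hierarchy described in 5.3.2 to 5.3.6 can be sketched with Python dataclasses. This is only an approximation of the UML model: the attributes written_form and gloss are hypothetical placeholders, since the definitions of the Form and Sense classes are not reproduced in this sample, and the GlobalInformation and LexiconInformation classes are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    written_form: str  # hypothetical attribute name


@dataclass
class Sense:
    gloss: str  # hypothetical attribute name


@dataclass
class LexicalEntry:
    forms: List[Form] = field(default_factory=list)    # zero or more
    senses: List[Sense] = field(default_factory=list)  # zero or more


@dataclass
class Lexicon:
    entries: List[LexicalEntry]

    def __post_init__(self) -> None:
        if not self.entries:
            raise ValueError("a Lexicon contains one or more lexical entries")


@dataclass
class LexicalResource:
    lexicons: List[Lexicon]

    def __post_init__(self) -> None:
        if not self.lexicons:
            raise ValueError("a LexicalResource contains one or more lexicons")
```

A lexicon of inflected forms with no senses, as in the example above, would then be LexicalResource([Lexicon([LexicalEntry(forms=[Form("perro")])])]).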
5.3.7 Form c
...

INTERNATIONAL ISO
STANDARD 24613-1
First edition
2019-06
Language resource management —
Lexical markup framework (LMF) —
Part 1:
Core model
Gestion des ressources linguistiques — Cadre de balisage lexical
(LMF) —
Partie 1: Modèle de base
Reference number
ISO 24613-1:2019(E)
©
ISO 2019

---------------------- Page: 1 ----------------------
ISO 24613-1:2019(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2019
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO 2019 – All rights reserved

---------------------- Page: 2 ----------------------
ISO 24613-1:2019(E)

Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Key standards used by LMF . 3
4.1 Unicode . 3
4.2 Language coding . 3
4.3 Script coding . 3
4.4 Unified modeling language (UML) . 3
5 The LMF model . 3
5.1 Introduction . 3
5.2 Class inheritance and data category selection procedures . 4
5.2.1 Class inheritance . 4
5.2.2 LMF attributes . . 4
5.2.3 Data category selection (DCS) . 4
5.2.4 User-defined data categories . 4
5.3 LMF core package . 4
5.3.1 General. 4
5.3.2 LexicalResource class . 5
5.3.3 GlobalInformation class . 5
5.3.4 Lexicon class . 6
5.3.5 LexiconInformation class . 6
5.3.6 LexicalEntry class . 6
5.3.7 Form class . 6
5.3.8 OrthographicRepresentation class . 6
5.3.9 GrammaticalInformation Class . 6
5.3.10 Sense class . 7
5.3.11 Definition class . 7
5.4 Cross reference (CrossREF) model . 7
5.4.1 General. 7
5.4.2 CrossREF and CrossREFConstraint classes . 7
5.4.3 CrossREFConstraint class . 7
5.5 Methods for data category selection and subclass creation . 7
5.5.1 General. 7
5.5.2 Generalization (typing) . 8
5.5.3 Object instantiation . 8
5.5.4 Design choices . 8
5.5.5 Data categories for orthographic representation . 9
5.5.6 Principles for model simplification. 9
5.6 LMF extension use . 9
5.6.1 General. 9
5.6.2 Lexicon comparison .10
Annex A (informative) Data category examples .11
Bibliography .13
© ISO 2019 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO 24613-1:2019(E)

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso
.org/iso/foreword .html.
The document was prepared by Technical Committee ISO/TC 37, Language and terminology,
Subcommittee 4, Language resource management.
This first edition of ISO 24613-1, together with ISO 24613-2 to ISO 24613-6, cancels and replaces
ISO 24613:2008, which has been technically revised.
The main changes compared to the previous edition are as follows:
The content has been entirely revised and subdivided into parts. Part 1, Core model, contains the
body of the previous edition. New classes include LexiconInformation and GrammaticalInformation.
The Representation class has been renamed the OrthographicRepresentation class. In addition, the
OrthographicRepresentation subclasses, FormRepresentation and TextRepresentation, no longer are
part of the core model, providing it with greater modeling flexibility. The LexicalEntry subclass now
allows subclasses, providing improved extensibility and flexibility for modeling future parts. The
addition of the CrossREF class and associated metadata provides a formal model for cross-reference
design and implementation, closing a functional gap in the previous edition. A thoroughly revised
description of data category allocation mechanisms and their relationship to generalization by typing
provides a more incisive description of how these interdependent mechanisms enable flexible and
extensible designs.
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/members .html.
iv © ISO 2019 – All rights reserved

---------------------- Page: 4 ----------------------
ISO 24613-1:2019(E)

Introduction
Optimizing the production, maintenance and extension of electronic lexical resources is one of the
crucial aspects impacting human language technologies (HLT) in general and natural language
processing (NLP) in particular, as well as human-oriented translation technologies. A second crucial
aspect involves optimizing the process leading to their integration in applications. Lexical markup
framework (LMF) is an abstract metamodel that provides a common, standardized framework for
the construction of computational lexicons. LMF ensures the encoding of linguistic information in a
way that enables reusability in different applications and for different tasks. LMF provides a common,
shared representation of lexical objects, including morphological, syntactic and semantic aspects.
The goals of LMF are to provide a common model for the creation and use of electronic lexical resources
ranging from small to large in scale, to manage the exchange of data between and among these
resources, and to facilitate the merging of large numbers of different individual electronic resources to
form extensive global electronic resources. The ultimate goal of LMF is to create a modular structure
that will facilitate true content interoperability across all aspects of electronic lexical resources.
LMF supports existing lexical resource models such as Genelex [3], the EAGLES International Standard
for Language Engineering (ISLE) [4], Multilingual ISLE Lexical Entry (MILE) models [10], Text Encoding
Initiative (TEI) guidelines [8], Ontolex [7], and the Language Base Exchange (LBX) serialization together
with the U.S. Government Wordscape On-Line Dictionary system [5].
LMF uses UML modeling processes [9]. The LMF core package describes the basic hierarchy of information
of a lexical entry, including information on the word form. The core package is supplemented by various
resources that are part of the definition of LMF. These resources include:
— specific data categories used by the variety of resource types associated with LMF, both those data
categories relevant to the metamodel itself, and those associated with the extensions to the core
package in additional LMF parts (see Annex A for data category examples);
— the constraints governing the relationship of these data categories to the metamodel and to its
extensions;
— standard procedures for expressing these categories and thus for anchoring them on the structural
skeleton of LMF and relating them to the respective extension models;
— the vocabularies used by LMF to express related informational objects for describing how to extend
LMF through linkage to a variety of specific resources (extensions) and methods for analysing and
designing such linked systems.
LMF parts are expressed in a framework that describes the reuse of the LMF core components (such as
structures, data categories, and vocabularies) in conjunction with the additional components required
for a specific resource.
The parts currently in or planned for the new organization of ISO 24613 include Part 1: Core model, Part
2: Machine readable dictionary (MRD) model, Part 3: Diachrony-etymology, Part 4: TEI serialization, Part
5: LBX serialization, and Part 6: Syntax and semantics.
The ISO 24613 series is designed to coordinate closely with ISO 16642 [2].
INTERNATIONAL STANDARD ISO 24613-1:2019(E)
Language resource management — Lexical markup
framework (LMF) —
Part 1:
Core model
1 Scope
This document describes the core model of the lexical markup framework (LMF), a metamodel for
representing data in monolingual and multilingual lexical databases used with computer applications.
LMF provides mechanisms that allow the development and integration of a variety of electronic lexical
resource types.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639 (all parts), Codes for the representation of names of languages
ISO 15924, Information and documentation — Codes for the representation of names of scripts
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at http://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
data category
DC
elementary descriptor used in a linguistic description or annotation scheme
3.2
word form
instance of a word, multi-word expression, root, stem, or morpheme
3.3
grammatical feature
property associated with a word form (3.2) to describe one of its grammatical attributes
EXAMPLE /grammatical gender/
3.4
lemma
lemmatized form
canonical form
conventional word form (3.2) chosen to represent a lexeme (3.5)
Note 1 to entry: In many European languages, the lemma is usually the /singular/ for a noun if there is a variation
in /number/, the /masculine/ form if there is a variation in /gender/ and the /infinitive/ for all verbs. In some
languages, certain nouns are defective in the singular form, in which case the /plural/ is chosen. In Arabic, for
a verb, the lemma is sometimes considered as being the third person singular with the accomplished aspect; in
other approaches it is considered as being the root.
3.5
lexeme
abstract unit generally associated with a set of word forms (3.2) sharing a common meaning
[SOURCE: ISO 24613:2008, 3.25, modified – "forms" replaced with "word forms".]
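The relationship between word form (3.2), grammatical feature (3.3), lemma (3.4) and lexeme (3.5) can be sketched in code. The Python fragment below is a minimal illustration only and is not part of LMF: the WordForm type, the choose_lemma function and the order of preferred features are assumptions made for this sketch, loosely following the European-language convention described in Note 1 to entry of 3.4.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WordForm:
    """Hypothetical word form: a written string plus grammatical features."""
    written: str
    features: frozenset = field(default_factory=frozenset)


def choose_lemma(word_forms, preferred=("infinitive", "singular", "masculine")):
    """Pick the word form carrying the most preferred features.

    Assumption: the preference list mirrors a common European convention;
    other languages and lexicons may need a different rule.
    On a tie, the earliest word form in the input is kept.
    """
    def score(form):
        return sum(1 for feature in preferred if feature in form.features)

    return max(word_forms, key=score)


# The set of word forms stands for one lexeme; the lemma is one of them.
lexeme_forms = [
    WordForm("niñas", frozenset({"plural", "feminine"})),
    WordForm("niño", frozenset({"singular", "masculine"})),
    WordForm("niña", frozenset({"singular", "feminine"})),
]
lemma = choose_lemma(lexeme_forms)
```

The lemma is returned as one of the existing word forms rather than as a new object, which reflects the definition of a lemma as a conventional word form chosen to represent the lexeme.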
3.6
lexical resource
lexical database
database consisting of one or several lexicons (3.7)
3.7
lexicon
resource comprising lexical entries for one or several languages
Note 1 to entry: A special language lexicon or a lexicon prepared for a specific NLP application can comprise a
specific subset of a language.
3.8
multiword expression
MWE
lexeme (3.5) made up of a sequence of two or more lexemes that has properties that may not be
predictable from the properties of the individual lexemes or their normal mode of combination
EXAMPLE “To kick the bucket”, an idiomatic expression which means to die rather than to hit a bucket with
one's foot. An idiomatic expression is a subtype of MWE whose properties are not predictable from the properties
of the individual lexemes.
Note 1 to entry: An MWE can be a compound, a fragment of a sentence, or a sentence. The group of lexemes
making up an MWE can be continuous or discontinuous. It is not always possible to mark an MWE with a part of
speech (3.11).
3.9
natural language processing
NLP
field covering knowledge and techniques involved in the processing of linguistic data by a computer
3.10
orthography
way of spelling or writing lexemes (3.5) that conforms to a conventionalized use
Note 1 to entry: Usually, the notion of orthography covers standardized spellings of alphabetic languages, such
as standard UK or US English, or reformed German spelling, as well as hieroglyphic or syllabic writing systems.
For the purpose of this standard, we also subsume variations such as transliterations of languages in non-native
scripts, stenographic renderings, or representations in the International Phonetic Alphabet under the notion of
orthography.
3.11
part of speech
lexical category
word class
category assigned to a lexeme (3.5) based on its grammatical properties
EXAMPLE Typical parts of speech for European languages include: noun, verb, adjective, adverb,
preposition, etc.
3.12
script
set of graphic characters used for the written form of one or more languages
EXAMPLE Hiragana, Katakana, Latin and Cyrillic.
Note 1 to entry: The description of scripts ranges from a high level classification such as hieroglyphic or syllabic
writing systems vs. alphabets to a more precise classification like Roman vs. Cyrillic. Scripts are defined by a list
of values taken from ISO 15924.
[SOURCE: ISO/IEC 10646:2017, 3.50, modified – Example and Note 1 to entry added]
4 Key standards used by LMF
4.1 Unicode
LMF is Unicode-compliant and presumes that all data are used according to the Unicode character
encodings.
4.2 Language coding
Language identifiers used in LMF-compliant resources shall conform to criteria specified in the ISO 639
series of standards. Some issues involving the combination of language and country codes, as well as
the coordination of different parts of ISO 639 have been addressed in external standards supported by
the technology community. The current edition of IETF Best Current Practice (BCP) 47 [6] should be
consulted.
4.3 Script coding
When the script code is not part of the language identifier, script identifiers shall conform to criteria
specified in ISO 15924.
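As an illustration of 4.2 and 4.3, the following Python sketch separates the language, script and region subtags of an identifier shaped like the tags described in BCP 47. It is a simplification under stated assumptions: the function name split_language_tag is hypothetical, only the language[-Script][-REGION] shape is handled, and the check concerns shape alone. It does not verify that a subtag is actually registered in ISO 639 or ISO 15924.

```python
import re

# Simplified shape of a language identifier: language[-Script][-REGION].
# Assumption: this covers only a common subset of BCP 47, with no extended
# language, variant, extension or private-use subtags.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_language_tag(tag):
    """Return (language, script, region); script and region may be None."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag shape: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return (
        match.group("language").lower(),          # e.g. "sr"
        script.title() if script else None,       # e.g. "Cyrl"
        region.upper() if region else None,       # e.g. "RS"
    )
```

When the script subtag is absent from the identifier, the second element is None, which corresponds to the case in 4.3 where the script has to be identified separately.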
4.4 Unified modeling language (UML)
LMF complies with the specifications and modeling principles of UML as defined by the Object
Management Group (OMG) [9]. LMF uses a subset of UML that is relevant for linguistic description.
5 The LMF model
5.1 Introduction
LMF models are represented by UML classes, associations among the classes, and a set of data categories
that function as UML attribute-value pairs. The data categories are used to adorn the UML diagrams
that provide a high level view of the model. LMF specifications in the form of textual descriptions
describe the semantics of the modeling elements and provide more complete information about classes,
relationships, and extensions than can be included in UML diagrams.
In this process, lexicon developers shall use the classes that are specified in the LMF core package
(5.3), and classes that are defined in other LMF parts or classes derived from any of these referenced
classes using documented LMF processes for class inheritance. Developers shall define a data category
selection (DCS) as specified for LMF data category selection procedures (5.2.3 and 5.2.4).
5.2 Class inheritance and data category selection procedures
5.2.1 Class inheritance
LMF specifies constraints on which classes allow subclasses.
5.2.2 LMF attributes
UML models such as LMF are populated or further described by UML attributes, which provide
information about specific properties or characteristics associated with the model. All LMF attributes
are complex data categories. For a given class, all attributes are different. Each value of an attribute is
either a simple data category or a Unicode string. Each attribute has only one value.
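The constraints in 5.2.2 can be sketched as follows. This Python fragment is illustrative only and assumes a hypothetical LmfObject base type. Both simple data categories and Unicode strings are represented here as Python strings, which is a simplification made for the sketch and not a requirement of LMF.

```python
class LmfObject:
    """Hypothetical base for instances of LMF classes."""

    def __init__(self):
        self._attributes = {}

    def add_attribute(self, name, value):
        # All attributes of a given instance are different: no repeats.
        if name in self._attributes:
            raise ValueError(f"attribute /{name}/ is already set")
        # Each attribute has only one value, held here as a single string.
        if not isinstance(value, str):
            raise TypeError("an attribute value must be a single string")
        self._attributes[name] = value

    def attribute(self, name):
        return self._attributes[name]
```

Rejecting a repeated attribute name, instead of silently overwriting the earlier value, keeps the one-value-per-attribute rule visible to the lexicon developer.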
5.2.3 Data category selection (DCS)
In the broadest sense, a data category selection can comprise all the data categories used by a given
domain in the field of language resources. A DCS can also list and describe the set of data categories that
can be used in a given LMF lexicon. The DCS also describes constraints on how the data categories are
mapped to specific classes.
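A DCS of the kind described in 5.2.3 can be pictured as a table that lists, for each class, the data categories selected for it and any constraint on their values. The Python sketch below is an assumption-laden illustration: the EXAMPLE_DCS content, the /written form/ data category and the check_against_dcs function are hypothetical and are not defined by this document.

```python
# Hypothetical DCS: for each class, the attributes selected for it and,
# where the value is constrained, the permitted simple data categories.
# None means that any Unicode string is accepted.
EXAMPLE_DCS = {
    "GrammaticalInformation": {
        "part of speech": {"noun", "verb", "adjective"},
        "grammatical gender": {"masculine", "feminine", "neuter"},
    },
    "OrthographicRepresentation": {
        "written form": None,
        "script": None,
    },
}


def check_against_dcs(dcs, class_name, attributes):
    """Return a list of violation messages; an empty list means no violation."""
    allowed = dcs.get(class_name)
    if allowed is None:
        return [f"class {class_name} is not covered by the DCS"]
    problems = []
    for name, value in attributes.items():
        if name not in allowed:
            problems.append(f"/{name}/ is not selected for {class_name}")
        elif allowed[name] is not None and value not in allowed[name]:
            problems.append(f"/{value}/ is not a permitted value of /{name}/")
    return problems
```

Returning all violations at once, instead of stopping at the first one, is a design choice of the sketch that suits batch checking of a whole lexicon.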
5.2.4 User-defined data categories
Lexicon creators can define a set of new data categories to cover data category concepts that are needed
and that are not available.
5.3 LMF core package
5.3.1 General
The LMF core package is a metamodel that provides a flexible basis for building LMF models and
extensions (see Figure 1).
NOTE Each word in a class name begins with a capital letter with no intervening spaces or punctuation. This
practice is not required by UML, but generally conforms with most UML documentation.
Figure 1 — LMF core package
5.3.2 LexicalResource class
LexicalResource is a class representing the entire resource. LexicalResource occurs once and only once.
The LexicalResource instance is a container for one or more lexicons.
5.3.3 GlobalInformation class
GlobalInformation is a class representing administrative information and other general attributes.
There is a one-to-one aggregate association between the LexicalResource class and the GlobalInformation class
in that the latter describes the administrative information and general attributes of the entire resource.
The GlobalInformation class does not allow subclasses.
The GlobalInformation instance shall contain at least the following attributes:
— /language coding/ This attribute specifies which standard is used in order to code the language
names within the whole LexicalResource instance.
The GlobalInformation instance can contain the following attributes:
— /script coding/ This attribute specifies which standard is used in order to code the script names
within the whole LexicalResource instance;
— /character coding/ This attribute specifies which Unicode version is used within the whole
LexicalResource instance.
NOTE Other standards-related details can be specified on the GlobalInformation instance.
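The distinction between required and optional GlobalInformation attributes can be checked mechanically. The following Python sketch assumes that a GlobalInformation instance is held as a plain mapping from attribute name to value. The function name and the sample values are hypothetical and are given for illustration only.

```python
# Attribute names taken from 5.3.3; the grouping into two tuples is
# an implementation choice of this sketch.
REQUIRED_ATTRIBUTES = ("language coding",)
OPTIONAL_ATTRIBUTES = ("script coding", "character coding")


def missing_required(global_information):
    """Return the required attribute names that are absent or empty."""
    return [
        name for name in REQUIRED_ATTRIBUTES
        if not global_information.get(name)
    ]


# Hypothetical instance; the values shown are assumptions, not prescribed ones.
sample = {
    "language coding": "ISO 639-3",
    "script coding": "ISO 15924",
}
```

Attributes outside both tuples are not rejected by this check, in line with the NOTE that further details can be specified on the instance.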
5.3.4 Lexicon class
Lexicon is a class containing one or more lexical entries. The Lexicon class does not allow subclasses.
EXAMPLE A lexicon might include Spanish language entries in the domain of graphic arts; or a lexicon might
include Spanish language entries across multiple domains.
5.3.5 LexiconInformation class
LexiconInformation is a class representing administrative information and general attributes for a
given Lexicon. There is a one-to-one aggregate association between the LexiconInformation class and
the Lexicon class in that the former describes administrative information and attributes applicable to
the entire lexicon. The LexiconInformation class does not allow subclasses.
EXAMPLE The LexiconInformation class can manage /xml:lang/, /script/, and /notation/ attributes when
these are generally applicable to OrthographicRepresentation subclasses across the entire lexicon.
5.3.6 LexicalEntry class
The LexicalEntry class serves as a container for managing Form and Sense classes which have a close
conceptual relationship. The derivation of subclasses and new classes in future parts will require
modeling specifications that describe the associations between the forms and their related senses
whenever these diverge from the close conceptual relation defined in the core model. A LexicalEntry
instance can contain zero or more forms, and can have zero or more different senses. The
LexicalEntry class allows subclasses.
EXAMPLE A bilingual dictionary usually requires at least one Sense class, while a lexicon that explicitly
describes inflected forms of a language (for example, a lexicon containing all or a substantial portion of the
inflected forms of each lemma or root in the lexicon, i.e. extensional morphology) may not require a Sense class,
depending on the design goals of the developer.
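The containment and subclassing rules stated in 5.3.2 to 5.3.6 can be summarized in a short Python sketch. It is a non-normative illustration under several assumptions: the NoSubclasses helper is hypothetical, attributes are held in plain dictionaries, forms and senses are untyped lists, and GlobalInformation is attached to the resource as a whole on the reading that it describes the entire resource. The rule that LexicalResource occurs once and only once is not enforced here.

```python
from dataclasses import dataclass, field


class NoSubclasses:
    """Hypothetical marker: direct children may not be subclassed further."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if NoSubclasses not in cls.__bases__:
            raise TypeError(f"{cls.__name__}: the parent class does not allow subclasses")


@dataclass
class GlobalInformation(NoSubclasses):
    attributes: dict = field(default_factory=dict)


@dataclass
class LexiconInformation(NoSubclasses):
    attributes: dict = field(default_factory=dict)


@dataclass
class LexicalEntry:  # the only class in this sketch stated to allow subclasses
    forms: list = field(default_factory=list)    # zero or more forms
    senses: list = field(default_factory=list)   # zero or more senses


@dataclass
class Lexicon(NoSubclasses):
    information: LexiconInformation   # one-to-one with the lexicon
    entries: list                     # one or more LexicalEntry instances

    def __post_init__(self):
        if not self.entries:
            raise ValueError("a Lexicon contains one or more lexical entries")


@dataclass
class LexicalResource:
    global_information: GlobalInformation
    lexicons: list                    # one or more Lexicon instances

    def __post_init__(self):
        if not self.lexicons:
            raise ValueError("a LexicalResource contains one or more lexicons")
```

A lexicon of inflected forms could then derive its own entry type from LexicalEntry and leave the senses list empty, while an attempt to derive a class from Lexicon fails when the class is defined.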
5.3.7 Form class
The Form class groups and manages all the information about the written and spoken f
...
