Genomics informatics — Clinical genomics data sharing specification for next-generation sequencing

This document specifies clinical sequencing information generated by massively parallel sequencing technology for the purpose of sharing health information. It covers the data fields and their metadata, from the generation of sequence reads and base calling to variant evaluation and assertion, for achieving reproducibility during health information exchange of clinical sequence information. Specimen collection, processing and storage, DNA extraction, DNA processing and library preparation, and the generation of the test report are not in the scope of this document. This document hence defines the data types, relationships, optionality, cardinalities and terminology bindings of the data. In essence, this document specifies: — the required data fields and their metadata, from generation of sequence reads and base calling to variant evaluation and assertion, for sharing clinical genomic sequencing data files generated by massively parallel sequencing technology, as shown in Figure 1; — the sequencing information from human samples using DNA sequencing by massively parallel sequencing technologies for clinical practice.

Informatique génomique — Spécification du partage des données de génomique clinique pour le séquençage de nouvelle génération

General Information
Status: Published
Publication Date: 02-Jul-2023
Current Stage: 6060 - International Standard published
Start Date: 03-Jul-2023
Due Date: 05-Feb-2023
Completion Date: 03-Jul-2023

Technical specification
ISO/TS 23357:2023 - Genomics informatics — Clinical genomics data sharing specification for next-generation sequencing. Released: 3.07.2023. English language, 14 pages.
Draft
REDLINE ISO/PRF TS 23357 - Genomics informatics — Clinical genomics data sharing specification for next-generation sequencing. Released: 4.05.2023. English language, 14 pages.
Draft
ISO/PRF TS 23357 - Genomics informatics — Clinical genomics data sharing specification for next-generation sequencing. Released: 4.05.2023. English language, 14 pages.

Standards Content (Sample)

TECHNICAL SPECIFICATION ISO/TS 23357
First edition
2023-07
Genomics informatics — Clinical
genomics data sharing specification
for next-generation sequencing
Informatique génomique — Spécification du partage des données de
génomique clinique pour le séquençage de nouvelle génération
Reference number
ISO/TS 23357:2023(E)
© ISO 2023

ISO/TS 23357:2023(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
  © ISO 2023 – All rights reserved

Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 2
4 Abbreviated terms . 4
5 Summary of the clinical genomic information model . 4
5.1 General . 4
5.2 Patient . 5
5.2.1 General . 5
5.2.2 Identifiers . 5
5.2.3 Name . 5
5.2.4 Sex . 5
5.2.5 Birth data . 5
5.2.6 Ethnicity . 5
5.2.7 List of diagnosis . 5
5.2.8 Treatment . 6
5.3 Specimen . 6
5.3.1 General . 6
5.3.2 Tissue or organ of origin . 6
5.3.3 Collection date . 6
5.3.4 Type of specimen . 7
5.4 Experimental equipment . 7
5.4.1 General . 7
5.4.2 Quality control . 7
5.4.3 Base calling information . 7
5.5 Analysis equipment . 9
5.5.1 General . 9
5.5.2 Read alignment . 10
5.5.3 Alignment post processing . 11
5.5.4 Variant calling . 11
5.5.5 Variant annotation .12
5.6 Derived data .12
5.6.1 General .12
5.6.2 FASTQ . 13
5.6.3 Sequence alignment map (SAM) . 13
5.6.4 Binary alignment map (BAM) . 13
5.6.5 Compressed reference-oriented alignment map (CRAM) .13
5.6.6 Variant call format (VCF) . 13
5.6.7 Mutation annotation format (MAF) . 13
Bibliography .14
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use
of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed
patent rights in respect thereof. As of the date of publication of this document, ISO had not received
notice of (a) patent(s) which may be required to implement this document. However, implementers are
cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all
such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC
1, Genomics informatics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Owing to the rapid advancement of next-generation sequencing technologies, sequencing of the human
genome is being adopted in clinical settings to realize precision medicine [7]. Massively parallel
sequencing, or next-generation sequencing (NGS), is any of several high-throughput approaches to DNA
sequencing based on massively parallel processing. These technologies use miniaturized and parallelized
platforms to sequence 1 million to 43 billion short reads (50-400 bases each) per instrument run.
The data obtained in a clinical setting should be shared with another institution when a patient moves,
or with the patient on request.
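To give a sense of the data volumes this specification has to carry, the read counts and read lengths quoted above bound the raw output of a single run. The sketch below simply multiplies the extremes. Pairing the minimum read count with the minimum read length (and likewise for the maxima) is an assumption made for illustration; paired-end layouts, quality strings and file overhead are ignored, and the figures do not describe any particular instrument.

```python
# Raw-base bounds implied by the ranges quoted in the Introduction; illustrative only.
MIN_READS = 1_000_000          # 1 million reads per run
MAX_READS = 43_000_000_000     # 43 billion reads per run
MIN_READ_LENGTH = 50           # bases per read
MAX_READ_LENGTH = 400


def bases_per_run(reads: int, read_length: int) -> int:
    """Total bases called in a run of `reads` reads of `read_length` bases each."""
    return reads * read_length


lower_bound = bases_per_run(MIN_READS, MIN_READ_LENGTH)   # 5 x 10^7 bases
upper_bound = bases_per_run(MAX_READS, MAX_READ_LENGTH)   # 1.72 x 10^13 bases

if __name__ == "__main__":
    print(f"{lower_bound:.2e} to {upper_bound:.2e} bases per run")
```

The roughly five orders of magnitude between the bounds is one reason the derived data files (FASTQ, BAM, CRAM) dominate storage and transfer costs when clinical sequencing data are exchanged.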
The clinical application steps based on clinical sequence information consist of:
a) specimen collection, processing and storage;
b) DNA extraction;
c) DNA processing and library preparation;
d) generation of sequence reads and base calling;
e) sequencing alignment/mapping;
f) variant calling;
g) variant annotation and filtering;
h) variant evaluation and assertion;
i) generation of test report [8].
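The scope boundary drawn in Clause 1 (steps d) to h) in scope; a) to c) and i) out of scope) can be encoded directly, for example to tag which pipeline stages an exchange record needs to describe. The enum member names below are hypothetical labels chosen for this sketch; only the lettering and the scope rule come from this document.

```python
from enum import Enum


class Step(Enum):
    """Clinical application steps a) to i); member names are illustrative."""
    SPECIMEN_COLLECTION_PROCESSING_STORAGE = "a"
    DNA_EXTRACTION = "b"
    DNA_PROCESSING_LIBRARY_PREPARATION = "c"
    READ_GENERATION_BASE_CALLING = "d"
    ALIGNMENT_MAPPING = "e"
    VARIANT_CALLING = "f"
    VARIANT_ANNOTATION_FILTERING = "g"
    VARIANT_EVALUATION_ASSERTION = "h"
    TEST_REPORT_GENERATION = "i"


def in_scope(step: Step) -> bool:
    """Steps d) to h) fall within the scope of this document (see Clause 1)."""
    return "d" <= step.value <= "h"


if __name__ == "__main__":
    for step in Step:
        print(step.value, step.name, "in scope" if in_scope(step) else "out of scope")
```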
Clinical sequencing information is required to be shared at a level of detail that allows the results of
the institution that obtained the initial clinical sequencing information to be reproduced. In addition,
the shared clinical genomic sequencing data should be interoperable.
This document proposes a data specification that integrates multi-layered sequencing files, their related
parameters and clinical data in order to achieve reproducibility of genomic data in clinical practice.
It is intended to assist health IT companies by setting out new system requirements for handling
genomic data.
This document can be used to store and share clinical genomic data in electronic health records. It can
also support translational research, which requires genomic and clinical data from multiple institutions.
TECHNICAL SPECIFICATION ISO/TS 23357:2023(E)
Genomics informatics — Clinical genomics data sharing
specification for next-generation sequencing
1 Scope
This document specifies clinical sequencing information generated by massively parallel sequencing
technology for the purpose of sharing health information. It covers the data fields and their metadata,
from the generation of sequence reads and base calling to variant evaluation and assertion, for
achieving reproducibility during health information exchange of clinical sequence information.
Specimen collection, processing and storage, DNA extraction, DNA processing and library preparation,
and the generation of the test report are not in the scope of this document.
This document hence defines the data types, relationships, optionality, cardinalities and terminology
bindings of the data.
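The five properties named above (data type, relationship, optionality, cardinality and terminology binding) are the usual ingredients of a field definition in a clinical information model. A minimal sketch of how an implementer might hold them in code follows; the class, its attributes and the two example fields are hypothetical, and the normative values are those given in the tables of this document (e.g. Table 1), not the ones shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical metadata for one data field: type, optionality,
    cardinality and (optional) terminology binding."""
    name: str
    data_type: str                     # e.g. "string", "date", "coded"
    min_occurs: int                    # 0 means optional
    max_occurs: Optional[int]          # None means unbounded ("*")
    terminology: Optional[str] = None  # code system for coded values

    @property
    def required(self) -> bool:
        return self.min_occurs >= 1

    @property
    def cardinality(self) -> str:
        upper = "*" if self.max_occurs is None else str(self.max_occurs)
        return f"{self.min_occurs}..{upper}"

    def accepts(self, count: int) -> bool:
        """True if `count` occurrences satisfy the cardinality."""
        if count < self.min_occurs:
            return False
        return self.max_occurs is None or count <= self.max_occurs


# Illustrative definitions only; they are not taken from this document's tables.
patient_identifier = FieldDefinition("identifier", "string", 1, None)
diagnosis = FieldDefinition("diagnosis", "coded", 1, None,
                            terminology="ICD-10, ICD-11 or SNOMED CT")
```

Keeping cardinality as a (minimum, maximum) pair rather than a free-text string makes validation of an incoming record a simple comparison.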
In essence, this document specifies:
— the required data fields and their metadata from generation of sequence reads and base calling to
variant evaluation and assertion for sharing clinical genomic sequencing data files generated by
massively parallel sequencing technology, as shown in Figure 1;
— the sequencing information from human samples using DNA sequencing by massively parallel
sequencing technologies for clinical practice.
NOTE The grey shaded text indicates the scope of this document.
Figure 1 — Clinical application processes based on next-generation sequencing (NGS) data
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the terms and definitions given in [external document reference
xxx] and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
clinical sequencing
next-generation sequencing or future sequencing technologies using human samples for clinical practice
and clinical trials
[SOURCE: ISO/TS 20428:2017, 3.5, modified — "later" has been replaced with "future" in the definition.]
3.2
deoxyribonucleic acid
DNA
molecule that encodes the genetic information in the nucleus of cells
[SOURCE: ISO 25720:2009, 4.7]
3.3
DNA sequencing
determination of the order of nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule
of DNA (3.2)
Note 1 to entry: Sequence is generally described from the 5’ end.
[SOURCE: ISO 17822:2020, 3.19]
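Because sequences are conventionally written from the 5’ end, obtaining the sequence of the complementary strand in the same orientation takes two operations: complement each base, then reverse the order. This is routinely needed when comparing reads that come from opposite strands. A minimal sketch, assuming an uppercase or lowercase A/C/G/T alphabet with N for an uncalled base:

```python
# Complement table for the four DNA bases and the ambiguity code N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, also written 5' to 3'.

    Complementing each base gives the other strand read 3' to 5';
    reversing the result restores the conventional 5' to 3' orientation.
    """
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
```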
3.4
exome
part of the genome that corresponds to the complete complement of the exons of a cell
3.7
FASTQ
text-based format for storing both the biological sequence (typically nucleotide sequence) and its
corresponding quality scores
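A FASTQ record pairs each base with a quality score stored as a single ASCII character. The sketch below reads the common unwrapped four-line layout (header starting with "@", sequence, separator starting with "+", quality string) and decodes qualities assuming the Phred+33 ("Sanger") offset; the `offset` parameter exists because some older pipelines used a different offset. Multi-line (wrapped) records and the class and function names are assumptions of this sketch, not requirements of this document.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    sequence: str
    qualities: List[int]  # per-base Phred quality scores


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from unwrapped four-line FASTQ text.

    `offset` is the ASCII offset of the quality encoding (33 for Phred+33).
    """
    while True:
        header = handle.readline()
        if not header:
            return                      # end of input
        if not header.strip():
            continue                    # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(
            name=header[1:].rstrip("\r\n"),
            sequence=sequence,
            qualities=[ord(ch) - offset for ch in quality],
        )


if __name__ == "__main__":
    text = "@read1\nGATC\n+\nII#!\n"
    for record in parse_fastq(io.StringIO(text)):
        print(record.name, record.sequence, record.qualities)  # read1 GATC [40, 40, 2, 0]
```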
3.8
gene
category of nucleic acid sequences that functions as a unit of heredity and codes for the basic instructions
for the development, reproduction, and maintenance of organisms
3.9
germline
series of germ cells, each descended or developed from earlier cells in the series, regarded as continuing
through successive generations of an organism
[SOURCE: ISO/TS 20428:2017, 3.17]
3.10
indel
insertion (3.15) or/and deletion (3.7)
[SOURCE: ISO/TS 20428:2017, 3.18]
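Since an indel covers both insertions and deletions, software exchanging variant data often needs to tell them apart from the REF and ALT alleles of a variant call format record. The sketch below assumes the VCF convention that simple insertions and deletions carry the unchanged preceding base in both alleles, and performs no normalization or left-alignment. The category labels "SNV", "MNV" and "complex indel" are introduced here for illustration and are not defined in this document.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair as written in a VCF data line.

    Assumes simple insertions and deletions include the unchanged base
    preceding the event in both alleles (VCF padding-base convention).
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"      # bases added after the shared prefix
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"       # bases lost after the shared prefix
    return "complex indel"      # length change without a shared prefix
```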
3.12
mutation annotation format
MAF
tab-delimited text file, generated at the project level, that aggregates mutation information from
variant call format (3.21) files
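To make "aggregated on a project level" concrete, the sketch below merges annotated per-sample calls into one tab-delimited table with one row per sample and variant. It uses only a small subset of the column headers found in common MAF files; real MAF files define many more columns. The input keys ("gene", "chrom", "pos", "ref", "alt") are hypothetical. Only single-base substitutions are accepted, because converting VCF indel alleles into MAF alleles requires normalization steps not shown here.

```python
import csv
import io
from typing import Dict, Iterable

# A subset of the standard MAF column headers; real MAF files carry many more.
MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele2", "Tumor_Sample_Barcode",
]


def build_maf(calls_by_sample: Dict[str, Iterable[dict]]) -> str:
    """Aggregate annotated per-sample SNV calls into one tab-delimited table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=MAF_COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for sample in sorted(calls_by_sample):
        for call in calls_by_sample[sample]:
            if len(call["ref"]) != 1 or len(call["alt"]) != 1:
                raise ValueError("this sketch handles single-base substitutions only")
            writer.writerow({
                "Hugo_Symbol": call["gene"],
                "Chromosome": call["chrom"],
                "Start_Position": call["pos"],
                "End_Position": call["pos"],      # an SNV spans one base
                "Reference_Allele": call["ref"],
                "Tumor_Seq_Allele2": call["alt"],
                "Tumor_Sample_Barcode": sample,   # identifies the source sample
            })
    return out.getvalue()
```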
3.13
next-generation sequencing
massive parallel sequencing
NGS
technology that can sequence millions of small fragments of DNA (3.2) in parallel
3.14
sequence read
read
fragmented nucleotide sequences that are used to reconstruct the original sequence in next-generation
sequencing technologies
[SOURCE: ISO/TS 20428:2017, 3.26]
3.15
read type
type of implementation in the sequencing instrument
Note 1 to entry: It can be either single-end or paired-end.
Note 2 to entry: Single-end: the sequencing instrument produces one read (3.14) per fragment, reading from
one end of the fragment to the other.
Note 3 to entry: Paired-end: the sequencing instrument reads from one end of the fragment to the other and
then starts another round of reading from the opposite end.
[SOURCE: ISO/TS 20428:2017, 3.27, modified — "run" has been replaced with "implementation" in the
definition and the notes to entry.]
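When aligned reads are shared as SAM, BAM or CRAM files, the read type can usually be recovered per record from the FLAG field defined in the SAM format specification: bit 0x1 marks a template with multiple segments (paired-end), and bits 0x40 and 0x80 mark the first and last segment. A minimal sketch; templates with more than two segments are out of scope here, and the function names are hypothetical.

```python
PAIRED = 0x1          # template having multiple segments in sequencing
FIRST_SEGMENT = 0x40  # the first segment in the template
LAST_SEGMENT = 0x80   # the last segment in the template


def read_type(flag: int) -> str:
    """Return "paired-end" or "single-end" for one SAM record's FLAG value."""
    return "paired-end" if flag & PAIRED else "single-end"


def mate_number(flag: int) -> int:
    """Return 1 or 2 for a paired read, 0 for a single-end read."""
    if not flag & PAIRED:
        return 0
    if flag & FIRST_SEGMENT:
        return 1
    if flag & LAST_SEGMENT:
        return 2
    raise ValueError("paired read with neither first- nor last-segment bit set")


if __name__ == "__main__":
    for flag in (0, 16, 99, 147):
        print(flag, read_type(flag), mate_number(flag))
```

For example, FLAG 99 (1 + 2 + 32 + 64) is the first read of a pair and FLAG 147 (1 + 2 + 16 + 128) the second.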
3.16
reference sequence
sequence file that is used as a reference to describe the variants that are present in the analyzed
sequence
3.18
specimen
biospecimen
sample of a tissue, body fluid, food, or other substance collected or acquired to support the assessment,
diagnosis, treatment, mitigation or prevention of a disease, disorder or abnormal physical state, or its
symptoms
[SOURCE: ISO/TS 20428:2017, 3.34, modified — the term "biological specimen" has been removed.]
3.19
subject of care
person who uses or is a potential user of a health care service
[SOURCE: ISO/TS 22220:2011, 3.2, modified — "Note 1 to entry" and the abbreviated term "SOC" have
been deleted.]
3.20
target capture
method to capture genomic regions of interest from a DNA (3.2) sample prior to sequencing
[SOURCE: ISO/TS 20428:2017, 3.36]
3.21
variant call format
VCF
format of the text file used in bioinformatics for storing gene (3.8) sequence variations
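Each data line of a variant call format file starts with eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), where "." denotes a missing value and INFO holds semicolon-separated key=value pairs or bare flags. The sketch below parses only those fixed columns; header lines and the optional FORMAT and per-sample columns are not handled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                        # 1-based position of the first REF base
    ids: List[str]                  # empty when the ID column is "."
    ref: str
    alts: List[str]                 # empty when the ALT column is "."
    qual: Optional[float]           # None when QUAL is "."
    filters: List[str]              # e.g. ["PASS"]; empty when "."
    info: Dict[str, Union[str, bool]]


def _split(value: str, sep: str) -> List[str]:
    """Split a VCF column, treating "." as the missing value."""
    return [] if value == "." else value.split(sep)


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    chrom, pos, ids, ref, alts, qual, filters, info = fields[:8]
    info_map: Dict[str, Union[str, bool]] = {}
    for entry in _split(info, ";"):
        key, has_value, value = entry.partition("=")
        info_map[key] = value if has_value else True  # flags carry no value
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ids=_split(ids, ";"),
        ref=ref,
        alts=_split(alts, ","),
        qual=None if qual == "." else float(qual),
        filters=_split(filters, ";"),
        info=info_map,
    )
```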
4 Abbreviated terms
BAM binary alignment map
bp base pair
COSMIC Catalogue of Somatic Mutations in Cancer
CRAM compressed reference-oriented alignment map
EBI European Bioinformatics Institute
HGNC HUGO Gene Nomenclature Committee
HGVS Human Genome Variation Society
HUGO Human Genome Organization
MAF mutation annotation format
NCBI National Center for B
...

ISO/TS 23357:2023(E)
ISO TC 215/SC 1/WG 1
Secretariat: KATS
Genomics informatics — Clinical genomics data sharing specification for next-generation
sequencing

CD stage

Warning for WDs and CDs
This document is not an ISO International Standard. It is distributed for review and comment. It is subject to
change without notice and may not be referred to as an International Standard.
Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of
which they are aware and to provide supporting documentation.

© ISO 2018, Published in Switzerland
Date: 2023-05-04

Contents
Foreword . 5
Introduction. 6
Clinical genomic data sharing specification for next-generation sequencing . 7
1. Scope . 7
2. Normative references . 7
3. Terms and definitions . 7
4. Abbreviated terms . 11
5. Summary of the clinical genomic information model . 12
Figure 2 — Major structure of the genomic data model . 12
5.1 Patient . 12
5.1.1 General . 12
5.1.2 Identifiers . 12
5.1.3 Name . 13
5.1.4 Birth data . 13
5.1.5 Sex . 13
5.1.6 Race . 13
5.1.7 List of diagnosis . 13
5.1.7.1 Age of diagnosis . 13
5.1.8 Treatment . 13
5.1.8.1 Prior treatment . 13
5.1.8.2 Treatment outcome . 13
5.2 Specimen . 14
5.2.1 General . 14
5.2.2 Tissue or organ of origin . 14
5.2.3 Collection date . 14
5.2.4 Type of specimen . 14
5.3 Experimental equipment . 15
5.3.1 General . 15
5.3.2 Quality control . 15
5.3.3 Base calling information . 15
5.3.3.1 Read depth . 15
5.3.3.2 Reference allelic depth . 15
5.3.3.3 Alternative allelic depth . 16
5.3.3.4 Allele frequency . 16
5.3.3.5 Genotype . 16
5.3.3.6 Type of sequencers . 16
5.3.3.7 Library preparation methods . 16
5.3.3.7.1 Target capture methods . 16
5.3.3.8 Read type . 16
5.3.3.9 Read length . 16
5.4 Analysis equipment . 17
5.4.1 General . 17
5.4.2 Read alignment . 18
5.4.3 Alignment post processing . 19
5.4.4 Variant calling . 19
5.4.5 Variant annotation . 20
5.5 Derived data . 21
5.5.1 General . 21
5.5.2 FASTQ . 21
5.5.3 Sequence alignment map (SAM) . 21
5.5.4 Binary Alignment Map (BAM) . 21
5.5.5 Compressed Reference-oriented Alignment Map (CRAM) . 21
5.5.6 Variant call format (VCF) . 21
5.5.7 Mutation Annotation Format (MAF) . 21
Bibliography . 23
Introduction
Owing to the rapid advancement of next-generation sequencing technologies, sequencing of the human
genome is being adopted in clinical settings to realize precision medicine [7]. Massively parallel
sequencing, or next-generation sequencing (NGS), is any of several high-throughput approaches to DNA
sequencing based on massively parallel processing. These technologies use miniaturized and parallelized
platforms to sequence 1 million to 43 billion short reads (50-400 bases each) per instrument run.
The data obtained in a clinical setting should be shared with another institution when a patient moves,
or with the patient on request.
The clinical application steps based on clinical sequence information consist of:
a) specimen collection, processing and storage;
b) DNA extraction;
c) DNA processing and library preparation;
d) generation of sequence reads and base calling;
e) sequencing alignment/mapping;
f) variant calling;
g) variant annotation and filtering;
h) variant evaluation and assertion;
i) generation of test report [8].
Clinical sequencing information is required to be shared at a level of detail that allows the results of
the institution that obtained the initial clinical sequencing information to be reproduced. In addition,
the shared clinical genomic sequencing data should be interoperable.
This document proposes a data specification that integrates multi-layered sequencing files, their related
parameters and clinical data in order to achieve reproducibility of genomic data in clinical practice.
It is intended to assist health IT companies by setting out new system requirements for handling
genomic data.
This document can be used to store and share clinical genomic data in electronic health records. It can
also support translational research, which requires genomic and clinical data from multiple institutions.

Genomics informatics — Clinical genomics data sharing
specification for next-generation sequencing
1 Scope
This document specifies clinical sequencing information generated by massively parallel sequencing
technology for the purpose of sharing health information. It covers the data fields and their metadata,
from the generation of sequence reads and base calling to variant evaluation and assertion, for
achieving reproducibility during health information exchange of clinical sequence information.
Specimen collection, processing and storage, DNA extraction, DNA processing and library preparation,
and the generation of the test report are not in the scope of this document.
This document hence defines the data types, relationships, optionality, cardinalities and terminology
bindings of the data.
In essence, this document specifies:
— the required data fields and their metadata from generation of sequence reads and base calling to
variant evaluation and assertion for sharing clinical genomic sequencing data files generated by
massively parallel sequencing technology, as shown in Figure 1;
— the sequencing information from human samples using DNA sequencing by massively parallel
sequencing technologies for clinical practice.
[Figure 1 flowchart: specimen collection, processing and storage; DNA extraction; DNA processing and library preparation; generation of sequence reads and base calling; sequencing alignment/mapping; variant calling; variant annotation and filtering; variant evaluation and assertion; generation of test report]


NOTE The grey shaded text indicates the scope of this document.
Figure 1 — Clinical application processes based on next-generation sequencing (NGS) data
2 Normative references
There are no normative references in this document.

3 Terms and definitions
For the purposes of this document, the terms and definitions given in [external document
reference xxx] and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following
addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
chromosome
structures consisting of or containing DNA, which carries the genetic information essential to the cell.
[SOURCE: ISO 19238:2014, 2.7]
3.2
clinical sequencing
next-generation sequencing or future sequencing technologies using human samples for clinical practice
and clinical trials
[SOURCE: ISO/TS 20428:2017, 3.5, modified — "later" has been replaced with "future" in the definition.]
3.3
deletion
mutation in which a part of a chromosome or a sequence of DNA is lost during DNA replication
[SOURCE: ISO/TS 20428:2017, 3.10]
3.4
deoxyribonucleic acid
DNA
molecule that encodes the genetic information in the nucleus of cells
[SOURCE: ISO 25720:2009, 4.7]
3.5
DNA sequencing
determination of the order of nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule
of DNA (3.4)
Note 1 to entry: Sequence is generally described from the 5’ end.
[SOURCE: ISO 17822:2020, 3.19]
3.6
exome
part of the genome that corresponds to the complete complement of the exons of a cell
[SOURCE: ISO/TS 20428:2017, 3.13]
3.7
FASTQ
text-based format for storing both the biological sequence (typically nucleotide sequence) and its
corresponding quality scores
3.8
gene
category of nucleic acid sequences that functions as a unit of heredity and codes for the basic instructions
for the development, reproduction, and maintenance of organisms
[SOURCE: ISO 11238:2012, 2.1.16]
3.9
germline
series of germ cells, each descended or developed from earlier cells in the series, regarded as continuing
through successive generations of an organism
[SOURCE: ISO/TS 20428:2017, 3.17]
3.10
indel
insertion (3.15) or/and deletion (3.7)
[SOURCE: ISO/TS 20428:2017, 3.18]
3.11
insertion
addition of one or more nucleotide base pairs into a DNA sequence
[SOURCE: ISO/TS 20428:2017, 3.19]
3.12
mutation annotation format
MAF
tab-delimited text file, generated at the project level, that aggregates mutation information from
variant call format (3.21) files
3.13
next-generation sequencing
massive parallel sequencing
NGS
technology that can sequence millions of small fragments of DNA (3.4) in parallel
3.14
sequence read
read
fragmented nucleotide sequences that are used to reconstruct the original sequence in
next-generation sequencing technologies
[SOURCE: ISO/TS 20428:2017, 3.26]
3.15
read type
type of implementation in the sequencing instrument
Note 1 to entry: It can be either single-end or paired-end.
Note 2 to entry: Single-end: the sequencing instrument produces one read (3.14) per fragment, reading from
one end of the fragment to the other.
Note 3 to entry: Paired-end: the sequencing instrument reads from one end of the fragment to the other and
then starts another round of reading from the opposite end.
[SOURCE: ISO/TS 20428:2017, 3.27, modified — "run" has been replaced with "implementation" in the
definition and the notes to entry.]
3.16
reference sequence
sequence file that is used as a reference to describe the variants that are present in the analyzed sequence
[SOURCE: ISO/TS 20428:2017, 3.28]
3.17
ribonucleic acid
RNA
polynucleotide consisting essentially of chains with a repeating backbone of phosphate and ribose units
to which nitrogenous bases are attached
Note 1 to entry: In this context, RNA is the RNA in the body of a human individual; its origin can be
specified. It includes ribosomal RNA (rRNA), messenger RNA (mRNA), transfer RNA (tRNA), micro RNA
(miRNA) and other non-coding RNA (ncRNA) [2].
[SOURCE: ISO 22174:2005, 3.1.3]
3.18
specimen
biospecimen
sample of a tissue, body fluid, food, or other substance collected or acquired to support the assessment,
diagnosis, treatment, mitigation or prevention of a disease, disorder or abnormal physical state, or its
symptoms
[SOURCE: ISO/TS 20428:2017, 3.34, modified — the term "biological specimen" has been removed.]
3.19
subject of care
person who uses or is a potential user of a health care service
[SOURCE: ISO/TS 22220:2011, 3.2, modified — "Note 1 to entry" and the abbreviated term "SOC" have
been deleted.]
3.20
target capture
method to capture genomic regions of interest from a DNA (3.4) sample prior to sequencing
[SOURCE: ISO/TS 20428:2017, 3.36]
3.21
variant call format
VCF
format of the text file used in bioinformatics for storing gene (3.8) sequence variations

4 Abbreviated terms
BAM binary alignment map
bp base pair
COSMIC Catalogue of Somatic Mutations in Cancer
CRAM compressed reference-oriented alignment map
EBI European Bioinformatics Institute
HGNC HUGO Gene Nomenclature Committee
HGVS Human Genome Variation Society
HUGO Human Genome Organization
MAF mutation annotation format
NCBI National Center for Biotechnology Information
NGS next-generation sequencing
SAM sequence alignment map
VCF variant call format
5 Summary of the clinical genomic information model
5.1 General
The clinical genomic information model defines the structure and the organization of the information
related to the communication of the clinical genomic data generated by massively parallel sequencing
technology.
Figure 2 shows the relationships between the major structures of the clinical genomic information model.
[Figure 2 diagram: Patient "has" Specimen; Specimen "contains" Quality control, Experimental equipment, Analysis equipment and Derived data]
Figure 2. — Major structure of the genomic data model
5.2 Patient
5.2.1 General
A patient is a person who is receiving or is registered to receive healthcare services, or who is the
subject of one or more studies for another purpose, such as clinical trials.
5.2.2 Identifiers
The unique identifiers of the subject of care (see Table 1) shall be included.
5.2.3 Name
The name of the subject of care (see Table 1) shall be given as a general rule.
5.2.4 Sex
The sex of the subject of care (see Table 1) shall be in accordance with ISO/TS 22220:2011.

5.2.5 Birth data
The birth date of the subject of care (see Table 1) shall be given so that the age of the patient can be
calculated. The birth date should be represented in accordance with ISO 8601-1 and, if necessary, ISO 8601-2.
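As an illustration only, the following Python sketch checks that a birth date is written as a complete ISO 8601-1 calendar date before it is exchanged. It assumes that only the complete extended (YYYY-MM-DD) and basic (YYYYMMDD) formats are accepted; reduced-precision dates and the ISO 8601-2 extensions mentioned above would need additional handling. The function name `parse_birth_date` is hypothetical.

```python
import re
from datetime import date

# Complete calendar dates only: extended (YYYY-MM-DD) or basic (YYYYMMDD).
# Reduced-precision dates and ISO 8601-2 extensions are deliberately not handled.
_EXTENDED = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
_BASIC = re.compile(r"([0-9]{4})([0-9]{2})([0-9]{2})")


def parse_birth_date(text: str) -> date:
    """Parse a complete ISO 8601-1 calendar date into a datetime.date.

    Raises ValueError for other layouts and for impossible dates such as
    1975-02-30 (rejected by the date constructor).
    """
    match = _EXTENDED.fullmatch(text) or _BASIC.fullmatch(text)
    if match is None:
        raise ValueError(f"not a complete ISO 8601-1 calendar date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


print(parse_birth_date("1975-04-23"))  # 1975-04-23
print(parse_birth_date("19750423"))    # 1975-04-23
```

Returning a `date` object rather than the original string lets later steps, such as deriving the age of diagnosis, work with date arithmetic instead of text.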
5.2.6 Ethnicity
The ethnicity of the subject of care shall be in accordance with ISO/TS 22220:2011.

The ethnicity of the subject of care (see Table 1) should be recorded to represent his or her genetic
origin. The ethnicity information should be represented by the HL7 v3 Code System Race
(https://www.hl7.org/fhir/v3/Race/index.html)[9]. Alternatively, where national standards exist, those
coding systems can be used, for example, the US FDA Guidance for Industry – Collection of Race and
Ethnicity Data in Clinical Trials. The ethnicity of the patient shall be reported.
5.2.7 List of diagnosis
5.2.7.1 General
The diagnosis list contains the pertinent data from the investigation, analysis and recognition of the
presence and nature of a disease, condition or injury from expressed signs and symptoms. Where possible,
each diagnosis should be coded using ICD-10 or ICD-11, SNOMED CT or another widely adopted ontology. The
list of diagnoses for the patient shall be reported.
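A coded diagnosis list could be held as a list of (coding system, code, label) entries. The sketch below is hypothetical: the `Diagnosis` class, the set of accepted system names and the ICD-10 form check are assumptions made for illustration, and the regular expression only tests the general shape of a code, not whether the code exists in the classification.

```python
import re
from dataclasses import dataclass
from typing import List

# Coding systems named in 5.2.7.1; "other widely adopted ontologies" would
# have to be added by the implementer.
ALLOWED_SYSTEMS = {"ICD-10", "ICD-11", "SNOMED CT"}

# Coarse form check for ICD-10 codes (letter, two characters, optional
# subdivision after a dot). It does not confirm that the code exists.
ICD10_FORM = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")


@dataclass(frozen=True)
class Diagnosis:
    system: str   # coding system, e.g. "ICD-10"
    code: str     # code within that system
    display: str  # human-readable label


def check_diagnosis_list(diagnoses: List[Diagnosis]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not diagnoses:
        problems.append("diagnosis list is empty")
    for d in diagnoses:
        if d.system not in ALLOWED_SYSTEMS:
            problems.append(f"{d.code}: unrecognised system {d.system!r}")
        elif d.system == "ICD-10" and not ICD10_FORM.fullmatch(d.code):
            problems.append(f"{d.code}: not in the form of an ICD-10 code")
    return problems


diagnoses = [Diagnosis("ICD-10", "C50.9", "Malignant neoplasm of breast, unspecified")]
print(check_diagnosis_list(diagnoses))  # []
```

Checking codes against the actual ICD or SNOMED CT releases would require the corresponding terminology files or a terminology service, which is outside this sketch.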
5.2.7.2 Age of diagnosis
The age at diagnosis, counted from birth (see Table 1), shall be included and can be expressed as a
number of years.
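The document does not state how the number of years is counted. Assuming completed years (the usual convention for stating age), the age of diagnosis can be derived from the birth date and the date of diagnosis, as in this hypothetical helper.

```python
from datetime import date


def age_in_completed_years(birth: date, on: date) -> int:
    """Number of whole years from `birth` to `on` (for example the diagnosis date)."""
    if on < birth:
        raise ValueError("the reference date precedes the birth date")
    years = on.year - birth.year
    # Birthday not yet reached in the year of `on`: subtract one. With this
    # comparison a 29 February birthday counts as reached on 1 March in
    # non-leap years.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


print(age_in_completed_years(date(1975, 4, 23), date(2021, 4, 22)))  # 45
print(age_in_completed_years(date(1975, 4, 23), date(2021, 4, 23)))  # 46
```

The result is an integer, which matches the "Integer" value representation given for the age of diagnosis in Table 1.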
5.2.8 Treatment
5.2.8.1 Prior treatment
A text description should be included that describes the prior treatment (see Table 1) received before the
body specimen was collected.
5.2.8.2 Treatment outcome
A text description should be included that describes the final outcome of the patient (see Table 1) after
the treatment was administered.
Table 1 — Summary of patient-related metrics
Category | Metrics | Value representation | Optionality
Patient | Identifier | — | Mandatory
Patient | Name | — | Mandatory
Patient | Birth date | ISO 8601-1, ISO 8601-2 | Mandatory
Patient | Sex | Male, female | Mandatory
Patient | Ethnicity | HL7 v3 Code System Race | Mandatory
Patient | Diagnosis list | ICD 10 or 11, SNOMED-CT, or other widely adopted ontologies | Mandatory
Patient | Age of diagnosis | Integer | Mandatory
Patient | Prior treatment | — | Optional
Patient | Treatment outcome | — | Optional
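Table 1 can be read as a record type with an optionality rule. The following sketch represents the patient metrics and reports which mandatory entries are missing; the class and field names are hypothetical, chosen to mirror the Metrics column, and the sketch does not validate the value representations themselves.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PatientRecord:
    """One subject of care with the metrics of Table 1 (None = not supplied)."""
    identifier: Optional[str] = None
    name: Optional[str] = None
    birth_date: Optional[str] = None        # ISO 8601-1 / ISO 8601-2 text
    sex: Optional[str] = None               # "male" or "female" per Table 1
    ethnicity: Optional[str] = None         # code from HL7 v3 Code System Race
    diagnosis_list: Optional[list] = None   # ICD-10/11, SNOMED CT, ... codes
    age_of_diagnosis: Optional[int] = None  # integer number of years
    prior_treatment: Optional[str] = None   # free text, optional
    treatment_outcome: Optional[str] = None  # free text, optional


# Fields marked "Optional" in the Optionality column of Table 1.
OPTIONAL_FIELDS = {"prior_treatment", "treatment_outcome"}


def missing_mandatory(record: PatientRecord) -> List[str]:
    """Return the names of mandatory Table 1 fields that are absent or empty."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL_FIELDS
        and getattr(record, f.name) in (None, "", [])
    ]


# "P-0001" and "Example Patient" are made-up values.
record = PatientRecord(identifier="P-0001", name="Example Patient",
                       birth_date="1975-04-23", sex="female")
print(missing_mandatory(record))  # ['ethnicity', 'diagnosis_list', 'age_of_diagnosis']
```

An age of diagnosis of 0 is treated as present, because only `None`, an empty string and an empty list count as missing.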
5.3 Specimen
5.3.1 General
The specimen information shall be represented by the subject of care identifier type code of ISO/TS
22220:2011 (see Table 2).
EXAMPLE Pathology number 13-S-048435_A1 represented in accordance with ISO/TS 22220:2011: SOC
identifier designation: 13-S-048435_A1; SOC identifier geographic area: 1 (local); SOC identifier issuer:
AMC (ABC Medical Center); SOC identifier type: 02 (speciality number, pathology).
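Keeping the components of the identifier in the EXAMPLE separate, rather than as a single string, preserves what a receiving institution needs to interpret the pathology number. The class below is a hypothetical sketch; its field names and the pipe-separated rendering are assumptions and are not taken from ISO/TS 22220.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectOfCareIdentifier:
    """Identifier components listed in the EXAMPLE (field names are illustrative)."""
    designation: str      # e.g. the pathology number
    geographic_area: str  # e.g. "1" (local)
    issuer: str           # e.g. "AMC"
    type_code: str        # e.g. "02" (speciality number, pathology)

    def as_text(self) -> str:
        # Pipe-separated rendering used only in this sketch; it is not a
        # serialization taken from ISO/TS 22220.
        return "|".join((self.designation, self.geographic_area,
                         self.issuer, self.type_code))


specimen_id = SubjectOfCareIdentifier("13-S-048435_A1", "1", "AMC", "02")
print(specimen_id.as_text())  # 13-S-048435_A1|1|AMC|02
```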
5.3.2 Tissue or organ of origin
Specimen origin refers to the anatomical site from which the specimen was acquired. The anatomical site
shall be represented by SNOMED CT[10] or another vocabulary (see Table 2).
5.3.3 Collection date
The collection date (see Table 2) is the date on which the specimen was acquired. The collection date
shall be represented in accordance with ISO 8601-1 and ISO 8601-2.
5.3.4 Type of specimen
Types of specimens (see Table 2) can be represented by the Standard Preanalytical Code (SPREC) of the
International Society for Biological and Environmental Repositories[11]. At the time of publication,
SPREC Version 3.0 is the most recent version.
EXAMPLE BLD (blood), BUF (buffy coat), CEN (non-blood tissue), SEM (semen).
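A receiving system could map SPREC type-of-specimen codes to labels with a simple lookup. The sketch below covers only the four codes listed in the EXAMPLE; the complete SPREC 3.0 code list is not reproduced here, and the function name `describe_specimen_type` is hypothetical.

```python
# Type-of-specimen codes taken from the EXAMPLE; the full SPREC 3.0 code
# list is larger and is not reproduced in this sketch.
SPREC_EXAMPLE_TYPES = {
    "BLD": "blood",
    "BUF": "buffy coat",
    "CEN": "non-blood tissue",
    "SEM": "semen",
}


def describe_specimen_type(code: str) -> str:
    """Return the label for a SPREC type code known to this sketch."""
    key = code.strip().upper()
    if key not in SPREC_EXAMPLE_TYPES:
        raise ValueError(f"type code {code!r} is not in this example subset")
    return SPREC_EXAMPLE_TYPES[key]


print(describe_specimen_type("BLD"))  # blood
print(describe_specimen_type("buf"))  # buffy coat
```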
Table 2 — Summary of specimen-related metrics
Category | Metrics | Value representation | Optionality
Specimen | General | Identifier type code of ISO/TS 22220:2011 | Mandatory
Specimen | Tissue or organ of origin | SNOMED CT or other vocabulary | Mandatory
...

TECHNICAL SPECIFICATION
ISO/TS 23357
First edition
Genomics informatics — Clinical genomics data sharing specification for next-generation sequencing
PROOF/ÉPREUVE
Reference number
ISO/TS 23357:2023(E)
© ISO 2023

COPYRIGHT PROTECTED DOCUMENT
© ISO 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Summary of the clinical genomic information model
5.1 General
5.2 Patient
5.2.1 General
5.2.2 Identifiers
5.2.3 Name
5.2.4 Sex
5.2.5 Birth data
5.2.6 Ethnicity
5.2.7 List of diagnosis
5.2.8 Treatment
5.3 Specimen
5.3.1 General
5.3.2 Tissue or organ of origin
5.3.3 Collection date
5.3.4 Type of specimen
5.4 Experimental equipment
5.4.1 General
5.4.2 Quality control
5.4.3 Base calling information
5.5 Analysis equipment
5.5.1 General
5.5.2 Read alignment
5.5.3 Alignment post-processing
5.5.4 Variant calling
5.5.5 Variant annotation
5.6 Derived data
5.6.1 General
5.6.2 FASTQ
5.6.3 Sequence alignment map (SAM)
5.6.4 Binary alignment map (BAM)
5.6.5 Compressed reference-oriented alignment map (CRAM)
5.6.6 Variant call format (VCF)
5.6.7 Mutation annotation format (MAF)
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use
of (a) patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed
patent rights in respect thereof. As of the date of publication of this document, ISO had not received
notice of (a) patent(s) which may be required to implement this document. However, implementers are
cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents. ISO shall not be held responsible for identifying any or all
such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 215, Health informatics, Subcommittee SC
1, Genomics informatics.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Owing to the rapid advancement of next-generation sequencing technologies, human genome sequencing is
being adopted in clinical settings to realize precision medicine[7]. Massively parallel sequencing, or
next-generation sequencing (NGS), is any of several high-throughput approaches to DNA sequencing that use
the concept of massively parallel processing. These technologies use miniaturized and parallelized
platforms to sequence 1 million to 43 billion short reads (50 to 400 bases each) per instrument run.
Data obtained in a clinical setting should be shared with another institution when a patient moves, or
with the patient on request.
The clinical application steps based on clinical sequence information consist of:
a) specimen collection, processing and storage;
b) DNA extraction;
c) DNA processing and library preparation;
d) generation of sequence reads and base calling;
e) sequencing alignment/mapping;
f) variant calling;
g) variant annotation and filtering;
h) variant evaluation and assertion;
i) generation of test report[8].
Clinical sequencing information needs to be shared at a level of detail that allows the results of the
institution that generated the initial clinical sequencing information to be reproduced. In addition, the
shared clinical genomic sequencing data should be interoperable.
This document proposes a data specification that integrates multi-layered sequencing files, related
parameters and clinical data in order to achieve reproducibility of genomic data in clinical practice.
This document can assist health IT companies by proposing new system requirements for handling
genomic data.
This document can be used to store and share clinical genomic data in electronic health records. It can
also support translational research that requires genomic and clinical data from multiple institutions.
TECHNICAL SPECIFICATION ISO/TS 23357:2023(E)
Genomics informatics — Clinical genomics data sharing
specification for next-generation sequencing
1 Scope
This document specifies clinical sequencing information generated by massively parallel sequencing
technology for the purpose of sharing health information. This document covers the data fields and
their metadata from the generation of sequence reads and base calling to variant evaluation and
assertion, in order to achieve reproducibility during health information exchange of clinical sequence
information. Specimen collection, processing and storage, DNA extraction, DNA processing and library
preparation, and generation of the test report are outside the scope of this document.
This document defines the data types, relationships, optionality, cardinalities and terminology bindings
of these data.
In essence, this document specifies:
— the required data fields and their metadata from generation of sequence reads and base calling to
variant evaluation and assertion for sharing clinical genomic sequencing data files generated by
massively parallel sequencing technology, as shown in Figure 1;
— the sequencing information from human samples using DNA sequencing by massively parallel
sequencing technologies for clinical practice.
NOTE The grey shaded text indicates the scope of this document.
Figure 1 — Clinical application processes based on next-generation sequencing (NGS) data
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the terms and definitions given in [external document reference
xxx] and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
clinical sequencing
next generation sequencing or future sequencing technologies using human samples for clinical practice
and clinical trials
[SOURCE: ISO/TS 20428:2017, 3.5, modified — "later" has been replaced with "future" in the definition.]
3.2
deoxyribonucleic acid
DNA
molecule that encodes the genetic information in the nucleus of cells
[SOURCE: ISO 25720:2009, 4.7]
3.3
DNA sequencing
determination of the order of nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule
of DNA (3.4)
Note 1 to entry: Sequence is generally described from the 5’ end.
[SOURCE: ISO 17822:2020, 3.19]
3.4
exome
part of the genome that corresponds to the complete complement of the exons of a cell
3.7
FASTQ
text-based format for storing both the biological sequence (typically nucleotide sequence) and its
corresponding quality scores
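To make the FASTQ definition concrete, the sketch below reads unwrapped four-line FASTQ records (a header starting with @, the sequence, a separator starting with +, and the quality string) and decodes each quality character to a Phred score. It assumes the Phred+33 offset used by Illumina 1.8+ and Sanger-style FASTQ files; older files may use a different offset, which is why the offset is a parameter. The names `FastqRecord` and `read_fastq` are hypothetical.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred quality score per base


def read_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from unwrapped four-line FASTQ text (Phred+33 by default)."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("FASTQ text must contain a multiple of four non-empty lines")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        # Each quality character encodes one base: score = ASCII code - offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


example = ["@read1\n", "GATTACA\n", "+\n", "II?5+#!\n"]
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)
# read1 GATTACA [40, 40, 30, 20, 10, 2, 0]
```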
3.8
gene
category of nucleic acid sequences that functions as a unit of heredity and codes for the basic instructions
for the development, reproduction, and maintenance of organisms
3.9
germline
series of germ cells, each descended or developed from earlier cells in the series, regarded as continuing
through successive generations of an organism
[SOURCE: ISO/TS 20428:2017, 3.17]
3.10
indel
insertion (3.15) or/and deletion (3.7)
[SOURCE: ISO/TS 20428:2017, 3.18]
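In VCF, an indel is normally written with a shared leading (padding) base, so an insertion of T after an A appears as REF=A, ALT=AT. The hypothetical helper below classifies simple REF/ALT pairs on that basis; anything that is neither a single-base substitution nor a pure insertion or deletion of that form (for example a multi-base substitution or a symbolic allele) is returned as "complex", a label chosen for this sketch.

```python
BASES = set("ACGTN")


def classify_variant(ref: str, alt: str) -> str:
    """Classify one VCF-style REF/ALT allele pair."""
    ref, alt = ref.upper(), alt.upper()
    if not (set(ref) <= BASES and set(alt) <= BASES):
        return "complex"    # symbolic (<DEL>), spanning (*) or missing (.) alleles
    if ref == alt:
        return "reference"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"        # single-nucleotide variant
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"  # ALT = REF + inserted bases
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"   # REF = ALT + deleted bases
    return "complex"        # e.g. multi-base substitutions


for ref, alt in [("A", "AT"), ("ATG", "A"), ("C", "T"), ("AC", "GT")]:
    print(ref, alt, classify_variant(ref, alt))
```

Under the definition above, both the "insertion" and "deletion" results are indels.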
3.12
mutation annotation format
MAF
tab-delimited text file with aggregated mutation information from variant call format (3.21) files,
generated at the project level
3.13
next-generation sequencing
massive parallel sequencing
NGS
technology that can sequence millions of small fragments of DNA (3.4) in parallel
3.14
sequence read
read
fragmented nucleotide sequences which are used to reconstruct the original sequence for next
generation sequencing technologies
[SOURCE: ISO/TS 20428:2017, 3.26]
3.15
read type
type of implementation in the sequencing instrument
Note 1 to entry: It can be either single-end or paired-end.
Note 2 to entry: Single-end: in a single read (3.14) implementation, the sequencing instrument reads from
one end of a fragment to the other end.
Note 3 to entry: Paired-end: in a paired-end implementation, the instrument reads from one end to the other
end and then starts another round of reading from the opposite end.
[SOURCE: ISO/TS 20428:2017, 3.27, modified — "run" has been replaced with "implementation" in the
definition and the notes to entry.]
3.16
reference sequence
sequence file that is used as a reference to describe the variants that are present in the analyzed
sequence
3.18
specimen
biospecimen
sample of a tissue, body fluid, food, or other substance collected or acquired to support the assessment,
diagnosis, treatment, mitigation or prevention of a disease, disorder or abnormal physical state, or its
symptoms
[SOURCE: ISO/TS 20428:2017, 3.34, modified — the term "biological specimen" has been removed.]
3.19
subject of care
person who uses or is a potential user of a health care service
[SOURCE: ISO/TS 22220:2011, 3.2, modified — "Note 1 to entry" and the abbreviated term "SOC" have
been deleted.]
3.20
target capture
method to capture genomic regions of interest from a DNA (3.4) sample prior to sequencing
[SOURCE: ISO/TS 20428:2017, 3.36]
3.21
variant call format
VCF
format of the text file used in bioinformatics for storing gene (3.8) sequence variations
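Each VCF data line begins with eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, with "." marking a missing value. The sketch below parses those columns from a single data line; header lines (starting with #) and any FORMAT and sample columns are not handled. The example line is made up, and the names `VcfRecord` and `parse_vcf_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int               # 1-based position of the first REF base
    id: str
    ref: str
    alt: list              # alternate alleles (ALT is comma-separated)
    qual: Optional[float]  # None when QUAL is "."
    filter: str
    info: dict             # key=value pairs as strings; flags map to True


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least eight tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, flt, info = cols[:8]
    info_map = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_map[key] = value if sep else True
    return VcfRecord(chrom, int(pos), var_id, ref, alt.split(","),
                     None if qual == "." else float(qual), flt, info_map)


rec = parse_vcf_line("chr1\t12345\t.\tA\tT\t50\tPASS\tDP=120;SOMATIC")
print(rec.pos, rec.alt, rec.qual, rec.info)  # 12345 ['T'] 50.0 {'DP': '120', 'SOMATIC': True}
```

INFO values are kept as strings here; typing them correctly would require the INFO definitions in the file's header lines.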
4 Abbreviated terms
BAM binary alignment map
bp base pair
COSMIC Catalogue of Somatic Mutations in Cancer
CRAM compressed reference-oriented alignment map
EBI European Bioinformatics Institute
HGNC HUGO Gene Nomenclature Committee
HGVS Human Genome Variation Society
HUGO Human Genome Organization
MAF mutation annotation format
NCBI National Center for Biotechnology Information
NGS next-generation sequencing
SAM sequence alignment map
VC
...
