Information technologies — JPEG systems — Part 8: JPEG Snack

This document defines JPEG Snack metadata that enriches a representation of multiple media contents, in order to facilitate sharing, editing, and presentation; it further specifies metadata and container formats for JPEG Snack format.

Technologies de l'information — Systèmes JPEG — Partie 8: JPEG Snack définissant des métadonnées d’enrichissement destinées à faciliter la consommation des contenus JPEG

General Information

Status: Published
Publication Date: 02-Feb-2023
Current Stage: 6060 - International Standard published
Start Date: 03-Feb-2023
Due Date: 06-Jul-2023
Completion Date: 03-Feb-2023

Standard
ISO/IEC 19566-8:2023 - Information technologies — JPEG systems — Part 8: JPEG Snack Released:2/3/2023
English language
36 pages
Draft
REDLINE ISO/IEC PRF 19566-8 - Information technologies — JPEG systems — Part 8: JPEG Snack Released:15. 12. 2022
English language
36 pages
Draft
ISO/IEC PRF 19566-8 - Information technologies — JPEG systems — Part 8: JPEG Snack Released:15. 12. 2022
English language
36 pages

Standards Content (Sample)

INTERNATIONAL STANDARD
ISO/IEC 19566-8
First edition
2023-02
Information technologies — JPEG systems —
Part 8:
JPEG Snack
Technologies de l'information — Systèmes JPEG —
Partie 8: JPEG Snack définissant des métadonnées d’enrichissement
destinées à faciliter la consommation des contenus JPEG
Reference number
ISO/IEC 19566-8:2023(E)
© ISO/IEC 2023

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Overview
4.1 System description
4.2 System decoder model
4.3 Metadata model
4.4 Object-structured file organization
5 Object-structured format
5.1 General
5.2 Object definition
5.2.1 General
5.2.2 Object types and media types
5.2.3 Static objects
5.2.4 Dynamic objects
6 Object-composition format
6.1 General
6.1.1 Default image
6.1.2 Timeline
6.2 Composing objects
6.2.1 Temporal relationship between the default image and objects
6.2.2 Spatial relationship between the default image and objects
6.2.3 Layering the objects
6.2.4 Moving the objects
Annex A (normative) Boxes for JPEG Snack
Annex B (informative) Container of JPEG Snack
Annex C (informative) Usage examples
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance
are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria
needed for the different types of document should be noted. This document was drafted in
accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see https://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 19566 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
The ISO/IEC 19566 series, on JPEG systems, contributes to the specification of system-level
functionalities.
JPEG Snack is a means to convey relatively simple multimedia experiences which is fundamentally based
on images and the image file format. Many digital storytelling experiences are based on converting
images into video-based technologies, whereas images are directly used in JPEG Snack, along with
playback of other media (video, audio, titles, captions, and effects) coordinated through an explicit
timeline.
INTERNATIONAL STANDARD ISO/IEC 19566-8:2023(E)
Information technologies — JPEG systems —
Part 8:
JPEG Snack
1 Scope
This document defines JPEG Snack metadata that enriches a representation of multiple media contents,
in order to facilitate sharing, editing, and presentation; it further specifies metadata and container
formats for JPEG Snack format.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous-tone still
images: Requirements and guidelines
ISO/IEC 15444-2, Information technology — JPEG 2000 image coding system — Part 2: Extensions
ISO/IEC 18477-3, Information technology — Scalable compression and coding of continuous-tone still
images — Part 3: Box file format
ISO/IEC 19566-5, Information technologies — JPEG systems — Part 5: JPEG Universal Metadata Box Format (JUMBF)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 10918-1 and
ISO/IEC 18477-3 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
snack culture
consumption of image-rich media in a short story format
3.2
media type
indicator of the format and content of the file transmitted through the Internet.
3.3
z-order
ordering of overlapping two-dimensional regions that defines the occlusion precedence amongst them
4 Overview
This document specifies metadata and formats that enable storing, sharing, and rendering snack
culture contents with JPEG image coding standards.
NOTE The snack culture contents are defined as follows:
— image sequence from which one or more frames are generated by manipulating still images;
— image sequence recorded with a short playing duration, e.g. 1.5 s;
— image sequence with transition effects and/or overlay along with subtitles, audio clips, and graphics.
JPEG Snack is a format that defines the representation of multimedia, such as images, image sequences,
text, audio, and video clips, including transition effects, based on the existing JPEG family image coding
standards. It also supports a timing mechanism that synchronizes multimedia with a global timeline
in a context. This mechanism allows users to watch multimedia contents like short-form video clips.
However, unlike conventional video formats, it stores images directly, without transcoding them
into a dedicated video codec.
In order to define the functionalities of the JPEG Snack format, this document is organized as follows:
— 4.1 describes the overall system of the JPEG Snack format.
— 4.2 describes the system decoder model.
— 4.3 defines an essential model of metadata to compose the JPEG Snack format.
— Clauses 5 and 6 describe the JPEG Snack format in detail.
— Annexes A to C explain how the metadata is serialized and describe the formation of the JPEG Snack
file and its usage examples.
4.1 System description
This document specifies metadata and its behaviour to compose the JPEG Snack content by
synchronizing multimedia on the decoder side. This document primarily defines a metadata model
consisting of two formats:
— Object-structured format: describes the content and additional behaviours of the objects that are
structured in the object-composition description.
— Object-composition format: describes the positional and temporal relationships between objects
and the composition of the objects onto the decoder display.
The hierarchical structure of the JPEG Snack format is depicted in Figure 4.1.
Figure 4.1 — Overview of the JPEG Snack format
The JPEG Snack format provides information that enables JPEG Snack applications to share and render
media contents by accessing the objects in the file or by referencing objects contained in other files. Not
all objects are necessarily embedded in the same file. Each object constituting a JPEG Snack file is
structured using a box defined in ISO/IEC 19566 and stored in a JPEG image file.
The object-structured format defines the appearance and behaviour of the individual object. This
format includes the size and opacity of the object, movement information in a given timeline of the
representation, and information on the location where the media data, such as an image codestream, is
found (see Clause 5).
The object-composition format identifies the objects that compose the representation and defines each
object’s creation and destruction. This format describes the temporal and spatial relationship between
objects by providing information on the time and position of the individual object to show, and the time
and position of their disappearance. Each object has independent position information on the decoder
screen, and the composition information determines the z-order of the objects displayed to the user
(see Clause 6).
4.2 System decoder model
A JPEG Snack decoder implements the metadata model described in 4.1. The decoder has three
conceptually necessary components: default image, timeline, and layer and position, as depicted in
Figure 4.2. The decoder decodes the JPEG image to prepare a default image and composes a JPEG Snack
representation with several objects using this default image as a background. Since the JPEG Snack is
created by defining when, where and how objects are composed, the decoder shall handle timeline,
layer, and position.
Figure 4.2 — Overview of the JPEG Snack decoder
This document defines the formats based on the informative system decoder model of JPEG Snack, as
depicted in Figure 4.3, to allow various JPEG image coding standards to represent JPEG Snack contents
in a concerted way. Figure 4.3 illustrates an example of the JPEG Snack decoder in which the formats
defined in 4.1 may be implemented.
In Figure 4.3, the object composer receives a JPEG codestream that contains metadata and media data
through the JUMBF parser, constructs the JPEG Snack representation, invokes media decoders to decode
its media data from the codestream, and renders the decoded media content to the output devices. The
object composer controls the media decoder and compositor so that the media content is decoded and
displayed at the appropriate time and position. This version of the document allows images, captions, image
sequences, audio clips, and video clips to be composed in a representation of JPEG Snack.

Key
a Metadata.
b Media data.
c Media format + time.
d Position + z-order.
e Media output.
Figure 4.3 — Overview of the system decoder model for JPEG Snack
4.3 Metadata model
The system decoder model described in 4.2 is based on the JPEG Snack format depicted in Figure 4.1 to
support the playback of JPEG Snack contents composed of multiple media contents.
The metadata is a hierarchical model, as illustrated in Figure 4.4, containing multiple object metadata
(see Clause 6) aligned with composition metadata corresponding to the object-composition format.
The object metadata, which corresponds to the object-structured format, contains the properties (see Annex A)
used to compose the objects into a representation of the JPEG Snack format, such as position, time, and
transition. Each object may be rendered individually in a logical timeline of the decoder
to support re-editing the object; for example, a user may choose a specific object to hide in their JPEG
Snack viewer.
Figure 4.4 — High-level metadata model of JPEG Snack
Object metadata specifies the content and additional behaviour of the individual objects that compose
the representation and identifies where the object resides. An ID is an identifier of the object in the
representation and a Type attribute allows a decoder to recognize properties of the object proactively.
Composition metadata coordinates the objects composing a JPEG Snack representation. The objects are
arranged into Objects within a composition along with position and time with an identifier attribute.
A Position property determines where the object pointed to by the ObjectID is placed. When objects
are overlapped according to the Position property, the Time and Persistency properties organize the
objects to be placed in front or behind the other object (see 6.2).
JPEG Snack shall have only one composition metadata consisting of one or more objects within a scope
of the JPEG Snack file.
The JPEG Snack decoder described in 4.2 composes a timeline (see 6.1.2) for playback of the JPEG
Snack content by combining the Time information of all objects, and they exist in the representation
individually using their Position and Time information.
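The way a decoder combines the Time information of all objects into one timeline can be sketched as follows. This is a minimal illustration, not the serialization defined in Annex A: the class name `CompositionEntry`, the field names, and the representation of Time as a start time plus a duration in seconds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CompositionEntry:
    """One object placed in the composition (hypothetical field layout)."""
    object_id: int               # refers to the Id of an object metadata entry
    position: Tuple[int, int]    # assumed (x, y) position on the decoder display
    start: float                 # assumed appearance time, in seconds
    duration: float              # assumed time the object stays in the representation


def build_timeline(entries: List[CompositionEntry]) -> Tuple[float, float]:
    """Combine the Time information of all objects into one global span."""
    if not entries:
        return (0.0, 0.0)
    begin = min(e.start for e in entries)
    end = max(e.start + e.duration for e in entries)
    return (begin, end)


def active_objects(entries: List[CompositionEntry], t: float) -> List[int]:
    """Return the ids of the objects that exist in the representation at time t."""
    return [e.object_id for e in entries if e.start <= t < e.start + e.duration]
```

With this layout, an object whose duration is shorter than the overall span simply drops out of `active_objects` once its own interval has passed, while the global timeline is unaffected.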
4.4 Object-structured file organization
An object in the file organization is a JUMBF box. The JPEG Snack files are formed as a series of boxes.
All metadata is contained in boxes, as illustrated in Figure 4.5. JUMBF boxes for JPEG Snack contain
metadata to compose the JPEG Snack representation, and other types of JUMBF box are used to deliver
the media content, such as a codestream and XML document for each object. The boxes shall be
embedded as defined in Annex A and ISO/IEC 19566-5.
Figure 4.5 — Organization of the JPEG Snack file
The JPEG Snack format provides information to define the metadata for composing the representation
and the format in which the metadata is structured in the JPEG image files. The file extension of a
JPEG Snack file depends on the default codestream. Conventional JPEG decoders may ignore
the JUMBF boxes. For example, if the JPEG Snack metadata is embedded in a file conforming to
ISO/IEC 10918-1, denoted by JPEG-1, the extension of the JPEG Snack file is ‘.jpg’, like conventional
JPEG-1 images, and a conventional JPEG-1 decoder decodes only the default codestream. This feature
provides compatibility with the existing JPEG image coding standards, including future standards based
on the box-based format.
NOTE 1 The default codestream is placed at the end of the file to be compatible with the conventional JPEG
image coding standards. For example, the JPEG-1 decoder can ignore any extra data beyond the EOI (end of
image) marker.
NOTE 2 Codestream is a sequence of bits representing a compressed image and associated metadata.
In addition, the content, whose type is indicated by the object metadata, may be carried in different
JUMBF boxes depending on the object type. The object may refer to JUMBF boxes for media data embedded
in another file. The referencing shall be done as defined in ISO/IEC 19566-5:2019, Annex C.
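As an illustration of how box data can sit in a JPEG-1 file without disturbing conventional decoders, the sketch below walks the marker segments of a JPEG-1 file and collects the payloads of APP11 segments. It assumes that the boxes travel in APP11 (0xFFEB) marker segments, as in the box file format of ISO/IEC 18477-3. The function name is hypothetical, and reassembling the payloads into JUMBF boxes is left to a real parser.

```python
import struct
from typing import List

SOI = b"\xff\xd8"      # start of image
MARKER_EOI = 0xD9      # end of image
MARKER_SOS = 0xDA      # start of scan: entropy-coded data follows
MARKER_APP11 = 0xEB    # application segment assumed to carry box data


def find_app11_payloads(data: bytes) -> List[bytes]:
    """Collect the raw payloads of all APP11 segments that precede the first scan."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG-1 file: missing SOI marker")
    payloads = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker in (MARKER_SOS, MARKER_EOI):
            break  # stop before the entropy-coded data or at the end of the image
        # The 16-bit big-endian length counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        if marker == MARKER_APP11:
            payloads.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return payloads
```

A conventional decoder takes the same walk but skips application segments it does not understand, which is why the default codestream still decodes.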
5 Object-structured format
5.1 General
As described in Clause 4, in the JPEG Snack format, the representation of the JPEG Snack is composed of
a group of media contents. The object in this document is a unit that composes a JPEG Snack format and
contains information to represent the media contents.
Figures 5.1 and 5.2 illustrate the roles of the object-composition and object-structured formats to
compose JPEG Snack representation. The object-composition format (see Clause 6) provides composition
information to define when and where the objects that are constructed will appear and disappear in
a representation, whereas the object-structured format signals information on the individual object’s
behaviour and location of the resource. In Figure 4.3, while the object composer manages instances of
the object, the decoding of the individual object is conducted independently by the media decoder. The
object composer informs the compositor of the z-order and movement information of the object. Then the
compositor renders the decoded media data based on the z-order and position information.
NOTE An invisible object, such as an audio clip, does not have z-order and position information. Spatial
audio is not described in this document; it is treated as a typical audio clip.
a) Composition of objects at t0
b) Composition of objects at t1
Key
t0 time when the representation is started
t1 time when the representation is ended
a Origin.
b Representation.
Figure 5.1 — Example of the object-composition format. t0 is the time when the
representation is started and t1 is the time when the representation is ended.
In the example of Figure 5.1, object 2 is above object 1 so that object 1 has an occluded region. Also,
part of object 3 lies beyond the representation and is therefore hidden. The object composer shall handle
these regions smoothly. For object 4, the duration of existence is shorter than the JPEG Snack's total duration.
See 6.1.2 for more details on the temporal composition of objects.
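The occlusion and clipping behaviour described for Figure 5.1 can be sketched with a painter's algorithm: objects are drawn from back to front, and any part outside the representation is discarded. The class `PlacedObject`, its fields, and the convention that a larger `z_order` value is closer to the viewer are assumptions for this example; the canvas stores object ids rather than pixels to keep the result easy to inspect.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlacedObject:
    """A decoded object with its placement (hypothetical structure)."""
    object_id: int
    x: int
    y: int
    width: int
    height: int
    z_order: Optional[int]  # None for invisible objects such as audio clips


def composite(objects: List[PlacedObject], rep_width: int, rep_height: int) -> List[List[int]]:
    """Return a grid of object ids; 0 marks the default image showing through."""
    canvas = [[0] * rep_width for _ in range(rep_height)]
    visible = [o for o in objects if o.z_order is not None]
    # Draw back to front so that objects with a larger z_order occlude earlier ones.
    for obj in sorted(visible, key=lambda o: o.z_order):
        # Clip the object rectangle to the bounds of the representation.
        x0, y0 = max(obj.x, 0), max(obj.y, 0)
        x1 = min(obj.x + obj.width, rep_width)
        y1 = min(obj.y + obj.height, rep_height)
        for row in range(y0, y1):
            for col in range(x0, x1):
                canvas[row][col] = obj.object_id
    return canvas
```

Objects without a z-order never reach the drawing loop, which matches the note that an audio clip has no z-order or position information.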
a) Objects’ shape at t0
b) Objects’ shape at t1
c) Object’s movement from t0 to t1
Key
a Origin.
b Representation.
c Hidden region.
d Height.
e Width.
f Object’s origin.
Figure 5.2 — Example of the object’s movement. c) exemplifies a moving object by instruction
set.
In Figure 5.2, object 3 moves to another position as time goes from t0 to t1. The movement of the object
shall be defined by instruction sets as depicted in Figure 5.2 c). Details on the mechanism of moving
objects are described in 6.2.4.
NOTE Objects 1, 2, 3, and 4 in Figure 5.1 correspond to the rectangle, star, circle, and caption in Figure 5.2,
respectively.
5.2 Object definition
5.2.1 General
This subclause defines the object-structured format that specifies the semantics of the objects that
compose a JPEG Snack representation, and the syntax is defined in Annex A. Table 5.1 describes the
semantics of the object with attributes and elements to define the media content and the object’s
properties within the representation.
Attributes and elements define the shape and behaviour of the media content, which is an object
that makes up the representation, and determine the object type as listed in Table 5.1 (see 5.2.3 and
5.2.4). Attributes contain media content information that can identify an object, and elements define
properties for rendering the object on the representation. Table 5.1 describes the meaning of the
parameters constituting an object.
Table 5.1 — Semantics of the object
Attribute name | Description
Id | An identifier for this object, which is a non-zero 8-bit integer. This shall be unique in a JPEG Snack file.
Type | A string in UTF-8 characters. Either ‘static’ or ‘dynamic’ shall be defined.
Number of media | An integer that declares the number of media contents corresponding to this object. In the case of a dynamic object, for example, consecutive image sequences shall be identified in the scope of a single object.
Media type | An identifier for the media contents of the object. When the number of media is greater than 1, the corresponding media contents shall be of the same media type.
Element name | Description
Style | An identifier that provides a unique address where a resource can be found. For more details see ISO/IEC 19566-5. The resource provides an additional style of the object, such as transition and font-family. For more details see the Cascading Style Sheets (CSS) specification [1].
Opacity | A floating-point number in the range 0–1 that specifies the transparency of the object. 1 means that this object is fully opaque.
Location | An identifier that provides a unique address where a resource can be found. For more details see ISO/IEC 19566-5.
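The constraints stated in Table 5.1 lend themselves to a simple consistency check. The sketch below is illustrative only: the class `SnackObject`, its field names, and the error messages are hypothetical, and only the rules spelled out in the table (Id range and uniqueness, allowed Type values, opacity range) are checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnackObject:
    """In-memory view of one object (hypothetical, not the Annex A syntax)."""
    id: int
    type: str
    number_of_media: int
    media_type: str
    location: str
    style: Optional[str] = None
    opacity: Optional[float] = None


def validate_objects(objects: List[SnackObject]) -> List[str]:
    """Return human-readable violations of the Table 5.1 constraints."""
    errors = []
    seen_ids = set()
    for obj in objects:
        if not 1 <= obj.id <= 255:
            errors.append(f"object {obj.id}: Id shall be a non-zero 8-bit integer")
        if obj.id in seen_ids:
            errors.append(f"object {obj.id}: Id shall be unique in the file")
        seen_ids.add(obj.id)
        if obj.type not in ("static", "dynamic"):
            errors.append(f"object {obj.id}: Type shall be 'static' or 'dynamic'")
        if obj.opacity is not None and not 0.0 <= obj.opacity <= 1.0:
            errors.append(f"object {obj.id}: Opacity shall be in the range 0 to 1")
    return errors
```

Returning a list of messages instead of raising on the first problem lets an editing tool show every issue in a file at once.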
5.2.2 Object types and media types
The type of the object shall be determined by the media type of the content. This document is for the
object types listed in Table 5.2, and the content type of the object is designated using media types, as
described in Table 5.2. Definitions of the object type are also provided in the table.
Table 5.2 — Supported media types
Object type | Media type | Description
Image | All media of type image as listed in the IANA Media Type Registry [1] | A still image. See 5.2.3.1.
Caption | text/markdown | A piece of text for additional information. See 5.2.3.2.
Pointer | All media of type image as listed in the IANA Media Type Registry [1] | A graphical indicator used to arouse attention to the region of interest that works as a presentation pointer. See 5.2.3.3.
Image sequence | All media of type image as listed in the IANA Media Type Registry [1] | A set of consecutive still images. See 5.2.4.1.
Video clip | All media of type video as listed in the IANA Media Type Registry [1] | A briefly recorded file used to convey audio-visual information to the audience. See 5.2.4.2.
Audio clip | All media of type audio as listed in the IANA Media Type Registry [1] | A briefly recorded file used to convey audible information to the audience. See 5.2.4.3.
When a JPEG Snack decoder does not support a media type, the corresponding object is ignored, and the
decoder shall indicate that the object is missing from the representation.
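The handling of unsupported media types can be sketched as a partition of the objects into those the decoder can render and those it has to report as missing. The function name, the `(object_id, media_type)` tuples, and the set of supported types passed by the caller are assumptions for this example.

```python
from typing import Iterable, List, Set, Tuple


def split_by_support(
    objects: Iterable[Tuple[int, str]],
    supported_media_types: Set[str],
) -> Tuple[List[int], List[int]]:
    """Partition object ids into (renderable, missing) by media type support."""
    supported = {m.lower() for m in supported_media_types}
    renderable: List[int] = []
    missing: List[int] = []
    for object_id, media_type in objects:
        # Media type names are compared case-insensitively.
        if media_type.strip().lower() in supported:
            renderable.append(object_id)
        else:
            missing.append(object_id)  # ignored for rendering, but reported
    return renderable, missing
```

A viewer could then compose only the renderable objects and list the missing ids to the user.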
JPEG Snack differentiates image and image sequence although both objects use the same media type. Even
though several images are contained in a single JPEG Snack representation, those images are relatively
weakly correlated in the context of the representation, which means that occlusion or exclusion of some
of the images does not harm the representation. However, images in a sequence are highly
correlated to create a valid context of the representation. For example, when one of the frames in an
animation is skipped, the animation may look strange.
In this document, objects are categorized into static and dynamic objects according to whether the object’s
contents change over time during the representation. Details on the object are defined in 5.2.3
and 5.2.4.
5.2.3 Static objects
This version of the document defines image, caption, and pointer as static objects. The value of the type
attribute shall be 'static' and the number of media shall be 1, while other attributes and elements vary
according to the content of the object, as described in Table 5.3. Style and opacity elements are optional.
If those elements are absent, the object has a fixed position and is fully opaque in the representation.
Table 5.3 — Semantics of the static object
Attribute name | Value | Description
Id | 1…N | Shall be provided.
Type | ‘static’ |
Number of media | 1 |
...

INTERNATIONAL ORGANISATION FOR STANDARDISATION
INTERNATIONAL ELECTROTECHNICAL COMMISSION


ISO/IEC 19566-8: JPEG Snack (X)
ISO/IEC JTC 1/SC 29/WG 1
Secretariat: Japan JISC
Date: 2022-12-14
Information technologies — JPEG systems —
Part 8:
JPEG Snack
Technologies de l'information — Systèmes JPEG —
Partie 8: JPEG SnackMétadonnées d’enrichissement destinées à faciliter la consommation des
contenus JPEG
ISFDIS stage

ISO/IEC PRF 19566-8:2022(E)
Information technologies — JPEG systems —
Part 8:
JPEG Snack
1 Scope
This document defines JPEG Snack metadata that enriches a representation of multiple media contents,
in order to facilitate sharing, editing, and presentation; it further specifies metadata and container
formats for JPEG Snack format.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous-tone still
images: Requirements and guidelines
ISO/IEC 15444-2, Information technology — Part 2: Extensions
ISO/IEC 18477-3, Information technology — Scalable compression and coding of continuous-tone still
images — Part 3: Box file format
ISO/IEC 19566-5, Information technology — Part 5: JPEG Universal Metadata Box Format (JUMBF)



3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 10918-1 and
ISO/IEC 18477-3 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
snack culture
consumption of image-rich media in a short story format
3.2
media type
indicator of the format and content of the file transmitted through the Internet.
3.3
z-order
ordering of overlapping two-dimensional regions that defines the occlusion precedence amongst them

4 Overview
This document specifies metadata and formats that enable storing, sharing, and rendering snack culture
contents with JPEG image coding standards.
NOTE The snack culture contents are defined as follows:
— image sequence from which one or more frames are generated by manipulating still images;
— image sequence recorded with a short playing duration, e.g. 1.5 s;
— image sequence with transition effects and/or overlay along with subtitles, audio clips, and graphics.

JPEG Snack is a format that defines the representation of multimedia, such as images, image sequences,
text, audio, and video clips, including transition effects, based on the existing JPEG family image coding
standards. It also supports a timing mechanism to synchronize multimedia with a global timeline in
a context. This mechanism allows users to watch multimedia contents like short-form video clips.
However, unlike conventional video formats, it supports storing images without transcoding them
to a dedicated video codec.
In order to define the functionalities of the JPEG Snack format, this document is organized as follows:
— 4.1 describes the overall system of the JPEG Snack format.
— 4.2 describes the system decoder model.
— 4.3 defines an essential model of metadata to compose the JPEG Snack format.
— Clauses 5 and 6 describe the JPEG Snack format in detail.
— Annexes A to C explain how the metadata is serialized and describe the formation
of the JPEG Snack file and its usage examples.

4.1 System description
This document specifies metadata and its behaviour to compose the JPEG Snack content by synchronizing
multimedia on the decoder side. This document primarily defines a metadata model consisting of two
formats:
— Object-structured format: describes the content and additional behaviours of the objects that are
structured in the object-composition description.
— Object-composition format: describes the positional and temporal relationships between objects
and the composition of the objects onto the decoder display.
The hierarchical structure of the JPEG Snack format is depicted in Figure 4.1.


Figure 4.1 — Overview of the JPEG Snack format

The JPEG Snack format provides information that enables JPEG Snack applications to share and render
media contents by accessing the objects in the file or by referencing objects contained in other files.
Not all objects need to be embedded in the same file. Each object constituting a JPEG Snack file is
structured using a box defined in ISO/IEC 19566 and stored into a JPEG image file.
The object-structured format defines the appearance and behaviour of the individual object. This format
includes the size and opacity of the object, movement information in a given timeline of the
representation, and information on the location where the media data, such as an image codestream, is
found (see Clause 5).
The object-composition format identifies the objects that compose the representation and defines each
object’s creation and destruction. This format describes the temporal and spatial relationship between
objects by providing the time and position at which each object appears and the time and position at
which it disappears. Each object has independent position information on the decoder
screen, and the composition information determines the z-order of the objects displayed to the user (see
Clause 6).
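As an illustration of the temporal and spatial description above, the following minimal sketch selects the objects that exist at a given time and orders them back to front. The `Placement` record, its field names, the use of seconds, and the explicit `z` value are assumptions made for this example only; in this document the z-order is derived from the composition information described in Clause 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record for one object in a composition."""
    object_id: int
    start: float  # time at which the object appears (seconds, assumed unit)
    end: float    # time at which the object disappears
    x: int        # horizontal position on the decoder screen
    y: int        # vertical position on the decoder screen
    z: int        # assumed explicit layer index; larger values are in front


def visible_at(placements, t):
    """Return the placements that exist at time t, ordered back to front."""
    alive = [p for p in placements if p.start <= t < p.end]
    return sorted(alive, key=lambda p: p.z)
```

A compositor following this sketch would draw the returned placements in list order, so that later entries occlude earlier ones where they overlap.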
4.2 System decoder model
A JPEG Snack decoder implements the metadata model described in 4.1. The decoder has three
necessary conceptual components: default image, timeline, and layer and position, as depicted in
Figure 4.2. The decoder decodes the JPEG image to prepare a default image and composes a JPEG Snack
representation with several objects using this default image as a background. Since the JPEG Snack is
created by defining when, where and how objects are composed, the decoder shall handle timeline, layer,
and position.



Figure 4.2 — Overview of the JPEG Snack decoder
This document defines the formats based on the informative system decoder model of JPEG Snack, as
depicted in Figure 4.3, to allow various JPEG image coding standards to represent JPEG Snack
contents in a concerted way. Figure 4.3 illustrates an example of the JPEG Snack decoder in
which the formats defined in 4.1 may be implemented.
In Figure 4.3, the object composer receives a JPEG codestream that contains metadata and
media data through the JUMBF parser, constructs the JPEG Snack representation, invokes media decoders
to decode the media data from the codestream, and renders the decoded media content to the output
devices. The object composer controls the media decoder and compositor so that the media
content is decoded and displayed at the appropriate time and position. This version of the document
allows images, captions, image sequences, audio clips and video clips to be composed in a representation
of JPEG Snack.

Key
a Metadata.
b Media data.
c Media format + time.
d Position + z-order.
e Media output.

Figure 4.3 — Overview of the system decoder model for JPEG Snack

4.3 Metadata model
The system decoder model described in 4.2 is based on the JPEG Snack format depicted in
Figure 4.1 to support the playback of JPEG Snack contents constituted by multiple media
contents.
The metadata is a hierarchical model, as illustrated in Figure 4.4, containing multiple object
metadata (see Clause 6) aligned with composition metadata corresponding to the
object-composition format. Within the object metadata, which corresponds to the object-structured format,
properties (see Annex A) that compose the objects into a representation of the JPEG Snack format,
such as position, time, and transition, are contained. Each object may be rendered individually in a logical
timeline of the decoder to support re-editing the object; for example, a user may choose a specific object
to hide in their JPEG Snack viewer.


Figure 4.4 — High-level metadata model of JPEG Snack
Object metadata specifies the content and additional behaviour of the individual objects that compose
the representation, and identifies where the object resides. An ID is an identifier of the object in the
representation, and a Type attribute allows a decoder to recognize properties of the object proactively.
Composition metadata coordinates the objects composing a JPEG Snack representation. The objects are
arranged into Objects within a composition along with position and time with an identifier attribute. A
Position property determines where the object pointed to by the ObjectID is placed. When objects
overlap according to the Position property, the Time and Persistency properties determine which
object is placed in front of or behind the other (see 6.2).
JPEG Snack shall have only one composition metadata consisting of one or more objects within a scope of
the JPEG Snack file.
The JPEG Snack decoder described in 4.2 composes a timeline (see 6.1.2) for playback of the JPEG
Snack content by combining the Time information of all objects. Each object exists in the representation
individually according to its Position and Time information.
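One way to picture the combination of the Time information of all objects is as a sorted list of appearance and disappearance events. The sketch below is illustrative only: the function name, the `(start, end)` pair per object, and the rule that a disappearance is processed before an appearance at the same instant are assumptions, not requirements of this document (see 6.1.2 for the normative description of the timeline).

```python
def build_timeline(times):
    """Merge per-object (start, end) pairs into one ordered event list.

    times: mapping of object identifier -> (start, end), hypothetical layout.
    Returns a list of (time, action, object_id) tuples, where action is
    "show" or "hide".
    """
    events = []
    for object_id, (start, end) in times.items():
        if end <= start:
            raise ValueError(f"object {object_id} has an empty lifetime")
        events.append((start, "show", object_id))
        events.append((end, "hide", object_id))
    # Order by time; at equal times, process "hide" before "show".
    return sorted(events, key=lambda e: (e[0], e[1] != "hide", e[2]))
```

With this layout, the total playback duration is the difference between the time of the last event and the time of the first event.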
4.4 Object-structured file organization
An object in the file organization is a JUMBF box. JPEG Snack files are formed as a series of boxes. All
metadata is contained in boxes, as illustrated in Figure 4.5. JUMBF boxes for JPEG Snack
contain metadata to compose the JPEG Snack representation, and other types of JUMBF box are used to
deliver the media content, such as a codestream or an XML document, for each object. The boxes shall be
embedded as defined in Annex A and ISO/IEC 19566-5.


Figure 4.5 — Organization of the JPEG Snack file
The JPEG Snack format provides information to define the metadata for composing the representation
and the format in which the metadata is structured in the JPEG image files. The file extension of a
JPEG Snack file depends on the default codestream. Conventional JPEG decoders may ignore
the JUMBF boxes for JPEG Snack. For example, if the JPEG Snack metadata is embedded in a file conforming to
ISO/IEC 10918-1, denoted by JPEG-1, the extension of the JPEG Snack file is ‘.jpg’, as for conventional
JPEG-1 images, and a conventional JPEG-1 decoder decodes only the default codestream. This feature provides
compatibility with the existing JPEG image coding standards, including future standards based on the
box-based format.
NOTE 1 The default codestream is placed at the end of the file to be compatible with the conventional JPEG image
coding standards. For example, the JPEG-1 decoder can ignore any extra data beyond the EOI (end of image) marker.
NOTE 2 Codestream is a sequence of bits representing a compressed image and associated metadata.
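To illustrate why a conventional JPEG-1 decoder is unaffected by the additional boxes, the sketch below walks the marker segments of a JPEG-1 file and collects the payloads of APP11 segments, the application marker used to carry boxes in JPEG-1 files according to ISO/IEC 18477-3 and ISO/IEC 19566-5. The function name is hypothetical, the scan stops at the first SOS or EOI marker for simplicity, and reassembling boxes that span several segments is not attempted.

```python
import struct

SOI = b"\xff\xd8"   # start of image
APP11 = 0xFFEB      # application segment 11
SOS = 0xFFDA        # start of scan (entropy-coded data follows)
EOI = 0xFFD9        # end of image


def app11_payloads(data: bytes) -> list:
    """Return the raw payload of every APP11 segment before the first scan."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG-1 file: missing SOI marker")
    payloads = []
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker in (SOS, EOI) or (marker >> 8) != 0xFF:
            break
        # The length field counts its own two bytes but not the marker.
        if marker == APP11:
            payloads.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return payloads
```

A decoder that does not understand APP11 simply skips these segments by their length field, which is the behaviour that keeps the default codestream decodable.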
In addition, the content indicated by the object metadata may be carried in different types of JUMBF box
depending on the object type. The object may refer to JUMBF boxes for media data embedded in another file.
The referencing shall be done as defined in ISO/IEC 19566-5:2019, Annex C.

5 Object-structured format
5.1 General
As described in Clause 4, the representation of the JPEG Snack is
composed of a group of media contents. The object in this document is a unit that composes a JPEG Snack
format and contains information to represent the media contents.
Figures 5.1 and 5.2 illustrate the roles of the object-composition and object-structured
formats in composing a JPEG Snack representation. The object-composition format (see Clause 6)
provides composition information to define when and where the constructed objects appear
and disappear in a representation, whereas the object-structured format signals information on the
individual object’s behaviour and the location of the resource. In Figure 4.3, while the object
composer manages instances of the object, the decoding of the individual object is conducted
independently by the media decoder. The object composer informs the compositor of the z-order and
movement information of the object. The compositor then renders the decoded media data
based on the z-order and position information.
NOTE An invisible object, such as an audio clip, does not have z-order and position information. A
description of spatial audio is not included in this document; spatial audio is treated as a typical audio clip.


a) Composition of objects at t0
b) Composition of objects at t1
Key
t0 time when the representation is started
t1 time when the representation is ended
a Origin.
b Representation.

Figure 5.1 — Example of the object-composition format. t0 is the time when the
representation is started and t1 is the time when the representation is ended.
In the example of Figure 5.1, object 2 is above object 1 so that object 1 has an occluded region.
Object 3 also has a region that lies beyond the representation and is therefore hidden. The object composer
shall handle these regions smoothly. For object 4, the duration of existence is shorter than the JPEG Snack's
total duration. See 6.1.2 for more details on the temporal composition of objects.
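The hidden region of an object that extends beyond the representation, as with object 3, can be computed by intersecting the object's rectangle with the rectangle of the representation. The following sketch assumes that the origin is the top-left corner of the representation, that x grows to the right and y grows downwards, and that all names are hypothetical; it is one possible approach, not a procedure defined by this document.

```python
def clip_to_representation(x, y, width, height, rep_width, rep_height):
    """Return the visible part of an object as (x, y, width, height).

    Returns None when the object lies completely outside the
    representation, i.e. when nothing of it would be displayed.
    """
    left = max(x, 0)
    top = max(y, 0)
    right = min(x + width, rep_width)
    bottom = min(y + height, rep_height)
    if right <= left or bottom <= top:
        return None
    return (left, top, right - left, bottom - top)
```

For example, an object 40 units wide placed at x = 80 in a representation 100 units wide keeps only its leftmost 20 units; the remainder is the hidden region.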

a) Objects’ shape at t0
b) Objects’ shape at t1
c) Object’s movement from t0 to t1
Key
a Origin.
b Representation.
c Hidden region.
d Height.
e Width.
f Object’s origin.

Figure 5.2 — Example of the object’s movement. c) exemplifies an object moved by an
instruction set.
In Figure 5.2, object 3 moves to another position as time goes from t0 to t1. The movement
of the object shall be defined by instruction sets as depicted in Figure 5.2 c). Details on the
mechanism of moving objects are described in 6.2.4.

NOTE Objects 1, 2, 3, and 4 in Figure 5.1 correspond to the rectangle, star, circle, and caption in Figure 5.2,
respectively.
5.2 Object definition
5.2.1 General
This subclause defines the object-structured format that specifies the semantics of the objects that
compose a JPEG Snack representation; the syntax is defined in Annex A. Table 5.1
describes the semantics of the object with attributes and elements to define the media content and the
object’s properties within the representation.
Attributes and elements define the shape and behaviour of the media content, which is an object that
makes up the representation, and determine the object type as listed in Table 5.1 (see 5.2.3 and
5.2.4). Attributes contain media content information that can identify an
object, and elements define properties for rendering the object on the representation. Table 5.1
describes the meaning of the parameters constituting an object.
Table 5.1 — Semantics of the object
Attribute name | Description
Id | An identifier for this object, which is a non-zero 8-bit integer. This shall be unique in a JPEG Snack file.
Type | A string in UTF-8 characters. Either ‘static’ or ‘dynamic’ shall be defined.
Number of media | An integer that declares the number of media contents corresponding to this object. In the case of a dynamic object, for example, consecutive image sequences shall be identified in the scope of a single object.
Media type | An identifier for the media contents of the object. When the number of media is greater than 1, the corresponding media contents shall be of the same media type.
Element name | Description
Style | An identifier that provides a unique address where a resource can be found. For more details see ISO/IEC 19566-5. The resource provides an additional style of the object, such as transition and font-family. For more details see the Cascading Style Sheets (CSS) specification [1].
Opacity | A floating-point number in the range 0 to 1 that specifies how transparent the object is. 1 means that this object is fully opaque.
Location | An identifier that provides a unique address where a resource can be found. For more details see ISO/IEC 19566-5.
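The constraints in Table 5.1 can be checked mechanically. The sketch below models an object as a small record and validates the constraints stated in the table: a non-zero 8-bit identifier, a type of either 'static' or 'dynamic', an opacity between 0 and 1, and identifiers that are unique within a file. The class name, field names, and the lower bound of 1 on the number of media are assumptions for illustration; the normative syntax is given in Annex A.

```python
from dataclasses import dataclass


@dataclass
class SnackObject:
    """Hypothetical in-memory form of the semantics in Table 5.1."""
    id: int
    type: str
    number_of_media: int
    media_type: str
    opacity: float = 1.0
    location: str = ""  # address of the media resource (format not modelled)
    style: str = ""     # address of an optional style resource

    def __post_init__(self):
        if not 1 <= self.id <= 255:
            raise ValueError("Id shall be a non-zero 8-bit integer")
        if self.type not in ("static", "dynamic"):
            raise ValueError("Type shall be 'static' or 'dynamic'")
        if self.number_of_media < 1:  # assumed lower bound
            raise ValueError("Number of media is assumed to be at least 1")
        if not 0.0 <= self.opacity <= 1.0:
            raise ValueError("Opacity shall be in the range 0 to 1")


def check_unique_ids(objects):
    """Raise ValueError if two objects in one file share an Id."""
    ids = [o.id for o in objects]
    if len(ids) != len(set(ids)):
        raise ValueError("Id shall be unique in a JPEG Snack file")
```

Validating at construction time means that a malformed object is rejected before it reaches the composition stage.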

5.2.2 Object types and media types
The type of the object shall be determined by the media type of the content. This document supports the
object types listed in Table 5.2, and the content type of the object is designated using media
types, as described in Table 5.2. Definitions of the object types are also provided in the table.
Table 5.2 — Supported media types
Object type | Media type | Description
Image | All media of type image as listed in the IANA Media Type Registry [1] | A still image. See 5.2.3.1.
Caption | text/markdown | A piece of text for additional information. See 5.2.3.2.
Pointer | All media of type image as listed in the IANA Media Type Registry [1] | A graphical indicator used to arouse attention to the region of interest that works as a presentation pointer. See 5.2.3.3.
Image sequence | All media of type image as listed in the IANA Media Type Registry [1] | A set of consecutive still images. See 5.2.4.1.
Video clip | All media of type video as listed in the IANA Media Type Registry [1] | A briefly recorded file used to convey audio-visual information to the audience. See 5.2.4.2.
Audio clip | All media of type audio as listed in the IANA Media Type Registry [1] | A briefly recorded file used to convey audible information to the audience. See 5.2.4.3.

When a JPEG Snack decoder does not support a media type, the corresponding object is ignored, and the
decoder shall indicate that the object is missing in the representation.
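The handling of unsupported media types can be sketched as a simple partition of the objects into those the decoder can render and those it has to report as missing. The function name, the `(object_id, media_type)` pairs, and the `type/*` wildcard notation for a whole top-level media type are assumptions made for this example.

```python
def split_by_support(objects, supported):
    """Partition objects by whether their media type is supported.

    objects: iterable of (object_id, media_type) pairs (hypothetical layout).
    supported: set of media types; "image/*" stands for every image subtype.
    Returns (renderable_ids, missing_ids).
    """
    renderable, missing = [], []
    for object_id, media_type in objects:
        top_level = media_type.split("/", 1)[0]
        if media_type in supported or f"{top_level}/*" in supported:
            renderable.append(object_id)
        else:
            # Ignored in the representation, but reported as missing.
            missing.append(object_id)
    return renderable, missing
```

A viewer built on this sketch would render the first list and surface the second list to the user, for example as a notice that some objects could not be shown.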
JPEG Snack differentiates image and image sequence while both objects use the same media type. Even
t
...

INTERNATIONAL ISO/IEC
STANDARD 19566-8
First edition
Information technologies — JPEG
systems —
Part 8:
JPEG Snack
Technologies de l'information — Systèmes JPEG —
Partie 8: Métadonnées d’enrichissement destinées à faciliter la
consommation des contenus JPEG
PROOF/ÉPREUVE
Reference number
ISO/IEC 19566-8:2022(E)
© ISO/IEC 2022

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Overview
4.1 System description
4.2 System decoder model
4.3 Metadata model
4.4 Object-structured file organization
5 Object-structured format
5.1 General
5.2 Object definition
5.2.1 General
5.2.2 Object types and media types
5.2.3 Static objects
5.2.4 Dynamic objects
6 Object-composition format
6.1 General
6.1.1 Default image
6.1.2 Timeline
6.2 Composing objects
6.2.1 Temporal relationship between the default image and objects
6.2.2 Spatial relationship between the default image and objects
6.2.3 Layering the objects
6.2.4 Moving the objects
Annex A (normative) Boxes for JPEG Snack
Annex B (informative) Container of JPEG Snack
Annex C (informative) Usage examples
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance
are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria
needed for the different types of document should be noted. This document was drafted in
accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see https://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 19566 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
The ISO/IEC 19566 series, on JPEG systems, contributes to the specification of system-level
functionalities.
JPEG Snack is a means to convey relatively simple multimedia experiences which is fundamentally based
on images and the image file format. Many digital storytelling experiences are based on converting
images into video-based technologies, whereas images are directly used in JPEG Snack, along with
playback of other media (video, audio, titles, captions, and effects) coordinated through an explicit
timeline.
INTERNATIONAL STANDARD ISO/IEC 19566-8:2022(E)
Information technologies — JPEG systems —
Part 8:
JPEG Snack
1 Scope
This document defines JPEG Snack metadata that enriches a representation of multiple media contents,
in order to facilitate sharing, editing, and presentation; it further specifies metadata and container
formats for JPEG Snack format.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10918-1, Information technology — Digital compression and coding of continuous-tone still
images: Requirements and guidelines
ISO/IEC 15444-2, Information technology — Part 2: Extensions
ISO/IEC 18477-3, Information technology — Scalable compression and coding of continuous-tone still
images — Part 3: Box file format
ISO/IEC 19566-5, Information technology — Part 5: JPEG Universal Metadata Box Format (JUMBF)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 10918-1 and
ISO/IEC 18477-3 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
snack culture
consumption of image-rich media in a short story format
3.2
media type
indicator of the format and content of the file transmitted through the Internet.
3.3
z-order
ordering of overlapping two-dimensional regions that defines the occlusion precedence amongst them
4 Overview
This document specifies metadata and formats that enable storing, sharing, and rendering snack
culture contents with JPEG image coding standards.
NOTE The snack culture contents are defined as follows:
— image sequence from which one or more frames are generated by manipulating still images;
— image sequence recorded with a short playing duration, e.g. 1.5 s;
— image sequence with transition effects and/or overlay along with subtitles, audio clips, and graphics.
JPEG Snack is a format that defines the representation of multimedia, such as images, image sequences,
text, audio, and video clips, including transition effects, based on the existing JPEG family image coding
standards. It also supports a timing mechanism to synchronize multimedia with a global timeline
in a context. This mechanism allows users to watch multimedia contents like short-form video clips.
However, unlike conventional video formats, it supports storing images without transcoding them
to a dedicated video codec.
In order to define the functionalities of the JPEG Snack format, this document is organized as follows:
— 4.1 describes the overall system of the JPEG Snack format.
— 4.2 describes the system decoder model.
— 4.3 defines an essential model of metadata to compose the JPEG Snack format.
— Clauses 5 and 6 describe the JPEG Snack format in detail.
— Annexes A to C explain how the metadata is serialized and describe the formation of the JPEG Snack
file and its usage examples.
4.1 System description
This document specifies metadata and its behaviour to compose the JPEG Snack content by
synchronizing multimedia on the decoder side. This document primarily defines a metadata model
consisting of two formats:
— Object-structured format: describes the content and additional behaviours of the objects that are
structured in the object-composition description.
— Object-composition format: describes the positional and temporal relationships between objects
and the composition of the objects onto the decoder display.
The hierarchical structure of the JPEG Snack format is depicted in Figure 4.1.
Figure 4.1 — Overview of the JPEG Snack format
The JPEG Snack format provides information that enables JPEG Snack applications to share and render
media contents by accessing the objects in the file or by referencing objects contained in other files.
Not all objects need to be embedded in the same file. Each object constituting a JPEG Snack file is
structured using a box defined in ISO/IEC 19566 and stored into a JPEG image file.
The object-structured format defines the appearance and behaviour of the individual object. This
format includes the size and opacity of the object, movement information in a given timeline of the
representation, and information on the location where the media data, such as an image codestream, is
found (see Clause 5).
The object-composition format identifies the objects that compose the representation and defines each
object’s creation and destruction. This format describes the temporal and spatial relationship between
objects by providing the time and position at which each object appears and the time and position at
which it disappears. Each object has independent position information on the decoder
screen, and the composition information determines the z-order of the objects displayed to the user
(see Clause 6).
4.2 System decoder model
A JPEG Snack decoder implements the metadata model described in 4.1. The decoder has three
necessary conceptual components: default image, timeline, and layer and position, as depicted in
Figure 4.2. The decoder decodes the JPEG image to prepare a default image and composes a JPEG Snack
representation with several objects using this default image as a background. Since the JPEG Snack is
created by defining when, where and how objects are composed, the decoder shall handle timeline,
layer, and position.
Figure 4.2 — Overview of the JPEG Snack decoder
This document defines the formats based on the informative system decoder model of JPEG Snack, as
depicted in Figure 4.3, to allow various JPEG image coding standards to represent JPEG Snack contents
in a concerted way. Figure 4.3 illustrates an example of the JPEG Snack decoder in which the formats
defined in 4.1 may be implemented.
In Figure 4.3, the object composer receives a JPEG codestream that contains metadata and media data
through the JUMBF parser, constructs the JPEG Snack representation, invokes media decoders to decode
the media data from the codestream, and renders the decoded media content to the output devices. The
object composer controls the media decoder and compositor so that the media content is decoded and
displayed at the appropriate time and position. This version of the document allows images, captions, image
sequences, audio clips and video clips to be composed in a representation of JPEG Snack.

Key
a Metadata.
b Media data.
c Media format + time.
d Position + z-order.
e Media output.
Figure 4.3 — Overview of the system decoder model for JPEG Snack
4.3 Metadata model
The system decoder model described in 4.2 is based on the JPEG Snack format depicted in Figure 4.1 to
support the playback of JPEG Snack contents composed of multiple media contents.
The metadata is a hierarchical model, as illustrated in Figure 4.4, containing multiple object metadata
(see Clause 6) aligned with composition metadata corresponding to the object-composition format.
The object metadata, which corresponds to the object-structured format, contains the properties
(see Annex A) used to compose the objects into a representation of the JPEG Snack format, such as
position, time, and transition. Each object may be rendered individually in a logical timeline of the
decoder to support re-editing the object; for example, a user may choose a specific object to hide in
his/her JPEG Snack viewer.
Figure 4.4 — High-level metadata model of JPEG Snack
Object metadata specifies the content and additional behaviour of the individual objects that compose
the representation and identifies where the object’s resource resides. An ID is an identifier of the
object in the representation and a Type attribute allows a decoder to recognize properties of the
object in advance.
Composition metadata coordinates the objects composing a JPEG Snack representation. Within a
composition, the objects are arranged as Objects, each with an identifier attribute together with
position and time.
A Position property determines where the object pointed to by the ObjectID is placed. When objects
overlap according to the Position property, the Time and Persistency properties determine whether an
object is placed in front of or behind the other objects (see 6.2).
JPEG Snack shall have only one composition metadata, consisting of one or more objects, within the
scope of the JPEG Snack file.
The JPEG Snack decoder described in 4.2 composes a timeline (see 6.1.2) for playback of the JPEG
Snack content by combining the Time information of all objects. The objects exist in the
representation individually, each according to its own Position and Time information.
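The combination of per-object Time information into one timeline can be pictured with a minimal sketch. The actual Time property is defined in Clause 6 and is not reproduced here, so the field names `start` and `duration`, the unit of seconds, and the half-open interval convention are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedObject:
    """Hypothetical stand-in for an object's Time information."""
    object_id: int
    start: float      # assumed: seconds from the start of the representation
    duration: float   # assumed: seconds the object stays in the representation


def total_duration(objects):
    """The timeline ends when the last object ends."""
    return max((o.start + o.duration for o in objects), default=0.0)


def active_ids(objects, t):
    """Identifiers of the objects that exist at time t (start inclusive, end exclusive)."""
    return [o.object_id for o in objects if o.start <= t < o.start + o.duration]


objects = [
    TimedObject(object_id=1, start=0.0, duration=10.0),
    TimedObject(object_id=2, start=2.0, duration=3.0),
    TimedObject(object_id=4, start=4.0, duration=2.0),
]
```

In this sketch each object keeps its own interval, and the decoder only queries which objects are present at a given instant, which mirrors the idea that objects exist in the representation individually.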
4.4 Object-structured file organization
An object in the file organization is a JUMBF box. The JPEG Snack files are formed as a series of boxes.
All metadata is contained in boxes, as illustrated in Figure 4.5. JUMBF boxes for JPEG Snack contain
metadata to compose the JPEG Snack representation, and other types of JUMBF box are used to deliver
the media content, such as a codestream or an XML document for each object. The boxes shall be
embedded as defined in Annex A and ISO/IEC 19566-5.
Figure 4.5 — Organization of the JPEG Snack file
The JPEG Snack format provides information to define the metadata for composing the representation
and the format in which the metadata is structured in the JPEG image files. The JPEG Snack file has a
different file extension according to the default codestream. Conventional JPEG decoders may ignore
the JUMBF boxes for JPEG Snack. For example, if the JPEG Snack metadata is embedded in a file
conforming to ISO/IEC 10918-1, denoted by JPEG-1, the extension of the JPEG Snack file is ‘.jpg’, as for
conventional JPEG-1 images, and a conventional JPEG-1 decoder decodes only the default codestream.
This feature provides compatibility with the existing JPEG image coding standards, including future
standards based on the box-based format.
NOTE 1 The default codestream is placed at the end of the file to be compatible with the conventional JPEG
image coding standards. For example, the JPEG-1 decoder can ignore any extra data beyond the EOI (end of
image) marker.
NOTE 2 Codestream is a sequence of bits representing a compressed image and associated metadata.
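The reason a conventional JPEG-1 decoder is not disturbed by embedded metadata can be illustrated with a short sketch: every marker segment ahead of the scan data carries its own length, so a segment the decoder does not understand can simply be stepped over. The sketch assumes that the JUMBF data travels in APP11 marker segments (0xFFEB) as described in ISO/IEC 19566-5. It treats each payload as opaque bytes, does not handle fill bytes or standalone markers, and the helper names `segment` and `app11_payloads` are hypothetical.

```python
import struct

SOI, EOI, SOS, APP11 = 0xFFD8, 0xFFD9, 0xFFDA, 0xFFEB


def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment; the length field counts itself plus the payload."""
    return struct.pack(">HH", marker, len(payload) + 2) + payload


def app11_payloads(data: bytes) -> list:
    """Return the payloads of all APP11 segments found before the scan data."""
    if data[:2] != struct.pack(">H", SOI):
        raise ValueError("missing SOI marker")
    payloads, pos = [], 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker in (SOS, EOI) or marker >> 8 != 0xFF:
            break  # scan data or end of image reached: stop walking segments
        if marker == APP11:
            payloads.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length  # a decoder that ignores a segment skips it the same way
    return payloads


# Synthetic byte stream for illustration only; it is not a decodable image.
stream = (
    struct.pack(">H", SOI)
    + segment(0xFFE0, b"JFIF\x00")            # ordinary APP0 segment
    + segment(APP11, b"opaque-jumbf-bytes")   # placeholder, not a real JUMBF box
    + struct.pack(">HH", SOS, EOI)            # stand-in for the scan and EOI
)
```

A JPEG Snack aware reader would hand the collected payloads to a JUMBF parser, whereas a conventional decoder would only perform the skip on the last line of the loop.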
In addition, the content whose type is indicated by the object metadata may be carried in different
JUMBF boxes depending on the object type. The object may refer to JUMBF boxes for media data embedded
in another file. The referencing shall be done as defined in ISO/IEC 19566-5:2019, Annex C.
5 Object-structured format
5.1 General
As described in Clause 4, in the JPEG Snack format, the representation of the JPEG Snack is composed of
a group of media contents. The object in this document is a unit that composes a JPEG Snack format and
contains information to represent the media contents.
Figures 5.1 and 5.2 illustrate the roles of the object-composition and object-structured formats to
compose JPEG Snack representation. The object-composition format (see Clause 6) provides composition
information to define when and where the objects that are constructed will appear and disappear in
a representation, whereas the object-structured format signals information on the individual object’s
behaviour and location of the resource. In Figure 4.3, while the object composer manages instances of
the object, the decoding of the individual object is conducted independently by the media decoder. The
object composer informs the compositor of the z-order and movement information of the object. Then the
compositor renders the decoded media data based on the z-order and position information.
NOTE An invisible object, such as an audio clip, does not have z-order and position information. A
description of spatial audio is not included in this document; spatial audio is treated as a typical
audio clip.
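How a compositor might turn z-order information into a drawing order can be sketched as follows. This is only an illustration: the class name `DecodedObject`, the use of `None` for objects without z-order, and the convention that a larger z-order value is nearer to the viewer are assumptions, not definitions from this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DecodedObject:
    """Hypothetical record handed from the object composer to the compositor."""
    object_id: int
    z_order: Optional[int]  # None for invisible objects such as audio clips


def paint_order(objects: List[DecodedObject]) -> List[int]:
    """Return identifiers of visible objects, back to front (assumed: larger z is nearer)."""
    visible = [o for o in objects if o.z_order is not None]
    return [o.object_id for o in sorted(visible, key=lambda o: o.z_order)]


scene = [
    DecodedObject(object_id=2, z_order=1),
    DecodedObject(object_id=1, z_order=0),
    DecodedObject(object_id=5, z_order=None),  # audio clip: nothing to draw
    DecodedObject(object_id=3, z_order=2),
]
```

Drawing in the returned order lets later objects cover earlier ones, and objects without z-order are passed over by the compositor while still being played by their media decoder.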
a) Composition of objects at t0    b) Composition of objects at t1
Key
t0  time when the representation is started
t1  time when the representation is ended
a  Origin.
b  Representation.
Figure 5.1 — Example of the object-composition format. The t0 is a time when the
representation is started and the t1 is when the representation is ended.
In the example of Figure 5.1, object 2 is above object 1 so that object 1 has an occluded region. Also,
object 3 has a hidden region that extends beyond the representation. The object composer shall handle
these regions smoothly. For object 4, the duration of existence is shorter than the total duration of
the JPEG Snack. See 6.1.2 for more details on the temporal composition of objects.
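The region of an object that falls outside the representation, as with object 3, can be found with a plain rectangle intersection. The following is a sketch under assumptions: objects and the representation are modelled as axis-aligned rectangles in pixel units with the origin at the top-left, and the names `Rect` and `intersect` are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle; x and y locate its top-left corner (assumed convention)."""
    x: int
    y: int
    width: int
    height: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping part of two rectangles, or None if they do not overlap."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right - left, bottom - top)


representation = Rect(0, 0, 640, 480)
object_3 = Rect(600, 400, 100, 100)        # partly outside the representation
visible_part = intersect(object_3, representation)
```

The same helper applied to two objects gives the area where the upper object occludes the lower one, so both cases described above reduce to one operation in this model.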
a) Objects’ shape at t0    b) Objects’ shape at t1
c) Object’s movement from t0 to t1
a  Origin.
b  Representation.
c  Hidden region.
d  Height.
e  Width.
f  Object’s origin.
Figure 5.2 — Example of the object’s movement. c) exemplifies a moving object by instruction
set.
In Figure 5.2, object 3 moves to another position as time goes from t0 to t1. The movement of the object
shall be defined by instruction sets as depicted in Figure 5.2 c). Details on the mechanism of moving
objects are described in 6.2.4.
NOTE Objects 1, 2, 3, and 4 in Figure 5.1 correspond to the rectangle, star, circle, and caption in
Figure 5.2, respectively.
5.2 Object definition
5.2.1 General
This subclause defines the object-structured format that specifies the semantics of the objects that
compose a JPEG Snack representation, and the syntax is defined in Annex A. Table 5.1 describes the
semantics of the object with attributes and elements to define the media content and the object’s
properties within the representation.
Attributes and elements define the shape and behaviour of the media content, which is an object
that makes up the representation, and determine the object type as listed in Table 5.1 (see 5.2.3 and
5.2.4). Attributes contain media content information that can identify an object, and elements define
properties for rendering the object on the representation. Table 5.1 describes the meaning of the
parameters constituting an object.
Table 5.1 — Semantics of the object
Attribute name    Description
Id                An identifier for this object, which is a non-zero 8-bit integer. It shall be unique
                  in a JPEG Snack file.
Type              A string in UTF-8 characters. Either ‘static’ or ‘dynamic’ shall be defined.
Number of media   An integer that declares the number of media contents corresponding to this object.
                  In the case of a dynamic object, for example, consecutive image sequences shall be
                  identified in the scope of a single object.
Media type        An identifier for the media contents of the object. When the number of media is
                  greater than 1, the corresponding media contents shall be of the same media type.

Element name      Description
Style             An identifier that provides a unique address where a resource can be found. For more
                  details see ISO/IEC 19566-5.
                  The resource provides an additional style of the object, such as transition and
                  font-family. For more details see the Cascading Style Sheets (CSS) specification [1].
Opacity           A floating-point number in the range 0 to 1 that specifies the transparency of the
                  object. 1 means that this object is fully opaque.
Location          An identifier that provides a unique address where a resource can be found.
                  For more details see ISO/IEC 19566-5.
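The constraints stated for the object parameters can be collected into a small checking routine. The sketch below is illustrative only: the class and field names are hypothetical, the normative syntax is the one in Annex A, the lower bound of 1 on the number of media is an assumption, and the rule that a static object has exactly one media content is taken from 5.2.3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnackObject:
    """Illustrative container for the Table 5.1 parameters (names are hypothetical)."""
    id: int
    type: str
    number_of_media: int
    media_type: str
    location: str
    style: Optional[str] = None   # optional element
    opacity: float = 1.0          # assumed default: fully opaque when absent

    def problems(self) -> List[str]:
        """Return a list of violated constraints; an empty list means none were found."""
        found = []
        if not 1 <= self.id <= 255:
            found.append("Id shall be a non-zero 8-bit integer")
        if self.type not in ("static", "dynamic"):
            found.append("Type shall be 'static' or 'dynamic'")
        if self.number_of_media < 1:
            found.append("Number of media is assumed to be at least 1")
        if self.type == "static" and self.number_of_media != 1:
            found.append("a static object shall have exactly one media content")
        if not 0.0 <= self.opacity <= 1.0:
            found.append("Opacity shall lie in the range 0 to 1")
        return found


def duplicate_ids(objects: List[SnackObject]) -> List[int]:
    """Identifiers used more than once; Id shall be unique in a JPEG Snack file."""
    seen, dupes = set(), []
    for obj in objects:
        if obj.id in seen and obj.id not in dupes:
            dupes.append(obj.id)
        seen.add(obj.id)
    return dupes
```

Because a single media type field is shared by all media contents of an object, the requirement that they have the same media type holds by construction in this model.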
5.2.2 Object types and media types
The type of the object shall be determined by the media type of the content. This document covers the
object types listed in Table 5.2, and the content type of the object is designated using media types, as
described in Table 5.2. Definitions of the object type are also provided in the table.
Table 5.2 — Supported media types
Object type      Media type                                   Description
Image            All media of type image as listed in the     A still image.
                 IANA Media Type Registry [1]                 See 5.2.3.1.
Caption          text/markdown                                A piece of text for additional information.
                                                              See 5.2.3.2.
Pointer          All media of type image as listed in the     A graphical indicator used to arouse attention to the
                 IANA Media Type Registry [1]                 region of interest that works as a presentation pointer.
                                                              See 5.2.3.3.
Image sequence   All media of type image as listed in the     A set of consecutive still images.
                 IANA Media Type Registry [1]                 See 5.2.4.1.
Video clip       All media of type video as listed in the     A briefly recorded file used to convey audio-visual
                 IANA Media Type Registry [1]                 information to the audience.
                                                              See 5.2.4.2.
Audio clip       All media of type audio as listed in the     A briefly recorded file used to convey audible
                 IANA Media Type Registry [1]                 information to the audience.
                                                              See 5.2.4.3.
When a JPEG Snack decoder does not support a media type, the corresponding object is ignored, and the
decoder shall indicate that the object is missing in the representation.
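The handling of unsupported media types can be sketched as a partition of the objects into those that are rendered and those that are reported as missing. This document does not say how the decoder reports a missing object, so returning a list of identifiers is an assumption, as are the function name `split_supported` and the example set of decodable media types.

```python
from typing import Callable, List, Tuple


def split_supported(
    objects: List[Tuple[int, str]],
    can_decode: Callable[[str], bool],
) -> Tuple[List[int], List[int]]:
    """Partition (object id, media type) pairs into playable and missing identifiers."""
    playable, missing = [], []
    for object_id, media_type in objects:
        if can_decode(media_type):
            playable.append(object_id)
        else:
            missing.append(object_id)   # ignored in the representation, but reported
    return playable, missing


# Hypothetical decoder that only handles JPEG-1 images and markdown captions.
DECODABLE = {"image/jpeg", "text/markdown"}

playable, missing = split_supported(
    [(1, "image/jpeg"), (2, "text/markdown"), (3, "video/mp4")],
    lambda media_type: media_type in DECODABLE,
)
```

Keeping the missing identifiers separate, rather than silently dropping them, leaves the viewer free to decide how to tell the user that part of the representation is absent.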
JPEG Snack differentiates image and image sequence although both objects use the same media type. Even
though several images are contained in a single JPEG Snack representation, those images are only
weakly correlated in the context of the representation, which means that occlusion or exclusion of
some of the images does not harm the representation. However, images in a sequence are highly
correlated to create a valid context of the representation. For example, when one of the frames in
an animation is skipped, the animation may look strange.
In this document, objects are categorized into static and dynamic objects based on whether the object’s
contents change over time during the representation. Details on the objects are defined in 5.2.3
and 5.2.4.
5.2.3 Static objects
This version of the document defines image, caption, and pointer as static objects. The value of the type
attribute shall be 'static' and the number of media shall be 1, while other attributes and elements vary
according to the content of the object, as described in Table 5.3. Style and opacity elements are optional. If those
elements are absent, the object has a fixed position and is fully opaque in the re
...
