

Title:
A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND VIDEO DECODING
Document Type and Number:
WIPO Patent Application WO/2024/084128
Kind Code:
A1
Abstract:
The embodiments relate to an encoding method, comprising: receiving (2205) a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generating (2210) a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; setting (2215) a target subdivision iteration count to reach the target output resolution; running (2220) a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assigning (2225) an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; forming (2230) clusters of sets of connected faces sharing the same subdivision method identification; generating (2235) residual samples by applying a wavelet transform to each displacement vector; encoding (2240) the residual samples in a displacement video bitstream; and encoding (2245) the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams. The embodiments also relate to a method for decoding, and technical equipment for implementing the methods.

Inventors:
RONDAO ALFACE PATRICE (BE)
MARTEMIANOV ALEKSEI (FI)
KONDRAD LUKASZ (DE)
ILOLA LAURI ALEKSI (FI)
SCHWARZ SEBASTIAN (DE)
Application Number:
PCT/FI2023/050555
Publication Date:
April 25, 2024
Filing Date:
September 29, 2023
Assignee:
NOKIA TECH OY (FI)
International Classes:
H04N19/597; G06T9/00; G06T17/20; H04N13/271; H04N19/33; H04N19/85
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (FI)
Claims:
1. An apparatus for encoding comprising means for receiving a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; means for generating a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; means for setting a target subdivision iteration count to reach the target output resolution; means for running a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; means for assigning an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; means for forming clusters of sets of connected faces sharing the same subdivision method identification; means for generating residual samples by applying a wavelet transform to each displacement vector; means for encoding the residual samples in a displacement video bitstream; and means for encoding the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.

2. The apparatus according to claim 1, wherein a set of subdivision methods comprises a midpoint subdivision method comprising prediction based on two neighboring vertices.

3. The apparatus according to claim 1, wherein a set of subdivision methods comprises a loop subdivision method, comprising repositioning vertices and boundary edges of a previous level-of-detail (LOD) and performing a loop prediction of an interior vertex based on four neighboring vertices.

4. The apparatus according to claim 1, wherein a set of subdivision methods comprises a butterfly subdivision method, comprising performing a prediction of a vertex based on eight neighboring vertices from previous level-of-detail (LOD).

5. The apparatus according to any of the claims 1 to 4, further comprising means for running several wavelet transforms and weights for each vertex of each face of each LOD of a deformed mesh, the deformed mesh having been obtained by adding the set of displacement vectors to subdivided base mesh.

6. The apparatus according to claim 5, further comprising means for assigning to a vertex an identification of a wavelet transform that results in a smallest residual.

7. The apparatus according to claim 6, further comprising means for mapping the wavelet transform of the vertex to the corresponding LOD of the deformed mesh.

8. The apparatus according to claim 7, further comprising means for assigning to each base mesh an identification describing the wavelet transform to be used for each LOD generated face.

9. The apparatus according to claim 8, further comprising creating patches of faces sharing the same subdivision method identification and sharing the same identification describing the wavelet transform.

10. The apparatus according to claim 9, further comprising grouping faces not being comprised in the created patches into one or more other patches.

11. An apparatus for decoding, comprising means for receiving one or more bitstreams; means for decoding base mesh and information on face clusters from said one or more bitstreams; means for decoding subdivision method identifications from said one or more bitstreams; means for decoding information on wavelet transform from said one or more bitstreams; means for decoding a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; means for generating a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; means for reconstructing each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and means for reconstructing output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.

12. A method for encoding, comprising: receiving a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generating a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; setting a target subdivision iteration count to reach the target output resolution; running a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assigning an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; forming clusters of sets of connected faces sharing the same subdivision method identification; generating residual samples by applying a wavelet transform to each displacement vector; encoding the residual samples in a displacement video bitstream; and encoding the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.
13. A method for decoding, comprising receiving one or more bitstreams; decoding base mesh and information on face clusters from said one or more bitstreams; decoding subdivision method identifications from said one or more bitstreams; decoding information on wavelet transform from said one or more bitstreams; decoding a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generating a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstructing each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and reconstructing output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.

14. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generate a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; set a target subdivision iteration count to reach the target output resolution; run a set of subdivision methods for each face of the base mesh, and determine a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assign an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; form clusters of sets of connected faces sharing the same subdivision method identification; generate residual samples by applying a wavelet transform to each displacement vector; encode the residual samples in a displacement video bitstream; and encode the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.

15. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive one or more bitstreams; decode base mesh and information on face clusters from said one or more bitstreams; decode subdivision method identifications from said one or more bitstreams; decode information on wavelet transform from said one or more bitstreams; decode a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generate a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstruct each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and reconstruct output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.
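The per-face selection and clustering steps recited in claims 1 and 12 can be pictured with a short sketch. The following Python fragment is only an illustration of the idea, not the claimed implementation: the error measure, the table of subdivision methods and all names (choose_subdivision_ids, cluster_faces, adjacency) are hypothetical.

```python
# Illustrative sketch only: pick, per base-mesh face, the subdivision method id
# with the smallest error, then group connected faces sharing the same id.
# 'methods' maps a method id to a callable; 'error_fn' scores one face under one
# method; 'adjacency' maps a face to its edge-neighbouring faces.
def choose_subdivision_ids(faces, methods, error_fn):
    ids = {}
    for face in faces:
        errors = {mid: error_fn(face, method) for mid, method in methods.items()}
        ids[face] = min(errors, key=errors.get)   # id giving the defined quality
    return ids

def cluster_faces(faces, adjacency, ids):
    clusters, seen = [], set()
    for face in faces:
        if face in seen:
            continue
        stack, cluster = [face], []
        seen.add(face)
        while stack:
            cur = stack.pop()
            cluster.append(cur)
            for nb in adjacency[cur]:
                if nb not in seen and ids[nb] == ids[cur]:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append((ids[face], cluster))      # (subdivision method id, faces)
    return clusters
```

A greedy flood fill of this kind yields clusters of connected faces sharing the same subdivision method identification; an encoder could equally derive such clusters in other ways.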

Description:
A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND VIDEO DECODING

Technical Field

The present solution generally relates to encoding and decoding of volumetric video.

Background

Volumetric video data represents a three-dimensional (3D) scene or object, and can be used as input for AR (Augmented Reality), VR (Virtual Reality), and MR (Mixed Reality) applications. Such data describes geometry (shape, size, position in 3D space) and respective attributes (e.g., color, opacity, reflectance, …), and any possible temporal transformations of the geometry and attributes at given time instances (like frames in two-dimensional (2D) video). Volumetric video can be generated from 3D models, also referred to as volumetric visual objects, i.e., CGI (Computer Generated Imagery), or captured from real-world scenes using a variety of capture solutions, e.g., multi-camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible. Examples of representation formats for volumetric data comprise triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e., “frames” in 2D video, or by other means, e.g., the position of an object as a function of time. Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR or MR applications, especially for providing 6DOF viewing capabilities.

Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight, and structured light are examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds, on the other hand, are well suited for applications such as capturing real-world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is to code this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.

Summary

The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention. Various aspects include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments are disclosed in the dependent claims.
According to a first aspect, there is provided an apparatus for encoding comprising means for receiving a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; means for generating a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; means for setting a target subdivision iteration count to reach the target output resolution; means for running a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; means for assigning an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; means for forming clusters of sets of connected faces sharing the same subdivision method identification; means for generating residual samples by applying a wavelet transform to each displacement vector; means for encoding the residual samples in a displacement video bitstream; and means for encoding the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams. According to a second aspect, there is provided an apparatus for decoding comprising means for receiving one or more bitstreams; means for decoding base mesh and information on face clusters from said one or more bitstreams; means for decoding subdivision method identifications from said one or more bitstreams; means for decoding information on wavelet transform from said one or more bitstreams; means for decoding a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; means for generating a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; means for reconstructing each level- of-detail base mesh according to the decoded base mesh, subdivision method identifications; and means for reconstructing output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters. 
According to a third aspect, there is provided a method for encoding, comprising: receiving a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generating a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; setting a target subdivision iteration count to reach the target output resolution; running a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assigning an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; forming clusters of sets of connected faces sharing the same subdivision method identification; generating residual samples by applying a wavelet transform to each displacement vector; encoding the residual samples in a displacement video bitstream; and encoding the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.

According to a fourth aspect, there is provided a method for decoding, comprising: receiving one or more bitstreams; decoding base mesh and information on face clusters from said one or more bitstreams; decoding subdivision method identifications from said one or more bitstreams; decoding information on wavelet transform from said one or more bitstreams; decoding a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generating a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstructing each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and reconstructing output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.
According to a fifth aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generate a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; set a target subdivision iteration count to reach the target output resolution; run a set of subdivision methods for each face of the base mesh, and determine a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assign an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; form clusters of sets of connected faces sharing the same subdivision method identification; generate residual samples by applying a wavelet transform to each displacement vector; encode the residual samples in a displacement video bitstream; and encode the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.

According to a sixth aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive one or more bitstreams; decode base mesh and information on face clusters from said one or more bitstreams; decode subdivision method identifications from said one or more bitstreams; decode information on wavelet transform from said one or more bitstreams; decode a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generate a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstruct each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and reconstruct output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.
According to a seventh aspect, there is provided a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generate a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; set a target subdivision iteration count to reach the target output resolution; run a set of subdivision methods for each face of the base mesh, and determine a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assign an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; form clusters of sets of connected faces sharing the same subdivision method identification; generate residual samples by applying a wavelet transform to each displacement vector; encode the residual samples in a displacement video bitstream; and encode the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.

According to an eighth aspect, there is provided a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive one or more bitstreams; decode base mesh and information on face clusters from said one or more bitstreams; decode subdivision method identifications from said one or more bitstreams; decode information on wavelet transform from said one or more bitstreams; decode a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generate a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstruct each level-of-detail base mesh according to the decoded base mesh, subdivision method identifications; and reconstruct output mesh from the reconstructed level of detail base mesh, the set of displacement vectors, and information on face clusters.

According to an embodiment, a set of subdivision methods comprises a midpoint subdivision method comprising prediction based on two neighboring vertices. According to an embodiment, a set of subdivision methods comprises a loop subdivision method, comprising repositioning vertices and boundary edges of a previous level-of-detail (LOD) and performing a loop prediction of an interior vertex based on four neighboring vertices. According to an embodiment, a set of subdivision methods comprises a butterfly subdivision method, comprising performing a prediction of a vertex based on eight neighboring vertices from previous level-of-detail (LOD). According to an embodiment, several wavelet transforms and weights are run for each vertex of each face of each LOD of a deformed mesh, the deformed mesh having been obtained by adding the set of displacement vectors to the subdivided base mesh. According to an embodiment, a vertex is assigned an identification of a wavelet transform that results in a smallest residual. According to an embodiment, the wavelet transform of the vertex is mapped to the corresponding LOD of the deformed mesh.
According to an embodiment, each base mesh is assigned an identification describing the wavelet transform to be used for each LOD generated face. According to an embodiment, patches of faces sharing the same subdivision method identification and sharing the same identification describing the wavelet transform are created. According to an embodiment, faces not being comprised in the created patches are grouped into one or more other patches. According to an embodiment, the computer program product is embodied on a non-transitory computer readable medium.

Description of the Drawings

In the following, various embodiments will be described in more detail with reference to the appended drawings, in which

Fig.1a shows an example of a volumetric media conversion at an encoder;
Fig.1b shows an example of a volumetric media reconstruction at a decoder;
Fig.2 shows an example of block to patch mapping;
Fig.3a shows an example of an atlas coordinate system;
Fig.3b shows an example of a local 3D patch coordinate system;
Fig.3c shows an example of a final target 3D coordinate system;
Fig.4 shows a simplified example of a subdivision step of a triangle into four triangles;
Fig.5 shows an example of a multiresolution analysis of a mesh;
Fig.6 shows an example of an encoder comprising a pre-processing module for generating a mesh;
Fig.7 shows an example of pre-processing steps at an encoder;
Fig.8 shows an example of an intra frame encoder;
Fig.9 shows an example of an inter frame encoder;
Fig.10 shows an example of a decoder comprising a post-processing module for reconstructing a dynamic mesh sequence;
Fig.11 shows an example of a decoding process in intra mode;
Fig.12 shows an example of a decoding process in inter mode;
Fig.13 shows an example of a base-mesh encoder;
Fig.14 shows another example of a base-mesh encoder;
Fig.15 shows an example of a base-mesh decoder;
Fig.16 shows an example of segmentation of a mesh into sub-meshes;
Fig.17 shows an example with two submeshes;
Fig.18 shows an example of displacement values distribution per LOD;
Fig.19 shows an example of choice of subdivision methods and error per vertex;
Fig.20 shows an example of results of applying different subdivision methods;
Fig.21 shows examples of subdivision methods;
Fig.22 is a flowchart illustrating a method for encoding according to an embodiment;
Fig.23 is a flowchart illustrating a method for decoding according to another embodiment; and
Fig.24 shows an example of an apparatus.

Embodiments

The following description and drawings are illustrative and are not to be construed as unnecessarily limiting. The specific details are provided for a thorough understanding of the disclosure. However, in certain instances, well-known or conventional details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure can be, but are not necessarily, references to the same embodiment, and such references mean at least one of the embodiments. Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. Volumetric video data represents a three-dimensional scene or object and can be used as input for AR, VR and MR applications.
Such data describes geometry (shape, size, position in 3D space) and respective attributes (e.g., color, opacity, reflectance, …), plus any possible temporal transformations of the geometry and attributes at given time instances (like frames in 2D video). Volumetric video is either generated from 3D models, i.e., CGI, or captured from real-world scenes using a variety of capture solutions, e.g., multi-camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible. Representation formats for such volumetric data are triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e., “frames” in 2D video, or by other means, e.g., the position of an object as a function of time. Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR or MR applications, especially for providing 6DOF viewing capabilities.

Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight, and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds, on the other hand, are well suited for applications such as capturing real-world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is to code this 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.

In the following, a short reference to ISO/IEC DIS 23090-5 Visual Volumetric Video-based Coding (V3C) and Video-based Point Cloud Compression (V-PCC) 2nd Edition is given. Visual volumetric video, comprising a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.

V3C enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved first through a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C video components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g., texture or material information, of such 3D data. An example is shown in Figures 1a and 1b, where Figure 1a presents volumetric media conversion at an encoder, and where Figure 1b presents volumetric media reconstruction at a decoder side.
The 3D media is converted to a series of 2D representations: occupancy 101, geometry 102, and attributes 103. Additional information may also be included in the bitstream to enable inverse reconstruction. Additional information that allows associating all these V3C video components, and enables the inverse reconstruction from a 2D representation back to a 3D representation is also included in a special component, referred to in this document as the atlas 104. An atlas 104 consists of multiple elements, named as patches. Each patch identifies a region in all available 2D components and contains information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding volume associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information. Atlases may be partitioned into patch packing blocks of equal size. The 2D bounding volumes of patches and their coding order determine the mapping between the blocks of the atlas image and the patch indices. Figure 2 shows an example of block to patch mapping with 4 projected patches onto an atlas when asps_patch_precedence_order_flag is equal to 0. Projected points are represented with dark grey. The area that does not contain any projected points is represented with light grey. Patch packing blocks are represented with dashed lines. The number inside each patch packing block represents the patch index of the patch to which it is mapped. Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps. Figure 3a shows an example of a single patch 320 packed onto an atlas image 310. This patch 320 is then converted to a local 3D patch coordinate system (U, V, D) defined by the projection plane with origin O’, tangent (U), bi-tangent (V), and normal (D) axes. For an orthographic projection, the projection plane is equal to the sides of an axis-aligned 3D bounding volume 330, as shown in Figure 3b. The location of the bounding volume 330 in the 3D model coordinate system, defined by a left- handed system with axes (X, Y, Z), can be obtained by adding offsets TilePatch3dOffsetU, TilePatch3DOffsetV, and TilePatch3DOffsetD, as illustrated in Figure 3c. Coded V3C video components are referred to in this disclosure as video bitstreams, while a coded atlas is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to here as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream. V3C patch information is contained in atlas bitstream, atlas_sub_bitstream(), which contains a sequence of NAL units. NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. 
The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit. NAL units in the atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former are dedicated to carrying patch data, while the latter carry data necessary to properly parse the ACL units or any additional auxiliary data.

In the nal_unit_header() syntax, nal_unit_type specifies the type of the RBSP (Raw Byte Sequence Payload) data structure contained in the NAL unit as specified in Table 4 of ISO/IEC 23090-5. nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies. The value of nal_layer_id shall be in the range of 0 to 62, inclusive. The value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.

While designing the V3C specification, it was envisaged that amendments or new editions could be created in the future. In order to ensure that the first implementations of V3C decoders are compatible with any future extension, a number of fields for future extensions to parameter sets were reserved. For example, the second edition of V3C introduced an extension in the VPS related to MIV and the packed video component.

A polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modelling. The faces may consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes. Objects created with polygon meshes are represented by different types of elements. These include vertices, edges, faces, polygons, and surfaces. In many applications, only vertices, edges and either faces or polygons are stored. Polygon meshes are defined by the following elements:

- Vertex: a position in 3D space defined as (x, y, z), along with other information such as color (r, g, b), normal vector and texture coordinates.
- Edge: a connection between two vertices.
- Face: a closed set of edges, in which a triangle face has three edges and a quad face has four edges. A polygon is a coplanar set of faces. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically, a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.
- Surfaces: or smoothing groups, are useful, but not required, to group smooth regions.
- Groups: some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.
- Materials: defined to allow different portions of the mesh to use different shaders when rendered.
- UV coordinates: most mesh formats also support some form of UV coordinates, which are a separate 2D representation of the mesh "unfolded" to show what portion of a 2-dimensional texture map applies to different polygons of the mesh.
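To make the elements listed above concrete, a minimal triangle-mesh container might look like the following. This is purely an illustrative sketch and not a structure defined by V3C or V-DMC.

```python
# Minimal triangle-mesh container illustrating the elements listed above:
# vertex positions, optional per-vertex attributes (e.g. UV coordinates),
# and faces given as triples of vertex indices. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class TriangleMesh:
    positions: list = field(default_factory=list)   # [(x, y, z), ...]
    uvs: list = field(default_factory=list)         # [(u, v), ...] per vertex
    faces: list = field(default_factory=list)       # [(i0, i1, i2), ...]

    def edges(self):
        """Derive the undirected edge set from the faces."""
        e = set()
        for a, b, c in self.faces:
            e |= {tuple(sorted(p)) for p in ((a, b), (b, c), (c, a))}
        return e

mesh = TriangleMesh(positions=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
                    uvs=[(0, 0), (1, 0), (0, 1)],
                    faces=[(0, 1, 2)])
print(mesh.edges())  # {(0, 1), (0, 2), (1, 2)}
```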
It is also possible for meshes to contain other vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).

Edgebreaker is an algorithm for efficient compression of 3D meshes. Edgebreaker encodes the connectivity of the face (e.g., triangle) meshes. Because of the performance and simplicity of edgebreaker, it has been adopted in popular compression libraries. As an example, edgebreaker is at the core of the Google Draco compression library. Google Draco is an open-source library for compressing and decompressing 3D geometric meshes and point clouds. It is intended to improve the storage and transmission of 3D graphics. Mesh data may be compressed directly without projecting it into 2D planes, as in V-PCC based mesh coding. In fact, the anchor for the V-PCC mesh compression call for proposals (CfP) utilizes the Draco technology for compressing mesh data excluding textures. As said, Draco is a technique used to compress vertex positions in 3D, connectivity data (faces) as well as UV coordinates. Additional per-vertex attributes may also be compressed using Draco. The actual UV texture may be compressed using traditional video compression technologies, such as H.265 or H.264.

Draco uses the edgebreaker algorithm at its core to compress 3D mesh information. It offers a good balance between simplicity and efficiency, and is part of the Khronos endorsed extensions for the glTF specification. The main idea of the algorithm is to traverse mesh triangles in a deterministic way so that each new triangle is encoded next to an already encoded triangle. This enables prediction of vertex-specific information from the previously encoded data by simply adding a delta to the previous data. Edgebreaker utilizes symbols to signal how each new triangle is connected to the previously encoded part of the mesh. Connecting triangles in such a way results on average in 1 to 2 bits per triangle when combined with existing binary encoding techniques.

The V-DMC standardization work started after the completion of the call for proposals (CfP) issued by MPEG 3DG (ISO/IEC SC29 WG 2) on the integration of mesh compression into the V3C family of standards (ISO/IEC 23090-5). The retained technology after the CfP result analysis is based on multiresolution mesh analysis and coding. This approach comprises:

- generating a base mesh m_i that is a simplified (low resolution) mesh approximation of the original mesh (this is done for all frames of the dynamic mesh sequence);
- performing several iterative mesh subdivision steps (e.g., each triangle, or a face of another format, is converted into four triangles by connecting the triangle edge midpoints, as illustrated in Figure 4) on the generated base mesh, generating other approximation meshes m_i^n, where n stands for the number of iterations, with m_i^0 = m_i;
- defining displacement vectors, also named error vectors, for each vertex of each mesh approximation m_i^n with n > 0, noted d_i^n;
- for each subdivision level, the deformed mesh is the best approximation of the original mesh at that resolution, given the base mesh and prior subdivision levels; the deformed mesh may be obtained as m_i^n + d_i^n, i.e., by adding the displacement vectors to the subdivided mesh vertices;
- the displacement vectors may undergo a lazy wavelet transform prior to compression;
- the attribute map of the original mesh is transferred to the deformed mesh at the highest resolution (i.e., subdivision level) such that texture coordinates are obtained for the deformed mesh and a new attribute map is generated.

The scheme is illustrated in Figure 5. The encoding process can be separated into two main modules, as shown in Figure 6. Figure 6 illustrates an example of an encoder comprising a pre-processing module 610 and an actual encoder module 620. The pre-processing module 610 is configured to generate a base mesh and displacements based on its input, i.e., a static or dynamic mesh and an attribute map. The encoder 620 is configured to encode the output from the pre-processing module 610 into a bitstream. Figure 7 illustrates the steps of the pre-processing module in more detail. Those steps comprise: decimation (reducing the original mesh resolution to produce a base mesh) 710, UV-atlas isocharting (creating a parameterization of the base mesh) 720 and subdivision surface fitting 730.

Examples of an encoder are illustrated in Figure 8 and Figure 9, where Figure 8 illustrates an INTRA frame encoder and Figure 9 illustrates an INTER frame encoder. In the INTER frame encoder of Figure 9, the base mesh connectivity of the first frame of a group of frames is imposed on the subsequent frames' base meshes to improve compression performance. In the INTRA frame encoding of Figure 8, inputs to this module are the base mesh (that is an approximation of the input mesh but that contains fewer faces and vertices) 802, the patch information 801 related to the input base mesh, the displacements 803, the static/dynamic input mesh frame 804 and the attribute map 805. The output of this module is a compressed bitstream 895 that contains a V3C extended signalling sub-bitstream including patch data information, a compressed base mesh sub-bitstream, a compressed displacement video component sub-bitstream and a compressed attribute video component sub-bitstream.

The module takes the input base mesh and first quantizes its data in the Quantization module, which can be dynamically tuned by a Control Module. The quantized base mesh is then encoded with the static mesh encoder module, which outputs a compressed base mesh sub-bitstream that is multiplexed into the output bitstream. The encoded base mesh is decoded in the Static Mesh Decoder module that generates a reconstructed quantized base mesh. The Update Displacements module takes as input the reconstructed quantized base mesh, the pristine base mesh and the input displacements to generate new updated displacements that are remapped to the reconstructed base mesh data in order to avoid precision errors due to the static mesh encoding and decoding process. The updated displacements are filtered with a wavelet transform in the Wavelet Transform module (that also takes as input the reconstructed base mesh) and then quantized in the Quantization module. The quantized wavelet coefficients produced from the updated displacements are then packed into a video component in the Image Packing module. This video component is then encoded with a 2D video encoder such as HEVC, VVC, etc., in the Video Encoder module, and the output compressed displacement video component sub-bitstream is multiplexed along with the V3C signalling information sub-bitstream into the output compressed bitstream.
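As a brief aside before the rest of the encoding and reconstruction flow, the multiresolution scheme summarized above (approximation meshes m_i^n with m_i^0 = m_i, and deformed meshes m_i^n + d_i^n) can be sketched in a few lines. This is a simplification under stated assumptions: a pure triangle mesh, no attribute handling, and no wavelet transform or quantization of the displacements; the function names are illustrative only and do not come from the reference software.

```python
# Sketch of one midpoint subdivision step (each triangle becomes four triangles)
# and of applying displacement vectors to obtain the deformed mesh m_i^n + d_i^n.
def midpoint_subdivide(positions, faces):
    positions = list(positions)
    midpoint_cache = {}                     # edge (i, j) -> midpoint vertex index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            pa, pb = positions[i], positions[j]
            positions.append(tuple((a + b) / 2.0 for a, b in zip(pa, pb)))
            midpoint_cache[key] = len(positions) - 1
        return midpoint_cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return positions, new_faces

def apply_displacements(positions, displacements):
    # Deformed mesh vertices: subdivided positions plus per-vertex displacements.
    return [tuple(p + d for p, d in zip(pos, disp))
            for pos, disp in zip(positions, displacements)]
```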
Then the compressed displacement video component is first decoded and reconstructed and then unpacked into encoded and quantized wavelet coefficients in the Image Unpacking module. These wavelet coefficients are then unquantized in the inverse quantization module and reconstructed with the inverse wavelet transform module that generates reconstructed displacements. The reconstructed base mesh is unquantized in the inverse quantization module and the unquantized base mesh is combined with the reconstructed displacements in the Reconstruct Deformed Mesh module to obtain the reconstructed deformed mesh. This reconstructed deformed mesh is then fed into the Attribute Transfer module together with the Attribute map produced by the pre-processing and the input static/dynamic mesh frame. The output of the Attribute Transfer module is an updated attribute map that now corresponds to the reconstructed deformed mesh frame. The updated attribute map is then padded, undergoes color conversion and is encoded as a video component with a 2D video codec such as HEVC or VVC, in the Padding, Color Conversion and Video encoder modules respectively. The output compressed attribute map bitstream is multiplexed into the encoder output bitstream. The inter encoding process, shown in Figure 9, is similar to the intra encoding process of Figure 8 with the following changes. The reconstructed reference base mesh 910 is an input of the inter coding process. A new module called Motion Encoder 950 takes as input the quantized input base mesh and the reconstructed quantized reference base mesh 910 to produce compressed motion information encoded as a compressed motion bitstream, which is multiplexed into the encoder output compressed bitstream. All other modules and processes are similar to the intra encoding case. The compressed bitstream generated by the encoder multiplexes: ^ A sub-bitstream with the encoded base mesh using a static mesh codec ^ A sub-bitstream with the encoded motion data using an animation codec for base meshes in case INTER coding is enabled ^ A sub-bitstream with the wavelet coefficients of the displacement vectors packed in an image and encoded using a video codec ^ A sub-bitstream with the attribute map encoded using a video codec. ^ A sub-bitstream that contains all metadata required to decode and reconstruct the mesh sequence based on the aforementioned sub- bitstreams. The signalling of the metadata is based on the V3C syntax and includes necessary extensions that are specific to meshes. Figure 10 illustrates a decoding process according to an embodiment. First the compressed bitstream 1010 is demultiplexed into sub-bitstreams that are reconstructed, i.e., metadata, reconstructed base mesh, reconstructed displacements and the reconstructed attribute map data. The reconstruction of the mesh sequence is performed based on that data in the post-processing module 1060. Figure 11 and Figure 12 illustrate the decoding process in INTRA and INTER mode respectively. The intra frame decoding process as shown in example of Figure 11 comprises the following modules and processes. First the input compressed bitstream is de- multiplexed 1110 into V3C extended atlas data information (or patch information), a compressed static mesh bitstream, a compressed displacement video component and a compressed attribute map bitstream, respectively. The static mesh decoding module 1120 converts the compressed static mesh bitstream into a reconstructed quantized static mesh, which represents a base mesh. 
This reconstructed quantized base mesh undergoes inverse quantization in the inverse quantization module 1125 to produce a decoded reconstructed base mesh. The compressed displacement video component bitstream is decoded in the video decoding module 1130 to generate a reconstructed displacement video component. This displacement video component is unpacked into reconstructed quantized wavelet coefficients in the image unpacking module 1135. The reconstructed quantized wavelet coefficients are inverse quantized in the inverse quantization module 1140 and then undergo an inverse wavelet transform in the inverse wavelet transform module 1145, which produces decoded displacement vectors. The reconstruct deformed mesh module 1150 takes into account the patch information and takes as input the decoded reconstructed base mesh and the decoded displacement vectors to produce the output decoded mesh frame. The compressed attribute map video component is decoded in the video decoding module 1130, and possibly undergoes color conversion 1160, to produce a decoded attribute map frame that corresponds to the decoded mesh frame.

The inter decoding process, shown in Figure 12, is similar to the intra decoding process with the following changes. The decoder also demultiplexes a compressed motion information sub-bitstream. A decoded reference base mesh 1200 is taken as input of a motion decoder module 1210 together with the compressed motion information sub-bitstream. This decoded reference base mesh is selected from a buffer of previously decoded base mesh frames (by the intra decoder process for the first frame of a group of frames). The reconstruction of base mesh module 1250 takes the decoded reference base mesh and the decoded motion information as input to produce a decoded reconstructed quantized base mesh. All other processes are similar to the intra decoding process as shown in Figure 11.

The signalling of the metadata and substreams produced by the encoder and ingested by the decoder was proposed as an extension of V3C in the technical submission to the dynamic mesh coding CfP, and should be considered as purely indicative for the moment. It mainly consists of additional V3C unit header syntax, additional V3C unit payload syntax, and a Mesh Intra patch data unit.

A refinement of the metadata and substream signaling is as follows:

- The output of the base mesh substream decoder is “base meshes”.
- Each base mesh can have one or more submeshes. A “submesh” is a set of vertices, their connectivity and the associated attributes, which can be decoded completely independently in a mesh frame.
- The term “resampled base meshes” refers to an output of the mesh subdivision process. The inputs to the process are the base meshes (or sets of submeshes) and the information from the atlas data substream on how to subdivide/resample the meshes (submeshes).
- “A displacement video” is the output of the displacement decoder. The inputs to the process are the decoded geometry video and the information from the atlas data substream on how to interpret/process this video. The displacement video contains displacement values to be added to the corresponding vertices.
- “A facegroupId” is one of the attribute types assigned to each triangle face (or face of another shape) of the resampled base meshes. The facegroupId can be compared with the identifications of the subparts in a patch to determine the facegroups corresponding to the patch.
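The decoder-side reconstruction described above can be summarized schematically as follows. This is a minimal sketch under stated assumptions: the displacements are already unpacked into per-vertex coefficients, inverse quantization is a plain scaling, and the inverse wavelet transform is omitted (treated as identity); midpoint_subdivide and apply_displacements refer to the earlier sketch, and apply_motion illustrates the inter-frame base mesh reconstruction. None of the names come from the specification.

```python
# Schematic decoder-side reconstruction, loosely mirroring Figures 11 and 12.
def inverse_quantize(coefficients, scale):
    return [tuple(c * scale for c in coeff) for coeff in coefficients]

def apply_motion(reference_positions, motion_vectors):
    # Inter mode: reuse the reference connectivity and add decoded per-vertex
    # motion to the reference base mesh vertex positions.
    return [tuple(p + m for p, m in zip(pos, mv))
            for pos, mv in zip(reference_positions, motion_vectors)]

def reconstruct_mesh_frame(base_positions, base_faces, quantized_disp,
                           scale, iterations):
    positions, faces = base_positions, base_faces
    for _ in range(iterations):              # reach the signalled LOD count
        positions, faces = midpoint_subdivide(positions, faces)
    displacements = inverse_quantize(quantized_disp, scale)
    return apply_displacements(positions, displacements), faces
```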
If facegroupId is not conveyed through the base mesh substream decoder, it can be derived from the information in the atlas data substream.

V3C unit

Compressed base meshes are signalled in a new substream, named the Base Mesh data substream (unit type V3C_MD). As with other V3C units, the unit type and its associated V3C parameter set id and atlas id are signalled in the v3c_unit_header().

V3C parameter set extension

A new extension needs to be introduced in the v3c_parameter_set syntax structure to handle V-DMC. Several new parameters are introduced in this extension, including the following:

- vps_ext_mesh_data_facegroup_id_attribute_present_flag equal to 1 indicates that one of the attribute types present in the base mesh data stream is the facegroup id.
- vps_ext_mesh_data_attribute_count indicates the total number of attributes in the base mesh, including both the attributes signalled through the base mesh data substream and the attributes signalled in the video substreams (using ai_attribute_count). When vps_ext_mesh_data_facegroup_id_attribute_present_flag equals 1, it shall be greater than or equal to ai_attribute_count+1. This can be constrained by profiles/levels.
- The types of attributes that are signalled through the base mesh substream and not through the video substreams are signalled as vps_ext_mesh_attribute_type data types. When vps_ext_mesh_data_facegroup_id_attribute_present_flag equals 1, one of the vps_ext_mesh_attribute_type values must be a facegroup_id.
- vps_ext_mesh_data_substream_codec_id indicates the identifier of the codec used to compress the base mesh data. This codec may be identified through profiles, a component codec mapping SEI message, or through means outside this document.
- vps_ext_attribute_frame_width[i] and vps_ext_attribute_frame_height[i] indicate the corresponding width and height of the video data corresponding to the i-th attribute among the attributes signalled in the video substreams.

Atlas sequence parameter set extension

The information contained in this extension can be overwritten by the same information in the AFPS extension or the patch data units. The following parameters are introduced:

- asps_vmc_ext_prevent_geometry_video_conversion_flag prevents the outputs of the geometry video substream decoder from being converted. When the flag is true, the outputs are used as they are, without any conversion process from Annex B in [2]. When the flag is true, the size of the geometry video shall be the same as the nominal video sizes indicated in the bitstream.
- asps_vmc_ext_prevent_attribute_video_conversion_flag prevents the outputs of the attribute video substream decoder from being converted. When the flag is true, the outputs are used as they are, without any conversion process from Annex B in [2]. When the flag is true, the size of the attribute video shall be the same as the nominal video sizes indicated in the bitstream.
- asps_vmc_ext_subdivision_method and asps_vmc_ext_subdivision_iteration_count signal information about the subdivision method.
- asps_vmc_ext_transform_index indicates the transform applied to the displacement. The transform index can also indicate that no transform is applied. When the transform is LINEAR_LIFTING, the necessary parameters are signalled as vmc_lifting_transform_parameters.
- asps_vmc_ext_patch_mapping_method indicates how to map a subpart of a submesh to a patch.
  - When asps_vmc_ext_patch_mapping_method is equal to 0, all the triangles (or faces of other shapes) in the corresponding submesh are associated with the current patch.
    In this case, there is only one patch associated with the submesh.
  - When asps_vmc_ext_patch_mapping_method is equal to 1, the subpart_ids are explicitly signalled in the mesh patch data unit to indicate the associated subparts.
  - In other cases, the faces in the corresponding submesh are divided into subparts by the method indicated by asps_vmc_ext_patch_mapping_method.
- asps_vmc_ext_tjunction_removing_method indicates the method to remove t-junctions created by different subdivision methods or by different subdivision iterations of two faces sharing an edge.
- asps_vmc_ext_num_attribute indicates the total number of attributes that the corresponding mesh carries. Its value shall be less than or equal to vps_ext_mesh_data_attribute_count.
- asps_vmc_ext_attribute_type is the type of the i-th attribute, and it shall be one of ai_attribute_type_ids or vps_ext_mesh_attribute_types.
- asps_vmc_ext_direct_atrribute_projection_enabled_flag indicates that the 2D locations where attributes are projected are explicitly signalled in the mesh patch data units. Therefore, the projection id and orientation index in V3C V-PCC ISO/IEC 23090-5:2021 can also be used as in ISO/IEC 23090-5:2021.

Atlas Frame Parameter set extension

- afps_vmc_ext_single_submesh_in_frame_flag indicates there is only one submesh for the mesh frame.
- When afps_vmc_ext_overriden_flag in afps_vmc_extension() is true, the subdivision method, displacement coordinate system, transform index, transform parameters, and attribute transform parameters can be signalled again, and this information overrides the one signalled in asps_vmc_extension().
- afps_vmc_ext_single_attribute_tile_in_frame_flag indicates there is only one tile for each attribute signalled in the video streams.
- afps_ext_vmc_attribute_tile_information() contains the tile information for the attributes signalled through the video substreams.

Atlas Tile Header

A tile can be associated with one or more submeshes whose identification is ath_submesh_id.

Patch data unit

As with the V-PCC patch data units, Mesh patch data units are signaled in the Atlas data substream. Mesh Intra patch data unit, Mesh Inter patch data unit, Mesh Merge patch data unit, and Mesh Skip patch data unit can be used.

- mdu_submesh_id indicates which submesh the patch is associated with among those indicated in the atlas tile header.
- mdu_vertex_count_minus1 and mdu_triangle_count_minus1 indicate the number of vertices and triangles associated with the current patch.
- When asps_vmc_ext_patch_mapping_method is not 0, the syntax elements mdu_num_subparts and mdu_subpart_id are signalled. When asps_vmc_ext_patch_mapping_method is 1, the associated triangle faces are the union of the triangle faces whose facegroupId is equal to mdu_subpart_id.
- When mdu_patch_parameters_enable_flag is true, the subdivision method, displacement coordinate system, transform index, transform parameters, and attribute transform parameters can be signalled again, and the information overrides the corresponding information signalled in asps_vmc_extension().

The signaling of the base mesh substream is also under investigation and is illustrated in Figure 13 in a simplified and tentative manner. One of the key features of the current V-DMC specification design is the support for a base mesh signal that can be encoded using any currently specified or future static mesh codec. For example, such information could be coded using Draco 3D Graphics Compression.
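As an illustration of the patch mapping semantics above, the following sketch derives the triangle faces associated with a mesh patch when asps_vmc_ext_patch_mapping_method is equal to 1, i.e., the union of the faces whose facegroupId equals one of the explicitly signalled subpart ids. The data layout used here is hypothetical.

```python
# Hypothetical sketch: collect the faces of a submesh that belong to a patch
# when subpart ids are signalled explicitly (patch mapping method equal to 1).
def faces_for_patch(submesh_faces, facegroup_ids, mdu_subpart_ids):
    """submesh_faces: list of faces; facegroup_ids: facegroupId per face."""
    wanted = set(mdu_subpart_ids)
    return [face for face, gid in zip(submesh_faces, facegroup_ids)
            if gid in wanted]

# Example: faces 0 and 2 carry facegroupId 7, face 1 carries 3; a patch that
# signals subpart id 7 is associated with faces 0 and 2.
print(faces_for_patch([(0, 1, 2), (1, 2, 3), (2, 3, 4)], [7, 3, 7], [7]))
```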
This representation could provide the basis for applying other decoded information to reconstruct the output mesh frame within the context of V-DMC. Furthermore, for coding dynamic mesh frames, it is highly desirable to be able to exploit any temporal correlation that may exist with previously coded base mesh frames. In Figures 13, 14 and 15, where Figure 14 illustrates another example of a base-mesh encoder and Figure 15 illustrates an example of a base-mesh decoder, this has been accomplished by encoding a mesh motion field instead of directly encoding the base mesh, and by using this information together with a previously encoded base mesh to reconstruct the base mesh of the current frame. This approach could be seen as the equivalent of inter prediction in video coding. It is also desirable to associate all coded base mesh frames or motion fields with information that could help determine their decoding output order as well as their referencing relationships. It is possible, for example, that better coding efficiency could be achieved if the coding order of all frames does not follow the display order, or by using as reference for generating a motion field for frame N an arbitrary previously coded motion field or base mesh instead of the immediately previously coded one. Also highly desirable is the ability to instantly detect random access points and to independently decode multiple sub-meshes that together can form a single mesh, much like subpictures in video compression. For all the above reasons, a new Base Mesh Substream format is introduced. This new format resembles a video coding format such as HEVC or the atlas sub-bitstream used in V3C, with the base mesh sub-bitstream also constructed using NAL units. High Level Syntax (HLS) structures such as base mesh sequence parameter sets, base mesh frame parameter sets and submesh layers are also specified.

One of the desirable features of this design is the ability to segment a mesh into multiple smaller partitions, referred to in this document as submeshes. An example of this is illustrated in Figure 16. The submeshes shown in (b) can be decoded completely independently, which can help with partial decoding and spatial random access. Although it may not be a requirement for all applications, some applications may require that the segmentation into submeshes remains consistent and fixed in time. The submeshes do not need to use the same coding type, i.e., for one frame one submesh may use intra coding while inter coding could be used for another at the same decoding instance, but it is commonly a requirement that the same coding order is used and the same references are available for all submeshes corresponding to a particular time instance. Such restrictions can help guarantee proper random access capabilities for the entire stream. An example where two submeshes are used is shown in Figure 17.

NAL unit syntax

As discussed earlier, the new bitstream is also based on NAL units, and these are similar to those of the atlas substream in V3C. The syntax is provided below.

General NAL unit syntax

NAL unit header syntax

NAL unit semantics

This section contains some of the semantics that correspond to the above syntax structures. More details would be provided for syntax elements that have not been defined in complete detail.

General NAL unit semantics

NumBytesInNalUnit specifies the size of the NAL unit in bytes. This value is required for decoding of the NAL unit.
Some form of demarcation of NAL unit boundaries is necessary to enable inference of NumBytesInNalUnit. One such demarcation method is specified in Annex TBD for the sample stream format. Other methods of demarcation can be specified outside this document. It is to be noticed that the mesh coding layer (MCL) is specified to efficiently represent the content of the mesh data. The NAL is specified to format that data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex TBD each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.

rbsp_byte[ i ] is the i-th byte of an RBSP. An RBSP is specified as an ordered sequence of bytes as follows. The RBSP contains a string of data bits (SODB) as follows:
• If the SODB is empty (i.e., zero bits in length), the RBSP is also empty.
• Otherwise, the RBSP contains the SODB as follows:
  1) The first byte of the RBSP contains the first (most significant, left-most) eight bits of the SODB; the next byte of the RBSP contains the next eight bits of the SODB, etc., until fewer than eight bits of the SODB remain.
  2) The rbsp_trailing_bits( ) syntax structure is present after the SODB as follows:
    i) The first (most significant, left-most) bits of the final RBSP byte contain the remaining bits of the SODB (if any).
    ii) The next bit consists of a single bit equal to 1 (i.e., rbsp_stop_one_bit).
    iii) When the rbsp_stop_one_bit is not the last bit of a byte-aligned byte, one or more bits equal to 0 (i.e. instances of rbsp_alignment_zero_bit) are present to result in byte alignment.
Syntax structures having these RBSP properties are denoted in the syntax tables using an "_rbsp" suffix. These structures are carried within NAL units as the content of the rbsp_byte[ i ] data bytes. The association of the RBSP syntax structures to the NAL units is as specified in Table 4 of ISO/IEC 23090-5. It is to be noticed that when the boundaries of the RBSP are known, the decoder can extract the SODB from the RBSP by concatenating the bits of the bytes of the RBSP and discarding the rbsp_stop_one_bit, which is the last (least significant, right-most) bit equal to 1, as well as any following (less significant, farther to the right) bits, which are equal to 0. The data necessary for the decoding process is contained in the SODB part of the RBSP.
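As an illustration of the relationship between RBSP and SODB described above, the following is a minimal Python sketch, not part of any specification, of how a decoder could strip the rbsp_trailing_bits( ) structure once the RBSP boundaries are known:

    def extract_sodb_bits(rbsp: bytes) -> str:
        """Return the SODB of an RBSP as a bit string (illustrative sketch only).

        The RBSP is the SODB followed by rbsp_trailing_bits(): one
        rbsp_stop_one_bit equal to 1 and zero or more rbsp_alignment_zero_bit
        values equal to 0 up to byte alignment.
        """
        if not rbsp:
            return ""  # an empty SODB corresponds to an empty RBSP
        bits = "".join(f"{byte:08b}" for byte in rbsp)
        bits = bits.rstrip("0")          # discard rbsp_alignment_zero_bit values
        assert bits.endswith("1"), "malformed rbsp_trailing_bits()"
        return bits[:-1]                 # discard the rbsp_stop_one_bit

    # Example: the single RBSP byte 0b10110000 carries the SODB "101",
    # followed by the stop bit and four alignment zeros.
    print(extract_sodb_bits(bytes([0b10110000])))  # -> 101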
NAL unit header semantics

Similar NAL unit types as for the atlas case were defined for the base mesh, enabling similar functionalities for random access and segmentation of the mesh. Unlike the atlas, which is split into tiles, in this document we define the concept of a sub-mesh and define specific NAL units that correspond to coded mesh data. In addition, NAL units that can include metadata such as SEI messages are also defined. In particular, the base mesh NAL unit types supported are specified as follows:

Raw byte sequence payloads, trailing bits, and byte alignment syntax

Base mesh sequence parameter set RBSP syntax

As with similar bitstreams, the primary syntax structure that is defined for a base mesh bitstream is a sequence parameter set. This syntax structure contains basic information about the bitstream, identifying features for the codecs supported for either the intra coded or inter coded meshes, as well as information about references.

General base mesh sequence parameter set RBSP syntax

• bmsps_intra_mesh_codec_id indicates the static mesh codec used to encode the base meshes in this base mesh substream. It could be associated with a specific mesh or motion mesh codec through the profiles specified in the corresponding specification, or could be explicitly indicated with an SEI message as is done in the V3C specification for the video sub-bitstreams.
• bmsps_intra_mesh_data_size_precision_bytes_minus1 (+1) specifies the precision, in bytes, of the size of the coded mesh data.
• bmsps_inter_mesh_codec_present_flag indicates if a specific codec, indicated by bmsps_inter_mesh_codec_id, is used to encode the inter predicted submeshes.
• bmsps_inter_mesh_data_size_precision_bytes_minus1 (+1) specifies the precision, in bytes, of the size of the inter predicted mesh data. This precision is signalled separately, considering that the sizes of the coded mesh data and of the inter predicted mesh data (e.g. a motion field) can be significantly different.
• bmsps_facegroup_segmentation_method indicates how facegroups could be derived for a mesh. A facegroup is a set of triangle faces (or faces of other shapes) in a submesh. Each triangle face is associated with a FacegroupId indicating the facegroup it belongs to. When bmsps_facegroup_segmentation_method is 0, the FacegroupId is present directly in the coded submesh. Other values indicate that the facegroup can be derived using different methodologies based on the characteristics of the stream. For example, a value of 1 means that there is no FacegroupId associated with any face, a value of 2 means that all faces are identified with a single ID, a value of 3 means that facegroups are identified based on the connected component method, and a value of 4 indicates that each individual face has its own unique ID. Currently ue(v) is used to code bmsps_facegroup_segmentation_method, but fixed length coding or partitioning into more elements could have been used instead.
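A hedged sketch of how a decoder could derive FacegroupId values for the bmsps_facegroup_segmentation_method options listed above; the submesh structure and the connected_components() helper are illustrative assumptions, not specification elements:

    def derive_facegroup_ids(submesh, method, connected_components=None):
        """Sketch of per-face FacegroupId derivation (values 0-4 as described above)."""
        if method == 0:
            # Ids are carried directly in the coded submesh; nothing to derive.
            return submesh.coded_facegroup_ids
        if method == 1:
            return {face: None for face in submesh.faces}       # no facegroup ids
        if method == 2:
            return {face: 0 for face in submesh.faces}          # one single id for all faces
        if method == 3:
            comps = connected_components(submesh)               # assumed helper
            return {face: cc for cc, comp in enumerate(comps) for face in comp}
        if method == 4:
            return {face: i for i, face in enumerate(submesh.faces)}  # unique id per face
        raise ValueError(f"unsupported bmsps_facegroup_segmentation_method {method}")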
Base Mesh Profile, tier, and level syntax

• bmptl_extended_sub_profile_flag provides support for sub-profiles, which can be quite useful for further restricting the base mesh profiles depending on usage and applications.

Base mesh frame parameter set RBSP syntax

The base mesh frame parameter set carries the frame-level information, such as the number of submeshes in the frames corresponding to one mfh_mesh_frm_order_cnt_lsb. A submesh is coded in one mesh_data_submesh_layer() and is independently decodable from other submeshes. In the case of inter frame prediction, a submesh can refer only to the submeshes with the same smh_id in its associated reference frames. The mechanism is equivalent to what is specified in clause 8.3.6.2.2 of V3C.

Base mesh submesh layer RBSP syntax

bmesh submesh layer RBSP syntax

A bmesh_submesh_layer contains the information of one submesh. One or more bmesh_submesh_layer_rbsp structures can correspond to one mesh frame indicated by mfh_mesh_frm_order_cnt_lsb.

Submesh header syntax

• smh_id is the id of the current submesh contained in the submesh data.
• smh_type indicates how the mesh is coded. If smh_type is I_SUBMESH, the mesh data is coded with the indicated static mesh codec. If smh_type is P_SUBMESH, inter prediction is used to code the mesh data.

Submesh data unit

smdu_intra_sub_mesh_unit( unitSize ) contains a sub mesh unit stream of size unitSize, in bytes, as an ordered stream of bytes or bits within which the locations of unit boundaries are identifiable from patterns in the data. The format of such a sub mesh unit stream is identified by a 4CC code as defined by bmptl_profile_codec_group_idc or by a component codec mapping SEI message.

smdu_inter_sub_mesh_unit( unitSize ) contains a sub mesh unit stream of size unitSize, in bytes, as an ordered stream of bytes or bits within which the locations of unit boundaries are identifiable from patterns in the data. The format of such a sub mesh unit stream is identified by a 4CC code as defined by bmptl_profile_codec_group_idc or by a component codec mapping SEI message.

The current basis for the V-DMC test model uses a subdivision method to interpolate the base mesh connectivity and obtain displacement vectors. These displacement vectors can be filtered by a wavelet transform to compress them. These vectors are necessary to reconstruct a deformed mesh that provides higher fidelity than the reconstructed base mesh. Following a mesh spatial scalability philosophy, several scales or levels of detail (LODs) are defined based on iterative “midpoint” subdivision, and displacements are computed for each vertex of the mesh at each scale. “Midpoint” subdivision based interpolation combined with the “linear” wavelet filter leads to relatively small and sparse displacement vectors at the third scale and, depending on the mesh content, also at the second scale. However, for the first scale, for the chosen base mesh resolutions and target rates, it can be observed that the midpoint subdivision leads to large residuals that are difficult to compress with video coding. This can be explained by the fact that the deformed mesh is much smoother at higher scales than the base mesh itself, while at the first scale, displacement vectors are large and not necessarily well predicted by the midpoint subdivision combined with the “linear” wavelet filter. Figure 18 illustrates an example of the displacement value distribution per LOD.

Furthermore, enabling an encoder to choose between different subdivision methods and wavelet transform parameters per patch, or even per face, opens opportunities for improving prediction quality and therefore displacement residual encoding. Figure 19 shows that, depending on the choice of subdivision method, the results may differ significantly. “Midpoint” 1910 generates coarser predictions but achieves good results on flat areas or strong edges (it is also the one with the lowest processing complexity), “Loop” 1920 generates a smoother, more convex predicted shape that achieves high quality on smooth convex areas but less on strong edges or convex parts, and “Butterfly” 1930 achieves the highest fidelity but comes with the largest processing complexity. It is realized that using BBM 1940, i.e., adaptively applying “Butterfly” in LODs 1 and 2 and then “Midpoint” on LOD 3, generates a very similar quality with respect to Butterfly on all LODs, but with reduced processing complexity. From the shape of the mesh and the error colors per vertex, it can be seen that one subdivision method can achieve better performance on some areas than others, opening possibilities to find a number of trade-offs between processing complexity and quality of the displacement predictions.
The same can be said for the wavelet transform choice when interpolating signals on the subdivided shapes. In the V-DMC test model, such interpolation is used to obtain normal directions (a 3D vector signal defined on the mesh vertices) from one LOD level to another, but also to predict and update displacement vectors (a 3D vector signal defined on the mesh vertices) using the lifting scheme prior to image packing and compression. The extended encoder may choose the same or different predictors and updates for the subdivision and wavelet transform, similarly to the rate-distortion optimization used by 2D video encoders, by selecting different prediction modes based on the content and target bitrates.

Results also depend on the initial decimation factor and algorithm used to generate the base mesh. In Figure 19, the reduction factor was set to a target reduction of faces to 30% of the initial face count. In Figure 20, the target face count reduction was coarser and set to 12.5%. In that case, the “Loop” 2020 subdivision filter does not behave as well as “Midpoint” 2010 and “Butterfly” 2030. The example of Figure 20 demonstrates that it is important to allow the encoder to optimize the subdivision method and wavelet filter choice. From the results in Figure 19 and Figure 20, it can be observed that the displacement vector amplitude generated by any of the subdivision methods is not “smooth” at any LOD, meaning that displacements may not be well predicted by a “linear” wavelet filter on the head region, while they seem more predictable in smoother areas such as the torso. This also shows that the choice of the subdivision method and wavelet filter can be optimized locally by the encoder, depending on the region smoothness, to improve the quality of the predictions.

The present embodiments are targeted to the signalling of an adaptive subdivision method and an adaptive wavelet transform, where weights and masks may be selected differently from each other, and/or where the transform may be set differently at each base mesh face, at each Level of Detail (LOD), at patch data unit level, frame parameter level or sequence parameter level. The present embodiments are also targeted to a decoding process that determines the subdivision method, wavelet transform type and weights for each base mesh face, for each LOD, for each patch, for each frame and/or for the sequence. Further, the present embodiments are targeted to an encoding process that finds the best subdivision method and wavelet transform types per base mesh face, per LOD, per patch, per frame or per sequence.

While the V-DMC test model enables setting different quantization steps per LOD, it does not foresee or enable modifying either the subdivision method or the wavelet transform itself at each LOD level or at base mesh face level, which would enable more fine-grained, optimal predictions. Quantization helps reduce the rate but comes with quality losses, while a better, more adaptive prediction can reduce displacements at lower LODs before per-LOD quantization, which is compatible with the present embodiments. According to an embodiment, the subdivision methods and wavelet transforms can be independently chosen from a set of the following weights and vertex neighborhood masks (but are not restricted to this set).

Subdivision methods and wavelet transforms

Loop subdivision and wavelet filters are illustrated on Figure 21.
In the example of Figure 21, it is realized that, for an interior vertex, Midpoint requires a neighborhood of two vertices, Loop requires four neighbours and Butterfly eight neighbours. When the vertex is a boundary vertex, different weights are used. Furthermore, the Butterfly weights need to be modified for extraordinary vertices, i.e., vertices that have fewer or more than exactly six neighbours.

Mid-point

The midpoint subdivision method is the one included in the V-DMC test model and uses the simplest weights. Generating a vertex x at LOD i+1, knowing its neighbor vertices a and b in LOD i, is given by Equation 1:

midpoint_subdivision(x) = 1/2 a + 1/2 b    (Eq. 1)

The linear lifting scheme based on midpoint weights is as follows: the prediction step uses the weights defined in Equation 1, and an update step as defined in Equation 2. More precisely, the midpoint update filter for a vertex x in LOD i+1, knowing its predicted neighbor vertices vi in LOD i+1, may for example be given by Equation 2, where N(x) is the neighborhood of x in LOD i+1.

Loop

The loop subdivision method includes an initialization step and a prediction step. In the initialization step, the vertices of the previous LOD (sometimes called the computation of even vertices) are repositioned, for example with the classical Loop even-vertex weights:

loop_initialization(x) = (1 - n*β)*x + β * Σ_{v in N(x)} v    (Eq. 3)

where N(x) is the neighborhood of x, n is the number of neighbours of x, and β is a weight that depends on n. For boundary edges, the initialization may be set as:

loop_initialization(x) = 3/4 x + 1/8 a + 1/8 b    (Eq. 5)

where a and b are the two boundary neighbours of x. Then the loop prediction (also called the computation of odd vertices) of an interior vertex (a vertex that has 6 neighbors) x at LOD i+1, given vertices a and b, to which x will be connected by an edge, and vertices c and d that belong to the triangles containing the edge (a,b) in LOD i, may be as follows:

loop_prediction(x) = 3/8 a + 3/8 b + 1/8 c + 1/8 d    (Eq. 6)

Compared to the midpoint prediction, the loop prediction includes a smaller contribution from neighboring vertices that are not directly connected on the same edge. In case the angle between the normals of triangles abc and abd is larger than a given threshold, the midpoint prediction can optionally be preferred. In case the edge ab is a boundary edge, the midpoint predictor is used. For the lifting scheme, or a loop-based wavelet transform, Equation 6 may be used as the predictor and one or more of Equations 3, 4 and 5 may be used for the update, or even Equation 2 can be used as the update filter. Given that the main problem is to efficiently compress signals defined on the subdivided mesh (normals, displacements, …), several prediction and filter weights and masks are possible and can be tested by the encoder.

Butterfly

The Butterfly subdivision method does not include any initialization step. The prediction of a vertex x at LOD i+1, given its neighbors from the previous LOD i as illustrated on Figure 21, may for example use the classical Butterfly weights:

butterfly_prediction(x) = 1/2 (a + b) + 1/8 (c + d) - 1/16 (e1 + e2 + e3 + e4)    (Eq. 7)

where a and b are the endpoints of the subdivided edge, c and d the vertices opposite to that edge in the two triangles sharing it, and e1 to e4 the remaining neighbours of the eight-vertex mask of Figure 21. In case of a strong edge (threshold on the angle between the normals of triangles abc and abd) or a boundary edge ab, the prediction may be given, for example with the classical four-point boundary weights, by:

butterfly_boundary_prediction(x) = 9/16 (a + b) - 1/16 (c + d)    (Eq. 8)

where a, b, c, d may be defined as on Figure 25. For the lifting scheme and wavelet transform, Equations 7 and 8 can be used for the prediction filter, while the update filter can be, for example, the one in Equation 2.
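To make the three prediction stencils concrete, the following is a minimal Python sketch assuming the classical weights of these schemes; the Butterfly weights and the vertex labelling in particular are assumptions for illustration, the normative masks being those of Figure 21:

    import numpy as np

    def midpoint_prediction(a, b):
        # Eq. 1: the new (odd) vertex is predicted as the midpoint of edge (a, b).
        return 0.5 * a + 0.5 * b

    def loop_prediction(a, b, c, d):
        # Eq. 6: a, b are the edge endpoints, c, d the vertices opposite to the
        # edge in the two triangles sharing it.
        return 3.0 / 8.0 * (a + b) + 1.0 / 8.0 * (c + d)

    def butterfly_prediction(a, b, c, d, wings):
        # Classical eight-point Butterfly stencil (weights assumed here):
        # a, b edge endpoints, c, d opposite vertices, wings = the four
        # remaining neighbours of the mask.
        return 0.5 * (a + b) + 1.0 / 8.0 * (c + d) - 1.0 / 16.0 * sum(wings)

    # Example: predicting the position (or any per-vertex signal) of a new vertex
    # from its neighbours at the previous LOD.
    a, b = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
    c, d = np.array([0.5, 1.0, 0.0]), np.array([0.5, -1.0, 0.0])
    print(midpoint_prediction(a, b))    # -> [0.5 0.  0. ]
    print(loop_prediction(a, b, c, d))  # -> [0.5 0.  0. ] (y contributions cancel here)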
Linear lifting transform weights and masks

The linear lifting transform uses a similar prediction as the midpoint subdivision method, i.e., given a signal signal(x) at vertex x at LOD i+1, and the signals at the neighboring vertices a and b at LOD i, such that a and b are connected by an edge ab in LOD i, the prediction is obtained as:

Linear_lifting_prediction_signal(x) = 1/2 signal(a) + 1/2 signal(b)

where the signal can be, for example, the displacement vector, or any attribute defined on vertices. Similarly to the linear lifting transform, predictors can be made for the Loop lifting transform and the Butterfly lifting transform by using the same neighborhood masks as illustrated on Figure 21. The update filter can be set similarly as in the midpoint subdivision for the linear and Butterfly predictors, or as in the Loop subdivision for the Loop prediction, as described before.

Encoder processing

Preprocessor - Decimation

According to an embodiment, an encoder (for example an encoder as illustrated on Figures 7, 8, 9 and 13) preprocessor (Figure 7) may first generate a base mesh for a mesh sequence frame with a target reduction face count that is optimized with respect to a target bitrate, for example by setting the base mesh compressed bitrate as smaller than or equal to a predefined ratio of the total target bitrate. This can be achieved in an iterative way, for example with a gradient descent on the decimation parameters and the obtained compression rate. Decimation can also be tuned by the preprocessor to maintain, e.g., a higher or lower density in some areas, based on features of the content or the original density of the input mesh frame. This will also impact the subsequent subdivision density in that area.

Preprocessor - Subdivision

The Encoder preprocessor may set a target subdivision iteration count to reach a face count comparable to that of the original mesh frame. The Encoder preprocessor may test one or multiple sets of subdivision methods, for example by brute force, or with a subset of less complex subdivision methods if the target devices require lower complexity processing. The Encoder preprocessor may compute the displacement vectors for the tested subdivision methods for each LOD defined by the target subdivision iteration count. For each subdivision method tested, the preprocessor may assign a subdivision method identification (id) to the faces of the base mesh for which the corresponding subdivision method leads to higher quality. If the results are similar, the lowest complexity method is preferred. Once the base mesh face subdivision ids have been obtained, a segmentation algorithm may be applied to produce base mesh face clusters, which are sets of connected faces sharing the same subdivision method id. Optionally, a merge step can be applied to avoid isolated faces or small clusters of faces inside large clusters of faces by adapting the face subdivision method id. According to another embodiment, the preprocessor may generate a set of subdivided meshes based on different subdivision methods as described before, but not limited to those, including different subdivisions based on the density of selected face clusters.
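As a rough illustration of the per-face assignment and clustering just described, the following Python sketch uses hypothetical helpers (evaluate_quality and the base_mesh/face structures are assumptions, not specification elements); the lowest-complexity method is kept when results are similar, and connected faces sharing the same id are grouped into clusters:

    from collections import deque

    METHODS = ["MIDPOINT", "LOOP", "BUTTERFLY"]  # ordered by increasing complexity

    def assign_subdivision_ids(base_mesh, evaluate_quality, tolerance=1e-3):
        """Assign to each base mesh face the lowest-complexity subdivision method
        whose quality is within `tolerance` of the best one (sketch only)."""
        ids = {}
        for face in base_mesh.faces:
            scores = {m: evaluate_quality(face, m) for m in METHODS}
            best = max(scores.values())
            ids[face] = next(m for m in METHODS if best - scores[m] <= tolerance)
        return ids

    def cluster_faces(base_mesh, ids):
        """Group connected faces that share the same subdivision method id."""
        clusters, visited = [], set()
        for seed in base_mesh.faces:
            if seed in visited:
                continue
            visited.add(seed)
            cluster, queue = [], deque([seed])
            while queue:
                face = queue.popleft()
                cluster.append(face)
                for nb in base_mesh.neighbours(face):
                    if nb not in visited and ids[nb] == ids[face]:
                        visited.add(nb)
                        queue.append(nb)
            clusters.append(cluster)
        return clusters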
In a next step, the encoder selects the best subdivision method per mesh, per LOD and per face cluster based on:
• quality criteria, such as the average and maximal length of the displacement vectors;
• smoothness (predictability of a displacement vector based on its neighbors at the previous or same LOD);
• triangle shapes on the face cluster (number of triangles discarded because they are too elongated, based on a threshold);
• a combination of the above;
• etc.

Encoder – computation of displacements based on reconstructed base mesh

Once the clusters have been obtained, the encoder may ingest the base mesh, the cluster information, the subdivision ids, and the displacement vectors. The Encoder may compress the base mesh and reconstruct it, apply the defined subdivision method per face cluster and adapt the precomputed displacements.

Encoder – wavelet filters/lifting scheme

As discussed with respect to Figure 5, a deformed mesh is obtained by adding displacement vectors to the subdivided mesh vertices. According to an embodiment, the encoder may test several wavelet filters and weights on the displacement vectors for each vertex of each face of each LOD of the deformed mesh. The encoder may assign to the vertex the wavelet transform id corresponding to the smallest residual. According to an embodiment, the encoder may then map the per-vertex wavelet transform ids to the corresponding LOD faces of the deformed mesh using a majority rule, or the lowest produced residual in case a vertex wavelet transform id needs to be adapted. Then, according to an embodiment, the encoder may assign to each base mesh face an id that describes the wavelet filters that are to be used for each LOD generated face. This can be done by setting a unique wavelet transform per base mesh face, or a code that specifies, in a clockwise or anti-clockwise manner, which wavelet transform id needs to be changed per LOD face. According to another embodiment, the Encoder may assign a wavelet transform id per LOD for each base mesh face. According to another embodiment, the Encoder may create patches with the faces of a base mesh face cluster that share the same subdivision method and wavelet filter information. According to another embodiment, the Encoder may group faces that are not included in patches into one or several patches that encode additional refined data per face and LOD. The Encoder may encode the signalling of the subdivision method and wavelet filter along the output bitstream. The Encoder may perform the same operations per frame for All Intra configurations. In Random Access configurations, in one embodiment, the Encoder groups the mesh frames of one group of pictures and analyses the subdivision method and wavelet transform decisions per corresponding face and vertex, for each LOD, to select the same corresponding decision for each frame. In another embodiment, the Encoder optimizes each mesh frame independently.

Decoder processing

According to an embodiment, the decoder decodes all provided bitstreams, including the patch metadata, and performs, per base mesh face and per corresponding LOD faces, the subdivision method signalled in the bitstream, together with the encoded displacements and the corresponding wavelet transform method signalled in the bitstream.
Each LOD is reconstructed iteratively by first performing the subdivision of the base mesh, applying the inverse wavelet transform to the decoded displacement residuals, and applying the reconstructed displacements to generate LOD 1; then performing the subdivision of the reconstructed LOD 1, applying the signalled inverse wavelet transform to the decoded displacement residuals, and applying the reconstructed displacements to generate LOD 2, and so on.

Signalling information embodiments

The subdivision method can be signalled at various levels: asps, afps, patch data unit or mesh intra level. For example, at asps level:
• asps_vmc_ext_subdivision_method and asps_vmc_ext_subdivision_iteration_count signal information about the subdivision method. When 3 is selected (ADAPTIVE), the subdivision method is potentially modified at afps, patch data, or mesh intra levels.
• asps_vmc_ext_transform_index indicates the transform applied to the displacement. The transform index can also indicate that no transform is applied. When the transform is LINEAR_LIFTING, LOOP_LIFTING, BUTTERFLY_LIFTING or ADAPTIVE, the necessary parameters are signalled as vmc_lifting_transform_parameters.

The lifting transform parameters can be re-used to signal the transform parameters at various levels:
• vmc_transform_log2_lifting_update_mask_size[ attributeIndex ][ ltpIndex ] indicates the number of weights used by the lifting update filter.
• vmc_transform_log2_lifting_update_weights[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight of the update filter, with i in the range of 0 to vmc_transform_log2_lifting_update_mask_size.
• vmc_transform_log2_lifting_prediction_mask_size[ attributeIndex ][ ltpIndex ] indicates the number of weights used by the lifting prediction filter.
• vmc_transform_log2_lifting_prediction_weights[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight of the prediction filter, with i in the range of 0 to vmc_transform_log2_lifting_prediction_mask_size.
• vmc_transform_log2_lifting_update_mask_size_boundary_vertices[ attributeIndex ][ ltpIndex ] indicates the number of weights used by the lifting update filter for boundary vertices.
• vmc_transform_log2_lifting_update_weights_boundary_vertices[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight of the update filter for boundary vertices, with i in the range of 0 to vmc_transform_log2_lifting_update_mask_size_boundary_vertices.
• vmc_transform_log2_lifting_prediction_mask_size_boundary_vertices[ attributeIndex ][ ltpIndex ] indicates the number of weights used by the lifting prediction filter for boundary vertices.
• vmc_transform_log2_lifting_prediction_weights_boundary_vertices[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight of the prediction filter for boundary vertices, with i in the range of 0 to vmc_transform_log2_lifting_prediction_mask_size_boundary_vertices.
• vmc_transform_boundary_vertex_angular_threshold indicates, in radians, the angular threshold for which a vertex is considered a boundary vertex. The angular deviation is computed as the angle difference between the face normals of the faces sharing the edge on which the vertex is obtained by subdivision.
The order of the weights corresponds to a canonical way of representing the vertex neighborhood mask, as illustrated on Figure 21.

In one embodiment, the vmc_subdivision_parameters are introduced to signal the weights used for generating the subdivision:
• vmc_subdivision_method_skip_even_vertices_comp[ attributeIndex ][ lodIndex ], if true, indicates that the subdivision method skips the computation of modified positions for even vertices.
• vmc_subdivision_log2_even_vertices_weight[ attributeIndex ][ lodIndex ] provides the weights mask used for modifying the even vertex positions in case vmc_subdivision_method_skip_even_vertices_comp is false.
• vmc_subdivision_log2_odd_vertices_weight[ attributeIndex ][ lodIndex ] provides the weights mask used for computing the odd vertex positions.
• vmc_subdivision_log2_even_vertices_mask_size[ attributeIndex ][ ltpIndex ] indicates the number of weights used for the even vertices.
• vmc_subdivision_log2_even_vertices_weights[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight for the even vertices, with i in the range of 0 to vmc_subdivision_log2_even_vertices_mask_size.
• vmc_subdivision_log2_odd_vertices_mask_size[ attributeIndex ][ ltpIndex ] indicates the number of weights used for the odd vertices.
• vmc_subdivision_log2_odd_vertices_weights[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight for the odd vertices, with i in the range of 0 to vmc_subdivision_log2_odd_vertices_mask_size.
• vmc_subdivision_log2_even_vertices_mask_size_boundary_vertices[ attributeIndex ][ ltpIndex ] indicates the number of weights used for the even vertices in the case of boundary vertices.
• vmc_subdivision_log2_even_vertices_weights_boundary_vertices[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight for the even vertices in the case of boundary vertices, with i in the range of 0 to vmc_subdivision_log2_even_vertices_mask_size_boundary_vertices.
• vmc_subdivision_log2_odd_vertices_mask_size_boundary_vertices[ attributeIndex ][ ltpIndex ] indicates the number of weights used for the odd vertices in the case of boundary vertices.
• vmc_subdivision_log2_odd_vertices_weights_boundary_vertices[ attributeIndex ][ ltpIndex ][ i ] indicates the i-th weight for the odd vertices in the case of boundary vertices, with i in the range of 0 to vmc_subdivision_log2_odd_vertices_mask_size_boundary_vertices.
• vmc_subdivision_boundary_vertex_angular_threshold indicates, in radians, the angular threshold for which a vertex is considered a boundary vertex. The angular deviation is computed as the angle difference between the face normals of the faces sharing the edge on which the vertex is obtained by subdivision.

The method for encoding according to an embodiment is shown in Figure 22. The method generally comprises receiving 2205 a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; generating 2210 a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; setting 2215 a target subdivision iteration count to reach the target output resolution; running 2220 a set of subdivision methods for each face of the base mesh, and determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; assigning 2225 an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; forming 2230 clusters of sets of connected faces sharing the same subdivision method identification; generating 2235 residual samples by applying a wavelet transform to each displacement vector; encoding 2240 the residual samples in a displacement video bitstream; and encoding 2245 the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams.
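A minimal Python sketch of this encoding flow follows; every helper reached through the hypothetical tools namespace is a placeholder for the corresponding step of Figure 22 and is not defined by the V-DMC specification:

    def encode_mesh_frame(frame, target_bitrate, target_resolution, tools):
        """Illustrative sketch of the encoding method of Figure 22
        (reference numerals of Figure 22 in the comments)."""
        base_mesh = tools.generate_base_mesh(frame, target_bitrate)                      # 2210
        lod_count = tools.set_subdivision_iteration_count(base_mesh, target_resolution)  # 2215
        displacements = tools.run_subdivisions(base_mesh, lod_count)                     # 2220
        face_ids = tools.assign_subdivision_ids(base_mesh, displacements)                # 2225
        clusters = tools.cluster_faces(base_mesh, face_ids)                              # 2230
        residuals = [tools.wavelet_transform(d) for d in displacements]                  # 2235
        return {                                                                         # 2240, 2245
            "displacement_video": tools.encode_displacement_video(residuals),
            "base_mesh": tools.encode_base_mesh(base_mesh),
            "atlas_metadata": tools.encode_metadata(clusters, face_ids),
        }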
Each of the steps can be implemented by a respective module of a computer system.

An apparatus according to an embodiment comprises means for receiving a dynamic three-dimensional (3D) mesh sequence frame, wherein the three-dimensional mesh represents a three-dimensional object; means for generating a base mesh for the mesh sequence frame, wherein the base mesh has a target reduction count optimized for target bitrate, and wherein the base mesh is represented by a plurality of faces, each face having a closed set of edges; means for setting a target subdivision iteration count to reach the target output resolution; means for running a set of subdivision methods for each face of the base mesh, and means for determining a set of displacement vectors for each level-of-detail defined by the target subdivision iteration count; means for assigning an identification for a face of the base mesh, which identification identifies a subdivision method that results in a defined quality for the respective face; means for forming clusters of sets of connected faces sharing the same subdivision method identification; means for generating residual samples by applying a wavelet transform to each displacement vector; means for encoding the residual samples in a displacement video bitstream; and means for encoding the base mesh, information on face clusters, and subdivision method identifications into respective bitstreams. The means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry. The memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 22 according to various embodiments.

The method for decoding according to an embodiment is shown in Figure 23. The method generally comprises receiving 2310 one or more bitstreams; decoding 2320 a base mesh and information on face clusters from said one or more bitstreams; decoding 2325 subdivision method identifications from said one or more bitstreams; decoding 2330 information on the wavelet transform from said one or more bitstreams; decoding 2340 a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; generating 2350 a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; reconstructing 2360 each level-of-detail base mesh according to the decoded base mesh and subdivision method identifications; and reconstructing 2370 the output mesh from the reconstructed level-of-detail base mesh, the set of displacement vectors, and the information on face clusters. Each of the steps can be implemented by a respective module of a computer system.
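The decoding flow of Figure 23, combined with the iterative LOD reconstruction described earlier, could be sketched in Python as follows; the hypothetical tools namespace again stands in for decoder components that the specification does not define in this form:

    def decode_mesh_frame(bitstreams, tools):
        """Illustrative sketch of the decoding method of Figure 23
        (reference numerals of Figure 23 in the comments)."""
        base_mesh, clusters = tools.decode_base_mesh_and_clusters(bitstreams)    # 2320
        subdivision_ids = tools.decode_subdivision_ids(bitstreams)               # 2325
        wavelet_info = tools.decode_wavelet_info(bitstreams)                     # 2330
        residuals = tools.decode_displacement_video(bitstreams)                  # 2340
        displacements = tools.inverse_wavelet(residuals, wavelet_info)           # 2350

        # Iterative reconstruction: subdivide the previous LOD, then apply the
        # reconstructed displacements, one LOD at a time.                        # 2360
        mesh = base_mesh
        for lod in range(tools.subdivision_iteration_count(bitstreams)):
            mesh = tools.subdivide(mesh, subdivision_ids, lod)
            mesh = tools.apply_displacements(mesh, displacements[lod])
        return tools.finalize_output_mesh(mesh, clusters)                        # 2370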
An apparatus according to an embodiment comprises means for receiving one or more bitstreams; means for decoding a base mesh and information on face clusters from said one or more bitstreams; means for decoding subdivision method identifications from said one or more bitstreams; means for decoding information on the wavelet transform from said one or more bitstreams; means for decoding a displacement video bitstream from said one or more bitstreams, the displacement video comprising residual samples; means for generating a set of displacement vectors from the residual samples based on the extracted information on the wavelet transform; means for reconstructing each level-of-detail base mesh according to the decoded base mesh and subdivision method identifications; and means for reconstructing the output mesh from the reconstructed level-of-detail base mesh, the set of displacement vectors, and the information on face clusters. The means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry. The memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 23 according to various embodiments.

An example of an apparatus is disclosed with reference to Figure 24. Figure 24 shows a block diagram of a video coding system according to an example embodiment as a schematic block diagram of an electronic device 50, which may incorporate a codec. In some embodiments the electronic device may comprise an encoder or a decoder. The electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system or a camera device. The electronic device 50 may also be comprised in a local or a remote server or in a graphics processing unit of a computer. The device may also be comprised as part of a head-mounted display device. The apparatus 50 may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable for displaying an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example, the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input, which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device, which in embodiments of the invention may be any one of: an earpiece 38, a speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery (or in other embodiments of the invention the device may be powered by any suitable mobile energy device, such as a solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera 42 capable of recording or capturing images and/or video. The camera 42 may be a multi-lens camera system having at least two camera sensors. The camera is capable of recording or detecting individual frames, which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video and/or image data for processing from another device prior to transmission and/or storage. The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50.
The apparatus or the controller 56 may comprise one or more processors or processor circuitry and be connected to memory 58, which may store data in the form of image, video and/or audio data, and/or may also store instructions for implementation on the controller 56 or to be executed by the processors or the processor circuitry. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of image, video and/or audio data or assisting in the coding and decoding carried out by the controller. The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC (Universal Integrated Circuit Card) and UICC reader, for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network. The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example for communication with a cellular communications network, a wireless communications system, or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es). The apparatus may comprise one or more wired interfaces configured to transmit and/or receive data over a wired connection, for example an electrical cable or an optical fiber connection.

The various embodiments may provide advantages. For example, the present embodiments allow for significant reductions in the required bitrate for dynamic mesh compression, while keeping the reconstruction quality at a very high level.

The various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method. For example, a device may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving, and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of various embodiments.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with one another. Furthermore, if desired, one or more of the above-described functions and embodiments may be optional or may be combined. Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims. It is also noted herein that while the above describes example embodiments, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present disclosure as defined in the appended claims.