Title:
ADAPTIVE DISPLACEMENT PACKING FOR DYNAMIC MESH CODING
Document Type and Number:
WIPO Patent Application WO/2024/084326
Kind Code:
A1
Abstract:
An example apparatus includes: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: use projection of subdivision points to a geometrical surface; package values of the subdivision points into a two dimensional (2D) box; determine whether multiple subdivision points map onto the same pixel; and merge or omit some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

Inventors:
MARTEMIANOV ALEKSEI (FI)
RONDAO ALFACE PATRICE (BE)
KONDRAD LUKASZ (DE)
ILOLA LAURI ALEKSI (FI)
SCHWARZ SEBASTIAN (DE)
Application Number:
PCT/IB2023/060065
Publication Date:
April 25, 2024
Filing Date:
October 06, 2023
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
G06T9/00; H04N19/463; H04N19/597
Other References:
KHALED MAMMOU (APPLE) ET AL: "[V-CG] Apple's Dynamic Mesh Coding CfP Response", no. m59281, 29 April 2022 (2022-04-29), XP030301431, Retrieved from the Internet [retrieved on 20220429]
Claims:
CLAIMS

What is claimed is:

1. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: use projection of subdivision points to a geometrical surface; package values of the subdivision points into a two dimensional (2D) box; determine whether multiple subdivision points map onto the same pixel; and merge or omit some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

2. The apparatus of claim 1, wherein the apparatus is caused to: determine the size of the 2D box based on the number of the subdivision points; or arbitrarily set the size of the 2D box.

3. The apparatus of any of the previous claims, wherein the subdivision points comprise displacement vectors.

4. The apparatus of any of the previous claims, wherein the apparatus is caused to: perform a subdivision process to generate subdivisional points; compute a displacement between a base mesh and a surface of an original mesh; and encode each of the subdivision points into a separate frame when a wavelet transform is used; or encode the subdivision points into a single frame when wavelet transform is not used.

5. The apparatus of claim 4, wherein the apparatus is caused to: signal the encoded subdivision points to a decoder.

6. The apparatus of any of the previous claims, wherein the apparatus is caused to: calculate 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding an encoded mesh; and reduce a number of displacement values by eliminating unused points into the packed frame of a predefined size.

7. The apparatus of any of the previous claims, wherein the apparatus is caused to: signal a packing mode used to package the values of the subdivision points; and/or signal a type of the geometrical surface.

8. The apparatus of claim 7, wherein the apparatus is caused to use a packing mode field to indicate the packing mode used to package the values of the subdivision points.

9. The apparatus of claim 7, wherein the apparatus is caused to use a projection primitive field to indicate the type of the geometrical surface.

10. The apparatus of any of the previous claims, wherein the apparatus is caused to: signal a position, a rotation angle, and/or a scale of the geometrical surface to the decoder.

11. The apparatus of claim 10, wherein the apparatus is caused to use the following to signal the position, the rotation angle, and/or the scale of the geometrical surface: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

12. The apparatus of claim 11, wherein the apparatus is caused to: use an update enabled flag to indicate that displacement projection semantics are capable of being updated in an atlas frame parameter set.

13. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive a bitstream comprising metadata; extract the metadata; use the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and generate an output reconstructed displaced mesh based at least on the assignment.

14. The apparatus of claim 13, wherein the subdivision points comprise displacement vectors.

15. The apparatus of any of the previous claims, wherein the apparatus is caused to: compute a displacement between a base mesh and a surface of an original mesh; and decode each of the subdivision points into a separate frame when a wavelet transform is used; or decode the subdivision points into a single frame when wavelet transform is not used.

16. The apparatus of any of the previous claims, wherein the apparatus is caused to: calculate 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding a decoded mesh; and reduce a number of displacement values by eliminating unused points into the packed frame of a predefined size.

17. The apparatus of any of the previous claims, wherein the apparatus is caused to: receive a packing mode used to package the values of the subdivision points; and/or receive a type of the geometrical surface.

18. The apparatus of any of the previous claims, wherein the apparatus is caused to: receive a position, a rotation angle, and/or a scale of the geometrical surface.

19. The apparatus of claim 18, wherein the position, the rotation angle, and/or the scale of the geometrical surface are received as follows: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

20. The apparatus of any of the previous claims, wherein the apparatus is further caused to: merge or omit some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

21. A method comprising: using projection of subdivision points to a geometrical surface; packaging values of the subdivision points into a two dimensional (2D) box; determining whether multiple subdivision points map onto the same pixel; and merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

22. The method of claim 21 further comprising: determining the size of the 2D box based on the number of the subdivision points; or arbitrarily setting the size of the 2D box.

23. The method of any of the claims 21 to 22, wherein the subdivision points comprise displacement vectors.

24. The method of any of the claims 21 to 23 further comprising: performing a subdivision process to generate subdivisional points; computing a displacement between a base mesh and a surface of an original mesh; and encoding each of the subdivision points into a separate frame when a wavelet transform is used; or encoding the subdivision points into a single frame when wavelet transform is not used.

25. The method of claim 24 further comprising: signaling the encoded subdivision points to a decoder.

26. The method of any of the claims 21 to 25 further comprising: calculating 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding an encoded mesh; and reducing a number of displacement values by eliminating unused points into the packed frame of a predefined size.

27. The method of any of the claims 21 to 26 further comprising: signaling a packing mode used to package the values of the subdivision points; and/or signaling a type of the geometrical surface.

28. The method of claim 27 further comprising using a packing mode field to indicate the packing mode used to package the values of the subdivision points.

29. The method of claim 27 further comprising using a projection primitive field to indicate the type of the geometrical surface.

30. The method of any of the claims 21 to 29 further comprising: signaling a position, a rotation angle, and/or a scale of the geometrical surface to the decoder.

31. The method of claim 30 further comprising using the following to signal the position, the rotation angle, and/or the scale of the geometrical surface: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

32. The method of claim 31 further comprising: using an update enabled flag to indicate that displacement projection semantics are capable of being updated in an atlas frame parameter set.

33. A method comprising: receiving a bitstream comprising metadata; extracting the metadata; using the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and generating an output reconstructed displaced mesh based at least on the assignment.

34. The method of claim 33, wherein the subdivision points comprise displacement vectors.

35. The method of any of the claims 33 or 34 further comprising: computing a displacement between a base mesh and a surface of an original mesh; and decoding each of the subdivision points into a separate frame when a wavelet transform is used; or decoding the subdivision points into a single frame when wavelet transform is not used.

36. The method of any of the claims 33 to 35 further comprising: calculating 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding a decoded mesh; and reducing a number of displacement values by eliminating unused points into the packed frame of a predefined size.

37. The method of any of the claims 33 to 36 further comprising: receiving a packing mode used to package the values of the subdivision points; and/or receiving a type of the geometrical surface.

38. The method of any of the claims 33 to 37 further comprising: receiving a position, a rotation angle, and/or a scale of the geometrical surface.

39. The method of claim 38, wherein the position, the rotation angle, and/or the scale of the geometrical surface are received as follows: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

40. The method of any of the claims 33 to 39 further comprising merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

Description:
ADAPTIVE DISPLACEMENT PACKING FOR DYNAMIC MESH CODING

TECHNICAL FIELD

[0001] The examples and non-limiting embodiments relate generally to volumetric video coding, and more particularly, to adaptive displacement packing for dynamic mesh coding.

BACKGROUND

[0002] It is known to perform encoding and decoding of images and video.

SUMMARY

[0003] An example apparatus includes: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: use projection of subdivision points to a geometrical surface; package values of the subdivision points into a two dimensional (2D) box; determine whether multiple subdivision points map onto the same pixel; and merge or omit some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.
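
By way of a non-normative illustration only, the following Python sketch shows one way such packing could be realized: subdivision points are projected onto a sphere surrounding the mesh, each projected point is mapped to a pixel of a 2D box of a chosen size, and displacement vectors that land on the same pixel are merged (here by averaging; omitting all but one is another option). The function and parameter names, the equirectangular mapping, and the averaging rule are illustrative assumptions, not the claimed or normative V-DMC process.

import math
from collections import defaultdict

def pack_displacements(points, displacements, width, height, center):
    """points: (x, y, z) subdivision points; displacements: matching (dx, dy, dz)
    vectors; returns {(u, v) pixel: merged displacement} for a width x height box."""
    pixels = defaultdict(list)
    cx, cy, cz = center
    for (x, y, z), d in zip(points, displacements):
        vx, vy, vz = x - cx, y - cy, z - cz
        r = math.sqrt(vx * vx + vy * vy + vz * vz) or 1.0
        theta = math.acos(max(-1.0, min(1.0, vz / r)))   # polar angle in [0, pi]
        phi = math.atan2(vy, vx) + math.pi               # azimuth in [0, 2*pi)
        u = min(int(phi / (2.0 * math.pi) * width), width - 1)
        v = min(int(theta / math.pi * height), height - 1)
        pixels[(u, v)].append(d)                         # collect collisions per pixel
    packed = {}
    for uv, vecs in pixels.items():
        n = len(vecs)                                    # merge colliding vectors by averaging
        packed[uv] = tuple(sum(c) / n for c in zip(*vecs))
    return packed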

[0004] The example apparatus may further include, wherein the apparatus is caused to: determine the size of the 2D box based on the number of the subdivision points; or arbitrarily set the size of the 2D box.

[0005] The example apparatus may further include, wherein the subdivision points comprise displacement vectors.

[0006] The example apparatus may further include, wherein the apparatus is caused to: perform a subdivision process to generate subdivisional points; compute a displacement between a base mesh and a surface of an original mesh; and encode each of the subdivision points into a separate frame when a wavelet transform is used; or encode the subdivision points into a single frame when wavelet transform is not used.

[0007] The example apparatus may further include, wherein the apparatus is caused to: signal the encoded subdivision points to a decoder.

[0008] The example apparatus may further include, wherein the apparatus is caused to: calculate 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding an encoded mesh; and reduce a number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0009] The example apparatus may further include, wherein the apparatus is caused to: signal a packing mode used to package the values of the subdivision points; and/or signal a type of the geometrical surface.

[0010] The example apparatus may further include, wherein the apparatus is caused to use a packing mode field to indicate the packing mode used to package the values of the subdivision points.

[0011] The example apparatus may further include, wherein the apparatus is caused to use a projection primitive field to indicate the type of the geometrical surface.

[0012] The example apparatus may further include, wherein the apparatus is caused to: signal a position, a rotation angle, and/or a scale of the geometrical surface to the decoder.

[0013] The example apparatus may further include, wherein the apparatus is caused to use the following to signal the position, the rotation angle, and/or the scale of the geometrical surface: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

[0014] The example apparatus may further include, wherein the apparatus is caused to: use an update enabled flag to indicate that displacement projection semantics are capable of being updated in an atlas frame parameter set.
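
As a non-normative illustration of how the signaled parameters described in the preceding paragraphs could be grouped on the encoder or decoder side, the sketch below collects them in one Python structure. The element names and types are assumptions for illustration and do not reproduce the actual V-DMC syntax tables.

from dataclasses import dataclass

@dataclass
class DisplacementProjectionParams:
    packing_mode: int          # packing mode used to package the subdivision point values
    projection_primitive: int  # type of the geometrical surface (e.g., sphere)
    delta_x: int               # position X of the surface as deltaX to the base-mesh central point
    delta_y: int               # position Y of the surface as deltaY to the base-mesh central point
    delta_z: int               # position Z of the surface as deltaZ to the base-mesh central point
    rotation_angle: int        # rotation of the surface relative to a central axis
    projection_scale: int      # scale factor of the geometrical surface
    update_enabled_flag: bool  # whether the parameters may be updated in an atlas frame parameter set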

[0015] Another example apparatus includes: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: receive a bitstream comprising metadata; extract the metadata; use the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and generate an output reconstructed displaced mesh based at least on the assignment.

[0016] The example apparatus may further include, wherein the subdivision points comprise displacement vectors.

[0017] The example apparatus may further include, wherein the apparatus is caused to: compute a displacement between a base mesh and a surface of an original mesh; and decode each of the subdivision points into a separate frame when a wavelet transform is used; or decode the subdivision points into a single frame when wavelet transform is not used.

[0018] The example apparatus may further include, wherein the apparatus is caused to: calculate 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding a decoded mesh; and reduce a number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0019] The example apparatus may further include, wherein the apparatus is caused to: receive a packing mode used to package the values of the subdivision points; and/or receive a type of the geometrical surface.

[0020] The example apparatus may further include, wherein the apparatus is caused to: receive a position, a rotation angle, and/or a scale of the geometrical surface.

[0021] The example apparatus may further include, wherein the position, the rotation angle, and/or the scale of the geometrical surface are received as follows: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

[0022] The example apparatus may further include, wherein the apparatus is further caused to: merge or omit some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

[0023] An example method includes: using projection of subdivision points to a geometrical surface; packaging values of the subdivision points into a two dimensional (2D) box; determining whether multiple subdivision points map onto the same pixel; and merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

[0024] The example method may further include: determining the size of the 2D box based on the number of the subdivision points; or arbitrarily setting the size of the 2D box.

[0025] The example method may further include, wherein the subdivision points comprise displacement vectors.

[0026] The example method may further include: performing a subdivision process to generate subdivisional points; computing a displacement between a base mesh and a surface of an original mesh; and encoding each of the subdivision points into a separate frame when a wavelet transform is used; or encoding the subdivision points into a single frame when wavelet transform is not used.

[0027] The example method may further include: signaling the encoded subdivision points to a decoder.

[0028] The example method may further include: calculating 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding an encoded mesh; and reducing a number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0029] The example method may further include: signaling a packing mode used to package the values of the subdivision points; and/or signaling a type of the geometrical surface.

[0030] The example method may further include using a packing mode field to indicate the packing mode used to package the values of the subdivision points.

[0031] The example method may further include using a projection primitive field to indicate the type of the geometrical surface.

[0032] The example method may further include signaling a position, a rotation angle, and/or a scale of the geometrical surface to the decoder.

[0033] The example method may further include using the following to signal the position, the rotation angle, and/or the scale of the geometrical surface: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

[0034] The example method may further include using an update enabled flag to indicate that displacement projection semantics are capable of being updated in an atlas frame parameter set.

[0035] Another example method includes: receiving a bitstream comprising metadata; extracting the metadata; using the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and generating an output reconstructed displaced mesh based at least on the assignment.

[0036] The example method may further include, wherein the subdivision points comprise displacement vectors.

[0037] The example method may further include: computing a displacement between a base mesh and a surface of an original mesh; and decoding each of the subdivision points into a separate frame when a wavelet transform is used; or decoding the subdivision points into a single frame when wavelet transform is not used.

[0038] The example method may further include: calculating 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding a decoded mesh; and reducing a number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0039] The example method may further include: receiving a packing mode used to package the values of the subdivision points; and/or receiving a type of the geometrical surface.

[0040] The example method may further include: receiving a position, a rotation angle, and/or a scale of the geometrical surface.

[0041] The example method may further include, wherein the position, the rotation angle, and/or the scale of the geometrical surface are received as follows: a delta x field to indicate position X of the geometrical surface as deltaX to a central point of a base-mesh; a delta y field to indicate position Y of the geometrical surface as deltaY to the central point of the base-mesh; a delta z field to indicate position Z of the geometrical surface as deltaZ to the central point of the base-mesh; a rotation angle field to indicate the rotation angle of the geometrical surface to a central axis; and a projection scale field to indicate the scale factor of the geometrical surface.

[0042] The example method may further include merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

[0043] Yet another example apparatus includes: means for using projection of subdivision points to a geometrical surface; means for packaging values of the subdivision points into a two dimensional (2D) box; means for determining whether multiple subdivision points map onto the same pixel; and means for merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

[0044] Still another example apparatus includes: means for receiving a bitstream comprising metadata; means for extracting the metadata; means for using the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and means for generating an output reconstructed displaced mesh based at least on the assignment.

[0045] An example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: using projection of subdivision points to a geometrical surface; packaging values of the subdivision points into a two dimensional (2D) box; determining whether multiple subdivision points map onto the same pixel; and merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map onto the same pixel.

[0046] Another example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: receiving a bitstream comprising metadata; extracting the metadata; using the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points; and generating an output reconstructed displaced mesh based at least on the assignment.

BRIEF DESCRIPTION OF THE DRAWINGS

[0047] The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:

[0048] FIG. 1A is a diagram showing volumetric media conversion at an encoder side.

[0049] FIG. 1B is a diagram showing volumetric media reconstruction at a decoder side.

[0050] FIG. 2 shows an example of block to patch mapping.

[0051] FIG. 3A shows an example of an atlas coordinate system.

[0052] FIG. 3B shows an example of a local 3D patch coordinate system.

[0053] FIG. 3C shows an example of a final target 3D coordinate system.

[0054] FIG. 4 shows elements of a mesh.

[0055] FIG. 5 shows an example V-PCC extension for mesh encoding, based on the embodiments described herein.

[0056] FIG. 6 shows an example V-PCC extension for mesh decoding, based on the embodiments described herein.

[0057] FIG. 7 shows one subdivision step of a triangle into four triangles by connecting mid points of the initial triangle edges.

[0058] FIG. 8 depicts a multi-resolution analysis of a mesh.

[0059] FIG. 9 is a block diagram of an encoder composed of a pre-processing module.

[0060] FIG. 10 depicts pre-processing steps at the encoder.

[0061] FIG. 11 is a block diagram of an intra frame encoder scheme.

[0062] FIG. 12 is a block diagram of an inter frame encoder scheme.

[0063] FIG. 13 depicts a decoder scheme composed of a decoder module that demultiplexes and decodes all sub-streams and a post-processing module that reconstructs the dynamic mesh sequence.

[0064] FIG. 14 shows a decoding process in intra mode.

[0065] FIG. 15 shows a decoding process in inter mode.

[0066] FIG. 16 shows a base mesh encoder in a V-DMC encoder.

[0067] FIG. 17 shows an example base mesh encoder.

[0068] FIG. 18 shows an example base mesh decoder.

[0069] FIG. 19 illustrates an overview of a base mesh data substream structure.

[0070] FIG. 20 depicts segmentation of a mesh into sub-meshes.

[0071] FIG. 21 illustrates a 2 submesh example.

[0072] FIG. 22 shows a zoom on the encoder sub-block related to displacement image packing.

[0073] FIG. 23 shows a displacement video frame without quantization with the number of LODs=4, normalized for visualization.

[0074] FIG. 24 shows displacement for two frames.

[0075] FIG. 25 shows a 2D example of a projection to a sphere and the corresponding packing into a one dimensional (ID) line, in accordance with an embodiment.

[0076] FIG. 26 illustrates mapping of an irregularly sampled set of points to a regular grid, in accordance with an embodiment.

[0077] FIG. 27 is an example apparatus to implement the examples described herein.

[0078] FIG. 28 shows a representation of an example of non-volatile memory media.

[0079] FIG. 29 is an example method to implement the examples described herein.

[0080] FIG. 30 is an example method to implement the examples described herein.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

[0081] The examples described herein relate to the new standardization activity called Video-based Dynamic Mesh Coding (V-DMC) ISO/IEC 23090-29, which is a new application of the Visual Volumetric Video Coding (V3C) standard family ISO/IEC 23090-5.

[0082] Volumetric video data

[0083] Volumetric video data represents a three-dimensional scene or object and can be used as input for AR, VR and MR applications. Such data describes geometry (shape, size, position in 3D-space) and respective attributes (e.g., color, opacity, reflectance, etc.), plus any possible temporal transformations of the geometry and attributes at given time instances (like frames in 2D video). Volumetric video is either generated from 3D models, e.g., CGI, or captured from real-world scenes using a variety of capture solutions, e.g., multi-camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible. Typical representation formats for such volumetric data are triangle meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, e.g., "frames" in 2D video, or other means, e.g., position of an object as a function of time.

[0084] Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for AR, VR, or MR applications, especially for providing 6DOF viewing capabilities.

[0085] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is coding this 3D data as a set of textures and a depth map as is the case in the multi-view plus depth framework. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.

[0086] MPEG visual volumetric video-based coding (V3C)

[0087] Selected excerpts from the ISO/IEC 23090-5 Visual Volumetric Video-based Coding and Video-based Point Cloud Compression 2nd Edition standard are referred to herein.

[0088] Visual volumetric video, a sequence of visual volumetric frames, when uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.

[0089] The V3C specification enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved through first a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component includes information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g. texture or material information, of such 3D data. An example is shown in FIG. 1A and FIG. 1B.

[0090] FIG. 1A shows volumetric media conversion at the encoder, and FIG. 1B shows volumetric media conversion at the decoder side. The 3D media 102 is converted to a series of 2D representations: occupancy 118, geometry 120, and attribute 122. Additional atlas information 108 is also included in the bitstream to enable inverse reconstruction. Refer to ISO/IEC 23090-5.

[0091] As further shown in FIG. 1A, a volumetric capture operation 104 generates a projection 106 from the input 3D media 102. In some examples, the projection 106 is a projection operation. From the projection 106, an occupancy operation 110 generates the occupancy 2D representation 118, a geometry operation 112 generates the geometry 2D representation 120, and an attribute operation 114 generates the attribute 2D representation 122. The additional atlas information 108 is included in the bitstream 116. The atlas information 108, the occupancy 2D representation 118, the geometry 2D representation 120, and the attribute 2D representation 122 are encoded into the V3C bitstream 124 to encode a compressed version of the 3D media 102. Based on the examples described herein, V-DMC packing signaling 129 may also be signaled in the V3C bitstream 124 or directly to a decoder. The V-DMC packing signaling 129 may be used on the decoder side, as shown in FIG. 1B.

[0092] As shown in FIG. 1B, a decoder using the V3C bitstream 124 derives 2D representations using an occupancy operation 128, a geometry operation 130 and an attribute operation 132. The atlas information operation 126 provides atlas information into a bitstream 134. The occupancy operation 128 derives the occupancy 2D representation 136, the geometry operation 130 derives the geometry 2D representation 138, and the attribute operation 132 derives the attribute 2D representation 140. The 3D reconstruction operation 142 generates a decompressed reconstruction 144 of the 3D media 102, using the atlas information 126/134, the occupancy 2D representation 136, the geometry 2D representation 138, and the attribute 2D representation 140.

[0093] Additional information that allows associating all these subcomponents, and that enables the inverse reconstruction from a 2D representation back to a 3D representation, is also included in a special component, referred to herein as the atlas. An atlas comprises multiple elements, namely patches. Each patch identifies a region in all available 2D components and includes information necessary to perform the appropriate inverse projection of this region back to the 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is also further refined after the consideration of the occupancy information.

[0094] Atlases are partitioned into patch packing blocks of equal size. Refer for example to block 202 in FIG. 2, where FIG. 2 shows an example of block to patch mapping. The 2D bounding boxes of patches and their coding order determine the mapping between the blocks of the atlas image and the patch indices. FIG. 2 shows an example of block to patch mapping with 4 projected patches (204, 204-2, 204-3, 204-4) onto an atlas 201 when asps_patch_precedence_order_flag is equal to 0. Projected points are represented with dark gray. The area that does not include any projected points is represented with light grey. Patch packing blocks 202 are represented with dashed lines. The number inside each patch packing block 202 represents the patch index of the patch (204, 204-2, 204-3, 204-4) to which it is mapped.

[0095] Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps.
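
As an illustration of the block to patch mapping described above, the following Python sketch assigns a patch index to every patch packing block covered by a patch's 2D bounding box (given in block units). How overlaps are resolved depends on asps_patch_precedence_order_flag; here, as one illustrative choice, a block keeps the first patch in coding order that claimed it. The function and parameter names are assumptions.

def block_to_patch_map(patches, blocks_w, blocks_h):
    """patches: list of (x0, y0, w, h) bounding boxes in block units, in coding order."""
    block_map = [[-1] * blocks_w for _ in range(blocks_h)]  # -1 marks unmapped blocks
    for patch_idx, (x0, y0, w, h) in enumerate(patches):
        for by in range(y0, min(y0 + h, blocks_h)):
            for bx in range(x0, min(x0 + w, blocks_w)):
                if block_map[by][bx] == -1:                 # keep the earlier patch on overlap
                    block_map[by][bx] = patch_idx
    return block_map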

[0096] FIG. 3A shows an example of an atlas coordinate system, FIG. 3B shows an example of a local 3D patch coordinate system, and FIG. 3C shows an example of a final target 3D coordinate system. Refer to ISO/IEC 23090-5.

[0097] FIG. 3A shows an example of a single patch 302 packed onto an atlas image 304. This patch 302 is then converted, with reference to FIG. 3B, to a local 3D patch coordinate system (U, V, D) defined by the projection plane with origin O’, tangent (U), bi-tangent (V), and normal (D) axes. For an orthographic projection, the projection plane is equal to the sides of an axis-aligned 3D bounding box 306, as shown in FIG. 3B. The location of the bounding box 306 in the 3D model coordinate system, defined by a left-handed system with axes (X, Y, Z), can be obtained by adding offsets TilePatch3dOffsetU 308, TilePatch3dOffsetV 310, and TilePatch3dOffsetD 312, as illustrated in FIG. 3C.
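
A minimal sketch of the offset step described above is given below: a sample at local patch coordinates (u, v) with depth d is placed in the 3D model coordinate system by adding the three offsets. Axis permutation and orientation handling, which depend on the patch projection plane, are deliberately omitted, and the function name is an assumption.

def local_patch_to_model(u, v, d, offset_u, offset_v, offset_d):
    # simplified mapping: (U, V, D) -> (X, Y, Z) by adding the signaled offsets
    return (offset_u + u, offset_v + v, offset_d + d)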

[0098] V3C High Level Syntax

[0099] Coded V3C video components are referred to herein as video bitstreams, while an atlas component is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to herein as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.

[0100] V3C patch information is included in an atlas bitstream, atlas_sub_bitstream(), which includes a sequence of NAL units. A NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are included in NAL units, each of which includes an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.

[0101] NAL units in an atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former is dedicated to carry patch data, while the latter is dedicated to carry data necessary to properly parse the ACL units or any additional auxiliary data.

[0102] In the nal_unit_header() syntax, nal_unit_type specifies the type of the RBSP data structure included in the NAL unit as specified in Table 4 of ISO/IEC 23090-5. nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies. The value of nal_layer_id shall be in the range of 0 to 62, inclusive. The value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (e.g., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
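
As an illustration of the two syntax elements discussed above, the sketch below extracts nal_unit_type and nal_layer_id from a NAL unit header, assuming the two-byte atlas NAL unit header layout of ISO/IEC 23090-5 (a forbidden zero bit, a 6-bit nal_unit_type, a 6-bit nal_layer_id and a 3-bit nal_temporal_id_plus1); the exact layout should be verified against the specification.

def parse_nal_unit_header(header: bytes):
    bits = (header[0] << 8) | header[1]          # first two bytes of the NAL unit
    nal_unit_type = (bits >> 9) & 0x3F           # 6 bits after the forbidden zero bit
    nal_layer_id = (bits >> 3) & 0x3F            # next 6 bits
    if nal_layer_id > 62:
        raise ValueError("nal_layer_id shall be in the range of 0 to 62, inclusive")
    return nal_unit_type, nal_layer_id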

[0103] V3C extension mechanisms

[0104] While designing the V3C specification it was envisaged that amendments or new editions can be created in the future. In order to ensure that the first implementations of V3C decoders are compatible with any future extension, a number of fields for future extensions to parameter sets were reserved.

[0105] For example, the second edition of V3C introduced extensions in VPS related to MIV and the packed video component.

[0106] Rendering and meshes

[0107] A polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modeling. The faces usually comprise triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes.

[0108] With reference to FIG. 4, objects 400 created with polygon meshes are represented by different types of elements. These include vertices 402, edges 404, faces 406, polygons 408 and surfaces 410 as shown in FIG. 4. Thus, FIG. 4 illustrates elements of a mesh.

[0109] Polygon meshes are defined by the following elements:

[0110] Vertex (402): a position in 3D space defined as (x,y,z) along with other information such as color (r,g,b), normal vector and texture coordinates.

[0111] Edge (404): a connection between two vertices.

[0112] Face (406): a closed set of edges 404, in which a triangle face has three edges, and a quad face has four edges. A polygon 408 is a coplanar set of faces 406. In systems that support multisided faces, polygons and faces are equivalent. Mathematically a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.

[0113] Surfaces (410): or smoothing groups, are useful, but not required to group smooth regions.

[0114] Groups: some mesh formats include groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.

[0115] Materials: defined to allow different portions of the mesh to use different shaders when rendered.

[0116] UV coordinates: most mesh formats also support some form of UV coordinates which are a separate 2D representation of the mesh "unfolded" to show what portion of a 2-dimensional texture map to apply to different polygons of the mesh. It is also possible for meshes to include other such vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).
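
The mesh elements listed above can be represented with very small data structures; the Python sketch below is one illustrative arrangement (the field names are not tied to any particular mesh format, and edges are left implicit in the face list).

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Vertex:
    position: Tuple[float, float, float]                    # (x, y, z)
    color: Tuple[float, float, float] = (1.0, 1.0, 1.0)     # optional (r, g, b)
    normal: Tuple[float, float, float] = (0.0, 0.0, 1.0)    # optional normal vector
    uv: Tuple[float, float] = (0.0, 0.0)                    # texture coordinates

@dataclass
class Mesh:
    vertices: List[Vertex] = field(default_factory=list)
    faces: List[Tuple[int, int, int]] = field(default_factory=list)  # triangles as vertex indices
    # groups, materials and smoothing groups are omitted for brevity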

[0117] V-PCC mesh coding extension (MPEG M49588)

[0118] FIG. 5 and FIG. 6 show the extensions to the V-PCC encoder and decoder to support mesh encoding and mesh decoding, respectively, as proposed in MPEG input document [MPEG M47608].

[0119] In the encoder extension 500, the input mesh data 502 is demultiplexed with demultiplexer 504 into vertex coordinates+attributes 506 and vertex connectivity 508. The vertex coordinates+attributes data 506 is coded using MPEG-I V-PCC (such as with MPEG-I V-PCC encoder 510), whereas the vertex connectivity data 508 is coded (using vertex connectivity encoder 516) as auxiliary data 518. Both of these (encoded vertex coordinates and vertex attributes 517 and auxiliary data 518) are multiplexed using multiplexer 520 to create the final compressed output bitstream 522. Vertex ordering 514 is carried out on the reconstructed vertex coordinates 512 at the output of MPEG-I V-PCC 510 to reorder the vertices for optimal vertex connectivity encoding 516.

[0120] Based on the examples described herein, as shown in FIG. 5, the encoding process/apparatus 500 of FIG. 5 may be extended such that the encoding process/apparatus 500 signals packing signaling 530 (e.g. V-DMC packing signaling) within the output bitstream 522. Alternatively, packing signaling 530 may be provided and signaled separately from the output bitstream 522.

[0121] As shown in FIG. 6, in the decoder 600, the input bitstream 602 is demultiplexed with demultiplexer 604 to generate the compressed bitstreams for vertex coordinates+attributes 605 and vertex connectivity 606. The input/compressed bitstream 602 may comprise or may be the output from the encoder 500, namely the output bitstream 522 of FIG. 5. The vertex coordinates+attributes data 605 is decompressed using MPEG-I V-PCC decoder 608 to generate vertex attributes 612. Vertex ordering 616 is carried out on the reconstructed vertex coordinates 614 at the output of MPEG-I V-PCC decoder 608 to match the vertex order at the encoder 500. The vertex connectivity data 606 is also decompressed using vertex connectivity decoder 610 to generate vertex connectivity information 618, and everything (including vertex attributes 612, the output of vertex reordering 616, and vertex connectivity information 618) is multiplexed with multiplexer 620 to generate the reconstructed mesh 622.

[0122] Based on the examples described herein, as shown in FIG. 6, the decoding process/apparatus 600 of FIG. 6 may be extended such that the decoding process/apparatus 600 receives and decodes packing signaling 630 (e.g. V-DMC packing signaling), which may be part of the compressed bitstream 602. The packing signaling 630 of FIG. 6 may comprise or correspond to the packing signaling 530 of FIG. 5. Alternatively, packing signaling 630 may be received and signaled separately from the compressed bitstream 602 or output bitstream 522 (e.g. signaled to the demultiplexer 604 separately from the compressed bitstream 602).

[0123] Generic mesh compression

[0124] Mesh data may be compressed directly without projecting it into 2D-planes, like in V-PCC based mesh coding. In fact, the anchor for the V-PCC mesh compression call for proposals (CfP) utilizes off-the-shelf mesh compression technology, Draco (https://google.github.io/draco/), for compressing mesh data excluding textures. Draco is used to compress vertex positions in 3D, connectivity data (faces) as well as UV coordinates. Additional per-vertex attributes may be also compressed using Draco. The actual UV texture may be compressed using traditional video compression technologies, such as H.265 or H.264.

[0125] Draco uses the edgebreaker algorithm at its core to compress 3D mesh information. Draco offers a good balance between simplicity and efficiency, and is part of Khronos endorsed extensions for the glTF specification. The main idea of the algorithm is to traverse mesh triangles in a deterministic way so that each new triangle is encoded next to an already encoded triangle. This enables prediction of vertex specific information from the previously encoded data by simply adding delta to the previous data. Edgebreaker utilizes symbols to signal how each new triangle is connected to the previously encoded part of the mesh. Connecting triangles in such a way results on average in 1 to 2 bits per triangle when combined with existing binary encoding techniques.
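
The delta idea mentioned above can be illustrated with a toy example: each vertex position is coded as the difference from the previously coded vertex, and the decoder reconstructs by accumulating the deltas. This is only a simplification for illustration; Draco's actual prediction (e.g., parallelogram prediction driven by the edgebreaker traversal) is more elaborate.

def delta_encode(vertices):
    deltas, prev = [], (0.0, 0.0, 0.0)
    for v in vertices:
        deltas.append(tuple(a - b for a, b in zip(v, prev)))  # store difference to previous vertex
        prev = v
    return deltas

def delta_decode(deltas):
    vertices, prev = [], (0.0, 0.0, 0.0)
    for d in deltas:
        prev = tuple(a + b for a, b in zip(prev, d))          # accumulate deltas
        vertices.append(prev)
    return vertices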

[0126] V-DMC

[0127] The V-DMC standardization work started after the completion of the call for proposals (CfP) issued by MPEG 3DG (ISO/IEC SC29 WG 2) on the integration of mesh compression into the V3C family of standards (ISO/IEC 23090-5). The retained technology after the CfP result analysis is based on multiresolution mesh analysis and coding. This approach includes:

[0128] 1. Generating a base mesh that is a simplified (low resolution) mesh approximation of the original mesh, called a base mesh and noted m_i (this is done for all frames i of the dynamic mesh sequence),

[0129] 2. Performing several mesh subdivision iterative steps (e.g., each triangle 700 is converted into four triangles (701, 702, 703, 704) by connecting the triangle edge midpoints as illustrated on FIG. 7) on the generated base mesh, generating other approximation meshes m_i^n, where n stands for the number of iterations, with m_i = m_i^0,

[0130] 3. Defining displacement vectors d_i, also named error vectors, for each vertex of each mesh approximation m_i^n, with n > 0, noted d_i^n,

[0131] 4. For each subdivision level, the deformed mesh, obtained by m_i^n + d_i^n, e.g., by adding the displacement vectors to the subdivided mesh vertices, generates the best approximation of the original mesh at that resolution, given the base mesh and prior subdivision levels.

[0132] 5. The displacement vectors may undergo a lazy wavelet transform prior to compression.

[0133] 6. The attribute map of the original mesh is transferred to the deformed mesh at the highest resolution (e.g., subdivision level) such that texture coordinates are obtained for the deformed mesh and a new attribute map is generated.

[0134] The scheme is illustrated on FIG. 8. FIG. 7 shows one subdivision step of a triangle 700 into four triangles (701, 702, 703, 704) by connecting mid points of the initial triangle edges. FIG. 8 shows a multi-resolution analysis of a mesh. A base mesh (left, 802) undergoes a first step of subdivision and error vectors are added to each vertex (arrows of 804); after a series of iterative subdivisions and displacements, the highest resolution mesh is generated (right, 808 from 806). The connectivity of the highest resolution deformed mesh 808 is generally different from that of the original mesh 802; however, the geometry of the deformed mesh 808 is a good approximation of the original mesh geometry.
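
A minimal sketch of one mid-point subdivision iteration, as depicted in FIG. 7, is given below: every triangle is split into four by inserting a vertex at each edge midpoint. Only the vertex and face lists are handled; attributes and displacement handling are omitted, and the function name is an assumption.

def subdivide_once(vertices, faces):
    """vertices: list of (x, y, z); faces: list of (a, b, c) vertex indices."""
    vertices = list(vertices)
    faces_out, midpoint_cache = [], {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))                 # share midpoints between adjacent triangles
        if key not in midpoint_cache:
            (x0, y0, z0), (x1, y1, z1) = vertices[i], vertices[j]
            vertices.append(((x0 + x1) / 2, (y0 + y1) / 2, (z0 + z1) / 2))
            midpoint_cache[key] = len(vertices) - 1
        return midpoint_cache[key]

    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        faces_out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, faces_out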

[0135] The encoding process 900 can be separated into two main modules: the pre-processing module 902 and the actual encoder module 904 as illustrated on FIG. 9. FIG. 9 shows that the encoder 901 is composed of a pre-processing module 902 that generates a base mesh 906 and the displacement vectors 908, given the input mesh sequence 903 and its attribute maps 905. The encoder module 904 generates the compressed bitstream 910 by ingesting the inputs and outputs of the pre-processing module 902. The encoder 904 provides feedback 912 to pre-processing 902.

[0136] The pre-processing 902 includes mainly three steps: decimation 1002 (reducing the original mesh resolution to produce a base mesh 906 and a decimated mesh 1004), uv-atlas isocharting 1006 (creating a parameterization of the base mesh and generating a parameterized decimated mesh 1008) and the subdivision surface fitting 1010 as illustrated on FIG. 10. Thus, FIG. 10 shows the pre-processing steps 902 at the encoder 901. [0137] The encoder is illustrated on FIG. 11 and FIG. 12 for the INTRA case (1101) and INTER case (1201) respectively. In the latter 1201, the base mesh connectivity of the first frame of a group of frames is imposed to the subsequent frame’s base meshes to improve compression performance. FIG. 11 shows an intra frame encoder scheme 1100.

[0138] FIG. 11 shows the encoder process 1100 for INTRA frame encoding. Inputs to this module are the base mesh 906 (that is an approximation of the input mesh 903 but that includes fewer faces and vertices), the patch information 1102 related to the input base mesh 906, the displacements 908, the static/dynamic input mesh frame 903 and the attribute map 905. The output of this module is a compressed bitstream 910 that includes a V3C extended signaling sub-bitstream including patch data information 1102, a compressed base mesh sub-bitstream 1104, a compressed displacement video component sub-bitstream 1106 and a compressed attribute video component sub-bitstream 1108. The module 1101 takes the input base mesh 906 and first quantizes its data in the quantization module 1110, which can be dynamically tuned by a control module 1112. The quantized base mesh is then encoded with the static mesh encoder module 1114, which outputs a compressed base mesh sub-bitstream 1104 that is muxed 1116 in the output bitstream 910. The encoded base mesh is decoded in the static mesh decoder module 1118 that generates a reconstructed quantized base mesh 1120. The update displacements module 1122 takes as input the reconstructed quantized base mesh 1120, the pristine base mesh 906 and the input displacements 908 to generate new updated displacements 1124 that are remapped to the reconstructed base mesh data in order to avoid precision errors due to the static mesh encoding and decoding process. The updated displacements 1124 are filtered with a wavelet transform in the wavelet transform module 1126 (that also takes as input the reconstructed base mesh 1120) and then quantized in the quantization module 1128. The quantized wavelet coefficients 1130 produced from the updated displacements 1124 are then packed into a video component in the image packing module 1132. This video component 1133 is then encoded with a 2D video encoder such as HEVC, VVC, etc., in the video encoder module 1134, and the output compressed displacement video component sub-bitstream 1106 is muxed 1116 along with the V3C signaling information sub-bitstream 1108 into the output compressed bitstream 910. Then the compressed displacement video component is first decoded and reconstructed and then unpacked into encoded and quantized wavelet coefficients 1138 in the image unpacking module 1136. These wavelet coefficients 1138 are then unquantized in the inverse quantization module 1140 and reconstructed with the inverse wavelet transform module 1142 that generates reconstructed displacements 1144. The reconstructed base mesh 1146 is unquantized in the inverse quantization module 1148 and the unquantized base mesh 1146 is combined with the reconstructed displacements 1144 in the reconstruct deformed mesh module 1150 to obtain the reconstructed deformed mesh 1152. This reconstructed deformed mesh 1152 is then fed into the attribute transfer module 1154 together with the attribute map 905 produced by the pre-processing 902 and the input static/dynamic mesh frame 903. The output of the attribute transfer module is an updated attribute map 1156 that now corresponds to the reconstructed deformed mesh frame 1152. The updated attribute map 1154 is then padded 1156, undergoes color conversion 1158, and is encoded as a video component with a 2D video codec such as HEVC or VVC, in the padding 1156, color conversion 1158 and video encoder 1160 modules, respectively. The output compressed attribute map bitstream 1108 is multiplexed 1116 into the encoder output bitstream 910.
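
To illustrate the quantization and image packing steps described above in isolation, the sketch below writes quantized displacement/wavelet coefficients into a frame of predefined size in simple raster order. The actual V-DMC packing order and per-LOD arrangement are more elaborate; this only shows the data flow, and the names and the uniform quantization step are assumptions.

def quantize_and_pack(coefficients, width, height, step=1.0):
    """coefficients: flat list of floats; returns a height x width frame of integers."""
    frame = [[0] * width for _ in range(height)]
    for idx, c in enumerate(coefficients[: width * height]):
        frame[idx // width][idx % width] = int(round(c / step))  # uniform quantization
    return frame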

[0139] FIG. 12 shows an inter frame encoder scheme 1200, similar to the intra case 1100, but with the base mesh connectivity being constrained for all frames of a group of frames. A motion encoder 1202 is used to efficiently encode displacements between base meshes compared to the base mesh of the first frame of the group of frames.

[0140] The inter encoding process 1200 is similar to the intra encoding process with the following changes. The reconstructed reference base mesh 1146 is an input of the inter coding process. A new module called motion encoder 1202 takes as input the quantized input base mesh 906 and the reconstructed quantized reference base mesh 1146 to produce compressed motion information encoded as a compressed motion bitstream 1204, which is multiplexed 1206 into the encoder output compressed bitstream 910. All other modules and processes are similar to the intra encoding case (1100, 1101).

[0141] The compressed bitstream 910 generated by the encoder 1201 multiplexes (1206): a sub-bitstream with the encoded base mesh using a static mesh codec, a sub-bitstream 1204 with the encoded motion data using an animation codec for base meshes in case INTER coding is enabled, a sub-bitstream 1208 with the wavelet coefficients of the displacement vectors packed in an image and encoded using a video codec 1209, a sub-bitstream 1210 with the attribute map encoded using a video codec 1212, and a sub-bitstream that includes all metadata required to decode and reconstruct the mesh sequence based on the aforementioned sub-bitstreams. The signaling of the metadata is based on the V3C syntax and includes necessary extensions that are specific to meshes.

[0142] The decoding process 1300 is illustrated on FIG. 13. First the compressed bitstream 910 is demultiplexed with decoder 1301 into sub-bitstreams that are reconstructed, e.g., metadata 1302, reconstructed base mesh 1304, reconstructed displacements 1306 and the reconstructed attribute map data 1308. The reconstruction of the mesh sequence is performed based on that data in the post-processing module 1310.

[0143] Thus, FIG. 13 depicts the decoder scheme 1300 composed of a decoder module 1301 that demultiplexes (demuxes) and decodes all sub-streams and a post-processing module 1310 that reconstructs the dynamic mesh sequence to generate output mesh 1312 and output attributes 1314.

[0144] FIG. 14 and FIG. 15 illustrate the decoding process in INTRA mode 1400 and INTER mode 1500 respectively.

[0145] FIG. 14 depicts the decoding process 1400 in intra mode using intra frame decoding 1401. The intra frame decoding process includes the following modules and processes. First, the input compressed bitstream is de-multiplexed 1402 into V3C extended atlas data information (or patch information) 1403, a compressed static mesh bitstream 1405, a compressed displacement video component 1407 and a compressed attribute map bitstream 1409. The static mesh decoding module 1404 converts the compressed static mesh bitstream 1405 into a reconstructed quantized static mesh 1411, which represents a base mesh. This reconstructed quantized base mesh 1411 undergoes inverse quantization in the inverse quantization module 1406 to produce a decoded reconstructed base mesh 1413. The compressed displacement video component bitstream 1407 is decoded in the video decoding module 1408 to generate a reconstructed displacement video component 1415. This displacement video component 1415 is unpacked into reconstructed quantized wavelet coefficients in the image unpacking module 1410. The reconstructed quantized wavelet coefficients are inverse quantized in the inverse quantization module 1412 and then undergo an inverse wavelet transform in the inverse wavelet transform module 1414, which produces decoded displacement vectors 1416. The reconstruct deformed mesh module 1418 takes into account the patch information and takes as input the decoded reconstructed base mesh 1413 and decoded displacement vectors 1416 to produce the output decoded mesh frame 1312. The compressed attribute map video component 1409 is decoded with video decoder 1420, and possibly undergoes color conversion 1422 to produce a decoded attribute map frame 1314 that corresponds to the decoded mesh frame 1312.

[0146] FIG. 15 depicts the decoding process 1500 in inter mode using inter frame decoding 1501. The inter decoding process 1500 is similar to the intra decoding process 1401 with the following changes. The decoder also demultiplexes a compressed motion information bitstream 1503. A decoded reference base mesh 1502 is taken as input of a motion decoder module 1504 together with the compressed motion information sub-bitstream 1503. This decoded reference base mesh 1502 is selected from a buffer of previously decoded base mesh frames (by the intra decoder process 1401 for the first frame of a group of frames). The reconstruction of base mesh module 1506 takes the decoded reference base mesh 1502 and the decoded motion information 1505 as input to produce a decoded reconstructed quantized base mesh 1507. All other processes are similar to the intra decoding process 1401.

[0147] The signaling of the metadata and substreams produced by the encoder 901 and ingested by the decoder 1301 was proposed as an extension of V3C in the technical submission to the dynamic mesh coding CfP, and should be considered as purely indicative. It is as follows and mainly includes additional V3C unit header syntax, additional V3C unit payload syntax, and a mesh intra patch data unit.

[0148] V3C unit header syntax

[0149] V3C unit payload syntax

[0150] Mesh Intra patch data unit

[0151] A refinement of the metadata and substreams signaling is being discussed and is as follows.

[0152] Base meshes are the output of the base mesh substream decoder.

[0153] A submesh is a set of vertices, their connectivity and the associated attributes which can be decoded completely independently in a mesh frame. Each base mesh can have one or more submeshes.

[0154] Resampled base meshes are the output of the mesh subdivision process. The inputs to the process are the base meshes (or sets of submeshes) as well as the information from the atlas data substream on how to subdivide/resample the meshes (submeshes).

[0155] A displacement video is the output of the displacement decoder. The inputs to the process are the decoded geometry video as well as the information from the atlas data substream on how to interpret/process this video. The displacement video includes displacement values to be added to the corresponding vertices.

[0156] A FacegroupId is one of the attribute types assigned to each triangle face of the resampled base meshes. The FacegroupId can be compared with the ids of the subparts in a patch to determine the facegroups corresponding to the patch. When the FacegroupId is not conveyed through the base mesh substream decoder, it is derived from the information in the atlas data substream.

[0157] V3C unit

[0158] Compressed base meshes are signaled in a new substream, named the Base Mesh data substream (unit type V3C_MD). As with other v3c units, the unit type and its associated v3c parameter set id and atlas id are signaled in the v3c_unit_header().

[0159] V3c parameter set extension

[0160] A new extension needs to be introduced in the v3c_parameter_set syntax structure to handle V-DMC. Several new parameters are introduced in this extension including the following:

[0161] vps_ext_mesh_data_facegroup_id_attribute_present_flag equals 1 indicates that one of the attribute types present in the base mesh data stream is the facegroup Id.

[0162] vps_ext_mesh_data_attribute_count indicates the number of total attributes in the base mesh including both the attributes signaled through the base mesh data substream and the attributes signaled in the video substreams (using ai_attribute_count). When vps_ext_mesh_data_facegroup_id_attribute_present_flag equals 1, it shall be greater than or equal to ai_attribute_count + 1. This can be constrained by profile/levels.

[0163] The types of attributes that are signaled through the base mesh substream and not through the video substreams are signaled as vps_ext_mesh_attribute_type data types. When vps_ext_mesh_data_facegroup_id_attribute_present_flag equals 1, one of the vps_ext_mesh_attribute_type must be a facegroup_id.

[0164] vps_ext_mesh_data_substream_codec_id indicates the identifier of the codec used to compress the base mesh data. This codec may be identified through the profiles, a component codec mapping SEI message, or through means outside this document.

[0165] vps_ext_attribute_frame_width[i] and vps_ext_attribute_frame_height[i] indicate the corresponding width and height of the video data corresponding to the i-th attribute among the attributes signaled in the video substreams.

[0166] Atlas sequence parameter set extension

[0167] The information included in this extension can be overwritten by the same information in the AFPS extension or the patch data units. The following parameters are introduced:

[0168] asps_vmc_ext_prevent_geometry_video_conversion_flag prevents the outputs of the geometry video substream decoder from being converted. When the flag is true, the outputs are used as they are without any conversion process from Annex B of ISO/IEC 23090-5. When the flag is true, the size of the geometry video shall be the same as the nominal video sizes indicated in the bitstream.

[0169] asps_vmc_ext_prevent_attribute_video_conversion_flag prevents the outputs of the attribute video substream decoder from being converted. When the flag is true, the outputs are used as they are without any conversion process from Annex B of ISO/IEC 23090-5. When the flag is true, the size of the attribute video shall be the same as the nominal video sizes indicated in the bitstream.

[0170] asps_vmc_ext_subdivision_method and asps_vmc_ext_subdivision_iteration_count signal information about the subdivision method.

[0171] asps_vmc_ext_transform_index indicates the transform applied to the displacement. The transform index can indicate that no transform is applied. When the transform is LINEAR_LIFTING, the necessary parameters are signaled as vmc_lifting_transform_parameters.

[0172] asps_vmc_ext_patch_mapping_method indicates how to map a subpart of a submesh to a patch. When asps_vmc_ext_patch_mapping_method is equal to 0, all the triangles in the corresponding submesh are associated with the current patch. In this case, there is only one patch associated with the submesh. When asps_vmc_ext_patch_mapping_method is equal to 1, the subpart_ids are explicitly signaled in the mesh patch data unit to indicate the associated subparts. In other cases, the triangle faces in the corresponding submesh are divided into subparts by the method indicated by asps_vmc_ext_patch_mapping_method.

[0173] asps_vmc_ext_tjunction_removing_method indicates the method to remove t-junctions created by different subdivision methods or by different subdivision iterations of two triangles sharing an edge.

[0174] asps_vmc_ext_num_attribute indicates the total number of attributes that the corresponding mesh carries. Its value shall be less than or equal to vps_ext_mesh_data_attribute_count.

[0175] asps_vmc_ext_attribute_type is the type of the i-th attribute, and it shall be one of ai_attribute_type_ids or vps_ext_mesh_attribute_types.

[0176] asps_vmc_ext_direct_atrribute_projection_enabled_flag indicates that the 2d locations where attributes are projected are explicitly signaled in the mesh patch data units. Therefore, the projection id and orientation index in V3C V-PCC ISO/IEC 23090-5:2021 can also be used as in ISO/IEC 23090-5:2021.

[0177] The asps_vmc_extension() may be as follows:

[0178] The vmc_lifting_transform_parameters syntax element may be as follows:

[0179] Atlas Frame Parameter set extension

[0180] afps_vmc_ext_single_submesh_in_frame_flag indicates there is only one submesh for the mesh frame.

[0181] When afps_vmc_ext_overriden_flag in afps_vmc_extension() is true, the subdivision method, displacement coordinate system, transform index, transform parameters, and attribute transform parameters can be signaled again and the information overrides the one signaled in asps_vmc_extension().

[0182] afps_vmc_ext_single_attribute_tile_in_frame_flag indicates there is only one tile for each attribute signaled in the video streams.

[0183] The afps_vmc_extension syntax element may be as follows:

[0184] afps_ext_vmc_attribute_tile_information() includes the tile information for the attributes signaled through the video substreams.

[0185] Atlas Tile Header

[0186] A tile can be associated with one or more submeshes whose id is ath_submesh_id.

[0187] Patch data unit

[0188] As with the V-PCC Patch data units, Mesh patch data units are signaled in the Atlas data substream. Mesh Intra patch data unit, Mesh Inter patch data unit, Mesh Merge patch data unit, and Mesh Skip patch data unit can be used.

[0189] The patch_information_data syntax element may be as follows:

[0190] mdu_submesh_id indicates which submesh the patch is associated with among those indicated in the atlas tile header.

[0191] mdu_vertex_count_minus1 and mdu_triangle_count_minus1 indicate the number of vertices and triangles associated with the current patch.

[0192] When asps_vmc_ext_patch_mapping_method is not 0, the syntax elements mdu_num_subparts and mdu_subpart_id are signaled. When asps_vmc_ext_patch_mapping_method is 1, the associated triangle faces are the union of the triangle faces whose FacegroupId is equal to mdu_subpart_id.

[0193] When mdu_patch_parameters_enable_flag is true, the subdivision method, displacement coordinate system, transform index, transform parameters, and attribute transform parameters can be signaled again and the information overrides the corresponding information signaled in asps_vmc_extension().

[0194] The mesh_inter_data_unit syntax element may be as follows:

[0195] The mesh_merge_data_unit may be as follows:

[0196] The mesh_skip_data_unit syntax element may be as follows:

[0197] The mesh_raw_data_unit syntax element may be as follows:

[0198] The signaling of the base mesh substream is also under investigation and may be as shown in FIG. 16 (output bitstream 1602), FIG. 17 (base-mesh bitstream 1702), and FIG. 18 (input base-mesh bitstream 1802).

[0199] One of the main features of the current V-DMC specification design is the support for a base mesh signal that can be encoded using any currently specified or future static mesh codec. For example, such information could be coded using Draco 3D Graphics Compression. This representation could provide the basis for applying other decoded information to reconstruct the output mesh frame within the context of V-DMC.

[0200] Furthermore, for coding dynamic mesh frames, it is highly desirable to be able to exploit any temporal correlation that may exist with previously coded base mesh frames. In a design (see FIG. 16, FIG. 17 and FIG. 18), this was accomplished by encoding (with 1604, 1704) a mesh motion field instead of directly encoding the base mesh (1601, 1701), and using this information and a previously encoded base mesh to reconstruct the base mesh of the current frame (1606, 1608, 1706, 1708, 1806, 1808). This approach could be seen as the equivalent of inter prediction in video coding.
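
As a non-limiting illustration of the motion-field idea, the following Python sketch encodes a base mesh frame as per-vertex differences relative to a previously reconstructed base mesh and reconstructs it by adding the motion field back; the function names are assumptions for the example only, and an actual motion codec would additionally entropy code and possibly predict these values.

```python
def compute_motion_field(current_vertices, reference_vertices):
    """Per-vertex motion field: difference between the current base mesh and a
    previously reconstructed reference base mesh (connectivity is constant
    within the group of frames, so vertices correspond one-to-one)."""
    return [[c[k] - r[k] for k in range(3)]
            for c, r in zip(current_vertices, reference_vertices)]


def reconstruct_from_motion_field(reference_vertices, motion_field):
    """Inverse operation used on the decoding side of this sketch."""
    return [[r[k] + m[k] for k in range(3)]
            for r, m in zip(reference_vertices, motion_field)]
```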

[0201] It is highly desirable also to associate all coded base mesh frames or motion fields with information that could help determine their decoding output order as well as their referencing relationships. It is possible, for example, that better coding efficiency could be achieved when the coding order of all frames does not follow the display order or by using as reference for generating a motion field for frame N an arbitrary previously coded motion field or base mesh instead of the immediately previous coded one. Also highly desirable is the ability to instantly detect random access points and independently decode multiple sub-meshes that together can form a single mesh, much like subpictures in video compression.

[0202] For the above reasons, a new Base Mesh Substream format is introduced. This new format is very similar to a video coding format such as HEVC or the atlas sub-bitstream used in V3C, with the base mesh sub-bitstream also constructed using NAL units. Referring to FIG. 19, high level syntax (HLS) structures such as base mesh sequence parameter sets 1902, base mesh frame parameter sets 1904, and submesh layer 1906 are also specified. An overview of this bitstream 1900 with its different subcomponents is shown in FIG. 19. Thus, FIG. 19 is an overview of a base mesh data substream structure 1900.

[0203] One of the desirable features of this design is the ability to segment a mesh into multiple smaller partitions, referred to in this document as submeshes (FIG. 20). FIG. 20 shows segmentation of a mesh 2002 into sub-meshes (2004 and 2006). These submeshes (2004, 2006) can be decoded completely independently, which can help with partial decoding and spatial random access. Although it may not be a requirement for all applications, some applications may require that the segmentation in submeshes remains consistent and fixed in time. The submeshes do not need to use the same coding type, e.g., for one frame one submesh may use intra coding while for another inter coding could be used at the same decoding instance, but it is commonly a requirement that the same coding order is used and the same references are available for all submeshes corresponding to a particular time instance. Such restrictions can help guarantee proper random access capabilities for the entire stream. An example where two submeshes are used is shown in FIG. 21.

[0204] FIG. 21 shows picture order count 0 having submesh 2102 and submesh 2104, picture order count 1 having submesh 2112 and submesh 2114, and picture order count 2 having submesh 2120 and submesh 2122. Shown also is base mesh frame parameter set 2130 and base mesh sequence parameter set 2140.

[0205] NAL unit syntax

[0206] As discussed earlier, the new bitstream is also based on NAL units, and it is similar to those of the atlas substream in V3C. The syntax is provided below.

[0207] General NAL unit syntax

[0208] The bmesh_nal_unit syntax element may be as follows:

[0209] NAL unit header syntax

[0210] The bmesh_nal_unit_header syntax element may be as follows:

[0211] NAL unit semantics

[0212] This section includes some of the semantics that correspond to the above syntax structures. More details would be provided for syntax elements that have not been defined in complete detail.

[0213] 1. General NAL unit semantics

[0214] NumBytesInNalUnit specifies the size of the NAL unit in bytes. This value is required for decoding of the NAL unit. Some form of demarcation of NAL unit boundaries is necessary to enable inference of NumBytesInNalUnit. One such demarcation method is specified in Annex TBD for the sample stream format. Other methods of demarcation can be specified outside this document.

[0215] The mesh coding layer (MCL) is specified to efficiently represent the content of the mesh data. The NAL is specified to format that data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are included in NAL units, each of which includes an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical except that in the sample stream format specified in Annex TBD each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.
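
As a non-limiting illustration of the sample stream demarcation described above, the following Python sketch splits a byte buffer into individual NAL units so that NumBytesInNalUnit can be inferred; since the demarcation annex is still marked TBD, the 4-byte big-endian size field and the function name are assumptions made only for this example.

```python
def split_sample_stream(buffer: bytes, size_field_bytes: int = 4):
    """Split a sample-stream formatted buffer into NAL unit payloads.

    Assumes (for illustration only) that each NAL unit is preceded by a
    big-endian size field of 'size_field_bytes' bytes; the normative
    demarcation is specified in the annex referred to as TBD above.
    """
    nal_units = []
    pos = 0
    while pos + size_field_bytes <= len(buffer):
        num_bytes_in_nal_unit = int.from_bytes(
            buffer[pos:pos + size_field_bytes], "big")
        pos += size_field_bytes
        nal_units.append(buffer[pos:pos + num_bytes_in_nal_unit])
        pos += num_bytes_in_nal_unit
    return nal_units
```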

[0216] rbsp_byte[ i ] is the i-th byte of an RBSP. An RBSP is specified as an ordered sequence of bytes as follows:

[0217] The RBSP includes a string of data bits (SODB) as follows: when the SODB is empty (e.g., zero bits in length), the RBSP is also empty; Otherwise, the RBSP includes the SODB as follows:

[0218] 1) The first byte of the RBSP includes the first (most significant, left-most) eight bits of the SODB; the next byte of the RBSP includes the next eight bits of the SODB, etc., until fewer than eight bits of the SODB remain.

[0219] 2) The rbsp_trailing_bits( ) syntax structure is present after the SODB as follows: i) The first (most significant, left-most) bits of the final RBSP byte include the remaining bits of the SODB (if any); ii) The next bit comprises a single bit equal to 1 (e.g., rbsp_stop_one_bit); iii) When the rbsp_stop_one_bit is not the last bit of a byte-aligned byte, one or more bits equal to 0 (e.g., instances of rbsp_alignment_zero_bit) are present to result in byte alignment.

[0220] Syntax structures having these RBSP properties are denoted in the syntax tables using an "_rbsp" suffix. These structures are carried within NAL units as the content of the rbsp_byte[ i ] data bytes. The association of the RBSP syntax structures to the NAL units is as specified in Table 4.

[0221] When the boundaries of the RBSP are known, the decoder can extract the SODB from the RBSP by concatenating the bits of the bytes of the RBSP and discarding the rbsp_stop_one_bit, which is the last (least significant, right-most) bit equal to 1, and discarding any following (less significant, farther to the right) bits that follow it, which are equal to 0. The data necessary for the decoding process is included in the SODB part of the RBSP.
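
As a non-limiting illustration of the extraction rule just described, the following Python sketch recovers the SODB from an RBSP by discarding the trailing rbsp_alignment_zero_bit instances and the rbsp_stop_one_bit; the function name and the bit-list representation are assumptions for the example only.

```python
def extract_sodb_bits(rbsp: bytes):
    """Recover the SODB from an RBSP: flatten the bytes into bits (most
    significant bit first), then discard trailing rbsp_alignment_zero_bit
    instances and the single rbsp_stop_one_bit that precedes them."""
    bits = [(byte >> (7 - i)) & 1 for byte in rbsp for i in range(8)]
    while bits and bits[-1] == 0:   # drop alignment zero bits
        bits.pop()
    if bits:                        # drop the stop bit (last bit equal to 1)
        bits.pop()
    return bits
```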

[0222] 2. NAL unit header semantics

[0223] Similar NAL unit types, as for the atlas case, were defined for the base mesh enabling similar functionalities for random access and segmentation of the mesh. Unlike the atlas that is split into tiles, in this document we define the concept of a sub-mesh and define specific nal units that correspond to coded mesh data. In addition, NAL units that can include metadata such as SEI messages are also defined.

[0224] In particular, the base mesh NAL unit types supported are specified as follows:

[0225] The names for the types could be changed to avoid confusion with the atlas ones.

[0226] Raw byte sequence payloads, trailing bits, and byte alignment syntax

[0227] 1. Base mesh sequence parameter set RBSP syntax

[0228] As with similar bitstreams, the primary syntax structure that is defined for a base mesh bitstream is a sequence parameter set. This syntax structure includes basic information about the bitstream, identifying features for the codecs supported for the intra coded and inter coded meshes, as well as information about references.

[0229] 1.1 General base mesh sequence parameter set RBSP syntax

[0230] bmsps_log2_max_mesh_frame_order_cnt_lsb_minus4, bmsps_max_dec_mesh_frame_buffering_minus1, bmsps_long_term_ref_mesh_frames_flag, bmsps_num_ref_mesh_frame_lists_in_bmsps, and bmesh_ref_list_struct( i ) are equivalent to those in the ASPS.

[0231] bmsps_intra_mesh_codec_id indicates the static mesh codec used to encode the base meshes in this base mesh substream. It could be associated with a specific mesh or motion mesh codec through the profiles specified in the corresponding specification, or could be explicitly indicated with an SEI message as is done in the V3C specification for the video sub-bitstreams.

[0232] bmsps_intra_mesh_data_size_precision_bytes_minus1 (+1) specifies the precision, in bytes, of the size of the coded mesh data.

[0233] bmsps_inter_mesh_codec_present_flag indicates whether a specific codec, indicated by bmsps_inter_mesh_codec_id, is used to encode the inter predicted submeshes.

[0234] bmsps_inter_mesh_data_size_precision_bytes_minus1 (+1) specifies the precision, in bytes, of the size of the inter predicted mesh data. This precision is signaled separately because the size of the coded mesh data and the size of the inter predicted mesh data (e.g., a motion field) can be significantly different.

[0235] bmsps_facegroup_segmentation_method indicates how facegroups could be derived for a mesh. A facegroup is a set of triangle faces in a submesh. Each triangle face is associated with a FacegroupId indicating the facegroup it belongs to. When bmsps_facegroup_segmentation_method is 0, the FacegroupId is present directly in the coded submesh. Other values indicate that the facegroup can be derived using different methodologies based on the characteristics of the stream. For example, a value of 1 means that there is no FacegroupId associated with any face, a value of 2 means that all faces are identified with a single ID, a value of 3 means that facegroups are identified based on the connected component method, and a value of 4 indicates that each individual face has its own unique ID. Currently ue(v) is used to indicate bmsps_facegroup_segmentation_method, but fixed length coding or partitioning to more elements could have been used instead.
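
As a non-limiting illustration of the simpler values of bmsps_facegroup_segmentation_method described above, the following Python sketch derives FacegroupId values for methods 1, 2 and 4; the function name is an assumption, and methods 0 (ids carried in the coded submesh) and 3 (connected component analysis) are intentionally left out of this sketch.

```python
def derive_facegroup_ids(num_faces: int, segmentation_method: int):
    """Illustrative derivation of FacegroupId values for the simple modes of
    bmsps_facegroup_segmentation_method (methods 0 and 3 are omitted: method 0
    reads the ids from the coded submesh and method 3 needs a connected
    component analysis of the mesh)."""
    if segmentation_method == 1:      # no FacegroupId associated with any face
        return [None] * num_faces
    if segmentation_method == 2:      # all faces share a single id
        return [0] * num_faces
    if segmentation_method == 4:      # each face has its own unique id
        return list(range(num_faces))
    raise NotImplementedError("only methods 1, 2 and 4 are sketched here")
```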

[0236] 1.2 Base Mesh Profile, tier, and level syntax

[0237] bmptl_extended_sub_profile_flag provides support for sub profiles, which can be quite useful for further restricting the base mesh profiles depending on usage and applications.

[0238] 1.3 Base mesh frame parameter set RBSP syntax

[0239] The base mesh frame parameter set has the frame level information, such as the number of submeshes in the frames corresponding to one mfh_mesh_frm_order_cnt_lsb. A submesh is coded in one mesh_data_submesh_layer() and is independently decodable from other submeshes. In the case of inter frame prediction, a submesh can refer only to the submeshes with the same smh_id in its associated reference frames. The mechanism is equivalent to what is specified in 8.3.6.2.2 in V3C.

[0240] The mechanism is equivalent to the Atlas frame tile information syntax (8.3.6.2.2 in V3C).

[0241] The bmesh_sub_mesh_information( ) syntax element may be as follows:

[0242] 1.4 Base mesh submesh layer rbsp syntax

[0243] 1.4.1 bmesh submesh layer rbsp syntax

[0244] A bmesh_submesh_layer includes submesh information. One or more bmesh_submesh_layer_rbsp can correspond to one mesh frame indicated by mfh_mesh_frm_order_cnt_lsb.

[0245] 1.4.2 submesh header syntax

[0246] The mechanism is equivalent to the atlas tile header (8.3.6.11 in the standard).

[0247] smh_id is the id of the current submesh included in the mesh data submesh data.

[0248] smh_type indicates how the mesh is coded. When smh_type is I_SUBMESH, the mesh data is coded with the indicated static mesh codec. When smh_type is P_SUBMESH, inter prediction is used to code the mesh data.

[0249] 1.4.3 Submesh data unit

[0250] smdu_intra_sub_mesh_unit( unitSize ) includes a sub mesh unit stream of size unitSize, in bytes, as an ordered stream of bytes or bits within which the locations of unit boundaries are identifiable from patterns in the data. The format of such sub mesh unit stream is identified by a 4CC code as defined by bmptl_profile_codec_group_idc or by a component codec mapping SEI message.

[0251] smdu_inter_sub_mesh_unit( unitSize ) includes a sub mesh unit stream of size unitSize, in bytes, as an ordered stream of bytes or bits within which the locations of unit boundaries are identifiable from patterns in the data. The format of such sub mesh unit stream is identified by a 4CC code as defined by bmptl_profile_codec_group_idc or by a component codec mapping SEI message.

[0252] The current basis for the V-DMC test model uses a subdivision method to interpolate the base mesh connectivity and obtain displacement vectors.

[0253] These displacement vectors can be filtered by a wavelet transform to compress them. These vectors are necessary to reconstruct a deformed mesh that provides higher fidelity than the reconstructed base mesh. Following a mesh spatial scalability philosophy, several scales or levels of details (LODs) are defined based on iterative “mid-point” subdivision, and displacements are computed for each vertex of the mesh at each scale. “Mid-point” subdivision-based interpolation combined with the “linear” wavelet filter leads to relatively small and sparse displacement vectors at the third scale, and depending on the mesh content, also at the second scale.
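
As a non-limiting illustration of the iterative “mid-point” subdivision mentioned above, the following Python sketch performs one subdivision iteration, inserting a vertex at every edge midpoint and splitting each triangle into four; the function name and the vertex/triangle data layout are assumptions for the example only, and the displacement and wavelet steps are not shown.

```python
def midpoint_subdivide(vertices, triangles):
    """One iteration of "mid-point" subdivision: every edge gets a midpoint
    vertex and every triangle is split into four. Vertices are [x, y, z]
    lists and triangles are triples of vertex indices."""
    vertices = [list(v) for v in vertices]
    midpoint_index = {}

    def midpoint(a, b):
        # Share the midpoint vertex between the two triangles of an edge.
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            vertices.append([(vertices[a][k] + vertices[b][k]) / 2.0
                             for k in range(3)])
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    new_triangles = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, new_triangles
```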

[0254] FIG. 22 shows a zoom on the encoder sub-block related to displacement image packing. Being packed into a single frame by using simple packing (e.g., Morton order), pixels corresponding to displacement vectors are not well correlated, as the pixel context or neighborhood does not match the vertex neighborhood on the base mesh or subdivided mesh. Tools from the video encoder cannot be used effectively for such kinds of “video” frames. In other words, the video encoder could be seen as being used in the current V-DMC test model only as an external arithmetic coding engine. No tool from the video encoder (except SKIP blocks) is useful for encoding such data.

[0255] FIG. 23 shows a displacement video frame 2302 without quantization, with the number of LODs=4, normalized for visualization. It is also important to note that quantization of displacement values is performed before video encoding, and the quantized values are encoded losslessly in the video encoder. It is almost impossible to apply quantization inside the video encoder, because the DCT transform and various types of prediction (INTRA/INTER) will spread prediction/quantization errors across many vertices that are not connected by a triangle or edge. This causes significant 3D artifacts, which cannot be recovered in the decoder with traditional techniques (like deblocking, deringing, and the like).

[0256] FIG. 24 shows displacement for two frames 2402 and 2404.

[0257] Packing wavelet coefficients

[0258] The following scheme is used to pack the wavelet coefficients into a 2D image:

[0259] Traverse the coefficients from low to high frequency.

[0260] For each coefficient, determine the index of the NxM pixel block (e.g., N=M=16) in which it should be stored following a raster order for blocks.

[0261] The position within the NxM pixel block is computed by using a Morton order to maximize locality.

[0262] Other packing schemes could be used (e.g., zigzag order, raster order). The encoder could explicitly signal in the bitstream the used packing scheme (e.g., atlas sequence parameters). This could be done at patch, patch group, tile, or sequence level.
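
As a non-limiting illustration of the packing scheme described in the preceding paragraphs, the following Python sketch traverses coefficients from low to high frequency, fills NxM blocks in raster order and uses a Morton order within each block; the function names, the default block size of 16 and the frame width parameter are assumptions for the example only.

```python
def morton_to_xy(index: int):
    """De-interleave a Morton (Z-order) index into (x, y) within a block."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit
        index >>= 1
        y |= (index & 1) << bit
        index >>= 1
        bit += 1
    return x, y


def pack_coefficients(coefficients, frame_width, block_size=16):
    """Pack wavelet coefficients (already ordered from low to high frequency)
    into a 2D frame: blocks are filled in raster order and the position
    inside each block follows a Morton order to maximize locality."""
    blocks_per_row = frame_width // block_size
    block_rows = -(-len(coefficients) // (blocks_per_row * block_size * block_size))
    frame = [[0] * frame_width for _ in range(block_rows * block_size)]
    for i, value in enumerate(coefficients):
        block_index, within = divmod(i, block_size * block_size)
        bx, by = block_index % blocks_per_row, block_index // blocks_per_row
        x, y = morton_to_xy(within)
        frame[by * block_size + y][bx * block_size + x] = value
    return frame
```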

[0263] Displacement Video Encoding

[0264] The proposed scheme is agnostic of which video coding technology is used. When coding the displacement wavelet coefficients, a lossless approach may be used since the quantization is applied in a separate module. Another approach is to rely on the video encoder to compress the coefficients in a lossy manner and apply a quantization either in the original or transform domain.

[0265] Only Morton, Zigzag and Raster orders are mentioned as methods to maximize locality. The possibility of applying quantization in the video encoder is also mentioned, but not the method itself.

[0266] As an example, improving the spatial correlation between displacement vectors enables better utilization of the video encoder tools (like INTRA/INTER-prediction, DCT, quantization, Rate-Distortion optimization, and the like), decreases the overall bit rate requirements, and improves the reconstruction quality.

[0267] Various embodiments disclose a method of adaptive packing for displacement vectors, based on the projection of displacement vector positions to some predefined geometrical surface suitable for good correlation.

[0268] An example embodiment suggests packing displacement values into a 2D frame, using a more natural (e.g., geometrical) scanning order, which improves data locality. This may be done, for example, by: using the projection of subdivision points to some geometrical primitive, for example, a predefined geometrical primitive; packing of values into a 2D box of predefined size (in an example, this size depends on the amount of initial subdivision vectors, but could be set arbitrarily smaller); and in the case of multiple points mapping to the same pixel, some of the corresponding displacement vectors may be merged together or omitted.

[0269] Some embodiments enable standard video coding tools in displacement video encoding for significant reductions in required bitrate for dynamic mesh compression, while keeping the reconstruction quality at a very high level.

[0270] Various embodiments are described herein that cover the following: the encoding process of displacement vectors in various packing modes, which defines a correspondence between subdivided mesh vertices and displacement video pixels that the decoder can identify so that it may map displacement video pixel positions to the subdivided mesh vertex indices (the values coded in the displacement video are the displacement values with or without wavelet filtering); the signalling for enabling various displacement packing modes; and the signalling of parameters for particular packing modes.

[0271] Encoder embodiments

[0272] The subdivision process generates positions where displacement between the base mesh and the original surface may be computed.

[0273] After the subdivision process and the computation of displacements, the encoding may be performed with or without wavelet transform.

[0274] With wavelet transform, for maximizing correlation, each LOD may be packed separately as a dedicated video stream.

[0275] Without wavelet transform, all displacement vectors may be packed eventually into a single frame.

[0276] The packing techniques described below are applicable for both cases, but in case of wavelet-transformed data, each layer may be projected and encoded individually.

[0277] In an embodiment, the 2D positions of displacement coefficients inside the packed frame are calculated by using a projection to some predefined geometry (like a sphere, cylinder, ellipsoid, cube, and the like) surrounding the encoded mesh, possibly reducing the number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0278] This projection geometry or surface is needed to group displacement values properly into a regular grid that may be efficiently encoded by a video codec. All displacements are already pre-calculated in the V-DMC encoder. The positions of the corresponding vertices in 3D are known in the encoder and decoder for the base mesh and the subdivided mesh predictions because they are a result of the subdivision process operating on the reconstructed decoded base mesh (e.g., closed loop). The density (e.g., the resolution) of the resulting packed displacement image plane may be varied as an extended V-DMC encoding optimization process to support smoother encoding of neighboring displacement values.
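
As a non-limiting illustration of calculating 2D positions by projection to a surrounding geometry, the following Python sketch projects vertex positions onto a sphere centred on the mesh and maps the two spherical angles to pixel coordinates; the function name, the choice of a sphere and the frame resolution are assumptions for the example only, and other primitives (cylinder, ellipsoid, cube) would follow the same pattern.

```python
import math

def project_to_sphere_grid(positions, width, height):
    """Map 3D vertex positions to (u, v) pixel coordinates by projecting them
    onto a sphere centred on the mesh and using the spherical angles as image
    coordinates. The frame resolution width x height is an encoder choice."""
    cx = sum(p[0] for p in positions) / len(positions)
    cy = sum(p[1] for p in positions) / len(positions)
    cz = sum(p[2] for p in positions) / len(positions)
    pixels = []
    for x, y, z in positions:
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
        theta = math.atan2(dy, dx)                      # longitude in [-pi, pi]
        phi = math.acos(max(-1.0, min(1.0, dz / r)))    # latitude in [0, pi]
        u = int((theta + math.pi) / (2 * math.pi) * (width - 1))
        v = int(phi / math.pi * (height - 1))
        pixels.append((u, v))
    return pixels
```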

[0279] The central axis of the input mesh may, for example, be used to define the mapping of projected vertices into a 2D frame.

[0280] Position, rotation angle and scale (size) of the predefined geometry can optionally be signalled to improve compression performance and to guarantee that displacement positions are not overlapping. Alternatively, overlapping may be handled by a simple placement rule, which is the same in the encoder and decoder.

[0281] Parameters of the projection primitives are also very useful for the temporal alignment of geometry frames.

[0282] FIG. 25 shows a 2D example of a projection to a sphere (here illustrated with a circle) and the corresponding packing into a one dimensional (1D) line, corresponding to a row or column of a displacement picture, in accordance with an embodiment. FIG. 25 demonstrates an example of the packing process for a 2D case. The packing process is similar for a 3D case. A subdivided mesh 2502 includes displacement values at each vertex. These values may be projected 2504 to a geometrical primitive, which provides a possibility to store each vertex with or without overlapping and according to its position in geometrical space.

[0283] After the projection of displacements and the mapping into a 2D image, it is necessary to pack displacement values 2506 into the displacement picture (e.g., a 1D line 2508), which is encoded by the video encoder. In an example, the 1D line 2508 corresponds to a slice of an object.
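
As a non-limiting illustration of the 2D case of FIG. 25, the following Python sketch orders vertices by their angle around the centroid (i.e., by where they project on a surrounding circle) and packs their displacement values into a 1D line; the function name and data layout are assumptions for the example only.

```python
import math

def pack_displacements_to_line(positions_2d, displacements):
    """2D analogue of FIG. 25: order vertices by their angle around the
    centroid (i.e., by where they project onto a surrounding circle) and
    write the displacement values into a 1D line in that order."""
    cx = sum(p[0] for p in positions_2d) / len(positions_2d)
    cy = sum(p[1] for p in positions_2d) / len(positions_2d)
    order = sorted(range(len(positions_2d)),
                   key=lambda i: math.atan2(positions_2d[i][1] - cy,
                                            positions_2d[i][0] - cx))
    line = [displacements[i] for i in order]
    return line, order   # 'order' maps a pixel index back to a vertex index
```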

[0284] FIG. 26 illustrates mapping of an irregularly sampled set of points to a regular grid, in accordance with an embodiment. To avoid self-intersections and inaccurate grouping (which in some cases lead to noise in certain areas of an object), it is proposed to use a FaceID (face index) or vertex index as the ordering parameter for the 2D packing process. As shown in FIG. 26, when a vertex belongs to several faces with different FaceIDs or face indices, the smallest index is preferred. For example, FaceID=0 2602 and FaceID=1 2604 may be ordered and grouped to maximize localization by first using relative 2D coordinates in the projection plane 2606 and then using the FaceID or face index to order them in a regular pixel grid 2608. In an example, the pixel grid 2608 corresponds to a row or column of a displacement picture.
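
As a non-limiting illustration of the ordering rule of FIG. 26, the following Python sketch orders vertices for the regular pixel grid first by their projected 2D coordinates and, when several candidates compete, by the smallest FaceID among the faces each vertex belongs to; the function name and inputs are assumptions for the example only.

```python
def order_vertices_for_grid(projected_uv, vertex_faces):
    """Order vertices for the regular pixel grid: primarily by their (u, v)
    position in the projection plane, and on ties by the smallest face index
    the vertex belongs to. 'vertex_faces[i]' lists the FaceIDs of the faces
    incident to vertex i."""
    return sorted(range(len(projected_uv)),
                  key=lambda i: (projected_uv[i][1],      # row in the grid
                                 projected_uv[i][0],      # column in the grid
                                 min(vertex_faces[i])))   # smallest FaceID wins
```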

[0285] As a possible alternative to FaceIDs, a geodesic distance between vertices (e.g., a shortest path on the surface, between the last mapped and the new candidate vertex in the pixel grid raster scan) may be used as the ordering parameter when there is an ambiguity on which vertex a pixel may be mapped first at a decoder side or on which pixel a vertex may be mapped first at an encoder side.

[0286] In another embodiment, instead of projecting to a predefined geometry, the base-mesh may be used as the pre-defined geometry with some filtering. For example, the base mesh may be smoothed (low-pass filtering, for example, a mesh Laplacian smoothing filter) and uniformly scaled from the center of gravity. At maximum scaling factor values, the scaled base mesh becomes a sphere and for lower values it is a smoothed version of the base-mesh. With scaling factor=1.0, it is the base-mesh itself. This process may be represented as a smooth transition from the base-mesh to a sphere with a predefined radius.
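
As a non-limiting illustration of using a smoothed and scaled base mesh as the projection geometry, the following Python sketch applies a simple Laplacian smoothing step (each vertex moved to the average of its neighbours) followed by uniform scaling about the centre of gravity; the function name, adjacency representation, iteration count and uniform weights are assumptions for the example only.

```python
def smooth_and_scale_base_mesh(vertices, neighbours, scale, iterations=1):
    """Smooth the base mesh with a simple Laplacian filter and then scale it
    uniformly from its centre of gravity. 'neighbours[i]' lists the indices
    of vertices adjacent to vertex i; 'scale' is the uniform scaling factor."""
    verts = [list(v) for v in vertices]
    for _ in range(iterations):
        # Move each vertex to the average of its neighbours (Laplacian step).
        verts = [[sum(verts[j][k] for j in neighbours[i]) / len(neighbours[i])
                  for k in range(3)] if neighbours[i] else verts[i]
                 for i in range(len(verts))]
    # Uniform scaling about the centre of gravity.
    cg = [sum(v[k] for v in verts) / len(verts) for k in range(3)]
    return [[cg[k] + scale * (v[k] - cg[k]) for k in range(3)] for v in verts]
```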

[0287] The base mesh itself is already a good prediction of the coded surface, and it is possible to project all displacement values to the smoothed and scaled base-mesh. The mapping to a plane can be performed by segmenting the base mesh faces to a V3C atlas by clustering faces based on their normal direction and alignment with a V3C projection plane. The smoothing and scale factor ease the segmentation of the base mesh in large regular patches compared to the original base mesh, which may be more difficult to segment in large patches based on the normal directions. The segmentation may be signalled with V3C patches and patch data units.

[0288] In another embodiment, a simple spiral scanning of displacement values from topmost (e.g., based on the base mesh central axis) displacement position to bottom displacement position may be used to order pixels on the pixel grid after projection on the 2D plane. The number of spiral rounds may be selected according to the desired resolution of the displacement frame. For example, 2D coordinates of displacement values are formed as an index of the neighbor in the current spiral round (X) and number of spiral rounds (Y).
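
As a non-limiting illustration of the spiral scanning option, the following Python sketch assigns 2D coordinates by grouping vertices into rounds from top to bottom along a central axis (assumed here to be the z axis) and ordering vertices within each round by their angle around that axis; the function name, the axis choice and the equal-size rounds are assumptions for the example only.

```python
import math

def spiral_scan_coordinates(positions, num_rounds):
    """Assign 2D frame coordinates by spiral scanning: Y is the spiral round,
    taken top-down along the central axis (assumed to be the z axis), and X
    is the index of the vertex within its round ordered by angle."""
    cx = sum(p[0] for p in positions) / len(positions)
    cy = sum(p[1] for p in positions) / len(positions)
    order_by_height = sorted(range(len(positions)),
                             key=lambda i: -positions[i][2])   # topmost first
    per_round = -(-len(positions) // num_rounds)               # ceiling division
    coords = {}
    for y in range(num_rounds):
        round_vertices = order_by_height[y * per_round:(y + 1) * per_round]
        round_vertices.sort(key=lambda i: math.atan2(positions[i][1] - cy,
                                                     positions[i][0] - cx))
        for x, vertex in enumerate(round_vertices):
            coords[vertex] = (x, y)
    return coords
```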

[0289] In another embodiment, the edge-breaker base mesh traversal, when present in the base mesh substream, may be used to map displacements to order the vertices in the regular grid.

[0290] Similarly to the medial axis transform, a set of concentric layers of the encoded mesh shape may be calculated by their distance to a central axis or to a skeleton, that may be signalled in the atlas frame parameter set and used for mapping and packing displacement to a 2D frame. The spiral approach described before may be used to order vertices inside a layer. Another option is to use the geodesic distance between vertices to the closest extremity of the mesh and group connected vertices with a similar geodesic distance to map and pack the corresponding displacements on 2D frame layer or tile. In such cases, the 3D position of the vertices may be used to pack the displacements corresponding to the vertices into a 2D frame, e.g., by ordering two of the 3D coordinates X,Y,Z of the vertices along the U,V coordinates of the 2D frame, the selected coordinates and mapping to UV may be flagged per layer along the bitstream.

[0291] In another embodiment, the input mesh is partitioned into submeshes that map to a single geometric primitive, where the partitioning and choice of geometric primitive is an encoder choice or optimization that may optimally be made consistent over a group of frames. The partitioning may, for example, be signalled as face clusters and patches, and the geometric primitive parameters may be signalled in the patch data unit. The displacement image may be encoded as a V3C atlas. Each patch is packed into the atlas and its coordinates are encoded in their patch data unit. The attribute displacement video component may therefore be signalled similarly to V3C geometry components without the need of an occupancy video component.

[0292] The decoder needs to extract the signalled metadata and to assign displacement pixel positions to vertex indices in order to apply the decoded displacements to the subdivided mesh and produce the output reconstructed displaced mesh.

[0293] Signalling embodiments

[0294] In an embodiment, it is necessary to signal the packing mode and the type of predefined geometry primitive.

[0295] The starting position, rotation angle and size (scaling factor) of the predefined geometry can optionally be specified.

[0296] The signalling of the displacement packing information may be performed by all or a sub-set of the syntax elements described below:

[0297] asps_vmc_ext_displacement_packing_mode indicates the packing mode according to the following table:

[0298] asps_vmc_ext_displacement_projection_primitive indicates the projection primitive according to the following table:

[0299] asps_vmc_ext_displacement_projection_delta_x indicates the position X of projection primitive as deltaX to central point of base-mesh.

[0300] asps_vmc_ext_displacement_projection_delta_y indicates the position Y of projection primitive as deltaY to central point of base-mesh.

[0301] asps_vmc_ext_displacement_projection_delta_z indicates the position Z of projection primitive as deltaZ to central point of base-mesh.

[0302] asps_vmc_ext_displacement_projection_rotation_angle indicates the rotation angle of projection primitive (to central axis).

[0303] asps_vmc_ext_displacement_projection_scale indicates the scale factor of projection primitive, divided by 16 (to describe the density).

[0304] asps_vmc_ext_displacement_projection_frame_update_enabled_flag indicates that the displacement projection semantics may be updated in the atlas frame parameter set.

[0305] Optionally, the same semantics may be signalled to the atlas frame parameter set (afps) when the information changes at a given frame.

[0306] FIG. 27 is an apparatus 2700 which may be implemented in hardware, configured to implement adaptive displacement packing for dynamic mesh coding, based on any of the examples described herein. The apparatus comprises a processor 2702, at least one memory 2704 (memory 2704 may be non-transitory, transitory, non-volatile, or volatile) including computer program code 2705, wherein the at least one memory 2704 and the computer program code 2705 are configured to, with the at least one processor 2702, cause the apparatus to implement circuitry, a process, component, module, function, coding, and/or decoding (collectively 2706) to implement adaptive displacement packing for dynamic mesh coding, based on the examples described herein. The apparatus 2700 is further configured to provide or receive signaling 2707, based on the signaling embodiments described herein. The apparatus 2700 optionally includes a display and/or I/O interface 2708 that may be used to display an output (e.g., an image or volumetric video) of a result of coding/decoding 2706. The display and/or I/O interface 2708 may also be configured to receive input such as user input (e.g. with a keypad, touchscreen, touch area, microphone, biometric recognition, one or more sensors etc.). The apparatus 2700 also includes one or more communication interfaces (I/F(s)) 2710, such as a network (N/W) interface. The communication I/F(s) 2710 may be wired and/or wireless and communicate over a channel or the Internet/other network(s) via any communication technique. The communication I/F(s) 2710 may comprise one or more transmitters and one or more receivers. The communication I/F(s) 2710 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitry(ies) and one or more antennas. In some examples, the processor 2702 is configured to implement item 2706 and/or item 2707 without use of memory 2704.

[0307] The apparatus 2700 may be a remote, virtual or cloud apparatus. The apparatus 2700 may be either a writer or a reader (e.g. parser), or both a writer and a reader (e.g. parser). The apparatus 2700 may be either a coder or a decoder, or both a coder and a decoder (codec). The apparatus 2700 may be a user equipment (UE), a head mounted display (HMD), or any other fixed or mobile device.

[0308] The memory 2704 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The memory 2704 may comprise a database for storing data. Interface 2712 enables data communication between the various items of apparatus 2700, as shown in FIG. 27. Interface 2712 may be one or more buses. For example, the interface 2712 may be one or more buses such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. Computer program code 2705 may comprise object-oriented software. The apparatus 2700 need not comprise each of the features mentioned, or may comprise other features as well. The apparatus 2700 may be an embodiment of and have the features of any of the apparatuses shown in FIG. 1A, FIG. 1B, FIG. 5, FIG. 6, FIG. 11, FIG. 14, FIG. 23, FIG. 24, or any of the other figures described and shown herein.

[0309] FIG. 28 shows a schematic representation of non-volatile memory media 2800a (e.g. computer/compact disc (CD) or digital versatile disc (DVD)) and 2800b (e.g. universal serial bus (USB) memory stick) storing instructions and/or parameters 2802 which when executed by a processor allows the processor to perform one or more of the steps of the methods described herein.

[0310] FIG. 29 is a method 2900 to implement the examples described herein. The method 2900 may be performed with an encoder apparatus (e.g., 1101, 1201, 2301, 2700). At 2910, the method 2900 includes using projection of subdivision points to a geometrical surface. At 2920, the method 2900 includes packaging values of the subdivision points into a two dimensional (2D) box. At 2930, the method 2900 includes merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map on to the same pixel.

[0311] In an embodiment, the method 2900 may further include calculating 2D positions of displacement coefficients inside a packed frame by using a projection to the geometrical surface surrounding an encoded mesh; and reducing a number of displacement values by eliminating unused points into the packed frame of a predefined size.

[0312] FIG. 30 is a method 3000 to implement the examples described herein. The method 3000 may be performed with a decoder apparatus (e.g., 1401, 1501, 2401, 2700). At 3010, the method includes receiving a bitstream comprising metadata. At 3020, the method includes extracting the metadata. At 3030, the method includes using the extracted metadata to assign displacement pixel positions to vertex indices in order to apply decoded displacements to a mesh comprising subdivisional points. At 3040, the method includes generating an output reconstructed displaced mesh based at least on the assignment.
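
As a non-limiting illustration of blocks 3030 and 3040, the following Python sketch applies decoded displacements to a subdivided mesh once a vertex-to-pixel mapping has been derived from the signalled metadata; the function name and the vertex_to_pixel callable, which stands in for whichever packing mode is signalled, are assumptions for the example only.

```python
def apply_decoded_displacements(subdivided_vertices, displacement_frame,
                                vertex_to_pixel):
    """Decoder-side sketch: for each vertex of the subdivided mesh, look up
    its pixel position (via 'vertex_to_pixel', a stand-in for the signalled
    packing mode) and add the decoded displacement stored there."""
    reconstructed = []
    for index, vertex in enumerate(subdivided_vertices):
        u, v = vertex_to_pixel(index)
        dx, dy, dz = displacement_frame[v][u]
        reconstructed.append([vertex[0] + dx, vertex[1] + dy, vertex[2] + dz])
    return reconstructed
```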

[0313] In an embodiment, the method 3000 may further include merging or omitting some displacement vectors corresponding to the multiple subdivision points when it is determined that multiple subdivision points map on to the same pixel.

[0314] Some of the non-limiting benefits include reduced bitrate requirements, improved quality of the reconstructed mesh, and the possibility to effectively use video codecs for geometry compression.

[0315] References to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, and the like.

[0316] As used herein, the term ‘circuitry’ may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even when the software or firmware is not physically present. As a further example, as used herein, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and when applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry may also be used to mean a function or a process, such as one implemented by an encoder or decoder, or a codec.

[0317] In the figures, arrows between individual blocks represent operational couplings therebetween as well as the direction of data flows on those couplings.

[0318] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.

[0319] The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows:

2D and variants two-dimensional

3D and variants three-dimensional

3DG 3D graphics coding group

4CC four character code

6DOF six degrees of freedom

ACL atlas coding layer

AD atlas data

afoc atlas frame order count

AFPS and variants atlas frame parameter set

ai attribute index

ap attribute packing

AR augmented reality

ASIC application-specific integrated circuit

ASPS and variants atlas sequence parameter set

ath atlas tile header

AUD access unit delimiter

Aux auxiliary

AVD attribute video data

BFPS and variants base mesh frame parameter set

BLA broken link access

BMCL base mesh coding layer

bmesh base mesh

BMFPS base mesh frame parameter set

bmptl base mesh profile, tier, and level

BMSPS and variants base mesh sequence parameter set

b(n) n bits

bsmi base mesh sub mesh information

BSPS base mesh sequence parameter set

CD compact disc

CfP call for proposal

CGI computer-generated imagery

cnt count

CRA clean random access

DCT discrete cosine transform

DVD digital versatile disc

EOB end of bitstream

EOS end of sequence

ESEI essential supplemental enhancement information

Exp exponential

ext extension

FAMC frame-based animated mesh compression

FD filler data

FDIS final draft international standard

f(n) float having n bits, e.g. f(1)

glTF graphics language transmission format

GVD geometry video data

H.264 advanced video coding video compression standard

H.265 high efficiency video coding video compression standard

HEVC high efficiency video coding

HLS high level syntax

HMD head mounted display

ID and variants identifier

idc indication

Idx index

IDR instantaneous decoding refresh

IEC International Electrotechnical Commission

I/F interface

I/O input/output

IRAP intra random access point

ISO International Organization for Standardization

LOD and variants level of detail

LP leading picture

lsb least significant bit

ltp lifting transform parameters

MCL mesh coding layer

MD mesh data

mdu mesh intra patch data unit

mfoc mesh_frame_order_count

midu mesh inter data unit

miv and variants MPEG immersive video

mmdu mesh merge data unit

mpdu mesh patch data unit

MPEG moving picture experts group

MPEG-I MPEG immersive

MR mixed reality

mrdu mesh raw data unit

msh mesh

MUX multiplex

NAL and variants network abstraction layer

NBMCL non-BMCL

NSEI non-essential supplemental enhancement information

N/W network

OVD occupancy video data

poc picture order count

Pos position

Quant quantization

RADL random access decodable leading

RASL random access skipped leading

rbsp and variants raw byte sequence payload

ref reference

RSV reserved

SC subcommittee

SEI supplemental enhancement information

se(v) signed integer 0-th order Exp-Golomb coding with left bit first (e.g., most significant bit first)

smdu submesh data unit

smh and variants submesh

SODB string of data bits

STSA step-wise temporal sublayer access

TBD to be determined

TSA temporal sublayer access

ti tile information

u(n) unsigned integer using n bits, e.g. u(1), u(2)

UE user equipment

ue(v) unsigned integer exponential Golomb coded syntax element with the left bit first

UNSPEC unspecified

USB universal serial bus

uv and variants coordinate texture, where “U” and “V” are axes of a 2D texture

V3C visual volumetric video-based coding

V-DMC or VDMC video-based dynamic mesh coding

vmc volumetric mesh compression

VPCC or V-PCC video-based point cloud coding/compression

VPS V3C parameter set

VR virtual reality

vuh volumetric unit header

VVC versatile video coding

WG working group