Title:
METHODS AND SYSTEMS FOR RENDERING VIDEO GRAPHICS USING SCENE SEGMENTATION
Document Type and Number:
WIPO Patent Application WO/2024/086382
Kind Code:
A1
Abstract:
A system and method for rendering graphics. The method includes generating and receiving graphics data from a 3D scene rasterized to determine at least a first object. Primitive data can be generated by a vertex shader using vertex data of the scene. A provided position vector can define a center of an overall cluster data structure in three axes, and provided subdivision scalars can divide the structure in the three axes. Clusters can be divided by a bounding box using at least the subdivision scalars and extents of the structure. Geometric position data can be mapped to an associated cluster defined by a center cluster position. Cull masks can be generated for each object using generated cluster data, and the scene can be rendered using at least the cull masks and the cluster data. Other embodiments include corresponding systems and computer programs configured to perform the actions of the methods.

Inventors:
NING PAULA (US)
SANDOVAL JAVIER (US)
LI CHEN (US)
SUN HONGYU (US)
Application Number:
PCT/US2023/062046
Publication Date:
April 25, 2024
Filing Date:
February 06, 2023
Assignee:
INNOPEAK TECH INC (US)
International Classes:
G06T15/00; G06T7/70; G06T15/06; G06T15/08; G06T15/20; G06T15/80
Attorney, Agent or Firm:
BRATSCHUN, Thomas D. et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A method for rendering graphics on a computer device, the method comprising: receiving a plurality of graphics data associated with a three-dimensional (3D) scene rasterized to determine at least a first object that would be intersected by a primary ray cast through each of a plurality of screen-space pixels, the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene; generating a plurality of screen-space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data comprising at least barycentric coordinates and a triangle index; providing a position vector defining the center of an overall cluster data structure in three axes; providing a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes; creating clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure; mapping geometric position data to an associated cluster defined by the position of a cluster center; generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters; generating cull masks for each object using the cluster data; and rendering the first object using at least the cull masks and cluster data by a shader.

2. The method of claim 1 further comprising storing ReSTIR reservoir data.

3. The method of claim 1 further comprising storing ordered light lists.

4. The method of claim 1 wherein the center of the overall cluster data structure is independent of a camera position.

5. The method of claim 1 wherein the center of the overall cluster data structure is at a camera position.

6. The method of claim 1 wherein the cull masks comprise cull-to masks and/or cull-from masks.

7. The method of claim 1 wherein the cull masks are assigned based on clusters corresponding to equal scene segments.

8. The method of claim 1 wherein the cull masks are assigned based on a material and on a camera-relative distance of each object.

9. A system for rendering video graphics, the system comprising: a storage comprising executable instructions; a memory; and a processor coupled to the storage and the memory, the processor being configured to: generate a plurality of graphics data associated with a three-dimensional (3D) scene rasterized to determine at least a first object that would be intersected by a primary ray cast through each of a plurality of screen-space pixels, the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene; generate a plurality of screen-space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data comprising at least barycentric coordinates and a triangle index; provide a position vector defining the center of an overall cluster data structure in three axes; provide a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes; create clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure; map geometric position data to an associated cluster defined by the position of a cluster center; generate cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters; generate cull masks for each object using the cluster data; and render the first object using at least the cull masks and the cluster data.

10. The system of claim 9 wherein the processor comprises a central processing unit (CPU) and a graphics processing unit (GPU).

11. The system of claim 10 wherein the memory is shared by the CPU and the GPU.

12. The system of claim 9 wherein the memory comprises a frame buffer for storing the first object.

13. The system of claim 9 further comprising a display configured to display the first object at a refresh rate of at least 24 frames per second.

14. A method for rendering graphics on a computer device, the method comprising: generating a three-dimensional (3D) scene including a first object; receiving a plurality of graphics data associated with the 3D scene; providing a position vector defining the center of an overall cluster data structure in three axes; providing subdivision scalars used to divide the overall cluster data structure along the three axes; creating clusters by dividing a bounding box using at least the subdivision scalars and extents of the overall data structure; mapping geometric position data to an associated cluster defined by a position of a cluster center; generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters; generating cull masks for each object using the cluster data; and rendering the 3D scene using at least the cull masks and cluster data by a shader.

15. The method of claim 14 wherein the 3D scene is rasterized to determine at least a first object that would be intersected by a primary ray cast through each of a plurality of screen-space pixels, the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene.

16. The method of claim 15 further comprising generating a plurality of screen-space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data comprising at least barycentric coordinates and a triangle index.

17. The method of claim 14 wherein the bounding box is uniformly divided.

18. The method of claim 14 wherein the bounding box is divided according to a horizon in the 3D scene.

19. The method of claim 14 further comprising obtaining settings for dividing the bounding box.

20. The method of claim 14 further comprising defining the cull masks based at least on opacities of objects in the 3D scene.

Description:
METHODS AND SYSTEMS FOR RENDERING VIDEO GRAPHICS USING SCENE SEGMENTATION

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Patent Application Ser. No. 63/418,050 (the "'050 Application"), filed October 21, 2022 (attorney docket no. 1282.INNOPEAK-1022-099-P), entitled "Scene segmentation for Mobile," the disclosure of which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND OF THE INVENTION

[0001] As the standards of video graphics rise each year, the resource costs of rendering such video graphics continue to rise as well. These costs are particularly important to optimize in real-time applications (RTAs), such as video games, video conferencing, virtual reality (VR) applications, and extended reality (XR) applications. Additionally, because use of such RTAs in mobile devices has become widespread, it is increasingly desirable to improve the quality of video graphics in mobile applications. However, compared to desktop computers, mobile devices have limited memory capacity and bandwidth, which presents challenges to achieving adequate rendering performance. There are various solutions to address the memory-intensive nature of video graphics rendering, but they have been inadequate, as described below.

[0002] Therefore, new and improved systems and methods for rendering video graphics are desired.

BRIEF SUMMARY OF THE INVENTION

[0003] The present invention is directed to graphics rendering systems and methods. According to a specific embodiment, the present invention provides a method that utilizes world-space sampling clusters, procedural instance cull masks, and instance culled clusters. There are other embodiments as well.

[0004] Embodiments of the present invention can be implemented in conjunction with existing systems and processes. For example, a rendering system configuration and its related methods according to the present invention can be used in a wide variety of systems, including virtual reality (VR) systems, mobile devices, and the like. Additionally, various techniques according to the present invention can be adopted into existing systems via integrated circuit fabrication, operating software, and application programming interfaces (APIs). There are other benefits as well.

[0005] A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method for rendering graphics on a computer device. The method also includes receiving a plurality of graphics data associated with a three-dimensional (3D) scene including a plurality of vertex data associated with a plurality of vertices in the 3D scene. The method also includes generating a plurality of screen-space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data may include at least barycentric coordinates and a triangle index. The method also includes providing a position vector defining the center of an overall cluster data structure in three axes. The method also includes providing a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes. The method also includes creating clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure. The method also includes mapping geometric position data to an associated cluster defined by the position of a cluster center. The method also includes generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters. The method also includes generating cull masks for each object using the cluster data. The method also includes rendering the objects represented in the screen-space primitive buffer using at least the cull masks and cluster data by a shader. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

[0006] Implementations may include one or more of the following features. The method may include storing Reservoir-Based Spatio-Temporal Importance Sampling (hereafter ReSTIR) reservoir data. The method may include storing ordered light lists. The center of the overall grid may be independent of a camera position, or it may be at a camera position. The method may include storing cull masks, which may include cull-to masks and/or cull-from masks. The cull masks may be assigned based on clusters corresponding to uniformly sized scene segments. Cull masks may also be assigned based on a material and on a camera-relative distance of each object. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.

[0007] One general aspect includes a system for rendering video graphics. The system also includes a storage that may include executable instructions. The system also includes a memory. The system also includes a processor coupled to the storage and the memory, the processor being configured to: generate a plurality of graphics data associated with a 3D scene including a plurality of vertex data associated with a plurality of vertices in the 3D scene; generate a plurality of screen-space primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data may include at least barycentric coordinates and a triangle index; provide a position vector defining the center of an overall cluster data structure in three axes; provide a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes; create clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure; map geometric position data to an associated cluster defined by the position of a cluster center; generate cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters; generate cull masks for each object using the cluster data; and render the objects represented in the screen-space primitive buffer using at least the cull masks and cluster data by a shader. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

[0008] Implementations may include one or more of the following features. The processor may include a central processing unit (CPU) and a graphics processing unit (GPU), where the memory is shared by the CPU and the GPU. The memory may include a frame buffer for storing the first object. The system may include a display configured to display the first object at a refresh rate of at least 24 frames per second. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.

[0009] One general aspect includes a method for rendering graphics on a computer device. The method also includes generating a 3D scene including a first object. The method also includes receiving a plurality of graphics data associated with the 3D scene. The method also includes providing a position vector defining the center of an overall cluster data structure in three axes. The method also includes providing subdivision scalars used to divide the overall cluster data structure along the three axes. The method also includes creating clusters by dividing a bounding box using at least the subdivision scalars and extents of the overall data structure. The method also includes mapping geometric position data to an associated cluster defined by a position of a cluster center. The method also includes generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters. The method also includes generating cull masks for each object using the cluster data. The method also includes rendering the 3D scene using at least the cull masks and cluster data by a shader. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

[0010] Implementations may include one or more of the following features. The 3D scene may be rasterized to determine the first object that would be intersected by a primary ray cast through each pixel, with the plurality of graphics data including a plurality of vertex data associated with a plurality of vertices in the 3D scene. The method may include generating a plurality of primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data including at least position data. The bounding box may be uniformly divided, or it may be divided according to a horizon in the 3D scene. The method may include obtaining settings for dividing the bounding box. The method may include defining cull masks based at least on opacities of objects in the 3D scene. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.

[0011] It is to be appreciated that embodiments of the present invention provide many advantages over conventional techniques. Among other things, the present invention provides configurations and methods for graphics rendering systems that reduce memory bandwidth load and improve performance using world-space sampling clusters and procedural instance cull masks. Additionally, the present invention implements instance culled clusters by sharing segmentation data for improved performance of specific applications.

[0012] The present invention achieves these benefits and others in the context of known technology. However, a further understanding of the nature and advantages of the present invention may be realized by reference to the latter portions of the specification and attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] Figure 1 is a simplified diagram illustrating a mobile device configured for rendering video graphics according to embodiments of the present invention.

[0014] Figure 2 is a simplified flow diagram illustrating a conventional forward pipeline for rendering video graphics.

[0015] Figure 3 is a simplified flow diagram illustrating a conventional hybrid pipeline for rendering video graphics.

[0016] Figures 4A to 4D are simplified diagrams illustrating cluster segmentation methods according to embodiments of the present invention.

[0017] Figure 5 is a simplified flow diagram illustrating a method for rendering graphics according to an embodiment of the present invention.

[0018] Figure 6 is a simplified flow diagram illustrating a method for rendering graphics according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0019] The present invention is directed to graphics rendering systems and methods. According to a specific embodiment, the present invention provides a method and system that utilizes world-space sampling clusters, procedural instance cull masks, and instance culled clusters. The present invention can be configured for real-time applications (RTAs), such as video conferencing, video gaming, virtual reality (VR), and extended reality (XR) applications. There are other embodiments as well.

[0020] Mobile and XR applications often have to deal with a variety of constraints when it comes to graphics processing. These constraints can impact the visual quality and performance of the application, and can make it challenging to deliver a smooth and engaging user experience. One constraint is the limited processing power of mobile devices. Compared to desktop computers, most smartphones and tablets have relatively small and energy-efficient processors that are not designed for intensive graphics processing tasks. As a result, mobile applications often have to make use of optimized algorithms and techniques to render graphics efficiently and avoid overloading the device's hardware. Another constraint is the limited memory and storage available on mobile devices. Most smartphones and tablets have limited and shared memory, which can limit the complexity and detail of the graphics that can be rendered, and can require mobile applications to use efficient data structures and algorithms to minimize the amount of memory and storage they use. A third constraint is the limited battery life of mobile devices. Graphics processing can be a significant drain on a device's battery, and mobile applications must be designed to minimize their impact on battery life to avoid running out of power during use. This can involve techniques such as reducing the frame rate, using lower-resolution graphics, or disabling certain graphics features when not needed. Another constraint is the limited bandwidth and connectivity of mobile networks. Many mobile applications rely on network connectivity to access data and resources, but mobile networks can be slow and unreliable, especially in areas with poor coverage. This can impact the performance of graphics-intensive applications, and can require the use of techniques such as data compression and caching to minimize the amount of data that needs to be transferred over the network. To deliver a smooth and engaging user experience, mobile applications must be carefully designed to take these constraints into account and use optimized algorithms and techniques to render graphics efficiently and effectively.

[0021] It is to be appreciated that embodiments of the present invention provide techniques, involving but not limited to spatial partitioning and forward rendering, to improve graphics rendering in mobile devices. Spatial partitioning is a technique used in computer graphics to improve rendering performance by dividing the 3D space into smaller, manageable regions. In real-time rendering, this is achieved by organizing the scene's objects into a data structure that allows for efficient spatial queries by only traversing geometry relevant to the current query. One data structure for spatial partitioning in forward rendering is a uniform spatial grid. This divides the 3D space into a regular grid of cells, with each cell containing a list of objects that intersect with it. When rendering, the camera's view frustum is used to determine which cells are in view, and only objects in those cells are added to a list of objects that need to be rendered. This avoids the need to traverse the entire scene and can significantly reduce the number of objects that need to be considered for rendering. Another data structure commonly used for spatial partitioning in forward rendering is the binary space partitioning (BSP) tree. This organizes the scene's objects into a hierarchical tree structure, with each node representing a plane that divides the space into two regions. When rendering, the camera's view frustum is used to determine which nodes are in view, and only those nodes are traversed to find the objects that need to be rendered. This allows for more efficient culling of objects outside of the view frustum and can further reduce the number of objects that need to be considered for rendering. In addition to improving rendering performance, spatial partitioning can also be used for other tasks such as visibility determination, collision detection, and lighting calculations.

[0022] Forward+ rendering is a technique used in computer graphics to improve the performance and visual quality of real-time rendering. It is an extension of the traditional forward rendering approach, which involves rendering each object in the scene individually and fully shading every pixel covered by the object. Forward+ rendering uses a minimal prepass to generate at least a screen-space depth buffer prior to the full shading pass. This depth buffer is used to reject executions of the more expensive main shader that would render pixels that are occluded due to having a more distant depth value than the value in the depth buffer. The forward+ pipeline may be further extended by using a spatial partitioning technique to divide the 3D space into smaller regions, such as a grid or tree structure. The lights in a scene are grouped in screen-space subdivisions rather than world-space subdivisions. A set of pixels (hereafter a "gridcell") will contain a fixed number of light sources attached to it. During rendering, each pixel queries only the lights associated with its assigned subdivision instead of the full list of lights in the scene. This allows for efficient culling of objects outside of the view frustum and reduces the number of objects that need to be considered for rendering. Once the objects in view have been determined, per-gridcell light lists are generated using a light culling step to only consider lights that affect those objects. This further reduces the number of lights that need to be processed and can save computational resources. Next, a light rendering step computes the lighting for each object. This involves selecting a light from the light list, evaluating the light's effect on the object's surface, accumulating the resulting colors, and repeating for other lights as desired. This step can be performed in parallel on the GPU, allowing for efficient processing of multiple lights at once. Finally, forward+ rendering uses a shading step to apply the computed lighting to the objects and generate the final image. This step can also be performed on the GPU, allowing for real-time rendering of the scene.
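The per-gridcell light binning described above can be illustrated with a short, self-contained sketch. The C++ below is illustrative only and is not taken from the application; the tile size, per-gridcell capacity, and all names (Light, GridCell, bin_lights) are assumptions.

```cpp
// Minimal CPU-side sketch of per-gridcell light list construction for a
// forward+ pipeline: the screen is divided into fixed-size tiles, and each
// light is appended to every tile its screen-space bounding circle touches.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Light { float sx, sy, radius; };   // screen-space position and radius
constexpr int kTileSize   = 16;           // pixels per gridcell edge
constexpr int kMaxPerCell = 32;           // fixed light capacity per gridcell

struct GridCell {
    uint16_t lights[kMaxPerCell];         // indices into the scene light array
    uint16_t count = 0;
};

std::vector<GridCell> bin_lights(const std::vector<Light>& lights,
                                 int width, int height) {
    const int tilesX = (width  + kTileSize - 1) / kTileSize;
    const int tilesY = (height + kTileSize - 1) / kTileSize;
    std::vector<GridCell> cells(tilesX * tilesY);
    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const Light& l = lights[i];
        const int x0 = std::max(0, static_cast<int>((l.sx - l.radius) / kTileSize));
        const int x1 = std::min(tilesX - 1, static_cast<int>((l.sx + l.radius) / kTileSize));
        const int y0 = std::max(0, static_cast<int>((l.sy - l.radius) / kTileSize));
        const int y1 = std::min(tilesY - 1, static_cast<int>((l.sy + l.radius) / kTileSize));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx) {
                GridCell& c = cells[ty * tilesX + tx];
                if (c.count < kMaxPerCell) c.lights[c.count++] = static_cast<uint16_t>(i);
            }
    }
    return cells;  // a pixel then shades using only its own tile's light list
}
```

During shading, a pixel at (px, py) would read cells[(py / kTileSize) * tilesX + (px / kTileSize)] and iterate only that short light list.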

[0023] Forward+ rendering with screen-space light lists provides several benefits compared to traditional forward rendering. It allows for more efficient culling of objects and lights, which can improve rendering performance and reduce memory usage. It also enables the use of more lights in the scene, which can improve the visual quality of the lighting. Additionally, the parallel processing on the GPU allows for real-time rendering of complex scenes. However, this rendering architecture also has some limitations and trade-offs. One disadvantage is the overhead of constructing and maintaining the spatial partitioning data structure. This can be particularly expensive for dynamic scenes with constantly changing object positions and can impact overall rendering performance if not done efficiently. Another issue is the choice of data structure and cell size. The optimal data structure and cell size will depend on the characteristics of the scene and the desired performance, but finding the right balance can be challenging. Depth prepasses and light lists are both valuable techniques for improving the performance and visual quality of real-time rendering. However, these methods do not straightforwardly translate to ray tracing, since ray traced rendering effects rely on the entire scene when rendering. A lot of lighting data will be missing from the rendering process if screen-space light lists are used in a path tracer, causing artifacts or other problems when rendering.

[0024] It is to be appreciated that embodiments of the present invention, as described in further details below, efficiently implement scene segmentation and forward rendering techniques for mobile applications.

[0025] The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0026] In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

[0027] The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

[0028] Furthermore, any element in a claim that does not explicitly state "means for" performing a specified function, or "step for" performing a specific function, is not to be interpreted as a "means" or "step" clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of "step of" or "act of" in the Claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.

[0029] Please note, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counter clockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object.

[0030] Figure 1 is a simplified diagram illustrating mobile device 100 that is configured to perform graphic rendering with scene segmentation. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.

[0031] As shown, the mobile device 100 can be configured within housing 110 and can include camera device 120 (or other image or video capturing device), processor device 130, memory device 140 (e.g., volatile memory storage), and storage device 150 (e.g., permanent memory storage). Camera 120 can be mounted on housing 110 and be configured to capture an input image. The input image can be stored in memory 140, which may include a random-access memory (RAM) device, an image/video buffer device, a frame buffer, or the like. Various software, executable instructions, and files can be stored in storage device 150, which may include read-only memory (ROM), a hard drive, or the like. Processor 130 may be coupled to each of the previously mentioned components and be configured to communicate between these components.

[0032] In a specific example, processor 130 includes a central processing unit (CPU), a network processing unit (NPU), or the like. Device 100 may also include graphics processing unit (GPU) 132 coupled to at least processor 130 and memory 140. In an example, memory 140 is configured to be shared between processor 130 (e.g., CPU) and GPU 132, and is configured to hold data used by an application when it is run. As memory 140 is shared, it is important to use memory 140 efficiently. For example, a high memory usage by GPU 132 may negatively impact system performance.

[0033] Device 100 may also include user interface 160 and network interface 170. User interface 160 may include display region 162 that is configured to display text, images, videos, rendered graphics, interactive elements, etc. Display 162 may be coupled to the GPU 132 and may also be configured to display at a refresh rate of at least 24 frames per second. Display region 162 may comprise a touchscreen display (e.g., in a mobile device, tablet, etc.). Alternatively, user interface 160 may also include touch interface 164 for receiving user input (e.g., keyboard or keypad in a mobile device, laptop, or other computing devices). User interface 160 may be used in real-time applications (RTAs), such as multimedia streaming, video conferencing, navigation, video games, and the like.

[0034] Network interface 170 may be configured to transmit and receive instructions and files (e.g., using Wi-Fi, Bluetooth, Ethernet, etc.) for graphic rendering. In a specific example, network interface 170 may be configured to compress or down-sample images for transmission or further processing. Network interface 170 may be configured to send one or more images to a server for OCR. Processor 130 may be coupled to and configured to communicate between user interface 160, network interface 170, and/or other interfaces.

[0035] In an example, processor 130 and GPU 132 may be configured to perform steps for rendering video graphics, which can include those related to the executable instructions stored in storage 150. Processor 130 may be configured to execute application instructions and generate a plurality of graphics data associated with a 3D scene including at least a first object. The plurality of graphics data can include a plurality of vertex data associated with a plurality of vertices in the 3D scene (e.g., for each object). GPU 132 may be configured to generate a plurality of primitive data using at least the plurality of vertex data and a vertex shader. The plurality of primitive data may include at least position data, among others.

[0036] In an example, GPU 132 may be configured to provide a position vector defining the center of an overall cluster data structure in three axes. The GPU 132 can also be configured to provide a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes. Then, GPU 132 can be configured to create clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure. The GPU 132 may be configured to map geometric position data to an associated cluster defined by the position of a cluster center. The GPU 132 may also be configured to generate cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters. Using the cluster data, the GPU 132 is configured to generate cull masks for each object. Further, GPU 132 is configured to render at least the first object (and any additional objects) using at least the cull masks and the cluster data by a shader.
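As a concrete illustration of this cluster-creation step, the sketch below divides a bounding box, defined by a center position vector and extents, into uniform clusters using a single subdivision scalar, producing one bounding box per cluster. This is a hypothetical CPU-side equivalent of the compute-shader setup, written for clarity; the types and names are assumptions, not from the application.

```cpp
// Sketch: create uniform clusters from a center, extents, and a
// subdivision scalar applied along all three axes.
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

std::vector<AABB> create_clusters(Vec3 center, Vec3 extents, int subdiv) {
    // Size of one cluster along each axis.
    const Vec3 step{2 * extents.x / subdiv,
                    2 * extents.y / subdiv,
                    2 * extents.z / subdiv};
    const Vec3 origin{center.x - extents.x, center.y - extents.y,
                      center.z - extents.z};
    std::vector<AABB> clusters;
    clusters.reserve(static_cast<std::size_t>(subdiv) * subdiv * subdiv);
    for (int k = 0; k < subdiv; ++k)
        for (int j = 0; j < subdiv; ++j)
            for (int i = 0; i < subdiv; ++i) {
                const Vec3 lo{origin.x + i * step.x, origin.y + j * step.y,
                              origin.z + k * step.z};
                clusters.push_back({lo, {lo.x + step.x, lo.y + step.y,
                                         lo.z + step.z}});
            }
    return clusters;  // one compute-shader thread would process each entry
}
```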

[0037] To execute the rendering pipeline described, the hardware components of mobile device 100 need to be configured and optimized to handle the demands of this rendering technique. First, the device's CPU needs to be powerful and energy-efficient enough to handle the calculations and data processing required for forward+ rendering. This typically involves using a high-performance CPU with multiple cores and threads, as well as specialized instructions and technologies such as SIMD and out-of-order execution to maximize performance and efficiency.

[0038] Second, the device's GPU needs to be capable of executing the complex shaders and algorithms used in the described rendering pipeline. This typically involves using a high-performance GPU with a large number of compute units and a fast memory bandwidth, as well as support for advanced graphics APIs and features such as OpenGL ES 3.0 and compute shaders.

[0039] Third, the device's memory and storage subsystem need to be large and fast enough to support the data structures and textures used to store spatial gridcells containing the data required for rendering. This typically involves using high-capacity and high-speed memory and storage technologies such as DDR4 RAM and UFS 2.0 or NVMe storage, as well as efficient data structures and algorithms to minimize memory and storage usage.

[0040] Fourth, the device's display and touch screen need to be high-resolution and fast- refreshing to support the visual quality and interactivity of real-time rendering. This typically involves using high-resolution and high-refresh-rate displays, as well as low-latency and high-precision touch screens to enable smooth and responsive interactions.

[0041] The device's battery and power management subsystem need to be able to support the power demands of real-time rendering. This typically involves using high-capacity batteries and efficient power management technologies such as intelligent charging and power-saving modes to ensure that the device can run for a long time without running out of power.

[0042] Mobile device 100, to meet the needs of the rendering pipeline described herein, has powerful and energy-efficient hardware components, including a high-performance CPU, GPU, memory and storage subsystem, display and touch screen, and battery and power management subsystem. These components need to be optimized and configured to support the demands of forward+ rendering and enable smooth and engaging visuals and interactions.

[0043] Other embodiments of this system include corresponding computer systems, apparatuses, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. Further details of methods are discussed with reference to the following figures.

[0044] Figure 2 is a simplified flow diagram illustrating conventional forward pipeline 200 for rendering video graphics. As shown, forward pipeline 200 includes vertex shader 210 followed by fragment shader 220. In a forward pipeline rendering process, a CPU provides graphics data of a 3D scene (e.g., from memory, storage, over network, etc.) to a graphics card or a GPU. In the GPU, vertex shader 210 transforms objects in the 3D scene from object space to screen space. This process includes projecting the geometries of the objects and breaking them down into vertices, which are then transformed and split into fragments, or pixels. In fragment shader 220, these pixels are shaded (e.g., colors, lighting, textures, etc.) before they are passed onto a display (e.g., screen of a smartphone, tablet, VR goggles, etc.). In lighting cases, rendering effects are processed for every vertex and on every fragment in the visible scene for every light source.

[0045] Figure 3 is a simplified flow diagram illustrating a conventional deferred pipeline 300 for rendering video graphics. Here, the prepass process 310 involves receiving the graphics data of the 3D scene from the CPU and generating a G-buffer with data needed for subsequent rendering passes, such as color, depth, normal, etc. A ray traced reflections pass 320 involves processing the G-buffer data to determine the reflections of the scene, and the ray traced shadows pass 330 involves processing the G-buffer data to determine the shadows of the scene. Then, a denoising pass 340 removes the noise for pixels that were ray-traced. In the main shading pass 350, the reflections and shadows and material evaluations are combined to produce the shaded output with each pixel color. In the post pass 360, the shaded output may be subject to additional rendering processes such as color grading, depth of field, etc.

[0046] Compared to the forward pipeline 200, the deferred pipeline 300 reduces the total fragment count by only processing rendering effects based on unoccluded pixels. This is accomplished by breaking up the rendering process into multiple stages (i.e., passes) in which the color, depth, and normal of the objects in the 3D scene are written to separate buffers that are subsequently rendered together to produce the final rendered frame. Subsequent passes use depth values to skip rendering of occluded pixels when executing more complex lighting shaders. The deferred render pipeline approach reduces the complexity of any single shader compared to the forward render pipeline approach, but having multiple rendering passes requires greater memory bandwidth, which is especially problematic for modern mobile architectures with limited and shared memory.

[0047] According to an example, the present invention provides a method and system for world-space sampling clusters. This spatial cluster system is configured with a vector that defines the extents of the cluster data structure. A default value of {0, 0, 0} uses the bounding box of the entire scene and is independent of camera position. If set to a nonzero value, the spatial cluster is created with its center at the camera, and bounds corresponding to each configured extent value.

[0048] Once the bounds have been established, the scene is subdivided into smaller clusters based on a user-configured vector that specifies the number of subdivisions along each axis. A default value of {0, 0, 0} allows manual setup of the cluster count and the bounds for each cluster (further explained below). Each cluster contains data used to accelerate rendering of points within the cluster. This data is generated using a compute shader that executes a thread for each cluster (e.g., shaders shown in Figures 2 and 3).
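A minimal sketch of how the extents default described in the preceding paragraphs might be resolved is shown below, assuming a zero vector means "unset". The struct and function names are hypothetical.

```cpp
// Sketch: resolve the cluster structure's bounds from the configured
// extents vector. {0, 0, 0} falls back to the whole-scene bounding box
// (camera-independent); any nonzero value centers the bounds on the camera.
struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

static bool is_zero(Vec3 v) { return v.x == 0 && v.y == 0 && v.z == 0; }

AABB resolve_cluster_bounds(Vec3 configuredExtents, const AABB& sceneBounds,
                            Vec3 cameraPos) {
    if (is_zero(configuredExtents))
        return sceneBounds;  // default: entire scene, independent of camera
    return {{cameraPos.x - configuredExtents.x,
             cameraPos.y - configuredExtents.y,
             cameraPos.z - configuredExtents.z},
            {cameraPos.x + configuredExtents.x,
             cameraPos.y + configuredExtents.y,
             cameraPos.z + configuredExtents.z}};
}
```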

[0049] Any data that improves sampling and can be used simultaneously should be stored within these clusters. In this example, light-specific optimizations were sacrificed in favor of a more general-use spatial data structure, thus importance-ordered light lists are stored in place of cuts in a hierarchical light tree. ReSTIR reservoirs can also be stored per-cluster, since ReSTIR integrates seamlessly with light importance sampling. Cluster cull masks and cut-point data for environment map visibility can also be stored per-cluster.

[0050] Finally, at runtime, cluster data is loaded once to perform shading of a hit point based on the position of that hit point. Depending on the application, a hashing scheme may be considered for this look-up, in the case that the memory footprint of a hash map combined with the complexity of implementing an efficient hashing scheme is manageable. However, in this case, the hit point position is stratified into its associated cluster based on the extents and subdivisions described prior. If the subdivisions are {0, 0, 0}, the cluster ID is derived from the instance cull mask instead, enabling clustering methods that depend on non-spatial properties such as material parameters.
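The look-up described in this paragraph can be sketched as follows; the stratification arithmetic and the lowest-set-bit fallback for {0, 0, 0} subdivisions are illustrative assumptions about one plausible implementation, and all names are hypothetical.

```cpp
// Sketch: map a hit point to its cluster ID. With {0, 0, 0} subdivisions,
// the ID comes from the instance cull mask instead (non-spatial clustering).
#include <cstdint>

struct Vec3 { float x, y, z; };

int cluster_id_for_hit(Vec3 hit, Vec3 gridMin, Vec3 cellSize,
                       int nx, int ny, int nz, uint8_t instanceCullMask) {
    if (nx == 0 && ny == 0 && nz == 0) {
        // Derive the ID from the lowest set bit of the cull mask,
        // e.g., a material-based category.
        for (int bit = 0; bit < 8; ++bit)
            if (instanceCullMask & (1u << bit)) return bit;
        return 0;
    }
    auto clampi = [](int v, int hi) { return v < 0 ? 0 : (v > hi ? hi : v); };
    const int ix = clampi(static_cast<int>((hit.x - gridMin.x) / cellSize.x), nx - 1);
    const int iy = clampi(static_cast<int>((hit.y - gridMin.y) / cellSize.y), ny - 1);
    const int iz = clampi(static_cast<int>((hit.z - gridMin.z) / cellSize.z), nz - 1);
    return (iz * ny + iy) * nx + ix;  // flat index into the cluster array
}
```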

[0051] Embodiments of this general-use spatial clustering system can be used to segment the scene for both selective loading of segment data and data sharing within a given segment. For smaller scenes with fewer clusters, both the memory footprint and the number of compute dispatches required to update the clusters for rendering are small, though they scale with the number of subdivisions. The low minimum cost of this simple spatial clustering system makes it desirable for mobile. Compared to screen-space clustering methods, this method introduces a dependency between scene scale and performance, but the world-space locality offered makes more sense for ray tracing than the projective locality offered by screen-space clustering.

[0052] Further, by avoiding optimizations to partitioning that are specific to any one sampling optimization, the spatial clusters may be reused for a variety of sampling improvements. While this compromises the quality of any one sampling optimization, it decouples the build and update time of the data structure from the number of optimizations that are implemented. This is significant when the grid is centered on the camera, requiring updates to the data structure any time the camera moves.

[0053] In general, cull masks are useful for reducing the cost of initializing inline ray tracing by reducing the proportion of the acceleration structure that is loaded into memory and traversed. Only instances in the acceleration structure that match the cull mask will be loaded. According to an example, the present invention provides for methods for procedural setup of instance cull masks based on differing heuristics. Five examples of such methods are discussed below. As the ray tracing requirements of each scene will vary with scene content, the different methods may be applied experimentally in development and the method that best suits the scene may be chosen by inspection. For a given scene, the selected segmentation method can then be procedurally computed during load time to alleviate runtime compute costs. In the following figures, the coordinate frame used for these descriptions is a Y-up coordinate frame, with the X axis running right-left on the screen and the Z axis running in-out of the screen.

[0054] In a specific example, there are two values of cull masks of import, which will be referred to as cull-to and cull-from masks. Cull-to masks refer to the mask value packaged with instance data that is built in an acceleration structure. Only rays with mask values that are non-zero when Boolean ANDed with cull-to masks will intersect with those instances. Cull-from masks refer to the mask value packaged with instance data that is directly accessed by shader code at runtime. Cull-from masks provide the mask value for the rays that are cast from hit points on the corresponding instance. A single instance may have different cull-to and cull-from mask values. In the following examples, the values will be numbered from 1 to 8, which corresponds to the bit that is set on the 8-bit cull mask, with all other bits being zero. Of course, there can be other variations, modifications, and alternatives.
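The mask semantics above reduce to a bitwise AND test, sketched below with hypothetical struct and function names.

```cpp
// Sketch: cull-to gates ray/instance intersection; cull-from supplies the
// mask for rays cast from hit points on the instance.
#include <cstdint>

struct InstanceMasks {
    uint8_t cullTo;    // packaged with the acceleration-structure instance
    uint8_t cullFrom;  // read directly by shader code at runtime
};

bool ray_hits_instance(uint8_t rayMask, const InstanceMasks& m) {
    return (rayMask & m.cullTo) != 0;  // non-zero AND => may intersect
}

uint8_t secondary_ray_mask(const InstanceMasks& m) {
    return m.cullFrom;  // mask for rays cast from this instance's hit points
}
```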

[0055] The spatial grid is configured with an extents vector and a center vector. By default, these are populated from the scene center and scene extents, but the center may also be set to camera position. For all grids outlined below, the extents are divided by the number of segments along a given axis to determine the positions of the boundaries between segments. Interior boundaries are placed as calculated, but grids with exterior faces are assumed to extend those faces to encompass the remainder of the scene. If the grids are intended to capture only instances strictly within the extents, a Boolean flag (e.g., TraceLocalOnly) may be enabled. This feature is useful for scenes where distant objects do not contribute significantly to ray traced effects.

[0056] According to an example, the present invention provides methods for assigning scene instances to segments based solely on the instance transform (i.e., spatial-only segmentation). Instances that overlap a given segment will be assigned to that segment. Instances that overlap multiple segments will be assigned to all segments they overlap by setting the bits corresponding to those segments to 1. For spatial-only segmentation, cull-to and cull-from masks are always equal.

[0057] Figure 4A is a simplified diagram illustrating a spatial-only equal segmentation method according to an embodiment of the present invention. The most straightforward method of segmenting a scene is to divide it into equal parts. This is exactly what is done for world-space clusters: given a list of instances, calculate the scene's boundaries, then divide it into equally sized volumes. For 8-bit instance cull masks, the number of volumes must be limited to eight. As shown in diagram 401, this is done most straightforwardly with two layers of four segments (a uniform grid with segment dimensions 2x2x2), defined by axis-aligned bounding boxes (AABBs). AABBs allow the bounds to be compared efficiently against each instance for assignment to segments.
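A sketch of this assignment is given below, treating the exterior faces as extended to cover the rest of the scene (so each octant test only needs the interior mid-planes). The names and bit layout are assumptions, not from the application.

```cpp
// Sketch: set one bit per 2x2x2 octant that the instance's AABB overlaps.
#include <cstdint>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

uint8_t equal_segment_mask(const AABB& inst, const AABB& scene) {
    const Vec3 mid{(scene.min.x + scene.max.x) * 0.5f,
                   (scene.min.y + scene.max.y) * 0.5f,
                   (scene.min.z + scene.max.z) * 0.5f};
    uint8_t mask = 0;
    for (int z = 0; z < 2; ++z)
        for (int y = 0; y < 2; ++y)
            for (int x = 0; x < 2; ++x) {
                // Overlap against the upper or lower half-space on each axis.
                const bool ox = x ? inst.max.x >= mid.x : inst.min.x <= mid.x;
                const bool oy = y ? inst.max.y >= mid.y : inst.min.y <= mid.y;
                const bool oz = z ? inst.max.z >= mid.z : inst.min.z <= mid.z;
                if (ox && oy && oz)
                    mask |= static_cast<uint8_t>(1u << ((z * 2 + y) * 2 + x));
            }
    return mask;  // spatial-only: cull-to and cull-from are equal
}
```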

[0058] However, most game levels (e.g., RPG or FPS levels) are constrained to a single plane of movement and are arranged such that scene data is well-grouped in the Y direction, so there is little need to subdivide objects by their Y-position. Figure 4B is a simplified diagram illustrating a spatial-only single-plane segmentation method according to an embodiment of the present invention. As shown in diagram 402, the number of segments along either the Z or the X axis is doubled (a 2D grid with segment dimensions 2x1x4). This axis can be chosen based on which extent is larger for the total scene bounding box.

[0059] Finally, for scenes with large transmissive and reflective elements such as bodies of water, the scene would make use of an X-Z planar split. However, these scenes usually have much more scene geometry either above or below the water. Figure 4C is a simplified diagram illustrating a spatial-only asymmetric segmentation method according to an embodiment of the present invention. As shown in diagram 403, only two halves are allocated for the Y-division with fewer instances, and the remaining six are allocated to the Y-division with more instances (an X-Z split grid with 2x1 segments above and 3x2 segments below).

[0060] In a specific example, this method can be further refined by setting the Y-division to a value equal to the instance with the largest X-Z bounding square. Those of ordinary skill in the art will recognize other variations, modifications, and alternatives to these spatial-only segmentation methods (e.g., segmentation along different axes, different number and ratio of segmentations, etc.).

[0061] All the previously described grids may be straightforwardly centered on the camera position and generated with dimensions based on the configured extents. However, another grid view is presented here specifically for camera-centered rendering. Thus, according to an example, the present invention provides methods for camera-centered segmentation.

[0062] Typically, objects close to the camera are more noticeable and thus must be rendered in greater detail than objects that are more distant. Camera-centered grids take advantage of this by distributing grid cells based on the transform of the camera. Figure 4D is a simplified diagram illustrating a camera-centered segmentation method according to an embodiment of the present invention. As shown in diagram 404, four segments are allocated to cover a configurable max radius around the camera (denoted 1 to 4), and the remaining four segments (denoted 5 to 8) split all the remaining instances in the scene into quadrants.

[0063] Segment boundaries can cause visible artifacts in indirect illumination, especially on mirror-like surfaces which can show sharp cutoffs in reflections across instances that are assigned to separate segments. In an example, the grid is aligned at a 45-degree angle offset in the X-Z plane from camera local space to reduce the visibility of these cutoff artifacts. This requires maintaining a separate "rotated view matrix" used to transform instances from world space into rotated camera local space. Note that the bounding boxes of each instance must also be transformed correctly. Then, AABB comparison can proceed as normal in rotated view space.

[0064] In an example, to avoid complicating the comparisons for outer segments 5-8, which are not box-shaped, all instances are first compared against the configured radius of the inner segments. Any instance that overlaps this radius can be added to a list of instances that will only be tested for inclusion in inner segments, and all other instances can be added to a second list of instances that will only be tested for inclusion in outer segments. This eliminates the need to test against the outer bounds during bounds testing for both lists, simplifying bounds testing into quadrant testing.
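The following sketch combines the camera-centered layout of Figure 4D with the 45-degree rotation and inner-radius pre-test described above. It classifies a single instance by its bounding-sphere center, which is a simplification; an instance spanning multiple quadrants would set several bits in a fuller implementation, and all names are hypothetical.

```cpp
// Sketch: assign a camera-centered segment bit (1-4 inner, 5-8 outer)
// using a 45-degree rotated X-Z frame to reduce cutoff artifacts.
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

uint8_t camera_centered_bit(Vec3 instCenter, float instRadius,
                            Vec3 cameraPos, float innerRadius) {
    // Camera-relative offset, rotated 45 degrees in the X-Z plane.
    const float dx = instCenter.x - cameraPos.x;
    const float dz = instCenter.z - cameraPos.z;
    const float c = 0.70710678f;  // cos(45 deg) == sin(45 deg)
    const float rx = c * dx - c * dz;
    const float rz = c * dx + c * dz;
    const int quadrant = (rx >= 0 ? 1 : 0) + (rz >= 0 ? 2 : 0);  // 0..3
    // Inner-radius pre-test: instances overlapping the configured radius
    // are only considered for inner segments, all others for outer ones.
    const bool inner =
        std::sqrt(dx * dx + dz * dz) - instRadius <= innerRadius;
    return static_cast<uint8_t>(1u << (inner ? quadrant : quadrant + 4));
}
```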

[0065] According to an example, the present invention provides for methods of materials-based scene segmentation. These methods take advantage of the fact that some rays only need to intersect certain kinds of materials. For example, subsurface rays only need to intersect materials with subsurface scattering, and shadow rays must ignore emissive materials to shadow correctly. These methods also consider that the spatial properties of an instance inform its relevance to the indirect illumination of different materials differently. For example, a perfectly smooth mirror can be expected to reflect distant objects, while rough objects are usually only affected by indirect illumination from nearby objects.

[0066] In a specific example, the material categories can include thick subsurface, light, alpha cutoff, smooth transmissive, smooth reflective, rough object, center, large projected scale, etc. Determining the material types can include logical checks of properties such as subsurface scattering, index of refraction (IOR), emission, opacity, roughness, distance from center, center view size, etc. Further, each material category can have separately defined cull-to and cull-from values, as discussed previously.
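One plausible shape of these logical checks is sketched below; the thresholds, category order, and property names are assumptions rather than values from the application.

```cpp
// Sketch: route a material to one of the categories named above via simple
// property checks. Bits 6-7 (center, large projected scale) are assigned
// from spatial properties elsewhere.
#include <cstdint>

struct MaterialProps {
    bool  subsurface;   // has subsurface scattering
    bool  emissive;     // emits light
    bool  alphaCutoff;  // uses alpha-cutoff transparency
    float ior;          // index of refraction
    float opacity;      // 1.0 == fully opaque
    float roughness;    // 0.0 == perfectly smooth
};

uint8_t material_category_bit(const MaterialProps& m) {
    if (m.subsurface)                     return 1u << 0;  // thick subsurface
    if (m.emissive)                       return 1u << 1;  // light
    if (m.alphaCutoff)                    return 1u << 2;  // alpha cutoff
    if (m.opacity < 1.0f && m.ior > 1.0f) return 1u << 3;  // smooth transmissive
    if (m.roughness < 0.1f)               return 1u << 4;  // smooth reflective
    return 1u << 5;                                        // rough object
}
```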

[0067] The center and large projected scale categories are assigned based on the spatial properties of each instance. Instances that are already assigned to another material cull group are still considered for assignment to these two cull groups. They can be set up relative to the currently active camera or to a user-configured view-projection matrix. The latter may be useful when the content of the scene is distributed such that ray tracing effects are only significant at static positions of interest.

[0068] The center category is computed based on the view-space distance of each instance from the origin. All instances that are assigned to the center category are eliminated from consideration for the large projected scale category.

[0069] To calculate the projected scale of an instance, the camera's projection matrix is multiplied with a modified view matrix which, given a distance d between the camera and the instance, rotates the instance to place it at (0, 0, d) in modified view-space. Then, after projecting each instance onto the center of the view plane, the projected extents of each instance's bounding box may be used as a proxy for the importance of the instance in indirect rendering appearance. If either the projected X or projected Y extent exceeds a user-configured threshold, the instance is added to the designated category (e.g., category 8 for 8-bit masks).
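Because the modified view matrix places the instance at (0, 0, d), the projected extents reduce to the world-space extents scaled by the projection's focal terms and divided by d. The sketch below uses that simplification; the names and threshold are assumptions.

```cpp
// Sketch: flag instances whose projected bounding-box extents exceed a
// user-configured threshold, using the (0, 0, d) placement from the text.
struct Vec3 { float x, y, z; };

bool has_large_projected_scale(Vec3 halfExtents, float distance,
                               float projScaleX, float projScaleY,
                               float threshold) {
    if (distance <= 0.0f) return true;  // camera is inside the instance
    // Projected half-extents at depth d: extent * focal term / d.
    const float px = projScaleX * halfExtents.x / distance;
    const float py = projScaleY * halfExtents.y / distance;
    return px > threshold || py > threshold;
}
```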

[0070] The bandwidth consumption of ray tracing processes (e.g., rayQuery initializations) is a major bottleneck in mobile ray tracing. For example, by using instance culling masks to initialize rayQueries with only 1/8 of the scene geometry, the bandwidth consumption drops significantly. This resulted in a 15-20 fps speedup on mobile on a test scene with 35,000 triangles rendering at a resolution of 1697x760. This test scene's instance culling masks were manually configured in a way that is specific to the scene's appearance in terms of the layout and materials of the scene's geometry.

[0071] The methods used for procedurally setting up these culling masks replace the painstaking manual configuration step with a simple mode selector, which allows users to choose the best segmentation mode for their scene by inspection. Also, the embodiments of the present method are the first to assign instance masks based on the scale or position of the instance within the scene. Furthermore, the procedural setup for instance cull masks requires no additional hand tuning by users.

[0072] According to an example, the present invention provides a method for sharing scene segmentation data between world-space clusters and the instance culling system. In an example, both spatial sampling clusters and procedural instance cull masks are managed by an overall scene segmentation manager. The two features may optionally be linked, sacrificing spatial cluster resolution and instance cull mask flexibility in favor of performance improvements from greater reuse.

[0073] When linked, the bit width of instance cull masks limits spatial clusters to a max cluster count of eight for 8-bit masks. At this lower resolution, the spatial clusters are less able to capture high-frequency lighting differences, so scenes with hundreds of small lights with distinct characteristics would no longer see much improvement from spatial cluster-based optimizations. However, most modern games are first- or third-person games where the camera view is localized to a player character, and most scene objects are at a similar scale to the player character, which is a scenario that is well-covered with only eight clusters.

[0074] On the other hand, instance cull masks may only use spatial-only methods for assigning categories. Attempting to construct a bounding box from all rough objects in a scene is all but guaranteed to generate a bounding box that overlaps with other bounding boxes and spans a volume large enough that it becomes impossible to share useable data across that volume.

[0075] As discussed previously, it is possible to override the procedural subdivision of the scene’s bounds with precalculated bounds. When spatial sampling clusters are linked with procedural instance cull masks, the procedural instance cull mask class can generate the bounding boxes for the active grid mode and pass it to the spatial cluster class. Then, since the spatial clusters are identical to the instance culling categories, the bit offset of the cull-to value of the instance being shaded may be directly used to determine which spatial cluster contains the relevant data for improving shading. Since each instance may only be associated with one spatial cluster, this imposes a constraint on procedural instance cull masks such that each instance’s cull-to value must be “one-hot”, or only have one non-zero bit. Thus, the previous methods can be extended to check the overlap between an instance and all candidate categories, and only assign the instance to the category with the greatest overlap.
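The greatest-overlap rule for the one-hot constraint can be sketched as below; the overlap metric (intersection volume) and the helper names are assumptions.

```cpp
// Sketch: enforce the one-hot constraint by assigning the instance only to
// the candidate category whose bounds it overlaps the most (by volume).
#include <algorithm>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

static float overlap_volume(const AABB& a, const AABB& b) {
    const float dx = std::min(a.max.x, b.max.x) - std::max(a.min.x, b.min.x);
    const float dy = std::min(a.max.y, b.max.y) - std::max(a.min.y, b.min.y);
    const float dz = std::min(a.max.z, b.max.z) - std::max(a.min.z, b.min.z);
    return (dx > 0 && dy > 0 && dz > 0) ? dx * dy * dz : 0.0f;
}

// Returns a one-hot 8-bit cull-to value for the best-overlapping category.
uint8_t one_hot_cull_to(const AABB& inst, const AABB categoryBounds[8]) {
    int best = 0;
    float bestVol = -1.0f;
    for (int i = 0; i < 8; ++i) {
        const float v = overlap_volume(inst, categoryBounds[i]);
        if (v > bestVol) { bestVol = v; best = i; }
    }
    return static_cast<uint8_t>(1u << best);
}
```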

[0076] Sharing scene segmentation data between world-space clusters and the instance culling system reduces the overhead of maintaining the two methods separately, since the scene data structure only needs to be computed or updated once and can then be shared across the two methods. This sacrifices the resolution of the spatial clusters (and thus the quality of scenes with high-frequency direct illumination) and can only be used with spatial-only methods for procedural instance cull masking. Nonetheless, camera-centered schemes for instance-culled clusters achieve acceptable rendering quality for content typical of first- and third-person games, with improved performance.

[0077] Figure 5 is a simplified flow diagram illustrating a method 500 for rendering graphics on a computer device according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, replaced, modified, rearranged, and/or overlapped, and they should not limit the scope of the claims.

[0078] According to an example, method 500 of rendering graphics can be performed by a rendering system, such as system 100 in Figure 1. More specifically, a processor of the system can be configured to perform the actions of method 500 by executable code stored in a memory storage (e.g., permanent storage) of the system. As shown, method 500 can include step 502 of receiving a plurality of graphics data associated with a three-dimensional (3D) scene rasterized to determine at least a first object (or all object instantiations in the scene) that would be intersected by a primary ray cast through each of a plurality of screen-space pixels. This can include all data necessary to determine the first object intersected by a ray cast through each pixel in the viewport, wherein the plurality of graphics data includes a plurality of vertex data associated with a plurality of vertices in the 3D scene. In an example, the method includes generating a plurality of screen-space primitive data using at least the plurality of vertex data. The plurality of primitive data includes at least barycentric coordinates and a triangle index, but can also include other position data.

[0079] In step 504, the method includes providing a position vector defining the center of an overall cluster data structure in three axes. In a specific example, the center of the overall cluster data structure is independent of a camera position, at a camera position, or the like. In step 506, the method includes providing a subdivision scalar used to uniformly divide the overall cluster data structure along the three axes. In step 508, the method includes creating clusters by dividing a bounding box using at least the subdivision scalar and extents of the overall data structure. Dividing the bounding box can include any of the previously discussed segmentation techniques and variations thereof.
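
A minimal C++ sketch of steps 504 through 508 follows (type and function names are assumptions): the position vector and the overall structure's extents define a bounding box, which a single subdivision scalar divides uniformly along all three axes.

```cpp
#include <vector>

struct Float3 { float x, y, z; };
struct Cluster { Float3 center; Float3 halfExtent; };

// Divide the overall bounding volume, defined by a center position vector
// and per-axis extents, uniformly into n x n x n clusters.
std::vector<Cluster> createClusters(Float3 center, Float3 extent, int n) {
    std::vector<Cluster> clusters;
    clusters.reserve(static_cast<size_t>(n) * n * n);
    const Float3 cell = { extent.x / n, extent.y / n, extent.z / n };
    const Float3 minCorner = { center.x - 0.5f * extent.x,
                               center.y - 0.5f * extent.y,
                               center.z - 0.5f * extent.z };
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                clusters.push_back({ { minCorner.x + (x + 0.5f) * cell.x,
                                       minCorner.y + (y + 0.5f) * cell.y,
                                       minCorner.z + (z + 0.5f) * cell.z },
                                     { 0.5f * cell.x, 0.5f * cell.y, 0.5f * cell.z } });
    return clusters;
}
```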

[0080] In step 510, the method includes mapping geometric position data to an associated cluster defined by the position of a cluster center. In step 512, the method includes generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters. In step 514, the method includes generating cull masks for each object using the cluster data. In a specific example, the cull masks include cull-to masks and/or cull-from masks. As discussed previously, the cull masks can be assigned based on clusters corresponding to equal scene segments, or based on a material and on a camera-relative distance of each object.
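
For illustration, the mapping of step 510 may be sketched as follows in C++ (names are assumptions); in practice such logic would run on the GPU, with one compute shader thread per cluster aggregating the mapped data as in step 512.

```cpp
#include <algorithm>
#include <cmath>

struct Float3 { float x, y, z; };

// Map a world-space position to the index of the uniform-grid cluster
// containing it, i.e. the cluster whose center it lies nearest along each
// axis. Positions outside the overall bounds are clamped to boundary cells.
int clusterIndexOf(Float3 p, Float3 center, Float3 extent, int n) {
    auto axisCell = [n](float v, float c, float e) {
        int i = static_cast<int>(std::floor((v - (c - 0.5f * e)) / (e / n)));
        return std::clamp(i, 0, n - 1);
    };
    return axisCell(p.x, center.x, extent.x)
         + axisCell(p.y, center.y, extent.y) * n
         + axisCell(p.z, center.z, extent.z) * n * n;
}
```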

[0081] In step 516, the method includes rendering at least the first object (or all objects represented in the screen-space primitive buffer) using at least the cull masks and cluster data by a shader. The shader can be configured within a computer device, such as device 100 shown in Figure 1, and can include pipeline configurations, such as those shown in Figures 2 and 3. Further, the method can include storing ReSTIR reservoir data, ordered light lists, or the like.

[0082] Figure 6 is a simplified flow diagram illustrating a method 600 for rendering graphics on a computer device according to embodiments of the present invention. This diagram is merely an example, which should not unduly limit the scope of the claims. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, one or more steps may be added, removed, repeated, replaced, modified, rearranged, and/or overlapped, and they should not limit the scope of the claims.

[0083] According to an example, method 600 of rendering graphics can be performed by a rendering system, such as system 100 in Figure 1. More specifically, a processor of the system can be configured to perform the actions of method 600 by executable code stored in a memory storage (e.g., permanent storage) of the system. As shown, method 600 can include step 602 of generating a three-dimensional (3D) scene including a first object (or including all object instantiations in the scene). In a specific example, the 3D scene comprises a first object intersected by a primary ray.

[0084] In step 604, the method includes receiving a plurality of graphics data associated with the 3D scene. In a specific example, the plurality of graphics data includes a plurality of vertex data associated with a plurality of vertices in the 3D scene. In an example, the method further includes generating a plurality of primitive data using at least the plurality of vertex data and a vertex shader, the plurality of primitive data including at least position data.
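
For illustration only, a possible layout for such a primitive record is sketched below in C++; the disclosure calls for at least barycentric coordinates and a triangle index (see step 502 of method 500), and the remaining field names are assumptions.

```cpp
#include <cstdint>

// Illustrative layout for per-pixel screen-space primitive data produced by
// the vertex shader path: barycentric coordinates plus a triangle index,
// with room for other position data.
struct PrimitiveRecord {
    float    baryU;          // barycentric u (with v = baryV, w = 1 - u - v)
    float    baryV;
    uint32_t triangleIndex;  // index of the primitive hit by the primary ray
    uint32_t instanceId;     // example of "other position data" (assumption)
};
```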

[0085] In step 606, the method includes providing a position vector defining the center of an overall cluster data structure in three axes. In step 608, the method includes providing subdivision scalars used to divide the overall cluster data structure along the three axes. In step 610, the method includes creating clusters by dividing a bounding box using at least the subdivision scalars and extents of the overall data structure. In a specific example, the bounding box is uniformly divided, divided according to a horizon in the 3D scene, or the like, and combinations thereof. In an example, the method further includes obtaining settings for dividing the bounding box. Dividing the bounding box can include any of the previously discussed segmentation techniques and variations thereof.
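
A C++ sketch of steps 606 through 610 with independent per-axis subdivision scalars follows (names are assumptions); a horizon-based division would replace the uniform vertical slicing shown here with a split at the horizon height.

```cpp
#include <vector>

struct Float3 { float x, y, z; };
struct Aabb { Float3 min, max; };

// Divide the overall bounding volume into nx x ny x nz clusters, with an
// independent subdivision scalar per axis.
std::vector<Aabb> createClustersPerAxis(Float3 center, Float3 extent,
                                        int nx, int ny, int nz) {
    std::vector<Aabb> boxes;
    boxes.reserve(static_cast<size_t>(nx) * ny * nz);
    const Float3 minCorner = { center.x - 0.5f * extent.x,
                               center.y - 0.5f * extent.y,
                               center.z - 0.5f * extent.z };
    const Float3 cell = { extent.x / nx, extent.y / ny, extent.z / nz };
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x)
                boxes.push_back({ { minCorner.x + x * cell.x,
                                    minCorner.y + y * cell.y,
                                    minCorner.z + z * cell.z },
                                  { minCorner.x + (x + 1) * cell.x,
                                    minCorner.y + (y + 1) * cell.y,
                                    minCorner.z + (z + 1) * cell.z } });
    return boxes;
}
```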

[0086] In step 612, the method includes mapping geometric position data to an associated cluster defined by a position of a cluster center. In step 614, the method includes generating cluster data corresponding to the clusters by one or more compute shaders executing a thread for each of the clusters. In step 616, the method includes generating cull masks for each object (e.g., the first object) using the cluster data. As discussed previously, the method can also include defining cull masks based at least on opacities of objects in the 3D scene.
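
As a hedged illustration of the opacity-based variant mentioned above (the bit layout and names are assumptions, and the extra bit is not compatible with the strict one-hot constraint of the linked mode):

```cpp
#include <cstdint>

// Reserve one mask bit for translucent instances so that particular ray
// types can include or exclude them wholesale.
constexpr uint8_t kTranslucentBit = 0x80;

uint8_t cullMaskWithOpacity(uint8_t segmentBit, float opacity) {
    // Fully opaque instances keep only their spatial segment bit; partially
    // transparent instances are additionally tagged with the translucent bit.
    return opacity < 1.0f ? static_cast<uint8_t>(segmentBit | kTranslucentBit)
                          : segmentBit;
}
```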

[0087] In step 618, the method includes rendering the 3D scene using at least the cull masks and cluster data by a shader. As discussed previously, the shader can be configured within a computer device, such as device 100 shown in Figure 1, and can include pipeline configurations, such as those shown in Figures 2 and 3.

[0088] While the above is a full description of the specific embodiments, various modifications, alternative constructions and equivalents may be used. Therefore, the above description and illustrations should not be taken as limiting the scope of the present invention which is defined by the appended claims.