


Title:
IDENTIFICATION OF ROOT CAUSE PATH WITH MACHINE REASONING
Document Type and Number:
WIPO Patent Application WO/2024/052924
Kind Code:
A1
Abstract:
A method of determining a root cause identification path for an anomaly in a communication network includes obtaining anomaly data describing the anomaly in the communication network, generating a directed acyclic graph, DAG, describing relationships among potential causes of the anomaly. The DAG includes a plurality of nodes corresponding to potential causes of the anomaly and edges representing dependency relationships between respective ones of the nodes. The method includes generating probability tables for each node in the DAG describing probabilities of each node being a cause of an anomaly in an immediate downstream node in the DAG, generating a dynamic Bayesian network, DBN, based on the DAG and the probability tables, and querying the DBN to identify a likely root cause path for the anomaly.

Inventors:
SHARMA RAHUL (IN)
AGARWAL SANDEEP (IN)
SHARMA AMIT KUMAR (IN)
VUPPALA SUNIL KUMAR (IN)
Application Number:
PCT/IN2022/050798
Publication Date:
March 14, 2024
Filing Date:
September 06, 2022
Assignee:
ERICSSON TELEFON AB L M (SE)
SHARMA RAHUL (IN)
International Classes:
H04L41/0631
Domestic Patent References:
WO2008107020A12008-09-12
Foreign References:
US11354184B22022-06-07
Attorney, Agent or Firm:
DJ, Solomon et al. (IN)
Claims:
CLAIMS: 1. A method of determining a root cause path for an anomaly in a communication network, the method comprising: obtaining (802) anomaly data describing the anomaly in the communication network; generating (804) a directed acyclic graph, DAG, describing relationships among potential influences of the anomaly, wherein the DAG comprises a plurality of nodes corresponding to potential influences of the anomaly, wherein the DAG comprises edges representing dependency relationships between respective ones of the nodes; generating (806) probability tables for each node in the DAG, the probability tables describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG; generating (808) a dynamic Bayesian network, DBN, based on the DAG and the probability tables; and querying (810) the DBN to identify a likely root cause path for the anomaly. 2. The method of Claim 1, wherein querying the DBN comprises querying the DBN using a conditional probability query. 3. The method of Claim 1, wherein generating the probability tables comprises performing a conditional probability dependency analysis on the DAG and the anomaly data. 4. The method of Claim 1, wherein the anomaly data is generated by a machine learning classification system that identifies anomalies based on raw input data collected from the communication network. 5. The method of Claim 1, wherein the DBN takes into account delays in propagation of anomalies between adjacent nodes in the DBN. 6. The method of Claim 1, further comprising modifying the DBN in response to user input. 7. The method of Claim 6, wherein modifying the DBN comprises adding, removing or changing an edge or a node in the DBN. 8. The method of Claim 1, wherein generating the DAG comprises performing a hill climbing greedy search on the anomaly data to learn the structure of the communication network.

9. The method of Claim 1, wherein the anomaly data is generated based on input data relating to the network wherein the input data comprises Business Support System metrics, Java logs, server metrics, database logs, core network data, network key performance indicators, KPIs, and/or server logs. 10. The method of Claim 1, wherein nodes in the DAG represent application programming interfaces, APIs, and/or key performance indicators, KPIs, of the communication network. 11. A machine reasoning root cause path identification system (700), comprising: a processing circuit (734); and a memory (736), wherein the memory comprises computer readable instructions that, when executed by the processing circuit, causes the processing circuit to perform operations comprising: obtaining (802) anomaly data describing an anomaly in a communication network; generating (804) a directed acyclic graph, DAG, describing relationships among potential influences of the anomaly, wherein the DAG comprises a plurality of nodes corresponding to potential influences of the anomaly, wherein the DAG comprises edges representing dependency relationships between respective ones of the nodes; generating (806) probability tables for each node in the DAG, the probability tables describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG; generating (808) a dynamic Bayesian network, DBN, based on the DAG and the probability tables; and querying (810) the DBN to identify a likely root cause path for the anomaly. 12. The machine reasoning root cause path identification system of Claim 11, wherein the computer readable instructions cause the processing circuit to perform operations according to any of Claims 1 to 10. 13. 
A computer program product comprising a non-transitory computer readable storage medium containing computer program instructions, that, when executed by a processing circuit of a computing device, cause the computing device to perform operations comprising: obtaining (802) anomaly data describing an anomaly in a communication network; generating (804) a directed acyclic graph, DAG, describing relationships among potential influences of the anomaly, wherein the DAG comprises a plurality of nodes corresponding to potential influences of the anomaly, wherein the DAG comprises edges representing dependency relationships between respective ones of the nodes; generating (806) probability tables for each node in the DAG, the probability tables describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG; generating (808) a dynamic Bayesian network, DBN, based on the DAG and the probability tables; and querying (810) the DBN to identify a likely root cause path for the anomaly. 14. The computer program product of Claim 13, wherein the computer program instructions cause the processing circuit to perform operations according to any of Claims 1 to 10.

Description:
IDENTIFICATION OF ROOT CAUSE PATH WITH MACHINE REASONING

BACKGROUND

[0001] The present disclosure relates to management of communication networks, and in particular to systems and methods for identifying a root cause path of failures or anomalies in a wireless communication network.

[0002] When managing a communication network, it is important to know the root cause path of failures or anomalies which may occur in various parts of the network, such as the network infrastructure and servers, network systems, and business support systems. Often the cause of such anomalies or failures is not apparent. In general, a “failure” refers to the inability of a network element, system or feature to function, while an “anomaly” refers to a situation where a network element, system or feature functions, but not within an expected or desired range of behavior. For simplicity, the term “anomaly” will be used herein to refer to both anomalies and failures.

[0003] For example, when an end-user places an order for network services, such as a new equipment purchase, activation of a SIM card, device upgrade, etc., through a user interface, such as a customer relationship management (CRM) tool, a web portal, a point of sale terminal, etc., some orders may fail or be delayed due to an unknown cause that does not raise a failure exception. Such causes may include a delayed response of an application programming interface (API), high CPU utilization, high memory utilization, or a component-level failure.

[0004] Such occurrences may be logged to an incident management system to be manually reviewed at a later time. However, incident management is a time-consuming, tedious and labor-intensive process that traditionally relies heavily on human knowledge.

[0005] Root cause identification (RCI) is an automated process that may be used as part of a service management function to discover the cause(s) of a problem and the reasons for anomalous behavior.
However, existing RCI systems may not accurately identify root causes in systems with large numbers of dependencies, such as communication systems, even when the RCI system uses machine learning (ML) to assist with the analysis.

[0006] A conventional RCI system 100 is illustrated in Figure 1. As shown therein, raw data 102, such as log data from a communication system, is provided to a pre-processing and aggregation function that prepares the data for analysis. An anomaly detection system 106 detects anomalies in the raw data and records the detected anomalies in a data store 108. The anomaly data is also provided to a root cause analysis (RCA) function 110 that provides point-in-time analysis of the anomaly data in an attempt to identify a root cause of the anomaly. In particular, one or more correlated metrics may be identified through dynamic time warping (DTW) 112 to help identify the root cause of an anomaly.

[0007] Machine learning (ML) using ML algorithms such as DTW on various metrics can be used to obtain a list of metrics that could potentially be the root cause of an anomaly. ML techniques can provide a list of correlated metrics that provide a hint as to the actual root cause of an anomaly. However, there are no tools that provide the ability to identify an exact root cause identification (RCI) path from such metrics.

[0008] Monitoring tools can display a health score for anomalies which could be the starting point for root cause analysis, but such tools do not give the probable root cause of the problem. Similarly, rule-based automation can be done for issues/problems that are already known, but rule-based automation is not possible where the issues are unknown.
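As an illustration of the DTW-based correlation described in paragraphs [0006] and [0007], the following is a minimal sketch in Python. It assumes each metric is a plain list of numbers sampled at a common rate and uses an absolute-difference cost. The function names `dtw_distance` and `rank_metrics_by_dtw` and the metric names in the usage note are hypothetical and do not come from the disclosure.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric series.

    Uses the classic O(len(a) * len(b)) dynamic program with an
    absolute-difference local cost (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def rank_metrics_by_dtw(target, candidates):
    """Return candidate metric names, most similar to the target first."""
    return sorted(candidates, key=lambda name: dtw_distance(target, candidates[name]))
```

For example, `rank_metrics_by_dtw([0, 1, 2, 3], {"cpu": [0, 0, 1, 2, 3], "mem": [5, 5, 5, 5]})` places `"cpu"` first, because the delayed copy of the target warps onto it at zero cost. As the background notes, such a ranking only hints at related metrics and does not by itself yield a root cause path.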
SUMMARY [0009] A method of determining a root cause identification path for an anomaly in a communication network includes obtaining anomaly data describing the anomaly in the communication network, generating a directed acyclic graph, DAG, describing relationships among potential influences of the anomaly. The DAG includes a plurality of nodes corresponding to potential influences of the anomaly and edges representing dependency relationships between respective ones of the nodes. The method includes generating probability tables for each node in the DAG describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG, generating a dynamic Bayesian network, DBN, based on the DAG and the probability tables, and querying the DBN to identify a likely root cause path for the anomaly. [0010] Querying the DBN may include querying the DBN using a conditional probability query. [0011] In some embodiments, generating the probability tables includes performing a conditional probability dependency analysis on the DAG and the anomaly data. The anomaly data may be generated by a machine learning classification system that identifies anomalies based on raw input data collected from the communication network. [0012] In some embodiments, DBN takes into account delays in propagation of anomalies between adjacent nodes in the DBN. [0013] The method may further include modifying the DBN in response to user input. Modifying the DBN may include adding, removing or changing an edge or a node in the DBN. [0014] The DAG may be generated by performing a hill climbing greedy search on the anomaly data to learn the structure of the communication network. [0015] The anomaly data may be generated based on input data relating to the network wherein the input data comprises Business Support System metrics, Java logs, server metrics, database logs, core network data, network key performance indicators, KPIs, and/or server logs. 
[0016] Nodes in the DAG may represent application programming interfaces, APIs, and/or key performance indicators, KPIs, of the communication network. [0017] Some embodiments provide a machine reasoning root cause identification system, that includes a processing circuit and a memory coupled to the processing circuit. The memory includes computer readable instructions that, when executed by the processing circuit, causes the processing circuit to perform operations including obtaining anomaly data describing the anomaly in the communication network, generating a DAG describing relationships among potential influences of the anomaly. The DAG includes a plurality of nodes corresponding to potential influences of the anomaly and edges representing dependency relationships between respective ones of the nodes. The method includes generating probability tables for each node in the DAG describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG, generating a DBN based on the DAG and the probability tables, and querying the DBN to identify a likely root cause path for the anomaly. [0018] Some embodiments provide a computer program product including a non-transitory computer readable storage medium containing computer program instructions, that, when executed by a processing circuit of a computing device, cause the computing device to perform operations including obtaining anomaly data describing the anomaly in the communication network, generating a DAG describing relationships among potential influences of the anomaly. The DAG includes a plurality of nodes corresponding to potential influences of the anomaly and edges representing dependency relationships between respective ones of the nodes. 
The method includes generating probability tables for each node in the DAG describing probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG, generating a DBN based on the DAG and the probability tables, and querying the DBN to identify a likely root cause path for the anomaly. BRIEF DESCRIPTION OF THE DRAWINGS [0019] Figure 1 illustrates a conventional system for performing root cause identification. [0020] Figure 2 illustrates an example of a directed acyclic graph. [0021] Figure 3 illustrates a generalized machine reasoning approach for performing root cause identification. [0022] Figure 4 illustrates systems/methods according to some embodiments for using machine reasoning to perform root cause identification. [0023] Figure 5 illustrates an example of a directed acyclic graph structure that may be generated by a structure learning module according to some embodiments. [0024] Figure 6 illustrates a root cause analysis path generated for an anomaly. [0025] Figure 7A is a block diagram that illustrates elements of a knowledge base interface system according to some embodiments. [0026] Figure 7B illustrates various functional modules that may be stored in the memory of a knowledge base interface system according to some embodiments. [0027] Figure 8 illustrates operations of systems/methods according to some embodiments. [0028] Figure 9 illustrates a wireless communication system that may be analyzed with root cause identification systems/methods according to some embodiments. DETAILED DESCRIPTION OF EMBODIMENTS [0029] Some embodiments described herein utilize machine reasoning (MR) techniques such as a Dynamic Bayesian Network (DBN) to perform root cause identification to identify a root cause path of an anomaly. MR employs a knowledge base and a reasoner (or reasoning engine) which uses logical techniques such as deduction to generate conclusions. 
The knowledge base is built on various data sources, including learnings from previous solutions and subject matter experts (SMEs) as well as insights from machine learning agents in the network. Using a DBN enables the systems/methods described herein to account for the temporal aspect of the analysis by taking into account the evolution of variables over time.

[0030] According to some embodiments, an MR solution is developed for real time RCI for anomalies including system/network/application failures. The use of real time RCI may result in faster problem resolution. An automated RCI system/method employing MR can replace what is currently an effort-intensive, time-consuming, reactive manual process spanning multiple applications across many teams.

[0031] With the help of ML techniques like DTW, it is possible to correlate various metrics that could lead to root cause identification. However, an ML-based RCI system may not be able to provide the probable root cause of an anomaly. Therefore, some embodiments utilize machine reasoning techniques, such as Bayesian Learning, to identify a most probable RCI path from among alternate possible paths within a directed acyclic graph (DAG) consisting of nodes corresponding to possible influences on the root cause path and edges corresponding to probabilities, by using an appropriate weighting of probabilities. The possible influences (represented as nodes on the DAG) may include, for example, APIs used in the communication system and/or key performance indicators (KPIs) associated with the communication system. Identifying a probable root cause path may help an operation team to address the anomaly more quickly.

[0032] Further, some embodiments provide functionality to add/remove both edges and nodes from the DAG in case an SME feels that some conditional dependency does not make business sense. This change can be persisted in the model itself for future RCI paths.
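The SME edits mentioned in paragraph [0032] must leave the graph acyclic. The following Python sketch shows one way such a check could look, assuming the DAG is held as a set of `(parent, child)` tuples. The helper names are hypothetical, and persistence of the edited model is not shown.

```python
def has_path(edges, start, goal):
    """Depth-first search: is `goal` reachable from `start` along directed edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def add_edge(edges, parent, child):
    """Return a new edge set including parent -> child.

    Raises ValueError if the new edge would close a cycle, since the
    structure must remain a directed acyclic graph.
    """
    if parent == child or has_path(edges, child, parent):
        raise ValueError(f"edge {parent} -> {child} would create a cycle")
    return set(edges) | {(parent, child)}


def remove_edge(edges, parent, child):
    """Return a new edge set without parent -> child."""
    return set(edges) - {(parent, child)}
```

Returning a new set rather than mutating in place keeps the previous structure available, which may be convenient if an edit has to be reviewed before it is persisted.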
[0033] Some embodiments described herein may enable a network operator to monitor overall system health and also, in the event an anomaly is detected, to determine a most probable RCI path on the basis of historical data. Further, some embodiments described herein may enable early detection of anomalies before they lead to failures.

[0034] Some embodiments described herein provide a combination of a DBN along with conditional probability queries (e.g., using inferencing) and custom logic to generate a model of a probable RCA path. In particular, continuous metrics may be converted to discrete versions that are processed using exploratory data analysis (EDA) to extract multi-layer metrics. These layers are later consumed by a structure learning step to generate a dependency graph.

[0035] In some embodiments, the model is queried using a conditional probability query (CPQ) to determine a most probable RCA path for a given issue and a specific timestamp. The RCA path so generated is built using custom logic to traverse a maximum probability path, given the failure point.

[0036] Automated structure learning (with dependency relations depicted by a DAG) may be performed on the data without any prior human intelligence. All the relationships and causal paths between the nodes may be generated automatically from the data and not with the help of a domain expert.

[0037] An SME can alter the structure in accordance with his or her domain expertise, and may persist the structure in a model for subsequent use.

[0038] Once a DAG has been constructed, any domain expert can alter the graph by adding/deleting/editing any node/edge. Retraining may occur automatically, and the changes are persisted later in the model.

[0039] According to some embodiments, correlation may be performed as part of EDA to form a multi-layer structure for the DBN. A Conditional Probability Query (CPQ) is performed on top of this model for inferencing.
This approach may be more robust than previous approaches, as the multi-layer structure is not based on all available metrics, but only on a set of relevant metrics arranged in a layered manner.

[0040] Based on the CPQ, a generated RCA path is built using custom logic to traverse a maximum probability path given a failure point.

[0041] Embodiments described herein may provide certain technological advantages. For example, some embodiments described herein may provide deeper insights on RCA with probable paths and weights assigned to each path for every issue. This may enable an easy-to-use and efficient visualization of the RCA path in a DAG.

[0042] Some embodiments may enable an improvement in Mean Time to Resolution (MTTR) by including all potential influences in the root cause path.

[0043] The ability to modify the DAG, such as by adding or removing edges, and the ability to persist the changes in the model enable the RCI system according to some embodiments to be highly flexible.

[0044] By quickly and accurately identifying the root cause path of a problem, an RCI system/method according to some embodiments may reduce revenue loss by providing a good customer experience during interactions with the network, such as when ordering and/or utilizing network services. This may further reduce the operational expense associated with the network.

[0045] Some embodiments described herein provide systems/methods that develop a network of KPIs/APIs/metrics which can be visualized in the form of a graph structure and that can help identify an RCI path based on historical patterns and relationships among metrics. Such systems/methods may help a system administrator to resolve an anomaly in a communication system by quickly and accurately identifying the root cause path of the anomaly.

[0046] Some embodiments use an ML-based Anomaly Detection (AD) system to detect anomalies in the system.
Once an anomaly has been detected, the systems/methods use a Dynamic Bayesian Network (DBN) based Machine Reasoning (MR) approach to identify the probable RCI path. In particular, systems/methods according to some embodiments execute the following steps to perform root cause path analysis.

[0047] First, the systems/methods capture raw data of metrics that are periodically generated by or observed in the communication system. The systems/methods may preprocess the raw data to put it into condition for analysis. With the processed data, the systems/methods train and execute an AD model to detect anomalies. The output of the AD model is validated and used as source data to train a DBN.

[0048] The systems/methods then infer an RCA path using the DBN model. The RCA path is visualized, and active learning is used to train/retrain the DBN.

[0049] The raw data obtained from the communication system may include numeric data of metrics which vary in range depending on the type of metric. For example, the range of response time data will differ from that of CPU or memory metric data. Preprocessing is used to clean the raw data so that a model can be learned from it.

[0050] The AD model may include an inter-quartile range, z-score, Isolation Forest or any other classification algorithm that can be used to detect anomalies from the data. The AD model may be re-trained periodically to avoid drift in the predicted output. The AD model may be saved in a file system for later use. For example, the model may be used at run time for detecting anomalies, in which case incidents may be classified by the AD model as anomalous or non-anomalous.

[0051] A Bayesian Network (BN) is defined by a network structure including a directed acyclic graph $G = (V, A)$ in which each node $v_i \in V$ corresponds to a random variable $X_i$, and a global probability distribution $X$ (with parameters $\Theta$), which can be factorized into smaller local probability distributions according to the arcs $a_{ij} \in A$ present in the graph.
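The inter-quartile range and z-score detectors named in paragraph [0050] can be sketched in a few lines of Python. This is an illustrative sketch only: the thresholds (`3.0` standard deviations, `1.5` times the IQR) are conventional defaults assumed here rather than values taken from the disclosure, and the function names are hypothetical. The output is a list of 0/1 flags of the kind that could serve as discrete input for the later Bayesian network steps.

```python
from statistics import mean, pstdev, quantiles


def zscore_flags(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds `threshold` (1 = anomalous)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant series has no outliers under this rule.
        return [0] * len(values)
    return [1 if abs(v - mu) / sigma > threshold else 0 for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (1 = anomalous)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [1 if v < low or v > high else 0 for v in values]
```

Note that on short windows a single extreme value inflates the standard deviation, so the z-score rule can miss an outlier that the IQR rule catches; this is one reason the two rules may be offered as alternatives.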
[0052] The main role of the network structure is to express the conditional independence relationships among the variables in the model through graphical separation, thus specifying the factorization of the global distribution:

$$P(X) = \prod_{i=1}^{N} P(X_i \mid \Pi_{X_i}; \Theta_{X_i}), \qquad \Pi_{X_i} = \{\text{parents of } X_i\},$$

where $X_1, \ldots, X_N$ are the random variables associated with the nodes of the graph and $\Theta$ denotes the parameters of the global distribution.

[0053] Each local distribution has its own parameter set $\Theta_{X_i}$, and $\bigcup_{X_i} \Theta_{X_i}$ is much smaller than $\Theta$ because many parameters are fixed by the fact that the variables to which they belong are independent.

[0054] An example of a directed acyclic graph (DAG) 200 is illustrated in Figure 2. In the DAG 200, metrics that represent potential influences of an anomaly, such as KPIs or APIs associated with a communication system, are represented as nodes 202. The potential influences represented by nodes may also be referred to as variables of the DAG 200. Probabilities that one variable affects another variable are represented as directed edges 204 between nodes. In Figure 2, the nodes are labeled A, S, E, O, R and T. The implication of the graph shown in Figure 2 according to the chain rule is that:

$$P(A, S, E, O, R, T) = \prod_{X \in \{A, S, E, O, R, T\}} P(X \mid \Pi_X),$$

where $\Pi_X$ denotes the parents of node $X$ in Figure 2.

[0055] The second component of a BN is the probability distribution $P(X)$. The choice should be such that the BN can be learned efficiently from data, is flexible (i.e., it can encode a reasonable variety of phenomena), and is easy to query to perform inference.

[0056] Model selection and estimation of BNs are collectively known as learning and are usually performed as a two-step process including structure learning and parameter learning. Structure learning involves learning the network structure from the data, while parameter learning involves learning the local distributions implied by the learned structure.

[0057] This workflow is Bayesian.
Given a data set $\mathcal{D}$, and denoting the network structure as $G$ and the parameters of the global distribution of $X$ as $\Theta$, we have:

$$P(\mathcal{M} \mid \mathcal{D}) = P(G, \Theta \mid \mathcal{D}) = P(G \mid \mathcal{D}) \cdot P(\Theta \mid G, \mathcal{D}),$$

where the first factor corresponds to structure learning and the second to parameter learning. Structure learning is performed in practice as:

$$P(G \mid \mathcal{D}) \propto P(G)\, P(\mathcal{D} \mid G) = P(G) \int P(\mathcal{D} \mid G, \Theta)\, P(\Theta \mid G)\, d\Theta.$$

[0058] Combined with the fact that the global distribution decomposes into local distributions that depend only on $X_i$ and its parents $\Pi_{X_i}$:

$$P(\mathcal{D} \mid G) = \prod_{i=1}^{N} \int P(X_i \mid \Pi_{X_i}, \Theta_{X_i})\, P(\Theta_{X_i} \mid \Pi_{X_i})\, d\Theta_{X_i}.$$

[0059] Once an estimate is obtained for $G$ from structure learning, the local distributions $X_i \mid \Pi_{X_i}$ can be identified, since the parents of each node are known. Their parameter sets $\Theta_{X_i}$ can be estimated independently from each other. Assuming $G$ is sparse, each $X_i \mid \Pi_{X_i}$ involves few variables and thus estimating its parameters is computationally simple.

[0060] This process is performed to obtain probability distribution tables for each variable/node in the system, where each variable is a potential influence represented by a node on a DAG 200.

[0061] Inference on DBNs may be performed by executing conditional probability (CP) or maximum a posteriori (MAP) queries on the DBN. The general idea of a CP query is that some evidence is known. That is, the values of some variables are known, and the nodes can be fixed accordingly. It is desired to investigate the probability of some event involving (a subset of) the other variables conditioned on the evidence available.

[0062] A BN may be static or dynamic. In a static network, the data is considered as-is, and no temporal aspect is considered.

[0063] In a DBN, temporal information is embedded into the network, which allows for an understanding of variable evolution over time. A DBN is learned in a similar manner to a static Bayesian network. In a communication network, there may be a lag before a change in one variable affects a dependent variable. A DBN may therefore produce more accurate results when analyzing a communication network.

[0064] An Exploratory Data Analysis (EDA) was performed over all variables to determine the lag among them by calculating different statistics, such as correlations and probabilities.
Based on this analysis, the variables were categorized into four stages (layers), where the first stage contained the bulk of the variables, which lead towards the subsequent layers of nodes. This layering information, along with the data, was fed to a Bayesian network constructed using the bnlearn library in R to compute the structure and the CPDs.

[0065] Discrete data is required for all variables. In some embodiments, binary data generated at a frequency of one data point per 10 or 15 minutes may be used.

[0066] A DBN may be constructed by blacklisting edges among a first layer of nodes and preventing backward edges. Thus, edges are allowed in the forward direction only.

[0067] A hill climbing greedy search may be used for structure learning that explores the space of the directed acyclic graphs by single-arc addition, removal, and reversals with random restarts to avoid local optima. Score caching, score decomposability and score equivalence may be used to reduce the number of duplicated tests.

[0068] A Bayesian Information Criterion (BIC) score may be used as the network score during the hill climbing search.

[0069] Maximum Likelihood Estimation (MLE) may be used for parameter estimation.

[0070] Root Cause Analysis (RCA), or root cause path analysis, is used to find a root cause path of influences leading towards the cause of the anomaly of a target variable. RCA may be performed using an inference engine, such as the inference engine provided by bnlearn. Here, execution starts from the target node and traverses through the parent nodes giving maximum probability for an event occurring, given the available evidence.

[0071] Active Learning may be performed on the network. In particular, the network can be edited by Subject Matter Experts (SMEs) later if they find that certain relations between variables do not make sense or need to be augmented.

[0072] Figure 3 illustrates a generalized MR approach 300 for performing RCI which uses AD output to learn a Bayesian Network.
Existing knowledge is applied to generate an RCI path. In particular, referring to Figure 3, anomaly data 308 collected from an AD model is provided to a model training system 302. The model training system 302 includes a structure learning system 304 and a conditional probability dependency (CPD) evaluation system 306. The anomaly data is used by the structure learning system 304 to learn the structure of a DBN. The structure of the DBN is then used for CPD evaluation. The results of the CPD evaluation are incorporated into an ML model/knowledge base 308 from which a DAG 350 is constructed.

[0073] To determine the root cause path of an anomaly 312, an inferencing engine 310 performs RCA by querying the ML model 308.

[0074] Figure 4 illustrates systems/methods 400 according to some embodiments for using MR to perform RCI. As shown therein, input data 402 provides raw data to a data extraction module 404 which preprocesses the raw data. The preprocessed data is provided to a machine learning (ML) system 406 which performs anomaly detection using an anomaly detection module 408. The detected anomaly data is provided to a machine reasoning (MR) system 410, which includes a structure learning module 412, a CPD evaluation module 414 and an inferencing engine 416. The MR system 410 performs RCA and generates a probable RCI path 418. The generated path may be used to obtain SME feedback, which can be used by an active learning module 420 to update the MR module 410.

[0075] The input data 402 may include network related data, such as Business Support System (BSS) metrics, Java logs, server metrics, database logs, core network data, network KPIs, server logs, etc. BSS metrics may include data relating to charging, billing, operations, etc.

[0076] The data extraction module 404 extracts data offline and preprocesses the data to prepare it for processing by the ML system 406.
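The CPD evaluation described for Figures 3 and 4 amounts to estimating, for each node, the probability of an anomaly given the states of its parents. A minimal Python sketch of a maximum likelihood estimate by counting is given below. It assumes each observation is a dictionary of 0/1 anomaly flags keyed by node name, and the function name `estimate_cpt` is hypothetical. The disclosure itself refers to the bnlearn library in R for this step, so this sketch illustrates the idea and is not that library's implementation.

```python
from collections import Counter


def estimate_cpt(rows, child, parents):
    """Maximum likelihood estimate of P(child = 1 | parent states).

    `rows` is an iterable of dicts mapping node name -> 0/1 flag.
    Returns a dict keyed by the tuple of parent states, in the order
    given by `parents`. Parent configurations never observed in the
    data are simply absent from the result.
    """
    totals, hits = Counter(), Counter()
    for row in rows:
        key = tuple(row[p] for p in parents)
        totals[key] += 1
        hits[key] += row[child]
    return {key: hits[key] / totals[key] for key in totals}
```

With eight observations in which node B is anomalous in three of the four intervals where its parent A is anomalous, and in one of the four where A is normal, `estimate_cpt(rows, "B", ["A"])` gives 0.75 for the parent state `(1,)` and 0.25 for `(0,)`.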
[0077] The anomaly detection module 408 of the ML system 406 uses machine learning techniques, such as statistical modelling based on Inter Quartile Range (IQR)/Z-scores, to identify anomalies in the input data.

[0078] The machine reasoning system 410 receives data regarding anomalies identified by the machine learning system 406. A Dynamic Bayesian Network is used to generate an RCI path. In particular, the MR module 410 includes three functional modules, namely, a structure learning module 412, a CPD evaluation module 414 and an inferencing engine 416.

[0079] In the structure learning module 412, a structure consisting of a DAG is learned from historical input data using Bayesian structure learning. That is, the structure learning module 412 identifies the identities of the nodes 202 and relationships between the nodes 202 (Figure 2) from the input data.

[0080] In the CPD module 414, the structure learned in the structure learning module 412 is used along with observations from the data to compute the probability for all the edges 204 associated with each node 202 (Figure 2).

[0081] The inferencing engine 416 takes the structure generated by the structure learning module 412 and the CPD values generated by the CPD module 414 and generates the probable RCI path 418 for the anomaly identified by the ML module 406.

[0082] The active learning module receives feedback from SMEs regarding the RCI path 418. Users can add/modify nodes and/or edges of the RCI path if they feel the path is wrong. Once a user modifies the RCI path, the MR algorithms in the MR module 410 can automatically learn the new path for future recommendations of RCI paths for the same type of problem.

[0083] Figure 5 illustrates an example of a DAG structure 500 that may be generated by the structure learning module 412 of Figure 4.
In Figure 5, each node 502 represents an API of a system, such as a business support system (BSS) for a communication system, that is potentially part of a root cause path of a detected anomaly. [0084] Figure 6 visually illustrates an RCA path generated for an anomaly with the node 502D, which represents the API api_notifyincomingevent. As shown in Figure 6, each node 502 is connected to another node by an edge 504 that has an associated probability. For example, node 502A, which represents the API api_getprepaidbalance, is connected by a directed edge 504AB to node 502B, which represents the API api_getfinancialaccountdetails. The directed edge 504AB has an associated probability of 0.59, meaning that the probability that an anomaly in node 502B was influenced by an anomaly in node 502A given the state of node 502B is 0.59. [0085] For finding a root cause path, systems/methods according to some embodiments start with a target node and check each parent of the target node to find the probability of failure of the target node given the parent node and given evidence (states of nodes). The systems/methods find the edge leading to the target node that has the highest probability. The parent node having the highest probability is then selected as the new target node, and this process is repeated until a required depth is reached or the end of the graph is reached. In the example shown in Figure 6, the required depth may be four nodes. However, the end of the graph is reached after three nodes, because node 502A has no parent nodes. [0086] Similarly, node 502B is connected to node 502C by a directed edge 504BC that has an associated probability of 0.73, and node 502C is connected to node 502D by a directed edge 504CD that has an associated probability of 0.77. 
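The parent-selection procedure of paragraph [0085] can be sketched as follows. This is a simplified illustration under stated assumptions, not the inferencing engine 416 itself: the edge probabilities are treated as fixed numbers rather than results of querying the DBN with evidence, the function name root_cause_path and the max_hops parameter (the number of upstream steps allowed) are hypothetical, and the parent labelled "hypothetical_api" with probability 0.20 is invented solely to show the highest-probability selection. The other three edges use the values given for Figure 6.

```python
def root_cause_path(parents, target, max_hops):
    """Walk upstream from target, following the most probable incoming edge.

    parents maps each node to a dict {parent_node: edge_probability}.
    The walk stops after max_hops steps or when a node has no parents.
    """
    path = [target]
    current = target
    for _ in range(max_hops):
        incoming = parents.get(current, {})
        if not incoming:
            break  # end of the graph: current node has no parent nodes
        best_parent = max(incoming, key=incoming.get)
        path.append(best_parent)
        current = best_parent  # the selected parent becomes the new target
    return path


# Edges 504CD, 504BC and 504AB from Figure 6, keyed by child node.
parents = {
    "502D": {"502C": 0.77, "hypothetical_api": 0.20},  # second entry is invented
    "502C": {"502B": 0.73},
    "502B": {"502A": 0.59},
}

path = root_cause_path(parents, target="502D", max_hops=4)
print(" <- ".join(path))
```

Because node 502A has no entry in parents, the walk ends there even though max_hops would allow one more step, which mirrors the behaviour described for Figure 6.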
The inferencing engine 416 of Figure 4 examines all of the nodes 502 and edges 504 of the structure generated by the structure learning module 412 and CPD evaluation module 414 to determine the most likely root cause path based on the associated probabilities. This likely root cause path may be analyzed by network operators to assist in correcting the identified anomaly. [0087] Figure 7A is a block diagram of an MR-based root cause path identification system 700. Various embodiments provide a knowledge base interface system 700 that includes a processor circuit 734, a communication interface 718 coupled to the processor circuit 734, and a memory 736 coupled to the processor circuit 734. The processor circuit 734 may be a single processor or may comprise a multi-processor system. In some embodiments, processing may be performed by multiple different systems that share processing power, such as in a distributed or cloud computing system. The memory 736 includes machine-readable computer program instructions that, when executed by the processor circuit, cause the processor circuit to perform some of the operations and/or implement the functions depicted or described herein. [0088] As shown, a root cause path identification system 700 includes a communication interface 718 (also referred to as a network interface) configured to provide communications with other devices. The knowledge base interface system 700 also includes a processor circuit 734 (also referred to as a processor) and a memory circuit 736 (also referred to as memory) coupled to the processor circuit 734. According to other embodiments, processor circuit 734 may be defined to include memory so that a separate memory circuit is not required. [0089] As discussed herein, operations of the root cause path identification system 700 may be performed by processing circuit 734 and/or communication interface 718. 
For example, the processing circuit 734 may control the communication interface 718 to transmit communications through the communication interface 718 to one or more other devices and/or to receive communications through the communication interface 718 from one or more other devices. Moreover, modules may be stored in memory 736, and these modules may provide instructions so that when instructions of a module are executed by processing circuit 734, processing circuit 734 performs respective operations (e.g., operations discussed herein with respect to example embodiments). [0090] Figure 7B illustrates various functional modules that may be stored in the memory 736 of the root cause path identification system 700. The modules may include a structure learning module 722 that implements the structure learning system, a CPD analysis module 724 that implements the CPD system 714, and an inferencing engine module 726 that implements the inferencing engine 716. Figure 8 illustrates operations of systems/methods according to some embodiments. In particular, a method of determining an RCI path for an anomaly in a communication network includes obtaining anomaly data describing the anomaly in the communication network (block 802), and generating a DAG describing relationships among potential influences of the anomaly (block 804). The DAG includes a plurality of nodes corresponding to potential influences of the anomaly and edges representing dependency relationships between respective ones of the nodes. The method further includes generating probability tables for each node in the DAG that describe probabilities of each node being an influence of an anomaly in an immediate downstream node in the DAG (block 806) and generating a dynamic Bayesian network, DBN, based on the DAG and the probability tables (block 808). The DBN is queried to identify a likely root cause path for the anomaly (block 810). 
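One simple way to picture the probability-table step of block 806 is a relative-frequency estimate over binary anomaly flags. The sketch below is an assumption-laden illustration, not the CPD evaluation described in this document: the estimator (a plain maximum-likelihood count), the function name estimate_cpt, the record layout and the eight sample records are all hypothetical, and the sketch conditions on the upstream node's state only.

```python
from collections import Counter


def estimate_cpt(records, child, parent):
    """Estimate P(child anomalous | parent state) by relative frequency.

    records is a list of dicts mapping node name -> 0 (normal) or 1 (anomalous).
    Returns {parent_state: probability}, or None where no records exist.
    """
    counts = Counter((r[parent], r[child]) for r in records)
    cpt = {}
    for parent_state in (0, 1):
        total = counts[(parent_state, 0)] + counts[(parent_state, 1)]
        cpt[parent_state] = counts[(parent_state, 1)] / total if total else None
    return cpt


# Hypothetical observations for one edge of the DAG (upstream A, downstream B).
records = [
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0},
    {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 1},
]

cpt_b_given_a = estimate_cpt(records, child="B", parent="A")
print(cpt_b_given_a)
```

In a full system, tables of this kind for every edge would be combined with the learned DAG to form the network that is queried in block 810.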
[0007] Figure 9 shows an example of a communication system 900 that may be analyzed with root cause path identification systems/methods according to some embodiments. [0008] In the example, the communication system 900 includes a telecommunication network 902 that includes an access network 904, such as a radio access network (RAN), and a core network 906, which includes one or more core network nodes 908. The access network 904 includes one or more access network nodes, such as network nodes 910a and 910b (one or more of which may be generally referred to as network nodes 910), or any other similar 3rd Generation Partnership Project (3GPP) access node or non-3GPP access point. The network nodes 910 facilitate direct or indirect connection of user equipment (UE), such as by connecting UEs 912a, 912b, 912c, and 912d (one or more of which may be generally referred to as UEs 912) to the core network 906 over one or more wireless connections. [0009] Example wireless communications over a wireless connection include transmitting and/or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors. Moreover, in different embodiments, the communication system 900 may include any number of wired or wireless networks, network nodes, UEs, and/or any other components or systems that may facilitate or participate in the communication of data and/or signals whether via wired or wireless connections. The communication system 900 may include and/or interface with any type of communication, telecommunication, data, cellular, radio network, and/or other similar type of system. [0010] The UEs 912 may be any of a wide variety of communication devices, including wireless devices arranged, configured, and/or operable to communicate wirelessly with the network nodes 910 and other communication devices. 
Similarly, the network nodes 910 are arranged, capable, configured, and/or operable to communicate directly or indirectly with the UEs 912 and/or with other network nodes or equipment in the telecommunication network 902 to enable and/or provide network access, such as wireless network access, and/or to perform other functions, such as administration in the telecommunication network 902. [0011] In the depicted example, the core network 906 connects the network nodes 910 to one or more hosts, such as host 916. These connections may be direct or indirect via one or more intermediary networks or devices. In other examples, network nodes may be directly coupled to hosts. The core network 906 includes one or more core network nodes (e.g., core network node 908) that are structured with hardware and software components. Features of these components may be substantially similar to those described with respect to the UEs, network nodes, and/or hosts, such that the descriptions thereof are generally applicable to the corresponding components of the core network node 908. Example core network nodes include functions of one or more of a Mobile Switching Center (MSC), Mobility Management Entity (MME), Home Subscriber Server (HSS), Access and Mobility Management Function (AMF), Session Management Function (SMF), Authentication Server Function (AUSF), Subscription Identifier De-concealing function (SIDF), Unified Data Management (UDM), Security Edge Protection Proxy (SEPP), Network Exposure Function (NEF), and/or a User Plane Function (UPF). [0012] The host 916 may be under the ownership or control of a service provider other than an operator or provider of the access network 904 and/or the telecommunication network 902, and may be operated by the service provider or on behalf of the service provider. The host 916 may host a variety of applications to provide one or more services. 
Examples of such applications include live and pre-recorded audio/video content, data collection services such as retrieving and compiling data on various ambient conditions detected by a plurality of UEs, analytics functionality, social media, functions for controlling or otherwise interacting with remote devices, functions for an alarm and surveillance center, or any other such function performed by a server. [0013] As a whole, the communication system 900 of Figure 9 enables connectivity between the UEs, network nodes, and hosts. In that sense, the communication system may be configured to operate according to predefined rules or procedures, such as specific standards that include, but are not limited to: Global System for Mobile Communications (GSM); Universal Mobile Telecommunications System (UMTS); Long Term Evolution (LTE), and/or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standard (e.g., 6G); wireless local area network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and/or any other appropriate wireless communication standard, such as the Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC), ZigBee, LiFi, and/or any low-power wide-area network (LPWAN) standards such as LoRa and Sigfox. [0014] In some examples, the telecommunication network 902 is a cellular network that implements 3GPP standardized features. Accordingly, the telecommunications network 902 may support network slicing to provide different logical networks to different devices that are connected to the telecommunication network 902. For example, the telecommunications network 902 may provide Ultra Reliable Low Latency Communication (URLLC) services to some UEs, while providing Enhanced Mobile Broadband (eMBB) services to other UEs, and/or Massive Machine Type Communication (mMTC)/Massive IoT services to yet further UEs. 
[0015] In some examples, the UEs 912 are configured to transmit and/or receive information without direct human interaction. For instance, a UE may be designed to transmit information to the access network 904 on a predetermined schedule, when triggered by an internal or external event, or in response to requests from the access network 904. Additionally, a UE may be configured for operating in single- or multi-RAT or multi-standard mode. For example, a UE may operate with any one or combination of Wi-Fi, NR (New Radio) and LTE, i.e. being configured for multi-radio dual connectivity (MR-DC), such as E-UTRAN (Evolved-UMTS Terrestrial Radio Access Network) New Radio – Dual Connectivity (EN-DC). [0016] In the example, the hub 914 communicates with the access network 904 to facilitate indirect communication between one or more UEs (e.g., UE 912c and/or 912d) and network nodes (e.g., network node 910b). In some examples, the hub 914 may be a controller, router, content source and analytics, or any of the other communication devices described herein regarding UEs. For example, the hub 914 may be a broadband router enabling access to the core network 906 for the UEs. As another example, the hub 914 may be a controller that sends commands or instructions to one or more actuators in the UEs. Commands or instructions may be received from the UEs, network nodes 910, or by executable code, script, process, or other instructions in the hub 914. As another example, the hub 914 may be a data collector that acts as temporary storage for UE data and, in some embodiments, may perform analysis or other processing of the data. As another example, the hub 914 may be a content source. 
For example, for a UE that is a VR headset, display, loudspeaker or other media delivery device, the hub 914 may retrieve VR assets, video, audio, or other media or data related to sensory information via a network node, which the hub 914 then provides to the UE either directly, after performing local processing, and/or after adding additional local content. In still another example, the hub 914 acts as a proxy server or orchestrator for the UEs, in particular if one or more of the UEs are low energy IoT devices. [0017] The hub 914 may have a constant/persistent or intermittent connection to the network node 910b. The hub 914 may also allow for a different communication scheme and/or schedule between the hub 914 and UEs (e.g., UE 912c and/or 912d), and between the hub 914 and the core network 906. In other examples, the hub 914 is connected to the core network 906 and/or one or more UEs via a wired connection. Moreover, the hub 914 may be configured to connect to an M2M service provider over the access network 904 and/or to another UE over a direct connection. In some scenarios, UEs may establish a wireless connection with the network nodes 910 while still connected via the hub 914 via a wired or wireless connection. In some embodiments, the hub 914 may be a dedicated hub – that is, a hub whose primary function is to route communications to/from the UEs from/to the network node 910b. In other embodiments, the hub 914 may be a non-dedicated hub – that is, a device which is capable of operating to route communications between the UEs and network node 910b, but which is additionally capable of operating as a communication start and/or end point for certain data channels. [0065] Although the computing devices described herein (e.g., wireless devices, network nodes, hosts) may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. 
It is to be understood that these computing devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the network node, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, computing devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware. [0066] In certain embodiments, some or all of the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be a computer program product in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. 
In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the computing device, but are enjoyed by the computing device as a whole, and/or by end users and a wireless network generally.