

Title:
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR GENERATING A MACHINE LEARNING MODEL BASED ON ANOMALY NODES OF A GRAPH
Document Type and Number:
WIPO Patent Application WO/2024/081350
Kind Code:
A1
Abstract:
Provided are systems that include at least one processor to receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes, randomly sample a node to provide a set of randomly sampled nodes, generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes, and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space. Methods and computer program products are also disclosed.

Inventors:
CHEN YUZHONG (US)
WU YUHANG (US)
PAN MENGHAI (US)
DAS MAHASHWETA (US)
YANG HAO (US)
Application Number:
PCT/US2023/035007
Publication Date:
April 18, 2024
Filing Date:
October 12, 2023
Assignee:
VISA INTERNATIONAL SERVICE ASSOCIATION (US)
International Classes:
G06N3/08; G06N3/02; G06N5/04; G06N20/00; G06F18/2323; G06N3/04
Attorney, Agent or Firm:
PREPELKA, Nathan, J. et al. (US)
Claims:
WHAT IS CLAIMED IS:

1. A system, comprising: at least one processor programmed or configured to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

2. The system of claim 1, wherein, when generating the label for each new node, the at least one processor is programmed or configured to: calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

3. The system of claim 2, wherein, when calculating the coefficient for the first new node of the plurality of new nodes, the at least one processor is programmed or configured to: calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculate the coefficient for the first new node based on the result of the sigmoid function.

4. The system of claim 1, wherein, when randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, the at least one processor is programmed or configured to: for each labeled anomaly node of the set of labeled anomaly nodes: sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

5. The system of claim 1, wherein the machine learning model is a graph neural network (GNN) machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly.

6. The system of claim 1, wherein, when training the machine learning model, the at least one processor is programmed or configured to: train the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

7. The system of claim 1, wherein the at least one processor is further programmed or configured to: calculate the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

8. A computer-implemented method, comprising: receiving, with at least one processor, a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sampling, with at least one processor, a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generating, with at least one processor, a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein generating the plurality of new nodes comprises: generating a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combining, with at least one processor, the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and training, with at least one processor, a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

9. The computer-implemented method of claim 8, wherein generating the label for each new node comprises: calculating a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiplying a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

10. The computer-implemented method of claim 9, wherein calculating the coefficient for the first new node of the plurality of new nodes comprises: calculating a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculating the coefficient for the first new node based on the result of the sigmoid function.

11. The computer-implemented method of claim 8, wherein randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes comprises: for each labeled anomaly node of the set of labeled anomaly nodes: sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

12. The computer-implemented method of claim 8, wherein the machine learning model is a graph neural network (GNN) machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly.

13. The computer-implemented method of claim 8, wherein training the machine learning model comprises: training the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

14. The computer-implemented method of claim 8, further comprising: calculating the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculating the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

15. A computer program product comprising at least one non-transitory computer-readable medium comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

16. The computer program product of claim 15, wherein, the one or more instructions that cause the at least one processor to generate the label for each new node, cause the at least one processor to: calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

17. The computer program product of claim 16, wherein, the one or more instructions that cause the at least one processor to calculate the coefficient for the first new node of the plurality of new nodes, cause the at least one processor to: calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculate the coefficient for the first new node based on the result of the sigmoid function.

18. The computer program product of claim 15, wherein, the one or more instructions that cause the at least one processor to randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, cause the at least one processor to: for each labeled anomaly node of the set of labeled anomaly nodes: sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

19. The computer program product of claim 15, wherein, the one or more instructions that cause the at least one processor to train the machine learning model, cause the at least one processor to: train the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

20. The computer program product of claim 15, wherein the one or more instructions further cause the at least one processor to: calculate the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.
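As a concrete reading of claims 1-3, the node-synthesis step can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the patented implementation: the pairing function inside the sigmoid (a dot product), the interpolation used to form each new node, and the base anomaly score of 1.0 are all assumptions; the claims specify only that a coefficient is calculated from a sigmoid over the two embeddings (claim 3) and that each new node's score is a predetermined score multiplied by that coefficient (claim 2).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def synthesize_nodes(labeled_anomalies, candidates, base_score=1.0):
    """For each labeled anomaly embedding, randomly sample a candidate
    embedding (an unlabeled-anomaly or normal node), generate a new node
    from the pair, and score how closely it resembles an anomaly."""
    new_nodes, new_scores = [], []
    for a in labeled_anomalies:
        s = candidates[rng.integers(len(candidates))]  # random sample per anomaly
        # Coefficient from a sigmoid over the two embeddings (claim 3);
        # the dot-product pairing is an illustrative assumption.
        coeff = sigmoid(np.dot(a, s))
        # Interpolating the pair is an assumed generation rule.
        new_nodes.append(coeff * a + (1.0 - coeff) * s)
        new_scores.append(base_score * coeff)  # claim 2: predetermined score x coefficient
    return np.array(new_nodes), np.array(new_scores)

# Toy 4-dimensional embeddings.
labeled = rng.normal(size=(3, 4))   # labeled anomaly nodes
pool = rng.normal(size=(10, 4))     # unlabeled anomalies + normal nodes
nodes, scores = synthesize_nodes(labeled, pool)

# Combine new nodes with the labeled anomalies (claim 1) and aggregate an
# anomaly center in the embedding space as the mean embedding (claim 7).
combined = np.vstack([labeled, nodes])
anomaly_center = combined.mean(axis=0)
```

The mean used for `anomaly_center` is one plausible "aggregation of embeddings"; the claims do not commit to a specific aggregation function.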

Description:
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR GENERATING A MACHINE LEARNING MODEL BASED ON ANOMALY NODES OF A GRAPH

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to United States Provisional Patent Application No. 63/415,442, filed on October 12, 2022, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

1. Field

[0002] This disclosure relates generally to graph neural networks and associated graph data and, in some particular embodiments or aspects, to methods, systems, and computer program products for generating a machine learning model based on anomaly nodes of a graph.

2. Technical Considerations

[0003] Some machine learning models, such as neural networks (e.g., convolutional neural networks), may receive an input dataset including data points for training. Each data point in the training dataset may have a different effect (e.g., an influence) on the neural network that results from training. In some instances, input datasets designed for neural networks may be independent and identically distributed, and such datasets may be used to determine the effect of each individual data point of the input dataset.

[0004] Graph neural networks (GNNs) are designed to receive graph data (e.g., graph data representing graphs), which may include nodes and edges. A GNN may include graph embeddings (e.g., embeddings of node data regarding a graph, embeddings of edge data regarding a graph, etc.) that provide low-dimensional feature vector representations of the nodes of a graph such that some property of the graph is preserved. A GNN may be used to determine relationships (e.g., hidden relationships) among entities.
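To make the embeddings described above concrete, a single message-passing layer can map each node's raw features to a low-dimensional vector. The layer below (mean aggregation over neighbors with self-loops, followed by a linear map and a ReLU) is one common illustrative GNN layer form; it is an assumption for exposition, not the specific architecture of this disclosure.

```python
import numpy as np

def gnn_layer(adj, features, weight):
    """One message-passing layer: each node averages the features of its
    neighborhood (including itself) and applies a learned linear map with
    a ReLU, yielding a low-dimensional embedding per node."""
    # Add self-loops so a node's own features contribute to its embedding.
    a_hat = adj + np.eye(adj.shape[0])
    # Row-normalize the adjacency: mean over each node's neighborhood.
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU nonlinearity

# 4-node toy graph (a path 0-1-2-3), 5-d input features, 2-d embeddings.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 5))  # input node features
w = rng.normal(size=(5, 2))  # learned projection (random here for illustration)
emb = gnn_layer(adj, x, w)   # shape (4, 2): one 2-d embedding per node
```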

[0005] However, datasets of graph data for training machine learning models, including GNNs, may lack a sufficient number of labeled and/or unlabeled examples to properly train the machine learning models to make efficient and accurate determinations.

SUMMARY

[0006] Accordingly, it is an object of the presently disclosed subject matter to provide methods, systems, and computer program products for generating a machine learning model based on anomaly nodes of a graph that overcome some or all of the deficiencies identified above.

[0007] According to some non-limiting embodiments or aspects, a system for generating a machine learning model based on anomaly nodes of a graph, comprises at least one processor programmed or configured to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0008] According to some non-limiting embodiments or aspects, a computer-implemented method for generating a machine learning model based on anomaly nodes of a graph, comprises receiving, with at least one processor, a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sampling, with at least one processor, a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generating, with at least one processor, a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein generating the plurality of new nodes comprises: generating a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combining, with at least one processor, the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and training, with at least one processor, a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0009] According to some non-limiting embodiments or aspects, a computer program product for generating a machine learning model based on anomaly nodes of a graph comprises at least one non-transitory computer-readable medium comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.
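The training step summarized above is driven by two distances in the embedding space: the distance of each labeled anomaly embedding to the center of the combined anomaly set, and its distance to the center of the normal nodes. The sketch below turns those two distances into a loss. The hinge/margin form (pull anomalies toward their center, keep them at least `margin` away from the normal center) is an assumption for illustration; the summary specifies only the distances themselves, not how they are combined.

```python
import numpy as np

def center_loss(anomaly_emb, anomaly_center, normal_center, margin=1.0):
    """Illustrative objective over the two distances described in the
    summary: minimize each labeled anomaly embedding's distance to the
    anomaly center while penalizing embeddings that fall within `margin`
    of the normal-node center. The margin form is an assumed combination."""
    d_anom = np.linalg.norm(anomaly_emb - anomaly_center, axis=1)
    d_norm = np.linalg.norm(anomaly_emb - normal_center, axis=1)
    return np.mean(d_anom + np.maximum(0.0, margin - d_norm))

rng = np.random.default_rng(2)
emb = rng.normal(loc=2.0, size=(6, 3))  # embeddings of the combined anomaly set
c_anom = emb.mean(axis=0)               # aggregated anomaly center
c_norm = np.zeros(3)                    # aggregated normal-node center
loss = center_loss(emb, c_anom, c_norm)
```

In a full training loop this scalar would be backpropagated through the GNN producing the embeddings; here the embeddings are random placeholders.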

[0010] Further embodiments or aspects are set forth in the following numbered clauses:

[0011] Clause 1: A system, comprising: at least one processor programmed or configured to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0012] Clause 2: The system of clause 1, wherein, when generating the label for each new node, the at least one processor is programmed or configured to: calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

[0013] Clause 3: The system of clause 1 or 2, wherein, when calculating the coefficient for the first new node of the plurality of new nodes, the at least one processor is programmed or configured to: calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculate the coefficient for the first new node based on the result of the sigmoid function.

[0014] Clause 4: The system of any of clauses 1-3, wherein, when randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, the at least one processor is programmed or configured to: for each labeled anomaly node of the set of labeled anomaly nodes: sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

[0015] Clause 5: The system of any of clauses 1-4, wherein the machine learning model is a graph neural network (GNN) machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly.

[0016] Clause 6: The system of any of clauses 1-5, wherein, when training the machine learning model, the at least one processor is programmed or configured to: train the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

[0017] Clause 7: The system of any of clauses 1-6, wherein the at least one processor is further programmed or configured to: calculate the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

[0018] Clause 8: A computer-implemented method, comprising: receiving, with at least one processor, a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sampling, with at least one processor, a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generating, with at least one processor, a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein generating the plurality of new nodes comprises: generating a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combining, with at least one processor, the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and training, with at least one processor, a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0019] Clause 9: The computer-implemented method of clause 8, wherein generating the label for each new node comprises: calculating a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiplying a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

[0020] Clause 10: The computer-implemented method of clause 8 or 9, wherein calculating the coefficient for the first new node of the plurality of new nodes comprises: calculating a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculating the coefficient for the first new node based on the result of the sigmoid function.

[0021] Clause 11: The computer-implemented method of any of clauses 8-10, wherein randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes comprises: for each labeled anomaly node of the set of labeled anomaly nodes: sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

[0022] Clause 12: The computer-implemented method of any of clauses 8-11, wherein the machine learning model is a graph neural network (GNN) machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly.

[0023] Clause 13: The computer-implemented method of any of clauses 8-12, wherein training the machine learning model comprises: training the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

[0024] Clause 14: The computer-implemented method of any of clauses 8-13, further comprising: calculating the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculating the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

[0025] Clause 15: A computer program product comprising at least one non-transitory computer-readable medium comprising one or more instructions that, when executed by at least one processor, cause the at least one processor to: receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes; for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes; generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, wherein, when generating the plurality of new nodes, the at least one processor is programmed or configured to: generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node; combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes; and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0026] Clause 16: The computer program product of clause 15, wherein, the one or more instructions that cause the at least one processor to generate the label for each new node, cause the at least one processor to: calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes; and multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

[0027] Clause 17: The computer program product of clause 15 or 16, wherein the one or more instructions that cause the at least one processor to calculate the coefficient for the first new node of the plurality of new nodes, cause the at least one processor to: calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs; and calculate the coefficient for the first new node based on the result of the sigmoid function.

[0028] Clause 18: The computer program product of any of clauses 15-17, wherein the one or more instructions that cause the at least one processor to randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, cause the at least one processor to: for each labeled anomaly node of the set of labeled anomaly nodes: sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes.

[0029] Clause 19: The computer program product of any of clauses 15-18, wherein the one or more instructions that cause the at least one processor to train the machine learning model, cause the at least one processor to: train the machine learning model based on: a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the first distance comprises: a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, wherein the second distance comprises: a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

[0030] Clause 20: The computer program product of any of clauses 15-19, wherein the one or more instructions further cause the at least one processor to: calculate the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes; and calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

[0031] These and other features and characteristics of the presently disclosed subject matter, as well as the methods of operation and functions of the related elements of structures and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosed subject matter. As used in the specification and the claims, the singular form of “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.

BRIEF DESCRIPTION OF THE DRAWINGS

[0032] Additional advantages and details of the disclosed subject matter are explained in greater detail below with reference to the exemplary embodiments or aspects that are illustrated in the accompanying figures, in which:

[0033] FIG. 1 is a diagram of a non-limiting embodiment or aspect of an environment in which methods, systems, and/or computer program products, described herein, may be implemented according to the principles of the presently disclosed subject matter;

[0034] FIG. 2 is a diagram of a non-limiting embodiment or aspect of components of one or more devices of FIG. 1;

[0035] FIG. 3 is a flowchart of a non-limiting embodiment or aspect of a process for generating a machine learning model based on anomaly nodes of a graph; and

[0036] FIGS. 4A-4F are diagrams of non-limiting embodiments or aspects of an implementation of a process for generating a machine learning model based on anomaly nodes of a graph.

DESCRIPTION

[0037] For purposes of the description hereinafter, the terms “end,” “upper,” “lower,” “right,” “left,” “vertical,” “horizontal,” “top,” “bottom,” “lateral,” “longitudinal,” and derivatives thereof shall relate to the disclosed subject matter as it is oriented in the drawing figures. However, it is to be understood that the disclosed subject matter may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary embodiments or aspects of the disclosed subject matter. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting unless otherwise indicated.

[0038] No aspect, component, element, structure, act, step, function, instruction, and/or the like used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more” and “at least one.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like) and may be used interchangeably with “one or more” or “at least one.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise.

[0039] As used herein, the terms “communication” and “communicate” may refer to the reception, receipt, transmission, transfer, provision, and/or the like of information (e.g., data, signals, messages, instructions, commands, and/or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and/or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit (e.g., a third unit located between the first unit and the second unit) processes information received from the first unit and communicates the processed information to the second unit. In some non-limiting embodiments or aspects, a message may refer to a network packet (e.g., a data packet and/or the like) that includes data. It will be appreciated that numerous other arrangements are possible.

[0040] As used herein, the terms “issuer institution,” “portable financial device issuer,” “issuer,” or “issuer bank” may refer to one or more entities that provide accounts to customers for conducting transactions (e.g., payment transactions), such as initiating credit and/or debit payments.
For example, an issuer institution may provide an account identifier, such as a primary account number (PAN), to a customer that uniquely identifies one or more accounts associated with that customer. The account identifier may be embodied on a portable financial device, such as a physical financial instrument, e.g., a payment card, and/or may be electronic and used for electronic payments. The terms “issuer institution” and “issuer institution system” may also refer to one or more computer systems operated by or on behalf of an issuer institution, such as a server computer executing one or more software applications. For example, an issuer institution system may include one or more authorization servers for authorizing a transaction.

[0041] As used herein, the term “account identifier” may include one or more types of identifiers associated with a user account (e.g., a PAN, a card number, a payment card number, a payment token, and/or the like). In some non-limiting embodiments or aspects, an issuer institution may provide an account identifier (e.g., a PAN, a payment token, and/or the like) to a user that uniquely identifies one or more accounts associated with that user. The account identifier may be embodied on a physical financial instrument (e.g., a portable financial instrument, a payment card, a credit card, a debit card, and/or the like) and/or may be electronic information communicated to the user that the user may use for electronic payments. In some non-limiting embodiments or aspects, the account identifier may be an original account identifier, where the original account identifier was provided to a user at the creation of the account associated with the account identifier. In some non-limiting embodiments or aspects, the account identifier may be an account identifier (e.g., a supplemental account identifier) that is provided to a user after the original account identifier was provided to the user. For example, if the original account identifier is forgotten, stolen, and/or the like, a supplemental account identifier may be provided to the user. In some non-limiting embodiments or aspects, an account identifier may be directly or indirectly associated with an issuer institution such that an account identifier may be a payment token that maps to a PAN or other type of identifier. Account identifiers may be alphanumeric, any combination of characters and/or symbols, and/or the like. An issuer institution may be associated with a bank identification number (BIN) that uniquely identifies the issuer institution.

[0042] As used herein, the terms “payment token” or “token” may refer to an identifier that is used as a substitute or replacement identifier for an account identifier, such as a PAN. Tokens may be associated with a PAN or other account identifiers in one or more data structures (e.g., one or more databases and/or the like) such that they can be used to conduct a transaction (e.g., a payment transaction) without directly using the account identifier, such as a PAN. In some examples, an account identifier, such as a PAN, may be associated with a plurality of tokens for different individuals, different uses, and/or different purposes. For example, a payment token may include a series of numeric and/or alphanumeric characters that may be used as a substitute for an original account identifier. For example, a payment token “4900 0000 0000 0001” may be used in place of a PAN “4147 0900 0000 1234.” In some non-limiting embodiments or aspects, a payment token may be “format preserving” and may have a numeric format that conforms to the account identifiers used in existing payment processing networks (e.g., ISO 8583 financial transaction message format). In some non-limiting embodiments or aspects, a payment token may be used in place of a PAN to initiate, authorize, settle, or resolve a payment transaction or represent the original credential in other systems where the original credential would typically be provided. In some non-limiting embodiments or aspects, a token value may be generated such that the recovery of the original PAN or other account identifier from the token value may not be computationally derived (e.g., with a one-way hash or other cryptographic function). Further, in some non-limiting embodiments or aspects, the token format may be configured to allow the entity receiving the payment token to identify it as a payment token and recognize the entity that issued the token.

[0043] As used herein, the term “token requestor” may refer to an entity that is seeking to implement tokenization according to embodiments or aspects of the presently disclosed subject matter. For example, the token requestor may initiate a request that a PAN be tokenized by submitting a token request message to a token service provider. Additionally or alternatively, a token requestor may no longer need to store a PAN associated with a token once the requestor has received the payment token in response to a token request message. In some non-limiting embodiments or aspects, the requestor may be an application, a device, a process, or a system that is configured to perform actions associated with tokens. For example, a requestor may request registration with a network token system, request token generation, token activation, token de-activation, token exchange, other token lifecycle management related processes, and/or any other token related processes. In some non-limiting embodiments or aspects, a requestor may interface with a network token system through any suitable communication network and/or protocol (e.g., using HTTPS, SOAP, and/or an XML interface among others). For example, a token requestor may include card-on-file merchants, acquirers, acquirer processors, payment gateways acting on behalf of merchants, payment enablers (e.g., original equipment manufacturers, mobile network operators, and/or the like), digital wallet providers, issuers, third-party wallet providers, payment processing networks, and/or the like. In some non-limiting embodiments or aspects, a token requestor may request tokens for multiple domains and/or channels. Additionally or alternatively, a token requestor may be registered and identified uniquely by the token service provider within the tokenization ecosystem. 
For example, during token requestor registration, the token service provider may formally process a token requestor's application to participate in the token service system. In some non-limiting embodiments or aspects, the token service provider may collect information pertaining to the nature of the requestor and relevant use of tokens to validate and formally approve the token requestor and establish appropriate domain restriction controls. Additionally or alternatively, successfully registered token requestors may be assigned a token requestor identifier that may also be entered and maintained within the token vault. In some non-limiting embodiments or aspects, token requestor identifiers may be revoked and/or token requestors may be assigned new token requestor identifiers. In some non-limiting embodiments or aspects, this information may be subject to reporting and audit by the token service provider.

[0044] As used herein, the term “token service provider” may refer to an entity including one or more server computers in a token service system that generates, processes, and maintains payment tokens. For example, the token service provider may include or be in communication with a token vault where the generated tokens are stored. Additionally or alternatively, the token vault may maintain one-to-one mapping between a token and a PAN represented by the token. In some non-limiting embodiments or aspects, the token service provider may have the ability to set aside licensed BINs as token BINs to issue tokens for the PANs that may be submitted to the token service provider. In some non-limiting embodiments or aspects, various entities of a tokenization ecosystem may assume the roles of the token service provider. For example, payment networks and issuers or their agents may become the token service provider by implementing the token services according to non-limiting embodiments or aspects of the presently disclosed subject matter. Additionally or alternatively, a token service provider may provide reports or data output to reporting tools regarding approved, pending, or declined token requests, including any assigned token requestor ID. The token service provider may provide data output related to token-based transactions to reporting tools and applications and present the token and/or PAN as appropriate in the reporting output. In some non-limiting embodiments or aspects, the EMVCo standards organization may publish specifications defining how tokenized systems may operate. For example, such specifications may be informative, but they are not intended to be limiting upon any of the presently disclosed subject matter.

[0045] As used herein, the term “token vault” may refer to a repository that maintains established token-to-PAN mappings. For example, the token vault may also maintain other attributes of the token requestor that may be determined at the time of registration and/or that may be used by the token service provider to apply domain restrictions or other controls during transaction processing. In some non-limiting embodiments or aspects, the token vault may be a part of a token service system. For example, the token vault may be provided as a part of the token service provider. Additionally or alternatively, the token vault may be a remote repository accessible by the token service provider. In some non-limiting embodiments or aspects, token vaults, due to the sensitive nature of the data mappings that are stored and managed therein, may be protected by strong underlying physical and logical security. Additionally or alternatively, a token vault may be operated by any suitable entity, including a payment network, an issuer, clearing houses, other financial institutions, transaction service providers, and/or the like.

[0046] As used herein, the term “merchant” may refer to one or more entities (e.g., operators of retail businesses that provide goods and/or services, and/or access to goods and/or services, to a user (e.g., a customer, a consumer, a customer of the merchant, and/or the like) based on a transaction (e.g., a payment transaction)). As used herein, the term “merchant system” may refer to one or more computer systems operated by or on behalf of a merchant, such as a server computer executing one or more software applications. As used herein, the term “product” may refer to one or more goods and/or services offered by a merchant.

[0047] As used herein, the term “point-of-sale (POS) device” may refer to one or more devices, which may be used by a merchant to initiate transactions (e.g., a payment transaction), engage in transactions, and/or process transactions. For example, a POS device may include one or more computers, peripheral devices, card readers, near-field communication (NFC) receivers, radio frequency identification (RFID) receivers, and/or other contactless transceivers or receivers, contact-based receivers, payment terminals, computers, servers, input devices, and/or the like.

[0048] As used herein, the term “point-of-sale (POS) system” may refer to one or more computers and/or peripheral devices used by a merchant to conduct a transaction. For example, a POS system may include one or more POS devices and/or other like devices that may be used to conduct a payment transaction. A POS system (e.g., a merchant POS system) may also include one or more server computers programmed or configured to process online payment transactions through webpages, mobile applications, and/or the like.

[0049] As used herein, the term “transaction service provider” may refer to an entity that receives transaction authorization requests from merchants or other entities and provides guarantees of payment, in some cases through an agreement between the transaction service provider and the issuer institution. In some non-limiting embodiments or aspects, a transaction service provider may include a credit card company, a debit card company, and/or the like. As used herein, the term “transaction service provider system” may also refer to one or more computer systems operated by or on behalf of a transaction service provider, such as a transaction processing server executing one or more software applications. A transaction processing server may include one or more processors and, in some non-limiting embodiments or aspects, may be operated by or on behalf of a transaction service provider.

[0050] As used herein, the term “acquirer” may refer to an entity licensed by the transaction service provider and approved by the transaction service provider to originate transactions (e.g., payment transactions) using a portable financial device associated with the transaction service provider. As used herein, the term “acquirer system” may also refer to one or more computer systems, computer devices, and/or the like operated by or on behalf of an acquirer. The transactions may include payment transactions (e.g., purchases, original credit transactions (OCTs), account funding transactions (AFTs), and/or the like). In some non-limiting embodiments or aspects, the acquirer may be authorized by the transaction service provider to assign merchants or service providers to originate transactions using a portable financial device of the transaction service provider. The acquirer may contract with payment facilitators to enable the payment facilitators to sponsor merchants. The acquirer may monitor compliance of the payment facilitators in accordance with regulations of the transaction service provider. The acquirer may conduct due diligence of the payment facilitators and ensure that proper due diligence occurs before signing a sponsored merchant. The acquirer may be liable for all transaction service provider programs that the acquirer operates or sponsors. The acquirer may be responsible for the acts of the acquirer's payment facilitators, merchants that are sponsored by an acquirer's payment facilitators, and/or the like. In some non-limiting embodiments or aspects, an acquirer may be a financial institution, such as a bank.

[0051] As used herein, the terms “electronic wallet,” “electronic wallet mobile application,” and “digital wallet” may refer to one or more electronic devices and/or one or more software applications configured to initiate and/or conduct transactions (e.g., payment transactions, electronic payment transactions, and/or the like). For example, an electronic wallet may include a user device (e.g., a mobile device) executing an application program and server-side software and/or databases for maintaining and providing transaction data to the user device. As used herein, the term “electronic wallet provider” may include an entity that provides and/or maintains an electronic wallet and/or an electronic wallet mobile application for a user (e.g., a customer). Examples of an electronic wallet provider include, but are not limited to, Google Pay®, Android Pay®, Apple Pay®, and Samsung Pay®. In some non-limiting examples, a financial institution (e.g., an issuer institution) may be an electronic wallet provider. As used herein, the term “electronic wallet provider system” may refer to one or more computer systems, computer devices, servers, groups of servers, and/or the like operated by or on behalf of an electronic wallet provider.

[0052] As used herein, the term “payment device” may refer to a payment card (e.g., a credit or debit card), a gift card, a smartcard, smart media, a payroll card, a healthcare card, a wristband, a machine-readable medium containing account information, a keychain device or fob, an RFID transponder, a retailer discount or loyalty card, a cellular phone, an electronic wallet mobile application, a personal digital assistant (PDA), a pager, a security card, a computer, an access card, a wireless terminal, a transponder, and/or the like. In some non-limiting embodiments or aspects, the portable financial device may include volatile or non-volatile memory to store information (e.g., an account identifier, a name of the account holder, and/or the like).

[0053] As used herein, the term “payment gateway” may refer to an entity and/or a payment processing system operated by or on behalf of such an entity (e.g., a merchant service provider, a payment service provider, a payment facilitator, a payment facilitator that contracts with an acquirer, a payment aggregator, and/or the like), which provides payment services (e.g., transaction service provider payment services, payment processing services, and/or the like) to one or more merchants. The payment services may be associated with the use of portable financial devices managed by a transaction service provider. As used herein, the term “payment gateway system” may refer to one or more computer systems, computer devices, servers, groups of servers, and/or the like operated by or on behalf of a payment gateway and/or to a payment gateway itself. As used herein, the term “payment gateway mobile application” may refer to one or more electronic devices and/or one or more software applications configured to provide payment services for transactions (e.g., payment transactions, electronic payment transactions, and/or the like).

[0054] As used herein, the terms “client” and “client device” may refer to one or more client-side devices or systems (e.g., remote from a transaction service provider) used to initiate or facilitate a transaction (e.g., a payment transaction). As an example, a “client device” may refer to one or more POS devices used by a merchant, one or more acquirer host computers used by an acquirer, one or more mobile devices used by a user, and/or the like. In some non-limiting embodiments or aspects, a client device may be an electronic device configured to communicate with one or more networks and initiate or facilitate transactions. For example, a client device may include one or more computers, portable computers, laptop computers, tablet computers, mobile devices, cellular phones, wearable devices (e.g., watches, glasses, lenses, clothing, and/or the like), PDAs, and/or the like. Moreover, a “client” may also refer to an entity (e.g., a merchant, an acquirer, and/or the like) that owns, utilizes, and/or operates a client device for initiating transactions (e.g., for initiating transactions with a transaction service provider).

[0055] As used herein, the term “computing device” may refer to one or more electronic devices that are configured to directly or indirectly communicate with or over one or more networks. A computing device may be a mobile device, a desktop computer, and/or any other like device. Furthermore, the term “computer” may refer to any computing device that includes the necessary components to receive, process, and output data, and normally includes a display, a processor, a memory, an input device, and a network interface.
As used herein, the term “server” may refer to or include one or more processors or computers, storage devices, or similar computer arrangements that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computers, e.g., servers, or other computerized devices, such as POS devices, directly or indirectly communicating in the network environment may constitute a “system,” such as a merchant's POS system.

[0056] The term “processor,” as used herein, may represent any type of processing unit, such as a single processor having one or more cores, one or more cores of one or more processors, multiple processors each having one or more cores, and/or other arrangements and combinations of processing units.

[0057] As used herein, the term “system” may refer to one or more computing devices or combinations of computing devices (e.g., processors, servers, client devices, software applications, components of such, and/or the like). Reference to “a device,” “a server,” “a processor,” and/or the like, as used herein, may refer to a previously-recited device, server, or processor that is recited as performing a previous step or function, a different server or processor, and/or a combination of servers and/or processors. For example, as used in the specification and the claims, a first server or a first processor that is recited as performing a first step or a first function may refer to the same or different server or the same or different processor recited as performing a second step or a second function.

[0058] Non-limiting embodiments or aspects of the disclosed subject matter are directed to methods, systems, and computer program products for generating a machine learning model based on anomaly nodes of a graph. In some non-limiting embodiments or aspects, a graph learning system may include at least one processor programmed or configured to receive a dataset comprising a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes, for each labeled anomaly node of the set of labeled anomaly nodes: randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes, generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes, where, when generating the plurality of new nodes, the at least one processor is programmed or configured to generate a label for each new node, wherein the label is associated with a score indicating how closely the new node represents an abnormal node, combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes, and train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.
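As a non-limiting illustration of the node embeddings referred to throughout this description, the following sketch shows a single message-passing layer of the kind a graph neural network might use to embed the nodes of a graph. All names are hypothetical, and the mean-neighbor aggregation, residual combination, and ReLU activation are illustrative assumptions; the disclosure does not fix a particular embedding architecture.

```python
import numpy as np

def embed_nodes(features, adjacency, weight):
    """One message-passing layer: each node's embedding combines its own
    features with the mean of its neighbors' features, followed by a linear
    map and ReLU. These are illustrative choices, not the claimed design."""
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    neighbor_mean = adjacency @ features / deg
    return np.maximum(0.0, (features + neighbor_mean) @ weight)
```

For example, on a three-node graph where node 0 and node 1 are connected and node 2 is isolated, each node receives a nonnegative embedding of the same dimensionality as the weight matrix's output.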

[0059] In some non-limiting embodiments or aspects, when generating the label for each new node, the at least one processor is programmed or configured to calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes and multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

[0060] In some non-limiting embodiments or aspects, when calculating the coefficient for the first new node of the plurality of new nodes, the at least one processor is programmed or configured to calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs and calculate the coefficient for the first new node based on the result of the sigmoid function.
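As a non-limiting sketch of this label-generation step: the coefficient below is a sigmoid computed from the two embeddings, and it scales a predetermined anomaly score to give the new node's score. Using the dot product of the embeddings as the sigmoid input, and interpolating the new node's embedding with the same coefficient, are assumptions made for illustration; the description only specifies that the sigmoid takes both embeddings as inputs and that the coefficient multiplies a predetermined score.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_new_node(anomaly_emb, sampled_emb, base_score=1.0):
    """Generate one new node and its label score from a labeled anomaly
    node and a randomly sampled node. The dot-product sigmoid input and
    the coefficient-weighted interpolation are hypothetical choices."""
    coeff = sigmoid(np.dot(anomaly_emb, sampled_emb))
    new_emb = coeff * anomaly_emb + (1.0 - coeff) * sampled_emb
    score = coeff * base_score  # label score: how "anomalous" the new node is
    return new_emb, score
```

With orthogonal embeddings the dot product is zero, so the sigmoid yields a coefficient of 0.5 and the new node's score is half the predetermined anomaly score.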

[0061] In some non-limiting embodiments or aspects, when randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, the at least one processor is programmed or configured to, for each labeled anomaly node of the set of labeled anomaly nodes, sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes. In some non-limiting embodiments or aspects, the machine learning model is a graph neural network machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly.
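A non-limiting sketch of Metropolis-Hastings sampling over candidate nodes follows. The unnormalized target density `target_weight` and the symmetric uniform proposal are hypothetical stand-ins, since the description does not specify the target distribution used when sampling from among the unlabeled anomaly nodes and normal nodes.

```python
import random

def metropolis_hastings_sample(candidates, target_weight, steps=100, seed=0):
    """Random walk over candidate nodes: propose a candidate uniformly
    (a symmetric proposal, so the acceptance ratio reduces to the ratio
    of target weights) and accept or reject per Metropolis-Hastings."""
    rng = random.Random(seed)
    current = rng.choice(candidates)
    for _ in range(steps):
        proposal = rng.choice(candidates)
        accept = min(1.0, target_weight(proposal) / target_weight(current))
        if rng.random() < accept:
            current = proposal
    return current
```

When the target density strongly favors one node, the chain concentrates on that node after a modest number of steps.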

[0062] In some non-limiting embodiments or aspects, when training the machine learning model, the at least one processor is programmed or configured to train the machine learning model based on a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, where the first distance comprises a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space, and a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, where the second distance comprises a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes, and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space. In some non-limiting embodiments or aspects, the at least one processor is further programmed or configured to calculate the center of the combined set of labeled anomaly nodes in the embedding space based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes and calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.

[0063] In this way, the graph learning system may allow for providing datasets of graph data for training machine learning models that include a sufficient number of labeled and/or unlabeled examples that can be used to properly train the machine learning models to make efficient and accurate determinations.

[0064] For the purpose of illustration, in the following description, while the presently disclosed subject matter is described with respect to systems, methods, and computer program products for generating a machine learning model based on anomaly nodes of a graph, e.g., for anomaly detection associated with payment transactions, one skilled in the art will recognize that the disclosed subject matter is not limited to the non-limiting embodiments or aspects disclosed herein. For example, the systems, methods, and computer program products described herein may be used with a wide variety of settings, such as generating a machine learning model based on anomaly nodes of a graph in any suitable setting, e.g., predictions, regressions, classifications, fraud prevention, authorization, authentication, identification, feature selection, and/or the like.

[0065] Some non-limiting embodiments or aspects are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.

[0066] Referring now to FIG. 1, FIG. 1 is a diagram of a non-limiting embodiment or aspect of an environment 100 in which systems, methods, and/or computer program products, as described herein, may be implemented. As shown in FIG. 1, environment 100 includes graph learning system 102, data source 102a, transaction service provider system 104, issuer system 106, user device 108, and communication network 110. Graph learning system 102, transaction service provider system 104, issuer system 106, and/or user device 108 may interconnect (e.g., establish a connection to communicate) via wired connections, wireless connections, or a combination of wired and wireless connections.

[0067] Graph learning system 102 may include one or more devices configured to communicate with transaction service provider system 104 and/or user device 108 via communication network 110. For example, graph learning system 102 may include a server, a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, graph learning system 102 may be associated with a transaction service provider system. For example, graph learning system 102 may be operated by the transaction service provider system. In another example, graph learning system 102 may be a component of transaction service provider system 104. In some non-limiting embodiments or aspects, graph learning system 102 may be in communication with data source 102a (e.g., a data storage device), which may be local or remote to graph learning system 102. In some non-limiting embodiments or aspects, graph learning system 102 may be capable of receiving information from, storing information in, transmitting information to, and/or searching information stored in data source 102a.

[0068] Transaction service provider system 104 may include one or more devices configured to communicate with graph learning system 102, issuer system 106, and/or user device 108 via communication network 110. For example, transaction service provider system 104 may include a computing device, such as a server, a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, transaction service provider system 104 may be associated with a transaction service provider.

[0069] Issuer system 106 may include one or more devices configured to communicate with graph learning system 102, transaction service provider system 104, and/or user device 108 via communication network 110. For example, issuer system 106 may include a computing device, such as a server, a group of servers, and/or other like devices. In some non-limiting embodiments or aspects, issuer system 106 may be associated with a transaction service provider system.

[0070] User device 108 may include a computing device configured to communicate with graph learning system 102, transaction service provider system 104, and/or issuer system 106 via communication network 110. For example, user device 108 may include a computing device, such as a desktop computer, a portable computer (e.g., tablet computer, a laptop computer, and/or the like), a mobile device (e.g., a cellular phone, a smartphone, a personal digital assistant, a wearable device, and/or the like), and/or other like devices. In some non-limiting embodiments or aspects, user device 108 may be associated with a user (e.g., an individual operating user device 108).

[0071] Communication network 110 may include one or more wired and/or wireless networks. For example, communication network 110 may include a cellular network (e.g., a long-term evolution (LTE) network, a third generation (3G) network, a fourth generation (4G) network, a fifth generation (5G) network, a code division multiple access (CDMA) network, and/or the like), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the public switched telephone network (PSTN)), a private network (e.g., a private network associated with a transaction service provider), an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, and/or the like, and/or a combination of these or other types of networks.

[0072] The number and arrangement of systems, devices, and/or networks shown in FIG. 1 are provided as an example. There may be additional systems, devices, and/or networks; fewer systems, devices, and/or networks; different systems, devices, and/or networks; and/or differently arranged systems, devices, and/or networks than those shown in FIG. 1. Furthermore, two or more systems or devices shown in FIG. 1 may be implemented within a single system or device, or a single system or device shown in FIG. 1 may be implemented as multiple, distributed systems or devices. Additionally or alternatively, a set of systems (e.g., one or more systems) or a set of devices (e.g., one or more devices) of environment 100 may perform one or more functions described as being performed by another set of systems or another set of devices of environment 100.

[0073] Referring now to FIG. 2, FIG. 2 is a diagram of example components of a device 200. Device 200 may correspond to one or more devices of graph learning system 102 (e.g., one or more devices of graph learning system 102), transaction service provider system 104 (e.g., one or more devices of transaction service provider system 104), issuer system 106, and/or user device 108. In some non-limiting embodiments or aspects, graph learning system 102, transaction service provider system 104, issuer system 106, and/or user device 108 may include at least one device 200 and/or at least one component of device 200.

[0074] As shown in FIG. 2, device 200 may include bus 202, processor 204, memory 206, storage component 208, input component 210, output component 212, and communication interface 214. Bus 202 may include a component that permits communication among the components of device 200. In some non-limiting embodiments or aspects, processor 204 may be implemented in hardware, software, firmware, and/or any combination thereof. For example, processor 204 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), and/or the like), a microprocessor, a digital signal processor (DSP), and/or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and/or the like), and/or the like, which can be programmed to perform a function. Memory 206 may include random access memory (RAM), read-only memory (ROM), and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, and/or the like) that stores information and/or instructions for use by processor 204.

[0075] Storage component 208 may store information and/or software related to the operation and use of device 200. For example, storage component 208 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid state disk, and/or the like), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of computer-readable medium, along with a corresponding drive.

[0076] Input component 210 may include a component that permits device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, a camera, and/or the like). Additionally or alternatively, input component 210 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, and/or the like). Output component 212 may include a component that provides output information from device 200 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), and/or the like).

[0077] Communication interface 214 may include a transceiver-like component (e.g., a transceiver, a receiver and transmitter that are separate, and/or the like) that enables device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 214 may permit device 200 to receive information from another device and/or provide information to another device. For example, communication interface 214 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi® interface, a Bluetooth® interface, a Zigbee® interface, a cellular network interface, and/or the like.

[0078] Device 200 may perform one or more processes described herein. Device 200 may perform these processes based on processor 204 executing software instructions stored by a computer-readable medium, such as memory 206 and/or storage component 208. A computer-readable medium (e.g., a non-transitory computer-readable medium) is defined herein as a non-transitory memory device. A non-transitory memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices.

[0079] Software instructions may be read into memory 206 and/or storage component 208 from another computer-readable medium or from another device via communication interface 214. When executed, software instructions stored in memory 206 and/or storage component 208 may cause processor 204 to perform one or more processes described herein. Additionally or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments or aspects described herein are not limited to any specific combination of hardware circuitry and software.

[0080] The number and arrangement of components shown in FIG. 2 are provided as an example. In some non-limiting embodiments or aspects, device 200 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 2. Additionally or alternatively, a set of components (e.g., one or more components) of device 200 may perform one or more functions described as being performed by another set of components of device 200.

[0081] Referring now to FIG. 3, FIG. 3 is a flowchart of a non-limiting embodiment or aspect of a process 300 for generating a machine learning model based on anomaly nodes of a graph. In some non-limiting embodiments or aspects, one or more of the steps of process 300 may be performed (e.g., completely, partially, etc.) by graph learning system 102 (e.g., one or more devices of graph learning system 102). In some non-limiting embodiments or aspects, one or more of the steps of process 300 may be performed (e.g., completely, partially, etc.) by another device or a group of devices separate from or including graph learning system 102 (e.g., one or more devices of graph learning system 102), transaction service provider system 104 (e.g., one or more devices of transaction service provider system 104), issuer system 106, and/or user device 108.

[0082] As shown in FIG. 3, at step 302, process 300 includes receiving a dataset. The dataset may include a set of labeled anomaly nodes (e.g., a set of 5, 10, 15, 30, 50, 100, 200, 300 or more, labeled anomaly nodes), a set of unlabeled anomaly nodes (e.g., a set of 5, 10, 15, 30, 50, 100, 200, 300 or more, unlabeled anomaly nodes), and a set of normal nodes (e.g., a set of 5, 10, 15, 30, 50, 100, 200, 300 or more, normal nodes). For example, graph learning system 102 may receive a dataset including a set of labeled anomaly nodes, a set of unlabeled anomaly nodes, and a set of normal nodes. In some non-limiting embodiments or aspects, graph learning system 102 may receive the dataset from data source 102a, transaction service provider system 104, issuer system 106, and/or user device 108. In some non-limiting embodiments or aspects, a number of nodes of the set of labeled anomaly nodes may be less than a number of nodes of the set of unlabeled anomaly nodes and/or a number of nodes of the set of normal nodes.

[0083] In some non-limiting embodiments or aspects, the dataset may include graph data associated with a graph. In some non-limiting embodiments or aspects, the graph may include a plurality of nodes and a plurality of edges, and the graph data may include a plurality of node embeddings associated with a number of nodes in the graph and node data associated with each node of the graph. The node data may include data associated with parameters of each node in the graph. Additionally or alternatively, the node data may include user data associated with a plurality of users and/or entity data associated with a plurality of entities. The plurality of node embeddings may include a first set of node embeddings and/or a second set of node embeddings. The first set of node embeddings may be based on the user data and/or the second set of node embeddings may be based on the entity data.

[0084] In some non-limiting embodiments or aspects, the node data may include data associated with parameters of each node in the graph. In some non-limiting embodiments or aspects, the dataset may be associated with a population of entities (e.g., users, accountholders, merchants, issuers, etc.) that includes a plurality of data instances associated with a plurality of features. In some non-limiting embodiments or aspects, the plurality of data instances may represent a plurality of transactions (e.g., electronic payment transactions) conducted by the population.

[0085] In some non-limiting embodiments or aspects, each data instance may include transaction data associated with the transaction. In some non-limiting embodiments or aspects, the transaction data may include a plurality of transaction parameters associated with an electronic payment transaction. In some non-limiting embodiments or aspects, the plurality of features may represent the plurality of transaction parameters. In some non-limiting embodiments or aspects, the plurality of transaction parameters may include electronic wallet card data associated with an electronic card (e.g., an electronic credit card, an electronic debit card, an electronic loyalty card, and/or the like), decision data associated with a decision (e.g., a decision to approve or deny a transaction authorization request), authorization data associated with an authorization response (e.g., an approved spending limit, an approved transaction value, and/or the like), a PAN, an authorization code (e.g., a personal identification number (PIN), etc.), data associated with a transaction amount (e.g., an approved limit, a transaction value, etc.), data associated with a transaction date and time, data associated with a conversion rate of a currency, data associated with a merchant type (e.g., a merchant category code that indicates a type of goods, such as grocery, fuel, and/or the like), data associated with an acquiring institution country, data associated with an identifier of a country associated with the PAN, data associated with a response code, data associated with a merchant identifier (e.g., a merchant name, a merchant location, and/or the like), data associated with a type of currency corresponding to funds stored in association with the PAN, and/or the like.

[0086] In some non-limiting embodiments or aspects, graph learning system 102 may receive the graph data including a set of nodes represented by V = {v1, v2, ..., vn} and/or a set of edges represented by E = {e1, e2, ..., en} for an attributed network represented by G = (V, E, X). In some non-limiting embodiments or aspects, the attributes of the plurality of nodes may be represented by the following matrix, where xi represents an attribute vector for node vi:

X = [x1, x2, ..., xn]^T

[0087] In some non-limiting embodiments or aspects, the attributed network may be represented by G = (A, X), where an adjacency matrix is represented by A ∈ {0,1}^(n×n), where Ai,j = 1 indicates that there is an edge between node vi and node vj and where Ai,j = 0 indicates that there is not an edge between node vi and node vj.
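For illustration, the attributed-network representation G = (A, X) described above can be sketched in Python as follows; the toy graph, its node count, its edge list, and the attribute values are all hypothetical.

```python
# Toy attributed network G = (A, X); node count, edges, and attributes
# are hypothetical examples.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

# Adjacency matrix A with entries in {0, 1}: A[i][j] = 1 indicates an edge
# between node v_i and node v_j; A[i][j] = 0 indicates no edge.
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # undirected graph

# Attribute matrix X: row i is the attribute vector x_i for node v_i.
X = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]
```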

[0088] In some non-limiting embodiments or aspects, the attributed network represented by G = (A, X), where A and X each represent matrices, may contain a set of labeled anomalies represented by VL.

[0089] As shown in FIG. 3, at step 304, process 300 includes randomly sampling a node for each labeled anomaly node of the set of labeled anomaly nodes. For example, graph learning system 102 may randomly sample a node for each labeled anomaly node of the set of labeled anomaly nodes (e.g., randomly sample as many nodes as the number of nodes in the set of labeled anomaly nodes). In some non-limiting embodiments or aspects, graph learning system 102 may, for each labeled anomaly node of the set of labeled anomaly nodes, randomly sample a node from among the set of unlabeled anomaly nodes and/or the set of normal nodes to provide a set of randomly sampled nodes.

[0090] In some non-limiting embodiments or aspects, graph learning system 102 may, for each labeled anomaly node of the set of labeled anomaly nodes, sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes. For example, graph learning system 102 may determine a probability of sampling a node, represented by u, for a labeled node, represented by v, based on the following equation:

[0091] p(u | v) = min(1, (π(u) q(v | u)) / (π(v) q(u | v))), where π represents the target distribution over candidate nodes and q represents the proposal distribution.

[0092] In some non-limiting embodiments or aspects, graph learning system 102 may, for each labeled anomaly node of the set of labeled anomaly nodes, sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes based on the following algorithm:


[0094] where ux represents a randomly selected node, uy represents a sample proposed from the randomly selected node ux, t ∈ [0,1] represents a number generated uniformly at random, ge(.) represents a graph encoder, VL represents the set of labeled anomaly nodes, q represents a proposal distribution, N represents a step size, and VS represents the set of randomly sampled nodes.
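The sampling loop itself is not reproduced in the text; the following is a minimal sketch of a Metropolis-Hastings-style walk consistent with the variables defined in paragraph [0094]. The uniform proposal distribution and the `score` function standing in for similarity to a labeled anomaly node are assumptions.

```python
import random


def metropolis_hastings_sample(candidates, score, n_steps=10, seed=0):
    """Walk over candidate nodes; accept a proposed node u_y over the current
    node u_x with probability min(1, score(u_y) / score(u_x))."""
    rng = random.Random(seed)
    u_x = rng.choice(candidates)
    for _ in range(n_steps):
        u_y = rng.choice(candidates)  # proposal q: uniform, hence symmetric
        t = rng.random()              # t sampled uniformly from [0, 1]
        if t < min(1.0, score(u_y) / score(u_x)):
            u_x = u_y                 # accept the proposed node
    return u_x


# Hypothetical scores standing in for similarity to a labeled anomaly node.
scores = {"u1": 0.1, "u2": 0.9, "u3": 0.5}
sampled = metropolis_hastings_sample(list(scores), lambda u: scores[u])
```

Because the proposal is symmetric, the q terms cancel and only the score ratio remains in the acceptance test.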

[0095] As shown in FIG. 3, at step 306, process 300 includes generating a plurality of new nodes. For example, graph learning system 102 may generate a plurality of new nodes (e.g., a set of 5, 10, 15, 30, 50, 100, 200, 300 or more, new nodes). In some non-limiting embodiments or aspects, graph learning system 102 may generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes. In some non-limiting embodiments or aspects, when generating the plurality of new nodes, graph learning system 102 may generate a label for each new node. The label may be associated with a score indicating how closely the new node represents an abnormal node.

[0096] In some non-limiting embodiments or aspects, when generating the label for each new node, graph learning system 102 may calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes. In some non-limiting embodiments or aspects, when generating the label for each new node, graph learning system 102 may multiply a predetermined score for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score for the first new node.

[0097] In some non-limiting embodiments or aspects, when calculating the coefficient for the first new node of the plurality of new nodes, graph learning system 102 may calculate a result of a sigmoid function with the embedding of the first labeled anomaly node and the embedding of the first randomly sampled node as inputs. In some non-limiting embodiments or aspects, when calculating the coefficient for the first new node of the plurality of new nodes, graph learning system 102 may calculate the coefficient for the first new node based on the result of the sigmoid function.
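For illustration, the coefficient-and-score computation described in paragraphs [0096] and [0097] can be sketched as follows; using the inner product of the two embeddings as the sigmoid input, and a predetermined anomaly score of 1.0, are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def new_node_score(z_anomaly, z_sampled, predetermined_score=1.0):
    """Coefficient from a sigmoid over the two embeddings (here, their inner
    product -- an assumption), scaled by the predetermined anomaly score."""
    inner = sum(a * b for a, b in zip(z_anomaly, z_sampled))
    coefficient = sigmoid(inner)
    return coefficient * predetermined_score


score = new_node_score([1.0, 0.5], [0.8, 0.4], predetermined_score=1.0)
```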

[0098] In some non-limiting embodiments or aspects, graph learning system 102 may determine a measure of uniformity for the dataset based on the number of nodes in the graph and data associated with parameters of each node in the graph. The measure of uniformity may be associated with a number of bits to encode the first set of node embeddings and the second set of node embeddings.

[0099] In some non-limiting embodiments or aspects, when generating the set of new nodes, graph learning system 102 may modify (e.g., remove or delete) one or more nodes of the dataset (e.g., the set of labeled anomaly nodes, the set of unlabeled anomaly nodes, and the set of normal nodes), the labeled anomaly nodes, and/or the set of randomly sampled nodes, to provide the set of new nodes.

[0100] In some non-limiting embodiments or aspects, when generating the set of new nodes, graph learning system 102 may interpolate node embeddings and labels. When interpolating the node embeddings, graph learning system 102 may perform a node generation process (e.g., an adaptive node mixup). For example, for a pair of training samples, (xi, yi) and (xj, yj), where x represents an input feature and y represents a label, graph learning system 102 may obtain a new node represented by (x̃, ỹ) based on the following, where λ ∈ [0,1] is sampled from a Beta distribution (e.g., Beta(α, α)):

x̃ = λxi + (1 − λ)xj
ỹ = λyi + (1 − λ)yj
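For illustration, the pairwise interpolation described in paragraph [0100] can be sketched as follows; the feature vectors and labels are hypothetical, and the Beta draw uses the standard library's `random.betavariate`.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=1.0, seed=0):
    """Standard mixup: interpolate a pair of training samples with a mixing
    weight lam drawn from Beta(alpha, alpha)."""
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)
    x_new = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y_new = lam * y_i + (1 - lam) * y_j
    return x_new, y_new


x_new, y_new = mixup([1.0, 0.0], 1.0, [0.0, 1.0], 0.0)
```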

[0101] In some non-limiting embodiments or aspects, when generating the set of new nodes, graph learning system 102 may perform a second node generation process (e.g., a second adaptive node mixup) to adaptively interpolate node features and node labels. For example, when generating the set of new nodes, graph learning system 102 may first introduce an abnormality adapter represented by λa, with a confined value range [0.5, 1], which may preserve the anomalous information, and second, graph learning system 102 may regulate the abnormality adapter according to the learned representations of the labeled anomaly and the sampled node. In some non-limiting embodiments or aspects, the abnormality adapter may be calculated based on the following, where vi is a node selected from the set of labeled anomaly nodes, where vj is a randomly sampled node, where zi and zj are the representations of nodes vi and vj from the graph neural network (GNN), respectively, and where σ(.) represents the sigmoid activation function:

λa = 0.5 + 0.5σ(zi·zj)

[0102] In some non-limiting embodiments or aspects, if node vj is an anomalous node of the set of labeled anomaly nodes (e.g., yj = 1), then an inner product of the corresponding embeddings (e.g., zi·zj) is a first number and λa will approximate 1. Alternatively, if node vj is a normal node of the set of normal nodes (e.g., yj = 0), then the inner product of the corresponding embeddings is a second number smaller than the first number and λa will approximate 0.5.

[0103] In some non-limiting embodiments or aspects, graph learning system 102 may perform the second node generation process (e.g., the second adaptive node mixup) using the abnormality adapter to compute the mixed embedding and label based on the following:

z̃ = λa·zi + (1 − λa)·zj
ỹ = λa·yi + (1 − λa)·yj
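The adaptive mixup of paragraphs [0101]–[0103] can be sketched as below. The specific form 0.5 + 0.5·sigmoid(zi·zj) for the abnormality adapter is an assumption chosen only to match the stated [0.5, 1] value range and the behavior described in paragraph [0102]; the embeddings and labels are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_mixup(z_i, y_i, z_j, y_j):
    """Adaptive node mixup: the abnormality adapter lam_a is confined to
    [0.5, 1] and grows with the inner product of the two embeddings."""
    inner = sum(a * b for a, b in zip(z_i, z_j))
    lam_a = 0.5 + 0.5 * sigmoid(inner)  # assumed form, range (0.5, 1)
    z_new = [lam_a * a + (1 - lam_a) * b for a, b in zip(z_i, z_j)]
    y_new = lam_a * y_i + (1 - lam_a) * y_j
    return z_new, y_new, lam_a


z_new, y_new, lam_a = adaptive_mixup([2.0, 1.0], 1.0, [1.5, 1.0], 1.0)
```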

[0104] As shown in FIG. 3, at step 308, process 300 includes combining the plurality of new nodes with the set of labeled anomaly nodes. For example, graph learning system 102 may combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes (e.g., a set of 5, 10, 15, 30, 50, 100, 200, 300 or more, combined labeled anomaly nodes).

[0105] As shown in FIG. 3, at step 310, process 300 includes training a machine learning model. For example, graph learning system 102 may train a machine learning model to provide a trained machine learning model. The machine learning model may be trained (e.g., generated, trained, re-trained) to perform a task, such as predicting an anomaly and/or characterizing distinct behaviors of different types of anomalies. The machine learning model may include a GNN machine learning model configured to provide an output that includes a prediction regarding whether a node of a graph is an anomaly. In some non-limiting embodiments or aspects, graph learning system 102 may train a GNN based on the combined set of labeled anomaly nodes.

[0106] In some non-limiting embodiments or aspects, graph learning system 102 may train a GNN based on a set of graph features to provide a trained GNN. In some non-limiting embodiments or aspects, graph learning system 102 may train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and a center of the set of normal nodes in the embedding space.

[0107] In some non-limiting embodiments or aspects, graph learning system 102 may calculate the center of the combined set of labeled anomaly nodes in the embedding space. For example, graph learning system 102 may calculate the center of the combined set of labeled anomaly nodes based on an aggregation of embeddings of each labeled anomaly node in the combined set of labeled anomaly nodes. In some non-limiting embodiments or aspects, graph learning system 102 may calculate the center of the set of normal nodes in the embedding space. For example, graph learning system 102 may calculate the center of the set of normal nodes in the embedding space based on an aggregation of embeddings of each node in the set of randomly sampled nodes.
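For illustration, the center computation of paragraph [0107] can be sketched as an element-wise mean of the node embeddings (one simple aggregation choice; the embedding values are hypothetical).

```python
def center(embeddings):
    """Center of a node set in the embedding space: the element-wise mean
    of the set's node embeddings."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


# Hypothetical embeddings for the combined anomaly set and sampled normal set.
anomaly_center = center([[1.0, 2.0], [3.0, 4.0]])
normal_center = center([[0.0, 0.0], [0.0, 2.0]])
```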

[0108] In some non-limiting embodiments or aspects, graph learning system 102 may train the machine learning model based on a first distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and/or a second distance associated with an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes. The first distance may include a distance between an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes and the center of the combined set of labeled anomaly nodes in the embedding space. The second distance may include a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the combined set of labeled anomaly nodes and a distance of an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes to a center of the set of normal nodes in the embedding space.

[0109] In some non-limiting embodiments or aspects, the trained GNN may include one or more layers (e.g., an input layer, one or more hidden layers, an output layer, and/or the like). In some non-limiting embodiments or aspects, the one or more layers may include an intermediate layer. In some non-limiting embodiments or aspects, for the intermediate layer, a representation of a node may be based on the following, where hi^(l) represents an intermediate representation of node vi at the l-th layer, and where Ni is a set of one-hop neighboring nodes of node vi:

hi^(l) = UPDATE^l(hi^(l−1), AGGREGATE^l({hj^(l−1) : vj ∈ Ni}))

[0110] In some non-limiting embodiments or aspects, the aggregation function represented by AGGREGATE^l(.) may be used to integrate information from node vi and the neighboring nodes of node vi. For example, graph learning system 102 may integrate information from node vi and the neighboring nodes of node vi based on the aggregation function represented by AGGREGATE^l(.). In some non-limiting embodiments or aspects, the update function represented by UPDATE^l(.) may be used to apply a nonlinear transformation to the aggregated information from the neighboring nodes of node vi and an output representation from a previous layer. For example, graph learning system 102 may apply a nonlinear transformation to the aggregated information from the neighboring nodes of node vi and an output representation from a previous layer based on the update function represented by UPDATE^l(.).
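For illustration, one message-passing layer of the kind described in paragraphs [0109] and [0110] can be sketched as below. The mean aggregation, the ReLU update with a scalar weight, and the toy embeddings are all assumptions standing in for the learned AGGREGATE and UPDATE functions.

```python
def gnn_layer(h, adjacency, weight=0.5):
    """One message-passing layer: AGGREGATE averages a node's own embedding
    with its one-hop neighbors' embeddings, then UPDATE applies a ReLU
    nonlinearity scaled by a hypothetical scalar weight."""
    h_next = []
    for i, h_i in enumerate(h):
        neighborhood = [h_i] + [h[j] for j in adjacency[i]]
        aggregated = [sum(v[d] for v in neighborhood) / len(neighborhood)
                      for d in range(len(h_i))]
        h_next.append([max(0.0, weight * v) for v in aggregated])  # UPDATE
    return h_next


# Hypothetical three-node path graph with 2-dimensional embeddings.
h0 = [[1.0, -1.0], [3.0, 1.0], [1.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
h1 = gnn_layer(h0, adjacency)
```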

[0111] In some non-limiting embodiments or aspects, graph learning system 102 may compute a template of the set of anomalous nodes (e.g., labeled anomaly nodes) represented by VL based on the following:

ca = (1/|VL|) Σ(vi ∈ VL) zi

[0112] In some non-limiting embodiments or aspects, graph learning system 102 may compute a template of the set of normal nodes by aggregating the embeddings of the sampled nodes represented by VS based on the following:

cn = (1/|VS|) Σ(vi ∈ VS) zi

[0113] In some non-limiting embodiments or aspects, graph learning system 102 may determine a probability that a training node is assigned to a specific class of nodes (e.g., labeled anomaly nodes, unlabeled anomaly nodes, and/or normal nodes) based on the distance between the training node and the two templates (e.g., the template of the set of anomalous nodes, represented by ca, and the template of the set of normal nodes, represented by cn) based on the following:

p(yi = 1 | zi) = exp(−||zi − ca||²) / (exp(−||zi − ca||²) + exp(−||zi − cn||²))
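For illustration, a distance-based class probability of the kind described in paragraph [0113] can be sketched as a softmax over negative squared distances to the two templates; this specific functional form, and the toy embedding and template values, are assumptions.

```python
import math


def anomaly_probability(z, center_anomaly, center_normal):
    """Probability that a training node belongs to the anomalous class,
    as a softmax over negative squared distances to the two templates."""
    d_a = sum((a - b) ** 2 for a, b in zip(z, center_anomaly))
    d_n = sum((a - b) ** 2 for a, b in zip(z, center_normal))
    e_a, e_n = math.exp(-d_a), math.exp(-d_n)
    return e_a / (e_a + e_n)


# A node close to the (hypothetical) anomaly template gets p > 0.5.
p = anomaly_probability([0.9, 0.9], [1.0, 1.0], [0.0, 0.0])
```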

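One way to realize the distance-based class probability is a softmax over negative Euclidean distances to the two templates (a prototypical-network-style sketch; the exact distance function is an assumption):

```python
import numpy as np

def class_probability(h, c_anom, c_norm):
    """Probability that embedding h belongs to the anomaly class, via a
    softmax over negative distances to the two class templates."""
    d = np.array([np.linalg.norm(h - c_anom), np.linalg.norm(h - c_norm)])
    weights = np.exp(-d)
    return weights[0] / weights.sum()
```

A node whose embedding lies closer to the anomaly template than to the normal template receives an anomaly probability greater than 0.5.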
[0114] In some non-limiting embodiments or aspects, a corresponding training objective for training a machine learning model (e.g., a GNN machine learning model) may be defined based on the negative log-likelihood of the distance-based class probabilities over the training nodes:

L = - Σ_{v_i} [ y_i log p(y_i = a | v_i) + (1 - y_i) log(1 - p(y_i = a | v_i)) ]
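Under that probability model, one plausible training objective is the averaged negative log-likelihood over labeled training nodes (a sketch under stated assumptions; the exact objective may differ):

```python
import numpy as np

def training_loss(H, labels, c_anom, c_norm):
    """Mean negative log-likelihood of the distance-based class
    probabilities. labels: 1 for anomaly, 0 for normal."""
    total = 0.0
    for h, y in zip(H, labels):
        d = np.array([np.linalg.norm(h - c_anom), np.linalg.norm(h - c_norm)])
        p = np.exp(-d) / np.exp(-d).sum()
        # p[0] is the anomaly-class probability, p[1] the normal-class one
        total += -np.log(p[0] if y == 1 else p[1])
    return total / len(labels)
```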
[0115] In some non-limiting embodiments or aspects, graph learning system 102 may perform an action, such as a fraud prevention procedure, a creditworthiness procedure, and/or a recommendation procedure, using a trained machine learning model. In some non-limiting embodiments or aspects, graph learning system 102 may perform a fraud prevention procedure associated with protection of an account of a user (e.g., a user associated with user device 108) based on an output of the trained machine learning model (e.g., an output that includes a prediction regarding whether a node of a graph is an anomaly). For example, if the output of the trained machine learning model indicates that the fraud prevention procedure is necessary, graph learning system 102 may perform the fraud prevention procedure associated with protection of the account of the user. In such an example, if the output of the trained machine learning model indicates that the fraud prevention procedure is not necessary, graph learning system 102 may forego performing the fraud prevention procedure associated with protection of the account of the user. In some non-limiting embodiments or aspects, graph learning system 102 may execute a fraud prevention procedure based on a classification of an input as provided by the machine learning model.

[0116] Referring now to FIGS. 4A-4F, FIGS. 4A-4F are diagrams of a non-limiting embodiment or aspect of implementation 400 relating to a process (e.g., process 300) for generating a machine learning model based on anomaly nodes of a graph. In some non-limiting embodiments or aspects, one or more of the steps of the process may be performed (e.g., completely, partially, etc.) by graph learning system 102 (e.g., one or more devices of graph learning system 102). In some non-limiting embodiments or aspects, one or more of the steps of the process may be performed (e.g., completely, partially, etc.) by another device or a group of devices separate from or including graph learning system 102 (e.g., one or more devices of graph learning system 102), transaction service provider system 104 (e.g., one or more devices of transaction service provider system 104), issuer system 106 (e.g., one or more devices of issuer system 106), and/or user device 108.

[0117] As shown by reference number 405 in FIG. 4A, graph learning system 102 may receive a dataset that includes a set of labeled anomaly nodes (e.g., a set of nodes labeled as anomalies), a set of unlabeled anomaly nodes (e.g., a set of nodes that are unlabeled but are potential anomalies), and a set of normal nodes (e.g., a set of nodes labeled as normal, such that the set of nodes are not anomalies) from data source 102a. In some non-limiting embodiments or aspects, the dataset may be associated with graph data that is with regard to a graph, and the graph data may include node data associated with a plurality of nodes of the graph and/or edge data associated with a plurality of edges of the graph.

[0118] As shown by reference number 410 in FIG. 4B, graph learning system 102 may randomly sample a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide a set of randomly sampled nodes for each labeled anomaly node of the set of labeled anomaly nodes. In some non-limiting embodiments or aspects, when randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, graph learning system 102 may, for each labeled anomaly node of the set of labeled anomaly nodes, sample a node from among the set of unlabeled anomaly nodes according to a Metropolis-Hastings sampling algorithm to provide a set of randomly sampled nodes. Additionally or alternatively, when randomly sampling a node from among the set of unlabeled anomaly nodes and the set of normal nodes to provide the set of randomly sampled nodes, graph learning system 102 may sample a node from among the set of normal nodes according to the Metropolis-Hastings sampling algorithm to provide the set of randomly sampled nodes.
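A minimal Metropolis-Hastings sampler over node indices can be sketched as follows, assuming a uniform symmetric proposal and an unnormalized per-node target weight (both are assumptions; the text above does not specify the target distribution):

```python
import random

def metropolis_hastings_sample(weights, steps=1000, seed=0):
    """Return one node index whose stationary sampling probability is
    proportional to weights[i], using a uniform symmetric proposal."""
    rng = random.Random(seed)
    n = len(weights)
    current = rng.randrange(n)
    for _ in range(steps):
        proposal = rng.randrange(n)
        # Accept with probability min(1, w(proposal) / w(current)); the
        # proposal is symmetric, so the acceptance ratio simplifies to this.
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
    return current
```

Repeating the call once per labeled anomaly node would yield the set of randomly sampled nodes described above.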

[0119] As shown by reference number 415 in FIG. 4C, graph learning system 102 may generate a plurality of new nodes. In some non-limiting embodiments or aspects, graph learning system 102 may generate a plurality of new nodes based on the set of labeled anomaly nodes and the set of randomly sampled nodes. In some non-limiting embodiments or aspects, graph learning system 102 may generate an embedding of a sampled node (e.g., a labeled anomaly node, an unlabeled anomaly node, or a normal node) and determine a node that is a nearest neighbor of the sampled node. In some non-limiting embodiments or aspects, graph learning system 102 may generate an embedding of the node that is a nearest neighbor of the sampled node, and graph learning system 102 may generate an embedding based on the embedding of a sampled node and the embedding of the node that is a nearest neighbor of the sampled node to provide an embedding of a new node. In some non-limiting embodiments or aspects, graph learning system 102 may generate a new node based on the embedding of a new node. For example, graph learning system 102 may generate a new node based on using an encoder or a decoder (e.g., an encoder or a decoder used to generate the embedding of a sampled node and the embedding of the node that is a nearest neighbor of the sampled node) to produce the new node. In some non-limiting embodiments, graph learning system 102 may repeat this process to generate a new node based on a plurality of sampled nodes (e.g., each sampled node, all of the sampled nodes, a subset of all of the sampled nodes, etc.) of the set of randomly sampled nodes.
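The new-node embedding step can be sketched as a nearest-neighbor lookup followed by interpolation between the two embeddings (the convex-combination form and the weight `lam` are assumptions, not the claimed mechanism):

```python
import numpy as np

def nearest_neighbor(h, candidates):
    """Index of the candidate embedding closest to h (Euclidean distance)."""
    d = np.linalg.norm(np.asarray(candidates, dtype=float) - h, axis=1)
    return int(d.argmin())

def synthesize_embedding(h_sampled, h_neighbor, lam=0.5):
    """Embedding of a new node: a convex combination of a sampled node's
    embedding and its nearest neighbor's embedding."""
    return lam * np.asarray(h_sampled) + (1.0 - lam) * np.asarray(h_neighbor)
```

A decoder (such as the decoder half of the encoder/decoder mentioned above) could then map the synthesized embedding back to a new node.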

[0120] As further shown by reference number 420 in FIG. 4D, graph learning system 102 may generate a label for each new node. In some non-limiting embodiments or aspects, when generating the label for each new node of the plurality of new nodes, graph learning system 102 may calculate a coefficient for a first new node of the plurality of new nodes based on an embedding of a first labeled anomaly node of the set of labeled anomaly nodes and an embedding of a first randomly sampled node of the set of randomly sampled nodes and multiply a predetermined score (e.g., a predetermined anomaly score) for an anomaly node of the set of labeled anomaly nodes by the coefficient for the first new node to provide a score (e.g., an anomaly score) for the first new node. In some non-limiting embodiments or aspects, graph learning system 102 may compare the score for the first new node to a threshold value to determine a label for the first new node. In some non-limiting embodiments or aspects, if the score of the first new node satisfies the threshold value, graph learning system 102 may generate a first label (e.g., a positive label, such as a label that indicates a node is an anomaly) for the first new node. If the score of the first new node does not satisfy the threshold value, graph learning system 102 may generate a second label (e.g., a negative label, such as a label that indicates a node is not an anomaly) for the first new node. In some non-limiting embodiments or aspects, the first label is a positive label of a binary classification, and the second label is a negative label of the binary classification, or vice versa. In some non-limiting embodiments or aspects, graph learning system 102 may repeat the above process for each new node of the plurality of new nodes (e.g., each new node of the plurality of new nodes that does not have a label).
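The labeling rule above reduces to scaling a predetermined anomaly score by the new node's coefficient and thresholding the result; treating "satisfies the threshold" as meeting or exceeding it is an assumption:

```python
def label_new_node(coefficient, predetermined_score, threshold):
    """Score = coefficient * predetermined anomaly score; the label is
    positive (anomaly) when the score meets or exceeds the threshold."""
    score = coefficient * predetermined_score
    label = 1 if score >= threshold else 0  # 1: anomaly, 0: not an anomaly
    return label, score
```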

[0121] As shown by reference number 425 in FIG. 4E, graph learning system 102 may combine the plurality of new nodes with the set of labeled anomaly nodes to provide a combined set of labeled anomaly nodes. For example, graph learning system 102 may add the plurality of new nodes to the set of labeled anomaly nodes to provide the combined set of labeled anomaly nodes.

[0122] As shown by reference number 430 in FIG. 4F, graph learning system 102 may train a Graph Neural Network (GNN) machine learning model based on an embedding. In some non-limiting embodiments or aspects, graph learning system 102 may train a machine learning model based on an embedding of each labeled anomaly node in the combined set of labeled anomaly nodes, a center of the combined set of labeled anomaly nodes in an embedding space, and/or a center of the set of normal nodes in the embedding space.

[0123] Although the disclosed subject matter has been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred embodiments or aspects, it is to be understood that such detail is solely for that purpose and that the disclosed subject matter is not limited to the disclosed embodiments or aspects, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the presently disclosed subject matter contemplates that, to the extent possible, one or more features of any embodiment or aspect can be combined with one or more features of any other embodiment or aspect.