

Title:
TIME-SERIES ANOMALY DETECTION FOR HEALTHCARE
Document Type and Number:
WIPO Patent Application WO/2024/074874
Kind Code:
A1
Abstract:
A computing device may receive biological signal data of an infant, the biological signal data measured from one or more sensors including a near-infrared spectroscopy (NIRS) device, an electroencephalogram (EEG) device, and/or a multiparametric device. A computing device may generate one or more time series of the biological signal data. A computing device may input the one or more time series into a machine learning model. A computing device may generate, by the machine learning model, a prediction of whether the infant has an elevated risk of brain injury.

Inventors:
VARIANE GABRIEL (US)
Application Number:
PCT/IB2022/059579
Publication Date:
April 11, 2024
Filing Date:
October 06, 2022
Assignee:
PBSF INC (US)
VARIANE GABRIEL (US)
International Classes:
A61B5/00; G16H50/20; G16H50/50
Domestic Patent References:
WO2021202661A12021-10-07
Foreign References:
US20190246989A12019-08-15
US20170196497A12017-07-13
US20220139543A12022-05-05
US20190159675A12019-05-30
Attorney, Agent or Firm:
TSANG, Fredrick (US)
Claims:

1. A computer-implemented method, comprising: receiving biological signal data of an infant, the biological signal data measured from one or more sensors including a near-infrared spectroscopy (NIRS) device, an electroencephalogram (EEG) device, and/or a multiparametric device; generating one or more time series of the biological signal data; inputting the one or more time series into a machine learning model; and generating, by the machine learning model, a prediction of whether the infant has an elevated risk of brain injury.

2. The computer-implemented method of claim 1, wherein the infant is a neonate who is born prematurely and is hospitalized.

3. The computer-implemented method of claim 1, wherein the elevated risk of brain injury is represented as a chance of having a seizure.

Description:
TIME-SERIES ANOMALY DETECTION FOR HEALTHCARE

[0001] Healthcare is one of the domains that has witnessed significant growth in the application of machine learning. For instance, intensive care units (ICUs) have evolved in recent years due to technological advances such as the widespread adoption of biosensors. The amount of time-series data generated in healthcare is growing fast, and so is the need for methods that can analyze these data. Detecting anomalies is one of the challenges that could provide meaningful insights for clinical medicine. In some embodiments, unsupervised multivariate time-series anomaly detection may be used to detect anomalies across multiple signals. In general, multiple univariate time series from the same device (or, more generally, the same entity) form a multivariate time series. It is preferable to analyze time series at the entity level directly, using the multivariate time series, rather than each univariate time series individually, because in practice the health status of an entity cannot be reflected by any single signal (a minimal sketch follows the motivation list below). The motivation for applying time-series anomaly detection in healthcare can be summarized as follows:

[0002] The Neonatal Intensive Care Unit is a place where life-altering decisions are constantly made in order to provide care to ill patients such as infants below 1,000 g. Neonatologists compile data from a variety of sources to build a picture of a newborn's health condition in order to ensure they receive the best medical care available. Highly trained experts use their judgment in conjunction with a continuous stream of patient data to assure that the best potential outcome is achieved for the largest number of newborns. The use of AI could enhance the decision-making process and lead to better outcomes.

[0003] The high-frequency EEG (200 Hz) and ECG (150 Hz) signals coming from medical equipment are hard to analyze manually. Moreover, those signals may be correlated, and there is no intuitive way to capture abnormal events across them. Monitoring those signals automatically can help clinicians treat patients in time.

[0004] Multimodal patient data can be collected when separate monitoring devices are linked to a centrally placed server through a network. The technique retains the benefit of synchronized timing and adds primarily automatic capture, requiring minimal human input to start and stop the recording.

[0005] A multimodal approach can combine brain monitoring techniques with other vital signs and clinical information to study systemic and cerebral hemodynamics and electrographic findings early after birth, allowing a better understanding of critically ill infant physiology.

[0006] Rule-based anomaly detection can be ineffective because there are many signals to be monitored and abnormal patterns are hard to describe with rules. Even when anomalies can be detected by rules, existing rules must be modified and new rules added to keep detection effective.

[0007] In healthcare, most data are unlabeled or only sparsely labeled, so unsupervised learning is preferred. On the feature-engineering side, an effective learned representation of time series can be generic enough to detect anomalies across different diseases, in contrast to manually defined features.

[0008] Compared with image and video signals, time-series signals can be obtained much more easily, so analyzing time-series signals to capture anomalies is more convenient for diagnosis.
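
To make the entity-level view from paragraph [0001] concrete, the following is a minimal sketch of forming a multivariate time series from aligned univariate signals; the signal names and values are hypothetical, not data from the application:

```python
import numpy as np

# Hypothetical example: three univariate series recorded from the same
# infant (the "entity"), already resampled onto a common 1 Hz time grid.
heart_rate = np.array([142.0, 141.5, 143.2, 150.8, 151.1])  # beats/min
spo2 = np.array([96.0, 96.2, 95.8, 91.0, 90.5])             # % saturation
rso2 = np.array([68.0, 67.5, 67.9, 60.2, 59.8])             # NIRS regional O2

# Stacking the aligned univariate series column-wise yields a single
# multivariate time series of shape (timesteps, channels), which can be
# analyzed at the entity level rather than one signal at a time.
multivariate = np.stack([heart_rate, spo2, rso2], axis=1)
print(multivariate.shape)  # (5, 3)
```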

NEWBORN BRAIN INJURY

[0009] Despite recent advances in perinatal care, adverse outcomes for babies at high risk of brain injury continue to be prevalent, posing a challenge for neonatal care and public health.

[0010] The most frequently seen illnesses are neonatal encephalopathy (NE) and perinatal stroke in term newborns, as well as the consequences of germinal matrix-intraventricular hemorrhage (IVH) and white matter injury (WMI) in extremely low birth weight preterm infants. All of these babies are at an increased risk of acquiring neurological impairments such as cerebral palsy, cognitive delay, epilepsy, and others. Epidemiological studies estimate that over one million infants per year will have important neurological deficits.

NEONATAL SEIZURES DETECTION CASE ANALYSIS

[0011] The neonatal period, particularly the first week after birth, is the most susceptible time of human life for seizure development. Seizures are commonly related to acute brain injury, are associated with increased mortality and impairment, and may constitute a neurological emergency, making monitoring and accurate seizure identification critical components of newborn intensive care management. Neonatal seizures occur in 1 to 3 per 1,000 live births, with substantially higher rates reported in premature neonates.

[0012] However, seizure detection in the newborn may be considered a challenge. About 80% of neonatal seizures are not correlated with any clinical signs. Moreover, clinically suspected episodes may not show corresponding electrographic evidence of seizures and may be wrongly diagnosed. Previous studies have shown that treating subclinical seizures is associated with reduced seizure burden and better neurological outcomes. Seizure overdiagnosis and overtreatment are potentially harmful to the developing brain as well.

[0013] Conventional continuous long-term EEG (cEEG) is considered the gold standard for neonatal seizure detection. However, there are barriers to the implementation of this technology: cEEG requires skilled technologists and experienced neurologist interpretation, and may not be readily available in many centers, especially for continuous bedside monitoring. Continuous monitoring using a one- to two-channel amplitude-integrated EEG (aEEG) with concurrent unprocessed EEG has become an interesting option as a bedside tool for monitoring neonates at high risk. Several studies and reviews have assessed the accuracy of aEEG for seizure detection. Seizure activity is characterized in aEEG by a sudden change in background activity, seen as an abrupt rise in the minimum and maximum aEEG amplitudes correlated with a stereotypical, repeating form such as spikes or sharp waves, often with high amplitude in the raw EEG, with a total duration of at least ten seconds.
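
As a rough, non-clinical illustration of that aEEG criterion (an abrupt rise in both envelope traces sustained for at least ten seconds), the following sketch may help; the baseline estimate, rise threshold, and sampling rate are assumptions for illustration, not values from the application:

```python
import numpy as np

def aeeg_rise_candidate(lower_env, upper_env, fs=1.0,
                        rise_uv=5.0, min_duration_s=10.0):
    """Flag a span where both aEEG envelope traces rise above their
    median baseline and stay elevated for >= min_duration_s seconds.

    lower_env, upper_env: 1-D arrays of min/max aEEG amplitudes (uV)
    sampled at fs Hz. Thresholds here are illustrative only.
    """
    baseline_lo = np.median(lower_env)
    baseline_hi = np.median(upper_env)
    elevated = (lower_env > baseline_lo + rise_uv) & \
               (upper_env > baseline_hi + rise_uv)

    min_len = int(min_duration_s * fs)
    count = 0
    for i, flag in enumerate(elevated):
        count = count + 1 if flag else 0
        if count >= min_len:
            return True, i - min_len + 1  # candidate start index
    return False, None

# Toy usage: a 15-second elevated span in two minutes of 1 Hz envelopes.
lo = np.full(120, 5.0); hi = np.full(120, 10.0)
lo[60:75] += 8.0; hi[60:75] += 12.0
print(aeeg_rise_candidate(lo, hi))  # (True, 60)
```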

[0014] FIG. 1 illustrates a neonatal seizure moment.

[0015] In some embodiments, brain monitoring and neuroprotection strategies for infants at high risk are implemented on a large scale. In some embodiments, the system promotes longitudinal training and homogeneity of care by the use of standard internationally validated protocols. In some embodiments, the system uses technology and cloud computing to reach remote centers (nationally and internationally), granting specialized assistance and improving quality of care by reducing distances and breaking down frontiers. In some embodiments, the system concentrates experience in order to analyze a large amount of data and uses AI to create earlier diagnostic tools and new treatment algorithms. The system may store EEG at a frequency of 200 Hz (circa 17 million data points per baby per day), video fragments for the video EEG, and data from other vital signs such as heart rate, pulse oximetry, temperature, blood pressure, and regional tissue oxygenation every 5 seconds. The system may apply multivariate time-series anomaly detection to neonatal seizure detection.

DATA FLOW

[0016] FIG. 2 illustrates how data are transformed from multiple devices to the cloud service for training. Three types of sources are used for collecting signals: a near-infrared spectroscopy (NIRS) device, a GE Omni700 device, and EEG. The collected signals cover EEG, heart rate, temperature, pulse oximetry, and regional tissue oxygen saturation levels. The collected signals can be sent to the cloud service and stored for training. Users can manage models on the cloud and export a model for local inference. In the local box, the collected signals are screened with the trained model, and an alert is sent once anomalies are detected.
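
A minimal sketch of the local-inference and alerting stage of this data flow, with stubbed-in placeholders; read_latest_segment, ExportedModel, and send_alert are hypothetical stand-ins for the device readers, the cloud-exported model, and the alert channel, not names from the application:

```python
import random
import time

def read_latest_segment():
    """Stub: return the most recent multivariate signal window."""
    return [random.random() for _ in range(60)]

class ExportedModel:
    """Stub for a model trained in the cloud and exported for local use."""
    def predict_proba(self, segment):
        return sum(segment) / len(segment)  # placeholder anomaly score

def send_alert(message):
    print("ALERT:", message)  # stand-in for the real alerting channel

def monitor(model, poll_interval_s=5.0, threshold=0.9, max_polls=3):
    for _ in range(max_polls):                # bounded loop for the sketch
        segment = read_latest_segment()
        score = model.predict_proba(segment)  # anomaly score for the window
        if score >= threshold:
            send_alert(f"anomaly score {score:.2f}")
        time.sleep(poll_interval_s)           # 5-second vitals cadence

monitor(ExportedModel())
```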

HOW THE MODEL WORKS

[0017] A seizure moment (FIG. 1) may be defined as a segment of values whose pattern differs from normal segments. A real seizure usually lasts more than 10 seconds, so judging seizures from a single timestamp value could be inaccurate: normal behaviors such as blinking also trigger spike values in the EEG signal, and a single timestamp within the seizure moment could look normal. In seizure detection, we learn a time-series representation with contrastive learning and then build a classifier with the sparse labels. The learning framework is shown in FIG. 3.
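
The application does not specify the contrastive framework of FIG. 3, so the following is a minimal sketch of the general two-stage idea, assuming a small PyTorch convolutional encoder, jitter augmentation, and an InfoNCE-style loss; all names, sizes, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a multivariate segment to a normalized embedding."""
    def __init__(self, channels=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, x):  # x: (batch, channels, timesteps)
        return F.normalize(self.net(x), dim=1)

def info_nce(z1, z2, tau=0.1):
    """Two augmented views of the same segment are positives; every
    other segment in the batch serves as a negative."""
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

encoder = Encoder()
segments = torch.randn(16, 3, 200)                     # toy segment batch
view1 = segments + 0.05 * torch.randn_like(segments)   # jitter augmentation
view2 = segments + 0.05 * torch.randn_like(segments)
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()  # one contrastive representation-learning step
```

A classifier for the sparse labels would then be fit on the frozen embeddings, e.g., a logistic regression over `encoder(segments)`.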

INITIAL RESULTS

[0018] An initial experiment was evaluated on data from 7 newborns, while the representation and the classifier were trained on segments collected from different babies. We then generated representations for the 7 newborns with the trained models and calculated the probability of each segment being abnormal. Results are shown in the following table.

[0019] As an initial result, we achieved good results in some cases, but further improvement is needed to make the approach more robust.

EXAMPLE MACHINE LEARNING MODELS

[0020] In various embodiments, a wide variety of machine learning techniques may be used. Examples include different forms of supervised learning, unsupervised learning, and semi-supervised learning, such as decision trees, support vector machines (SVMs), regression, Bayesian networks, and genetic algorithms. Deep learning techniques such as neural networks, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTMs), transformers, attention models, and generative adversarial networks (GANs), may also be used. For example, various machine learning models may be used to predict whether an infant has an elevated risk of brain injury based on the time series of data of the infant.
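
As a concrete illustration of one listed family (not the application's actual model), here is a minimal sketch of an LSTM-based risk classifier over a multivariate time series; the channel counts, sizes, and segment lengths are assumptions:

```python
import torch
import torch.nn as nn

class RiskClassifier(nn.Module):
    """Toy LSTM that maps a multivariate segment to a risk logit."""
    def __init__(self, channels=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # elevated-risk logit

    def forward(self, x):                 # x: (batch, timesteps, channels)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])         # use the final hidden state

model = RiskClassifier()
batch = torch.randn(4, 200, 3)            # toy multivariate segments
risk_prob = torch.sigmoid(model(batch))   # P(elevated risk) per infant
```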

[0021] In various embodiments, the training technique for a machine learning model may be supervised, semi-supervised, or unsupervised. In supervised learning, the machine learning model may be trained with a set of labeled training samples. For example, for a machine learning model trained to predict whether an infant has an elevated risk of brain injury, the training samples may be time series of data of various infants who were known to have brain injury, together with normal infants that serve as control samples. The label for each training sample may be binary or multi-class. A binary label may indicate whether an infant had brain injury; multi-class labels may indicate which type of brain injury resulted. In training a machine learning model for identifying brain injury, the training samples may be data of other infants. In some cases, an unsupervised learning technique may be used, in which the samples used in training are not labeled. Various unsupervised learning techniques such as clustering may be used; for example, the data of infants who have elevated risks of brain injury may follow certain patterns and may be clustered together by an unsupervised learning technique. In some cases, the training may be semi-supervised, with a training set having a mix of labeled and unlabeled samples.
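
For the unsupervised route, a minimal clustering sketch on assumed synthetic features; KMeans, the feature construction, and the cluster count are illustrative choices, not the application's method:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for unlabeled segment representations: most segments
# look typical, a few follow a different pattern.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(95, 8))    # typical segments
atypical = rng.normal(4.0, 1.0, size=(5, 8))   # unusual-pattern segments
features = np.vstack([normal, atypical])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
sizes = np.bincount(kmeans.labels_)
print("cluster sizes:", sizes)  # the small cluster is a candidate anomaly group
```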

[0022] A machine learning model may be associated with an objective function, which generates a metric value that describes the objective goal of the training process. For example, the training may intend to reduce the error rate of the model in predicting whether the infants in the training samples had brain injury. In such a case, the objective function may monitor the error rate of the machine learning model; such an objective function may be called a loss function. Other forms of objective functions may also be used, particularly for unsupervised learning models whose error rates are not easily determined due to the lack of labels. In a prediction task, the objective function may correspond to the difference between the model's predicted outcomes and the manually recorded outcomes in the training sets. In various embodiments, the error rate may be measured as cross-entropy loss, L1 loss (e.g., the sum of absolute differences between the predicted values and the actual values), or L2 loss (e.g., the sum of squared distances).

[0023] Referring to FIG. 3, the structure of an example neural network is illustrated, in accordance with some embodiments. The neural network 300 may receive an input and generate an output. The neural network 300 may include different kinds of layers, such as convolutional layers, pooling layers, recurrent layers, fully connected layers, and custom layers. A convolutional layer convolves the input of the layer (e.g., one or more time series) with one or more kernels to generate feature maps that are filtered by the kernels. Each convolution result may be associated with an activation function. In some embodiments, a pair of convolutional layers may be followed by a recurrent layer that includes one or more feedback loops. The feedback may be used to account for spatial relationships of features in text or temporal relationships of objects. The layers may be followed by multiple fully connected layers that have nodes connected to each other. The fully connected layers may be used for classification and object detection. In one embodiment, one or more custom layers may also be present for the generation of a specific format of output. Recurrent layers may be used to analyze the temporal relationships of the time series of data.
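
The three error measures named above can be made concrete with a small sketch on toy values; the tensors and the binary formulation are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

pred_logits = torch.tensor([1.2, -0.7, 2.3])  # toy model outputs
labels = torch.tensor([1.0, 0.0, 1.0])        # brain injury: yes/no

# Cross-entropy loss on the raw logits.
bce = F.binary_cross_entropy_with_logits(pred_logits, labels)

pred = torch.sigmoid(pred_logits)
l1 = torch.sum(torch.abs(pred - labels))  # L1: sum of absolute differences
l2 = torch.sum((pred - labels) ** 2)      # L2: sum of squared distances
print(bce.item(), l1.item(), l2.item())
```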

[0024] The order of layers and the number of layers of the neural network 300 may vary in different embodiments. In various embodiments, a neural network 300 includes one or more layers 302, 304, and 306, but may or may not include any pooling layer or recurrent layer. If a pooling layer is present, not all convolutional layers are always followed by a pooling layer. A recurrent layer may also be positioned differently at other locations of the CNN. For each convolutional layer, the sizes of the kernels (e.g., 3x3, 5x5, 7x7, etc.) and the number of kernels that are allowed to be learned may differ from other convolutional layers.

[0025] A machine learning model may include certain layers, nodes, kernels, and/or coefficients. Training of a neural network may include forward propagation and backpropagation. Each layer in a neural network may include one or more nodes, which may be fully or partially connected to other nodes in adjacent layers. In forward propagation, the neural network performs the computation in the forward direction based on the outputs of the preceding layer. The operation of a node may be defined by one or more functions, which may include various computation operations such as convolution of data with one or more kernels, pooling, the recurrent loop in an RNN, various gates in an LSTM, etc. The functions may also include an activation function that adjusts the weight of the output of the node. Nodes in different layers may be associated with different functions.

[0026] Each of the functions in the neural network may be associated with different coefficients (e.g., weights and kernel coefficients) that are adjustable during training. In addition, some of the nodes in a neural network may be associated with an activation function that decides the weight of the output of the node in forward propagation. Common activation functions include step functions, linear functions, sigmoid functions, hyperbolic tangent functions (tanh), and rectified linear unit (ReLU) functions. After an input is provided to the neural network and passes through it in the forward direction, the results may be compared to the training labels or other values in the training set to determine the neural network's performance. The process of prediction may be repeated for other data in the training sets to compute the value of the objective function in a particular training round. In turn, the neural network performs backpropagation, using gradient descent such as stochastic gradient descent (SGD), to adjust the coefficients in the various functions and improve the value of the objective function.
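
A minimal sketch of the forward propagation, objective evaluation, backpropagation, and SGD update cycle described in paragraphs [0024] to [0026]; the model, data, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy model, synthetic data, and illustrative hyperparameters only.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(32, 8)                         # toy features per infant
y = (x.sum(dim=1, keepdim=True) > 0).float()   # toy binary labels

for step in range(100):          # repeated rounds until sufficiently stable
    optimizer.zero_grad()
    logits = model(x)            # forward propagation
    loss = loss_fn(logits, y)    # objective (loss) function value
    loss.backward()              # backpropagation of gradients
    optimizer.step()             # SGD adjusts the coefficients
```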

[0027] Multiple rounds of forward propagation and backpropagation may be iteratively performed. Training may be completed when the objective function has become sufficiently stable (e.g., the machine learning model has converged) or after a predetermined number of rounds for a particular set of training samples. The trained machine learning model can be used for performing prediction or another suitable task for which the model is trained.

COMPUTING MACHINE ARCHITECTURE

[0028] FIG. 4 is a block diagram illustrating components of an example computing machine that is capable of reading instructions from a computer-readable medium and executing them in a processor. A computer described herein may include a single computing machine as shown in FIG. 4, a virtual machine, a distributed computing system that includes multiple nodes of computing machines as shown in FIG. 4, or any other suitable arrangement of computing devices.

[0029] By way of example, FIG. 4 shows a diagrammatic representation of a computing machine in the example form of a computer system 400 within which instructions 424 (e.g., software, program code, or machine code), which may be stored in a computer-readable medium, may be executed for causing the machine to perform any one or more of the processes discussed herein. In some embodiments, the computing machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.

[0030] The structure of a computing machine described in FIG. 4 may correspond to any software, hardware, or combined components described in this disclosure, including but not limited to the various engines, modules, interfaces, terminals, and machines shown in various figures. While FIG. 4 shows various hardware and software elements, each of the components described in this disclosure may include additional or fewer elements.

[0031] By way of example, a computing machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, an internet of things (IoT) device, a switch or bridge, or any machine capable of executing instructions 424 that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute instructions 424 to perform any one or more of the methodologies discussed herein.

[0032] The example computer system 400 includes one or more processors (generally, processor 402) (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application-specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 404, and a static memory 406, which are configured to communicate with each other via a bus 408. The computer system 400 may further include a graphics display unit 410 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)). The computer system 400 may also include an alphanumeric input device 412 (e.g., a keyboard), a cursor control device 414 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 416, a signal generation device 418 (e.g., a speaker), and a network interface device 420, which are also configured to communicate via the bus 408.

[0033] The storage unit 416 includes a computer-readable medium 422 on which are stored instructions 424 embodying any one or more of the methodologies or functions described herein. The instructions 424 may also reside, completely or at least partially, within the main memory 404 or within the processor 402 (e.g., within a processor's cache memory) during execution thereof by the computer system 400, the main memory 404 and the processor 402 also constituting computer-readable media. The instructions 424 may be transmitted or received over a network 426 via the network interface device 420.

[0034] While the computer-readable medium 422 is shown in an example embodiment to be a single medium, the term "computer-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 424). The computer-readable medium may include any medium that is capable of storing instructions (e.g., instructions 424) for execution by the machine and that causes the machine to perform any one or more of the methodologies disclosed herein. The computer-readable medium may include, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media. The computer-readable medium does not include a transitory medium such as a signal or a carrier wave.

ADDITIONAL CONFIGURATION CONSIDERATIONS


[0036] Certain embodiments are described herein as including logic or a number of components, engines, modules, or mechanisms. Engines may constitute either software modules (e.g., code embodied on a computer-readable medium) or hardware modules. A hardware engine is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware engines of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware engine that operates to perform certain operations as described herein.

[0037] In various embodiments, a hardware engine may be implemented mechanically or electronically. For example, a hardware engine may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware engine may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or another programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware engine mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

[0038] The various operations of example methods described herein may be performed, at least partially, by one or more processors, e.g., processor 402, that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented engines that operate to perform one or more operations or functions. The engines referred to herein may, in some example embodiments, comprise processor-implemented engines.

[0039] The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.

[0040] Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a similar system or process through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes, and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.