Title:
SYSTEM AND METHOD FOR IMPROVED CLASSIFICATION
Document Type and Number:
WIPO Patent Application WO/2024/083348
Kind Code:
A1
Abstract:
A data classification system is proposed for classifying data. The data classification system comprises a heuristic input; and a processor in communication with the heuristic input, wherein the processor is configured to: receive a first data classification of a datum from an initial data classification system; receive one or more heuristics from the heuristic input; and generate a second data classification of the datum based on the first data classification and the one or more heuristics.

Inventors:
MUCKLEY LEO (IE)
LOOMBA RADHIKA (IE)
Application Number:
PCT/EP2022/084459
Publication Date:
April 25, 2024
Filing Date:
December 05, 2022
Assignee:
EATON INTELLIGENT POWER LTD (IE)
International Classes:
G06N5/022; G06N20/00
Other References:
JULIO VILLENA-ROMÁN; COLLADA-PÉREZ SONIA; LANA-SERRANO SARA; GONZÁLEZ-CRISTÓBAL JOSÉ C; SPAIN S A: "Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization Daedalus - Data, Decisions and Language", TWENTY-FOURTH INTERNATIONAL FLAIRS CONFERENCE, 18 May 2011 (2011-05-18), pages 323 - 328, XP055616968
BROOKS J P ET AL: "Conjecturing-Based Computational Discovery of Patterns in Data", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 23 November 2020 (2020-11-23), XP081819264
Attorney, Agent or Firm:
NOVAGRAAF TECHNOLOGIES (FR)
Claims:
CLAIMS

1. A data classification system comprising: a heuristic input; and a processor in communication with the heuristic input, wherein the processor is configured to: receive a first data classification of a datum from an initial data classification system; receive one or more heuristics from the heuristic input; and generate a second data classification of the datum based on the first data classification and the one or more heuristics.

2. The system of claim 1, wherein the heuristic input comprises an input device having a user interface configured to facilitate an input from a user, the input providing one or more heuristics.

3. The system of claim 1 or claim 2, wherein the heuristic input comprises a heuristic database comprising one or more predetermined heuristics.

4. The system of any preceding claim, wherein the processor is configured to generate the second data classification by: determining that a heuristic of the one or more heuristics is applicable to the first data classification; and applying the determined heuristic to the first data classification.

5. The system of claim 4, wherein the processor is further configured to: determine that a further heuristic of the one or more heuristics is applicable to the second data classification; and apply the determined further heuristic to the second data classification.

6. The system of any preceding claim, wherein the processor is further configured to: receive a set of first data classifications from the initial classification system; and generate a set of second data classifications based on the set of first data classifications and the one or more heuristics.

7. The system of claim 6, wherein the processor is configured to generate the set of second data classifications by:

(a) preparing the set of first data classifications;

(b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications;

(c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and

(d) determining that one or more further heuristics are applicable to the set of intermediate data classifications;

(e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and

(f) outputting the set of intermediate data classifications as the set of second data classifications.

8. A data classification system comprising: a data source; a heuristic input; and a processor in communication with the data source and the heuristic input, wherein the processor is configured to: receive an input datum from the data source; generate a first data classification of the datum; receive one or more heuristics from the heuristic input; and generate a second data classification of the datum based on the first data classification and the one or more heuristics.

9. The system of claim 8, wherein the processor is further configured to: receive a set of input data from the data source; generate a set of first data classifications corresponding to the set of input data; and generate a set of second data classifications of the set of input data based on the set of first data classifications and the one or more heuristics.

10. The system of claim 9, wherein the processor is configured to generate the set of second data classifications by:

(a) preparing the set of first data classifications;

(b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications;

(c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and

(d) determining that one or more further heuristics are applicable to the set of intermediate data classifications;

(e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and

(f) outputting the set of intermediate data classifications as the set of second data classifications.

11. A data classification method carried out by a processor, comprising the steps of: receiving, from an initial data classification system, a first data classification of a datum; receiving, from a heuristic input, one or more heuristics; and generating a second data classification of the datum based on the first data classification and the one or more heuristics.

12. The method of claim 11, wherein the second data classification is generated by: determining that a heuristic of the one or more heuristics is applicable to the first data classification; and applying the determined heuristic to the first data classification.

13. The method of claim 12, further comprising the steps of: determining that a further heuristic of the one or more heuristics is applicable to the second data classification; and applying the determined further heuristic to the second data classification.

14. The method of any of claims 11 to 13, further comprising the steps of: receiving a set of first data classifications from the initial classification system; and generating a set of second data classifications based on the set of first data classifications and the one or more heuristics.

15. The method of claim 14, wherein the set of second data classifications is generated by:

(a) preparing the set of first data classifications;

(b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications;

(c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and

(d) determining that one or more further heuristics are applicable to the set of intermediate data classifications;

(e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and

(f) outputting the set of intermediate data classifications as the set of second data classifications.

Description:
SYSTEM AND METHOD FOR IMPROVED CLASSIFICATION

Field of the invention

The present invention relates to a system and method for improving the classification of input data.

Background to the invention

There are broadly two types of systems used to embed artificial intelligence. The first type of system is an inference-based, or rule-based, system, wherein an objective of the system is to create a direct mapping between an input, such as data representative of a real-world item, and an output, such as an item classification. The second type of system is a learning-based system, wherein an objective of the system is to learn patterns from input data. An example learning-based system may utilise probabilistic modelling.

A problem present with these two types of systems is that domain expertise may be difficult to integrate. Furthermore, it may be difficult to implement domain adaptation.

The inference-based system may utilise a set of rules to produce a pre-defined outcome based on the set of rules. This type of system may be highly dependent on context provided in a machine-readable format, and may have limited applicability. The probability of achieving a direct mapping between an input and an output may be low for one context, but high for another context. This probability inconsistency may lead to an overall reduced ease of use with this system.

The learning-based system may define its own set of rules that are based on data outputs, and may gradually develop and adapt over time in accordance with training data provided to it. However, this type of system may not include functionality for integrating domain expertise specific to a problem that the system aims to solve. Instead, the learning-based system may “learn” relevant information over time. The system’s efficacy may be relatively low whilst the system is learning.

The present disclosure has been devised to mitigate at least some of the above-mentioned problems.

Summary of the invention

In accordance with a first aspect of the present disclosure, there is provided a classification system comprising: a heuristic input; and a processor in communication with the heuristic input, wherein the processor is configured to: receive a first data classification of a datum from an initial data classification system; receive one or more heuristics from the heuristic input; and generate a second data classification of the datum based on the first data classification and the one or more heuristics.

The term ‘heuristic input’ may be understood as a means for providing one or more heuristics to the system. The term ‘heuristic’ may be understood as a domain-specific rule or technique that may facilitate a context-specific decision or classification.

The term ‘initial data classification system’ may be understood as a system configured to provide a first data classification of an input datum. The initial data classification system may utilize any suitable machine learning classification model including, but not limited to, a probabilistic model; a rule-based model; a learning-based model; or a pre-trained model. Further example machine learning classification models include Bayesian Networks for probabilistic modelling; Frequent-Pattern Growth algorithm for rule-based pattern mining; Decision Tree classifier for a learning-based model; and a pre-trained Deep Learning model, wherein the pre-trained Deep Learning model is pre-trained on a different dataset.

The term ‘first data classification’ may be understood as an initial classification of an input datum provided by the initial data classification system. The first data classification may be a classification on an input datum before context-specific classification occurs. Accordingly, the first data classification may be a classification based on data that may not provide enough information to provide a classification with a high or acceptable degree of confidence or level of detail.

The one or more heuristics may be used by the system to generate the second data classification based on the first data classification and the one or more heuristics. The present system may therefore apply a heuristic-based approach built using contextual post-processing rules. In particular, the system may seamlessly integrate accumulated domain expertise in a highly accessible manner, without the need for additional model re-training.

The present system may be relevant for multiple domains, including but not limited to: creating rules for the purpose of classifying supply chain transactions; integrating rules into the generation and understanding of natural language; and model selection and weight selection in weighted model ensembles, such as symptom mapping for health and/or medical based application.

An advantage of this system may be the improvement of a plurality of key performance indicators, such as providing more detailed classifications, generating language with an advanced level of vocabulary, and increasing classification accuracy. Furthermore, the present system may improve a response time for reaching a particular output using the added context or heuristic without negatively impacting computation time or computation complexity.

Accordingly, the present system may provide a means for improving model efficacy for context-specific modelling, in both inference-based systems and learning-based systems.

In some embodiments, the heuristic input comprises an input device having a user interface configured to facilitate an input from a user, the input providing one or more heuristics. In this way, the one or more heuristics may be input by a user, such as a medical professional, via the heuristic input. Advantageously, the heuristic input may provide an improved means for providing the one or more heuristics to the system.

Alternatively or additionally, the heuristic input comprises a heuristic database comprising one or more predetermined heuristics. Alternatively or additionally, the heuristic input may be hard coded into a codebase of the system, or read from a configuration file. The term ‘predetermined heuristic’ may be understood as a heuristic that has been pre-made. In this way, the heuristic database may provide a historic catalogue of heuristics that the system may utilize. Advantageously, as more heuristics are added to the heuristic database, the system may further increase in classification efficacy.
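By way of non-limiting illustration only, reading predetermined heuristics from a configuration source might be sketched as follows. The JSON layout, the field names `name`, `when_label` and `then_label`, and the function `load_heuristics` are hypothetical and are not taken from the present disclosure; the labels "A" and "B" are placeholders.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Heuristic:
    """A predetermined heuristic (hypothetical, simplified shape)."""
    name: str
    when_label: str  # first data classification the heuristic is associated with
    then_label: str  # classification produced when the heuristic is applied


# Hypothetical configuration text; it could equally be read from a
# configuration file or returned by a heuristic database query.
CONFIG_TEXT = """
[
  {"name": "rule-1", "when_label": "A", "then_label": "B"}
]
"""


def load_heuristics(text: str) -> list:
    """Parse the configuration text into a list of Heuristic objects."""
    return [Heuristic(**entry) for entry in json.loads(text)]


heuristics = load_heuristics(CONFIG_TEXT)
```

In such a sketch, adding a heuristic amounts to adding an entry to the configuration, with no change to the initial data classification system.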

Preferably, the processor is configured to generate the second data classification by: determining that a heuristic of the one or more heuristics is applicable to the first data classification; and applying the determined heuristic to the first data classification. For example, if an input datum is representative of a screw, the first data classification of the screw may be ‘fasteners’, and the system may determine that the input datum and first data classification match a predetermined heuristic. The system may then apply the predetermined heuristic such that a second data classification is generated, for example ‘turned parts’.
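A minimal sketch of this determine-then-apply step is given below, by way of example only. The disclosure does not state the condition under which the screw heuristic matches, so the `process == "turning"` test is an assumption made purely for illustration, as are the names `Heuristic`, `applies`, `new_label` and `second_classification`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Heuristic:
    # Predicate over (input datum, current classification); hypothetical shape.
    applies: Callable[[Mapping[str, object], str], bool]
    new_label: str


def second_classification(datum, first_label, heuristics):
    """Return the label of the first applicable heuristic, else keep the first label."""
    for heuristic in heuristics:
        if heuristic.applies(datum, first_label):
            return heuristic.new_label
    return first_label


# Assumed condition: a fastener produced by turning is classed as a turned part.
turned = Heuristic(
    applies=lambda d, label: label == "fasteners" and d.get("process") == "turning",
    new_label="turned parts",
)

screw = {"item": "screw", "process": "turning"}
result = second_classification(screw, "fasteners", [turned])
unchanged = second_classification({"item": "screw"}, "fasteners", [turned])
```

Where no heuristic matches, the first data classification is simply retained, as with `unchanged` above.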

In some embodiments, the processor is further configured to: determine that a further heuristic of the one or more heuristics is applicable to the second data classification; and apply the determined further heuristic to the second data classification. In this way, in cases where the context changes after a heuristic is applied, the system may determine further heuristics that are applicable to the second data classification after said context has changed. Advantageously, the system may provide an improved classification efficacy.

In some embodiments, the processor is further configured to: receive a set of first data classifications from the initial classification system; and generate a set of second data classifications based on the set of first data classifications and the one or more heuristics. In this way, a first data classification may be generated for each input datum of the set of input data, and a second data classification may also be generated for each input datum of the set of input data. Advantageously, the system may provide an improved classification efficacy for a plurality of data.

In some embodiments, the processor is configured to generate the set of second data classifications by: (a) preparing the set of first data classifications; (b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications; (c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and (d) determining that one or more further heuristics are applicable to the set of intermediate data classifications; (e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and (f) outputting the set of intermediate data classifications as the set of second data classifications.
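Steps (a) to (f) may be sketched, under stated assumptions, as a loop that repeats until a full pass applies no heuristic. The representation of a heuristic as a function returning either a new label or `None`, the names `refine`, `x_to_y` and `y_to_z`, and the `max_rounds` guard against heuristics that cycle are all assumptions of this sketch rather than features of the disclosure.

```python
from typing import Callable, Optional

# A rule returns a new label when it is applicable, or None when it is not.
Rule = Callable[[dict, str], Optional[str]]


def refine(data, first_labels, rules, max_rounds=100):
    labels = list(first_labels)                  # (a) prepare the first classifications
    for _ in range(max_rounds):                  # guard against cycling rules (assumption)
        changed = False
        for i, datum in enumerate(data):
            for rule in rules:
                new = rule(datum, labels[i])     # (b)/(d) is a heuristic applicable?
                if new is not None and new != labels[i]:
                    labels[i] = new              # (c) apply it: intermediate classification
                    changed = True
                    break                        # at most one heuristic per datum per pass
        if not changed:                          # (e) stop once nothing is applicable
            return labels                        # (f) output the second classifications
    raise RuntimeError("heuristics did not settle within max_rounds")


# Hypothetical chained rules: X -> Y always, Y -> Z only for flagged data.
def x_to_y(datum, label):
    return "Y" if label == "X" else None


def y_to_z(datum, label):
    return "Z" if label == "Y" and datum.get("flag") else None


out = refine([{"flag": True}, {"flag": False}], ["X", "X"], [x_to_y, y_to_z])
```

The second rule only becomes applicable after the first has changed the label, which is why the loop has to re-check the intermediate classifications.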

Advantageously, the present system may provide the technical benefit of being more computationally efficient. In an example of an alternative learning method, new domain specific information/heuristics could be added to a training feature set prior to applying the learning algorithm. The addition of new features to a learning procedure may make the method less computationally efficient as additional parameters need to be learned. However, in the proposed system, no additional features are added and therefore no additional parameters need to be learned. The additional step (i.e., the heuristic-based step) in the proposed system may be more computationally efficient than the optimisation of additional features in the alternative learning method.

In accordance with a second aspect of the present disclosure, there is provided a data classification system comprising: a data source; a heuristic input; and a processor in communication with the data source and the heuristic input, wherein the processor is configured to: receive an input datum from the data source; generate a first data classification of the datum; receive one or more heuristics from the heuristic input; and generate a second data classification of the datum based on the first data classification and the one or more heuristics.

The term ‘data source’ may be understood as a source of input data. The data source may be a database of input data to be classified. For example, the data source may be a data library comprising inventory data corresponding to real-world objects, such as fasteners, electrical wires, and other real-world objects. The processor may import the data from the data source for classification. In this way, the system of the second aspect may provide a means for generating a first data classification, and subsequently generating a second data classification.

In some embodiments, the processor is configured to: receive a set of input data from the data source; generate a set of first data classifications corresponding to the set of input data; and generate a set of second data classifications of the set of input data based on the set of first data classifications and the one or more heuristics. In this way, a first data classification may be generated for each input datum of the set of input data, and a second data classification may also be generated for each input datum of the set of input data. Advantageously, the system may provide an improved classification efficacy for a plurality of data.

In some embodiments, the processor is configured to generate the set of second data classifications by: (a) preparing the set of first data classifications; (b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications; (c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and (d) determining that one or more further heuristics are applicable to the set of intermediate data classifications; (e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and (f) outputting the set of intermediate data classifications as the set of second data classifications.

In accordance with a third aspect of the present disclosure, there is provided a data classification method carried out by a processor, comprising the steps of: receiving, from an initial data classification system, a first data classification of a datum; receiving, from a heuristic input, one or more heuristics; and generating a second data classification of the datum based on the first data classification and the one or more heuristics.

In some embodiments, the second data classification is generated by: determining that a heuristic of the one or more heuristics is applicable to the first data classification; and applying the determined heuristic to the first data classification.

In some embodiments, the method further comprises steps of: determining that a further heuristic of the one or more heuristics is applicable to the second data classification; and applying the determined further heuristic to the second data classification.

In some embodiments, the method further comprises the steps of: receiving a set of first data classifications from the initial classification system; and generating a set of second data classifications based on the set of first data classifications and the one or more heuristics.

In some embodiments, the set of second data classifications is generated by: (a) preparing the set of first data classifications; (b) determining that one or more first heuristics are applicable to one or more corresponding first data classifications of the set of first data classifications; (c) generating a set of intermediate data classifications by applying the one or more first heuristics to the one or more corresponding first data classifications of the set of first data classifications, thereby generating a set of intermediate data classifications; and (d) determining that one or more further heuristics are applicable to the set of intermediate data classifications; (e) iterating steps (c) and (d) using the set of intermediate data classifications in place of the set of first data classifications until it is determined that no further heuristics are applicable in step (d); and (f) outputting the set of intermediate data classifications as the set of second data classifications.

It will be appreciated that any features described herein as being suitable for incorporation into one or more aspects or embodiments of the present disclosure are intended to be generalizable across any and all aspects and embodiments of the present disclosure. Other aspects of the present disclosure can be understood by those skilled in the art in light of the description, the claims, and the drawings of the present disclosure. The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.

Brief Description of the Drawings

The invention will now be described by way of example only with reference to the following Figures in which:

Figure 1 shows a schematic view of a classification system in accordance with a first aspect of the present disclosure.

Figure 2 shows a flow diagram of a modelling method in accordance with a second aspect of the present disclosure.

Figure 3 shows a schematic view of a classification system in accordance with a third aspect of the present disclosure.

Figure 4 shows a flow diagram of a modelling method in accordance with a fourth aspect of the present disclosure.

Detailed Description

Figure 1 shows a schematic view of a classification system 100 in accordance with a first aspect of the present disclosure.

The system 100 comprises a processor 102 and a heuristic input 104. The system 100 may also comprise further components as is known in the art, such as receivers, routers, transmitters, and processors. The system 100 is in communication with a data classification system 106.

The processor 102 is in communication with the heuristic input 104, and other components of the system 100 that are not shown. The processor 102 is configured to facilitate and execute various functions of the system 100.

The heuristic input 104 is a means for facilitating access to or providing one or more heuristics to the processor 102. In the present example, the heuristic input 104 is a heuristic database 104 comprising a plurality of heuristics. The plurality of heuristics are established by a domain expert, wherein each heuristic is associated with at least one first data classification.

The data classification system 106 is configured to provide one or more first data classifications of one or more input data to the system 100 using known inference-based or learning-based algorithms. The data classification system 106 may utilize any suitable data classification model for applying to the input data to provide corresponding first data classifications. The data classification model may be a rule-based model, a pre-trained model, a learning-based model, or any other suitable data classification model.

Figure 2 shows a flow diagram of a modelling method 200 in accordance with a second aspect of the present disclosure. The method 200 provides an iterative approach for providing context-specific modelling of input data.

In a first step 202 of the method 200, the processor 102 receives a set of first data classifications of the input data from the data classification system 106. The first data classifications are initial classifications of the input data determined by the data classification system 106. The first data classifications may be selected from a pre-determined set of first data classifications. The first data classifications provide an initial context for classifying each input datum of the input data.

In the present example, the input data comprises: a first input datum comprising a patient age < 12, and heart defect = true; a second input datum comprising a patient age < 12, and heart defect = false; and a third input datum comprising a patient age > 12, and heart defect = false.

Furthermore, the processor 102 receives the set of first classifications, comprising a first data classification of the first input datum of TREATMENT = A; a first data classification of the second input datum of TREATMENT = A; and a first data classification of the third input datum of TREATMENT = A.

In step 204, the processor 102 receives, or retrieves, one or more heuristics of the plurality of heuristics from the heuristic input 104.

In the present example, the one or more heuristics comprises a first heuristic corresponding to patient age < 12, wherein if patient age < 12, and if TREATMENT = A, the classification is TREATMENT = B. The one or more heuristics further comprises a second heuristic corresponding to heart defect wherein if TREATMENT = B, and heart defect = true, the classification is TREATMENT = C.

In step 206, the processor 102 prepares the set of first data classifications. In particular, in step 206, the processor 102 initialises a queue of first data classifications based on the set of first data classifications received from the data classification system 106. The queue represents the set of first data classifications in series. In the present example, the queue is as follows: the first data classification of the first datum, the first data classification of the second datum, then the first data classification of the third datum.

Additionally, in step 206, the processor 102 initialises a first heuristics flag as ‘True’. The first heuristics flag is a global heuristics flag indicative of whether the method 200 is to take place, and when it is set to ‘False’, the method 200 stops and the first classification is retained.

The processor 102 also initialises a second heuristic flag as ‘False’, the second heuristic flag being indicative of whether a heuristic has been applied in a current iteration. The processor 102 also initialises a local counter to an initial value of, for example, 0, the local counter having a maximum value corresponding to the number of first data classifications.

In step 208, the processor 102 determines that one or more first heuristics are applicable to the set of first data classifications. In particular, in step 208, the processor 102 determines or checks that a heuristic is applicable to a first data classification at the end of the queue, and sets the second heuristic flag to ‘True’. If the processor 102 determines that no heuristic is applicable to the first data classification at the end of the queue, the second heuristic flag remains as ‘False’. The processor 102 then amends the queue such that the checked first data classification is moved to the front of the queue, and iterates the local counter by 1. The processor 102 repeats this until the counter reaches the maximum value, indicating that each first data classification has been checked. Accordingly, if any one of the first data classifications in the queue has an applicable heuristic, the second heuristic flag is set to ‘True’.
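One possible reading of this queue pass is sketched below using a double-ended queue. The function name `check_pass`, the tuple layout of the queue entries and the `is_applicable` callback are hypothetical; only the end-of-queue check, the move to the front, the counter and the flag follow the description above.

```python
from collections import deque


def check_pass(queue, is_applicable):
    """Check every queued classification once; report whether any heuristic applies."""
    applied_flag = False        # the "second heuristic flag" for this iteration
    counter = 0                 # local counter, starting from its initial value
    maximum = len(queue)        # one check per queued classification
    while counter < maximum:
        item = queue.pop()      # take the classification at the end of the queue
        if is_applicable(item):
            applied_flag = True
        queue.appendleft(item)  # move the checked classification to the front
        counter += 1
    return applied_flag


# Hypothetical entries: (datum identifier, current classification).
queue = deque([("d1", "A"), ("d2", "A"), ("d3", "B")])
before = list(queue)
flag = check_pass(queue, lambda item: item[1] == "B")
after = list(queue)
```

Because each entry is moved from the end to the front exactly once, the queue returns to its original order when the counter reaches its maximum, so the next iteration starts from the same arrangement.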

In the present example, the processor 102 determines that the first heuristic is applicable to the first classification of the first datum, because the first heuristic is applicable if patient age < 12 and the classification is TREATMENT = A. Accordingly, the second heuristic flag is set to ‘True’. Additionally, the processor 102 determines that the first heuristic is applicable to the first classification of the second datum, but determines that no heuristic is applicable to the first classification of the third datum because the patient age > 12.

In step 210, the processor 102 generates a set of intermediate data classifications by applying the one or more first heuristics to the appropriate first data classifications of the set of first data classifications. This step 210 may occur in parallel with step 208, in that once it is determined that a heuristic is applicable to a first data classification, the heuristic is applied to said first data classification at substantially the same time that the second heuristic flag is set to ‘True’.

In the present example, an intermediate data classification of the first datum is TREATMENT = B by applying the first heuristic to the first data classification of the first datum. Additionally, the intermediate data classification of the second datum is TREATMENT = B by applying the first heuristic to the first data classification of the second datum. Since the processor 102 determines that no heuristic was applicable to the first classification of the third datum, the intermediate data classification inherits the first data classification of TREATMENT = A for the third datum.

In step 212, the processor 102 determines that one or more further heuristics are applicable to the set of intermediate data classifications. In particular, the processor 102 determines the value of the second heuristic flag. If the second heuristic flag is set to ‘True’, indicating that a heuristic has been applied to at least one of the first data classifications such that a context has changed and that one or more further heuristics may be applied to the intermediate data classifications, the method returns to step 208. In this way, steps 208 to 212 are iterated whenever a first or intermediate data classification has a heuristic applied. If the second heuristic flag is set to ‘False’, indicating that no more heuristics have been applied to any one of the first or intermediate data classifications, such that the context has not changed since a preceding iteration, the method 200 proceeds to step 216.

In the present example, the processor 102 returns to step 208 and determines that the second heuristic is applicable to the intermediate data classification of the first datum, because the second heuristic is applicable if heart defect = true and TREATMENT = B. Additionally, the processor 102 determines that no heuristic is applicable to the intermediate classification of the second datum, because heart defect = false, and no heuristic is applicable to the intermediate classification of the third datum because TREATMENT = A and heart defect = false.

Accordingly, the second heuristic flag is set to ‘True’ again. Another intermediate data classification of TREATMENT = C is generated for the first datum by applying the second heuristic to the intermediate data classification of the first datum. Since the processor 102 determines that no heuristic was applicable to the intermediate classifications of the second and third data, their intermediate data classifications inherit the preceding intermediate data classifications. When the processor 102 returns again to step 208, it will determine that no further heuristics are applicable, and the second heuristic flag is set to ‘False’.

In step 216, once the processor 102 has determined that the second heuristic flag is set to ‘False’, the processor 102 outputs a set of second data classifications corresponding to the set of intermediate classifications, wherein each second data classification corresponds to a first data classification that has had at least one heuristic applied. It shall be appreciated that some first data classifications will remain unchanged, and that the set of second data classifications may comprise unmodified first data classifications.

In the present example, the set of second data classifications includes TREATMENT = C for the first datum, TREATMENT = B for the second datum, and TREATMENT = A for the third datum.

Accordingly, the modelling method 200 provides an improved means of producing a data classification by adapting a pre-determined classification to domain knowledge.
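The iteration of steps 208 to 216 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `Heuristic`, `apply_heuristics` and `max_iterations`, the patient ages, and the exact condition of the first heuristic (TREATMENT = A and age below 12) are assumptions made for illustration; only the second heuristic's condition and the resulting classifications are taken from the example above. For brevity the sketch sets the flag and applies the heuristic in the same pass.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Heuristic:
    applies: Callable[[dict], bool]  # condition on the datum and its current classification
    new_treatment: str               # classification assigned when the condition holds


def apply_heuristics(data, heuristics, max_iterations=100):
    """Repeat steps 208 to 212 until a pass applies no heuristic, then return (step 216)."""
    current = [dict(d) for d in data]  # copies, so the first classifications are preserved
    for _ in range(max_iterations):    # assumed safeguard against heuristics that cycle
        flag = False                   # plays the role of the second heuristic flag
        for datum in current:
            for h in heuristics:
                if h.applies(datum):
                    datum["TREATMENT"] = h.new_treatment
                    flag = True
                    break              # at most one heuristic per datum per pass
        if not flag:
            break                      # context unchanged since the preceding pass
    return current


heuristics = [
    # First heuristic: the condition is an assumption for this sketch.
    Heuristic(lambda d: d["TREATMENT"] == "A" and d["age"] < 12, "B"),
    # Second heuristic: heart defect = true and TREATMENT = B.
    Heuristic(lambda d: d["TREATMENT"] == "B" and d["heart_defect"], "C"),
]

first_classifications = [
    {"age": 8, "heart_defect": True, "TREATMENT": "A"},    # first datum
    {"age": 10, "heart_defect": False, "TREATMENT": "A"},  # second datum
    {"age": 40, "heart_defect": False, "TREATMENT": "A"},  # third datum
]

second_classifications = apply_heuristics(first_classifications, heuristics)
print([d["TREATMENT"] for d in second_classifications])  # ['C', 'B', 'A']
```

Under these assumptions the loop runs three passes: the first two each apply at least one heuristic, and the third applies none, which ends the iteration.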

Figure 3 shows a schematic view of a classification system 300 in accordance with a third aspect of the present disclosure. In the present example, the system 300 and the method 400 are used in the context of disease determination. This context is for illustration purposes only and the skilled person will understand that the system 300 and method 400 may be used for any suitable context.

The system 300 comprises a processor 302; a heuristic input 304; a data classification system 306; and a data source 308. The system 300 may also comprise further components as is known in the art, such as receivers, routers, transmitters, and processors.

The processor 302 is in communication with the other components of the system 300. The processor 302 is configured to facilitate and execute various functions of the system 300. The heuristic input 304 is a means for facilitating access to or providing one or more heuristics to the processor 302. In the present example, the heuristic input 304 is a heuristic database 304 comprising one or more heuristics, the one or more heuristics being preprepared by a healthcare practitioner. The one or more heuristics include a first disease heuristic; and a second disease heuristic.

The one or more heuristics may be established by a domain expert, such as a healthcare practitioner. The healthcare practitioner may have been utilizing a predictive model to determine a first classification for recommending treatments to patients. The predictive model may be effective in recommending treatments to most patient groups. However, it may be less effective for at least one group, for example patients less than 12 years old. The practitioner could propose an alternative treatment for the group for which the model is less effective, wherein the alternative treatment is a heuristic.

The data format of the heuristic could be added to a configuration/text file, or similar, which would be parsed by the proposed algorithm. Each field in the file could relate to a decision criterion for the second data classification to be applied. For example:

CLASSIFIED TREATMENT 1 = A

PATIENT AGE < 12

CLASSIFIED TREATMENT 2 = B
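By way of illustration only, a file of this form could be parsed as sketched below in Python. The disclosure does not fix a grammar, so the line pattern, the set of comparison operators, and the reading that the first field is the required first classification, the last field is the resulting classification, and any fields in between are further criteria are all assumptions. The names `parse_heuristic` and `to_rule` are hypothetical.

```python
import re

# One field per line: <name> <operator> <value>, e.g. "PATIENT AGE < 12".
LINE = re.compile(r"^\s*(?P<field>.+?)\s*(?P<op><=|>=|=|<|>)\s*(?P<value>.+?)\s*$")


def parse_heuristic(text):
    """Return a list of (field, operator, value) tuples, skipping blank lines."""
    fields = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised heuristic line: {line!r}")
        fields.append((match["field"], match["op"], match["value"]))
    return fields


def to_rule(fields):
    """Assumed reading: first field = required classification, last = new one."""
    if len(fields) < 2:
        raise ValueError("a heuristic needs at least a first and a second classification")
    first, *criteria, last = fields
    return {
        "if_classification": first[2],
        "criteria": criteria,
        "then_classification": last[2],
    }


example = """
CLASSIFIED TREATMENT 1 = A

PATIENT AGE < 12

CLASSIFIED TREATMENT 2 = B
"""

rule = to_rule(parse_heuristic(example))
print(rule)
```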

The data source 308 is configured to provide one or more input data to the data classification system 306. The data source 308 is a data source user interface 308 configured to facilitate the input of one or more input data from a user. In the present example, the user inputting the data is a medical professional. The data source 308 is configured to provide one or more data types. In the present example, the one or more data types include: a year of diagnosis; a patient gender; a patient age; a patient nationality; and a patient symptom.

The data classification system 306 is configured to provide one or more first data classifications of the one or more input data from the data source 308. In this example, the data classification system utilizes a probabilistic model for the analysis of diseases. In particular, the data classification system 306 is configured to predict a probability of a subject having a particular disease based on a set of explanatory features provided by the input data, such as the year of diagnosis, the patient gender, the patient age, the patient nationality, and the patient symptoms. The data classification system 306 comprises a patient detail layer; a symptom layer; and a disease layer.

Figure 4 shows a flow diagram of a modelling method 400 in accordance with a fourth aspect of the present disclosure.

In a first step 402 of the method 400, the processor 302 receives a set of input data from the data source user interface 308. The set of input data includes: a first input data set, a second input data set, a third input data set, and a fourth input data set. Each input data set includes a year of diagnosis, a set of patient features, and a set of patient symptoms from the data source user interface 308. The set of patient features includes a patient gender, a patient age, and a patient nationality.

The first set of input data comprises: a year of diagnosis of 2021; a patient gender of male; a patient age of 31; and a patient nationality of British. The second set of input data comprises: a year of diagnosis of 2017; a patient gender of female; a patient age of 23; and a patient nationality of French. The third set of input data comprises: a year of diagnosis of 2021; a patient gender of female; a patient age of 31; and a patient nationality of British. The fourth set of input data comprises: a year of diagnosis of 2021; a patient gender of male; a patient age of 31; and a patient nationality of Brazilian. Each set of input data comprises patient symptoms of a cough and a fever.

In step 404, the processor 302 determines a first data classification of the set of input data. In particular, the processor 302 applies a classification model to the set of symptom data. In the present example, the processor 302 applies a rule-based model. However, the skilled person will appreciate that any suitable classification model may be applied to the set of input data, such as a learning-based model or a pre-trained model.

In the present example, the first data classification of each input data set is 50% influenza and 50% pneumonia.

In step 406, the processor 302 carries out the data classification method 200.

The one or more heuristics include a first disease heuristic; and a second disease heuristic. In the following example, the first disease heuristic is a ‘COVID-19’ heuristic, wherein if the first data classification is influenza, and the year data is greater than 2020, the first disease heuristic is applied such that the probability, in the second data classification, of the disease being COVID-19 is increased. Furthermore, if the first data classification is influenza, and the year data is less than 2020, the first disease heuristic is applied such that the second data classification remains as the first data classification. The second disease heuristic is a ‘Dengue Fever’ heuristic, wherein if the first data classification is pneumonia, and the nationality data is Brazilian, the second disease heuristic is applied such that the probability, in the second data classification, of the disease being Dengue Fever is increased.

At step 408, the processor 302 outputs a second data classification of each input data set. In this example, the second data classification of the first set of input data is the first data classification. The second data classification of the second set of input data is 80% COVID-19, 10% influenza, and 10% pneumonia. The second data classification of the third set of input data is 50% influenza and 10% pneumonia. The second data classification of the fourth set of input data is 80% Dengue Fever, 10% influenza, and 10% pneumonia.

Accordingly, the system 300 and method 400 provide a means for modelling one or more additional diseases, for example, COVID-19 without including additional nodes on the disease layer. Every time a node is added to a model, it creates more associations with other nodes in the preceding (e.g. symptom) layer, thereby increasing the complexity of the model. The additional associations also increase the computational time for modelling because a greater number of parameters are needed to be learned or optimized.

Figure 5 shows a schematic view of an object sorting system 500 in accordance with a fifth aspect of the present disclosure. The system 500 is in the context of warehouse organization.

The system 500 comprises a processor 502; an inventory datastore 504; a data classification system 506; a heuristic input 508; a command module 510; and a sorting means 512. In the present example, the system is implemented in a cloud environment (not shown).

The processor 502 is in communication with the inventory datastore 504; the data classification system 506; the heuristic input 508; and the command module 510. The processor 502 is configured to facilitate and execute various functions of the system 500. The inventory datastore 504 comprises inventory data associated with real-world inventory objects that are to be classified. In the present example, the inventory datastore 504 comprises inventory data associated with: a first aid kit; a fastener; and an electrical wire. It will be appreciated that the inventory datastore 504 may comprise inventory data associated with any number of inventory objects.

The data classification system 506 is configured to provide a first data classification of input data, such as the inventory data of the inventory datastore 504. In the present example, the data classification system 506 is a pre-trained initial classification model, and may be an inference-based system.

The initial classification model may be trained on historical classified objects to learn the specific patterns in the object numbers associated with the objects. For example, the initial classification model may be based on an object number of an object. For example, the learned pattern may be object numbers beginning with ‘FC1’, and the learned pattern classification may be ‘FASTENER’.

The heuristic input 508 is configured to provide a context-specific model with one or more heuristics, wherein the heuristics are initially input by a domain expert, such as a warehouse organisation expert.

In the present example, the heuristic input 508 comprises a first heuristic of: if classification = ‘FASTENER’, and the first part of description contains ‘SCR’, then classification = ‘FASTENER - SCREW’. The heuristic input 508 also comprises a second heuristic of: if classification = ‘FASTENER - SCREW’, and part of the description contains ‘SS1’, then classification = ‘FASTENER - SCREW - STAINLESS STEEL’. It will be understood that these heuristics are exemplary and further heuristics may be envisaged.

The processor 502 is configured to execute the method 200 using the first data classification provided by the data classification system 506, and the one or more heuristics provided by the heuristic input 508, thereby generating a second data classification.

The command module 510 is configured to receive the second data classification from the processor 502, and generate a command based on the second data classification. In particular, the command module 510 is configured to generate a command for the sorting means 512 based on the second data classification, wherein the second data classification is indicative of a warehouse zone in which the corresponding item is to be stored.

The command module 510 is in communication with the sorting means 512. The sorting means 512 is configured to receive, and execute, the command. In particular, the sorting means 512 is configured to retrieve an item, and move the item to the relevant warehouse zone or location associated with the second data classification. In the present example, the sorting means 512 is a mobile robot 512. The sorting means 512 may comprise a plurality of mobile robots.

Figure 6 shows a flow diagram of an object sorting method 600 in accordance with a second aspect of the present disclosure.

At step 602, the processor 502 inputs a set of inventory data of the inventory datastore 504 into the data classification system 506. In the present example, the input data comprises: an input object number ‘FC112345’, and an input description of ‘SCREW SS1’.

At step 604, the data classification system 506 provides a first data classification of the input datum. In this example, the first data classification is ‘FASTENER’ because the initial classification model has a learned pattern wherein object numbers beginning with ‘FC1’ are classified as ‘FASTENER’.

At step 606, the processor 502 receives or determines one or more heuristics from the heuristic input 508. In the present example, the processor 502 receives the first heuristic of: if classification = ‘FASTENER’, and first part of description contains ‘SCR’, then classification = ‘FASTENER - SCREW’.

At step 608, the processor 502 executes the modelling method 200, thereby generating a second data classification. In a first iteration of the method 200, the processor 502 determines that the first heuristic is applicable because the classification = ‘FASTENER’, and the first part of description contains ‘SCR’. Therefore, the processor 502 sets the second flag to ‘True’ and generates an intermediate classification of classification = ‘FASTENER - SCREW’. In a second iteration of the method 200, the processor 502 determines that the second heuristic is applicable because the classification = ‘FASTENER - SCREW’ and part of the description contains ‘SS1’. The processor 502 sets the second flag to ‘True’ and generates an intermediate classification of classification = ‘FASTENER - SCREW - STAINLESS STEEL’. In a third iteration of the method 200, the processor 502 determines that no heuristics are applicable and the second flag is set to ‘False’. The second data classification is therefore output as ‘FASTENER - SCREW - STAINLESS STEEL’.
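Steps 604 to 608 can be traced with the following Python sketch, offered as an illustration rather than as the disclosed implementation. The function `initial_classification` is a stand-in for the pre-trained model of the data classification system 506 and encodes only the learned pattern ‘FC1’; the fallback label ‘UNCLASSIFIED’, the helper names, and the reading of ‘first part of description’ as the first whitespace-separated word are assumptions.

```python
def initial_classification(object_number):
    # Stand-in for the pre-trained model: learned pattern 'FC1' -> 'FASTENER'.
    # 'UNCLASSIFIED' is a hypothetical fallback label.
    return "FASTENER" if object_number.startswith("FC1") else "UNCLASSIFIED"


def first_part(description):
    # Assumed reading of "first part of description": the first word.
    words = description.split()
    return words[0] if words else ""


# (required classification, condition on the description, new classification)
HEURISTICS = [
    ("FASTENER", lambda d: "SCR" in first_part(d), "FASTENER - SCREW"),
    ("FASTENER - SCREW", lambda d: "SS1" in d, "FASTENER - SCREW - STAINLESS STEEL"),
]


def refine(classification, description):
    """Apply heuristics until an iteration applies none; return result and trace."""
    trace = [classification]
    flag = True  # the second flag of method 200
    while flag:
        flag = False
        for required, condition, new in HEURISTICS:
            if classification == required and condition(description):
                classification = new
                trace.append(new)
                flag = True
                break
    return classification, trace


first = initial_classification("FC112345")
second, trace = refine(first, "SCREW SS1")
print(trace)
# ['FASTENER', 'FASTENER - SCREW', 'FASTENER - SCREW - STAINLESS STEEL']
```

The loop terminates here because neither heuristic maps a classification back to one that an earlier heuristic accepts; a set of heuristics that could cycle would need an iteration limit.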

At step 610, the processor 502 provides the second data classification to the command module 510.

At step 612, the command module 510 determines or generates a command. In particular, the command module 510 determines an initial location of the object associated with the input datum, indicative of a current location of the object in the warehouse. The command module 510 generates a command in which the object is to be transported from the initial location to a final location, wherein the final location corresponds to the second data classification.

At step 614, the command module 510 provides the command to the mobile robot 512.

At step 616, the mobile robot 512 retrieves the object from the initial location and moves the object to the final location.
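Steps 610 to 614 might be sketched in Python as follows. The zone identifiers, the `MoveCommand` structure, the function `generate_command` and the initial location ‘RECEIVING-DOCK’ are all hypothetical; the disclosure states only that the final location corresponds to the second data classification.

```python
from dataclasses import dataclass

# Hypothetical mapping from second data classification to warehouse zone.
ZONES = {
    "FASTENER": "ZONE-F0",
    "FASTENER - SCREW": "ZONE-F1",
    "FASTENER - SCREW - STAINLESS STEEL": "ZONE-F2",
}


@dataclass(frozen=True)
class MoveCommand:
    object_number: str
    initial_location: str  # current location of the object in the warehouse
    final_location: str    # zone associated with the second data classification


def generate_command(object_number, initial_location, second_classification, zones=ZONES):
    """Build the command that the command module would send to the sorting means."""
    if second_classification not in zones:
        raise KeyError(f"no zone configured for {second_classification!r}")
    return MoveCommand(object_number, initial_location, zones[second_classification])


command = generate_command(
    "FC112345", "RECEIVING-DOCK", "FASTENER - SCREW - STAINLESS STEEL"
)
print(command.final_location)  # ZONE-F2
```

Raising an error for an unmapped classification is one possible design choice; an implementation could instead route such objects to a default zone.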

The description provided herein may be directed to specific implementations. It should be understood that the discussion provided herein is provided for the purpose of enabling a person with ordinary skill in the art to make and use any subject matter defined herein by the subject matter of the claims.

It is intended that the subject matter of the claims not be limited to the implementations and illustrations provided herein, but include modified forms of those implementations including portions of implementations and combinations of elements of different implementations in accordance with the claims. It should be appreciated that in the development of any such implementation, as in any engineering or design project, numerous implementation-specific decisions should be made to achieve a developer’s specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort may be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having benefit of this disclosure.

Reference has been made in detail to various implementations, examples of which are illustrated in the accompanying drawings and figures. In the detailed description, numerous specific details are set forth to provide a thorough understanding of the disclosure provided herein. However, the disclosure provided herein may be practiced without these specific details. In some other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure details of the embodiments.

It should also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element. The first element and the second element are both elements, respectively, but they are not to be considered the same element.

The terminology used in the description of the disclosure provided herein is for the purpose of describing particular implementations and is not intended to limit the disclosure provided herein. As used in the description of the disclosure provided herein and appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The terms “includes,” “including,” “comprises,” and/or “comprising,” when used in this specification, specify a presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. While the foregoing is directed to implementations of various techniques described herein, other and further implementations may be devised in accordance with the disclosure herein, which may be determined by the claims that follow. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.