

Title:
FIRST NODE, SECOND NODE, THIRD NODE, FOURTH NODE AND METHODS PERFORMED THEREBY FOR HANDLING SOURCE CODE
Document Type and Number:
WIPO Patent Application WO/2024/084488
Kind Code:
A1
Abstract:
A method, performed by a first node (111), for handling source code. The first node (111) obtains (201) first units of code, respective comments elicited during a review, and a mapping between the units and the comments. The first node (111) obtains (202) respective values of respective features characterizing the units and extracted from each first unit. The first node (111) also obtains (203) a set of correspondences between each unit and a respective subset of values and a class. The first node (111) also determines (204), using a first machine-learning method of anomaly detection, and based on the obtained set of correspondences, a first predictive model to predict whether or not a second unit in a source code is to elicit a comment by a reviewer. The first node (111) then outputs (208), to a second node (112) operating in the computer system (100), a first indication of the first predictive model.

Inventors:
SRIDHARA GIRIPRASAD (IN)
VUPPALA SUNIL KUMAR (IN)
Application Number:
PCT/IN2022/050936
Publication Date:
April 25, 2024
Filing Date:
October 20, 2022
Assignee:
ERICSSON TELEFON AB L M (SE)
SRIDHARA GIRIPRASAD (IN)
International Classes:
G06F8/70; G06F11/36
Attorney, Agent or Firm:
D J, Solomon et al. (IN)
CLAIMS:

1. A computer-implemented method performed by a first node (111), the method being for handling source code, the first node (111) operating in a computer system (100), the method comprising:

- obtaining (201) one or more first units of source code, one or more respective comments elicited during a review of the one or more first units of source code, and a mapping between the one or more first units of source code and the one or more respective comments,

- obtaining (202) one or more first respective values of one or more first respective features characterizing the one or more first units, the one or more first respective features being extracted from each first unit of the obtained one or more first units,

- obtaining (203) a set of correspondences between each unit of the obtained one or more first units and a respective subset of one or more first respective values and a respective class, the respective class having been determined based on the respective subset of one or more first respective values, the set of correspondences being based on the obtained mapping and the obtained one or more first respective values,

- determining (204), using a first machine-learning method of anomaly detection, and based on the obtained set of correspondences, a first predictive model to predict whether or not a second unit in a source code is to elicit a comment by a reviewer,

- outputting (208), to a second node (112) operating in the computer system (100), a first indication of the first predictive model.

2. The method according to claim 1, further comprising:

- obtaining (205) one or more second units of source code,

- obtaining (206) one or more second respective values of one or more second respective features characterizing the one or more second units, the one or more second respective features being extracted from each second unit of the obtained one or more second units, and

- determining (207), using the first predictive model and the obtained one or more second respective values, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer, and wherein the first indication indicates which one or more third units of the one or more second units of source code are predicted to elicit a respective comment by the reviewer.

3. The method according to any of claims 1-2, wherein the method further comprises:

- determining (210), using a second machine-learning method, and based on one or more fourth units of source code which, out of the one or more first units of source code, have elicited a respective comment by the reviewer, a second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units, and

- outputting (212), at least one of:

i. a second indication indicating the one or more fourth units of source code to the third node (113), and ii. a third indication of the second predictive model to another node (114, 115) operating in the computer system (100).

4. The method according to claim 3, further comprising:

- processing (209), prior to determining (210) the second predictive model, the obtained one or more respective comments, wherein the processing (209) comprises: i. generating embeddings for each comment to normalize different word usages, ii. reducing high dimensionality using a principal component analysis for clustering the normalized one or more respective comments, iii. clustering the normalized one or more respective comments using K-Means, and iv. outputting an indication of clusters of similar comments to be used as input for determining (210) the second predictive model.
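The four processing steps above (embedding, dimensionality reduction, clustering, output of clusters) can be sketched as follows. This is an illustrative, non-normative sketch only: the toy hashing "embedding", the NumPy-only PCA, and the minimal K-Means below are stand-ins for whatever embedding model and library implementations an actual embodiment would use.

```python
import zlib
import numpy as np

def embed(comment, dim=32):
    # Step i (toy stand-in): sum stable pseudo-random token vectors so that
    # comments sharing words get nearby embeddings. A real system would use
    # a learned embedding model to normalize different word usages.
    v = np.zeros(dim)
    for tok in comment.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        v += rng.standard_normal(dim)
    return v

def pca_reduce(X, k=2):
    # Step ii: reduce the high-dimensional embeddings with PCA.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans_labels(X, k=2, iters=50, seed=0):
    # Step iii: cluster the reduced embeddings with K-Means (Lloyd's algorithm).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels  # Step iv: cluster ids grouping similar comments

comments = ["fix null pointer check here", "fix null pointer check there",
            "rename loop index variable i", "rename loop index variable j"]
X = pca_reduce(np.stack([embed(c) for c in comments]), k=2)
clusters = kmeans_labels(X, k=2)
```

The resulting cluster identifiers are what would be output as the indication of clusters of similar comments used to train the second predictive model.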

5. The method according to any of claims 3-4, further comprising:

- determining (211), using the second predictive model, which respective comment will be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer, and

- wherein the third indication indicates which comment is predicted to be respectively elicited by each third unit of the one or more third units.

6. The method according to any of claims 1-5, wherein at least one of:

- each of the units of source code is one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the first indication explicitly indicates the first predictive model,

- the one or more first units are obtained in a file of source code,

- the one or more respective comments are in natural language,

- any of the respective features comprise one or more of: a statement type of a respective first line, a length of the respective first line, one or more libraries used in the respective first line, library name used, and a length of a respective identifier used in the respective first line,

- any of the respective features are one of: pre-configured and automatically extractable,

- the set of correspondences is a machine-learning table,

- the first machine-learning method of anomaly detection is one of an autoencoder and a one class support vector machine,

- the determining (204) of the first predictive model comprises a testing phase with a first group of samples, and an execution phase with a second group of samples, and

- the second machine-learning method is a k-nearest neighbours method.

7. A computer-implemented method performed by a second node (112), the method being for handling source code, the second node (112) operating in a computer system (100), the method comprising:

- obtaining (301), from a first node (111) operating in the computer system (100), a first indication of a first predictive model of anomaly detection, the first predictive model being to predict whether or not a second unit in a source code is to elicit a comment by a reviewer,

- obtaining (302) one or more second units of source code,

- obtaining (303), from each second unit of the obtained one or more second units, one or more second respective values of one or more second respective features characterizing the one or more second units,

- determining (304), using the first predictive model and the obtained one or more second respective values, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer, and

- outputting (305), to a fourth node (114) configured to operate in the computer system (100), a fourth indication indicating which one or more third units of the one or more second units of source code are predicted to elicit a respective comment by the reviewer.

8. The method according to claim 7, wherein at least one of:

- each of the units of source code is one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the first indication explicitly indicates the first predictive model,

- the one or more second units are obtained in a file of source code,

- the respective comment is in natural language,

- the one or more second respective features comprising one or more of: a statement type of a respective first line, a length of the respective first line, one or more libraries used in the respective first line, library name used, and a length of a respective identifier used in the respective first line,

- the one or more second respective features are one of: pre-configured and automatically extractable, and

- the first machine-learning method of anomaly detection is one of an autoencoder and a one class support vector machine.

9. A computer-implemented method performed by a third node (113), the method being for handling source code, the third node (113) operating in a computer system (100), the method comprising:

- obtaining (401), from a first node (111) operating in the computer system (100), a second indication, the second indication indicating which one or more fourth units, of one or more first units of source code, have elicited a respective comment by a reviewer,

- determining (404), using a second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, have elicited a respective comment by the reviewer, a second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, and

- outputting (405), to a fourth node (114) operating in the computer system (100), a fifth indication of the second predictive model.

10. The method according to claim 9, further comprising at least one of:

- obtaining (402), from each first unit of the one or more first units, one or more first respective values of one or more first respective features characterizing the one or more first units, and wherein the determining (404) is based on the one or more first respective values of one or more first respective features characterizing the one or more first units,

- processing (403), prior to determining (404) the second predictive model, one or more respective comments elicited during a review of the one or more fourth units of source code, wherein the processing (403) comprises: i. generating embeddings for each comment to normalize different word usages, ii. reducing high dimensionality using a principal component analysis for clustering the normalized one or more respective comments, iii. clustering the normalized one or more respective comments using K-Means, and iv. outputting an indication of clusters of similar comments to be used as input for determining (404) the second predictive model.

11. The method according to claim 10, wherein at least one of:

- each of the units of source code is one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the one or more respective comments are in natural language,

- the fifth indication explicitly indicates the second predictive model, and

- the second machine-learning method is a k-nearest neighbours method.

12. A computer-implemented method performed by a fourth node (114), the method being for handling source code, the fourth node (114) operating in a computer system (100), the method comprising:

- obtaining (501), from a third node (113) operating in the computer system (100), a fifth indication, the fifth indication indicating a second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, and

- obtaining (502), from a second node (112) operating in the computer system (100), a fourth indication, the fourth indication indicating which one or more third units, of one or more second units of source code, are predicted to elicit a respective comment by a reviewer,

- determining (503), using the second predictive model, which respective comment will be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer, and

- outputting (504) a sixth indication, the sixth indication indicating which comment is predicted to be respectively elicited by each third unit of the one or more third units.

13. The method according to claim 12, wherein at least one of:

- each of the units of source code is one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the sixth indication indicates which comment is predicted to be respectively elicited by each third unit of the one or more third units,

- the comment predicted to be respectively elicited by each third unit is in natural language, and

- the second machine-learning method is a k-nearest neighbours method.

14. A first node (111), for handling source code, the first node (111) being configured to operate in a computer system (100), the first node (111) being further configured to:

- obtain one or more first units of source code, one or more respective comments configured to be elicited during a review of the one or more first units of source code, and a mapping between the one or more first units of source code and the one or more respective comments,

- obtain one or more first respective values of one or more first respective features configured to characterize the one or more first units, the one or more first respective features being configured to be extracted from each first unit of the one or more first units configured to be obtained,

- obtain a set of correspondences between each unit of the one or more first units configured to be obtained and a respective subset of one or more first respective values and a respective class, the respective class being configured to have been determined based on the respective subset of one or more first respective values, the set of correspondences being based on the mapping configured to be obtained and the one or more first respective values configured to be obtained,

- determine, using a first machine-learning method of anomaly detection, and based on the set of correspondences configured to be obtained, a first predictive model to predict whether or not a second unit in a source code is to elicit a comment by a reviewer,

- output, to a second node (112) configured to operate in the computer system (100), a first indication of the first predictive model.

15. The first node (111) according to claim 14, further configured to:

- obtain one or more second units of source code,

- obtain one or more second respective values of one or more second respective features configured to characterize the one or more second units, the one or more second respective features being configured to be extracted from each second unit of the one or more second units configured to be obtained, and

- determine, using the first predictive model and the one or more second respective values configured to be obtained, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer, and

- wherein the first indication is configured to indicate which one or more third units of the one or more second units of source code are configured to be predicted to elicit a respective comment by the reviewer.

16. The first node (111) according to any of claims 14-15, wherein the first node (111) is further configured to:

- determine, using a second machine-learning method, and based on one or more fourth units of source code which, out of the one or more first units of source code, are configured to have elicited a respective comment by the reviewer, a second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units, and

- output, at least one of:

i. a second indication configured to indicate the one or more fourth units of source code to the third node (113), and ii. a third indication of the second predictive model to another node (114, 115) configured to operate in the computer system (100).

17. The first node (111) according to claim 16, being further configured to:

- process, prior to determining the second predictive model, the one or more respective comments configured to be obtained, wherein the processing is configured to comprise: i. generating embeddings for each comment to normalize different word usages, ii. reducing high dimensionality using a principal component analysis for clustering the normalized one or more respective comments, iii. clustering the normalized one or more respective comments using K-Means, and iv. outputting an indication of clusters of similar comments to be used as input for determining the second predictive model.

18. The first node (111) according to any of claims 16-17, being further configured to:

- determine, using the second predictive model, which respective comment is to be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer, and

- wherein the third indication is configured to indicate which comment is predicted to be respectively elicited by each third unit of the one or more third units.

19. The first node (111) according to any of claims 14-18, wherein at least one of:

- each of the units of source code is configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the first indication is configured to explicitly indicate the first predictive model,

- the one or more first units are configured to be obtained in a file of source code,

- the one or more respective comments are configured to be in natural language,

- any of the respective features are configured to comprise one or more of: a statement type of a respective first line, a length of the respective first line, one or more libraries used in the respective first line, library name used, and a length of a respective identifier used in the respective first line,

- any of the respective features are configured to be one of: pre-configured and automatically extractable,

- the set of correspondences is configured to be a machine-learning table,

- the first machine-learning method of anomaly detection is configured to be one of an auto-encoder and a one class support vector machine,

- the determining of the first predictive model is configured to comprise a testing phase with a first group of samples, and an execution phase with a second group of samples, and

- the second machine-learning method is configured to be a k-nearest neighbours method.

20. A second node (112), for handling source code, the second node (112) being configured to operate in a computer system (100), the second node (112) being further configured to:

- obtain, from a first node (111) configured to operate in the computer system (100), a first indication of a first predictive model of anomaly detection, the first predictive model being configured to predict whether or not a second unit in a source code is to elicit a comment by a reviewer,

- obtain one or more second units of source code,

- obtain, from each second unit of the one or more second units configured to be obtained, one or more second respective values of one or more second respective features configured to characterize the one or more second units,

- determine, using the first predictive model and the one or more second respective values configured to be obtained, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer, and

- output, to a fourth node (114) configured to operate in the computer system (100), a fourth indication configured to indicate which one or more third units of the one or more second units of source code are configured to be predicted to elicit a respective comment by the reviewer.

21. The second node (112) according to claim 20, wherein at least one of:

- each of the units of source code is configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the first indication is configured to explicitly indicate the first predictive model,

- the one or more second units are configured to be obtained in a file of source code,

- the respective comment is configured to be in natural language,

- the one or more second respective features are configured to comprise one or more of: a statement type of a respective first line, a length of the respective first line, one or more libraries used in the respective first line, library name used, and a length of a respective identifier used in the respective first line,

- the one or more second respective features are configured to be one of: pre-configured and automatically extractable, and

- the first machine-learning method of anomaly detection is configured to be one of an auto-encoder and a one class support vector machine.

22. A third node (113), for handling source code, the third node (113) being configured to operate in a computer system (100), the third node (113) being further configured to:

- obtain, from a first node (111) configured to operate in the computer system (100), a second indication, the second indication being configured to indicate which one or more fourth units, of one or more first units of source code, are configured to have elicited a respective comment by a reviewer,

- determine, using a second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, are configured to have elicited a respective comment by the reviewer, a second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, and

- output, to a fourth node (114) configured to operate in the computer system (100), a fifth indication of the second predictive model.

23. The third node (113) according to claim 22, being further configured to:

- obtain, from each first unit of the one or more first units, one or more first respective values of one or more first respective features configured to characterize the one or more first units, and wherein the determining is configured to be based on the one or more first respective values of one or more first respective features configured to characterize the one or more first units,

- process, prior to determining the second predictive model, one or more respective comments configured to have been elicited during a review of the one or more fourth units of source code, wherein the processing is configured to comprise: i. generating embeddings for each comment to normalize different word usages, ii. reducing high dimensionality using a principal component analysis for clustering the normalized one or more respective comments, iii. clustering the normalized one or more respective comments using K-Means, and iv. outputting an indication of clusters of similar comments to be used as input for determining the second predictive model.

24. The third node (113) according to claim 23, wherein at least one of:

- each of the units of source code is configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the one or more respective comments are configured to be in natural language,

- the fifth indication is configured to explicitly indicate the second predictive model, and

- the second machine-learning method is configured to be a k-nearest neighbours method.

25. A fourth node (114), for handling source code, the fourth node (114) being configured to operate in a computer system (100), the fourth node (114) being further configured to:

- obtain, from a third node (113) configured to operate in the computer system (100), a fifth indication, the fifth indication being configured to indicate a second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, and

- obtain, from a second node (112) configured to operate in the computer system (100), a fourth indication, the fourth indication being configured to indicate which one or more third units, of one or more second units of source code, are predicted to elicit a respective comment by a reviewer,

- determine, using the second predictive model, which respective comment is to be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer, and

- output a sixth indication, the sixth indication being configured to indicate which comment is predicted to be respectively elicited by each third unit of the one or more third units.

26. The fourth node (114) according to claim 25, wherein at least one of:

- each of the units of source code is configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method,

- the sixth indication is configured to indicate which comment is predicted to be respectively elicited by each third unit of the one or more third units,

- the comment predicted to be respectively elicited by each third unit is configured to be in natural language, and

- the second machine-learning method is configured to be a k-nearest neighbours method.

27. A computer program (1005), comprising instructions which, when executed on at least one processing circuitry (1001), cause the at least one processing circuitry (1001) to carry out the method according to any of claims 1-6.

28. A computer-readable storage medium (1006), having stored thereon a computer program (1005), comprising instructions which, when executed on at least one processing circuitry (1001), cause the at least one processing circuitry (1001) to carry out the method according to any of claims 1-6.

29. A computer program (1105), comprising instructions which, when executed on at least one processing circuitry (1101), cause the at least one processing circuitry (1101) to carry out the method according to any of claims 7-8.

30. A computer-readable storage medium (1106), having stored thereon a computer program (1105), comprising instructions which, when executed on at least one processing circuitry (1101), cause the at least one processing circuitry (1101) to carry out the method according to any of claims 7-8.

31. A computer program (1205), comprising instructions which, when executed on at least one processing circuitry (1201), cause the at least one processing circuitry (1201) to carry out the method according to any of claims 9-11.

32. A computer-readable storage medium (1206), having stored thereon a computer program (1205), comprising instructions which, when executed on at least one processing circuitry (1201), cause the at least one processing circuitry (1201) to carry out the method according to any of claims 9-11.

33. A computer program (1305), comprising instructions which, when executed on at least one processing circuitry (1301), cause the at least one processing circuitry (1301) to carry out the method according to any of claims 12-13.

34. A computer-readable storage medium (1306), having stored thereon a computer program (1305), comprising instructions which, when executed on at least one processing circuitry (1301), cause the at least one processing circuitry (1301) to carry out the method according to any of claims 12-13.

Description:
FIRST NODE, SECOND NODE, THIRD NODE, FOURTH NODE AND METHODS PERFORMED THEREBY FOR HANDLING SOURCE CODE

TECHNICAL FIELD

The present disclosure relates generally to a first node and methods performed thereby, for handling source code. The present disclosure also relates generally to a second node and methods performed thereby, for handling source code. The present disclosure additionally relates generally to a third node and methods performed thereby, for handling source code. The present disclosure further relates generally to a fourth node and methods performed thereby, for handling source code. The present disclosure also relates generally to computer programs and computer-readable storage media, having stored thereon the computer programs to carry out these methods.

BACKGROUND

Computer systems in a communications network may comprise one or more network nodes, which may also be referred to simply as nodes. A node may comprise one or more processors which, together with computer program code, may perform different functions and actions, a memory, a receiving port and a sending port. A node may be, for example, a server. Nodes may perform their functions entirely on the cloud.

Software quality assurance (SQA) may be understood as the practice of monitoring software engineering processes, methods, and work products to ensure compliance with defined standards. SQA may include standards and procedures that managers, administrators or developers may use to review and audit software products and activities in order to verify that the software may meet quality criteria which may link it to standards. SQA may be provided in different complementary ways, such as: software testing, static analysis of source code to detect potential problems such as potential null pointer exceptions, e.g., Lint/SonarQube, and manual code review by other experienced developers. Manual code review has its own place in SQA and may be complementary to software testing and static analysis for errors. In modern software development, code review may be understood to be an inalienable part of SQA.

Existing methods of code review may require expert developers. These developers may have their own tasks to perform and, hence, may not be available for code review, which may be understood to be a time-consuming task.

Some existing methods have attempted to automate code review. Some of these existing solutions, such as the one described in [1], may use Deep Learning, e.g., Transformers, to attempt to automate code review. Such an approach, as it is based on a Deep Learning architecture, may be expensive to train and time-consuming. Further, such deep learning approaches may typically need a very large number of training samples, e.g., thousands or millions.

Another approach is a code review tool based on having an existing catalogue of code review standards, such as naming conventions, loops, etc. [2]. The big disadvantage of such a catalogue is that it needs to be manually populated by subject matter experts. Such manual activities may be tedious, time-consuming and error-prone. Further, the catalogue may become obsolete fairly soon if it is not frequently checked and revised.

SUMMARY

Embodiments herein may be understood to address the gap in automation in code review.

It is an object of embodiments herein to improve the handling of source code.

According to a first aspect of embodiments herein, the object is achieved by a computer-implemented method performed by a first node. The method is for handling source code. The first node operates in a computer system. The first node obtains one or more first units of source code, one or more respective comments elicited during a review of the one or more first units of source code, and a mapping between the one or more first units of source code and the one or more respective comments. The first node also obtains one or more first respective values of one or more first respective features characterizing the one or more first units. The one or more first respective features are extracted from each first unit of the obtained one or more first units. The first node further obtains a set of correspondences between each unit of the obtained one or more first units and a respective subset of one or more first respective values and a respective class. The respective class has been determined based on the respective subset of one or more first respective values. The set of correspondences is based on the obtained mapping and the obtained one or more first respective values. The first node also determines, using a first machine-learning method of anomaly detection, and based on the obtained set of correspondences, a first predictive model. The first predictive model is to predict whether or not a second unit in a source code, that is, a new second unit of source code, is to elicit a comment by a reviewer. The first node then outputs, to the second node operating in the computer system, a first indication of the first predictive model.
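As a concrete, purely hypothetical illustration of the first aspect, the sketch below trains a reconstruction-based anomaly detector on feature vectors of first units that elicited no comments; a unit whose features reconstruct poorly is predicted to elicit a comment. A linear reconstruction model (equivalent to PCA) is used here as a minimal stand-in for the auto-encoder or one-class support vector machine named in the embodiments; the feature data are invented for the example.

```python
import numpy as np

class FirstPredictiveModel:
    """Minimal stand-in for the first predictive model (anomaly detection).
    Fit on feature vectors of units that elicited no review comments; a unit
    whose features reconstruct poorly is predicted to elicit a comment."""

    def __init__(self, n_components=2):
        self.k = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Principal subspace of the "no comment" class (a linear auto-encoder).
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.k]
        # Flag anything far outside the training reconstruction-error range.
        errs = self._errors(X)
        self.threshold_ = errs.mean() + 3.0 * errs.std()
        return self

    def _errors(self, X):
        Z = (X - self.mean_) @ self.components_.T   # encode
        Xr = Z @ self.components_ + self.mean_      # decode
        return np.linalg.norm(X - Xr, axis=1)       # reconstruction error

    def predict_elicits_comment(self, X):
        return self._errors(X) > self.threshold_

rng = np.random.default_rng(0)
clean = np.zeros((200, 5))
clean[:, :2] = rng.standard_normal((200, 2))   # invented feature vectors
clean += 0.01 * rng.standard_normal((200, 5))  # small measurement noise
model = FirstPredictiveModel(n_components=2).fit(clean)
# A unit whose features lie far off the learned subspace:
odd_unit = model.mean_ + np.array([0.0, 0.0, 10.0, 0.0, 0.0])
```

The first indication output to the second node could then carry the fitted parameters (`mean_`, `components_`, `threshold_`), which is all this sketch needs to score new units.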

According to a second aspect of embodiments herein, the object is achieved by a computer-implemented method performed by the second node. The method is for handling source code. The second node operates in the computer system. The second node obtains, from the first node operating in the computer system, the first indication of the first predictive model of anomaly detection. The first predictive model is to predict whether or not the second unit in the source code is to elicit a comment by a reviewer. The second node obtains the one or more second units of source code. The second node also obtains, from each second unit of the obtained one or more second units, the one or more second respective values of the one or more second respective features characterizing the one or more second units. The second node also determines, using the first predictive model and the obtained one or more second respective values, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer. The second node then outputs, to a fourth node configured to operate in the computer system, a fourth indication. The fourth indication indicates which one or more third units of the one or more second units of source code are predicted to elicit a respective comment by the reviewer.

According to a third aspect of embodiments herein, the object is achieved by a computer-implemented method performed by a third node. The method is for handling source code. The third node operates in the computer system. The third node obtains, from the first node operating in the computer system, a second indication. The second indication indicates which one or more fourth units, of the one or more first units of source code, have elicited a respective comment by a reviewer. The third node determines, using a second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, have elicited a respective comment by the reviewer, a second predictive model. The second predictive model is to predict which comment is predicted to be elicited, respectively, by one or more third units of source code. The third node then outputs, to the fourth node operating in the computer system, a fifth indication of the second predictive model.

According to a fourth aspect of embodiments herein, the object is achieved by a computer-implemented method performed by the fourth node. The method is for handling source code. The fourth node operates in the computer system. The fourth node obtains, from the third node operating in the computer system, the fifth indication. The fifth indication indicates the second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units of source code. The fourth node also obtains, from the second node operating in the computer system, the fourth indication. The fourth indication indicates which one or more third units, of the one or more second units of source code, are predicted to elicit a respective comment by a reviewer. The fourth node also determines, using the second predictive model, which respective comment will be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer. The fourth node also outputs a sixth indication. The sixth indication indicates which comment is predicted to be respectively elicited by each third unit of the one or more third units.

According to a fifth aspect of embodiments herein, the object is achieved by the first node. The first node is for handling source code. The first node is configured to operate in the computer system. The first node is configured to obtain the one or more first units of source code, the one or more respective comments configured to be elicited during the review of the one or more first units of source code, and the mapping between the one or more first units of source code and the one or more respective comments. The first node is further configured to obtain the one or more first respective values of the one or more first respective features configured to characterize the one or more first units.
The one or more first respective features are configured to be extracted from each first unit of the one or more first units configured to be obtained. The first node is also configured to obtain the set of correspondences between each unit of the one or more first units configured to be obtained and the respective subset of the one or more first respective values and the respective class. The respective class is configured to have been determined based on the respective subset of the one or more first respective values. The set of correspondences is configured to be based on the mapping configured to be obtained and the one or more first respective values configured to be obtained. The first node is further configured to determine, using the first machine-learning method of anomaly detection, and based on the set of correspondences configured to be obtained, the first predictive model. The first predictive model is to predict whether or not the second unit in the source code is to elicit a comment by a reviewer. The first node is additionally configured to output, to the second node configured to operate in the computer system, the first indication of the first predictive model.

According to a sixth aspect of embodiments herein, the object is achieved by the second node. The second node is for handling source code. The second node is configured to operate in the computer system. The second node is configured to obtain, from the first node configured to operate in the computer system, the first indication. The first indication is of the first predictive model of anomaly detection. The first predictive model is configured to predict whether or not the second unit in the source code is to elicit the comment by the reviewer. The second node is further configured to obtain the one or more second units of source code. The second node is further configured to obtain, from each second unit of the one or more second units configured to be obtained, the one or more second respective values of one or more second respective features configured to characterize the one or more second units. The second node is further configured to determine, using the first predictive model and the one or more second respective values configured to be obtained, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer. The second node is additionally configured to output, to the fourth node configured to operate in the computer system, the fourth indication. The fourth indication is configured to indicate which one or more third units of the one or more second units of source code are configured to be predicted to elicit a respective comment by the reviewer.

According to a seventh aspect of embodiments herein, the object is achieved by the third node. The third node is for handling source code. The third node is configured to operate in the computer system. The third node is configured to obtain, from the first node configured to operate in the computer system, the second indication. The second indication is configured to indicate which one or more fourth units, of one or more first units of source code, are configured to have elicited a respective comment by a reviewer. The third node is further configured to determine, using the second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, are configured to have elicited a respective comment by the reviewer, the second predictive model. The second predictive model is to predict which comment is predicted to be elicited, respectively, by the one or more third units of source code. The third node is additionally configured to output, to the fourth node configured to operate in the computer system, the fifth indication of the second predictive model.

According to an eighth aspect of embodiments herein, the object is achieved by the fourth node. The fourth node is for handling source code. The fourth node is configured to operate in the computer system. The fourth node is configured to obtain, from the third node configured to operate in the computer system, the fifth indication. The fifth indication is configured to indicate the second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units of source code. The fourth node is further configured to obtain, from the second node configured to operate in the computer system, the fourth indication. The fourth indication is configured to indicate which one or more third units, of the one or more second units of source code, are predicted to elicit a respective comment by a reviewer. The fourth node is further configured to determine, using the second predictive model, which respective comment is to be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer. The fourth node is additionally configured to output the sixth indication. The sixth indication is configured to indicate which comment is predicted to be respectively elicited by each third unit of the one or more third units.

According to a ninth aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the first node.

According to a tenth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the first node.

According to an eleventh aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the second node.

According to a twelfth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the second node.

According to a thirteenth aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the third node.

According to a fourteenth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the third node.

According to a fifteenth aspect of embodiments herein, the object is achieved by a computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the fourth node.

According to a sixteenth aspect of embodiments herein, the object is achieved by a computer-readable storage medium, having stored thereon the computer program, comprising instructions which, when executed on at least one processing circuitry, cause the at least one processing circuitry to carry out the method performed by the fourth node.

By obtaining the one or more first units of source code, the one or more respective comments and the mapping, the first node may then be enabled to derive a predictive model of when a unit of source code may elicit a comment.

By obtaining the one or more first respective values of the one or more first respective features, the first node may then be enabled to, using the extracted feature values and the input mapping between reviews and source units, obtain, e.g., build, a familiar machine learning set of correspondences, e.g., a table, with feature values for each unit and an output class indicating whether or not the respective first unit may have a review. That is, the first node may be able to learn to output a first class, class 1, indicating whether a particular unit of source code may have a review or not. Further, the first node may be able to learn to output a second class, class 2, indicating the review comment keyword associated with the unit of source code, for example, “80 characters”, “identifier name” and so on.
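As a purely illustrative sketch of what such a set of correspondences, e.g., a table, might look like, the following fragment builds one row of feature values and a class per unit. The feature names, the extract_features helper and the example lines are assumptions made for illustration, not a prescribed implementation:

```python
# Hypothetical sketch: build a training table of feature values and classes
# from units of source code and a mapping of review comments to units.
# Feature names and the extract_features helper are illustrative assumptions.

def extract_features(unit: str) -> dict:
    """Toy feature extraction for one unit (here, a line) of source code."""
    identifiers = [t for t in unit.replace("=", " ").split() if t.isidentifier()]
    return {
        "length": len(unit),
        "num_identifiers": len(identifiers),
        "max_identifier_length": max((len(i) for i in identifiers), default=0),
    }

def build_correspondences(units, comment_mapping):
    """Return one row per unit: feature values plus a class label
    (1 = the unit elicited a review comment, 0 = it did not)."""
    rows = []
    for index, unit in enumerate(units):
        row = extract_features(unit)
        row["class"] = 1 if index in comment_mapping else 0
        rows.append(row)
    return rows

units = [
    "total = compute_sum(values)",
    "x = f(a)",  # short identifiers; assume this line drew a comment
]
mapping = {1: "Please provide a more descriptive name"}
table = build_correspondences(units, mapping)
```

In this toy example, the second line is assumed to have drawn a naming comment, so its row is labelled class 1, while the first line is labelled class 0.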

By obtaining the set of correspondences, the first node may be enabled to represent a unit of source code, e.g., a source code line, for machine learning to derive a predictive model of when a unit of source code may elicit a comment, using an anomaly detection approach.

By determining the first predictive model, the first node may be enabled to learn which units of source code, e.g., lines, may elicit a review comment, and thereby, once the first predictive model may have been trained, predict if a new unit of source code may elicit a review comment using the first predictive model, without needing that the new source code be actually reviewed.

By exploiting the insight that a code review comment may be understood to be an anomaly, and formulating the problem of determining whether source code may elicit a comment as an anomaly detection learning problem, the first node may be enabled to obtain a significantly higher number of training samples. That is, the first node may be enabled to not be restricted to the set of collected review comments alone, which may not be sufficient for learning. For example, when using machine learning to predict which unit may have a review comment, only e.g., 5 out of 100 units may be obtained as examples, since the majority of the units may be understood to be well written and may not have a review comment. Using anomaly detection according to embodiments herein, all the 100 units may be used as examples to learn from. That is, the 95 units that did not have a review may also help in learning and all may be utilized. Therefore, with the advantage of additional training samples, embodiments herein may be enabled to build better machine learning models that may be trained to predict code reviews. That is, machine learning models that may provide greater accuracy in predicting source code statements that may elicit a code review.
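The sample-efficiency point above may be sketched with a deliberately simple stand-in for the anomaly detectors the embodiments may use; the z-score rule, the single line-length feature and the numbers below are all invented for illustration:

```python
import statistics

# Hedged sketch of the sample-efficiency point: all units, commented or not,
# contribute to the model. A simple z-score detector stands in for the
# anomaly detector the embodiments may use; the numbers are invented.

def fit(feature_values):
    """Learn what 'normal' looks like from ALL units, not just commented ones."""
    mean = statistics.mean(feature_values)
    stdev = statistics.pstdev(feature_values)
    return mean, stdev

def is_anomaly(value, model, threshold=2.0):
    mean, stdev = model
    return abs(value - mean) > threshold * stdev

# Line lengths for 10 units: most are ordinary; one 120-character line
# (exceeding a hypothetical 80-character limit) is the anomaly.
lengths = [40, 42, 38, 45, 41, 39, 44, 43, 37, 120]
model = fit(lengths)
flags = [is_anomaly(v, model) for v in lengths]
```

All ten units contribute to the model, yet only the single 120-character outlier is flagged as likely to elicit a comment.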

Further, the first node may also be enabled to then derive another predictive model to predict the actual review comment for a particular unit of source code that may have been predicted to receive a review comment.

By outputting the first indication, the first node may be understood to enable improving the quality of the source code by identifying which unit of source code may elicit a comment without requiring actual review by a reviewer, and to perform this in an expedited manner. Additionally, the first node may be enabled to then derive another predictive model to predict the actual review comment that a particular second unit of source code may elicit.

By obtaining the first indication from the first node, the second node may be enabled to use the indicated first predictive model to predict whether or not a second unit of the obtained one or more second units, that is, a new unit, in a source code is to elicit a comment by a reviewer, without needing that the new source code be actually reviewed.

By outputting the fourth indication to the fourth node indicating which one or more third units of the one or more second units of source code are predicted to elicit a respective comment, the second node may enable the fourth node to then apply a predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, that is, new, unreviewed units of source code.

By the third node obtaining the second indication from the first node indicating which one or more fourth units, of one or more first units of source code, have elicited a respective comment by a reviewer, the third node may be enabled to determine, e.g., train, the second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code.

By the third node then outputting the determined second predictive model to the fourth node, the third node may enable the fourth node to apply the second predictive model to predict which comment is predicted to be elicited, respectively, by one or more third units of source code, that is, the new, unreviewed units of source code.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of embodiments herein are described in more detail with reference to the accompanying drawings, and according to the following description.

Figure 1 is a schematic diagram illustrating an embodiment of a computer system, according to embodiments herein.

Figure 2 is a flowchart depicting a method in a first node, according to embodiments herein.

Figure 3 is a flowchart depicting a method in a second node, according to embodiments herein.

Figure 4 is a flowchart depicting a method in a third node, according to embodiments herein.

Figure 5 is a flowchart depicting a method in a fourth node, according to embodiments herein.

Figure 6 is a schematic diagram illustrating a non-limiting example of the method performed by the first node, according to embodiments herein.

Figure 7 is a schematic diagram illustrating another non-limiting example of some aspects of the method performed by the first node, according to embodiments herein.

Figure 8 is a schematic diagram illustrating another non-limiting example of some aspects of the method performed by the first node, according to embodiments herein.

Figure 9 is a schematic diagram illustrating another non-limiting example of some aspects of the method performed by the first node, according to embodiments herein.

Figure 10 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a first node, according to embodiments herein.

Figure 11 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a second node, according to embodiments herein.

Figure 12 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a third node, according to embodiments herein.

Figure 13 is a schematic block diagram illustrating two non-limiting examples, a) and b), of a fourth node, according to embodiments herein.

DETAILED DESCRIPTION

Certain aspects of the present disclosure and their embodiments may provide solutions to the challenges discussed in the Background and Summary sections. There are, proposed herein, various embodiments which address one or more of the issues disclosed herein.

As a summarized overview, embodiments herein may be understood to relate to automated code review.

Code review may be understood to be an important part of Software Quality Assurance (SQA) and may be complementary to other SQA techniques such as testing and static analysis for bugs. Certain aspects may only be found via code review and hence it may be understood to be an inalienable part of quality assurance. Typically, commercial software is well written, by and large, and hence most of the lines of code in a program will not have a code review comment associated with them. Thus, a code review comment may be understood to be an anomaly.

Embodiments herein may be understood to be based on the insight that code review comments may be provided for only a small subset of the overall units of source code, e.g., lines, in a project in a well written commercial software program. Accordingly, a code review comment may be understood to be an anomaly. Hence, embodiments herein may be understood to formulate the problem of prediction of whether a particular unit of source code may elicit a comment as an anomaly detection problem, which may be solved using techniques such as auto-encoders, one class Support Vector Machines (SVM) and so on. Embodiments herein may identify code features that may cause a reviewer to provide a review comment. Embodiments herein may then comprise nodes which may learn what code features may elicit a review comment from an existing mapping of code review comments and code. At runtime, embodiments herein may then predict whether a code line may elicit a review comment.
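As a minimal, hedged stand-in for the one-class techniques named above (one-class SVM, autoencoders), the following sketch learns a per-feature envelope from units assumed to be well written, and flags units falling outside it; the feature choices and values are illustrative assumptions:

```python
# A minimal stand-in for a one-class formulation: learn a per-feature
# envelope from units assumed to be well written, and flag units falling
# outside it. Feature values here are invented for illustration.

def fit_envelope(normal_vectors):
    """Per-dimension [min, max] envelope of the 'normal' training units."""
    lows = [min(col) for col in zip(*normal_vectors)]
    highs = [max(col) for col in zip(*normal_vectors)]
    return lows, highs

def elicits_comment(vector, envelope):
    lows, highs = envelope
    return any(v < lo or v > hi for v, lo, hi in zip(vector, lows, highs))

# Features per unit: (line length, longest identifier length)
normal = [(40, 12), (55, 9), (48, 15), (62, 11)]
env = fit_envelope(normal)
ok = elicits_comment((50, 10), env)        # inside the envelope
flagged = elicits_comment((120, 10), env)  # 120-character line: outside
```

A production system would presumably use a learned decision boundary (e.g., a one-class SVM) rather than this crude min/max envelope, but the shape of the problem is the same: train on normal units, flag the outliers.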

By exploiting the above insight and formulating the problem of determining whether source code may elicit a comment as an anomaly detection problem, embodiments herein may enable to obtain a number of training examples, that is, may enable to not be restricted to a set of collected review comments alone, which may not be sufficient for learning. Therefore, with the advantage of additional training samples, embodiments herein may yield a better machine learning model that may be trained to predict code reviews.

Some of the embodiments contemplated will now be described more fully hereinafter with reference to the accompanying drawings, in which examples are shown. In this section, the embodiments herein will be illustrated in more detail by a number of exemplary embodiments. Other embodiments, however, are contained within the scope of the subject matter disclosed herein. The disclosed subject matter should not be construed as limited to only the embodiments set forth herein; rather, these embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art. It should be noted that the exemplary embodiments herein are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments.

Figure 1 is a schematic diagram depicting a computer system 100, in which embodiments herein may be implemented. Comprised in the computer system 100 is a first node 111, which may perform a method according to embodiments herein. The first node 111 may be understood as a first computer system or server. The computer system 100 also comprises a second node 112. The second node 112 may be understood to be a second computer system or server. The computer system 100 may also comprise a third node 113, a fourth node 114 and another node, such as a fifth node 115. In some examples, the another node may refer to any of the first node 111, the second node 112, the third node 113 and the fifth node 115. The third node 113 may be understood to be a third computer system or server. The fourth node 114 may be understood to be a fourth computer system or server. The fifth node 115 may be understood to be a fifth computer system or server.

The first node 111 may have the capability to determine, e.g., derive, train or calculate, one or more mathematical models using machine learning, which may be stored in a respective database or memory. In some embodiments, the first node 111 may also have a capability to execute the one or more machine learning models it may have trained. One of the one or more machine learning models may be based on a first machine-learning method of anomaly detection. Another of the one or more machine learning models may be based on a second machine-learning method which may be a k-nearest neighbours method.

The second node 112 may have the capability to execute the one of the one or more machine learning models based on the first machine-learning method of anomaly detection. The third node 113 may have the capability to determine, e.g., derive, train or calculate, a mathematical model using machine learning, which may be stored in a respective database or memory. The machine learning model may be based on the second machine-learning method which may be the k-nearest neighbours method.

The fourth node 114 may have the capability to execute the machine learning model based on the second machine-learning method which may be the k-nearest neighbours method.
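The role of this second model, exemplified here by the k-nearest neighbours method, may be sketched as a nearest-neighbour lookup over feature vectors of previously commented units; the feature vectors and comment keywords below are invented for illustration:

```python
# Hedged sketch of the second predictive model: a k-nearest-neighbours
# lookup that, given the feature vector of a unit predicted to elicit a
# comment, returns the comment keyword of the most similar past units.
# The feature vectors and comment keywords are invented for illustration.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_comment(query, examples, k=3):
    """examples: list of (feature_vector, comment_keyword) from reviewed code."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    keywords = [kw for _, kw in nearest]
    return max(set(keywords), key=keywords.count)  # majority vote among the k

# Feature vector per unit: (line length, number of identifiers)
examples = [
    ((120, 2), "80 characters"),   # long lines -> line-length comment
    ((115, 3), "80 characters"),
    ((40, 1), "identifier name"),  # short identifiers -> naming comment
    ((38, 1), "identifier name"),
    ((35, 1), "identifier name"),
]
prediction = predict_comment((118, 2), examples, k=3)
```

Here the query unit, a 118-character line, sits closest to the two line-length examples, so the majority vote among its three nearest neighbours yields the "80 characters" comment keyword.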

Any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may be implemented as a standalone server in, e.g., a host computer in the cloud. In other examples, any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may be a distributed node or distributed server, such as a virtual node in the cloud, and may perform some of its respective functions locally, e.g., by a client manager, and some of its functions in the cloud, by, e.g., a server manager. In other examples, any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may perform its functions entirely on the cloud. Yet in other examples, any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may also be implemented as a processing resource in a server farm. Any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may be under the ownership or control of a service provider or may be operated by the service provider or on behalf of the service provider.

In some examples, any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may be co-located or be the same node. Any of the first node 111, the second node 112, the third node 113 and the fourth node 114 may be located in the cloud. In other examples, the first node 111 and the second node 112 may be located in separate geographical locations.

It may be understood that the computer system 100 may comprise additional nodes than those depicted in Figure 1.

The capabilities and functions of each of these nodes will be described later, along with the description of the method performed by the first node 111, the second node 112, the third node 113 and the fourth node 114.

The first node 111 may communicate with the second node 112 over a first link 121, e.g., a radio link, or a wired link. The second node 112 may communicate with the third node 113 over a second link 122, e.g., a radio link, or a wired link. The third node 113 may communicate with the fourth node 114 over a third link 123, e.g., a radio link, or a wired link. The fourth node 114 may communicate with the fifth node 115 over a fourth link 124, e.g., a radio link, or a wired link. The first node 111 may communicate with the fourth node 114 over a fifth link 125, e.g., a radio link, or a wired link. The fourth node 114 may communicate with the second node 112 over a sixth link 126, e.g., a radio link, or a wired link. The first node 111 may communicate with the third node 113 over a seventh link 127, e.g., a radio link, or a wired link.

Any of the first link 121, the second link 122, the third link 123, the fourth link 124, the fifth link 125, the sixth link 126 and the seventh link 127 may be a direct link or may be comprised of a plurality of individual links, wherein it may go via one or more computer systems or one or more core networks, which are not depicted in Figure 1, or it may go via an optional intermediate network. The intermediate network may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network, if any, may be a backbone network or the Internet; in particular, the intermediate network may comprise two or more sub-networks, which is not shown in Figure 1.

In general, the usage of “first”, “second”, “third”, “fourth”, “fifth” and/or “sixth” herein may be understood to be an arbitrary way to denote different elements or entities, and may be understood to not confer a cumulative or chronological character to the nouns they modify, unless otherwise noted in the text.

Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following description.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments.

In the following description, that an action may be optional may be understood to mean that the action may not be necessary and/or that it may be performed elsewhere, e.g., by a different node, for example in the computer system 100.

Embodiments of a computer-implemented method, performed by the first node 111, will now be described with reference to the flowchart depicted in Figure 2. The method may be understood to be for handling source code. The first node 111 may be operating in the computer system 100.

The method may comprise the actions described below. In some embodiments some of the actions may be performed. In some embodiments, all the actions may be performed. In Figure 2, optional actions are indicated with dashed boxes. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive. Components from one example may be tacitly assumed to be present in another example and it will be obvious to a person skilled in the art how those components may be used in the other examples.

Action 201

In this Action 201, the first node 111 obtains one or more first units of source code, one or more respective comments elicited during a review of the one or more first units of source code, and a mapping between the one or more first units of source code and the one or more respective comments.

The one or more first units may be understood as a training set of units, for machine learning purposes. Each of the units of source code may be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method. Each of the units of source code may be a class, a class member or a class field in examples wherein, e.g., the source code may be in an object oriented programming language. In some particular examples, a unit of source code may be a line of source code.

It may be understood that not all of the one or more first units of source code may have elicited a comment. In the expression “one or more respective comments”, respective may be understood to denote that one or more comments may be elicited in reference to a particular first unit of source code.

Obtaining may be understood as e.g., collecting, recording, retrieving, gathering, and/or receiving. The obtaining may be, for example, from one or more databases, e.g., a source code repository, a review comment repository, comprised in, or accessible by, the first node 111.

The one or more first units may be obtained in a file of source code. For example, in this Action 201, the first node 111 may obtain all the source code files for a project, review comments for the source code files, and a mapping between the review comments and the source code lines which they may address.

The obtaining may be online, periodic, or both. The one or more respective comments may be in natural language. Examples of the one or more respective comments may be, e.g.,: “Avoid multiple threads if possible as it may cause deadlock in production”, “Use Google GSON library instead of Jackson”, “Use logging instead of console write”, “Avoid library XYZ as it has a GPL license which cannot be used in commercial software”, “Please provide a more descriptive name for the method/field/local variable”, “Please ensure line text is within 80 characters”, “Refactor the method to make it shorter”, etc.

Embodiments herein may be understood to be programming language agnostic. The source code and comments may be written in any programming language.

By obtaining the one or more first units of source code, the one or more respective comments and the mapping in this Action 201 , the first node 111 may then be enabled to derive a predictive model of when a unit of source code may elicit a comment.

Action 202

Each of the one or more first units of source code may have first respective features. The first respective features may be understood to be machine learning features. The one or more first respective features may be configurable by a source code expert, based on experience. For example, in embodiments wherein the first unit may be a line, any of the respective features, e.g., the one or more first respective features and/or one or more second respective features which will be described later, may comprise one or more of: a) a statement type of a respective first line, e.g., assignment, variable declaration, if statement, etc., b) length of the unit, e.g., a length of the respective first line, c) one or more libraries used in the respective first unit, e.g., respective first line, d) library name used, e.g., specific library names used in the first unit, such as for example, Google GSON, and e) a length of a respective identifier used in the respective first unit, e.g., respective first line, e.g., for each identifier used on the line. Another example, if the unit is a method declaration, may be method length.

In this Action 202, the first node 111 obtains one or more first respective values of the one or more first respective features characterizing the one or more first units. The one or more first respective features have been extracted from each first unit of the obtained one or more first units. In some examples, in this Action 202, the first node 111 may itself extract, from each first unit of the obtained one or more first units, the one or more first respective values of the one or more first respective features characterizing the one or more first units. For example, the first node 111 may, in this Action 202, extract feature values for the machine learning features described above for each line in the input source files.

In other examples, the first node 111 may receive the one or more respective values from a different node, which may have performed the extraction of the one or more first respective features. Any of the respective features, e.g., the one or more first respective features, may be one of: pre-configured and automatically extractable.

By obtaining the one or more first respective values of the one or more first respective features in this Action 202, the first node 111 may then be enabled to, using the above extracted feature values and the input mapping between reviews and source units, build a familiar machine learning set of correspondences, e.g., a table, with feature values for each unit, e.g., line, and an output class indicating whether or not the respective first unit may have elicited a review. That is, the first node 111 may be able to learn to output a first class, class 1, indicating whether a particular unit of source code may have a review or not. Further, the first node 111 may be able to learn to output a second class, class 2, indicating the review comment keyword associated with the unit of source code, for example, “80 characters”, “identifier name” and so on.
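
As an illustrative sketch of this feature extraction, the snippet below, in Python, derives a few of the feature values named above, e.g., statement type, line length and identifier lengths, from a single line of source code. The regular-expression heuristics, the feature names and the small library list are assumptions made for the example, not part of the embodiments.

```python
import re

def extract_features(line):
    """Extract a sample of the per-line feature values described above.

    The statement-type heuristics and the small library list are
    illustrative assumptions.
    """
    stripped = line.strip()
    if re.match(r"if\b", stripped):
        statement_type = "if_statement"
    elif "=" in stripped:
        # Covers assignments and initialized variable declarations
        # for the purposes of this sketch.
        statement_type = "assignment"
    else:
        statement_type = "other"
    # Identifiers: alphabetic tokens; their lengths are among the features.
    identifiers = re.findall(r"[A-Za-z_]\w*", stripped)
    return {
        "statement_type": statement_type,
        "line_length": len(line),
        "identifier_lengths": [len(name) for name in identifiers],
        "libraries_used": [name for name in identifiers if name in {"GSON", "Jackson"}],
    }
```

A line such as `int count = 0` would, under these heuristics, yield the statement type "assignment", the line length, and the lengths of the identifiers `int` and `count`.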

Action 203

In this Action 203, the first node 111 obtains a set of correspondences between each unit of the obtained one or more first units and a respective subset of one or more first respective values and a respective class. The respective class has been determined based on the respective subset of one or more first respective values. The set of correspondences may be based on the obtained mapping and the obtained one or more first respective values.

In some examples, in this Action 203, the first node 111 may generate itself, based on the obtained mapping and the extracted one or more first respective values, the set of correspondences between each unit of the obtained one or more first units and the respective subset of one or more first respective values and the respective class. The respective class may have been determined by the first node 111 based on the respective subset of one or more first respective values.

In other examples, the first node 111 may receive the set of correspondences from a different node, which may perform the generation of the set of correspondences. The respective class may have been determined from the stored information in the one or more databases, e.g., a source code repository, the first node 111 may have obtained the one or more first units from.

The set of correspondences may be a machine-learning table, which may be also referred to as a machine learning feature table. A non-limiting example of such a table may be as depicted in Table 1. It may be noted that not all features are shown; only a sample of the features is shown in Table 1.

Table 1.

By obtaining the set of correspondences in this Action 203, the first node 111 may be enabled to represent a unit of source code, e.g., a source code line, for machine learning to derive a predictive model of when a unit of source code may elicit a comment, using an anomaly detection approach.
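
A minimal sketch of how such a set of correspondences might be assembled, assuming the units and the mapping are held in plain Python dictionaries, a representation chosen only for the example:

```python
def build_correspondences(units, comment_mapping):
    """Build the set of correspondences: one row per unit, pairing its
    feature values with a class (1 = elicited a review comment, 0 = not).

    `units` maps a unit identifier to its extracted feature values;
    `comment_mapping` maps unit identifiers to the comments they
    elicited. Both shapes are assumptions for illustration.
    """
    table = []
    for unit_id, feature_values in units.items():
        elicited = 1 if comment_mapping.get(unit_id) else 0
        table.append({"unit": unit_id, "features": feature_values, "class": elicited})
    return table
```

A unit absent from the mapping receives class 0, reflecting that not all first units may have elicited a comment.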

Action 204

In this Action 204, the first node 111 determines, using a first machine-learning method of anomaly detection, and based on the obtained set of correspondences, a first predictive model. The first predictive model is to predict whether or not a second unit in a source code, that is, a new second unit of source code, is to elicit a comment by a reviewer. The reviewer may be understood to be a person, e.g., a technical expert in programming or a machine.

Determining may be understood as building, generating, calculating, training, etc.

The first machine-learning method of anomaly detection may be one of an auto-encoder, e.g., AutoEncoder, and a one class support vector machine (SVM).

The determining in this Action 204 of the first predictive model may comprise a testing phase with a first group of samples, and an execution phase with a second group of samples. The two phases may be iterated with new samples until a certain level of acceptable accuracy may be achieved.

By determining the first predictive model in this Action 204, the first node 111 may be enabled to learn which units of source code, e.g., lines, may elicit a review comment, and thereby, once the first predictive model may have been trained, predict if a new unit of source code, e.g., a line, may elicit a review comment using the first predictive model, without needing that the new source code be actually reviewed.

By exploiting the insight that a code review comment may be understood to be an anomaly, and formulating the problem of determining whether source code may elicit a comment as an anomaly detection learning problem, the first node 111 may enable to obtain a significantly higher number of training samples, that is, the first node 111 may be enabled to not be restricted to the set of collected review comments alone, which may not be sufficient for learning. Therefore, with the advantage of additional training samples, embodiments herein may be enabled to build better machine learning models that may be trained to predict code reviews. That is, machine learning models that may provide greater accuracy in predicting source code statements that may elicit a code review.
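
The anomaly-detection formulation may be sketched as follows with a one class support vector machine, one of the two methods named above, here via scikit-learn. The toy feature vectors are assumptions; the key point is that the model is fitted on units that did not elicit a comment, so that a comment-eliciting unit appears as an anomaly, predicted as -1:

```python
from sklearn.svm import OneClassSVM

# Toy feature vectors (line length, identifier length): the values are
# illustrative assumptions, not data from the application.
uncommented_lines = [[40, 8], [42, 7], [38, 9], [41, 8], [39, 7]]

# Train the one-class model on "normal" units only, i.e. units that did
# not elicit a review comment; units that would elicit a comment are
# then expected to be flagged as anomalies (prediction -1).
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
model.fit(uncommented_lines)

# A typical line versus a clear outlier (very long line, terse name).
predictions = model.predict([[40, 8], [120, 2]])
```

An auto-encoder could be substituted for the one class SVM, with the reconstruction error thresholded to flag anomalies.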

Further, the first node 111 may also be enabled to then derive another predictive model to predict the actual review comment for a particular unit of source code that may have been predicted to receive a review comment, as will be explained later.

Action 205

Once the first node 111 may have determined the first predictive model, e.g., trained it to obtain an acceptable level of accuracy, e.g., an acceptable level of predictive error, the first predictive model may be used, by the first node 111 or another network node, to make actual predictions.

In this Action 205, the first node 111 may obtain one or more second units of source code. That is, the first node 111 may obtain or receive a fresh batch of units of source code, which may not have been yet reviewed, to predict if they may elicit or not, one or more comments by a reviewer. The first node 111 may therefore be tasked with predicting whether each of the one or more second units of source code may be predicted to result in a comment, or not. The one or more second units may be understood as a test set of units, that is, a set of units to implement, execute or use the first predictive model, once it may have been trained.

Action 206

In this Action 206, the first node 111 may obtain one or more second respective values of one or more second respective features characterizing the one or more second units, in a similar way as it was described for Action 202. The one or more second respective features have been obtained from each second unit of the obtained one or more second units.

In some examples, in this Action 206, the first node 111 may itself extract, from each second unit of the obtained one or more second units, the one or more second respective values of the one or more second respective features characterizing the one or more second units. The one or more second respective features may be understood to not comprise, that is, to exclude, which comment may have been elicited by a reviewer, as the one or more second units may be understood not to have been reviewed. In other examples, the first node 111 may receive the one or more second respective values from a different node, which may have performed the extraction.

The obtained one or more second respective values of the one or more second respective features characterizing the one or more second units may then be used as input to the first predictive model, once trained, in order to predict whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer.

Action 207

In this Action 207, the first node 111 may determine, using the first predictive model and the obtained, e.g., extracted, one or more second respective values, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer.

Determining may be understood as calculating, deriving or similar.

By determining whether or not each second unit in the one or more second units of source code may be predicted to elicit a respective comment by a reviewer using the first predictive model in this Action 207, the first node 111 may then be enabled to predict whether or not each second unit in the one or more second units of source code may elicit a respective comment by a reviewer with higher accuracy. Further, the first node 111 may also be enabled to then derive another predictive model to predict the actual review comment for a particular second unit of source code, e.g., second line that may have been predicted to receive a review comment, as will be explained later.

Action 208

In this Action 208, the first node 111 outputs, to the second node 112 operating in the computer system 100, a first indication of the first predictive model.

Outputting may be understood as e.g., providing, for example, in examples wherein the first node 111 may be the same node as the second node 112, or sending or transmitting, e.g., via the first link 121.

In some embodiments, the first indication may explicitly indicate the first predictive model.

In some embodiments, the first indication may indicate which one or more third units of the one or more second units of source code may be predicted to elicit a respective comment by the reviewer.

The first indication may be for example, in the form of a stored model that may be serialized or deserialized, e.g., pickle format.
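
For instance, assuming Python, the stored model may be serialized and deserialized with the pickle module, as the mention of the pickle format suggests; the dictionary standing in for the trained model is illustrative:

```python
import pickle

# A stand-in for the trained first predictive model; any fitted model
# object could be serialized the same way.
model = {"method": "OneClassSVM", "nu": 0.1, "support_vectors": [[40, 8]]}

# Serialize the model to bytes, e.g., to send it to the second node 112 ...
payload = pickle.dumps(model)

# ... and deserialize it on the receiving side.
restored = pickle.loads(payload)
```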

By outputting the first indication in this Action 208, the first node 111 may be understood to enable to improve the quality of the source code by identifying which unit of source code may elicit a comment without requiring actual review by a reviewer, and to perform this, in an expedited manner. Additionally, the first node 111 may be enabled to then derive another predictive model to predict the actual review comment that a particular second unit of source code may elicit, as will be explained later.

Action 209

Once a unit of source code may have been predicted to elicit a review comment, the first node 111 may need to additionally predict the actual review comment text, or at least the key phrases. To ultimately enable that prediction, the first node 111 may gather the rows, e.g., data, in the obtained set of correspondences, that is, the machine learning feature value table which may have review comments, out of the one or more first units of source code, that is, out of the training set or corpus. The first node 111 may then need to process the obtained one or more respective comments. This may be understood to be because different reviewers may have different ways of suggesting similar review comments. For example, one reviewer may write “change the log level to debug”, while another may write, “modify log level”. The first node 111 may need to unify such reviews, as they may be understood to be similar. The first node 111 may also need to group the different reviews so that the first node 111 may suggest and/or generate the appropriate review comment. Hence the first node 111 may use a pre-processing step, as described in this Action 209.

In this Action 209, the first node 111 may process the obtained one or more respective comments, e.g., “change log level to error”, “modify log level here”, “use a descriptive identifier name”, “use a meaningful variable name”, etc.

For performance of this Action 209, the one or more respective comments may comprise the comments elicited by a subset of the one or more first units of source code. This subset may be referred to herein as the one or more fourth units of source code.

The processing in this Action 209 may comprise generating embeddings for each comment to normalize different word usages. An embedding may be understood as a representation of words for text analysis, typically in the form of a real-valued vector that may encode the meaning of the word, such that the words that may be closer in the vector space may be expected to be similar in meaning. For generating the embeddings, Word2Vec or Glove or any other similar method may be used. For generating the embeddings, the first node 111 may use as input the one or more respective comments.

The processing in this Action 209 may further comprise reducing high dimensionality using a principal component analysis for clustering the normalized one or more respective comments. Dimensionality may be understood as the minimum number of coordinates that may be needed to specify any point within a mathematical space or object. It may be understood that adding more details may increase dimensionality and the accuracy of representation, but may also increase computational costs. The first node 111 may reduce the high dimensionality of Word2Vec, e.g., using Principal Component Analysis, for better clustering.

The processing in this Action 209 may then comprise clustering the normalized one or more respective comments using K-Means. The clustering may be performed after the dimensionality may have been reduced, in embodiments wherein this may have been found to be necessary, which may be understood to not be every case. The optimal value of K may be found using the elbow method.

The processing in this Action 209 may additionally comprise outputting an indication of clusters of similar comments to be used as input for determining a second predictive model, as will be described in Action 210. For example, based on the examples of input comments provided above, a first cluster may be “Change log level”, a second cluster may be “Use descriptive identifier name”, etc.
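
A sketch of this pre-processing pipeline with scikit-learn, where a simple count-based embedding stands in for Word2Vec or GloVe so that the example remains self-contained, and where K=2 is assumed directly instead of being derived with the elbow method:

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# The example comments from the description above.
comments = [
    "change log level to error",
    "modify log level here",
    "use a descriptive identifier name",
    "use a meaningful variable name",
]

# Count-based embedding: a stand-in for Word2Vec/GloVe, used only to
# keep the example self-contained.
vocab = sorted({word for comment in comments for word in comment.split()})
vectors = [[comment.split().count(word) for word in vocab] for comment in comments]

# Reduce dimensionality before clustering, as in the pre-processing step.
reduced = PCA(n_components=2).fit_transform(vectors)

# Cluster similar comments; K=2 is assumed here rather than found with
# the elbow method.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
```

The two “log level” comments end up in one cluster and the two “name” comments in the other, mirroring the example clusters given above.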

By processing the obtained one or more respective comments in this Action 209, the first node 111 may group the units of source code with similar code reviews into clusters. This may then enable that ultimately, given a second unit of source code which may have been classified as requiring a review in Action 207, e.g., by the OneClassSVM or other anomaly detection embodiments, that is, a third unit of source code, the first node 111 may then be enabled to use a Machine Learning Algorithm such as the K-Nearest Neighbour (KNN) to find the most similar units of source code to it. The review comment for these units of source code may then be selected as the review comment for a third unit of source code which may have been predicted to elicit a comment.

Action 210

In this Action 210, the first node 111 may determine, using a second machine-learning method, and based on one or more fourth units of source code out of the one or more first units of source code which may have elicited a respective comment by the reviewer, a second predictive model. The second predictive model may be understood to be to predict which comment may be predicted to be elicited, respectively, by the one or more third units. That is, the training of the second predictive model may be with units of source code which may have been reviewed by a reviewer, so that in the future, the review comments which may be elicited by unreviewed units of source code may be predicted.

The processing of Action 209 may be performed prior to the determining in this Action 210 of the second predictive model. That is, Action 210 may be performed once the processing of Action 209 for the actual review comment generation may have been completed. The determining in this Action 210 may be based on the one or more first respective values of one or more first respective features characterizing the one or more first units, as obtained in Action 202, particularly, the one or more first respective values of the one or more first respective features characterizing the one or more fourth units. In other words, the first node 111 may have extracted, for each unit in the corpus which may have a source code review comment, the machine learning features for the source unit as described in Action 202.

The second machine-learning method may be a k-nearest neighbours (KNN) method.

Action 211

In this Action 211, in order to generate and/or predict the actual review comment, the first node 111 may determine, using the second predictive model, which respective comment may be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer, as may have been determined in Action 207. In other words, the input to the second predictive model, once trained to reach an acceptable level of accuracy, may be the one or more third units of source code which may have been predicted to elicit a code review comment in the prediction made by the first predictive model described earlier. As mentioned earlier, these one or more third units may be understood to not have been actually manually reviewed by a reviewer.

In order to perform this Action 211 , the first node 111 may use, for each third unit of the obtained one or more third units, one or more third respective values of one or more third respective features characterizing the one or more third units, as obtained, e.g., extracted in Action 206. The one or more third respective features may be understood to not comprise, that is, to exclude which comment may have been elicited by a reviewer. The obtained one or more third respective values of one or more third respective features characterizing the one or more third units may then be used in the second predictive model in order to predict which respective comment may be elicited by each third unit by the reviewer.

Given a third unit of source code, e.g., a line, which may have been predicted to receive a review comment in Action 207, the first node 111 may, in this Action 211, find the most similar units of source code to it. The first node 111 may, for this purpose, use the same features as obtained, e.g., extracted, before from each second unit of source code to predict if the second units of source code may be predicted to have a review comment. With the second machine learning method, e.g., KNN, or any ML algorithm, the nearest or most similar fourth units of source code to each of the new, unreviewed, one or more third units of source code may be found. In other words, the first node 111 may find the source unit, e.g., feature vector, in the corpus of one or more fourth units of source code which may be nearest to the input source unit feature vector. That is, the first node 111 may find the data point closest, that is, most similar, to the one or more third respective values of one or more third respective features of the input third unit of source code. The comment of the nearest neighbor may then be selected as the review comment for the new unit of source code, that is, the third unit of source code.
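
This nearest-neighbour selection may be sketched as follows, here with k=1 and Euclidean distance; the pairing of feature vectors with their review comments is an assumed representation for the example:

```python
def nearest_comment(third_unit_features, fourth_units):
    """Return the review comment of the fourth unit whose feature
    vector is nearest (k=1, Euclidean distance) to the given third unit.

    `fourth_units` is a list of (feature_vector, comment) pairs; the
    shape is an assumption for illustration.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    _, comment = min(
        fourth_units,
        key=lambda pair: distance(pair[0], third_unit_features),
    )
    return comment
```

With k greater than 1, the comment could instead be selected by majority vote among the k nearest fourth units.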

Action 212

In this Action 212, the first node 111 may output at least one of the following. According to a first option, the first node 111 may output a second indication indicating the one or more fourth units of source code to the third node 113. According to a second option, the first node 111 may output a third indication of the second predictive model to another node operating in the computer system 100, for example, the fourth node 114 and/or the fifth node 115.

As explained before, outputting may be understood as e.g., providing, for example, in examples wherein the first node 111 may be the same node as the third node 113 or the another node, or sending or transmitting, e.g., via the seventh link 127 to the third node 113, and/or via the fifth link 125 to the fourth node 114.

The third indication may indicate which comment may be predicted to be respectively elicited by each third unit of the one or more third units, e.g., when output by the first node 111.

The third indication may be for example, in the form of the actual review comment text for a source line. That is, the review comment associated with the nearest neighbour. In another example, the third indication may be one or more key phrases denoting the review comment.

In some embodiments, the first node 111 may train the first predictive model and the second predictive model, and also run the first predictive model and the second predictive model to make predictions once the predictive models may have been trained. In other embodiments, the first node 111 may train the first predictive model, and then, once the first predictive model may have a certain degree of accuracy, the first node 111 may send it to the second node 112 to have the second node 112 make the actual predictions using the first predictive model.

Similarly, the second predictive model may be trained by a different node, such as the third node 113. The third node 113 may then, once the second predictive model may have been trained, send it to the fourth node 114 to have the fourth node 114 make the actual predictions using the second predictive model. Each of the second node 112, the third node 113 and the fourth node 114 may perform their corresponding tasks, as described in relation to Figure 3, Figure 4 and Figure 5, respectively, in a similar manner to that described for the first node 111. Embodiments of a computer-implemented method performed by the second node 112, will now be described with reference to the flowchart depicted in Figure 3. The method may be understood to be for handling source code. The second node 112 may operate in the computer system 100.

The method comprises the following actions. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive. Components from one example or embodiment may be tacitly assumed to be present in another example or embodiment, and it will be obvious to a person skilled in the art how those components may be used in the other examples.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here to simplify the description. For example, each of the units of source code may be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

Action 301

In this Action 301, the second node 112 obtains, from the first node 111 operating in the computer system 100, the first indication of the first predictive model of anomaly detection. The first predictive model is to predict whether or not a second unit in a source code is to elicit a comment by a reviewer.

The obtaining, e.g., receiving, of the first indication may be, for example, via the first link 121.

In some embodiments, the first indication may explicitly indicate the first predictive model.

The first machine-learning method of anomaly detection may be one of an auto-encoder, and a one class support vector machine (SVM).

Action 302

In this Action 302, the second node 112 obtains the one or more second units of source code. That is, new units of source code which may not have been manually reviewed by a reviewer.

Obtaining may be understood as e.g., collecting, recording, retrieving, gathering, and/or receiving. The one or more second units may be obtained in a file of source code.

Action 303

In this Action 303, the second node 112 obtains, from each second unit of the obtained one or more second units, the one or more second respective values of the one or more second respective features characterizing the one or more second units. Obtaining in this Action 303 may comprise extracting of the one or more second respective values by the second node 112 itself, or receiving the one or more second respective values by another node which may have performed the extraction, such as from the first node 111.

The one or more second respective features may comprise one or more of: a statement type of a respective second line, a length of the respective second line, one or more libraries used in the respective second line, library name used, and a length of a respective identifier used in the respective second line.

The one or more second respective features may be one of: pre-configured and automatically extractable.

Action 304

In this Action 304, the second node 112 determines, using the first predictive model and the obtained one or more second respective values, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer.

Determining may be understood as calculating, deriving or similar. The determining performed in this Action 304 may be understood as an execution of the first predictive model for a particular instance or use case.

Action 305

In this Action 305, the second node 112 outputs, to the fourth node 114 configured to operate in the computer system 100, a fourth indication. The fourth indication indicates which one or more third units of the one or more second units of source code are predicted to elicit a respective comment by the reviewer.

The respective comments may be in natural language.

Outputting may comprise sending, e.g., via the sixth link 126.

Embodiments of a computer-implemented method, performed by the third node 113, will now be described with reference to the flowchart depicted in Figure 4. The method may be understood to be for handling source code. The third node 113 may be operating in the computer system 100.

The method may comprise the actions described below. In some embodiments some of the actions may be performed. In some embodiments, all the actions may be performed. In Figure 4, optional actions are indicated with dashed boxes. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive.

Components from one example may be tacitly assumed to be present in another example and it will be obvious to a person skilled in the art how those components may be used in the other examples.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here to simplify the description. For example, each of the units of source code may be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

Action 401

In this Action 401, the third node 113 obtains, from the first node 111 operating in the computer system 100, the second indication. The second indication indicates which one or more fourth units, of the one or more first units of source code, have elicited a respective comment by a reviewer.

Obtaining may be understood as receiving e.g., via the seventh link 127.

Action 402

In this Action 402, the third node 113 may obtain, from each first unit of the one or more first units, the one or more first respective values of the one or more first respective features characterizing the one or more first units. For example, the third node 113 may, in this Action 402, itself extract the one or more first respective features. Alternatively, the third node 113 may receive the one or more first respective values from a different node which may have performed the extraction, e.g., the first node 111.

The third node 113 may also obtain, e.g., generate, a set of correspondences, e.g., a machine-learning table, between each unit of the obtained one or more first units and a respective subset of one or more first respective values and a respective class, in a similar manner as described in Action 203 for the first node 111.

For examples in which the third node 113 may extract the one or more first respective values itself, the third node 113 may have obtained the one or more first units and/or the one or more fourth units, e.g., in Action 401.

Action 403

In this Action 403, the third node 113 may process the obtained one or more respective comments elicited during the review of the one or more first units of source code, which may be understood to include the one or more fourth units.

The processing in this Action 403 may comprise: i) generating the embeddings for each comment to normalize different word usages, ii) reducing the high dimensionality using the principal component analysis for clustering the normalized one or more respective comments, iii) clustering the normalized one or more respective comments using K-Means, and iv) outputting the indication of clusters of similar comments to be used as input for determining the second predictive model, as will be described in Action 404.

Action 404

In this Action 404, the third node 113 determines, using the second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, have elicited a respective comment by the reviewer, the second predictive model. The second predictive model is to predict which comment is predicted to be elicited, respectively, by the one or more third units. That is, by those new units of source code that may not have yet been reviewed, and which may be predicted, out of the one or more second units of source code, to elicit a comment, e.g., based on the first predictive model.

The one or more respective comments may be in natural language.

The second machine-learning method may be the k-nearest neighbours (KNN) method.

The processing of Action 403 may be performed prior to the determining in this Action 404 of the second predictive model.

The determining in this Action 404 may be based on the one or more first respective values of the one or more first respective features characterizing the one or more first units, as obtained in Action 402, particularly, the one or more first respective values of the one or more first respective features characterizing the one or more fourth units.

Action 405

In this Action 405, the third node 113 outputs, to the fourth node 114 operating in the computer system 100, a fifth indication of the second predictive model. The fifth indication may then be understood to enable the fourth node 114 to execute the trained second predictive model with new units of source code, e.g., the one or more third units of source code, to perform actual predictions, that is, predictions not for training purposes. The fifth indication may explicitly indicate the second predictive model.

Embodiments of a computer-implemented method performed by the fourth node 114, will now be described with reference to the flowchart depicted in Figure 5. The method may be understood to be for handling source code. The fourth node 114 may operate in the computer system 100.

The method comprises the following actions. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. It should be noted that the examples herein are not mutually exclusive. Components from one example or embodiment may be tacitly assumed to be present in another example or embodiment, and it will be obvious to a person skilled in the art how those components may be used in the other examples.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here to simplify the description. For example, each of the units of source code may be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

Action 501

In this Action 501, the fourth node 114 obtains, from the third node 113 operating in the computer system 100, the fifth indication. The fifth indication indicates the second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units of source code. That is, new units of source code that may not yet have been manually reviewed by a reviewer, but which may have been predicted to elicit a comment, e.g., by the first predictive model.

Obtaining may be understood as receiving e.g., via the third link 123.

Alternatively, the fourth node 114 may obtain the third indication from the first node 111, e.g., via the fifth link 125.

The second machine-learning method may be the k-nearest neighbours method.

Action 502

In this Action 502, the fourth node 114 obtains, from the second node 112 operating in the computer system 100, the fourth indication. The fourth indication indicates which one or more third units, of the one or more second units of source code, are predicted to elicit a respective comment by a reviewer. Obtaining may be understood as receiving, e.g., via the sixth link 126.

Action 503

In this Action 503, the fourth node 114 determines, using the second predictive model, which respective comment will be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer. That is, the fourth node 114 executes the second predictive model with the new units of source code received, in a similar manner as to how it was described in Action 210.

Action 504

In this Action 504, the fourth node 114 outputs, e.g., to itself, or to the third node 113 or to another node 111, 112, 115 operating in the computer system 100, a sixth indication. The sixth indication indicates which comment is predicted to be respectively elicited by each third unit of the one or more third units.

The sixth indication may be understood to be equivalent to the third indication, but as output by the fourth node 114. The sixth indication may indicate which comment may be predicted to be respectively elicited by each third unit of the one or more third units.

The comment predicted to be respectively elicited by each third unit is in natural language.

Figure 6 is a flowchart illustrating a non-limiting example of the method that may be performed by the first node 111 according to embodiments herein, wherein the units of source code are lines. As depicted in the chart, according to Action 201, the first node 111 may then obtain source code files for a project, review comments for the source code files and a mapping between the review comments and the source code units, which may then be used as input data to train the first predictive model. According to Action 202, the first node 111 may then extract feature values for the features for each unit in the input source files. Next, using the above extracted feature values and the input mapping between reviews and source units, the first node 111 may, according to Action 203, build the familiar machine-learning table with feature values for each line and an output class, e.g., has review/does not have review. The first node 111 may then, using the above data, in accordance with Action 204, use a one-class SVM, or other embodiments such as an AutoEncoder, to learn which units will elicit a review comment and build a model. Using the above model, the first node 111 may then, according to Action 207, predict if a code line will elicit a review comment. The first node 111 may then, according to Action 211, use another model to predict the actual review comment for a code line that has been predicted to receive a review comment.

Figure 7 is a schematic diagram illustrating a non-limiting example of some aspects of the method performed by the first node 111. As depicted in Figure 7, according to Action 201, the first node 111 may obtain source code files for a project 701 and review comments for the source code files mapped to code units 702. According to Action 202, the first node 111 may then extract feature values for the features for each unit in the input source files. In this example, the units are lines.
Next, using the above extracted feature values and the input mapping between reviews and source units, the first node 111 may, according to Action 203, build the familiar machine learning table 703 with feature values for each line and output class, e.g., has review/does not have review. The output 704 of this analysis is depicted in the graphical representation at the bottom of Figure 7, wherein the vertical axis represents a first feature, e.g., number of libraries used in a line, and the horizontal axis represents a second feature, e.g., length of the line. Solid dots represent units with no code review and white dots represent source units with code review. As may be appreciated in this example, source units with code review are an anomaly.
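The anomaly-detection view depicted in Figure 7 can be sketched with scikit-learn's one-class SVM: the model is fitted on feature rows of ordinary lines, and a line whose feature values deviate strongly is flagged as an anomaly, i.e., predicted to elicit a review comment. The feature rows below are illustrative values, not taken from any real project:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative feature rows [line length, number of libraries used] for
# lines that drew no review comment; reviewed lines are the anomalies.
no_review = np.array([[40, 1], [42, 1], [38, 2], [45, 1], [41, 2], [39, 1]])
model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(no_review)

# predict() returns +1 for inliers and -1 for anomalies, i.e., for lines
# predicted to elicit a review comment.
predictions = model.predict(np.array([[40, 1], [120, 6]]))
```

Here the second, markedly different line (very long, many libraries) falls outside the learned region and is flagged with -1, matching the white dots of Figure 7 being an anomaly among the solid dots.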

Figure 8 is another flowchart illustrating a non-limiting example of some aspects of the processing Action 209 that may be performed by the first node 111, or the third node 113, according to embodiments herein. As depicted in the chart, in this Action 209, the first node 111 may process the obtained one or more respective comments using as input all existing reviews, e.g., “change log level to error”, “modify log level here”, “use a descriptive identifier name”, “use a meaningful variable name”, etc. Then, the first node 111 may, according to Action 209.i, generate word embeddings for each review comment to normalize different word usages, that is, synonyms, using Word2Vec, GloVe or any other embedding method. The first node 111 may then, according to Action 209.ii, reduce the high dimensionality of the Word2Vec embeddings, e.g., using Principal Component Analysis, for better clustering. The first node 111 may then, according to Action 209.iii, cluster the normalized one or more respective comments using K-Means clustering. The optimal value of K may be found using the elbow method.

The first node 111 may then, according to Action 209.iv, output clusters of similar review comments, such as “Cluster 1: Change log level”, “Cluster 2: Use descriptive identifier name”, etc.
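A minimal sketch of the elbow method mentioned above, assuming scikit-learn is available. The 10%-of-first-improvement threshold is an illustrative choice for detecting where the inertia curve levels off, not a value prescribed by the embodiments herein:

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(points, k_max=6):
    """Pick K where the drop in within-cluster inertia levels off."""
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
                for k in range(1, k_max + 1)]
    gains = -np.diff(inertias)  # improvement obtained by each additional cluster
    for k, gain in enumerate(gains[1:], start=2):
        # elbow: increasing K beyond this point yields only a small further gain
        if gain < 0.1 * gains[0]:
            return k
    return k_max

rng = np.random.default_rng(0)
# synthetic comment embeddings forming three well-separated clusters
points = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
k = elbow_k(points)
```

On this synthetic data the inertia drops sharply up to K = 3 and barely improves afterwards, so the function settles on three clusters.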

Figure 9 is a schematic diagram illustrating a non-limiting example of how the first node 111 may, once the processing Action 209 for an actual review comment generation has been completed, generate/predict the actual review comment which may be elicited by new units of source code, according to Action 210. In this non-limiting example, the units are lines. The processing Action 209 may have grouped into clusters code lines with similar code reviews. The clusters are represented in Figure 9 by figures of different shapes, filled with a solid or patterned black color. For example, the black rectangle shapes 901 in Figure 9 represent code lines that have the review comment “Use GSON instead of Jackhaus library”, the dotted circles 902 have the review comment “Handle exception”, the striped triangles 903 represent code lines that have the review comment “Change log level”, and the striped diamond shapes 904 represent code lines that have the review comment “Use more descriptive identifier name”. Now, given a code line, the white rectangle 905, which has been classified as requiring a review by the OneClassSVM or other anomaly detection embodiments, the first node 111 may use a machine-learning algorithm such as KNN to find the most similar code lines to it. For KNN, the same features as extracted before from the code line to predict if the code line should have a review comment may be used. With KNN, or any other ML algorithm, the nearest or most similar code lines may be found, in this case, the other rectangles in Figure 9, as code lines falling within a K Nearest Neighbour boundary 906. The review comment for these lines may then be selected as the review comment for the new line of code.
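The KNN-based comment selection illustrated by Figure 9 can be sketched as follows; the feature rows and review comments are illustrative, and scikit-learn's KNeighborsClassifier performs the majority vote over the nearest reviewed lines:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Feature rows of already-reviewed lines (values are illustrative, e.g.,
# [line length, number of libraries used]) with their review comment as label.
features = np.array([[80, 3], [82, 3], [30, 1], [28, 1], [55, 2], [57, 2]])
comments = np.array(["Use GSON instead", "Use GSON instead",
                     "Change log level", "Change log level",
                     "Handle exception", "Handle exception"])

knn = KNeighborsClassifier(n_neighbors=3).fit(features, comments)

# A new line flagged by the anomaly detector lands among the first group,
# so the majority comment of its nearest reviewed neighbours is selected.
predicted = knn.predict(np.array([[81, 3]]))[0]
```

The new line's nearest neighbours are the two "Use GSON instead" lines, so that comment wins the majority vote, mirroring how the white rectangle 905 inherits the comment of the other rectangles inside boundary 906.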

Figure 10 depicts an example of the arrangement that the first node 111 may comprise to perform the method described in Figure 2, and/or Figures 6-9 in some embodiments. The first node 111 may be configured to operate in the computer system 100. The first node 111 may be understood to be for handling source code.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 10, an optional unit is indicated with a dashed box.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the first node 111 and will thus not be repeated here. For example, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

The first node 111 is configured to, e.g., by means of an obtaining unit within the first node 111 , obtain the one or more first units of source code, the one or more respective comments configured to be elicited during the review of the one or more first units of source code, and the mapping between the one or more first units of source code and the one or more respective comments. The first node 111 is further configured to, e.g., by means of the obtaining unit within the first node 111 , obtain the one or more first respective values of the one or more first respective features configured to characterize the one or more first units. The one or more first respective features are configured to be extracted from each first unit of the one or more first units configured to be obtained.

The first node 111 is also configured to, e.g., by means of the obtaining unit within the first node 111 , obtain, e.g., generate the set of correspondences between each unit of the one or more first units configured to be obtained and the respective subset of one or more first respective values and the respective class. The respective class is configured to have been determined based on the respective subset of the one or more first respective values. The set of correspondences is configured to be based on the mapping configured to be obtained and the one or more first respective values configured to be obtained. The respective class may be configured to have been determined by the first node 111 based on the respective subset of the one or more first respective values.

The first node 111 is further configured to, e.g., by means of a determining unit within the first node 111 , determine, using the first machine-learning method of anomaly detection, and based on the set of correspondences configured to be obtained, the first predictive model to predict whether or not the second unit in the source code is to elicit a comment by a reviewer.

The first node 111 is additionally configured to, e.g., by means of an outputting unit within the first node 111 , output, to the second node 112 configured to operate in the computer system 100, the first indication of the first predictive model.

In some embodiments, the first node 111 may be further configured to, e.g., by means of the obtaining unit within the first node 111 , obtain the one or more second units of source code.

In some embodiments, the first node 111 may be further configured to, e.g., by means of the obtaining unit within the first node 111 , obtain, e.g., extract the one or more second respective values of the one or more second respective features configured to characterize the one or more second units. The one or more second respective features are configured to be extracted from each second unit of the one or more second units configured to be obtained.

In some embodiments, the first node 111 may be further configured to, e.g., by means of the determining unit, determine, using the first predictive model and the one or more second respective values configured to be obtained, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer.

The first indication may be configured to indicate which one or more third units of the one or more second units of source code may be configured to be predicted to elicit a respective comment by the reviewer. In some embodiments, the first node 111 may be further configured to, e.g., by means of the determining unit, determine, using the second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, may be configured to have elicited a respective comment by the reviewer, the second predictive model. The second predictive model may be understood to be to predict which comment may be predicted to be elicited, respectively, by the one or more third units.

In some embodiments, the first node 111 may be further configured to, e.g., by means of the outputting unit within the first node 111 , output, at least one of: i) the second indication configured to indicate the one or more fourth units of source code to the third node 113, and ii) the third indication of the second predictive model to another node 114, 115 configured to operate in the computer system 100.

In some embodiments, the first node 111 may be further configured to, e.g., by means of a processing unit within the first node 111 , process, prior to determining the second predictive model, the one or more respective comments configured to be obtained. The processing may be configured to comprise: i) generating embeddings for each comment to normalize different word usages, ii) reducing high dimensionality using the principal component analysis for clustering the normalized one or more respective comments, iii) clustering the normalized one or more respective comments using K-Means, and iv) outputting the indication of clusters of similar comments to be used as input for determining the second predictive model.

In some embodiments, the first node 111 may be further configured to, e.g., by means of the determining unit within the first node 111, determine, using the second predictive model, which respective comment may be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer.

The third indication may be configured to indicate which comment may be predicted to be respectively elicited by each third unit of the one or more third units.

In some embodiments, at least one of the following options may apply. According to a first option, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method. According to a second option, the first indication may be configured to explicitly indicate the first predictive model. According to a third option, the one or more first units may be configured to be obtained in a file of source code. According to a fourth option, the one or more respective comments may be configured to be in natural language. According to a fifth option, any of the respective features may be configured to comprise one or more of: the statement type of the respective first line, the length of the respective first line, the one or more libraries used in the respective first line, the library name used, and the length of a respective identifier used in the respective first line. According to a sixth option, any of the respective features may be configured to be one of: pre-configured and automatically extractable. According to a seventh option, the set of correspondences may be configured to be a machine-learning table. According to an eighth option, the first machine-learning method of anomaly detection may be configured to be one of an auto-encoder and a one class support vector machine. According to a ninth option, the determining of the first predictive model may be configured to comprise the testing phase with the first group of samples, and the execution phase with the second group of samples. According to a tenth option, the second machine-learning method may be configured to be a k-nearest neighbours method.

The embodiments herein in the first node 111 may be implemented through one or more processors, such as a processing circuitry 1001 in the first node 111 depicted in Figure 10, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the first node 111. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the first node 111.

The first node 111 may further comprise a memory 1002 comprising one or more memory units. The memory 1002 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the first node 111.

In some embodiments, the first node 111 may receive information from, e.g., any of the second node 112, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a receiving port 1003. In some embodiments, the receiving port 1003 may be, for example, connected to one or more antennas in the first node 111. In other embodiments, the first node 111 may receive information from another structure in the computer system 100 through the receiving port 1003. Since the receiving port 1003 may be in communication with the processing circuitry 1001, the receiving port 1003 may then send the received information to the processing circuitry 1001. The receiving port 1003 may also be configured to receive other information.

The processing circuitry 1001 in the first node 111 may be further configured to transmit or send information to e.g., any of the second node 112, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a sending port 1004, which may be in communication with the processing circuitry 1001 , and the memory 1002. Those skilled in the art will also appreciate that the units comprised within the first node 111 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 1001 , perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Also, in some embodiments, the different units comprised within the first node 111 described above as being configured to perform different actions described above may be implemented as one or more applications running on one or more processors such as the processing circuitry 1001.

Thus, the methods according to the embodiments described herein for the first node 111 may be respectively implemented by means of a computer program 1005 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 1001, cause the at least one processing circuitry 1001 to carry out the actions described herein, as performed by the first node 111. The computer program 1005 product may be stored on a computer-readable storage medium 1006. The computer-readable storage medium 1006, having stored thereon the computer program 1005, may comprise instructions which, when executed on at least one processing circuitry 1001, cause the at least one processing circuitry 1001 to carry out the actions described herein, as performed by the first node 111. In some embodiments, the computer-readable storage medium 1006 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1005 product may be stored on a carrier containing the computer program 1005 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1006, as described above.

The first node 111 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the first node 111 and other nodes or devices, e.g., any of the second node 112, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the first node 111 may comprise a radio circuitry 1007, which may comprise e.g., the receiving port 1003 and the sending port 1004. The radio circuitry 1007 may be configured to set up and maintain at least a wireless connection with any of the second node 112, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the first node 111 operative to operate in the computer system 100. The first node 111 may comprise the processing circuitry 1001 and the memory 1002, said memory 1002 containing instructions executable by said processing circuitry 1001 , whereby the first node 111 is further operative to perform the actions described herein in relation to the first node 111 , e.g., in Figure 2, and/or Figures 6-9.

Figure 11 depicts an example of the arrangement that the second node 112 may comprise to perform the method described in Figure 3. The second node 112 may be configured to operate in the computer system 100. The second node 112 may be understood to be for handling source code.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 11 , an optional unit is indicated with a dashed box.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the second node 112 and will thus not be repeated here. For example, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

The second node 112 is configured to, e.g., by means of an obtaining unit within the second node 112, obtain from the first node 111 configured to operate in the computer system 100, the first indication of the first predictive model of anomaly detection. The first predictive model is configured to predict whether or not a second unit in the source code is to elicit the comment by the reviewer.

The second node 112 is further configured to, e.g., by means of the obtaining unit within the second node 112, obtain the one or more second units of source code.

The second node 112 is further configured to, e.g., by means of the obtaining unit within the second node 112, obtain, from each second unit of the one or more second units configured to be obtained, the one or more second respective values of one or more second respective features configured to characterize the one or more second units. The second node 112 is further configured to, e.g., by means of a determining unit within the second node 112, determine, using the first predictive model and the one or more second respective values configured to be obtained, whether or not each second unit in the one or more second units of source code is to elicit a respective comment by a reviewer.

The second node 112 is additionally configured to, e.g., by means of an outputting unit within the second node 112, output, to the fourth node 114 configured to operate in the computer system 100, the fourth indication configured to indicate which one or more third units of the one or more second units of source code are configured to be predicted to elicit a respective comment by the reviewer.

In some embodiments, at least one of the following options may apply. According to a first option, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method. According to a second option, the first indication may be configured to explicitly indicate the first predictive model. According to a third option, the one or more second units may be configured to be obtained in the file of source code. According to a fourth option, the respective comment may be configured to be in natural language. According to a fifth option, the one or more second respective features may be configured to comprise one or more of: the statement type of the respective first line, the length of the respective first line, the one or more libraries used in the respective first line, the library name used, and the length of the respective identifier used in the respective first line. According to a sixth option, the one or more second respective features may be configured to be one of: pre-configured and automatically extractable. According to a seventh option, the first machine-learning method of anomaly detection may be configured to be one of an auto-encoder and a one class support vector machine.

The embodiments herein in the second node 112 may be implemented through one or more processors, such as a processing circuitry 1101 in the second node 112 depicted in Figure 11 , together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the second node 112. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the second node 112.

The second node 112 may further comprise a memory 1102 comprising one or more memory units. The memory 1102 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the second node 112.

In some embodiments, the second node 112 may receive information from, e.g., any of the first node 111, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a receiving port 1103. In some embodiments, the receiving port 1103 may be, for example, connected to one or more antennas in the second node 112. In other embodiments, the second node 112 may receive information from another structure in the computer system 100 through the receiving port 1103. Since the receiving port 1103 may be in communication with the processing circuitry 1101, the receiving port 1103 may then send the received information to the processing circuitry 1101. The receiving port 1103 may also be configured to receive other information.

The processing circuitry 1101 in the second node 112 may be further configured to transmit or send information to e.g., any of the first node 111 , the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a sending port 1104, which may be in communication with the processing circuitry 1101 , and the memory 1102.

Those skilled in the art will also appreciate that the units comprised within the second node 112 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 1101, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Also, in some embodiments, the different units comprised within the second node 112 described above as being configured to perform different actions described above may be implemented as one or more applications running on one or more processors such as the processing circuitry 1101.

Thus, the methods according to the embodiments described herein for the second node 112 may be respectively implemented by means of a computer program 1105 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 1101, cause the at least one processing circuitry 1101 to carry out the actions described herein, as performed by the second node 112. The computer program 1105 product may be stored on a computer-readable storage medium 1106. The computer-readable storage medium 1106, having stored thereon the computer program 1105, may comprise instructions which, when executed on at least one processing circuitry 1101, cause the at least one processing circuitry 1101 to carry out the actions described herein, as performed by the second node 112. In some embodiments, the computer-readable storage medium 1106 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1105 product may be stored on a carrier containing the computer program 1105 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1106, as described above.

The second node 112 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the second node 112 and other nodes or devices, e.g., any of the first node 111, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the second node 112 may comprise a radio circuitry 1107, which may comprise e.g., the receiving port 1103 and the sending port 1104.

The radio circuitry 1107 may be configured to set up and maintain at least a wireless connection with any of the first node 111, the third node 113, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the second node 112 operative to operate in the computer system 100. The second node 112 may comprise the processing circuitry 1101 and the memory 1102, said memory 1102 containing instructions executable by said processing circuitry 1101, whereby the second node 112 is further operative to perform the actions described herein in relation to the second node 112, e.g., in Figure 2.

Figure 12 depicts an example of the arrangement that the third node 113 may comprise to perform the method described in Figure 4, and/or Figures 8-9 in some embodiments. The third node 113 may be configured to operate in the computer system 100. The third node 113 may be understood to be for handling source code.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 12, an optional unit is indicated with a dashed box.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the third node 113 and will thus not be repeated here. For example, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

The third node 113 is configured to, e.g., by means of an obtaining unit within the third node 113, obtain, from the first node 111 configured to operate in the computer system 100, the second indication. The second indication is configured to indicate which one or more fourth units, of one or more first units of source code, are configured to have elicited a respective comment by a reviewer.

The third node 113 is further configured to, e.g., by means of a determining unit within the third node 113, determine, using the second machine-learning method, and based on the one or more fourth units of source code which, out of the one or more first units of source code, are configured to have elicited a respective comment by the reviewer, the second predictive model. The second predictive model is to predict which comment is predicted to be elicited, respectively, by one or more third units of source code.
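As an illustration of this action, when the second machine-learning method is configured to be a k-nearest neighbours method (one of the options described herein), determining the second predictive model may be sketched as below. The feature extraction (simple token counts), the training pairs of units and elicited comments, and the value k=1 are assumptions made only for a self-contained example, not a definitive implementation of the embodiments:

```python
# Sketch: k-nearest-neighbours model mapping feature vectors extracted from
# units of source code to the respective comments they elicited in review.
# Token counts stand in for the "first respective values of first respective
# features"; the units and comments below are purely illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Fourth units of source code that elicited comments, and those comments
units = [
    "int x = p->val;",
    "int y = q->val;",
    "for (i = 0; i < n; i++) {",
    "while (j < m) {",
]
elicited = [
    "check pointer for null",
    "check pointer for null",
    "consider bounds check",
    "consider bounds check",
]

# Extract feature values (whitespace-delimited token counts) from each unit
vec = CountVectorizer(token_pattern=r"\S+")
X = vec.fit_transform(units)

# Determine the second predictive model with k-NN
model = KNeighborsClassifier(n_neighbors=1).fit(X, elicited)

# Predict which comment a new (third) unit of source code would elicit
third_unit = ["int z = r->val;"]
pred = model.predict(vec.transform(third_unit))
print(pred[0])  # nearest neighbours are the pointer-dereference units
```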

The third node 113 is additionally configured to, e.g., by means of an outputting unit within the third node 113, output, to the fourth node 114 configured to operate in the computer system 100, the fifth indication of the second predictive model.

In some embodiments, the third node 113 may be further configured to, e.g., by means of the obtaining unit within the third node 113, obtain, from each first unit of the one or more first units, the one or more first respective values of the one or more first respective features configured to characterize the one or more first units. The determining is configured to be based on the one or more first respective values of the one or more first respective features configured to characterize the one or more first units.

In some embodiments, the third node 113 may be further configured to, e.g., by means of a processing unit within the third node 113, process, prior to determining the second predictive model, the one or more respective comments configured to have been elicited during the review of the one or more fourth units of source code. The processing may be configured to comprise: i) generating the embeddings for each comment to normalize different word usages, ii) reducing the high dimensionality using the principal component analysis for clustering the normalized one or more respective comments, iii) clustering the normalized one or more respective comments using K-Means, and iv) outputting the indication of clusters of similar comments to be used as input for determining the second predictive model.

In some embodiments, at least one of the following options may apply. According to a first option, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method. According to a second option, the one or more respective comments may be configured to be in natural language. According to a third option, the fifth indication may be configured to explicitly indicate the second predictive model. According to a fourth option, the second machine-learning method may be configured to be a k-nearest neighbours method.
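The four processing steps configured above (embedding generation, dimensionality reduction with principal component analysis, K-Means clustering, and output of the cluster indication) may be sketched as follows. TfidfVectorizer stands in here for the embedding generation, and the comments, the number of principal components, and the number of clusters are assumptions chosen only to keep the example self-contained:

```python
# Sketch of the comment pre-processing pipeline:
# (i) embed each review comment to normalize different word usages,
# (ii) reduce the high dimensionality with PCA,
# (iii) cluster the reduced vectors with K-Means,
# (iv) output an indication of clusters of similar comments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

comments = [
    "rename this variable for clarity",
    "variable name is unclear, please rename",
    "missing null check before dereference",
    "add a check for null pointer here",
]

# (i) stand-in embeddings (a learned embedding model could be used instead)
embeddings = TfidfVectorizer().fit_transform(comments).toarray()

# (ii) principal component analysis to reduce dimensionality for clustering
reduced = PCA(n_components=2).fit_transform(embeddings)

# (iii) K-Means clustering of the normalized comments
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# (iv) indication of clusters of similar comments, as input to the
# determination of the second predictive model
clusters = {int(c): [comments[i] for i in range(len(comments)) if labels[i] == c]
            for c in set(labels)}
print(clusters)
```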

The embodiments herein in the third node 113 may be implemented through one or more processors, such as a processing circuitry 1201 in the third node 113 depicted in Figure 12, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the third node 113. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the third node 113.

The third node 113 may further comprise a memory 1202 comprising one or more memory units. The memory 1202 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the third node 113.

In some embodiments, the third node 113 may receive information from, e.g., any of the first node 111, the second node 112, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a receiving port 1203. In some embodiments, the receiving port 1203 may be, for example, connected to one or more antennas in the third node 113. In other embodiments, the third node 113 may receive information from another structure in the computer system 100 through the receiving port 1203. Since the receiving port 1203 may be in communication with the processing circuitry 1201, the receiving port 1203 may then send the received information to the processing circuitry 1201. The receiving port 1203 may also be configured to receive other information.

The processing circuitry 1201 in the third node 113 may be further configured to transmit or send information to, e.g., any of the first node 111, the second node 112, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100, through a sending port 1204, which may be in communication with the processing circuitry 1201 and the memory 1202.

Those skilled in the art will also appreciate that the units comprised within the third node 113 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 1201, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Also, in some embodiments, the different units comprised within the third node 113 described above as being configured to perform different actions described above may be implemented as one or more applications running on one or more processors such as the processing circuitry 1201.

Thus, the methods according to the embodiments described herein for the third node 113 may be respectively implemented by means of a computer program 1205 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 1201, cause the at least one processing circuitry 1201 to carry out the actions described herein, as performed by the third node 113. The computer program 1205 product may be stored on a computer-readable storage medium 1206. The computer-readable storage medium 1206, having stored thereon the computer program 1205, may comprise instructions which, when executed on at least one processing circuitry 1201, cause the at least one processing circuitry 1201 to carry out the actions described herein, as performed by the third node 113. In some embodiments, the computer-readable storage medium 1206 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1205 product may be stored on a carrier containing the computer program 1205 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1206, as described above.

The third node 113 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the third node 113 and other nodes or devices, e.g., any of the first node 111, the second node 112, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the third node 113 may comprise a radio circuitry 1207, which may comprise e.g., the receiving port 1203 and the sending port 1204. The radio circuitry 1207 may be configured to set up and maintain at least a wireless connection with any of the first node 111, the second node 112, the fourth node 114, the fifth node, the another node and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the third node 113 operative to operate in the computer system 100. The third node 113 may comprise the processing circuitry 1201 and the memory 1202, said memory 1202 containing instructions executable by said processing circuitry 1201, whereby the third node 113 is further operative to perform the actions described herein in relation to the third node 113, e.g., in Figure 4, and/or Figures 8-9.

Figure 13 depicts an example of the arrangement that the fourth node 114 may comprise to perform the method described in Figure 5, and/or Figure 9 in some embodiments. The fourth node 114 may be configured to operate in the computer system 100. The fourth node 114 may be understood to be for handling source code.

Several embodiments are comprised herein. It should be noted that the examples herein are not mutually exclusive. One or more embodiments may be combined, where applicable. All possible combinations are not described to simplify the description. Components from one embodiment may be tacitly assumed to be present in another embodiment and it will be obvious to a person skilled in the art how those components may be used in the other exemplary embodiments. In Figure 13, an optional unit is indicated with a dashed box.

The detailed description of some of the following corresponds to the same references provided above, in relation to the actions described for the fourth node 114 and will thus not be repeated here. For example, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method.

The fourth node 114 is configured to, e.g., by means of an obtaining unit within the fourth node 114, obtain, from the third node 113 configured to operate in the computer system 100, the fifth indication. The fifth indication is configured to indicate the second predictive model to predict which comment is predicted to be elicited, respectively, by the one or more third units of source code.

The fourth node 114 is further configured to, e.g., by means of the obtaining unit within the fourth node 114, obtain, from the second node 112 configured to operate in the computer system 100, the fourth indication. The fourth indication is configured to indicate which one or more third units, of the one or more second units of source code, are predicted to elicit a respective comment by a reviewer.

The fourth node 114 is further configured to, e.g., by means of a determining unit within the fourth node 114, determine, using the second predictive model, which respective comment is to be elicited by each third unit of the one or more third units of source code predicted to elicit a respective comment by the reviewer.

The fourth node 114 is additionally configured to, e.g., by means of an outputting unit within the fourth node 114, output the sixth indication. The sixth indication is configured to indicate which comment is predicted to be respectively elicited by each third unit of the one or more third units.

In some embodiments, at least one of the following options may apply. According to a first option, each of the units of source code may be configured to be one of: a line, a statement, an entire block of code, a class, a class member, a class field and a method. According to a second option, the sixth indication may be configured to indicate which comment may be predicted to be respectively elicited by each third unit of the one or more third units. According to a third option, the comment predicted to be respectively elicited by each third unit may be configured to be in natural language. According to a fourth option, the second machine-learning method may be configured to be a k-nearest neighbours method.

The embodiments herein in the fourth node 114 may be implemented through one or more processors, such as a processing circuitry 1301 in the fourth node 114 depicted in Figure 13, together with computer program code for performing the functions and actions of the embodiments herein. A processor, as used herein, may be understood to be a hardware component. The program code mentioned above may also be provided as a computer program product, for instance in the form of a data carrier carrying computer program code for performing the embodiments herein when being loaded into the fourth node 114. One such carrier may be in the form of a CD ROM disc. It is however feasible with other data carriers such as a memory stick. The computer program code may furthermore be provided as pure program code on a server and downloaded to the fourth node 114.

The fourth node 114 may further comprise a memory 1302 comprising one or more memory units. The memory 1302 is arranged to be used to store obtained information, store data, configurations, schedulings, and applications etc. to perform the methods herein when being executed in the fourth node 114.

In some embodiments, the fourth node 114 may receive information from, e.g., any of the first node 111, the second node 112, the third node 113, the fifth node, the another node and/or another structure in the computer system 100, through a receiving port 1303. In some embodiments, the receiving port 1303 may be, for example, connected to one or more antennas in the fourth node 114. In other embodiments, the fourth node 114 may receive information from another structure in the computer system 100 through the receiving port 1303. Since the receiving port 1303 may be in communication with the processing circuitry 1301, the receiving port 1303 may then send the received information to the processing circuitry 1301. The receiving port 1303 may also be configured to receive other information.

The processing circuitry 1301 in the fourth node 114 may be further configured to transmit or send information to, e.g., any of the first node 111, the second node 112, the third node 113, the fifth node, the another node and/or another structure in the computer system 100, through a sending port 1304, which may be in communication with the processing circuitry 1301 and the memory 1302.

Those skilled in the art will also appreciate that the units comprised within the fourth node 114 described above as being configured to perform different actions, may refer to a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g., stored in memory, that, when executed by the one or more processors such as the processing circuitry 1301, perform as described above. One or more of these processors, as well as the other digital hardware, may be included in a single Application-Specific Integrated Circuit (ASIC), or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a System-on-a-Chip (SoC).

Also, in some embodiments, the different units comprised within the fourth node 114 described above as being configured to perform different actions described above may be implemented as one or more applications running on one or more processors such as the processing circuitry 1301.

Thus, the methods according to the embodiments described herein for the fourth node 114 may be respectively implemented by means of a computer program 1305 product, comprising instructions, i.e., software code portions, which, when executed on at least one processing circuitry 1301, cause the at least one processing circuitry 1301 to carry out the actions described herein, as performed by the fourth node 114. The computer program 1305 product may be stored on a computer-readable storage medium 1306. The computer-readable storage medium 1306, having stored thereon the computer program 1305, may comprise instructions which, when executed on at least one processing circuitry 1301, cause the at least one processing circuitry 1301 to carry out the actions described herein, as performed by the fourth node 114. In some embodiments, the computer-readable storage medium 1306 may be a non-transitory computer-readable storage medium, such as a CD ROM disc, or a memory stick. In other embodiments, the computer program 1305 product may be stored on a carrier containing the computer program 1305 just described, wherein the carrier is one of an electronic signal, optical signal, radio signal, or the computer-readable storage medium 1306, as described above.

The fourth node 114 may comprise a communication interface configured to facilitate, or an interface unit to facilitate, communications between the fourth node 114 and other nodes or devices, e.g., any of the first node 111, the second node 112, the third node 113, the fifth node, the another node and/or another structure in the computer system 100. The interface may, for example, include a transceiver configured to transmit and receive radio signals over an air interface in accordance with a suitable standard.

In other embodiments, the fourth node 114 may comprise a radio circuitry 1307, which may comprise e.g., the receiving port 1303 and the sending port 1304.

The radio circuitry 1307 may be configured to set up and maintain at least a wireless connection with any of the first node 111, the second node 112, the third node 113, the fifth node, the another node and/or another structure in the computer system 100. Circuitry may be understood herein as a hardware component.

Hence, embodiments herein also relate to the fourth node 114 operative to operate in the computer system 100. The fourth node 114 may comprise the processing circuitry 1301 and the memory 1302, said memory 1302 containing instructions executable by said processing circuitry 1301, whereby the fourth node 114 is further operative to perform the actions described herein in relation to the fourth node 114, e.g., in Figure 5, and/or Figure 9.

When using the word “comprise” or “comprising”, it shall be interpreted as non-limiting, i.e., meaning “consist at least of”.

The embodiments herein are not limited to the above described preferred embodiments. Various alternatives, modifications and equivalents may be used. Therefore, the above embodiments should not be taken as limiting the scope of the invention.

Generally, all terms used herein are to be interpreted according to their ordinary meaning in the relevant technical field, unless a different meaning is clearly given and/or is implied from the context in which it is used. All references to a/an/the element, apparatus, component, means, step, etc. are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any methods disclosed herein do not have to be performed in the exact order disclosed, unless a step is explicitly described as following or preceding another step and/or where it is implicit that a step must follow or precede another step. Any feature of any of the embodiments disclosed herein may be applied to any other embodiment, wherever appropriate. Likewise, any advantage of any of the embodiments may apply to any other embodiments, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following description. As used herein, the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “and” term, may be understood to mean that only one of the list of alternatives may apply, more than one of the list of alternatives may apply or all of the list of alternatives may apply. This expression may be understood to be equivalent to the expression “at least one of:” followed by a list of alternatives separated by commas, and wherein the last alternative is preceded by the “or” term.

Any of the terms processor and circuitry may be understood herein as a hardware component.

As used herein, the expression “in some embodiments” has been used to indicate that the features of the embodiment described may be combined with any other embodiment or example disclosed herein.

As used herein, the expression “in some examples” has been used to indicate that the features of the example described may be combined with any other embodiment or example disclosed herein.

References List

1. Towards Automating Code Review Activities (2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)). DOI: 10.1109/ICSE43902.2021.00027

2. US10877869B1, Method and System for Implementing a Code Review Tool, Dec. 2020