Title:
COMPUTER-ASSISTED METHOD AND SYSTEM FOR SORTING OBJECTS
Document Type and Number:
WIPO Patent Application WO/2023/143706
Kind Code:
A1
Abstract:
Aspects concern a computer-assisted method and system for sorting objects. The computer-assisted method/system makes use of ML/AI algorithms for the identification of regions of interest from the image associated with the object to be sorted and a part of the sorter used to pick an object for sorting, and then determines based on the identified regions of interest whether the object is picked up for sorting; and thereafter determines if the object picked up for sorting is an object or object type assigned to the sorter.

Inventors:
YAN WAI (SG)
JEON JIN HAN (SG)
NGO CHI TRUNG (SG)
ANDALAM SIDHARTA (SG)
MOOKHERJEE DEBOSMIT (SG)
Application Number:
PCT/EP2022/051681
Publication Date:
August 03, 2023
Filing Date:
January 26, 2022
Assignee:
BOSCH GMBH ROBERT (DE)
International Classes:
G06V10/25; B07C7/00; G06V10/75; G06V10/764; G06V20/52; G06V40/10
Foreign References:
CN113441411A2021-09-28
US20190243343A12019-08-08
Other References:
KJELLSTRÖM HEDVIG ET AL: "Simultaneous Visual Recognition of Manipulation Actions and Manipulated Objects", 1 January 2008, ARXIV.ORG, PAGE(S) 336 - 349, XP047530116
ANONYMOUS: "Hands - mediapipe", 2 October 2020 (2020-10-02), XP055959991, Retrieved from the Internet [retrieved on 20220912]
LIU, W.; ANGUELOV, D.; ERHAN, D.; SZEGEDY, C.; REED, S.E.; FU, C.; BERG, A.: "SSD: Single Shot MultiBox Detector", ECCV, 2016
LUGARESI, C.; TANG, J.; NASH, H.; MCCLANAHAN, C.; UBOWEJA, E.; HAYS, M.; ZHANG, F.; CHANG, C.; YONG, M.G.; LEE, J.: "MediaPipe: A Framework for Building Perception Pipelines", ARXIV, ABS/1906.08172, 2019
BOCHKOVSKIY, A.; WANG, C.; LIAO, H.: "YOLOv4: Optimal Speed and Accuracy of Object Detection", ARXIV, ABS/2004.10934, 2020
HUANG, Z.; WANG, J.: "DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection", INF. SCI., vol. 522, 2020, pages 241-258
WOJKE, N.; BEWLEY, A.; PAULUS, D.: "Simple online and realtime tracking with a deep association metric", 2017 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, pages 3645-3649, XP033323255, DOI: 10.1109/ICIP.2017.8296962
DANG, T.L.; NGUYEN, G.; CAO, T.: "Object Tracking Using Improved Deep SORT YOLOv3 Architecture", 2020
Claims:
CLAIMS

1. A computer-assisted method for sorting objects comprising the steps of: a. receiving an image of a sorting area, the image comprising an object to be sorted and a part of a sorter associated with picking the object; b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the part of the sorter; c. determining based on the first region of interest and the second region of interest, whether the object is picked up for sorting; wherein if the object is determined to be picked up for sorting; d. determining if the object picked up for sorting is an object assigned to the sorter.

2. The method of claim 1, wherein the part of the sorter associated with picking the object is a hand of the sorter and the step of determining whether the object is picked up for sorting includes the step of determining whether the hand of the sorter is in an open position or a closed position.

3. The method of claim 2, wherein the step of determining whether the object is picked up includes the step of determining whether a unique identifier associated with the object is present.

4. The method of claims 2 and 3, wherein if the part of the sorter associated with picking the object is determined to be in the closed position and the bounding box and tracking identifier is determined to be not present when compared with an associated earlier image previously present, the object is determined to be picked up.

5. The method of any one of claims 2 to 4, wherein at least one of: (i) the step of determining whether the hand of the sorter is in an open position or a closed position, and (ii) the step of determining if the object picked up for sorting is an assigned object includes a deep learning step, the deep learning step based on an artificial intelligence algorithm.

6. The method of claim 5, wherein the step of determining whether the hand of the sorter is in the open position or the closed position includes the steps of: (i) detecting a part of the hand using an object recognition technique; (ii) defining a region of interest (ROI) around the part of the hand; (iii) generating a landmark model of the human hand which returns 3-dimensional (3D) hand key points on the ROI of the hand; and (iv) analyzing the landmark model to determine if the human hand is in the open position or the close position.

7. The method of claim 5 or 6, wherein the step of determining if the object picked up for sorting is an assigned object includes the steps of: (i) receiving a plurality of images associated with the plurality of objects over a predetermined period; (ii) comparing the relevant features of an object in a first image with the features of the object in successive images; and (iii) determining if an object is present or absent.

8. The method of any one of the preceding claims, further comprising the step of alerting the sorter via a wearable device if the object picked up for sorting is not assigned to the sorter.

9. A computer-assisted sorting system comprising an image capturing module positioned or arranged to obtain an image of a sorting area, the image comprising an object to be sorted and a part of a sorter associated with picking the object; a processor arranged in data or signal communication with the image capturing module to receive the image and extract a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the part of the sorter; determine, based on the first region of interest and the second region of interest, whether the object is picked up for sorting, and if the object is determined to be picked up for sorting, determine if the object picked up for sorting is an object assigned to the sorter.

10. The system of claim 9, wherein the processor includes a gesture recognition module configured to determine whether the part of the sorter associated with picking the object is in an open position or a closed position.

11. The system of claim 10, wherein the processor includes an object detection/tracking module configured to determine whether a unique identifier associated with the object is present.

12. The system of claims 10 and 11, wherein if the part of the sorter associated with picking the object is determined to be in the closed position and the bounding box and tracking identifier is determined to be not present, the object is determined to be picked up.

13. The system of any one of claims 11 or 12, wherein at least one of the gesture recognition module and the object detection/tracking module comprises a deep learning module.

14. The system of claim 13, wherein the gesture recognition module is operable to (i) detect a part of the hand using an object recognition technique; (ii) define a region of interest (ROI) around the part of the hand; (iii) generate a landmark model of the human hand which returns 3-dimensional (3D) hand key points on the ROI of the hand; and (iv) analyze the landmark model to determine if the human hand is in the open position or the close position.

15. The system of claim 13 or 14, wherein the object detection/tracking module is operable to (i) receive a plurality of images associated with the plurality of objects over a predetermined period; (ii) compare the relevant features of an object in a first image with the features of the object in successive images; and (iii) determine if an object is present or absent.

16. The system of any one of claims 9 to 15, further comprises a wearable device configured to alert the sorter if the object picked up for sorting is not associated with the sorter.

17. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of a. receiving an image of a sorting area, the image comprising an object to be sorted and a part of a sorter associated with picking the object; b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the part of the sorter; c. determining based on the first region of interest and the second region of interest, whether the object is picked up for sorting; wherein if the object is determined to be picked up for sorting; d. determining if the object picked up for sorting is an object assigned to the sorter.

Description:
COMPUTER-ASSISTED METHOD AND SYSTEM FOR SORTING OBJECTS

TECHNICAL FIELD

[0001] The disclosure relates to a computer-assisted method and a computer-assisted system for sorting objects.

BACKGROUND

[0002] As global waste increases at an increasing rate, there is a growing need to recycle or reuse used objects such as plastic packaging or bottles to reduce carbon emissions. Used objects are typically sorted at collection points or at dedicated sorting facilities.

[0003] One solution for sorting used objects is based on manual labor in the form of human workers or sorters. The workers are trained to recognize different plastic types. At a sorting area that may include a conveyor belt, each sorter along the conveyor belt is assigned to pick one particular type of plastic waste and place it into a respective container or receptacle for sorting.

[0004] However, there are at least two drawbacks associated with using human workers as sorters. Firstly, for objects such as plastic packaging/bottles, human workers have to be trained to recognize various plastic types on a conveyor belt, which may incur additional training cost for sorting facilities. Secondly, human workers may not always be able to distinguish with the naked eye between different plastic types of similar shape and color, resulting in many unsorted or wrongly classified plastics being dumped into landfills or sent to recycling plants. The unsorted or wrongly classified objects may degrade the quality of the recycled product and/or reduce sorting efficiency and accuracy.

[0005] Another solution is fully automatic sorting. In such a process, the sorting and picking activities are performed automatically with the assistance of plastic identification technologies such as computer vision or Near Infrared (NIR) sensing and/or robotics.

[0006] Although automatic sorting with plastic sensing technologies and robotics can significantly improve sorting speed, efficiency and accuracy compared to manual sorting, it is relatively expensive for mid- or small-size sorting facilities, and the upfront investment to purchase the equipment may be prohibitive. Hence, the initial cost considerations may outweigh the long-run cost savings arising from reduced manual labor cost and improved productivity.

[0007] There exists a need to provide a solution to alleviate at least one of the aforementioned problems.

SUMMARY

[0008] A technical solution is provided in the form of a computer-assisted method and system for sorting objects. The solution comprises relatively affordable image capturing devices, such as cameras, video cameras or camcorders, to detect and track moving objects (such as plastic objects of at least two different plastic types) on a sorting area such as a conveyor belt and to monitor the sorters to determine if they pick the required plastic type correctly. Optionally, if a sorter picks a wrong object not assigned to them, one or more notifications will be sent to the sorter by way of an alert, such as a vibration from a wearable device worn by the sorter.

[0009] According to the present invention, a computer-assisted method for sorting objects as claimed in claim 1 is provided. A computer-assisted system for sorting objects according to the invention is defined in claim 9. A computer program comprising instructions to execute the computer-assisted method is defined in claim 17.

[0010] The dependent claims define some examples associated with the method and system, respectively.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:

- FIG. 1 is a flow chart of a computer-assisted method for sorting objects according to some embodiments;

- FIG. 2A and 2B show a system architecture of a system for sorting objects according to some embodiments;

- FIG. 3A is a flow chart of a method for recognizing hand gesture according to some embodiments; FIG. 3B to 3D illustrate outputs of a gesture recognition module in the form of a landmark model;

- FIG. 4A is a flow chart of a method for detecting and tracking an object according to some embodiments; FIG. 4B shows a lab setup for detecting and tracking one or more objects on a sorting area according to some embodiments; FIG. 4C and FIG. 4D show specific examples of deep learning framework associated with the detection and tracking of one or more objects;

- FIG. 5A to FIG. 5E illustrate the combined hand gesture recognition and object recognition to form a hybrid model for determining if an object is picked for sorting; and - FIG. 6 shows a schematic illustration of a processor 210 for sorting objects according to some embodiments.

DETAILED DESCRIPTION

[0012] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. Other embodiments may be utilized and structural, logical changes may be made without departing from the scope of the disclosure. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

[0013] Embodiments described in the context of one of the systems or methods are analogously valid for the other systems or methods.

[0014] Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.

[0015] In the context of some embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.

[0016] As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

[0017] As used herein, the term “object” includes any object, particularly a recyclable or reusable object that may be sorted according to type. For example, plastic objects may be sorted according to whether they are High-Density Polyethylene (HDPE), Polypropylene (PP), Polystyrene (PS), Low-Density Polyethylene (LDPE), Polyvinyl Chloride (PVC) or Polyethylene Terephthalate (PET) plastic objects. Such objects may include bottles, jars, containers, plates, bowls, etc. of various shapes, sizes and forms (for example, partially compressed or distorted).

[0018] As used herein, the terms “associate”, “associated” and “associating” indicate a defined relationship (or cross-reference) between at least two items. For instance, a part of the sorter (e.g. a hand) used to pick an object for sorting may be the part of the sorter associated with picking an object. A captured image associated with an object may include a defined region of interest which focuses on the object for further processing via object detection algorithm(s).

[0019] As used herein, the term “sort” broadly includes at least one of classification, categorization and arrangement.

[0020] As used herein, the term “sorter” includes a human tasked to sort an object according to a type assigned to the human. For example, a first human worker may be assigned to sort HDPE plastic bottles and a second human worker may be assigned to sort PET plastic bottles. Correspondingly, a part of a sorter associated with picking the object may include a hand or a foot of a human. It is appreciable that the part of a sorter associated with picking the object may also include a mechanical device such as a mechanical claw or a synthetic hand to assist a sorter.

[0021] As used herein, the term “module” refers to, forms part of, or includes an Application Specific Integrated Circuit (ASIC); an electronic circuit; a combinational logic circuit; a field programmable gate array (FPGA); a processor (shared, dedicated, or group) that executes code; other suitable hardware components that provide the described functionality; or a combination of some or all of the above, such as in a system-on-chip. The term “module” may also include memory (shared, dedicated, or group) that stores code executed by the processor.

[0022] FIG. 1 shows a flow chart of a computer-assisted method for sorting objects 100 according to an embodiment. The method 100 comprises the steps of: a. receiving an image of a sorting area (step S102), the image comprising an object to be sorted and a part of a sorter associated with picking the object; b. identifying a first region of interest from the image associated with the object to be sorted and a second region of interest from the image associated with the part of the sorter (step S104); c. determining based on the first region of interest and the second region of interest, whether the object is picked up for sorting (step S106); wherein if the object is determined to be picked up for sorting; d. determining if the object picked up for sorting is an object assigned to the sorter (step S108).

[0023] The step of receiving the image of the sorting area (step S102) may include receiving a video data file comprising multiple images of the sorting area over a predetermined time. The step of identifying regions of interest (step S104) may include the use of computer image processing techniques and algorithms. The steps of determination (step S106 and/or step S108) may include analysis and predictive computer algorithms such as machine learning (ML) and artificial intelligence (AI) algorithms.

[0024] Steps S102 to S108 are shown in a specific order. However, it is contemplated that other arrangements are possible. Steps may also be combined in some cases. For example, steps S106 and S108 may be combined.
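
For orientation, a minimal sketch of how steps S102 to S108 could be orchestrated in software is given below. The helper function and the dictionary fields are hypothetical placeholders, not part of the disclosure; a real system would back them with the ML/AI modules described later.

```python
from typing import Optional, Tuple

def identify_regions(image) -> Tuple[dict, dict]:
    # Hypothetical stand-in for the ROI identification of step S104; a real
    # system would call the gesture recognition and object detection modules.
    return {"class": "HDPE", "tracking_id_present": False}, {"state": "closed"}

def sort_check(image, assigned_class: str) -> Optional[bool]:
    """Steps S102-S108: returns None if no object is picked up, otherwise
    True/False depending on whether the picked object matches the assignment."""
    object_roi, hand_roi = identify_regions(image)                       # S104
    picked = (hand_roi["state"] == "closed"
              and not object_roi["tracking_id_present"])                 # S106
    if not picked:
        return None
    return object_roi["class"] == assigned_class                         # S108

print(sort_check(None, "HDPE"))  # True with the stub values above
```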

[0025] In some embodiments, the method 100 may be implemented as at least part of a sorting system 200 for sorting plastic objects. FIG. 2A illustrates such a sorting system 200 having a conveyor belt 202. Objects to be sorted 204 are placed on the conveyor belt 202 and these objects are denoted as A, B, and C. FIG. 2A shows two sorters in the form of human workers S1 and S2. The human worker S1 may be assigned to pick and sort an object A, and the human worker S2 is assigned to pick and sort an object B. The object A may belong to one class of plastic (e.g. HDPE) and the object B may belong to another class of plastic (e.g. PET).

[0026] Image capturing devices 206 are positioned at suitable locations in the vicinity of the sorting area to capture images associated with each human worker S1, S2 and the objects 204. One or more image capturing devices 206 may be in the form of an RGB camera or video-recorder (to capture sequences of images known as frames). In the embodiment shown in FIG. 2A, one image capturing device 206a is assigned to a sorting area 208a associated with sorter S1 and another image capturing device 206b is assigned to another sorting area 208b associated with sorter S2. Although the illustrated embodiment shows one image capturing device 206 associated with each of the sorters S1 and S2, it is appreciable that one image capturing device 206 may be assigned to two or more sorters, or two or more image capturing devices 206 may be assigned to one sorter.

[0027] FIG. 2B shows a high-level system architecture diagram of the sorting system 200 according to some embodiments. One or more images captured by each image capturing device 206 are sent to a processor 210 for processing and classification. The output from the processor 210 is an indication/value associated with whether the object A, B or C is correctly picked up by the assigned sorter S1, S2, etc. In some embodiments, the output of the processor 210 may be in the form of a binary ‘1’ denoting the correct object picked up by the assigned sorter or ‘0’ denoting a wrong object picked up by a sorter. Alternatively, the output of the processor 210 may be in the form of a binary ‘0’ denoting the correct object picked up by the assigned sorter or ‘1’ denoting a wrong object picked up by a sorter.

[0028] In some embodiments, the processor 210 may comprise a gesture recognition module 212, an object detection/tracking module 214, and a classification module 216. The gesture recognition module 212 and the object detection/tracking module 214 may include respective input interfaces configured to receive and parse image data. Output from the gesture recognition module 212 may be in the form of whether the part of a sorter associated with picking the object (e.g. a human hand) is in an open state or a closed state. Output from the object detection/tracking module 214 may be in the form of whether one or more unique identifiers associated with each of the objects A, B or C is present or absent. It is contemplated that the processor 210 may be a single server or may be a computer network distributed across different computing resources/platforms.

[0029] The classification module 216 may be operable to receive the inputs from the gesture recognition module 212 and the object detection/tracking module 214, so as to provide a final output indicating whether the object A, B or C is correctly picked up by the assigned sorter S1, S2. In some embodiments, one dedicated processor 210 is assigned to each sorter. The dedicated processor 210 may be pre-configured or pre-programmed to identify one type of object for further purpose. For example, a dedicated processor 210 is assigned to sorter S1 to pick or retrieve the object type associated with object A from the conveyor belt 202. Another dedicated processor 210 is assigned to sorter S2 to pick or retrieve the object type associated with object B from the conveyor belt 202. It is contemplated that other configurations of the processor 210 are possible using different computing resources/platforms, such that one computing resource is dedicated to sorter S1 and another computing resource is dedicated to sorter S2.

[0030] The image capturing devices 206 may be operable to capture images continuously in the form of a video file comprising multiple image frames, or at pre-determined intervals. Data from the image capturing devices 206 may then be extracted and sent to the gesture recognition module 212 and the object detection/tracking module 214 for processing.

[0031] In some embodiments, processing by the processor 210 may be performed in real time.

[0032] FIG. 3A to FIG. 3D show some embodiments for implementing the gesture recognition module 212, based on the assumption that the part associated with a sorter for picking an object for sorting is a human hand. The gesture recognition module 212 may include hardware and/or software components to implement a machine learning (ML) and/or an artificial intelligence (AI) algorithm to perform a method for recognizing or predicting hand gesture 300, comprising the steps of: (i) detecting the hand (or part thereof) using an object recognition technique (step S302); (ii) defining a region of interest (ROI) around the hand (step S304); (iii) generating a landmark model of the hand which returns 3-dimensional (3D) hand key points on the ROI of the hand (step S306); and (iv) analyzing the landmark model to determine if the hand is in an open position or a closed position (step S308). In some embodiments, the part of the hand may include a palm of the hand and/or a back portion of the hand. It is appreciable that the gesture recognition module 212 is able to perform hand gesture recognition even if the hand (or part thereof) of the sorter is covered by a glove.

[0033] An example of the ML/AI algorithm that may be suitable is a known framework [1] for building multimodal applied machine learning pipelines. The framework may be used for building multimodal (e.g. video, audio, any time series data), cross platform (i.e. Android, iOS, web, edge devices) applied ML pipelines.
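
As an illustration, the detection and landmark steps (S302 to S306) can be reproduced with the MediaPipe Hands solution cited as reference [1]. The sketch below assumes the mediapipe, opencv-python and numpy packages and uses a synthetic black frame as a stand-in for a camera image; these assumptions are not part of the disclosure.

```python
import cv2
import mediapipe as mp
import numpy as np

# Sketch of steps S302-S306 with MediaPipe Hands [1], [3]: palm detection,
# ROI definition and 21-point landmark regression are handled internally
# by the pipeline. The synthetic frame stands in for a camera image.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

with mp.solutions.hands.Hands(static_image_mode=True,
                              max_num_hands=2,
                              min_detection_confidence=0.5) as hands:
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # 21 landmarks per hand, with normalised x, y and a relative depth z.
            for idx, lm in enumerate(hand_landmarks.landmark):
                print(idx, lm.x, lm.y, lm.z)
```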

[0034] The step of detecting the hand (step S302) may include training the ML/AI algorithm using a suitable feature identification/detection model, such as the single-shot detector (SSD) model [2]. A feature detector may be trained to recognize/identify the hand. The step of defining a ROI (step S304) may include modelling a feature detector using a bounding box (anchors), which defines a region of interest on the captured or obtained image from the image capturing device 206. In some embodiments, an encoder-decoder feature extractor is used, and the focal loss is minimized during training in order to support a large number of anchors. The model is able to achieve an average precision of 95.7 %.

[0035] Once the region of interest (ROI) is defined around the hand via the bounding box formed around the hand, the step of generating a landmark model of the hand (step S306) may include the steps of generating 3-dimensional (3D) hand key points on the ROI of the hand and performing localization of a predetermined number of key points, for example twenty-one three-dimensional key points inside the detected hand [3]. FIG. 3B shows the generated twenty-one landmarks in the form of dots numbered from 0 to 20. Each of the twenty-one key points may be assigned a unique identifier. Distances between two or more of the twenty-one key points may be associated or correlated with different states of the hand between an open position and a closed position (see also FIG. 5A and 5B).

[0036] The step of analyzing the landmark model (step S308) may include training the ML/AI algorithm to determine or predict if the human hand is in an open position or a closed position. In some embodiments, the training dataset comprises approximately 30,000 images which are each annotated with the twenty-one three-dimensional key points. In order for the ML/AI algorithm to be able to consider different lighting or background conditions so as to generalize the features, synthetic hand models on several different backgrounds are rendered to increase the variety of training images. The ML/AI algorithm may also include a step of performing regression (direct coordinate prediction) in order to predict the localization of the three-dimensional key points. The trained ML/AI algorithm is shown to be able to predict the three-dimensional key points even when the hands are only partially visible to the image capturing device.

[0037] FIG. 3C and FIG. 3D show an embodiment illustrating an analysis of the hand tracking model. Once the twenty-one key points (see small dots 310 on hand model) are output by the ML/AI algorithm, the position (via mapped x-y coordinates) of each of the twenty-one key points is determined and used to find the distance between selected key points. The hypotenuse between the required x and y coordinates is then used to calculate the distance between the coordinates (see big dots 312). FIG. 3C shows the analysis based on a single finger of the human hand, and FIG. 3D shows the analysis based on multiple fingers of the human hand.
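
A minimal sketch of the distance computation described in paragraph [0037] (the hypotenuse of the x and y differences between two key points) might look as follows; the pixel coordinates are illustrative values, not data from the disclosure.

```python
import math

def keypoint_distance(p1, p2):
    """Euclidean distance (hypotenuse of the x and y differences) between
    two hand key points given as (x, y) pixel coordinates."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

# Illustrative example: distance between the index fingertip (landmark 8)
# and the wrist (landmark 0) using made-up pixel coordinates.
fingertip = (412, 180)
wrist = (385, 330)
print(keypoint_distance(fingertip, wrist))  # ~152.4 pixels
```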

[0038] FIG. 4A shows an embodiment for implementing the object detection/tracking module 214, based on the assumption that the objects to be sorted comprise a plurality of plastic objects of different shapes and sizes belonging to at least two classes/types of plastics, for instance HDPE and PET. The object detection/tracking module 214 may include hardware and/or software components to implement a machine learning (ML) and/or an artificial intelligence (AI) algorithm to perform a method for detecting and tracking an object 400 comprising the steps of: receiving, over a predetermined period, a plurality of images associated with the plurality of objects (step S402); comparing the relevant features of an object in a first image with the features of the object in successive images, for example a second, third, fourth image, etc. (step S404); and determining if an object is present or absent (step S406).

[0039] The step of receiving a plurality of images associated with the plurality of objects over a predetermined period (step S402) may be implemented via the image capturing device 206 continuously capturing images of the sorting area, for example the conveyor belt 202. The captured images may be sent in real time to the object detection/tracking module 214 for parsing of image data and further processing. Alternatively, any captured images may be stored in a buffer (within the image capturing device 206 or externally) and sent at predetermined intervals to the object detection/tracking module 214.

[0040] The step of comparing the relevant features of an object in a first image with the features of the object in successive images (step S404) may include the step of identifying each object on the conveyor belt 202 and the step of assigning a unique identifier to each of the plurality of objects. For each plastic object detected or identified, a feature detector such as a bounding box may be placed around the identified object. In addition, the class that the object belongs to (type of plastic) will be recorded, and a unique tracking identifier assigned to the object.

[0041] The step of determining if an object is present or absent (step S406) may include the step of tracking whether the tracking identifier and/or the bounding box of the object is present or has disappeared over the successive images. The time and/or image (frame) at which the assigned tracking identifier disappears may be stored in a database.
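
A simplified sketch of this bookkeeping, assuming each frame yields a set of currently visible tracking identifiers, could look like the following; a plain dictionary stands in for the database mentioned above.

```python
# Simplified bookkeeping for step S406: record the frame index at which a
# tracking identifier was last seen and report identifiers that have
# disappeared in the current frame.

last_seen = {}   # tracking id -> last frame index in which it was detected

def update_presence(frame_index, visible_ids):
    """Update last-seen records and return ids absent in this frame."""
    for track_id in visible_ids:
        last_seen[track_id] = frame_index
    return [tid for tid, seen in last_seen.items() if seen < frame_index]

# Illustrative frame sequence: id 7 disappears at frame 2.
print(update_presence(0, {3, 7}))   # []
print(update_presence(1, {3, 7}))   # []
print(update_presence(2, {3}))      # [7]
```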

[0042] It is contemplated that the object detection/tracking module 214 may be tested and/or trained before actual operation. FIG. 4B shows a lab setup comprising HDPE and PET plastic objects for training of the ML/AI algorithms of the module 214. The objects 204 include bottles, jars, containers, etc. of various shapes and sizes. The plastic objects were placed on a conveyor belt 202 and moved along the conveyor belt at a predetermined speed, such as, but not limited to, 0.45 meters per second (m/s). An image capturing device in the form of an RGB camera was placed at a suitable location, e.g. above a portion of the conveyor belt 202, and the data of the plastics on the conveyor belt was continuously recorded using the camera as the conveyor belt 202 moved. The output from the camera is the multiple RGB frames of the objects on the belt as shown in FIG. 4B.

[0043] The collected data may be used to form a dataset, the dataset being organized into a train dataset and a test dataset. In some embodiments, the train dataset included 85% (435 images) of the total data and the test dataset included the remaining 15% (77 images). In some embodiments, augmentation may be applied to the train dataset so as to increase the variation of the train dataset. Such augmentation includes, but is not limited to, 90-degree rotations, 5% cropping, 2-pixel (px) blur, and horizontal and vertical flipping, so as to generalize the deep learning model. After augmentation, the train dataset may be increased to 1,200 images.
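
The augmentations listed above (90-degree rotations, small crops, slight blur, horizontal and vertical flips) can be reproduced with basic OpenCV operations. The sketch below is a simplified illustration only; a synthetic array stands in for a captured frame, and the corresponding bounding-box label transforms are omitted.

```python
import cv2
import numpy as np

def augment(image):
    """Return simple variants of a training image: 90-degree rotation,
    5% border crop, 2-pixel blur, and horizontal/vertical flips.
    (Bounding-box labels would need matching transforms, omitted here.)"""
    h, w = image.shape[:2]
    dy, dx = int(0.05 * h), int(0.05 * w)
    return {
        "rot90": cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE),
        "crop5": image[dy:h - dy, dx:w - dx],
        "blur":  cv2.blur(image, (2, 2)),   # 2 px box blur
        "hflip": cv2.flip(image, 1),
        "vflip": cv2.flip(image, 0),
    }

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a captured RGB frame
variants = augment(frame)
print(sorted(variants))
```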

[0044] In some embodiments, a one-stage feature detector such as the YOLOv4 (You Only Look Once) model [4] may be used for data training. The backbone model used for YOLOv4 in such embodiments is a CSPDarknet53 model, which is in turn based upon the DenseNet classification model. Backbones are the final stages of image classification models which are used to make predictions; in object detection models, the backbone is used for feature formation. FIG. 4C shows the architecture of the object detection using a YOLOv4 algorithm. It is contemplated that more advanced versions of the YOLO architecture, for example YOLOv5, may also be used. The object detection model comprises a convolutional neural network based down-sampling block for compression and formation of a backbone, a dense connection block for connecting layers of the neural network, and a spatial pyramid pooling block operable to increase the receptive field and separate out the most important features from the backbone.
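
For orientation only, a trained YOLOv4 network of the kind referenced in [4] could be run on a captured frame with OpenCV's DNN module as sketched below. The configuration/weights file names, class list and thresholds are assumptions for illustration, not values from the disclosure.

```python
import cv2
import numpy as np

# Sketch of running a trained YOLOv4 detector [4] with OpenCV's DNN module.
# File names, class names and thresholds are illustrative assumptions.
net = cv2.dnn.readNetFromDarknet("yolov4-plastics.cfg", "yolov4-plastics.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

classes = ["HDPE", "PET"]                        # assumed class list
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame

class_ids, confidences, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for cid, conf, box in zip(class_ids, confidences, boxes):
    x, y, w, h = box                             # bounding box in pixel coordinates
    print(classes[int(cid)], float(conf), (x, y, w, h))
```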

[0045] The YOLOv4 object detection model may be trained with a predetermined number of epochs and a predetermined batch size, for example 2000 epochs (passes) with a batch (sample) size of 64. The training process was monitored by conducting an evaluation on the test dataset images after every 100 epochs. The metric used to calculate the accuracy of the predictions was MAP (Mean Average Precision), which calculates the percentage of correct predictions by the YOLOv4 model by comparing the number of correct predictions to the actual number of ground truths [6]. The MAP metric was calculated over an IOU (Intersection Over Union) of 0.5. The IOU is a metric which calculates the overlapping area between the predicted bounding box and the ground truth bounding box [6]. After training, the model achieved a MAP of 94.8%.
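
The IOU metric referenced above can be computed directly from two boxes. A minimal sketch follows, with boxes given as (x, y, width, height) and made-up values for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix, iy = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, ix2 - ix) * max(0, iy2 - iy)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Illustrative values: a prediction counts as correct at IOU >= 0.5.
print(iou((100, 100, 50, 80), (110, 90, 50, 80)))  # ~0.54
```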

[0046] After the object detection model is trained, a real-time tracking algorithm with a deep association metric, known as the DeepSORT tracking algorithm [7], is used to track the detected objects throughout the conveyor belt. The DeepSORT algorithm uses feature matching, where it checks the relevant features of an object in the first frame and compares them with the features of the objects in the successive frames. The algorithm tracks each of the plurality of objects 204 when the features match within a threshold in the following continuous (successive) frames. When tracking, the algorithm generates a unique tracking identifier (also referred to as object ID) for each object. An example of the architecture of object ID assignment and tracking using the combination of the YOLOv4 object detection model and the DeepSORT algorithm is shown in FIG. 4D [8].
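
The feature-matching idea behind this association step can be illustrated in simplified form: each detection carries an appearance feature vector, and a detection keeps an existing tracking identifier when the cosine distance to that track's stored feature falls within a threshold, otherwise a new identifier is issued. The sketch below is a deliberately reduced illustration with made-up feature vectors, not the actual DeepSORT implementation [7].

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def associate(tracks, detections, threshold=0.2):
    """tracks: dict of id -> feature vector; detections: list of feature vectors.
    Returns a list of tracking ids, adding new ids for unmatched detections."""
    next_id = max(tracks, default=0) + 1
    assigned = []
    for feat in detections:
        best_id, best_d = None, threshold
        for tid, tfeat in tracks.items():
            d = cosine_distance(feat, tfeat)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = feat                      # update stored appearance
        assigned.append(best_id)
    return assigned

# Illustrative run with made-up 3-dimensional "appearance" features.
tracks = {1: np.array([1.0, 0.0, 0.0])}
print(associate(tracks, [np.array([0.98, 0.05, 0.0]),
                         np.array([0.0, 1.0, 0.0])]))  # [1, 2]
```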

[0047] After the respective outputs are obtained from the gesture recognition module 212 and the object detection/tracking module 214, the outputs are combined by the classification module 216 to determine if a sorter picks the correct plastic type assigned to the sorter.

[0048] It is contemplated that other computer vision/image processing techniques known to a skilled person may be combined to form further embodiments that supplement and/or replace the ML/AI algorithms. For example, instead of a one-stage feature detector, a multi-stage feature detector may be envisaged.

[0049] FIG. 5A shows a state diagram associated with the prediction of whether the sorter’s hand is opened or closed, based on the output provided by the gesture recognition module 212. The distance between the key points is considered in order to predict if the hand is open (state 502) or closed (state 504). The hand is considered to be closed when the distance between all the 10 key points (five fingers) is below or equal to a pre-determined threshold set for each finger. The toggling between state 502 and state 504 is based on whether the distances between the key points exceed or fall below the respective thresholds, which may be re-calculated at every pre-determined interval.
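
A hedged sketch of this decision rule follows, assuming per-finger key-point distances and per-finger thresholds are already available from the landmark analysis; the concrete numbers are illustrative only.

```python
# Sketch of the open/closed decision of FIG. 5A: the hand is treated as
# closed (state 504) only when every per-finger key-point distance is at or
# below its pre-determined threshold; otherwise it is open (state 502).
# Distances and thresholds below are illustrative values in pixels.

def hand_state(finger_distances, thresholds):
    closed = all(d <= t for d, t in zip(finger_distances, thresholds))
    return "closed" if closed else "open"

thresholds = [60, 70, 75, 70, 65]        # one assumed threshold per finger
print(hand_state([55, 62, 70, 66, 60], thresholds))  # closed
print(hand_state([90, 62, 70, 66, 60], thresholds))  # open
```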

[0050] FIG. 5B illustrates an opened position and a closed position of a human hand with key points on each finger.

[0051] FIG. 5C shows a state diagram associated with the prediction of whether an object is determined to be present or absent, based on the output provided by the object detection and tracking module 214. An object is determined to be present and in state 506 if the bounding box and tracking identifier are both present. An object is determined to be absent and in state 508 if the bounding box and/or tracking identifier disappears (that is, over the course of tracking, the identifier was present in some earlier image frames and then disappears in subsequent image frames).

[0052] The outputs from both state diagrams shown in FIG. 5A and 5C may be combined to determine if a sorter picks the correctly assigned plastic type. FIG. 5D shows the combined state diagram based on the output provided by the classification module 216. An object is determined not to be picked and in state 510, regardless of whether the hand is opened or closed (state 502 or 504), as long as the tracking identifier is present (state 506). An object is also determined not to be picked and in state 510 if the hand is opened (state 502) and the tracking identifier is missing (state 508). An object is determined to be picked and in state 512 if the hand is closed (state 504) and the tracking identifier is missing (state 508).
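
The combined decision of FIG. 5D reduces to a small classification rule over the two module outputs; a minimal sketch, with the assignment check of step S108 included for completeness and illustrative class labels, follows.

```python
# Minimal sketch of the combined state diagram of FIG. 5D plus the
# assignment check of step S108. Class labels are illustrative.

def classify_pick(hand_state, identifier_present, object_class, assigned_class):
    """Return 'not picked', 'picked (correct)' or 'picked (wrong object)'."""
    picked = hand_state == "closed" and not identifier_present   # state 512
    if not picked:
        return "not picked"                                      # state 510
    return ("picked (correct)" if object_class == assigned_class
            else "picked (wrong object)")

print(classify_pick("open",   True,  "PET",  "HDPE"))  # not picked
print(classify_pick("closed", False, "HDPE", "HDPE"))  # picked (correct)
print(classify_pick("closed", False, "PET",  "HDPE"))  # picked (wrong object)
```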

[0053] FIG. 5E illustrates the image or video capture of an object being picked by a sorter for sorting, corresponding to a transition from state 510 (left-side frame) to state 512 (right-side frame). The gesture recognition module 212 is shown to be capable of generating a landmark model of the back of the hand to determine if the hand is in an open or closed position.

[0054] It is contemplated that the object detection and tracking identifier is used to check if each of the objects disappears over the predetermined period as indicated by the image frames, and to output the class of the object. This is based on the assumption that during picking of an object, the hand covers part of or the entirety of the object, and therefore the object detection/tracking module 214 is no longer able to detect the object. In other words, the bounding box and the tracking identifier of the object disappear, and the data is used to predict the disappearing object and its class, as shown in FIG. 5E.

[0055] In some embodiments, the processor 210 may include hardware components such as server computer(s) arranged in a distributed or non-distributed configuration to implement characterization databases. The hardware components may be supplemented by a database management system configured to compile one or more industry-specific characteristic databases. In some embodiments, the industry-specific characteristic databases may include analysis modules to correlate one or more datasets with an industry. Such analysis modules may include an expert rule database, a fuzzy logic system, or any other artificial intelligence module.

[0056] In some embodiments, the method 100 further comprises a step of alerting the sorter via a wearable device if the object picked up for sorting is not assigned to the sorter. The wearable device may include a smart device arranged in data or signal communication with the processor 210. The alert may be in the form of a text notification, a sound, a vibration, a multimedia file, or combinations of the aforementioned. In some embodiments, the wearable device may be in the form of ‘smart glove(s)’ capable of human-machine interface, and of transmitting and receiving data relating to the relative position of the sorter’s fingers and the grasped object. In some embodiments, the wearable device may be in the form of a smart watch.

[0057] In some embodiments, after an object is picked by a sorter, the processor may comprise an additional module for auditing the sorted objects, and generation of one or more reports thereafter.

[0058] FIG. 6 shows a server computer system 600 according to an embodiment. The server computer system 600 includes a communication interface 602 (e.g. configured to receive captured images from the image capturing device(s) 206). The server computer system 600 further includes a processing unit 604 and a memory 606. The memory 606 may be used by the processing unit 604 to store, for example, data to be processed, such as data associated with the captured images, intermediate results output from the modules 212, 214, and/or final results output from the module 216. The server computer system 600 is configured to perform the method of FIG. 1, FIG. 3A, and/or FIG. 4A. It should be noted that the server computer system 600 can be a distributed system including a plurality of computers.

[0059] In some embodiments, the unique identifier assigned to each object includes information relating to the class of the object and the assigned sorter.

[0060] In some embodiments, a computer-readable medium is provided including program instructions, which, when executed by one or more processors, cause the one or more processors to perform the methods according to the embodiments described above. The computer-readable medium may include a non-transitory computer-readable medium.

[0061] In some embodiments, the ML/AI algorithms may be trained using supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, and/or deep learning methods. In some embodiments, the ML/AI algorithms may include algorithms such as neural networks, fuzzy logic, evolutionary algorithms, etc.

[0062] It should be noted that the server computer system 600 may be a distributed system including a plurality of computers.

[0063] While the disclosure has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

References

[1] MediaPipe Hands, URL: https://google.github.io/mediapipe/solutions/hands.html

[2] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., & Berg, A. (2016). SSD: Single Shot MultiBox Detector. ECCV.

[3] Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C., Yong, M.G., Lee, J., Chang, W., Hua, W., Georg, M., & Grundmann, M. (2019). MediaPipe: A Framework for Building Perception Pipelines. ArXiv, abs/1906.08172.

[4] Bochkovskiy, A., Wang, C., & Liao, H. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. ArXiv, abs/2004.10934.

[5] Huang, Z., & Wang, J. (2020). DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection. Inf. Sci., 522, 241-258.

[6] Yohanandan, S. (2020, June 9). Map (mean average precision) might confuse you! Medium. https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2

[7] Wojke, N., Bewley, A., & Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. 2017 IEEE International Conference on Image Processing (ICIP), 3645-3649.

[8] Dang, T.L., Nguyen, G., & Cao, T. (2020). Object Tracking Using Improved Deep SORT YOLOv3 Architecture.