LEE JUNTAE (US)
CHANG SIMYUNG (US)
ARCHIT PARNAMI ET AL: "Few-Shot Keyword Spotting With Prototypical Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 July 2020 (2020-07-25), XP091244533, DOI: 10.1145/3529399.3529443
ARCHIT PARNAMI ET AL: "Learning from Few Examples: A Summary of Approaches to Few-Shot Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 March 2022 (2022-03-07), XP091179121
CLAIMS

WHAT IS CLAIMED IS:

1. A processor-implemented method for processing one or more data samples, comprising: determining one or more prototype representations based on a plurality of support samples associated with one or more classes of data samples, wherein each prototype representation is associated with one of the one or more classes; determining a task-agnostic open-set prototype representation, wherein the one or more prototype representations and the task-agnostic open-set prototype representation are determined in a same learned metric space; determining one or more distance metrics for each query sample of one or more query samples, wherein the one or more distance metrics are determined based on the one or more prototype representations and the task-agnostic open-set prototype representation; and classifying each query sample based on the one or more distance metrics, wherein each query sample is classified into one of the one or more classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.

2. The processor-implemented method of claim 1, wherein determining the one or more distance metrics for each query sample further comprises: determining a Euclidean distance metric between a given query sample and each prototype representation of the one or more prototype representations associated with the one or more classes; and determining a Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation.

3. The processor-implemented method of claim 2, further comprising: scaling the Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation using one or more learned scaling factors.

4.
The processor-implemented method of claim 3, further comprising: scaling the Euclidean distance metric between the given query sample and each prototype representation of the one or more prototype representations using the one or more learned scaling factors.

5. The processor-implemented method of claim 3, wherein: the one or more learned scaling factors are determined as a first scalar value and a second scalar value; and the first scalar value and the second scalar value are learned based on a loss function that enforces the task-agnostic open-set prototype representation as a task-agnostic global-second-best classification for each query sample of the one or more query samples.

6. The processor-implemented method of claim 5, wherein: the one or more learned scaling factors are task-agnostic scaling factors; and the task-agnostic open-set prototype representation is a global-second-best classification for a plurality of few-shot open-set recognition (FSOSR) episodes performed over the data samples.

7. The processor-implemented method of claim 1, wherein classifying each query sample based on the one or more distance metrics comprises: determining a probability distribution over the one or more classes and the open-set class, wherein the probability distribution is determined based at least in part on a Euclidean distance metric determined between each query sample and a respective prototype associated with each class of the one or more classes and a Euclidean distance metric determined between each query sample and the task-agnostic open-set prototype representation; and classifying, based on the probability distribution, each query sample into one of the one or more classes or into the open-set class.

8. The processor-implemented method of claim 7, further comprising: performing open-set rejection (OSR) based on a set of classified query samples classified into the open-set class associated with the task-agnostic open-set prototype representation.

9.
The processor-implemented method of claim 7, wherein classifying each query sample based on the one or more distance metrics further comprises: determining a probability that each query sample is included in the open-set class associated with the task-agnostic open-set prototype representation, based on the probability distribution; and comparing the determined probability to a pre-determined threshold.

10. The processor-implemented method of claim 9, further comprising: classifying a given query sample as being included in the open-set class based on a determination that the probability the given query sample is included in the open-set class is greater than the pre-determined threshold; and classifying the given query sample as being included in a closed-set class based on a determination that the probability the given query sample is included in the open-set class is not greater than the pre-determined threshold.

11. The processor-implemented method of claim 10, further comprising: classifying each query sample classified as being included in the closed-set class into a respective class of the one or more classes, wherein each query sample is classified based on maximizing a respective probability determined between each query sample and each respective class of the one or more classes.

12. The processor-implemented method of claim 11, wherein the probability is an argmax probability.

13. The processor-implemented method of claim 1, wherein classifying each query sample comprises: providing each query sample to a trained few-shot open-set recognition (FSOSR) neural network classifier, wherein the trained FSOSR neural network classifier includes at least the task-agnostic open-set prototype representation and one or more distance scaling factors as learnable components.

14. The processor-implemented method of claim 13, wherein the trained FSOSR neural network classifier further includes one or more feature embedding networks as a learnable component.

15.
The processor-implemented method of claim 14, wherein determining the one or more distance metrics for each query sample further comprises: determining an embedding for each query sample of the one or more query samples using the one or more feature embedding networks; determining the one or more prototype representations as an average embedding of a set of embeddings determined for the plurality of support samples associated with each class of the one or more classes; and determining each distance metric of the one or more distance metrics based on determining a Euclidean distance metric between the embedding determined for a given query sample and the embedding determined for each prototype representation of the one or more prototype representations associated with the one or more classes.

16. An apparatus for processing one or more data samples, comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: determine one or more prototype representations based on a plurality of support samples associated with one or more classes of data samples, wherein each prototype representation is associated with one of the one or more classes; determine a task-agnostic open-set prototype representation, wherein the one or more prototype representations and the task-agnostic open-set prototype representation are determined in a same learned metric space; determine one or more distance metrics for each query sample of one or more query samples, wherein the one or more distance metrics are determined based on the one or more prototype representations and the task-agnostic open-set prototype representation; and classify each query sample based on the one or more distance metrics, wherein each query sample is classified into one of the one or more classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.

17.
The apparatus of claim 16, wherein, to determine the one or more distance metrics for each query sample, the at least one processor is further configured to: determine a Euclidean distance metric between a given query sample and each prototype representation of the one or more prototype representations associated with the one or more classes; and determine a Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation.

18. The apparatus of claim 17, wherein the at least one processor is further configured to: scale the Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation using one or more learned scaling factors.

19. The apparatus of claim 18, wherein the at least one processor is further configured to: scale the Euclidean distance metric between the given query sample and each prototype representation of the one or more prototype representations using the one or more learned scaling factors.

20. The apparatus of claim 19, wherein: the one or more learned scaling factors are determined as a first scalar value and a second scalar value; and the first scalar value and the second scalar value are learned based on a loss function that enforces the task-agnostic open-set prototype representation as a task-agnostic global-second-best classification for each query sample of the one or more query samples.

21. The apparatus of claim 20, wherein: the one or more learned scaling factors are task-agnostic scaling factors; and the task-agnostic open-set prototype representation is a global-second-best classification for a plurality of few-shot open-set recognition (FSOSR) episodes performed over the data samples.

22.
The apparatus of claim 16, wherein, to classify each query sample based on the one or more distance metrics, the at least one processor is configured to: determine a probability distribution over the one or more classes and the open-set class, wherein the probability distribution is determined based at least in part on a Euclidean distance metric determined between each query sample and a respective prototype associated with each class of the one or more classes and a Euclidean distance metric determined between each query sample and the task-agnostic open-set prototype representation; and classify, based on the probability distribution, each query sample into one of the one or more classes or into the open-set class.

23. The apparatus of claim 22, wherein the at least one processor is further configured to: perform open-set rejection (OSR) based on a set of classified query samples classified into the open-set class associated with the task-agnostic open-set prototype representation.

24. The apparatus of claim 22, wherein, to classify each query sample based on the one or more distance metrics, the at least one processor is further configured to: determine a probability that each query sample is included in the open-set class associated with the task-agnostic open-set prototype representation, based on the probability distribution; and compare the determined probability to a pre-determined threshold.

25. The apparatus of claim 24, wherein the at least one processor is further configured to: classify a given query sample as being included in the open-set class based on a determination that the probability the given query sample is included in the open-set class is greater than the pre-determined threshold; and classify the given query sample as being included in a closed-set class based on a determination that the probability the given query sample is included in the open-set class is not greater than the pre-determined threshold.

26.
The apparatus of claim 25, wherein the at least one processor is further configured to: classify each query sample classified as being included in the closed-set class into a respective class of the one or more classes, wherein each query sample is classified based on maximizing a respective probability determined between each query sample and each respective class of the one or more classes.

27. The apparatus of claim 26, wherein the probability is an argmax probability.

28. The apparatus of claim 16, wherein, to classify each query sample, the at least one processor is configured to: provide each query sample to a trained few-shot open-set recognition (FSOSR) neural network classifier, wherein the trained FSOSR neural network classifier includes at least the task-agnostic open-set prototype representation and one or more distance scaling factors as learnable components.

29. The apparatus of claim 28, wherein the trained FSOSR neural network classifier further includes one or more feature embedding networks as a learnable component.

30. The apparatus of claim 29, wherein, to determine the one or more distance metrics for each query sample, the at least one processor is configured to: determine an embedding for each query sample of the one or more query samples using the one or more feature embedding networks; determine the one or more prototype representations as an average embedding of a set of embeddings determined for the plurality of support samples associated with each class of the one or more classes; and determine each distance metric of the one or more distance metrics based on determining a Euclidean distance metric between the embedding determined for a given query sample and the embedding determined for each prototype representation of the one or more prototype representations associated with the one or more classes.
[0067] FIG. 6 is a diagram illustrating an example of a task-agnostic open-set prototype network 600 for performing FSOSR with task-agnostic (e.g., episode-agnostic or non-episode-specific) open-set recognition and/or rejection. In one illustrative example, the task-agnostic open-set prototype network 600 can perform FSOSR with adaptation to a varying open-set and/or varying open-set selection between episodes (e.g., between tasks). In some aspects, the task-agnostic open-set prototype network 600 can perform FSOSR by using a learned task-agnostic open-set prototype c_ag to classify and reject open-set queries. The remaining closed-set queries can be classified into one or more closed-set classes based on using metric-based learning to learn a metric space in which distance metrics can classify the closed-set queries or other input samples.
[0068] In the context of the following discussion, an FSOSR task (e.g., also referred to as an FSOSR episode) can be given by:

T = (S, Q), where S = {S_n}_{n=1}^{N} and Q = Q_s ∪ Q_u    Eq. (1)
[0069] Here, T represents an FSOSR task or episode; S_n is the support set provided for the n-th class of the closed set C_s (e.g., samples with known classes, for which support examples are given); C_u is the open set (e.g., samples with a novel or unknown class, for which no support examples are given); Q is the set of queries (e.g., inference inputs) provided for the FSOSR task T; and N represents the number of classes included in the support set (e.g., N-way FSL is performed). The set of queries Q can include known queries Q_s and unknown queries Q_u sampled from C_s and C_u, respectively.
[0070] In some examples, each FSOSR episode (e.g., training task T) can be configured based on sampling the closed and open sets C_s and C_u, respectively, where C_s ∩ C_u = ∅. As was described previously, a closed class in a first episode may later be selected as an open class in a subsequent episode (e.g., because C_s and C_u are randomly selected from the same class label space).
[0071] The support set S_n is provided only for the closed set C_s, where S_n = {(x_i, y_i)}_{i=1}^{M}. Here, M is the number of support samples provided for the n-th closed class and y_i (e.g., the label of the i-th support sample) is n.

[0072] In one illustrative example, training can be performed wherein a machine learning model (e.g., a neural network model) associated with the task-agnostic open-set prototype network 600 learns from a plurality of N-way M-shot pseudo-FSOSR episodes (e.g., a plurality of N-way M-shot pseudo-FSOSR tasks T, as given by Eq. (1) and described above). For example, each of the pseudo-FSOSR episodes can include N known classes with M support examples per class (e.g., such that each pseudo-FSOSR episode includes a total of N*M support examples) and one or more pseudo-unknown (e.g., pseudo-open-set) classes without any support examples. The pseudo-episodes used in training can be designed to mimic the FSOSR inference task by subsampling classes as well as data points.
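The episodic setup described above can be illustrated with a short sketch. This is a minimal numpy sketch and not the source's implementation: the function name, its defaults, and the dict-of-indices dataset layout are assumptions, and the support and query draws are made independently here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fsosr_episode(labels, n_way=5, m_shot=3, n_open=2, m_query=4):
    """Sample one pseudo-FSOSR episode from a labeled dataset.

    `labels` maps each class id to an array of sample indices. The closed
    classes C_s and pseudo-open classes C_u are drawn from the same label
    space, so a class that is closed in one episode may be open in another.
    """
    classes = rng.permutation(list(labels))
    closed, open_ = classes[:n_way], classes[n_way:n_way + n_open]
    # M support samples per closed class; no support for the open classes.
    support = {c: rng.choice(labels[c], m_shot, replace=False) for c in closed}
    # Queries Q_s from the closed classes and Q_u from the open classes.
    query_known = {c: rng.choice(labels[c], m_query, replace=False) for c in closed}
    query_unknown = {c: rng.choice(labels[c], m_query, replace=False) for c in open_}
    return support, query_known, query_unknown
```

A single call yields one N-way M-shot episode with pseudo-unknown classes, mirroring the training procedure described above.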
[0073] With respect to the FSOSR inference task, inference can be performed over episodes that are each associated with a support set S and a query set Q. For example, FIG. 6 illustrates a support set S (indicated as 612) that includes one or more closed sets (indicated as 613a, 613b, ..., 613n and each labeled with (K) to denote a closed set) of known classes for which support examples are provided. For example, the support set 612 can correspond to the support set S_n described above (e.g., where S = {S_n}_{n=1}^{N}). In some aspects, the support set 612 can include a first closed set S_1 that includes M support samples for a first class, a second closed set S_2 that includes M support samples for a second class, ..., and an N-th closed set S_N that includes M support samples for an N-th class.
[0074] FIG. 6 also illustrates a query set Q (indicated as 616) that includes both a closed set Q_s of queries belonging to known classes K (e.g., indicated as 617) and an open set Q_u of queries belonging to unknown classes U without support examples (e.g., indicated as 619). It is noted that although the set of closed-set samples 613a, 613b, ..., 613n (e.g., of support set 612) and the closed set of samples 617 (e.g., of the query set 616) may correspond to the same underlying known classes, it is not necessarily the case that the constituent samples within 613a, 613b, ..., 613n are the same as the constituent samples within 617.
[0075] In one illustrative example, the support set S (e.g., 612) can include M samples for each of the N classes, as described above. The query set Q (e.g., 616) can include one or more queries from the N known classes (e.g., 617) and can further include one or more queries from N_u unknown classes (e.g., 619), where:

Q = Q_s ∪ Q_u, with |Q_s| = N × M_Q and |Q_u| = N_u × M_Q

[0076] Here, M_Q can represent the number of queries for each class. In some cases, M_Q can be equal to M, although M_Q may also be greater than or less than M. At the time of inference, all classes in the evaluation data set may be unseen by the example task-agnostic open-set prototype network 600. In some examples, the evaluation data set can be the same as the query set 616. Inference can be performed using episodes that include N known classes with support samples (e.g., the set 617) and unknown open-set classes without support samples (e.g., the set 619). Note that in the inference episodes, an evaluation data set (e.g., 617) does not necessarily include M support examples for each of the N classes, as is the case for the support set S (e.g., 612).
[0077] As mentioned previously, the task-agnostic open-set prototype network 600 can perform FSOSR based at least in part on one or more metric-based learning approaches. For example, in an N-way M-shot episode, the task-agnostic open-set prototype network 600 can determine the class of a given query x based on the distance(s) between the query x and the prototypes of the closed-set classes {c_n}_{n=1}^{N}. A representative feature (e.g., a prototype) of the class n ∈ C_s can be generated by averaging the features of the support samples in S_n.
[0078] For example, the task-agnostic open-set prototype network 600 can determine or otherwise obtain N prototypes for the closed set C_s by using the average of the embedded support samples of each class n to calculate the corresponding prototype for each class, where:

c_n = (1/M) Σ_{(x_i, y_i) ∈ S_n} f(x_i)    Eq. (2)
[0079] Here, f is an encoder (e.g., a feature embedding network) and c_n is the prototype determined for the n-th class. In the example of FIG. 6, an encoder or feature embedding network 620 can be used to generate embeddings for each support sample included in the support set 612. In some examples, each sub-portion of the support set 612 (e.g., S_1, ..., S_N) can be provided to a separate instance of the feature embedding network 620. In some examples, the sub-portions of the support set 612 can be provided to a single instance of the feature embedding network 620 in order to generate the corresponding embeddings for the support set 612.
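The prototype computation of Eq. (2) can be sketched as follows; a minimal numpy illustration that averages the embedded support samples of each class (the function name and the array layout are assumptions for illustration, and the embeddings f(x_i) are taken as precomputed):

```python
import numpy as np

def class_prototypes(embeddings, labels, classes):
    """Per Eq. (2): the prototype c_n is the mean of the embedded support
    samples of class n, computed in the learned metric space.

    embeddings: (num_support, D) array of f(x_i) for the support set.
    labels:     (num_support,) class label of each support sample.
    classes:    ordered list of the N closed-set class ids.
    """
    return np.stack([embeddings[labels == n].mean(axis=0) for n in classes])
```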
[0080] As illustrated, the output of feature embedding network(s) 620 (e.g., the embeddings determined for each support sample included in support set 612) can be used to generate the N closed-set prototypes c_1, c_2, ..., c_N based on Eq. (2). The resulting closed-set prototypes 632 may be task-dependent prototypes, because each closed-set prototype c_n is generated with a dependence on the particular selection of the closed sets within support set 612.
[0081] The feature embedding network(s) 620 may additionally be used to generate one or more embeddings based on receiving as input some or all of the query samples included in query set 616. For example, the feature embedding network(s) 620 can be used to determine the task-dependent closed-set prototypes c_n 632 and may also be used to determine embeddings for each query sample of query set 616. Subsequently, metric-based classification can be performed for the closed-set queries Q_s 617 and the unknown/open-set queries 619 based at least in part on analyzing the embeddings determined for each query (e.g., of query set 616) against the prototype embeddings c_n 632, as will be described in greater depth below.
[0082] In some aspects, the feature embedding network(s) 620 may receive as input (e.g., for each episode) the support set 612 and the query set 616. In one illustrative example, feature embedding network 620 can be a neural network or other machine learning network that generates embeddings based on the support set 612 and/or query set 616 (e.g., the prototypes c_n 632 can be determined by averaging the embeddings generated for each class n by the feature embedding network f 620).
[0083] Based on the task-dependent, closed-set prototypes c_n, the task-agnostic open-set prototype network 600 can determine or otherwise obtain a probability distribution over the N known classes, such that the classification probability of a given query x over each class n is given by:

p_n(x) = exp(−d(f(x), c_n)) / Σ_{n'=1}^{N} exp(−d(f(x), c_{n'}))    Eq. (3)
[0084] In some aspects, the classification probability given in Eq. (3) can be proportional to the exponential of the negative of a distance metric d(·, ·) (e.g., depicted in FIG. 6 as the distance metric 640). In one illustrative example, the distance metric 640 can be determined based on a Euclidean distance, d(a, a′) = ||a − a′||^2, although various other distance metrics may also be utilized without departing from the scope of the present disclosure. Based on Eq. (3), the systems and techniques described herein can use the task-agnostic open-set prototype network 600 to minimize a negative log-probability of the true class.

[0085] In one illustrative example, the probability distribution of Eq. (3) can be used to classify a given input query x by determining the distance (e.g., using the distance metric 640/Eq. (3)) between the query example x and the prototypical representation c_n 632 for each class. For example, if the input query example x is closest to class number three (e.g., of the N classes), then a relatively high probability can be determined for class three and a relatively lower probability for the remaining N−1 classes. For example, these probabilities can be determined based on Eq. (3), which itself can be determined based on the distance metric 640.
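The softmax-over-negative-distances classification of Eq. (3) can be sketched as follows. This is a hedged numpy illustration in which the query embedding f(x) is taken as a precomputed vector and the squared Euclidean distance of the preceding paragraph is assumed:

```python
import numpy as np

def closed_set_probs(query_emb, prototypes):
    """Per Eq. (3): p_n(x) is proportional to exp(-d(f(x), c_n)), with d the
    squared Euclidean distance between the query embedding and prototype c_n."""
    d = np.sum((prototypes - query_emb) ** 2, axis=1)  # d(f(x), c_n) for each n
    logits = -d
    logits -= logits.max()                             # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

A query lying closest to prototype c_3 receives the highest p_3(x), matching the worked example in the text.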
[0086] In one illustrative example, the systems and techniques described herein can further determine one or more task-agnostic open-set prototypes c_ag, depicted in FIG. 6 as the task-agnostic open-set prototype 652. As will be described in greater depth below, the task-agnostic open-set prototype 652 can be a learnable prototype that can be used to classify and/or reject open-set queries across different tasks or episodes. In some aspects, the task-agnostic open-set prototype 652 can be generated and used to extend the previously described metric-based classification to the open-set queries 619 in addition to the closed-set queries 617. In some examples, the task-agnostic open-set prototype 652 is consistent across all tasks (e.g., based on being task-agnostic).
[0087] Given a task, the following two ordinal relations for a given input query x can be considered, based on the membership of the input query x to either the closed set Q_s 617 or the unknown/open set Q_u 619:

d(f(x), c_y) < d(f(x), c_ag) < d(f(x), c_n), for all n ≠ y    Eq. (4)

where y is the ground-truth class of x and x ∈ Q_s; and:

d(f(x), c_ag) < d(f(x), c_n), for all n    Eq. (5)

where x ∈ Q_u.
[0088] Combining Eqs. (4) and (5) provides a general relation regardless of the membership of the input query x:

d(f(x), c_ag) < d(f(x), c_n)    Eq. (6)
[0089] In one illustrative example, Eq. (6) can be applied for all n, where n ≠ y. Based on Eq. (6), the task-agnostic open-set prototype c_ag 652 can be seen to be the nearest prototype for any query (e.g., of query set 616) except for the prototype c_y corresponding to the query's ground-truth class. In other words, when the input query x is chosen as an open-set sample (e.g., belonging to the open set 619), the task-agnostic open-set prototype c_ag 652 can reliably be used to reject the open-set sample x. As will be described in greater depth below, by learning the open-set prototype c_ag 652 in a task-agnostic (e.g., episode-agnostic) manner, the performance of the example network 600 can be improved for metric-based FSOSR classification tasks.
[0090] As mentioned previously, the task-agnostic open-set prototype c_ag 652 can be learned such that it represents the closest classification match for any given query sample of the query set 616, if the given query sample is selected as an open-set sample for which its true class is unavailable. In one illustrative example, the task-agnostic open-set prototype c_ag 652 is learned such that it satisfies a global-second-best classification criterion for any given query sample. For example, for a given query sample, the only classification with a greater probability based on Eq. (3) is the actual, true underlying class of the query sample when it belongs to the known closed set 617; if the query sample belongs to the unknown open set 619, the true underlying class is unavailable to the FSOSR network 600, and the task-agnostic open-set prototype c_ag 652 is the best (e.g., most likely) class for the open-set query sample within the context of the current episode.
[0091] The task-agnostic open-set prototype c_ag 652 can be implemented as a D-dimensional learnable feature associated with FSOSR network 600, as mentioned previously. In some aspects, one or more learnable scaling factors can be used to better satisfy the ordinal relation of Eq. (6), such that a single c_ag can be learned to satisfy Eq. (6) in a task-agnostic manner across multiple different tasks or episodes.
[0092] As illustrated in FIG. 6, the example task-agnostic open-set prototype network 600 can further include the learnable scaling factors θ_w and/or θ_b. In some examples, θ_w and/or θ_b can be scalar-valued. In one illustrative example, the learnable scaling factors θ_w and/or θ_b can be used to determine a scaled distance metric 642, depicted in FIG. 6 as the scaled distance metric θ_w·d_ag + θ_b. Here, the distance value d_ag can be determined as the distance metric between f(x), the embedding of a query sample x, and the task-agnostic open-set prototype c_ag 652 (e.g., d_ag = d(f(x), c_ag)). In some aspects, the distance value d_ag can be determined using the distance metric 640, which provides the distance value d_ag as input to the scaled distance metric 642.
[0093] In one illustrative example, the learnable scaling factors θ_w and/or θ_b can be learned in conjunction or combination with the learnable task-agnostic open-set prototype 652. In some aspects, the example FSOSR network 600 can implement one or more (or all) of the feature embedding network(s) f 620, the task-agnostic open-set prototype c_ag 652, the scaling factor θ_w, and the scaling factor θ_b as learnable components or learnable modules that are learned during training (e.g., as will be described in greater depth below).
[0094] In one illustrative example, the task-agnostic open-set prototype c_ag 652 can be used to compute softmax probability outputs 660 of a given input query x for the N + 1 classes (e.g., representing the N closed-set classes + c_ag) as follows:

p_n(x) = exp(−d(f(x), c_n)) / Z    Eq. (7)

p_ag(x) = exp(−(θ_w·d_ag + θ_b)) / Z    Eq. (8)

where Z = Σ_{n'=1}^{N} exp(−d(f(x), c_{n'})) + exp(−(θ_w·d_ag + θ_b)).
[0095] Here, n = 1, ..., N. In some aspects, based on the learned task-agnostic open-set prototype c_ag 652 and the learned scaling factors θ_w and θ_b, a given input query sample x can be classified as an open-set sample or a closed-set sample using a classification block 690. For example, the given input query sample x can be classified as an open-set sample at classification block 690 based on determining that p_ag(x) > δ, where δ is a pre-defined threshold for classifying open-set query samples. As illustrated in FIG. 6, if the classification block 690 determines that p_ag(x) > δ is false (e.g., if classification block 690 determines instead that p_ag(x) ≤ δ), the input query sample x can otherwise be classified into the nearest closed (e.g., known) class n̂, where n̂ = argmax_n p_n(x). The argmax-based classification of closed-set input query samples is depicted in FIG. 6 as the closed-set classification block 680.
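The (N + 1)-way classification and thresholding described above can be sketched as follows. This is an illustrative numpy sketch assuming squared Euclidean distances and the scaled open-set distance θ_w·d_ag + θ_b; the function name, return convention, and default threshold δ are hypothetical, not from the source:

```python
import numpy as np

def fsosr_classify(query_emb, prototypes, c_ag, theta_w, theta_b, delta=0.5):
    """Sketch of Eqs. (7)-(8) and classification block 690: the open-set
    logit uses the scaled distance theta_w * d_ag + theta_b; a query is
    rejected as open-set when p_ag(x) > delta, and is otherwise assigned
    to the nearest closed class via argmax_n p_n(x) (block 680)."""
    d_closed = np.sum((prototypes - query_emb) ** 2, axis=1)
    d_ag = np.sum((c_ag - query_emb) ** 2)
    logits = np.concatenate([-d_closed, [-(theta_w * d_ag + theta_b)]])
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    p /= p.sum()                           # softmax over N + 1 classes
    if p[-1] > delta:                      # p_ag(x) > delta: open-set rejection
        return "open-set", p
    return int(np.argmax(p[:-1])), p       # nearest closed class n-hat
```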
[0096] As mentioned previously, the probability distributions p_n(x) and p_ag(x) of Eqs. (7) and (8), respectively, can be determined using the learnable components or modules given by the feature embedding network(s) f 620, the task-agnostic open-set prototype c_ag 652, the scaling factor θ_w, and the scaling factor θ_b. In some examples, the feature embedding network(s) f 620, the task-agnostic open-set prototype c_ag 652, the scaling factor θ_w, and the scaling factor θ_b can be learned such that the ordinal relation of Eq. (6) is attained or otherwise implemented.

[0097] In one illustrative example, the example FSOSR network 600 can be trained to optimize its feature extractor (e.g., the feature embedding network(s) f 620), the task-agnostic open-set prototype c_ag 652, and the scaling factors by minimizing a total loss function given by:

L_total = L_os + L_sb + L_ppe    Eq. (9)
[0098] Here, L_os represents an open-set loss. In some aspects, based on the task-agnostic open-set prototype c_ag 652, the training process can consider (N + 1)-way classification, where 1, ..., N denote the N classes in the closed set(s) of the support set 612 and the closed query set 617, and (N + 1) denotes the task-agnostic open-set class represented by the prototype c_ag 652. In some examples, the ground-truth class of the open-set samples can be set to N + 1 during training, and p_{N+1}(x) = p_ag(x). Based on the above, the open-set loss can be determined as:

L_os = −(1/|Q|) Σ_{x ∈ Q} λ_x · log p_y(x)    Eq. (10)
[0099] Here, λ is a hyperparameter to balance between the open and closed sets. In some examples, λ_x = 1 when x ∈ Q_s, and λ_x = λ otherwise.
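One plausible reading of the open-set loss above is a weighted (N + 1)-way cross-entropy in which λ re-weights the open-set samples. The sketch below is an assumption-laden numpy illustration of that reading, not the source's exact Eq. (10); the function name, argument layout, and default λ are hypothetical:

```python
import numpy as np

def open_set_loss(probs, targets, n_closed, lam=0.5):
    """Hedged sketch of the open-set loss: a weighted (N + 1)-way
    cross-entropy, where open-set samples carry ground-truth class N + 1
    (index n_closed here) and the per-sample weight is 1 for closed-set
    samples and `lam` for open-set samples.

    probs:   (B, N + 1) softmax outputs p_1..p_N, p_ag per query.
    targets: (B,) ground-truth indices in [0, N]; index n_closed marks open-set.
    """
    w = np.where(targets < n_closed, 1.0, lam)           # lambda_x per sample
    nll = -np.log(probs[np.arange(len(targets)), targets] + 1e-12)
    return float(np.mean(w * nll))
```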
[0100] In some aspects, the example FSOSR network 600 can be explicitly forced (e.g., during training) to satisfy the ordinal relation of Eq. (6) by implementing a second-best loss for the closed set as follows:

L_sb = −(1/|Q_s|) Σ_{x_i ∈ Q_s} log p̃_ag(x_i)    Eq. (11)
[0101] Here, L_sb represents a second-best loss, where x_i is a closed-set sample and p̃_ag(x_i) represents the softmax probability of the task-agnostic open-set class, which is computed without considering its ground-truth class y.
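The second-best loss can be sketched as follows: for each closed-set sample, the softmax is recomputed with the ground-truth class excluded, and the renormalized probability of the task-agnostic open-set class is maximized, pushing c_ag toward being the global second-best match. A hedged numpy sketch (the function name and tensor layout are assumptions):

```python
import numpy as np

def second_best_loss(logits, targets):
    """Hedged sketch of the second-best loss: drop the ground-truth class
    from each closed-set sample's logits, renormalize, and penalize the
    negative log-probability of the open-set class (the last logit), so
    that c_ag becomes the runner-up behind the true class.

    logits:  (B, N + 1) unnormalized scores; column N is the open-set class.
    targets: (B,) ground-truth closed-set class indices in [0, N - 1].
    """
    losses = []
    for z, y in zip(logits, targets):
        z = np.delete(z, y)                    # exclude the true class y
        z = z - z.max()
        p = np.exp(z) / np.exp(z).sum()
        losses.append(-np.log(p[-1] + 1e-12))  # p~_ag: open-set prob w/o class y
    return float(np.mean(losses))
```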
[0102] In some aspects, the example FSOSR network 600 can additionally be trained to regularize the distances from the task-agnostic open-set prototype c_ag 652 to the prototypes c_n 632, and simultaneously the distances between the prototypes. In one illustrative example, the cosine entropy for each pair of the prototypes can be maximized by the pairwise prototype entropy loss L_ppe.

[0103] Here, L_ppe represents a pairwise prototype entropy loss, where N(N − 1)/2 is the number of the pairs of the prototypes.
[0104] Finally, as mentioned previously, the feature extractor (e.g., the feature embedding network(s) f 620), the task-agnostic open-set prototype c_ag 652, and the scaling factors can be optimized during training of the example FSOSR network 600 by minimizing the total loss function given by Eq. (9).
[0105] FIG. 7 is a flowchart illustrating an example of a process 700 for performing few-shot open-set recognition (FSOSR) using a task-agnostic open-set prototype. At block 702, the process 700 includes determining one or more prototype representations based on a plurality of support samples associated with one or more classes of data samples.
[0106] Each prototype representation can be associated with one of the one or more classes. For example, the one or more prototype representations can be determined using the example task-agnostic open-set prototype network 600 illustrated in FIG. 6. In some examples, the one or more prototype representations can be determined by a machine learning and/or neural network encoder, such as the feature embedding network(s) 620 illustrated in FIG. 6. The one or more prototype representations can be determined based on receiving as input the plurality of support samples, wherein each support sample is associated with (e.g., labeled with) a known class. For example, the one or more prototype representations can be determined based on a plurality of support samples included in the support set 612 illustrated in FIG. 6. In some cases, the plurality of support samples can include one or more sub-portions where each sub-portion includes the support samples associated with a particular one of the one or more classes of data samples. For example, the plurality of support samples can include one or more (or all) of the sub-portions of support samples 613a, 613b, ... , 613n illustrated in FIG. 6. In some examples, the encoder (e.g., feature embedding network(s) 620) can generate a plurality of embeddings for the plurality of support samples associated with each class. A prototype representation can then be determined for each class as the mean of its embedded support samples. For example, the mean of the embedded support samples for each class can be determined based on a Euclidean distance metric to generate a set of task-dependent closed-set prototypes such as the prototypes 632 illustrated in FIG. 6.
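The prototype computation at block 702 follows the standard prototypical-network recipe described above (mean of the embedded support samples for each class), and can be sketched as:

```python
import numpy as np

def class_prototypes(embeddings, labels, num_classes):
    """Compute one prototype per class as the mean of its embedded
    support samples, per the prototypical-network approach above.

    embeddings:  (S, D) support-sample embeddings from the encoder
                 (e.g., the feature embedding network(s) 620).
    labels:      (S,) integer class labels in [0, num_classes).
    num_classes: N, the number of closed-set classes.
    """
    D = embeddings.shape[1]
    protos = np.zeros((num_classes, D))
    for c in range(num_classes):
        # Mean embedding of the support sub-portion for class c.
        protos[c] = embeddings[labels == c].mean(axis=0)
    return protos
```

This yields the set of task-dependent closed-set prototypes (e.g., the prototypes 632); the task-agnostic open-set prototype is a separate learnable parameter rather than a mean of support embeddings.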
[0107] At block 704, the process 700 includes determining a task-agnostic open-set prototype representation. For example, the task-agnostic open-set prototype representation can include the task-agnostic open-set prototype representation 652 illustrated in FIG. 6. In some examples, the one or more prototype representations and the task-agnostic open-set prototype representation can be determined in a same learned metric space. For example, the one or more prototype representations and the task-agnostic open-set prototype representation can be determined in a same learned embedding space of a neural network encoder, such as the feature embedding network(s) 620 illustrated in FIG. 6 (e.g., the task-agnostic open-set prototype representation can be determined in the same embedding space associated with the feature embedding network(s) 620 used to generate embeddings associated with the one or more prototype representations). [0108] In some examples, the plurality of support samples used to generate the one or more prototype representations (e.g., the support set 612 illustrated in FIG. 6) can be obtained for or otherwise associated with a single few-shot learning (FSL) episode. In some cases, the task-agnostic open-set prototype representation can be an episode-agnostic open-set prototype representation. [0109] At block 706, the process 700 includes determining one or more distance metrics for each query sample of one or more query samples, wherein the one or more distance metrics are determined based on the one or more prototype representations and the task-agnostic open-set prototype representation. For example, the one or more distance metrics can be determined using a distance function 640, as illustrated in FIG. 6. In some examples, the distance metric can be a Euclidean distance between each query sample and each of the prototype representations and the task-agnostic open-set prototype representation.
In some cases, the one or more query samples can be included in a query set, such as the query set 616 illustrated in FIG. 6. The one or more query samples can include open-set queries (e.g., included in an open-set of queries 619) and closed-set queries (e.g., included in a closed-set of queries 617). [0110] In some examples, determining the one or more distance metrics further includes scaling the Euclidean distance metric between a given query sample and the task-agnostic open-set prototype representation using one or more learned scaling factors. For example, the Euclidean distance metric can be scaled using one or more learned scaling factors such as the learned scaling factors 642 illustrated in FIG. 6. In some cases, determining the one or more distance metrics can additionally, or alternatively, include scaling the Euclidean distance metric between the given query sample and each prototype representation of the one or more prototype representations using the one or more learned scaling factors. [0111] In some cases, the one or more learned scaling factors can be determined as a first scalar value and a second scalar value. For example, the first scalar value and the second scalar value can be learned based on a loss function that enforces the task-agnostic open-set prototype representation as a task-agnostic global-second-best classification for each query sample of the one or more query samples. In some examples, the one or more learned scaling factors can be learned as task-agnostic scaling factors. In some cases, the task-agnostic open-set prototype representation may be a global-second-best classification for a plurality of few-shot open-set recognition (FSOSR) episodes performed over the data samples.
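The distance computation at block 706 can be sketched as follows, assuming squared Euclidean distances and an affine scaling of the open-set distance by two learned scalars. The affine form theta_w * d + theta_b is an assumption consistent with the first and second scalar values described above, not a formula taken from the source.

```python
import numpy as np

def scaled_distances(query_emb, prototypes, open_proto,
                     theta_w=1.0, theta_b=0.0):
    """Distances from one query embedding to the N closed-set prototypes
    and the task-agnostic open-set prototype (a sketch).

    query_emb:  (D,) embedding of the query sample.
    prototypes: (N, D) closed-set prototypes.
    open_proto: (D,) task-agnostic open-set prototype.
    theta_w, theta_b: assumed learned scaling factors applied only to
                      the open-set distance, as in the text above.
    Returns an (N+1,) array: closed-set distances, then the scaled
    open-set distance.
    """
    d_closed = np.sum((prototypes - query_emb) ** 2, axis=1)
    d_open = np.sum((open_proto - query_emb) ** 2)
    return np.append(d_closed, theta_w * d_open + theta_b)
```

In training, theta_w and theta_b would be optimized alongside the encoder and the open-set prototype; here they are passed in as plain floats for illustration.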
[0112] At block 708, the process 700 includes classifying each query sample based on the one or more distance metrics, wherein each query sample is classified into one of the one or more classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation. [0113] For example, the query samples can be classified using one or more softmax outputs, such as the N+1 softmax outputs 660 illustrated in FIG. 6. In some examples, the one or more softmax outputs can include one or more probability distributions determined based on a distance metric determined between a given query sample and each of the one or more closed-set prototypes and a probability distribution determined based on a distance metric (e.g., the scaled distance metric 642) determined between the given query sample and the task-agnostic open-set prototype. In some examples, the softmax classification can be based on or otherwise utilize the distance metrics described above with respect to block 706. For example, the softmax classification can classify a given query sample into one of the classes associated with the prototype representations or into an open-set class associated with the task-agnostic open-set prototype representation. [0114] In some examples, classifying each query sample based on the one or more distance metrics can include determining a probability distribution over the one or more classes and the open-set class. For example, the probability distribution can be determined based at least in part on the Euclidean distance metric determined between each query sample and a respective prototype associated with each class of the one or more classes. A probability distribution can additionally, or alternatively, be determined based at least in part on a Euclidean distance metric determined between each query sample and the task-agnostic open-set prototype representation.
In some aspects, each query sample can be classified into one of the one or more closed-set classes or into the open-set class, based on the probability distribution(s). [0115] In some cases, open-set rejection (OSR) can be performed based on a set of classified query samples classified into the open-set class associated with the task-agnostic open-set prototype representation. For example, performing OSR can include determining a probability that each query sample is included in the open-set class associated with the task-agnostic open-set prototype representation, based on the probability distribution determined for the open-set class (e.g., using the task-agnostic open-set prototype representation and/or the scaled distance metric between the given query sample and the task-agnostic open-set prototype representation), and comparing the determined probability to a pre-determined threshold. In some cases, classifying each query sample based on the one or more distance metrics can further include classifying a given query sample as being included in the open-set class based on a determination that the probability the given query sample is included in the open-set class is greater than the pre-determined threshold, and classifying the given query sample as being included in a closed-set class based on a determination that the probability the given query sample is included in the open-set class is not greater than the pre-determined threshold. In some cases, each query sample classified as being included in the closed-set class can be further classified into a respective class of the one or more classes. For example, each query sample can be classified based on maximizing a respective probability determined between each query sample and each respective class of the one or more classes. In some examples, the respective probability can be an argmax probability.
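Block 708 and the threshold-based open-set rejection described above can be sketched together. Taking the softmax over negative distances, and the specific threshold value, are illustrative assumptions; the source specifies only that a probability distribution is derived from the distance metrics and compared to a pre-determined threshold.

```python
import numpy as np

def classify_query(distances, threshold=0.5):
    """(N+1)-way softmax over negative distances, then open-set rejection.

    distances: (N+1,) distances to the N closed-set prototypes followed
               by the (scaled) distance to the open-set prototype.
    threshold: assumed pre-determined rejection threshold.
    Returns (class_index, probabilities); class_index == N means the
    query is rejected into the open-set class.
    """
    logits = -distances            # smaller distance -> larger logit
    z = logits - logits.max()      # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    open_idx = len(distances) - 1
    if p[open_idx] > threshold:
        return open_idx, p         # rejected into the open-set class
    # Otherwise, argmax over the closed-set classes only.
    return int(np.argmax(p[:open_idx])), p
```

A query far from every closed-set prototype but comparatively near the open-set prototype receives a high open-set probability and is rejected; otherwise it is assigned to the nearest closed-set class.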
[0116] In some examples, classifying each query sample can include providing each query sample to a trained few-shot open-set recognition (FSOSR) neural network classifier, wherein the trained FSOSR neural network classifier includes at least the task-agnostic open-set prototype representation and one or more distance scaling factors as learnable components. In some cases, the trained FSOSR neural network classifier can further include one or more feature embedding networks as a learnable component. For example, the trained FSOSR neural network classifier can determine the one or more distance metrics for each query sample based on determining an embedding for each query sample of the one or more query samples using the one or more feature embedding networks. For example, using the embeddings, the one or more prototype representations can be determined as an average embedding of a set of embeddings determined for the plurality of support samples associated with each class of the one or more classes. Each distance metric of the one or more distance metrics can subsequently be generated based on a Euclidean distance metric between the embedding determined for a given query sample and the embedding determined for each prototype representation of the one or more prototype representations associated with the one or more classes. [0117] Further aspects and examples related to the present disclosure are included in Appendix A attached hereto. [0118] In some examples, the processes described herein (e.g., process 700 and/or any other process described herein) may be performed by a computing device, apparatus, or system. In one example, the process 700 can be performed by a computing device or system having the computing device architecture 800 of FIG. 8.
The computing device, apparatus, or system can include any suitable device, such as a mobile device (e.g., a mobile phone), a desktop computing device, a tablet computing device, a wearable device (e.g., a VR headset, an AR headset, AR glasses, a network-connected watch or smartwatch, or other wearable device), a server computer, an autonomous vehicle or computing device of an autonomous vehicle, a robotic device, a laptop computer, a smart television, a camera, and/or any other computing device with the resource capabilities to perform the processes described herein, including the process 700 and/or any other process described herein. In some cases, the computing device or apparatus may include various components, such as one or more input devices, one or more output devices, one or more processors, one or more microprocessors, one or more microcomputers, one or more cameras, one or more sensors, and/or other component(s) that are configured to carry out the steps of processes described herein. In some examples, the computing device may include a display, a network interface configured to communicate and/or receive the data, any combination thereof, and/or other component(s). The network interface may be configured to communicate and/or receive Internet Protocol (IP) based data or other type of data. [0119] The components of the computing device can be implemented in circuitry. For example, the components can include and/or can be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), central processing units (CPUs), and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein. 
[0120] The process 700 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes. [0121] Additionally, the process 700 and/or any other process described herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory. [0122] FIG. 8 illustrates an example computing device architecture 800 of an example computing device which can implement the various techniques described herein. 
In some examples, the computing device can include a mobile device, a wearable device, an extended reality device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a personal computer, a laptop computer, a video server, a vehicle (or computing device of a vehicle), or other device. The components of computing device architecture 800 are shown in electrical communication with each other using connection 805, such as a bus. The example computing device architecture 800 includes a processing unit (CPU or processor) 810 and computing device connection 805 that couples various computing device components including computing device memory 815, such as read only memory (ROM) 820 and random-access memory (RAM) 825, to processor 810. [0123] Computing device architecture 800 can include a cache of high-speed memory connected directly with, in close proximity to, or integrated as part of processor 810. Computing device architecture 800 can copy data from memory 815 and/or the storage device 830 to cache 812 for quick access by processor 810. In this way, the cache can provide a performance boost that avoids processor 810 delays while waiting for data. These and other engines can control or be configured to control processor 810 to perform various actions. Other computing device memory 815 may be available for use as well. Memory 815 can include multiple different types of memory with different performance characteristics. Processor 810 can include any general-purpose processor and a hardware or software service, such as service 1 832, service 2 834, and service 3 836 stored in storage device 830, configured to control processor 810 as well as a special-purpose processor where software instructions are incorporated into the processor design. Processor 810 may be a self-contained system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.
[0124] To enable user interaction with the computing device architecture 800, input device 845 can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. Output device 835 can also be one or more of a number of output mechanisms known to those of skill in the art, such as a display, projector, television, speaker device, etc. In some instances, multimodal computing devices can enable a user to provide multiple types of input to communicate with computing device architecture 800. Communication interface 840 can generally govern and manage the user input and computing device output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed. [0125] Storage device 830 is a non-volatile memory and can be a hard disk or other types of computer-readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, solid state memory devices, digital versatile disks, cartridges, random access memories (RAMs) 825, read only memory (ROM) 820, and hybrids thereof. Storage device 830 can include services 832, 834, 836 for controlling processor 810. Other hardware or software modules or engines are contemplated. Storage device 830 can be connected to the computing device connection 805. In one aspect, a hardware module that performs a particular function can include the software component stored in a computer-readable medium in connection with the necessary hardware components, such as processor 810, connection 805, output device 835, and so forth, to carry out the function.
[0126] Aspects of the present disclosure are applicable to any suitable electronic device (such as security systems, smartphones, tablets, laptop computers, vehicles, drones, or other devices) including or coupled to one or more active depth sensing systems. While described below with respect to a device having or coupled to one light projector, aspects of the present disclosure are applicable to devices having any number of light projectors and are therefore not limited to specific devices. [0127] The term “device” is not limited to one or a specific number of physical objects (such as one smartphone, one controller, one processing system and so on). As used herein, a device may be any electronic device with one or more parts that may implement at least some portions of this disclosure. While the below description and examples use the term “device” to describe various aspects of this disclosure, the term “device” is not limited to a specific configuration, type, or number of objects. Additionally, the term “system” is not limited to multiple components or specific aspects. For example, a system may be implemented on one or more printed circuit boards or other substrates and may have movable or static components. While the below description and examples use the term “system” to describe various aspects of this disclosure, the term “system” is not limited to a specific configuration, type, or number of objects. [0128] Specific details are provided in the description above to provide a thorough understanding of the aspects and examples provided herein. However, it will be understood by one of ordinary skill in the art that the aspects may be practiced without these specific details. For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. 
Additional components may be used other than those shown in the figures and/or described herein. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the aspects in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the aspects. [0129] Individual aspects may be described above as a process or method which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function. [0130] Processes and methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer-readable media. Such instructions can include, for example, instructions and data which cause or otherwise configure a general-purpose computer, special purpose computer, or a processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, source code, etc. 
[0131] The term “computer-readable medium” includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as flash memory, memory or memory devices, magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, compact disk (CD) or digital versatile disk (DVD), any suitable combination thereof, among others. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, an engine, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like. [0132] In some aspects the computer-readable storage devices, mediums, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se. 
[0133] Devices implementing processes and methods according to these disclosures can include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and can take any of a variety of form factors. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks. Typical examples of form factors include laptops, smart phones, mobile phones, tablet devices or other small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example. [0134] The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are example means for providing the functions described in the disclosure. [0135] In the foregoing description, aspects of the application are described with reference to specific aspects thereof, but those skilled in the art will recognize that the application is not limited thereto. Thus, while illustrative aspects of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the above-described application may be used individually or jointly.
Further, aspects can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternate aspects, the methods may be performed in a different order than that described. [0136] One of ordinary skill will appreciate that the less than (“<”) and greater than (“>”) symbols or terminology used herein can be replaced with less than or equal to (“≤”) and greater than or equal to (“≥”) symbols, respectively, without departing from the scope of this description. [0137] Where components are described as being “configured to” perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors, or other suitable electronic circuits) to perform the operation, or any combination thereof. [0138] The phrase “coupled to” refers to any component that is physically connected to another component either directly or indirectly, and/or any component that is in communication with another component (e.g., connected to the other component over a wired or wireless connection, and/or other suitable communication interface) either directly or indirectly. [0139] Claim language or other language reciting “at least one of” a set and/or “one or more” of a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting “at least one of A and B” or “at least one of A or B” means A, B, or A and B.
In another example, claim language reciting “at least one of A, B, and C” or “at least one of A, B, or C” means A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language “at least one of” a set and/or “one or more” of a set does not limit the set to the items listed in the set. For example, claim language reciting “at least one of A and B” or “at least one of A or B” can mean A, B, or A and B, and can additionally include items not listed in the set of A and B. [0140] The various illustrative logical blocks, modules, engines, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, engines, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. [0141] The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices such as general-purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices.
If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random-access memory (RAM) such as synchronous dynamic random-access memory (SDRAM), read-only memory (ROM), non-volatile random-access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves. [0142] The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general-purpose processor may be a microprocessor; but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein.

[0143] Illustrative aspects of the disclosure include:

[0144] Aspect 1: A method (e.g., a processor-implemented method) for processing one or more data samples, the method comprising: determining one or more prototype representations based on a plurality of support samples associated with one or more classes of data samples, wherein each prototype representation is associated with one of the one or more classes; determining a task-agnostic open-set prototype representation, wherein the one or more prototype representations and the task-agnostic open-set prototype representation are determined in a same learned metric space; determining one or more distance metrics for each query sample of one or more query samples, wherein the one or more distance metrics are determined based on the one or more prototype representations and the task-agnostic open-set prototype representation; and classifying each query sample based on the one or more distance metrics, wherein each query sample is classified into one of the one or more classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.

[0145] Aspect 2: The processor-implemented method of Aspect 1, wherein determining the one or more distance metrics for each query sample further comprises: determining a Euclidean distance metric between a given query sample and each prototype representation of the one or more prototype representations associated with the one or more classes; and determining a Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation.
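By way of illustration only, and not as a limitation of any claim, the prototype and distance computations described in Aspects 1 and 2 can be sketched as follows. The function names, the dictionary-based representation of support embeddings, and the `"open_set"` key are assumptions made for this sketch; the embeddings themselves are assumed to have already been produced by a feature embedding network in the learned metric space.

```python
import numpy as np

def class_prototypes(support_embeddings):
    # Each class prototype is the mean of the embeddings of that
    # class's support samples in the learned metric space.
    return {cls: np.mean(np.stack(embs), axis=0)
            for cls, embs in support_embeddings.items()}

def query_distances(query_embedding, prototypes, open_set_prototype):
    # Euclidean distance from the query to each class prototype,
    # and to the task-agnostic open-set prototype.
    dists = {cls: float(np.linalg.norm(query_embedding - proto))
             for cls, proto in prototypes.items()}
    dists["open_set"] = float(np.linalg.norm(query_embedding - open_set_prototype))
    return dists
```

In this sketch the open-set prototype is simply another point in the same embedding space, which is what allows it to participate in the same distance computation as the per-class prototypes.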
[0146] Aspect 3: The processor-implemented method of Aspect 2, further comprising: scaling the Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation using one or more learned scaling factors.

[0147] Aspect 4: The processor-implemented method of Aspect 3, further comprising: scaling the Euclidean distance metric between the given query sample and each prototype representation of the one or more prototype representations using the one or more learned scaling factors.

[0148] Aspect 5: The processor-implemented method of any of Aspects 3 to 4, wherein: the one or more learned scaling factors are determined as a first scalar value and a second scalar value; and the first scalar value and the second scalar value are learned based on a loss function that enforces the task-agnostic open-set prototype representation as a task-agnostic global second-best classification for each query sample of the one or more query samples.

[0149] Aspect 6: The processor-implemented method of Aspect 5, wherein: the one or more learned scaling factors are task-agnostic scaling factors; and the task-agnostic open-set prototype representation is a global second-best classification for a plurality of few-shot open-set recognition (FSOSR) episodes performed over the data samples.
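As a non-limiting sketch of the distance scaling described in Aspects 3 to 5, the two learned scalars can be applied as follows. The parameter names `s_closed` and `s_open` are hypothetical; in a trained system these values would be learned jointly with the embedding network, whereas here they are plain floats.

```python
def scale_distances(distances, s_closed, s_open):
    # Apply the first learned scalar to the closed-set distances and
    # the second to the open-set distance; the "open_set" key is an
    # assumed convention for this sketch.
    return {cls: (s_open if cls == "open_set" else s_closed) * d
            for cls, d in distances.items()}
```

Because both scalars multiply distances in the same metric space, they effectively control how aggressively queries are pulled toward or pushed away from the open-set prototype relative to the class prototypes.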
[0150] Aspect 7: The processor-implemented method of any of Aspects 1 to 6, wherein classifying each query sample based on the one or more distance metrics comprises: determining a probability distribution over the one or more classes and the open-set class, wherein the probability distribution is determined based at least in part on a Euclidean distance metric determined between each query sample and a respective prototype associated with each class of the one or more classes and a Euclidean distance metric determined between each query sample and the task-agnostic open-set prototype representation; and classifying, based on the probability distribution, each query sample into one of the one or more classes or into the open-set class.

[0151] Aspect 8: The processor-implemented method of Aspect 7, further comprising: performing open-set rejection (OSR) based on a set of classified query samples classified into the open-set class associated with the task-agnostic open-set prototype representation.

[0152] Aspect 9: The processor-implemented method of any of Aspects 7 to 8, wherein classifying each query sample based on the one or more distance metrics further comprises: determining a probability that each query sample is included in the open-set class associated with the task-agnostic open-set prototype representation, based on the probability distribution; and comparing the determined probability to a pre-determined threshold.

[0153] Aspect 10: The processor-implemented method of Aspect 9, further comprising: classifying a given query sample as being included in the open-set class based on a determination that the probability the given query sample is included in the open-set class is greater than the pre-determined threshold; and classifying the given query sample as being included in a closed-set class based on a determination that the probability the given query sample is included in the open-set class is not greater than the pre-determined threshold.
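The classification steps of Aspects 7 to 10 can be illustrated, again without limiting the claims, by a softmax over negated distances followed by a threshold test on the open-set probability. The softmax form is an assumption of this sketch (the claims recite only a distance-based probability distribution), as are the function name and the `"open_set"` key.

```python
import numpy as np

def classify_query(distances, threshold):
    # Softmax over negated (optionally scaled) Euclidean distances
    # yields a probability distribution over the closed-set classes
    # and the open-set class; smaller distance means higher probability.
    names = sorted(distances)
    logits = -np.array([distances[n] for n in names])
    exp = np.exp(logits - logits.max())  # max-subtraction for stability
    probs = dict(zip(names, exp / exp.sum()))
    # Open-set rejection: if the open-set probability exceeds the
    # pre-determined threshold, classify into the open-set class.
    if probs["open_set"] > threshold:
        return "open_set", probs
    # Otherwise, classify into the closed-set class whose probability
    # is maximal.
    closed = {n: p for n, p in probs.items() if n != "open_set"}
    return max(closed, key=closed.get), probs
```

Note that the same distribution supports both decisions: the threshold comparison handles rejection, and the argmax over the remaining entries handles closed-set assignment.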
[0154] Aspect 11: The processor-implemented method of Aspect 10, further comprising: classifying each query sample classified as being included in the closed-set class into a respective class of the one or more classes, wherein each query sample is classified based on maximizing a respective probability determined between each query sample and each respective class of the one or more classes.

[0155] Aspect 12: The processor-implemented method of Aspect 11, wherein the probability is an argmax probability.

[0156] Aspect 13: The processor-implemented method of any of Aspects 1 to 12, wherein classifying each query sample comprises: providing each query sample to a trained few-shot open-set recognition (FSOSR) neural network classifier, wherein the trained FSOSR neural network classifier includes at least the task-agnostic open-set prototype representation and one or more distance scaling factors as learnable components.

[0157] Aspect 14: The processor-implemented method of Aspect 13, wherein the trained FSOSR neural network classifier further includes one or more feature embedding networks as a learnable component.

[0158] Aspect 15: The processor-implemented method of Aspect 14, wherein determining the one or more distance metrics for each query sample further comprises: determining an embedding for each query sample of the one or more query samples using the one or more feature embedding networks; determining the one or more prototype representations as an average embedding of a set of embeddings determined for the plurality of support samples associated with each class of the one or more classes; and determining each distance metric of the one or more distance metrics based on determining a Euclidean distance metric between the embedding determined for a given query sample and the embedding determined for each prototype representation of the one or more prototype representations associated with the one or more classes.
[0159] Aspect 16: The processor-implemented method of any of Aspects 1 to 15, wherein the plurality of support samples are obtained for a single few-shot learning (FSL) episode and the task-agnostic open-set prototype representation is an episode-agnostic open-set prototype representation.

[0160] Aspect 17: An apparatus for processing one or more data samples, comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: determine one or more prototype representations based on a plurality of support samples associated with one or more classes of data samples, wherein each prototype representation is associated with one of the one or more classes; determine a task-agnostic open-set prototype representation, wherein the one or more prototype representations and the task-agnostic open-set prototype representation are determined in a same learned metric space; determine one or more distance metrics for each query sample of one or more query samples, wherein the one or more distance metrics are determined based on the one or more prototype representations and the task-agnostic open-set prototype representation; and classify each query sample based on the one or more distance metrics, wherein each query sample is classified into one of the one or more classes associated with the one or more prototype representations or an open-set class associated with the task-agnostic open-set prototype representation.

[0161] Aspect 18: The apparatus of Aspect 17, wherein to determine the one or more distance metrics for each query sample, the at least one processor is further configured to: determine a Euclidean distance metric between a given query sample and each prototype representation of the one or more prototype representations associated with the one or more classes; and determine a Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation.
[0162] Aspect 19: The apparatus of Aspect 18, wherein the at least one processor is further configured to: scale the Euclidean distance metric between the given query sample and the task-agnostic open-set prototype representation using one or more learned scaling factors.

[0163] Aspect 20: The apparatus of Aspect 19, wherein the at least one processor is further configured to: scale the Euclidean distance metric between the given query sample and each prototype representation of the one or more prototype representations using the one or more learned scaling factors.

[0164] Aspect 21: The apparatus of any of Aspects 19 to 20, wherein: the one or more learned scaling factors are determined as a first scalar value and a second scalar value; and the first scalar value and the second scalar value are learned based on a loss function that enforces the task-agnostic open-set prototype representation as a task-agnostic global second-best classification for each query sample of the one or more query samples.

[0165] Aspect 22: The apparatus of Aspect 21, wherein: the one or more learned scaling factors are task-agnostic scaling factors; and the task-agnostic open-set prototype representation is a global second-best classification for a plurality of few-shot open-set recognition (FSOSR) episodes performed over the data samples.
[0166] Aspect 23: The apparatus of any of Aspects 17 to 22, wherein to classify each query sample based on the one or more distance metrics, the at least one processor is configured to: determine a probability distribution over the one or more classes and the open-set class, wherein the probability distribution is determined based at least in part on a Euclidean distance metric determined between each query sample and a respective prototype associated with each class of the one or more classes and a Euclidean distance metric determined between each query sample and the task-agnostic open-set prototype representation; and classify, based on the probability distribution, each query sample into one of the one or more classes or into the open-set class.

[0167] Aspect 24: The apparatus of Aspect 23, wherein the at least one processor is further configured to: perform open-set rejection (OSR) based on a set of classified query samples classified into the open-set class associated with the task-agnostic open-set prototype representation.

[0168] Aspect 25: The apparatus of any of Aspects 23 to 24, wherein to classify each query sample based on the one or more distance metrics, the at least one processor is further configured to: determine a probability that each query sample is included in the open-set class associated with the task-agnostic open-set prototype representation, based on the probability distribution; and compare the determined probability to a pre-determined threshold.
[0169] Aspect 26: The apparatus of Aspect 25, wherein the at least one processor is further configured to: classify a given query sample as being included in the open-set class based on a determination that the probability the given query sample is included in the open-set class is greater than the pre-determined threshold; and classify the given query sample as being included in a closed-set class based on a determination that the probability the given query sample is included in the open-set class is not greater than the pre-determined threshold.

[0170] Aspect 27: The apparatus of Aspect 26, wherein the at least one processor is further configured to classify each query sample classified as being included in the closed-set class into a respective class of the one or more classes, wherein each query sample is classified based on maximizing a respective probability determined between each query sample and each respective class of the one or more classes.

[0171] Aspect 28: The apparatus of Aspect 27, wherein the probability is an argmax probability.

[0172] Aspect 29: The apparatus of any of Aspects 17 to 28, wherein to classify each query sample, the at least one processor is configured to: provide each query sample to a trained few-shot open-set recognition (FSOSR) neural network classifier, wherein the trained FSOSR neural network classifier includes at least the task-agnostic open-set prototype representation and one or more distance scaling factors as learnable components.

[0173] Aspect 30: The apparatus of Aspect 29, wherein the trained FSOSR neural network classifier further includes one or more feature embedding networks as a learnable component.
[0174] Aspect 31: The apparatus of Aspect 30, wherein to determine the one or more distance metrics for each query sample, the at least one processor is configured to: determine an embedding for each query sample of the one or more query samples using the one or more feature embedding networks; determine the one or more prototype representations as an average embedding of a set of embeddings determined for the plurality of support samples associated with each class of the one or more classes; and determine each distance metric of the one or more distance metrics based on determining a Euclidean distance metric between the embedding determined for a given query sample and the embedding determined for each prototype representation of the one or more prototype representations associated with the one or more classes.

[0175] Aspect 32: The apparatus of any of Aspects 17 to 31, wherein the plurality of support samples are obtained for a single few-shot learning (FSL) episode and the task-agnostic open-set prototype representation is an episode-agnostic open-set prototype representation.

[0176] Aspect 33: A computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations according to any of Aspects 1 to 32.

[0177] Aspect 34: An apparatus for processing one or more data samples, comprising one or more means for performing operations according to any of Aspects 1 to 32.