


Title:
TRANSFER LEARNING FOR RADIO FREQUENCY FILTER TUNING
Document Type and Number:
WIPO Patent Application WO/2023/151953
Kind Code:
A1
Abstract:
A computer-implemented method performed by a device (201, 1000) configured with a reinforcement learning, RL, agent (203) for radio frequency, RF, filter tuning is provided. The method includes obtaining (911) a scattering parameter, S-parameter, reading for an RF filter. The method further includes generating (913) a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading, wherein the RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The method further includes signaling (915) the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

Inventors:
NIMARA DOUMITROU DANIIL (SE)
MALEK MOHAMMADI MOHAMMADREZA (SE)
HUANG VINCENT (SE)
WEI JIEQIANG (SE)
SKIEBE MARTIN (DE)
Application Number:
PCT/EP2023/051805
Publication Date:
August 17, 2023
Filing Date:
January 25, 2023
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
G06N3/045; G06N3/092; G06N3/096; H01J23/20
Domestic Patent References:
WO2020242367A12020-12-03
Other References:
SIMON LINDSTÅHL: "Reinforcement Learning with Imitation for Cavity Filter Tuning: Solving problems by throwing DIRT at them", June 2019 (2019-06-01), XP055764605, Retrieved from the Internet [retrieved on 20210113]
LINDSTAHL, S.: "Reinforcement Learning with Imitation for Cavity Filter Tuning", SOLVING PROBLEMS BY THROWING DIRT AT THEM (DISSERTATION), 26 January 2022 (2022-01-26), Retrieved from the Internet
HARSCHER, R. VAHLDIECKS. AMARI: "Automated filter tuning using generalized low-pass prototype networks and gradient-based parameter extraction", IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, vol. 49, no. 12, 2001, pages 2532 - 2538, XP011038510
Attorney, Agent or Firm:
ERICSSON (SE)
Claims:
CLAIMS:

1. A computer-implemented method performed by a device (201, 1000) configured with a reinforcement learning, RL, agent (203) for radio frequency, RF, filter tuning, the method comprising: obtaining (911) a scattering parameter, S-parameter, reading for an RF filter; generating (913), from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading, wherein the RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator; and signaling (915) the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

2. The method of Claim 1, wherein the source simulator is less accurate than the target simulator.

3. The method of any of Claims 1 to 2, wherein the source simulator comprises a circuit-level RF filter simulator and wherein the target simulator comprises a three-dimensional electromagnetic field simulator, 3D EM simulator, or an actual filter.

4. The method of any of Claims 1 to 3, further comprising: training (901) the RL agent in the source simulator to tune the first simulation of the RF filter based on execution of the value on the tuning mechanism from a plurality of tuning mechanisms of the first simulation of the RF filter.

5. The method of any of Claims 1 to 3, further comprising: accessing (903) a mapping comprising an association on a per tuning mechanism basis of a plurality of values of a tuning mechanism from the source simulator to a respective circuit parameter of a circuit model of the RF filter; augmenting (905) the RL agent of the source simulator with the mapping to obtain an augmented RL agent; and generating (907), from the augmented RL agent of the source simulator, a value for influencing the tuning mechanism of the first simulation of the RF filter based on the S-parameter reading.

6. The method of Claim 5, wherein the S-parameter reading is obtained from the circuit parameter.

7. The method of any of Claims 5 to 6, wherein the RL agent of the source simulator comprises a neural network, the mapping comprises a single-layer neural network, and the RL agent of the target simulator comprises a neural network.

8. The method of any of Claims 5 to 7, wherein the RL agent of the target simulator receives the transferred learning of the augmented RL agent, and the generating (913) the value for influencing the tuning mechanism of the RF filter from the RL agent of the target simulator comprises generating the value with a subnetwork of the augmented RL agent that omits use of a portion of the augmented RL agent that includes the mapping.

9. The method of Claim 8, wherein the subsequent training in the target simulator comprises training the RL agent of the target simulator based on using the subnetwork of the augmented RL agent as initialization to train in the target simulator.

10. The method of Claim 9, wherein the training based on using the subnetwork of the augmented RL agent as initialization to train in the target simulator further comprises: obtaining an S-parameter reading for the RF filter of the target simulator; generating, from the RL agent of the target simulator, a value for influencing the tuning mechanism for at least one tuning mechanism from a plurality of tuning mechanisms of a second simulation of the RF filter or an actual RF filter of the target simulator, the generating based on the S-parameter reading and a current position of each of the plurality of tuning mechanisms of the RF filter of the target simulator; receiving a reward value based on a new S-parameter reading for the RF filter of the target simulator, the new S-parameter reading corresponding to the value for tuning the tuning mechanism; and determining, based on the reward value, whether the RF filter is tuned.

11. The method of Claim 10, further comprising: when the determining whether the RF filter is tuned results in the RF filter of the target simulator is not tuned, repeating (909) the training the RL agent of the target simulator until the RF filter of the target simulator is tuned.

12. The method of any of Claims 10 to 11, wherein the reward value comprises a representation of a distance that a current configuration of the RF filter of the target simulator has to a tuned configuration.

13. The method of Claim 12, wherein the distance comprises a point-wise Euclidean distance between the current S-parameter value and an S-parameter value of the tuned configuration of the RF filter of the target simulator.

14. The method of any of Claims 1 to 13, wherein the RL agent of the source simulator comprises one of a model-free reinforcement learning agent, MFRL, and a model-based reinforcement learning agent, MBRL.

15. The method of any of Claims 1 to 14, wherein the tuning mechanism comprises a screw, and the value for influencing the tuning mechanism comprises a screw rotation value for changing a height of at least one screw of the RF filter.

16. A device (201, 1000) configured with a reinforcement learning, RL, agent (203) for RF filter tuning, the device configured to perform operations according to any of Claims 1 to 15.

17. A device (201, 1000) configured with a reinforcement learning, RL, agent (203) for RF filter tuning, the device comprising: processing circuitry (1003); memory (1005) coupled with the processing circuitry, wherein the memory includes instructions that when executed by the processing circuitry causes the device to perform operations comprising: obtain a scattering parameter, S-parameter, reading for an RF filter; generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading, wherein the RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator; and signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

18. The device of Claim 17, the operations further comprising any of the operations of Claims 2 to 15.

19. A computer program comprising computer code to be executed by a device (201, 1000) configured with a reinforcement learning, RL, agent (203) for RF filter tuning to perform operations comprising: obtain a scattering parameter, S-parameter, reading for an RF filter; generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading, wherein the RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator; and signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

20. The computer program of Claim 19, the operations further comprising any of the operations of Claims 2 to 15.

21. A computer program product comprising a non-transitory storage medium (1005) including program code to be executed by processing circuitry (1003) of a device (201, 1000) configured with a reinforcement learning, RL, agent (203) for RF filter tuning, whereby execution of the program code causes the device to perform operations comprising: obtain a scattering parameter, S-parameter, reading for an RF filter; generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading, wherein the RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator; and signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

22. The computer program product of Claim 21, the operations further comprising any of the operations of Claims 2 to 15.

Description:
TRANSFER LEARNING FOR RADIO FREQUENCY FILTER TUNING

TECHNICAL FIELD

[0001] The present disclosure relates generally to a device configured with a reinforcement learning (RL) agent having transferred learning for radio frequency (RF) filter tuning, and related methods and apparatuses.

BACKGROUND

[0002] Filters used in base stations for wireless communications may be demanding in terms of filter characteristics (e.g., the frequency response of a filter) necessitated to respond to challenging requirements such as very narrow bandwidths (e.g., less than 50 MHz) and high attenuation requirements (e.g., more than 80 dB) at frequencies close to the frequency range(s) of the passband(s).

[0003] The frequency response of a filter typically is described with the help of scattering parameters, or S-parameters. S-parameter traces of filters typically include poles, which each represent a frequency point in the passband of the filter at which the input signal is not reflected and therefore can pass the filter with the least attenuation; whereas zeros (also referred to as transmission zeros) in S-parameters refer to frequency points in the stopband, or rejection band, of a filter at which no energy is transmitted.

[0004] Generally, increasing the number of poles may allow achieving higher attenuation levels, while these attenuation levels may be further increased for some frequency points (or certain frequency ranges) by the introduction of zeros.

[0005] In order to reach, e.g., a very narrow bandwidth with high rejection ratio, a selected filter topology may need many poles and at least a couple of zeros (e.g., more than six poles and two zeros). For cavity filters, which may be used in base stations in a mobile communications system, the number of poles is directly translated into the number of physical resonators of the manufactured filter. As a resonator is electromagnetically connected for some frequencies to the next resonator, a path from the input to output is created, allowing energy to flow from the input to the output at the designed frequencies while some frequencies are rejected. When a pair of non-consecutive resonators are coupled, an alternative path for the energy is created. This alternative path is related to a zero in the rejection band.

[0006] In some cavity filters, each pole/resonator has a tunable structure (e.g., a screw, a rod, a knob, a peg, a bolt, a gear, etc.) which may be adjusted to endeavor to address inaccuracies in the manufacturing process, while each zero (due to consecutive or non-consecutive resonators) has another tunable structure to endeavor to control the desired coupling. The tuning of poles and zeros may be very demanding. Thus, in some approaches, tuning may be performed manually by a well-trained technician that manipulates the tunable structure and verifies the desired frequency response in a vector network analyzer (VNA).

[0007] Some approaches propose possible use of artificial intelligence (AI)/machine learning (ML) in a circuit-based simulator as a potential alternative to try to tune a filter.

SUMMARY

[0008] There currently exist certain challenges. Manual tuning of RF filters may be time consuming and expensive (e.g., thirty minutes to tune a cavity filter and costs associated with a person performing the tuning). While use of AI/ML in a circuit-based simulator may help reduce time and cost associated with manual tuning, such an approach with a circuit-based simulator may not be accurate enough with respect to the real cavity filter.

[0009] Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges.

[0010] In various embodiments, a computer-implemented method is provided that is performed by a device configured with a RL agent for radio frequency, RF, filter tuning. The method includes obtaining a scattering parameter, S-parameter, reading for an RF filter. The method further includes generating, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading. The RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The method further includes signaling the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

[0011] In some embodiments, the method further includes training the RL agent in the source simulator to tune the first simulation of the RF filter based on execution of the value on the tuning mechanism from a plurality of tuning mechanisms of the RF filter.

[0012] In some embodiments, the method further includes accessing a mapping comprising an association on a per tuning mechanism basis for a plurality of values of a tuning mechanism from the source simulator to a respective circuit parameter of a circuit model of the RF filter. The method further includes augmenting the RL agent of the source simulator with the mapping to obtain an augmented RL agent; and generating, from the augmented RL agent of the source simulator, a value for influencing the tuning mechanism of the first simulation of the RF filter based on the S-parameter reading.

[0013] In some embodiments, the method further includes, when determining whether the RF filter is tuned results in the RF filter of the target simulator not being tuned, repeating training the RL agent of the target simulator until the RF filter of the target simulator is tuned.

[0014] In various embodiments, a device configured with a RL agent for RF filter tuning is provided. The device includes processing circuitry; and memory coupled with the processing circuitry. The memory includes instructions that when executed by the processing circuitry causes the device to perform operations. The operations include obtain a scattering parameter, S-parameter, reading for an RF filter. The operations further include generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading. The RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The operations further include signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

[0015] In various embodiments, a computer program is provided that includes program code to be executed by a device configured with a RL agent for RF filter tuning to perform operations. The operations include obtain a scattering parameter, S-parameter, reading for an RF filter. The operations further include generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading. The RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The operations further include signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

[0016] In various embodiments, a computer program product including a non-transitory storage medium including program code to be executed by processing circuitry of a device is provided. Execution of the program code causes the device to perform operations. The operations include obtain a scattering parameter, S-parameter, reading for an RF filter. The operations further include generate, from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading. The RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The operations further include signal the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter.

BRIEF DESCRIPTION OF DRAWINGS

[0017] The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiments of inventive concepts. In the drawings:

[0018] Figure 1 is a block diagram illustrating a process used by a human to tune a cavity filter;

[0019] Figure 2 is a block diagram illustrating an example embodiment of a device configured with a RL agent for RF filter tuning in accordance with some embodiments of the present disclosure;

[0020] Figures 3A-3D are curves illustrating S-parameter readings in accordance with some embodiments of the present disclosure;

[0021] Figures 4A and 4B are plots illustrating RL agent performance (reward) of an RL agent trained from scratch (Figure 4A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 4B);

[0022] Figures 5A and 5B are plots illustrating RL reconstruction loss of an RL agent trained from scratch (Figure 5A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 5B);

[0023] Figures 6A and 6B are plots illustrating an entropy comparison of an RL agent trained from scratch (Figure 6A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 6B);

[0024] Figures 7A and 7B are plots illustrating RL agent gradient magnitude of an RL agent trained from scratch (Figure 7A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 7B);

[0025] Figures 8A-8F are curves illustrating iterative S-parameter readings of an RL agent trained only on a circuit-level model from a mapping in accordance with some embodiments of the present disclosure;

[0026] Figure 9 is a flowchart illustrating operations of a device in accordance with some embodiments of the present disclosure;

[0027] Figure 10 is a block diagram of a device in accordance with some embodiments of the present disclosure; and

[0028] Figure 11 is a block diagram of a communication system in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION

[0029] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.

[0030] The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.

[0031] The following explanation of potential problems with some approaches is a present realization as part of the present disclosure and is not to be construed as previously known by others.

[0032] Some approaches propose possible use of AI/ML in a circuit-based simulator as a potential alternative to manual tuning to try to reduce tuning time per filter including, e.g., Lindstahl, S., Reinforcement Learning with Imitation for Cavity Filter Tuning: Solving problems by throwing DIRT at them (Dissertation) (2019), http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254422 (accessed on 26 January 2022); and WO2020242367 discussing a RL agent that may be able to solve 6p2z (p: poles, z: zeroes) environments. However, in some situations there is a need to improve the accuracy of AI/ML-based tuning models even further.

[0033] Such approaches to filter tuning use a circuit-based simulator (e.g., filttune), which may not provide accurate enough models for real filter products (in other words, an actual filter), may be time demanding, and/or may not reduce tuning time sufficiently, as discussed further herein.

[0034] To try to get closer to real filter products and to endeavor to provide a RL agent that may work in a reasonable amount of time to tune real, manufactured filter products, a three-dimensional electromagnetic field simulator (3D EM simulator) may be considered (e.g., CST, where Maxwell's equations are solved over a 3D body of a filter to generate more accurate results than, e.g., a circuit-based simulator). However, training a RL agent to model and tune an actual filter with a 3D EM simulator may use a large amount of samples for training, may be time consuming, and may not be practical. For example, for every timestep of training, a simulation can be run with a complex 3D EM simulator, where each simulation can take several minutes. For instance, with a CST simulator (which may be used for frequency domain simulations of RF filters, e.g., an 8p4z cavity filter), after optimizing a grid and a number of points where Maxwell's equations are solved, 3 minutes may be needed to get the scattering parameters (S-parameters). An MFRL agent may need about 700,000 samples for training, which may make training with the CST simulator impractical (or even almost impossible) because about 4 years may be needed to complete such training.
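
As a rough, illustrative check of that estimate (a minimal sketch in Python; the 3-minute-per-simulation and 700,000-sample figures are the ones quoted above):

# Rough check of the training-time estimate above, using the figures quoted in the text.
minutes_per_simulation = 3
samples_needed = 700_000
total_minutes = samples_needed * minutes_per_simulation
total_years = total_minutes / (60 * 24 * 365)
print(round(total_years, 1))  # -> 4.0, i.e., about 4 years of pure simulation time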

[0035] While model-based reinforcement learning, MBRL, techniques may decrease sample complexity of model-free reinforcement learning, MFRL, by leveraging a world model to boost training efficiency, the decrease may not be sufficient.

[0036] Thus, existing approaches for training and using an RL agent to tune a RF filter with a simulator with sufficient accuracy, efficiency, and/or within a reasonable amount of time may be lacking.

[0037] Certain aspects of the disclosure and their embodiments may provide solutions to these or other challenges.

[0038] Figure 9 is a flowchart illustrating a computer-implemented method of a device configured with a RL agent (e.g., RL agent 203) for RF filter tuning according to some embodiments of the present disclosure. The device can be device 201, 1000 as discussed further herein. The method includes obtaining (911) a scattering parameter, S-parameter, reading for an RF filter. The method further includes generating (913), from the RL agent of a target simulator, a value for influencing a tuning mechanism of the RF filter based on the S-parameter reading. The RL agent of the target simulator has transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training of the RL agent in the target simulator. The method further includes (915) signaling the value to a controller for automatic execution of the value on the tuning mechanism of the RF filter. Various operations from the flow chart of Figure 9 may be optional with respect to some embodiments of devices and related methods. For example, as discussed further herein, operations of blocks 901-907 of Figure 9 may be optional.
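
For illustration only, the flow of blocks 911, 913, and 915 can be sketched as follows in Python; the vna, rl_agent, and controller objects and their method names are hypothetical interfaces rather than part of the disclosure:

def tune_filter(vna, rl_agent, controller, max_steps=50):
    """Illustrative tuning loop corresponding to blocks 911-915 of Figure 9."""
    for _ in range(max_steps):
        s_params = vna.read_s_parameters()           # block 911: obtain S-parameter reading
        if rl_agent.is_tuned(s_params):              # stop once the requirements are satisfied
            return True
        value = rl_agent.generate_value(s_params)    # block 913: value for the tuning mechanism
        controller.execute(value)                    # block 915: automatic execution on the RF filter
    return False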

[0039] The term "RF filter" is used in a non-limiting manner and, as explained herein, can refer to any type of RF filter for mobile communication applications including, without limitation, cavity filters. Further, the term "tuning mechanism" is used in a non-limiting manner and, as explained herein, can refer to any type of tunable structure on an RF filter (including, without limitation, a screw, a rod, a knob, a peg, a bolt, a gear, etc.) and/or a non-mechanical RF relevant property of a waveguide/filter material of the RF filter (including, without limitation, dielectric parameters, temperature, etc.) which may be modulated using non-mechanical electric parameters such as current, chemical, etc.

[0040] Potential advantages provided by certain embodiments of the present disclosure may include that, based on the inclusion of transfer learning, sample complexity in the target simulator (that is, in a target domain) may be decreased and, thus, training time may be decreased. For example, training in the source simulator (that is, in a source domain) may be orders of magnitude faster than in the target simulator and, thus, may not significantly impact net training time. The method, therefore, may be particularly efficient in, e.g., an application where there is such a training time discrepancy between target and source domains.

[0041] As discussed further herein, in some embodiments, the source simulator is less accurate than the target simulator. That is, the target simulator comprises an environment that is closer to the real, physical RF filter than the source simulator (e.g., the target simulator has a degree of representation of the physical environment that is closer to the real, physical RF filter). For example, in some embodiments, the source simulator comprises a circuit-level RF filter simulator and the target simulator comprises a 3D EM simulator or an actual filter. The circuit-level RF simulator (e.g., filttune) may not provide an accurate enough model for the real filter product; and the target simulator comprising the 3D EM simulator (e.g., CST where Maxwell's equations are solved over a 3D body of the filter) may be an environment that is more accurate than the source simulator. Without transfer learning as discussed herein, a single interaction with the target simulator may be orders of magnitudes more time demanding.

[0042] Further, in some embodiments, the method further includes training (operation 901 of Figure 9) the RL agent in the source simulator to tune the first simulation of the RF filter based on execution of the value on the tuning mechanism from a plurality of tuning mechanisms of the first simulation of the RF filter.

[0043] Further potential advantages provided by certain embodiments of the present disclosure may include that based on the inclusion of transfer learning from tuning that the RL agent learned in the less accurate source simulator to the RL agent in the more complex target simulator, sample complexity of the RL agent in the target simulator may be decreased; and, thus, training in the target simulator may be done within a reduced amount of time.

[0044] In an example embodiment, an MBRL agent of the source simulator is trained with a simple and light circuit-model simulator, where the MBRL agent can be trained quickly (e.g., a couple of tens of milliseconds) and with an acceptable success rate (e.g., > 90% successful tuning). Then transfer learning is used to initialize a new RL agent which is being trained with the target simulator (e.g., a CST simulator). As discussed further herein, experiments for this example embodiment illustrate sample complexity decreased by at least a factor of two.

[0045] In another example embodiment, a filttune RL agent from the source simulator is used as an initial network configuration for the CST simulator. This may help to initialize RL agent training in a region which exhibits more desirable training properties, which may lead to faster training (e.g., significantly faster training). Embodiments of the present disclosure, however, are not so limited and may include a broader spectrum of leveraging the already trained RL agent of the source simulator. For example, in some embodiments, the RL agent of the source simulator comprises a neural network, certain layers of the neural network can be frozen, and the neural network can be trained from a layer onward. Further, in some embodiments, certain layers can be substituted (e.g., the last layer for the actor network), as the final action interpretability slightly differs across the two simulators.
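
A minimal sketch of these options follows, assuming (purely for illustration) that the actor of the source-simulator agent is a torch.nn.Sequential stack ending in a linear layer; the actual agent architecture may differ:

import torch.nn as nn

def init_target_actor(source_actor: nn.Sequential, n_actions: int,
                      freeze_early_layers: bool = False) -> nn.Sequential:
    """Initialize the target-simulator actor from the source-simulator (e.g., filttune) actor."""
    layers = list(source_actor.children())
    if freeze_early_layers:
        for layer in layers[:-1]:                   # optionally freeze all but the last layer
            for p in layer.parameters():
                p.requires_grad = False
    in_features = layers[-1].in_features            # substitute the last layer, since the final
    layers[-1] = nn.Linear(in_features, n_actions)  # action interpretability differs across simulators
    return nn.Sequential(*layers)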

[0046] Referring to Figure 9, the method of some embodiments of the present disclosure further includes the following.

[0047] A RL agent can be trained on a first simulator (e.g., a circuit-level simulator); and the trained RL agent in the first simulator can propose changes to positions of tunable structures on the RF filter. In some embodiments, training (901) the RL agent in the source simulator to tune the first simulation of the RF filter is based on execution of the value on the tuning mechanism from a plurality of tuning mechanisms of the first simulation of the RF filter.

[0048] A network configuration of the trained RL agent exceeding a performance threshold (e.g., training performance of > 90%) in the first simulator can be stored.

[0049] A mapping between a circuit model and 3D simulator can be obtained. In an example embodiment, the mapping includes a map of positions of tunable structures of the RF filter (e.g., screw rotations) to circuit parameters of a circuit model. The circuit parameters can lead to circuit-level simulations which behave very similarly to a 3D simulator such as CST. For example, in some embodiments, the method further includes accessing (903) a mapping comprising an association on a per tuning mechanism basis of a plurality of values of a tuning mechanism from the source simulator to a respective circuit parameter of a circuit model of the RF filter. In some embodiments, the mapping is described by a single-layer neural network.
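
A minimal sketch of such a single-layer mapping network is given below; the class name, dimensions, and parameter names are illustrative assumptions rather than the actual mapping of the disclosure:

import torch
import torch.nn as nn

class TuningToCircuitMapping(nn.Module):
    """Single-layer network mapping tuning-mechanism values (e.g., screw heights)
    to circuit parameters (e.g., resonance frequencies, coupling bandwidths)."""
    def __init__(self, n_tuning_mechanisms: int, n_circuit_params: int):
        super().__init__()
        self.linear = nn.Linear(n_tuning_mechanisms, n_circuit_params)  # the single layer

    def forward(self, tuning_values: torch.Tensor) -> torch.Tensor:
        return self.linear(tuning_values)  # predicted circuit parameters for the circuit model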

[0050] The trained RL agent (which may use/comprise a neural network) can be combined with the mapping, and the combined RL agent can be trained. The combined RL agent can be used to tune RF filters in a circuit-based simulator that is close to a 3D simulator based on proposed modifications to underlying circuit parameters of the RF filter. For example, the method further includes augmenting (905) the RL agent of the source simulator with the mapping to obtain an augmented RL agent; and generating (907), from the augmented RL agent of the source simulator, a value for influencing the tuning mechanism of the first simulation of the RF filter based on the S-parameter reading. In some embodiments, the S-parameter reading is obtained from the circuit parameter. As discussed further herein, a potential advantage of the mapping may be that the combined RL agent needs fewer training samples (e.g., at least 2 times fewer), which may decrease overall training time. Reduction in training time may also provide faster hyperparameter tuning because hyperparameter space may be searched faster. Such a potential advantage may be important when the method includes intricate RF filter environments. Reduction in training time also may allow consideration of more complex filters whose auto-tuning may not be possible with current approaches at least due to time constraints.

[0051] In some embodiments, the RL agent of the source simulator comprises a neural network, the mapping comprises a single-layer neural network, and the RL agent of the target simulator comprises a neural network.

[0052] The learning of the combined RL agent can be transferred to a more accurate target simulator (e.g., a 3D simulator) where the simulation time can be, e.g., a couple of minutes by leveraging the knowledge accumulated during the previous training.

[0053] A subnetwork of the combined RL agent can be used as initialization to train on the target simulator. For example, in some embodiments, the RL agent of the target simulator receives the transferred learning of the augmented RL agent, and the generating (913) the value for influencing the tuning mechanism of the RF filter from the RL agent of the target simulator comprises generating the value with a subnetwork of the augmented RL agent that omits use of a portion of the augmented RL agent that includes the mapping. Thus, the combined RL agent, which was trained on a circuit-based simulator that is close to the 3D simulator, can utilize the subnetwork that outputs values for the tuning mechanism(s), by utilizing all but the last layer (which mapped values of tuning mechanisms to circuit parameters).
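
Reusing the augmented RL agent while omitting the mapping could be sketched as follows, again assuming (for illustration) an nn.Sequential composition in which the mapping is the final layer:

import torch.nn as nn

def extract_subnetwork(augmented_actor: nn.Sequential) -> nn.Sequential:
    """Drop the final tuning-to-circuit mapping layer so the subnetwork again outputs
    values for the tuning mechanisms; used as initialization in the target simulator."""
    layers = list(augmented_actor.children())[:-1]
    return nn.Sequential(*layers)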

[0054] The RL agent of the target simulator can be trained in subsequent training, as discussed further herein.

[0055] Further potential advantages of certain embodiments of the present disclosure may include that the method may bridge the gap between the source and target simulations by introducing an intermediate environment of a circuit model that leverages the mapping to train the combined RL agent in a setting which is closer to the target simulation (e.g., 3D simulation). Thus, the method may provide training an RL agent on the target simulator with substantially fewer samples. As a consequence, the method may be more efficient, flexible, and can act as a blueprint for training on even more accurate simulators (e.g., future simulators). Additionally, the method can be applied to different scenarios including, without limitation, for future, even more accurate RF filter simulators. For example, only a subpart of the RL agent trained in the source simulator (e.g., a lighter environment) can be used, or a subpart of the RL agent in the target environment (e.g., a more complex environment) can be trained while keeping the remainder of the RL agent fixed.

[0056] Furthermore, the method may provide more stable training. As discussed further herein with reference to experiments, several metrics may exhibit better behavior when employing transfer learning, rather than training from scratch. This may be attributed to the RL agent of the target simulator being initialized in a better defined, more desirable subregion of the learnable parameter space.

[0057] Additionally, the method may provide domain knowledge application. The mapping between the source and the target simulators may use domain knowledge. For example, such a mapping may incorporate certain physical properties of the RF filter, such as physical limitations of tunable structures and sensitivity analysis.

[0058] RF filters of the present disclosure include, without limitation, microwave cavity filters. Band-pass filters may be used in wireless communication systems to meet sharp and demanding requirements of commercial bands. Presently, cavity filters may be dominantly used due to low cost for mass production and high Q-factor per resonator (e.g., especially for frequencies below 1 GHz). This type of filter may provide high-Q resonators that can be used to implement sharp filters with fast transitions between pass and stop bands and high selectivity. Moreover, they can cope with high-power input signals.

[0059] Cavity filters can be applicable from as low as 50 MHz up to several GHz. This versatility in frequency range, as well as high selectivity, may make them a popular choice in many applications, such as in base stations in a communications system (e.g., a radio access network (RAN) node such as an eNodeB (eNB) and/or a gNodeB (gNB) in a mobile communications system).

[0060] A drawback of this type of narrow-band filter may be that, since it requires a very sharp frequency response, a small tolerance in the fabrication process may impact the final performance. An approach to avoid an expensive fabrication process is based on post-production tuning. In some approaches, post-production tuning uses highly trained human operators to manually tune the filter. Figure 1 is a block diagram illustrating a process used by a human 101 to tune a cavity filter 107. As illustrated, the tuning process may include turning a set of screws on the cavity filter 107 and, with the aid of a VNA 105, comparing how close a current filter frequency response (S-parameters 103, also referred to herein as an S-curve or S-parameter reading) is to a desired filter frequency response. This process is repeated until the measurement in the VNA 105 and the designed filter mask are close enough. Potential challenges with this approach may include cost (e.g., for the human), time (e.g., it may take up to 30 minutes to tune a single filter), and/or lack of automation.

[0061] Some approaches have tried to automate a tuning process. See, e.g., Harscher, R. Vahldieck, and S. Amari, "Automated filter tuning using generalized low-pass prototype networks and gradient-based parameter extraction," IEEE Transactions on Microwave Theory and Techniques, vol. 49, no. 12, pp. 2532-2538, 2001. doi: 10.1109/22.971646 ("Harscher"). Harscher discusses breaking the task into first finding underlying model parameters which generate a current S-curve and then performing sensitivity analysis to adjust them so that they end up at the nominal (e.g., ideal) values of a perfectly tuned filter. In another approach, as discussed herein, AI has been proposed. Such approaches, however, may lack the ability to handle more complicated filters with more sophisticated topologies, may need a large amount of training samples and time to achieve desired performance, and may not be efficient and/or practical.

[0062] Certain embodiments of the present disclosure may provide solutions to these or other challenges. The method of the present disclosure can include an end-to-end process for the tuning of real, physical RF filters. The RL agent having transferred learning can generate values for influencing a tuning mechanism(s) of an RF filter in simulation, and signal the value(s) to a controller for automatic execution of the value on the tuning mechanism of the RF filter (e.g., by a robot which also can have direct access to S-parameter readings from a VNA). Actions can lie within [-1,1] and correspond to altering the tuning mechanism (e.g., altering the position of a tunable structure(s) (e.g., altering a height of a screw(s) by a specified amount (e.g., in millimeters)).

[0063] Figure 2 is a block diagram illustrating an example embodiment of a device 201 configured with a RL agent 203 for RF filter tuning in an end-to-end process in accordance with some embodiments of the present disclosure. RL agent 203 of a target simulator is trained by interacting either with a simulator or directly with a real filter (as illustrated in Figure 2). In the latter case, a robot 211 can automatically execute a value on a tuning mechanism of the RF filter (e.g., turn 213 physical screws on the real filter). A goal of RL agent 203 can include devising a sequence of actions that may lead to a tuned configuration as fast as possible.

[0064] Training of RL agent 203 is described as follows. RL agent 203 obtains an S-parameter observation o and generates action a, evolving the system and yielding the corresponding reward r and next observation o'. The tuple (o, a, r, o') can be stored internally, as it can be later used for training. RL agent 203 checks 205, 207 if it should train its world model and/or actor-critic networks (e.g., perform gradient updates every 10 steps). If not, RL agent 203 simulates values 209 (e.g., simulates values for screw rotations) and returns to obtaining an S-parameter observation o, generating action a, and evolving the system, yielding the corresponding reward r and next observation o'.
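
The interaction and training loop described above could be sketched as follows; env, agent, and replay_buffer are hypothetical stand-ins for the target simulator (or the real filter with robot 211 and VNA 105), RL agent 203, and its internal experience storage:

def training_loop(env, agent, replay_buffer, total_steps=100_000, update_every=10):
    """Illustrative loop: obtain o, generate a, receive r and o', store (o, a, r, o'),
    and periodically update the world model and actor-critic networks."""
    obs = env.reset()
    for step in range(total_steps):
        action = agent.act(obs)                            # generate action a from observation o
        next_obs, reward, done = env.step(action)          # evolve the system, yielding r and o'
        replay_buffer.add((obs, action, reward, next_obs))
        if step % update_every == 0:                       # e.g., gradient updates every 10 steps
            agent.train_world_model(replay_buffer)
            agent.train_actor_critic(replay_buffer)
        obs = env.reset() if done else next_obs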

[0065] A goal of RL agent 203 can be quantified via the reward r, which can depict the distance that the current configuration has to a tuned one. Thus, in some embodiments, the reward value comprises a representation of a distance that a current configuration of the RF filter of the target simulator has to a tuned configuration. In some embodiments, the distance comprises a point-wise Euclidean distance between the current S-parameter values and the desired ones (that is, an S-parameter value of the tuned configuration of the RF filter of the target simulator), across the examined frequency range (as illustrated in Figures 3A-3D). If a tuned configuration is reached, the RL agent 203 receives a fixed r_tuned reward value (e.g., +100).
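
A minimal sketch of such a reward follows, assuming a negative point-wise Euclidean distance with a fixed bonus once the filter is considered tuned; the sign convention, the tolerance, and the +100 value are illustrative assumptions:

import numpy as np

def reward(current_s: np.ndarray, tuned_s: np.ndarray,
           r_tuned: float = 100.0, tol: float = 1.0) -> float:
    """Reward reflecting how far the current S-parameter curve is from the tuned one."""
    distance = np.linalg.norm(current_s - tuned_s)   # point-wise Euclidean distance over frequency
    if distance < tol:                               # configuration considered tuned
        return r_tuned                               # fixed r_tuned bonus (e.g., +100)
    return -distance                                 # closer configurations receive higher reward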

[0066] Figures 3A-3D are graphs illustrating S-parameter readings from a VNA within a training loop in accordance with some embodiments of the present disclosure. S-parameter curves 301 (in dB) for an example embodiment of Figure 2 are shown throughout each graph. Requirements are indicated by the horizontal bars in each graph. For example, the curve 301 must lie above bar 303 in the pass band and below the four other horizontal bars 305, 307, 309, 311 indicating the stop band. The curve 313 (dotted line) and the curve 315 (dashed line) must lie below the bar 317 in the passband. As illustrated in Figure 3C, the filter satisfies the requirements after two time steps.

[0067] RL agent 203 can interact with the RF filter by changing a set of tuning mechanisms (e.g., tunable parameters via the tunable structures (e.g., screws)) of the RF filter. Thus, observations are mapped to rewards which in turn are mapped (by the RL agent 203) to adjustments of values for influencing a tuning mechanism(s) (e.g., screw rotations), which lead to automatic execution of a value on the tuning mechanism of the RF filter (e.g., modifications via robot 211, such as turn screws on filter 213).

[0068] After training, at inference, RL agent 203 is employed as illustrated in Figure 2. RL agent 203 obtains the S-observation(s) provided from the VNA 105, translates them into the corresponding value for influencing a tuning mechanism of the RF filter (e.g., adjustments to tunable structures such as screw rotations), and signals the value to a controller for automatic execution (e.g., to robot 211). The automatic controller (e.g., robot 211) then executes 213 the value on the tuning mechanism of the RF filter (e.g., an adjustment(s) to tunable structures of the RF filter signaled by the RL agent 203). This process continues until a tuned configuration is reached.

[0069] As referenced herein, the method of various embodiments of the present disclosure uses transfer learning for RL, which may significantly decrease sample complexity in the target simulator. Training in the source simulator may be orders of magnitude faster than that in the target simulator and, thus, may not significantly impact net training time. The method may be particularly efficient for RF filter tuning, such as cavity filter tuning, where there may be a training time discrepancy between the target and source simulators.

[0070] Transfer learning is a ML technique in which knowledge acquired from a source domain is utilized in a target domain. For mobile communication applications for example, the source domain may be a light and less accurate source simulator, such as the filttune circuit-level RF filter simulator; and the target domain may be a more accurate and complex target simulator, such as a CST 3D simulator. Transfer learning may be beneficial when there is some connection between source and target domains. In a first example embodiment, some of the learned representations from filttune can be used for CST, e.g., by fixing some of the early layers of the RL agent (e.g., a neural network or world model) in the target simulator which extracts such a representation of the data. Alternatively, in a second example embodiment, the RL agent of the target simulator can be augmented by adding a few extra layers at the end which may help process the differences between the two domains. In a further example embodiment, an approach that is between the first and second example embodiments may be used where no layers of the RL agent are fixed and supplementary layers are not added. Instead, in this further example embodiment, the RL agent of the source simulator (e.g., a filttune agent) can be used as an initialization for the RL agent in the target simulator (e.g., a CST 3D simulator).

[0071] RL is a learning method concerned with how a RL agent can take actions in an environment so as to maximize a numerical reward signal (also referred to herein as a reward value). In some embodiments of the present disclosure, the environment is an RF filter for mobile communication applications, for example a cavity filter. A cavity filter can have various topologies, e.g., such as a type of cavity filter with 6 poles and 2 zeros. In some embodiments, the RL agent of the source and target simulators is an algorithm which generates values for influencing tuning mechanisms (e.g., values for adjusting positions (e.g., heights) of tunable structures (e.g., screws, rods, knobs, pegs, bolts, gears, etc.)) of the RF filter.

[0072] To train and use the RL agent to tune the RF filter, in some embodiments, the environment can be treated as a black box in which a MFRL agent may be used; or the environment can be modeled by using a MBRL agent. Given sufficient samples (which may be referred to as "asymptotic performance"), MFRL may tend to exhibit better performance than MBRL, as errors induced by the RL agent of the target simulator may get propagated to the decision making of that RL agent (e.g., model errors may act as a bottleneck in performance).

[0073] On the other hand, a RL agent (e.g., a MBRL agent) may leverage a world model of the RL agent, which may boost training efficiency and may lead to faster training. For example, the RL agent of the target simulator can use the learned environment model to simulate a sequence of actions and observations, which in turn can give it a better understanding of the consequences of its actions. When designing an RL agent, a balance may need to be found between training speed and asymptotic performance. Achieving both may need careful modelling. A potential advantage of certain embodiments of the present disclosure may be that based on the RL agent of the target simulator having transferred learning from a RL agent trained in a source simulator to tune a first simulation of the RF filter and subsequent training in the target simulator, a balance may be found between training speed and asymptotic performance.

[0074] In some embodiments, subsequent training of the RL agent in the target simulator includes training the RL agent of the target simulator based on using the subnetwork of the augmented RL agent as initialization to train in the target simulator.

[0075] In some embodiments, the training based on using the subnetwork of the augmented RL agent as initialization to train in the target simulator further includes obtaining an S-parameter reading for the RF filter of the target simulator. The method further includes generating, from the RL agent of the target simulator, a value for influencing a tuning mechanism for at least one tuning mechanism from a plurality of tuning mechanisms of the second simulation of the RF filter or an actual RF filter of the target simulator. The generating is based on the S-parameter reading and a current value of each of the plurality of tuning mechanisms of the RF filter of the target simulator. The method further includes receiving a reward value based on a new S-parameter reading for the RF filter of the target simulator. The new S-parameter reading corresponds to the value for tuning the tuning mechanism. The method further includes determining, based on the reward value, whether the RF filter is tuned.

[0076] In some embodiments, the method further includes, when the determining whether the RF filter is tuned results in the RF filter of the target simulator not being tuned, repeating (909) the training the RL agent of the target simulator until the RF filter of the target simulator is tuned.

[0077] While embodiments discussed herein are explained in the non-limiting context of a RL agent comprising a Dreamer/modified Dreamer-based architecture, the invention is not so limited and includes any RL agent configured to perform operations according to embodiments disclosed herein. The actor of the Dreamer chooses the actions performed by the RL agent, and bases its decisions purely on a lower dimensional latent space. The Dreamer leverages a world model to imagine trajectories, without requiring the generation of actual observations. Thus, it may be beneficial to plan in a lower dimensional, information rich, latent space.

[0078] The Dreamer includes an Actor-Critic network pair and a world model. The world model is fit onto a sequence of observations, so that it can reconstruct an original observation from the latent space and predict the corresponding reward. The Actor and Critic receive as an input the latent representation of the observations. The Critic aims to predict the value of a state (e.g., how close is the RF circuit to a tuned configuration), while the Actor aims to find the action which would lead to a configuration exhibiting a higher value (e.g., more tuned). The Actor obtains more precise value estimates of its output by leveraging the world model to examine the consequences of its actions multiple steps ahead.
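
For illustration, the main components could be organized as sketched below; the layer sizes, activations, and the omission of the recurrent latent dynamics are simplifying assumptions and not the actual Dreamer implementation:

import torch.nn as nn

class WorldModel(nn.Module):
    """Encodes observations into a latent space, reconstructs them, and predicts rewards."""
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(), nn.Linear(256, obs_dim))
        self.reward_head = nn.Linear(latent_dim, 1)   # predicts the reward from the latent state

class ActorCritic(nn.Module):
    """Actor proposes actions in [-1, 1]; Critic estimates the value of a latent state."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                   nn.Linear(256, action_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(), nn.Linear(256, 1))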

[0079] Training of the Dreamer/modified Dreamer may include initializing an experience buffer with a random RL agent. The random RL agent trains a world model on a testing sample. The world model reconstructs an original observation from the latent space and predicts the corresponding reward. The Actor and Critic receive as an input the latent representation of the observations. The experiences from interacting with the environment are added to the experience buffer. This process is repeated until the RL agent performs at the desired level.

[0080] World model training may include observations fed through an encoder. The world model may be trained so as to simultaneously maximize the likelihood of generating the correct environment rewards r and maintain an accurate reconstruction of the original observation via a decoder. Actions are denoted as a_i.

[0081] Thus, an RL agent having a Dreamer-based architecture learns from interacting with the environment to train its value, action, reward, transition, representation and observation models.

[0082] Two experiments were conducted to test robustness of the method of various embodiments. The experiments were performed for an MBRL scenario using a modified Dreamer-based architecture seeking to leverage the MBRL agent trained using a circuit simulator (Filttune) to enhance training on a 3D simulator (CST).

[0083] One experiment illustrated that directly applying transfer learning from filttune to CST may decrease sample complexity by a factor of 2. The other experiment illustrated how the mapping may generate a circuit-based simulator that is similar to a target CST model. This set of experiments illustrates the efficacy of transfer learning and the mapping for cavity filter tuning.

[0084] The experiments, thus, illustrate a potential advantage of the method, including certain details about improvement in training induced by transfer learning. The findings of the experiments are illustrated in Figures 4A, 4B, 5A, 5B, 6A, 6B, 7A, 7B, and 8A-8F.

[0085] Figures 4A and 4B are plots illustrating RL agent performance (reward) of an RL agent trained from scratch (Figure 4A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 4B). In Figures 4A and 4B, a positive reward value is indicative of a tuned filter. When training from scratch, as illustrated in Figure 4A, the MBRL agent first started tuning filters after about 2.4 thousand steps. In contrast, using transfer learning, this number decreases to around 1.2 thousand steps as illustrated in Figure 4B. Figures 4A and 4B illustrate that in this experiment, the biggest improvement was seen in the early stages of training, where performance fluctuated without improving. When training from scratch, this spanned about one thousand steps (roughly 300-1,300). When leveraging transfer learning, this was reduced to around 200 steps (300-500). The reduction of sample complexity by a factor of 2 is evident throughout the plots shown in this section.

[0086] Figures 5A and 5B are plots illustrating RL reconstruction loss (negative log likelihood) of an RL agent trained from scratch (Figure 5A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 5B). The reconstruction loss shows a similar behavior. Even though the reconstruction loss curves themselves follow a similar curve qualitatively, quantitatively there is a difference, as illustrated in Figures 5A and 5B. When utilizing transfer learning, the RL agent was capable of reaching a similar loss in about half the number of steps and, thus, learned faster by a factor of 2.

[0087] Figures 6A and 6B are plots illustrating an entropy comparison of an RL agent trained from scratch (Figure 6A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 6B). Entropy is a measure of exploration. High entropy may be indicative of a cautious, underperforming RL agent which has not acquired the empirical knowledge to be confident and efficient with its actions. Transfer learning may allow for a faster transition towards confident predictions (lower entropies). Thus, the entropy of the RL agent may be of particular importance, as it may quantify the certainty of the RL agent. An RL agent with lower entropy may be less random and more confident about its predictions. RL agents with higher confidence may exhibit higher performance. As illustrated in Figure 6A, without transfer learning, the RL agent became confident (decreases entropy) at a significantly lower pace. As illustrated in Figure 6B, with transfer learning, the RL agent was initially overconfident from the source domain (see the initial increase of entropy) as it got adjusted to the new target domain. Nevertheless, the RL agent exhibited lower entropy. For example, the RL agent training from scratch only reached negative entropy after around 1.6 thousand steps, unlike the RL agent using transfer learning, which maintained a negative entropy throughout training.

[0088] Figures 7A and 7B are plots illustrating RL agent actor network gradient magnitude of an RL agent trained from scratch (Figure 7A) and an RL agent having transferred learning in accordance with some embodiments of the present disclosure (Figure 7B). Gradient magnitude may be of particular importance because it may quantify the rate of change that happens during training. High gradient updates may illustrate that the networks change vastly, hinting towards significant learning. Conversely, low magnitudes may illustrate that little learning is performed. Initializing the network via transfer learning may start the actor in a more desirable region of the learnable parameter space, allowing for faster and more decisive learning. As discussed herein, the actor is the network which chooses actions (e.g., which screws to turn). The magnitude of the gradient for the optimization algorithm that is used during training describes the scale of the changes in the network. Low magnitudes may indicate slow and stale training, indicating that the RL agent may not be initially located in a well-behaved region in its parameter space. As illustrated in Figure 7A, when training from scratch, the RL agent appeared to struggle initially, showing small gradients (on the order of 1e-3). On the other hand, as illustrated in Figure 7B, larger gradients were found when employing transfer learning. It is noted that even though they decreased fairly quickly (after around 400 steps), they never fell below the 1e1 order of magnitude.

[0089] Thus, the first example experiment illustrates that, for the example embodiment of the MBRL scenario of the experiment, a decrease in sample complexity on CST by a factor of 2 was achieved. For this particular simulator, this may translate to saving several months of training. Further improvement may be yielded by, e.g., freezing or adding certain layers on top of the filttune agent. While the experiment and its results are described with reference to the particular scenario of this experiment, the method of the present disclosure is not so limited. Rather, the method encompasses a variety of scenarios including, without limitation, an RL agent having a different architecture and a more accurate and/or more time demanding filter simulator. Further, transfer learning can be leveraged in similar scenarios that include progressively more accurate and time demanding simulators.
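As a minimal sketch of the weight-transfer and layer-freezing variant mentioned above, assuming a PyTorch-style actor network (the architecture and layer split are illustrative, not the agent of the experiments), transferred learning may be realized by initializing the target-simulator actor from the source-trained weights and fine-tuning only the later layers.

```python
import copy
import torch
import torch.nn as nn

def make_actor() -> nn.Sequential:
    # Illustrative actor architecture; the agent of the experiments may differ.
    return nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),   # early layers: generic feature extraction
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 8),               # output: one adjustment per tuning screw
    )

# Stand-in for the actor trained in the (cheaper) source simulator.
source_actor = make_actor()

# Transfer: initialize the target-simulator actor from the source-trained weights.
target_actor = make_actor()
target_actor.load_state_dict(copy.deepcopy(source_actor.state_dict()))

# Optionally freeze the early layers so only the later layers adapt to the target simulator.
for layer in list(target_actor.children())[:2]:
    for p in layer.parameters():
        p.requires_grad = False

# Continue training in the target simulator with only the unfrozen parameters.
optimizer = torch.optim.Adam(
    (p for p in target_actor.parameters() if p.requires_grad), lr=1e-4
)
```

Freezing the early layers keeps the source-learned feature extraction intact while the remaining layers adapt to the target simulator.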

[0090] The second experiment illustrated how mapping can generate a circuit-based simulator that is similar to a target CST model. While the discussion of the second experiment maps screw heights to circuit parameters, the method of certain embodiments of the present disclosure is not limited to screw heights. Tuning the RF filter includes, without limitation, adjustments to values for influencing a tuning mechanism(s) of the RF filter (e.g., a position of a rod, a knob, a peg, a bolt, a gear, etc.) and/or a non-mechanical RF-relevant property of a waveguide/filter material of the RF filter (e.g., dielectric parameters, temperature, etc.) which may be modulated using non-mechanical electric parameters such as current, chemical, etc.

[0091] To establish a mapping from screw heights to circuit parameters (e.g., resonance frequencies, coupling bandwidths, etc.), a superposition principle (SP) was used. In the context of applications of the method of the present disclosure, the SP indicates that, to model the effect of changing the height of several screws on the circuit parameters of an RF filter, the effect of each screw can be decoupled and the effects of all screws can be added linearly and independently. SP thus allows use of a sensitivity analysis for establishing the mapping. That is, the height of one screw of the cavity filter at a time was adjusted in the 3D simulator over a set of points, while the rest were kept at known heights, and for each point the circuit parameters representing the S-parameters in the 3D simulator were calculated. By repeating this process for each and every screw, a regression technique, which may be implemented by a neural network, was used to predict the circuit parameters for every given set of screw heights.
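The following is a minimal sketch of the superposition-principle mapping described above, under stated assumptions: one regressor per screw is fitted on single-screw sensitivity sweeps (all other screws held at a reference height), and the per-screw deviations from the reference circuit parameters are summed. The data here are synthetic placeholders and the linear regressor is illustrative; as noted above, a neural network may be used instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic sensitivity-sweep data per screw (placeholders for 3D-simulator output):
#   heights[i] : (n_points,)   heights sampled for screw i, others at the reference height
#   circuit[i] : (n_points, d) circuit parameters extracted for each sampled height
#   circuit_ref: (d,)          circuit parameters with every screw at its reference height
n_screws, d = 6, 4
rng = np.random.default_rng(0)
heights = [np.linspace(0.0, 5.0, 21) for _ in range(n_screws)]
circuit_ref = rng.normal(size=d)
circuit = [circuit_ref + np.outer(heights[i], rng.normal(size=d)) for i in range(n_screws)]

# One regressor per screw: screw height -> deviation of circuit parameters from the reference.
models = []
for i in range(n_screws):
    m = LinearRegression().fit(heights[i].reshape(-1, 1), circuit[i] - circuit_ref)
    models.append(m)

def predict_circuit_params(screw_heights: np.ndarray) -> np.ndarray:
    """Superposition: reference parameters plus the sum of per-screw deviations."""
    deltas = [models[i].predict([[h]])[0] for i, h in enumerate(screw_heights)]
    return circuit_ref + np.sum(deltas, axis=0)

print(predict_circuit_params(np.full(n_screws, 2.5)))
```

Because the SP decouples the screws, the number of 3D-simulator evaluations needed for the sensitivity analysis grows linearly with the number of screws rather than combinatorially.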

[0092] The second experiment illustrates similarity between the derived circuit-based model and the CST model of the experiment. The experiment was a zero-shot setup, where no further training was performed on CST. If the mapping were robust, the RL agent would be expected to perform well, leading to a tuned or almost tuned configuration. Figures 8A-8F illustrate robustness of the mapping of the experiment (that is, the derived circuit-based model from the mapping and CST are similar) based on the RL agent achieving an almost tuned configuration in CST.

[0093] The two experiments illustrate robustness and other potential advantages of the method. Transfer learning directly from the original circuit-based simulator, without the in-between mapping, was sufficient to decrease sample complexity by at least a factor of 2. Moreover, in the first experiment, even though performance improved, the network still initially struggled with tuning filters (see, e.g., reward values in Figure 4B). Thus, the mapping of some embodiments may help bridge such a gap, as the second experiment illustrates, where the RL agent was able to tune instances in CST without a single training step having been performed on CST.

[0094] Figure 10 is a block diagram illustrating elements of a device 1000 configured with an RL agent 203 for RF filter tuning. Device 1000 may be provided by, e.g., a device in the cloud running software on cloud compute hardware, or a software function/service governing or controlling the RF filter tuning running in the cloud. That is, the device may be implemented as part of a communications system (e.g., a device that is part of the communications system QQ100 as discussed below with respect to Figure 11), or on a device as a separate functionality/service hosted in the cloud. The device may also be provided as standalone software for tuning an RF filter running on computational systems such as servers or workstations, and the device may be in a deployment that may include virtual or cloud-based network functions (VNFs or CNFs) and even physical network functions (PNFs). The cloud may be public, private (e.g., on premises or hosted), or hybrid.

[0095] As shown, the device may include transceiver circuitry 1001 (e.g., RF transceiver circuitry) including a transmitter and a receiver configured to provide uplink and downlink radio communications with devices (e.g., a controller for automatic execution of a value on a tuning mechanism of an RF filter). The device may include network interface circuitry 1007 (also referred to as a network interface) configured to provide communications with other devices (e.g., a controller for automatic execution of a value on a tuning mechanism of an RF filter). The device may also include processing circuitry 1003 (also referred to as a processor) coupled to the transceiver circuitry, memory circuitry 1005 (also referred to as memory) coupled to the processing circuitry, and RL agent 203 coupled to the processing circuitry. The RL agent 203 and/or memory circuitry 1005 may include computer readable program code that, when executed by the processing circuitry 1003, causes the processing circuitry to perform operations according to embodiments disclosed herein. According to other embodiments, processing circuitry 1003 may be defined to include memory or RL agent 203 so that separate memory circuitry or a separate RL agent is not required.

[0096] As discussed herein, operations of the device may be performed by processing circuitry 1003, network interface 1007, and/or transceiver 1001. For example, processing circuitry 1003 may control RL agent 203 to perform operations according to embodiments disclosed herein. Processing circuitry 1003 also may control transceiver 1001 to transmit downlink communications through transceiver 1001 over a radio interface to one or more devices and/or to receive uplink communications through transceiver 1001 from one or more devices over a radio interface. Similarly, processing circuitry 1003 may control network interface 1007 to transmit communications through network interface 1007 to one or more devices and/or to receive communications through network interface 1007 from one or more devices. Moreover, modules may be stored in memory 1005, and these modules may provide instructions so that when instructions of a module are executed by processing circuitry 1003, processing circuitry 1003 performs respective operations (e.g., operations discussed below with respect to example embodiments relating to devices). According to some embodiments, device 1000 and/or an element(s)/function(s) thereof may be embodied as a virtual device/devices and/or a virtual machine/machines.

[0097] According to some other embodiments, a device may be implemented without a transceiver. In such embodiments, transmission to a wireless device may be initiated by the device 1000 so that transmission to the wireless device is provided through a device including a transceiver (e.g., through a base station). According to embodiments where the device includes a transceiver, initiating transmission may include transmitting through the transceiver.

[0098] Figure 11 shows an example of a communication system QQ100 in accordance with some embodiments.

[0099] In the example, the communication system QQ100 includes a telecommunication network QQ102 that includes an access network QQ104, such as a RAN, and a core network QQ106, which includes one or more core network nodes QQ108. The access network QQ104 includes one or more access network nodes, such as network nodes QQ110a and QQ110b (one or more of which may be generally referred to as network nodes QQ110), or any other similar 3rd Generation Partnership Project (3GPP) access node or non-3GPP access point. The network nodes QQ110 facilitate direct or indirect connection of a user equipment (UE), such as by connecting UEs QQ112a, QQ112b, QQ112c, and QQ112d (one or more of which may be generally referred to as UEs QQ112) to the core network QQ106 over one or more wireless connections.

[00100] Example wireless communications over a wireless connection include transmitting and/or receiving wireless signals using electromagnetic waves, radio waves, infrared waves, and/or other types of signals suitable for conveying information without the use of wires, cables, or other material conductors. Moreover, in different embodiments, the communication system QQ100 may include any number of wired or wireless networks, network nodes, UEs, and/or any other components or systems that may facilitate or participate in the communication of data and/or signals whether via wired or wireless connections. The communication system QQ100 may include and/or interface with any type of communication, telecommunication, data, cellular, radio network, and/or other similar type of system.

[00101] The UEs QQ112 may be any of a wide variety of communication devices, including wireless devices arranged, configured, and/or operable to communicate wirelessly with the network nodes QQ110 and other communication devices. Similarly, the network nodes QQ110 are arranged, capable, configured, and/or operable to communicate directly or indirectly with the UEs QQ112 and/or with other network nodes or equipment in the telecommunication network QQ102 to enable and/or provide network access, such as wireless network access, and/or to perform other functions, such as administration in the telecommunication network QQ102.

[00102] In the depicted example, the core network QQ106 connects the network nodes QQ110 to one or more hosts, such as host QQ116. These connections may be direct or indirect via one or more intermediary networks or devices. In other examples, network nodes may be directly coupled to hosts. The core network QQ106 includes one or more core network nodes (e.g., core network node QQ108) that are structured with hardware and software components. Features of these components may be substantially similar to those described with respect to the UEs, network nodes, and/or hosts, such that the descriptions thereof are generally applicable to the corresponding components of the core network node QQ108. Example core network nodes include functions of one or more of a Mobile Switching Center (MSC), Mobility Management Entity (MME), Home Subscriber Server (HSS), Access and Mobility Management Function (AMF), Session Management Function (SMF), Authentication Server Function (AUSF), Subscription Identifier De-concealing function (SIDF), Unified Data Management (UDM), Security Edge Protection Proxy (SEPP), Network Exposure Function (NEF), and/or a User Plane Function (UPF).

[00103] The host QQ116 may be under the ownership or control of a service provider other than an operator or provider of the access network QQ104 and/or the telecommunication network QQ102, and may be operated by the service provider or on behalf of the service provider. The host QQ116 may host a variety of applications to provide one or more services. Examples of such applications include live and pre-recorded audio/video content, data collection services such as retrieving and compiling data on various ambient conditions detected by a plurality of UEs, analytics functionality, social media, functions for controlling or otherwise interacting with remote devices, functions for an alarm and surveillance center, or any other such function performed by a server.

[00104] As a whole, the communication system QQ100 of Figure 11 enables connectivity between the UEs, network nodes, hosts, and devices. In that sense, the communication system may be configured to operate according to predefined rules or procedures, such as specific standards that include, but are not limited to: Global System for Mobile Communications (GSM); Universal Mobile Telecommunications System (UMTS); Long Term Evolution (LTE), and/or other suitable 2G, 3G, 4G, 5G standards, or any applicable future generation standard (e.g., 6G); wireless local area network (WLAN) standards, such as the Institute of Electrical and Electronics Engineers (IEEE) 802.11 standards (WiFi); and/or any other appropriate wireless communication standard, such as the Worldwide Interoperability for Microwave Access (WiMax), Bluetooth, Z-Wave, Near Field Communication (NFC), ZigBee, LiFi, and/or any low-power wide-area network (LPWAN) standards such as LoRa and Sigfox.

[00105] In some examples, the telecommunication network QQ102 is a cellular network that implements 3GPP standardized features. Accordingly, the telecommunications network QQ102 may support network slicing to provide different logical networks to different devices that are connected to the telecommunication network QQ102. For example, the telecommunications network QQ102 may provide Ultra Reliable Low Latency Communication (URLLC) services to some UEs, while providing Enhanced Mobile Broadband (eMBB) services to other UEs, and/or Massive Machine Type Communication (mMTC)/Massive IoT services to yet further UEs.

[00106] In some examples, the UEs QQ112 are configured to transmit and/or receive information without direct human interaction. For instance, a UE may be designed to transmit information to the access network QQ104 on a predetermined schedule, when triggered by an internal or external event, or in response to requests from the access network QQ104. Additionally, a UE may be configured for operating in single- or multi-RAT or multi-standard mode. For example, a UE may operate with any one or combination of WiFi, NR (New Radio) and LTE, i.e. being configured for multi-radio dual connectivity (MR-DC), such as E-UTRAN (Evolved-UMTS Terrestrial Radio Access Network) New Radio - Dual Connectivity (EN-DC).

[00107] In the example, the hub QQ114 communicates with the access network QQ104 to facilitate indirect communication between one or more UEs (e.g., UE QQ112c and/or QQ112d) and network nodes (e.g., network node QQ110b).

[00108] Although the devices described herein may include the illustrated combination of hardware components, other embodiments may comprise computing devices with different combinations of components. It is to be understood that these devices may comprise any suitable combination of hardware and/or software needed to perform the tasks, features, functions and methods disclosed herein. Determining, calculating, obtaining or similar operations described herein may be performed by processing circuitry, which may process information by, for example, converting the obtained information into other information, comparing the obtained information or converted information to information stored in the device, and/or performing one or more operations based on the obtained information or converted information, and as a result of said processing making a determination. Moreover, while components are depicted as single boxes located within a larger box, or nested within multiple boxes, in practice, devices may comprise multiple different physical components that make up a single illustrated component, and functionality may be partitioned between separate components. For example, a communication interface may be configured to include any of the components described herein, and/or the functionality of the components may be partitioned between the processing circuitry and the communication interface. In another example, non-computationally intensive functions of any of such components may be implemented in software or firmware and computationally intensive functions may be implemented in hardware.

[00109] In certain embodiments, some or all of the functionality described herein may be provided by processing circuitry executing instructions stored in memory, which in certain embodiments may be an RL agent and/or computer program product (e.g., including an RL agent) in the form of a non-transitory computer-readable storage medium. In alternative embodiments, some or all of the functionality may be provided by the processing circuitry without executing instructions stored on a separate or discrete device-readable storage medium, such as in a hard-wired manner. In any of those particular embodiments, whether executing instructions stored on a non-transitory computer-readable storage medium or not, the processing circuitry can be configured to perform the described functionality. The benefits provided by such functionality are not limited to the processing circuitry alone or to other components of the device, but are enjoyed by the device as a whole, and/or by end users and a wireless network generally.

[00110] Further definitions and embodiments are discussed below.

[00111] In the above-description of various embodiments of present inventive concepts, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[00112] When an element is referred to as being "connected", "coupled", "responsive", or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or" (abbreviated "/") includes any and all combinations of one or more of the associated listed items.

[00113] It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.

[00114] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but does not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.

[00115] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

[00116] These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.

[00117] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

[00118] Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.