Title:
BEAMFORMING TECHNIQUE FOR A RADIO NETWORK
Document Type and Number:
WIPO Patent Application WO/2024/033547
Kind Code:
A1
Abstract:
As to a method aspect, a method of jointly controlling a beamforming transceiver and a reconfigurable reflector is provided. The method comprises or initiates a step of obtaining a relative change of received signal strength in a radio communication between the beamforming transceiver and radio devices via the reconfigurable reflector. In addition, the method comprises or initiates a step of jointly determining a first set of beamforming weights for an antenna array of the beamforming transceiver and at least one second set of reflection weights for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning (RL) of the beamforming weights and at least one second agent performing RL of the reflection weights. A state of the first agent comprises the first set. A state of each second agent comprises the respective second set. A reward of the first agent and a reward of each second agent comprises the obtained relative change of the received signal strength.

Inventors:
ABDALLAH ASMAA (SA)
CELIK ABDULKADIR (SA)
ELTAWIL AHMED (SA)
MANSOUR MOHAMMAD (LB)
Application Number:
PCT/EP2023/072362
Publication Date:
February 15, 2024
Filing Date:
August 14, 2023
Assignee:
ERICSSON TELEFON AB L M (SE)
International Classes:
H04B7/00; H04B7/06
Other References:
NADERI SOORKI MEHDI ET AL: "Ultra-Reliable Indoor Millimeter Wave Communications Using Multiple Artificial Intelligence-Powered Intelligent Surfaces", IEEE TRANSACTIONS ON COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, NJ. USA, vol. 69, no. 11, 23 August 2021 (2021-08-23), pages 7444 - 7457, XP011888041, ISSN: 0090-6778, [retrieved on 20211116], DOI: 10.1109/TCOMM.2021.3106686
TAN KANG ET AL: "Intelligent Handover Algorithm for Vehicle-to-Network Communications With Double-Deep Q-Learning", IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE, USA, vol. 71, no. 7, 22 April 2022 (2022-04-22), pages 7848 - 7862, XP011914409, ISSN: 0018-9545, [retrieved on 20220425], DOI: 10.1109/TVT.2022.3169804
Attorney, Agent or Firm:
ERICSSON (SE)
Claims:
Telefonaktiebolaget LM Ericsson (publ) 38 / 45 P105713WO01
1. A method (200) of jointly controlling a beamforming transceiver (110) and a reconfigurable reflector (120), the method (200) comprising: obtaining (202) a relative change (150) of received signal strength in a radio communication between the beamforming transceiver (110) and radio devices (130) via the reconfigurable reflector (120); and jointly determining (204) a first set of beamforming weights (112) for an antenna array of the beamforming transceiver and at least one second set of reflection weights (122) for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning, RL, of the beamforming weights (112) and at least one second agent performing RL of the reflection weights (122), wherein a state of the first agent comprises the first set and a state of each second agent comprises the respective second set, and wherein a reward of the first agent and a reward of each second agent comprises the obtained relative change (150) of the received signal strength.
2. The method (200) of claim 1, wherein the determining (204) of the first set of beamforming weights (112) using the first agent comprises changing the first set, and wherein the relative change (150) of the received signal strength is obtained (202) responsive to the changing of the first set.
3. The method (200) of claims 1 or 2, wherein the determining (204) of the at least one second set of reflection weights (122) using the at least one second agent comprises changing each second set, and wherein the reward of each second agent comprises the relative change (150) of the received signal strength obtained (202) responsive to the changing of the respective one of the at least one second set.
4. The method (200) of claims 2 or 3, wherein the first set of beamforming weights (112) is initialized by a precoder of a DFT codebook, and/or wherein the at least one second set of reflection weights (122) is initialized by at least one precoder of a DFT codebook.
5. The method (200) of any one of claims 1 to 4, wherein the first set is determined (204) using the first agent prior to determining (204) the at least one second set using the at least one second agent, optionally wherein the previously determined (204) first set of beamforming weights (112) is applied for the antenna array of the beamforming transceiver while determining (204) the at least one second set using the at least one second agent.
6. The method (200) of any one of claims 1 to 5, wherein the determining (204) of the at least one second set of the reflection weights (122) comprises determining a reflection codebook comprising B second sets of reflection weights (122) for B disjoint groups of the radio devices (130), respectively.
7. The method (200) of claim 6, wherein each of the B second sets is determined (204) using the respective second agent of B second agents performing the RL independently based on the relative change (150) of the received signal strength in the radio communication between the beamforming transceiver (110) and the respective one of the B disjoint groups of the radio devices (130) via the reconfigurable reflector (120) as the reward.
8. The method (200) of claim 6 or 7, further comprising: determining the B groups of the radio devices (130) using K-means clustering based on the received signal strength, optionally based on a received signal strength indicator, RSSI, of the received signal strength.
9. The method (200) of any one of claims 1 to 8, wherein the obtaining (202) of the relative change (150) of the received signal strength comprises at least one of: transmitting reference signals from the beamforming transceiver (110) via the reconfigurable reflector (120) to the radio devices (130) and receiving a feedback from the radio devices (130) that is indicative of the relative change (150) of the received signal strength of the transmitted reference signals; or receiving reference signals from the radio devices (130) via the reconfigurable reflector (120) at the beamforming transceiver (110) and measuring the relative change (150) of the received signal strength of the received reference signals.
10. The method (200) of claim 9, wherein the transmitting and/or the receiving of the reference signals uses the first set of beamforming weights (112) and the at least one second set of reflection weights (122), optionally while a deployed first set of beamforming weights (112) other than the first set and at least one deployed second set of reflection weights (122) other than the at least one second set are used for transmitting and/or receiving data in the radio communication.
11. The method (200) of claim 9 or 10, wherein the transmitting and/or the receiving of the reference signals is performed in measurement gaps between a reception of data from the radio devices (130) or a transmission of data to the radio devices (130); and/or wherein the first set and/or the at least one second set is applied only for the transmitting and/or the receiving of the reference signals; and/or wherein the first set and/or the at least one second set is changed only for the purpose of the transmitting and/or the receiving of the reference signals.
12. The method (200) of any one of claims 1 to 11, wherein the obtaining (202) of the relative change (150) in the received signal strength and the jointly determining (204) of the first set and the at least one second set are performed in a RL phase, and wherein the first set replaces a or the deployed first set and/or the at least one second set replaces the at least one deployed second set in a deployment phase after the RL phase.
13. The method (200) of claim 12, wherein different second sets are used at non-overlapping times in the deployment phase for transmitting data to and/or receiving data from different groups of the radio devices (130).
14. The method (200) of any one of claims 1 to 13, wherein the determining (204) of the at least one second set of reflector weights comprises determining (204-1) the reflector weights for one contiguous partition of reflector elements of the reconfigurable reflector in a first stage and determining (204-2) partition weights for multiple partitions of the reflector elements in a second stage, wherein reflector weights of all reflector elements of the reconfigurable reflector result from multiplying the reflector weights for the one contiguous partition with the partition weights for the multiple partitions.
15. The method (200) of any one of claims 1 to 14, wherein the obtaining (202) of the relative change (150) of the received signal strength comprises receiving a or the feedback from the radio devices, the feedback being indicative of whether the received signal strength has increased or decreased responsive to a or the changing of the first set and/or the at least one second set.
16. The method (200) of any one of claims 1 to 15, wherein the beamforming transceiver (110) is a base station (110) of a radio access network, RAN, providing radio access to the radio devices (130) via the reconfigurable reflector (120).
17. The method (200) of any one of claims 1 to 16, further comprising at least one of: deploying (206) the first set for precoding a data transmission and/or combining a data reception at the antenna array; or deploying (208) the at least one second set for reflecting a data transmission or a data reception at the reflector array, optionally wherein the one of B second sets is selected for one of the radio devices (130) based on a maximum received signal strength in the radio communication with the one of the radio devices (130).
18. A computer program product comprising program code portions for performing the steps of any one of the claims 1 to 17 when the computer program product is executed on one or more computing devices (1204; 1304; 1404), optionally stored on a computer-readable recording medium (1206; 1306; 1406).
19. A device (100; 1200; 1300; 1400; 1512; 1610; 1620) for jointly controlling a beamforming transceiver (110) and a reconfigurable reflector (120), the device (100; 1200; 1300; 1400; 1512; 1610; 1620) comprising memory (1206; 1306; 1406) operable to store instructions and processing circuitry (1204; 1304; 1404) operable to execute the instructions, such that the device (100; 1200; 1300; 1400; 1512; 1610; 1620) is operable to: obtain a relative change (150) of received signal strength in a radio communication between the beamforming transceiver (110) and radio devices (130) via the reconfigurable reflector (120); and jointly determine a first set of beamforming weights (112) for an antenna array of the beamforming transceiver and at least one second set of reflection weights (122) for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning, RL, of the beamforming weights (112) and at least one second agent performing RL of the reflection weights (122), wherein a state of the first agent comprises the first set and a state of each second agent comprises the respective second set, and wherein a reward of the first agent and a reward of each second agent comprises the obtained relative change (150) of the received signal strength.
20. The device (100; 1200; 1300; 1400; 1512; 1610; 1620) of claim 19, further operable to perform the steps of any one of claims 2 to 17.
21. A device (100; 1200; 1300; 1400; 1512; 1610; 1620) for jointly controlling a beamforming transceiver (110) and a reconfigurable reflector (120), the device (100; 1200; 1300; 1400; 1512; 1610; 1620) being configured to: obtain a relative change (150) of received signal strength in a radio communication between the beamforming transceiver (110) and radio devices (130) via the reconfigurable reflector (120); and jointly determine a first set of beamforming weights (112) for an antenna array of the beamforming transceiver and at least one second set of reflection weights (122) for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning, RL, of the beamforming weights (112) and at least one second agent performing RL of the reflection weights (122), wherein a state of the first agent comprises the first set and a state of each second agent comprises the respective second set, and wherein a reward of the first agent and a reward of each second agent comprises the obtained relative change (150) of the received signal strength.
22. The radio device (100; 1100; 1291; 1292; 1330) of claim 21, further configured to perform the steps of any one of claims 2 to 17.
23. A network node (100; 1200; 1300; 1400; 1512; 1610; 1620) comprising memory operable to store instructions and processing circuitry operable to execute the instructions, such that the network node (100; 1200; 1300; 1400; 1512; 1610; 1620) is operable to: obtain a relative change (150) of received signal strength in a radio communication between a beamforming transceiver (110) and radio devices (130) via a reconfigurable reflector (120); and jointly determine a first set of beamforming weights (112) for an antenna array of the beamforming transceiver and at least one second set of reflection weights (122) for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning, RL, of the beamforming weights (112) and at least one second agent performing RL of the reflection weights (122), wherein a state of the first agent comprises the first set and a state of each second agent comprises the respective second set, and wherein a reward of the first agent and a reward of each second agent comprises the obtained relative change (150) of the received signal strength.
24. The network node (200) of claim 23, further operable to perform any one of the steps of any one of claims 2 to 17.
25. A communication system (1500; 1600) including a host computer (1330; 1410) comprising: processing circuitry (1618) configured to provide user data; and a communication interface (1616) configured to forward the user data to a cellular or ad hoc radio network (300; 1610) for transmission to a user equipment, UE, (130) wherein the UE (130) comprises a radio interface and processing circuitry, wherein the cellular or ad hoc radio network (300; 1610) comprises a base station (100; 1200; 1300; 1400; 1512; 1610; 1620) or a radio device functioning as a gateway, which comprises processing circuitry (1204; 1304; 1404; 1628) configured to execute the steps of claims 1 to 17.
26. The communication system (1500; 1600) of claim 25, further comprising the UE (130).
27. The communication system (1500; 1600) of claim 25 or 26, further comprising the base station (100; 1200; 1300; 1400; 1512; 1610; 1620) or the radio device functioning as a gateway.
28. The communication system (1500; 1600) of any one of claims 25 to 27, wherein: the processing circuitry (1618) of the host computer (1530; 1610) is configured to execute a host application (1612), thereby providing the user data; and the processing circuitry of the UE (130) is configured to execute a client application (1632) associated with the host application (1612).
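The two-stage construction of claim 14, combined with the DFT codebook initialization of claim 4, can be illustrated numerically. The following is a minimal sketch, not the applicant's implementation: it assumes that "multiplying" the weights of one contiguous partition with the partition weights means a Kronecker product, so that partition p applies its partition weight to every element weight. The function names and the sizes (4 elements per partition, 3 partitions) are hypothetical.

```python
import numpy as np


def dft_codebook(n):
    """Return the n x n DFT codebook; each column is one unit-modulus precoder."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)


def compose_reflector_weights(element_weights, partition_weights):
    """Combine first-stage and second-stage weights into weights for all elements.

    Assumed interpretation: partition p uses partition_weights[p] * element_weights,
    which is the Kronecker product of the two vectors.
    """
    return np.kron(partition_weights, element_weights)


elements_per_partition = 4  # hypothetical size of one contiguous partition
num_partitions = 3          # hypothetical number of partitions

# First stage: weights for one contiguous partition, initialized from a DFT precoder.
element_weights = dft_codebook(elements_per_partition)[:, 1]
# Second stage: one weight per partition, also initialized from a DFT precoder.
partition_weights = dft_codebook(num_partitions)[:, 1]

full_weights = compose_reflector_weights(element_weights, partition_weights)
```

Under this assumption, only 4 + 3 = 7 weights are searched instead of 12, which suggests why a staged determination may scale better for large reflector arrays.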
Description:
Beamforming Technique for a Radio Network

Technical Field
The present disclosure relates to a beamforming technique for a radio network. More specifically, and without limitation, a method and devices are provided for jointly controlling a beamforming transceiver and a reconfigurable reflector of a radio access network (RAN).

Background
Millimeter wave (mmWave) communication has emerged as a key technology to fulfill beyond fifth-generation (5G) network requirements specified by the Third Generation Partnership Project (3GPP), such as enhanced mobile broadband, massive connectivity, and ultra-reliable low-latency communications. The mmWave band offers an abundant frequency spectrum (e.g., in the range of 30-300 GHz) at the cost of low penetration depth and high propagation losses. Fortunately, its short wavelength mitigates these drawbacks by allowing the deployment of large antenna arrays in small form factor transceivers, paving the way for massive multiple-input multiple-output (mMIMO) systems with high directivity gains. Considering the recently increasing interest in exploiting terahertz (THz) bands for sixth-generation (6G) networks, the number of antenna elements is expected to increase significantly due to further reduced wavelengths, so that antenna arrays can overcome severe distance- and frequency-dependent path loss at THz bands.

Even though increasing the number of antennas in an mMIMO system inherently yields sharper beams and higher beam gains, the complexity of beamforming increases and requires a more accurate design of precoders and beamformers. To overcome these limitations and improve coverage at higher frequencies, the use of reconfigurable intelligent surfaces (RIS) has come into prominence to have an additional spatial degree of control over the radio channel.
The RIS is equipped with many low-cost reconfigurable elements to intelligently control the reflection of radio signals using adjustable phase shifts. In this way, the RIS can improve and reflect signals coming from the base station (BS) to boost the performance of a user equipment (UE) whose line of sight (LoS) has been blocked by obstacles in the surrounding environment.

Existing techniques for controlling such complex mMIMO systems require estimating the channel state for each of the UEs. However, this conventional approach is inefficient because, owing to the short wavelength, the knowledge of the channel for one UE does not contribute to the channel estimation of another UE even if the UEs are close to each other. Furthermore, the conventional approach requires substantial control signaling in terms of precoder indicators.

Summary
Accordingly, there is a need for a beamforming technique that enables accurate beamforming for high-dimensional control spaces. An alternative or more specific object is to determine close-to-optimal sets of phases without the need for estimating channel states and signaling and/or without substantial control signaling.

As to a method aspect, a method of jointly controlling a beamforming transceiver and a reconfigurable reflector is provided. The method comprises or initiates a step of obtaining a relative change of received signal strength in a radio communication between the beamforming transceiver and radio devices via the reconfigurable reflector.
Alternatively or in addition, the method comprises or initiates a step of jointly determining a first set of beamforming weights for an antenna array of the beamforming transceiver and at least one second set of reflection weights for a reflector array of the reconfigurable reflector using a first agent performing reinforcement learning (RL) of the beamforming weights and at least one second agent performing RL of the reflection weights. A state of the first agent comprises the first set. A state of each second agent comprises the respective second set. A reward of the first agent and a reward of each second agent comprises the obtained relative change of the received signal strength.

The method aspect may be implemented alone or in combination with any one of the embodiments disclosed in detail in the description.

By communicating the relative change of the received signal strength only, embodiments of the method can determine the first and second sets for jointly controlling a beamforming transceiver and a reconfigurable reflector. For example, a single bit may indicate whether the change of the first set and/or the at least one second set increased or decreased the received signal strength.

In any embodiment, downlink (i.e., the radio communication in the direction from the beamforming transceiver to the radio devices via the reconfigurable reflector) and/or uplink (i.e., the radio communication from the radio devices via the reconfigurable reflector to the beamforming transceiver) may be used for obtaining the received signal strength. In the latter case, no control signal may be needed at all if the signal strength is received at the beamforming transceiver, which may perform the method. Alternatively or in addition, relative changes in the uplink and downlink received signal strengths may be used in combination for rapid convergence of the first and second sets.
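How a one-bit indication of the relative change can drive the weight search may be easier to see in a toy simulation. The sketch below makes several assumptions that are not part of the disclosure: a random cascaded channel with a single radio device, 2-bit phase resolution, and a simple greedy accept/reject search standing in for the RL agents. All names (`SimulatedLink`, `tune`, `feedback`) are hypothetical. The search routine sees only the +1/-1 feedback and never the channel or the absolute signal strength.

```python
import numpy as np


class SimulatedLink:
    """Toy cascaded channel: transceiver -> reflector -> one radio device (assumed model)."""

    def __init__(self, num_antennas, num_reflectors, rng):
        shape = (num_reflectors, num_antennas)
        self.H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
        self.h_r = (rng.standard_normal(num_reflectors)
                    + 1j * rng.standard_normal(num_reflectors)) / np.sqrt(2)

    def rss(self, w, theta):
        """Received signal strength; used here only to generate feedback and to evaluate."""
        return float(np.abs(self.h_r.conj() @ (theta * (self.H @ w))) ** 2)

    def feedback(self, trial, reference):
        """One-bit feedback: +1 if the trial (w, theta) pair is stronger than the reference."""
        return 1 if self.rss(*trial) > self.rss(*reference) else -1


def tune(weights, reward, phases, steps, rng):
    """Greedy search: keep a single-element phase change only if the reward is +1."""
    weights = weights.copy()
    for _ in range(steps):
        trial = weights.copy()
        i = rng.integers(len(trial))
        trial[i] = np.abs(trial[i]) * rng.choice(phases)  # action: re-phase one element
        if reward(trial, weights) > 0:
            weights = trial
    return weights


rng = np.random.default_rng(0)
M, N = 8, 16                                     # antennas, reflector elements (assumed)
link = SimulatedLink(M, N, rng)
phases = np.exp(2j * np.pi * np.arange(4) / 4)   # 2-bit phase resolution (assumed)

w0 = np.ones(M, dtype=complex) / np.sqrt(M)      # first column of a DFT codebook
theta0 = np.ones(N, dtype=complex)

# First search: beamforming weights are tuned while the reflector weights stay fixed.
w = tune(w0, lambda t, ref: link.feedback((t, theta0), (ref, theta0)), phases, 300, rng)
# Second search: reflection weights are tuned with the tuned beamforming weights applied.
theta = tune(theta0, lambda t, ref: link.feedback((w, t), (w, ref)), phases, 600, rng)

initial_rss = link.rss(w0, theta0)
final_rss = link.rss(w, theta)
```

Because a change is kept only when the feedback reports an increase, the simulated signal strength cannot decrease over the search. An actual RL agent would additionally learn a policy from the sequence of states and rewards rather than discard rejected trials.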
Without limitation, for example in a 3GPP implementation, any "radio device" may be a user equipment (UE). Alternatively or in addition, any beamforming transceiver may be a radio base station (briefly: base station) or a radio access network (RAN), e.g., according to a 3GPP specification. Alternatively or in addition, the reconfigurable reflector may be a reconfigurable intelligent surface (RIS).

The method aspect may be embodied by a method of determining (e.g., designing or controlling) at least one beamforming codebook (i.e., second sets of reflection weights) for at least one RIS, e.g., for a massive MIMO (mMIMO) network assisted by the at least one RIS. The RAN may comprise at least one base station acting as the beamforming transceiver. Alternatively or in addition, the determining may be based on reinforcement learning (RL), e.g., Deep Reinforcement Learning (DRL).

The radio communication may use a radio access technology, and/or the radio devices and/or the beamforming transceiver and/or the reconfigurable reflector may form, or may be part of, a radio network, e.g., according to the Third Generation Partnership Project (3GPP) or according to the standard family IEEE 802.11 (Wi-Fi). The method aspect may be performed by one or more embodiments of the beamforming transceiver and/or a dedicated function or node (e.g., in a core network connected to the RAN) and/or the reconfigurable reflector.

The RAN may comprise one or more base stations, e.g., performing the method aspect and/or embodying the beamforming transceiver. Alternatively or in addition, the radio network may be a vehicular, ad hoc and/or mesh network comprising two or more radio devices using sidelinks (SLs) for the radio communication and/or wherein at least one of the radio devices acts as the beamforming transceiver.

Any of the radio devices may be a 3GPP user equipment (UE) or a Wi-Fi station (STA).
The radio device may be a mobile or portable station, a device for machine-type communication (MTC), a device for narrowband Internet of Things (NB-IoT) or a combination thereof. Examples for the UE and the mobile station include a mobile phone, a tablet computer and a self-driving vehicle. Examples for the portable station include a laptop computer and a television set. Examples for the MTC device or the NB-IoT device include robots, sensors and/or actuators, e.g., in manufacturing, automotive communication and home automation. The MTC device or the NB-IoT device may be implemented in a manufacturing plant, household appliances and consumer electronics.

Whenever referring to the RAN, the RAN may be implemented by one or more base stations (e.g., embodiments of the beamforming transceiver). The radio devices may be wirelessly connected or connectable (e.g., according to a radio resource control, RRC, state) with the beamforming transceiver.

The beamforming transceiver (e.g., a base station) may encompass any station that is configured to provide radio access to any of the radio devices. The beamforming transceiver may be associated with one or more cells, one or more transmission and reception points (TRPs), a radio access node or access point (AP). The beamforming transceiver (e.g., a base station) may provide a data link to a host computer providing user data to the radio devices or gathering user data from the radio devices, e.g., by means of the method aspect. Examples for the beamforming transceiver (e.g., a base station) may include a 3G base station or Node B, 4G base station or eNodeB, a 5G base station or gNodeB, a Wi-Fi AP and a network controller (e.g., according to Bluetooth, ZigBee or Z-Wave).
The mMIMO network and/or the RAN may be implemented according to the Global System for Mobile Communications (GSM), the Universal Mobile Telecommunications System (UMTS), 3GPP Long Term Evolution (LTE) and/or 3GPP New Radio (NR).

Any aspect of the technique may be implemented on a Physical Layer (PHY), a Medium Access Control (MAC) layer, a Radio Link Control (RLC) layer, a packet data convergence protocol (PDCP) layer, and/or a Radio Resource Control (RRC) layer of a protocol stack for the radio communication.

Herein, referring to a protocol of a layer may also refer to the corresponding layer in the protocol stack. Vice versa, referring to a layer of the protocol stack may also refer to the corresponding protocol of the layer. Any protocol may be implemented by a corresponding method.

As to another aspect, a computer program product is provided. The computer program product comprises program code portions for performing any one of the steps of the method aspect disclosed herein when the computer program product is executed by one or more computing devices. The computer program product may be stored on a computer-readable recording medium. The computer program product may also be provided for download, e.g., via the radio network, the RAN, the Internet and/or the host computer. Alternatively, or in addition, the method may be encoded in a Field-Programmable Gate Array (FPGA) and/or an Application-Specific Integrated Circuit (ASIC), or the functionality may be provided for download by means of a hardware description language.

As to a device aspect, a device, which may be configured to perform any one of the steps of the method aspect, is provided.

As to a further device aspect, a device according to any one of the embodiments of the device aspect disclosed in detail in the description is provided. The device comprises processing circuitry (e.g., at least one processor and a memory).
Said memory comprises instructions executable by said at least one processor whereby the device is operative to perform any one of the steps of the method aspect.

As to a still further aspect a communication system including a host computer is provided. The host computer comprises a processing circuitry configured to provide user data. The host computer further comprises a communication interface configured to forward the data to a cellular network (e.g., the RAN and/or the base station) for transmission to a UE. A processing circuitry of the cellular network is configured to execute any one of the steps of the method aspect. The UE comprises a radio interface and processing circuitry, which is configured to execute any one of the steps of the method aspect.

The communication system may further include the UE. Alternatively, or in addition, the cellular network may further include one or more base stations configured for radio communication with the UE and/or to provide a data link between the UE and the host computer using the method aspect.

The processing circuitry of the host computer may be configured to execute a host application, thereby providing the data and/or any host computer functionality described herein. Alternatively, or in addition, the processing circuitry of the UE may be configured to execute a client application associated with the host application.

Any one of the devices, the UE, the base station, the communication system or any node or station for embodying the technique may further include any feature disclosed in the context of the method aspect, and vice versa. Particularly, any one of the units and modules disclosed herein may be configured to perform or initiate one or more of the steps of the method aspect.
Brief Description of the Drawings
Further details of embodiments of the technique are described with reference to the enclosed drawings, wherein:
Fig.1 shows a schematic block diagram of an embodiment of a device for jointly controlling a beamforming transceiver and a reconfigurable reflector;
Fig.2 shows a flowchart for an embodiment of a method of jointly controlling a beamforming transceiver and a reconfigurable reflector, which method may be implementable by the device of Fig.1;
Fig.3 schematically illustrates a radio access network including a base station embodiment of the beamforming transceiver and an embodiment of the reconfigurable reflector for performing the method of Fig.2;
Fig.4 schematically illustrates a radio access network comprising a first agent at an embodiment of the beamforming transceiver and multiple second agents at an embodiment of the reconfigurable reflector for performing the method of Fig.2;
Fig.5 schematically illustrates an optional step of clustering the radio devices, which may be performed after determining the first set by the first agent and prior to determining the at least one second set by the at least one second agent in an embodiment of the method of Fig.2;
Fig.6 schematically illustrates an implementation of determining the at least one second set in an embodiment of the method of Fig.2;
Fig.7 schematically illustrates an embodiment of the reconfigurable reflector, which may be used in combination with any embodiment of the device of Fig.1 or the method of Fig.2;
Fig.8 shows a schematic block diagram of an embodiment of the first agent in conjunction with the beamforming transceiver or an embodiment of any one of the at least one second agent in conjunction with the reconfigurable reflector;
Fig.9 shows a flowchart of an embodiment of the method of Fig.2 including an embodiment of the step of Fig.5;
Fig.10 shows a flowchart of an embodiment of optional steps for deploying the jointly determined weights in an embodiment of the method of Fig.2;
Fig.11 schematically illustrates an implementation of the deployment of the second sets of reflection weights, which may be performed in an embodiment of the method of Fig.2;
Fig.12 schematically illustrates a base station embodiment of the device of Fig.1;
Fig.13 schematically illustrates a reconfigurable reflector embodiment of the device of Fig.1;
Fig.14 schematically illustrates a cloud embodiment of the device of Fig.1;
Fig.15 schematically illustrates an example telecommunication network connected via an intermediate network to a host computer;
Fig.16 shows a generalized block diagram of a host computer communicating via a base station or radio device functioning as a gateway with a user equipment over a partially wireless connection including the radio communication of Fig.1; and
Figs.17 and 18 show flowcharts for methods implemented in a communication system including a host computer, a base station or radio device functioning as a gateway and a user equipment.

Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as a specific network environment in order to provide a thorough understanding of the technique disclosed herein. It will be apparent to one skilled in the art that the technique may be practiced in other embodiments that depart from these specific details.
Moreover, while the following embodiments are primarily described for a New Radio (NR) or 5G implementation, it is readily apparent that the technique described herein may also be implemented for any other radio communication technique, including a Wireless Local Area Network (WLAN) implementation according to the standard family IEEE 802.11, 3GPP LTE (e.g., LTE-Advanced or a related radio access technique such as MulteFire), for Bluetooth according to the Bluetooth Special Interest Group (SIG), particularly Bluetooth Low Energy, Bluetooth Mesh Networking and Bluetooth broadcasting, for Z-Wave according to the Z-Wave Alliance or for ZigBee based on IEEE 802.15.4.

Moreover, those skilled in the art will appreciate that the functions, steps, units and modules explained herein may be implemented using software functioning in conjunction with a programmed microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP) or a general-purpose computer, e.g., including an Advanced RISC Machine (ARM). It will also be appreciated that, while the following embodiments are primarily described in context with methods and devices, the invention may also be embodied in a computer program product as well as in a system comprising at least one computer processor and memory coupled to the at least one processor, wherein the memory is encoded with one or more programs that may perform the functions and steps or implement the units and modules disclosed herein.

Fig.1 schematically illustrates a block diagram of an embodiment of a device for jointly controlling a beamforming transceiver and a reconfigurable reflector. Herein, the device is generically referred to by reference sign 100. The beamforming transceiver is generically referred to by reference sign 110. The reconfigurable reflector is generically referred to by reference sign 120.
The radio devices are generically referred to by reference sign 130. The device 100 may be an embodiment of the device aspect.

Alternatively or in addition, the device 100 may comprise a signal strength obtaining module 102 that obtains (e.g., receives from the radio devices 130 and/or measures at the beamforming transceiver 110) a relative change of a received signal strength in a radio communication between the beamforming transceiver 110 and radio devices 130 via the reconfigurable reflector 120. The device 100 further comprises a weights determining module 104 that jointly determines a first set of beamforming weights for an antenna array of the beamforming transceiver 110 and at least one second set of reflection weights for a reflector array of the reconfigurable reflector 120. A first agent performs reinforcement learning (RL) of the beamforming weights 112 and at least one second agent performs RL of the reflection weights 122. A state of the first agent comprises the first set and a state of each second agent comprises the respective second set. A reward of the first agent and a reward of each second agent comprises the obtained relative change of the received signal strength.

Any of the modules of the device 100 may be implemented by units configured to provide the corresponding functionality.

The device 100 may also be referred to as, or may be embodied by, the beamforming transceiver 110 (or briefly: transceiver). The beamforming transceiver 110 and the radio devices 130 may be in direct radio communication via the reconfigurable reflector 120.

Fig.2 shows an example flowchart for a method 200 of jointly controlling a beamforming transceiver 110 and a reconfigurable reflector 120.
In a step 202, a relative change of a received signal strength is obtained in a radio communication (e.g., in the uplink and/or in the downlink and/or in a sidelink) between the beamforming transceiver 110 and radio devices 130 via the reconfigurable reflector 120. In a step 204, a first set of beamforming weights for an antenna array of the beamforming transceiver and at least one second set of reflection weights for a reflector array of the reconfigurable reflector are jointly determined using a first agent performing RL of the beamforming weights and at least one second agent performing RL of the reflection weights. A state of the first agent comprises the first set. A state of each second agent comprises the respective second set. A reward of the first agent and a reward of each second agent comprises the obtained relative change of the received signal strength.

The method 200 may be performed by the device 100. For example, the modules 102 and 104 may perform the steps 202 and 204, respectively.

The first set and the at least one second set may be jointly determined by determining the at least one second set based on the determined first set. Alternatively or in addition, the first set and the at least one second set may be jointly determined by using the obtained relative change of the received signal strength as a common reward for the first agent and the at least one second agent.

Herein, the received signal strength may be represented by a received signal strength indicator (RSSI). Alternatively or in addition, the received signal strength may comprise at least one of a signal-to-noise ratio (SNR), a signal-to-interference-plus-noise ratio (SINR), a reference signal received power (RSRP), and a reference signal received quality (RSRQ).

The beamforming transceiver may comprise the antenna array (e.g., an antenna system comprising an array of antenna elements).
The beamforming weights may control at least one of a phase and a gain of the antenna elements of the antenna array. The phase may be a relative phase between the antenna elements and/or the gain may be a relative gain between the antenna elements. The beamforming weights may be complex-valued, e.g. comprising the phase and an amplitude (or absolute value) as the gain (e.g., other than 1). Alternatively or in addition, the beamforming transceiver may comprise a precoder that applies the beamforming weights for the antenna array, i.e. that uses the beamforming weights for a beamformed transmission at the beamforming transceiver, and/or may comprise a combiner that applies the beamforming weights for the antenna array, i.e. that uses the beamforming weights for a beamformed reception at the beamforming transceiver. The reconfigurable reflector may comprise the reflector array (e.g., a reconfigurable reflection surface comprising an array of reflector elements). The reflector weights may control at least one of a phase and a gain of the reflector elements of the reflector array. The phase may be a relative phase between the reflector elements and/or the gain may be a relative gain between the reflector elements. The reflector weights may be complex-valued, e.g. comprising the phase and an amplitude (or absolute value) as the gain (e.g., other than 1). For example, the reconfigurable reflector may comprise a controller that applies the reflector weights for the reflector array, i.e. that uses the reflector weights for a beamformed reflection at the beamforming transceiver. The beamforming weights may be applied in an analog or digital domain of the beamforming transceiver. Alternatively or in addition, the reflector weights may be applied in an analog or digital domain of the reconfigurable reflector. The beamforming transceiver may be an active node of the radio communication. 
For example, the transceiver may encode a radio signal to be transmitted using the beamforming weights and/or may decode a radio signal received using the beamforming weights. The reconfigurable reflector may be a passive node of the radio communication. For example, the reconfigurable reflector may (at least partly) redirect incident electromagnetic energy of the radio signal to reflected electromagnetic energy without reading or processing the radio signal.

The determining 204 of the first set of beamforming weights 112 using the first agent comprises changing the first set. The relative change 150 of the received signal strength is obtained 202 responsive to the changing of the first set. The changing of the first set may be performed (e.g., by the first agent) alternately with the obtaining 202 of the relative change 150 of received signal strength. Alternatively or in addition, the changing of the first set may correspond to an action of the first agent.

The determining 204 of the at least one second set of reflection weights 122 using the at least one second agent comprises changing each second set. The reward of each second agent comprises the relative change 150 of the received signal strength obtained 202 responsive to the changing of the respective one of the at least one second set. The changing of the at least one second set may be performed alternately with the obtaining of the relative change 150 of received signal strength. Alternatively or in addition, the changing of the at least one second set may correspond to an action of the respective second agent.

The changing of each of the first set and the at least one second set may correspond to a change of the state of the respective agent. Each of the first set and/or the at least one second set may be changed (i.e., the action may be performed, e.g., determined, decided, selected, or chosen) according to any mechanism of RL, e.g.
using at least one of exploration, exploitation, and epsilon-greedy action selection. Exploration may allow the respective agent to improve its current knowledge about each action, which may improve the first set and/or the at least one second set in the long term of the determining step 204. Improving the accuracy of the estimated action-values can enable the respective agent to make more informed decisions on the action in the future. Exploitation may refer to choosing the greedy action to get the most reward by exploiting the current action-value estimates of the respective agent. However, being greedy with respect to the action-value estimates may not yield the most reward in the long term of the determining step 204 and can lead to sub-optimal behavior.

Epsilon-greedy action selection may be an example for a mechanism of RL that balances exploration and exploitation by choosing between exploration and exploitation randomly. Herein, epsilon ($\epsilon$) may refer to the probability of choosing to explore. The epsilon-greedy action selection may exploit most of the time with a small chance of exploring.

The first set of beamforming weights 112 is initialized by a precoder of a DFT codebook, and/or the at least one second set of reflection weights 122 is initialized by at least one precoder of a DFT codebook. The DFT codebook may be constructed from a DFT matrix. The precoder may correspond to a vector (e.g., a row or column vector) of the DFT matrix.

The first set is determined 204 using the first agent prior to determining 204 the at least one second set using the at least one second agent. Optionally, the previously-determined 204 first set of beamforming weights 112 is applied for the antenna array of the beamforming transceiver while determining 204 the at least one second set using the at least one second agent.
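The epsilon-greedy action selection described above can be illustrated by a minimal sketch. It assumes a small, discrete set of actions with tabular action-value estimates and a toy reward model; the function names `epsilon_greedy` and `update_action_value`, the reward values and the step count are hypothetical and serve only to show how exploration and exploitation are mixed. It does not represent the agents of the embodiments.

```python
import numpy as np

def epsilon_greedy(action_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(action_values)))  # explore: uniform random action
    return int(np.argmax(action_values))              # exploit: best current estimate

def update_action_value(action_values, counts, action, reward):
    """Incremental sample-average update of the estimate for the chosen action."""
    counts[action] += 1
    action_values[action] += (reward - action_values[action]) / counts[action]

rng = np.random.default_rng(0)
# Hypothetical toy environment: mean reward of each of four actions.
true_mean_reward = np.array([0.1, 0.5, -0.2, 0.3])
q = np.zeros(4)                 # action-value estimates
n = np.zeros(4, dtype=int)      # how often each action was tried

for _ in range(2000):
    a = epsilon_greedy(q, epsilon=0.1, rng=rng)
    r = true_mean_reward[a] + 0.1 * rng.standard_normal()  # noisy reward sample
    update_action_value(q, n, a, r)

best_action = int(np.argmax(q))
```

With a small epsilon the loop spends most steps on the currently best action while still sampling the others often enough to correct poor early estimates.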
The previously-determined first set of beamforming weights may be applied for the beamformed transmission and/or the beamformed reception of the beamforming transceiver while determining 204 the at least one second set using the at least one second agent. The previously-determined first set of beamforming weights may be kept unchanged while the at least one second set is changed for the determining 204 of the at least one second set.

The determining 204 of the at least one second set of the reflection weights 122 comprises determining a reflection codebook comprising B second sets of reflection weights 122 for B disjoint groups of the radio devices 130, respectively. The number B of second sets may be an integer greater than one. The groups may be the result of a (e.g., spatial) clustering, (e.g., only) based on the received signal strength. Each of the B second sets is determined 204 using the respective second agent of B second agents performing the RL independently, with the relative change 150 of the received signal strength in the radio communication between the beamforming transceiver 110 and the respective one of the B disjoint groups of the radio devices 130 via the reconfigurable reflector 120 as the reward.

Optionally, the B groups of the radio devices 130 are determined using K-means clustering based on a received signal strength indicator (RSSI) of the received signal strength.

The obtaining 202 of the relative change 150 of the received signal strength can be performed by transmitting reference signals from the beamforming transceiver 110 via the reconfigurable reflector 120 to the radio devices 130 and receiving a feedback from the radio devices 130 that is indicative of the relative change 150 of the received signal strength of the transmitted reference signals.
Alternatively or in addition, the obtaining 202 of the relative change 150 of the received signal strength can be performed by receiving reference signals from the radio devices 130 via the reconfigurable reflector 120 at the beamforming transceiver 110 and measuring the relative change 150 of the received signal strength of the received reference signals.

The transmitted reference signals may comprise at least one of channel state information (CSI) reference signals (CSI-RS), demodulation reference signals (DMRS), positioning reference signals (PRS), and synchronization signals (e.g., a synchronization signal block, SSB). Alternatively or in addition, the received reference signals may comprise at least one of sounding reference signals (SRS) and demodulation reference signals (DMRS).

The transmitting and/or the receiving of the reference signals may use the first set of beamforming weights 112 and/or the at least one second set of reflection weights 122. For example, the (e.g., changed) first set and/or the (e.g., changed) at least one second set may be used only for the transmitting and/or the receiving of the reference signals and/or may be used only for the obtaining of the relative change of the received signal strength and/or the determining of the first set and/or the at least one second set.

Transmitting and/or receiving data in the radio communication may use a deployed first set of beamforming weights 112 and/or at least one deployed second set of reflection weights 122. The first set may replace the deployed set and/or the at least one second set may replace the at least one deployed second set after an RL phase (also: learning phase) and/or in a deployment phase. The RL phase may comprise the obtaining of the relative change of the received signal strength and the jointly determining of the first set and the at least one second set.
The determining of the first set and/or the determining of the at least one second set may be performed in measurement gaps and/or between a reception of data from the radio devices 130 or a transmission of data to the radio devices 130. The transmitting and/or the receiving of the reference signals may be performed in measurement gaps between a reception of data from the radio devices 130 or a transmission of data to the radio devices 130. The first set and/or the at least one second set may be applied only for the transmitting and/or the receiving of the reference signals; and/or the first set and/or the at least one second set may be changed only for the purpose of the transmitting and/or the receiving of the reference signals. For concreteness and not limitation, the following embodiments and advantages are described for a base station embodying the beamforming transceiver 110, of a radio access network, RAN, providing radio access to user equipments (UEs) embodying the radio devices 130 via a reconfigurable intelligent surface (RIS) embodying the reconfigurable reflector 120. 
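The separation between weights under learning and deployed weights described above can be pictured with a short sketch. It assumes a hypothetical container `WeightSets` with a deployed set (used for data) and a candidate set (changed by an agent and used only for reference signals in measurement gaps); the class name, the slot labels `"data"` and `"measurement_gap"` and the `deploy` method are illustrative assumptions, not part of the disclosed technique.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WeightSets:
    deployed: np.ndarray    # weights used for transmitting/receiving data
    candidate: np.ndarray   # weights changed during the RL phase

    def weights_for(self, slot_kind: str) -> np.ndarray:
        """Select the weights for a slot (hypothetical slot labels)."""
        if slot_kind == "data":
            return self.deployed
        if slot_kind == "measurement_gap":
            return self.candidate    # reference signals only
        raise ValueError(f"unknown slot kind: {slot_kind}")

    def deploy(self) -> None:
        """After the RL phase, the learned set replaces the deployed set."""
        self.deployed = self.candidate.copy()

# Example: four phase-only weights; the candidate differs from the deployed set.
sets = WeightSets(deployed=np.ones(4, dtype=complex),
                  candidate=np.exp(1j * np.pi / 2 * np.arange(4)))
data_weights_before = sets.weights_for("data").copy()
sets.deploy()
data_weights_after = sets.weights_for("data")
```

Keeping the two sets apart means that changes made while learning do not disturb ongoing data transmission until the learned set is deployed.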
Conventionally, a perfect design of a precoder and/or a combiner (e.g., at the base station) needs perfect channel state information (CSI), which is not readily available and hard to obtain in mMIMO systems using mmWave and/or THz for the following reasons: (1) There is no direct access to the different antenna elements in the array since the cascaded channel is seen through the analog combining network, which forms a compression stage for the received signal when the number of RF chains is much smaller than the number of antennas ($N_{\mathrm{RF}} < N$); (2) the large channel bandwidth of mmWave/THz bands yields high noise power and low received signal-to-noise ratio (SNR) before beamforming; and (3) the large size of channel matrices increases the complexity and overhead associated with existing techniques for precoding and/or combining as well as methods for performing channel estimation.

Since finding an optimal vector (i.e., first set) of a precoder and/or decoder is conventionally not practical due to the aforementioned CSI acquisition challenges, existing wireless standards typically employ predefined codebooks, each of which is a set of precoding matrices determined based on radio front-end hardware parameters. Current beamforming technologies exploited in cellular networks depend on predefined codebooks that consist of a set of matrices for precoding (e.g., for transmitting) and/or combining (e.g., for receiving). During the signaling process, the existing transceivers exchange precoding matrix indicators to decide which beams give the best performance. There are two critical limitations on using predefined codebooks: Firstly, codebooks follow phase shift tables defined in standards, that is, they are neither site-specific nor adaptive to the dynamic environment of mobile wireless networks. Secondly, the codebook is a strict subset of all possible phase shift combinations.
For example, a base station (BS) with 32 antennas each with 3-bit phase quantization yields $2^{96}$ different beamforming vectors. For the same BS, a typical codebook based on discrete Fourier transform (DFT codebook) has 32 beams, which can be slightly increased by adjusting an oversampling ratio.

In addition to the challenges mentioned above, the existing RIS-assisted mMIMO systems need to tackle the following additional CSI acquisition challenges: (4) independent estimation of BS-RIS and RIS-UE channels requires pilot transmission from the RIS, which is not always possible due to the passive nature of the RIS; and (5) the cascaded channel estimation poses the most problematic scenario, as the previous challenges are exacerbated by the number of RIS elements, which is typically much higher than the number of BS antennas.

In conclusion, exploiting the full potential gains of the RIS 120 relies on finding the optimal RIS configuration (e.g., the at least one second set of reflection weights) that yields the best phase shifts to reflect the incident signals toward the target receivers (e.g., UEs 130). Conventionally, pre-designed reflection beam codebooks are used to scan all possible directions, i.e., the DFT codebooks mentioned above. However, in large RIS setups, these codebooks pose several challenges: (i) An existing codebook typically relies on channel knowledge, which is difficult to acquire due to the passive nature of the RIS, and (ii) the pre-defined codebooks normally require significant beam training overhead and are not adaptive to the environment.

Embodiments of the technique according to the method 200 and/or the device 100 can comprise various deep learning methods for massive multiple-input multiple-output (mMIMO) wireless networks with or without assistance of a reconfigurable intelligent surface (RIS) 120. The RAN is generically referred to by the reference sign 300.
Any embodiment may comprise or use at least one of the following features of the RIS 120. The RIS 120 may comprise meta-surfaces that can be electronically tuned to manipulate the electromagnetic (EM) wave properties of impinging signals (e.g., polarization, amplitude, phase, etc.) in a wide frequency range such as microwave, millimeter wave (mmWave), terahertz (THz), etc. according to the reflection weights. Different manipulation techniques may yield various effects on incoming EM waves such as reflection, refraction, absorption, polarization, splitting, and focusing. Based on the underlying effect, the RIS facilitates a degree of control over the wireless channel between transceivers to improve overall communication performance.

A common interest in employing a RIS is establishing a link between transceivers where a line-of-sight (LoS) link is blocked by large obstacles. This is an especially challenging issue at mmWave and THz bands due to significant propagation and penetration losses. Even if there is a LoS link, the RIS may still offer a promising performance improvement if the direct link is weak due to distance, scatterers, absorbers, and detrimental atmospheric conditions. Even though the invention is presented considering the worst-case scenario with no LoS between transceivers, it is still applicable to cases in which a LoS exists between the transceivers. Moreover, the invention is not limited to the underlying RIS technology; a detailed background is provided at the end.

Any embodiment may comprise or use deep reinforcement learning (DRL) for the joint determining 204. DRL is a powerful artificial intelligence technique that, due to its exploration capabilities, has been considered a promising candidate for handling dynamic problems in complicated environments. Compared to deep learning (DL) approaches, the DRL technique does not require a large amount of training data, which might be very difficult to obtain in wireless communication systems.
By leveraging the powerful exploration capabilities of DRL, we develop a low-complexity yet efficient multi-agent (MA) DRL-based approach for jointly designing the active BS beamforming and the RIS reflection beam codebook.

An embodiment of the technique may account for the practical hardware limitations, such as the quantized phase shifter constraints on the BS 110 and the RIS 120, and/or does not require any explicit channel knowledge (e.g., CSI). For example, any embodiment may rely only on receive power measurements (i.e., the received signal strength), which relaxes the synchronization requirements and the channel estimation overhead.

Furthermore, the disclosed technique may serve as a general solution for both active and passive beamforming problems, e.g., for any RAN 300 that includes a cascade beamforming, wherein the first and second sets are jointly determined in the step 204 by a cascaded learning and combining framework that greatly reduces the convergence time.

In any aspect, the framework of multiple agents performing DRL (multi-agent or MA-DRL) according to an embodiment of the method 200 learns how to jointly optimize the active beamforming at the side of the BS 110 and designs the reflection beam codebook (i.e., the second set) at the side of the RIS 120, relying only on a one-bit feedback obtained as an example of the relative change of the received signal strength. The proposed MA-DRL framework adopts a multi-level learning approach that transfers the learning between the multiple RIS subarrays, which speeds up the learning convergence and significantly reduces the computational complexity for large RIS surfaces. To this aim, the multi-level learning reduces the search space (e.g., the configuration space comprising the first and second sets) by dividing the RIS into multiple partitions.
Simulation results show that the proposed learning framework can learn optimized active BS beamforming and RIS reflection codebook.

In any aspect, the agent may perform machine learning for determining the sets (e.g., deep reinforcement learning) and/or for clustering the radio devices 130 (e.g., k-means).

The reflection of the beam defined by the beamforming transceiver 110 may also be referred to as beamforming, e.g., as cascaded beamforming. The second sets may be collectively referred to as a codebook of the reconfigurable reflector.

The RAN comprising at least one of the beamforming transceiver and the reconfigurable reflector may be cell-free (e.g., spatially structured by means of beams defined by the first and second sets) and/or may be a massive MIMO (mMIMO) network.

In any aspect, the technique may be applied to uplink (UL), downlink (DL) or direct communications between radio devices, e.g., device-to-device (D2D) communications or sidelink (SL) communications.

Each of the beamforming transceiver 110 and the radio devices 130 may be a radio device or a base station. Herein, any radio device may be a mobile or portable station and/or any radio device wirelessly connectable to a base station or RAN, or to another radio device. For example, the radio device may be a user equipment (UE), a device for machine-type communication (MTC) or a device for (e.g., narrowband) Internet of Things (IoT). Two or more radio devices may be configured to wirelessly connect to each other, e.g., in an ad hoc radio network or via a 3GPP SL connection. Furthermore, any base station may be a station providing radio access, may be part of a radio access network (RAN) and/or may be a node connected to the RAN for controlling the radio access. For example, the base station may be an access point, for example a Wi-Fi access point.
Herein, whenever referring to noise or a signal-to-noise ratio (SNR), a corresponding step, feature or effect is also disclosed for noise and/or interference or a signal-to-interference-and-noise ratio (SINR).

Fig.3 schematically illustrates a radio access network (RAN) 300 comprising a base station embodying the beamforming transceiver 110 and using an embodiment of a reconfigurable reflector 120 (e.g., a RIS) for massive multiple-input multiple-output (mMIMO), e.g., a RIS-aided mMIMO network 300.

The illustration of a RIS-aided mMIMO network 300 provided in Fig.3 shows a scenario in which line-of-sight (LoS) radio links between the base station (BS) 110 and user equipments (UEs) embodying the radio devices 130 are blocked by a large obstacle. While UEs 130 may be equipped with single or multiple antennas, the BS 110 is equipped with $N = N_v \times N_h$ antennas, where $N_v$ and $N_h$ denote the number of antennas in vertical and horizontal directions, respectively. Denoting the number of RF chains at the BS by $N_{\mathrm{RF}}$, the MIMO technology deployed in the BS may follow an all-digital ($N_{\mathrm{RF}} = N$), analog ($N_{\mathrm{RF}} = 1$) or hybrid ($1 < N_{\mathrm{RF}} < N$) architecture. Likewise, the RIS has a total of $M = M_v \times M_h$ elements, where $M_v$ and $M_h$ denote the number of elements in vertical and horizontal directions, respectively.

The RIS 120 can be further partitioned into $P$ partitions, each comprising a subset of the RIS elements (or M=P² x K). The complex-valued BS-RIS channel matrix and the RIS-UE channel vector of UE u are denoted by $\mathbf{G}$ and $\mathbf{h}_u^r$, each with a respective number of paths. Accordingly, the cascaded channel is obtained from these two channel states (e.g., including a matrix product between the channel state $\mathbf{G}$ and the channel state $\mathrm{diag}(\mathbf{h}_u^r)$). Notice that the channel state $\mathbf{G}$ is common for all UEs 130 and has a much longer coherence time than the RIS-UE channels, since the locations of the BS 110 and RIS 120 are typically fixed.
On the other hand, the RIS-UE channels $\mathbf{h}_u^r$ are user specific and dynamic due to the mobility, which yields a shorter coherence time than that of the BS-RIS channel.

The vector for a precoder and/or combiner of the BS 110 is a vector of phase shift elements, one per BS antenna, wherein each phase shift element can take any value from a $q_{\mathrm{BS}}$-bit uniform quantization over the phase range, leading to $2^{N q_{\mathrm{BS}}}$ precoder and/or combiner combinations. The UEs with similar channel characteristics (e.g., co-located UEs) are clustered and served with the same reflection beam. Denoting a number of clusters (or reflection beams) by $B$, the reflection beam vector for the b-th cluster is a vector of phase shift elements, one per RIS element, wherein each phase shift element can take any value from a $q_{\mathrm{RIS}}$-bit uniform quantization over the phase range, leading to $2^{M q_{\mathrm{RIS}}}$ reflection beam combinations per cluster.

Without loss of generality, let us consider a single-antenna UE for the sake of simplicity, whose received signal-to-noise ratio (SNR) is determined by the transmit SNR and the cascaded beamforming gain of UE u of the b-th cluster. A design of the precoder/combiner vector and the cluster reflection beam vector requires perfect channel state information (CSI), which is not readily available and hard to obtain as mentioned previously.

For instance, let us consider how the precoder/combiner vector could be determined in the 5G-NR standard. UEs select precoder matrices from Type I and Type II codebooks and report their selection with a precoding matrix indicator (PMI) to the BS during radio resource control (RRC) configuration. Both Type I and Type II codebooks are constructed from a 2-D discrete Fourier transform (DFT) based grid of beams and enable the CSI feedback of beam selection as well as PSK based co-phase combining between two polarizations. The DFT codebook for the precoder/combiner vector is computed based on the number of antennas in horizontal and vertical directions as well as the DFT oversampling ratios for horizontal and vertical direction, respectively.
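As a hedged illustration of a 2-D DFT-based grid of beams with oversampling, the following sketch builds a codebook as the Kronecker product of a horizontal and a vertical DFT beam set. The function names `dft_beams` and `grid_of_beams`, the 8 x 4 array layout for 32 antennas, the sign convention of the exponent and the 3-bit phase resolution are assumptions for illustration; the sketch is not the codebook definition of any standard.

```python
import numpy as np

def dft_beams(n_antennas, oversampling=1):
    """Rows are unit-norm 1-D DFT beams; oversampling multiplies the beam count."""
    n_beams = n_antennas * oversampling
    k = np.arange(n_beams)[:, None]      # beam index
    n = np.arange(n_antennas)[None, :]   # antenna index
    return np.exp(2j * np.pi * k * n / n_beams) / np.sqrt(n_antennas)

def grid_of_beams(n_h, n_v, o_h=1, o_v=1):
    """2-D codebook: each row is the Kronecker product of one horizontal and one vertical beam."""
    return np.kron(dft_beams(n_h, o_h), dft_beams(n_v, o_v))

# Assumed layout: 32 antennas arranged as 8 (horizontal) x 4 (vertical).
codebook = grid_of_beams(n_h=8, n_v=4)             # 32 beams
oversampled = grid_of_beams(8, 4, o_h=2, o_v=2)    # 128 beams, finer beam grid

# Number of distinct vectors with 3-bit phase quantization on 32 antennas.
n_quantized_vectors = 2 ** (3 * 32)
```

Even the oversampled grid holds only 128 beams, a vanishing fraction of the quantized phase combinations counted in `n_quantized_vectors`.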
The oversampling ratio simply indicates the beam sweeping granularity and enables a finer beam tracking at the cost of storing a larger codebook and a longer beam training.

Any embodiment of the method 200 (e.g., the proposed MA-DRL algorithm) jointly determines (i.e., designs) the BS beamforming vectors (i.e., the first set) and the RIS reflection codebook (i.e., the second sets) and finds a near-optimal solution over the huge search space mentioned above. The method 200 (e.g., using MA-DRL) differs from a single-agent DRL (SA-DRL) approach in that multiple agents cooperate and act jointly to achieve a common ultimate reward. MA-DRL is especially suitable for complex problems that can be decomposed into sub-problems, each of which is handled by a single DRL agent. In this manner, we decompose the joint master problem into two sub-problems: In the former, an SA-DRL agent obtains the BS combiner (i.e., active beamforming) for a given RIS reflection vector. While the learned BS combiner is common for all users/clusters since the RIS-BS channel is shared, user groups observe distinct channel characteristics to/from the RIS. Therefore, the latter sub-problem groups users with similar UE-RIS channels into B clusters and exploits B single DRL agents such that each DRL agent is responsible for designing the reflection (i.e., passive beamforming) vector of the b-th cluster.

The proposed approach operates in mainly two phases, a multi-agent learning phase and a deployment phase.

The obtaining 202 of the relative change 150 in the received signal strength and the jointly determining 204 of the first set and the at least one second set may be performed in an RL phase, wherein the first set replaces a or the deployed first set and/or the at least one second set replaces the at least one deployed second set in a deployment phase after the RL phase.
Different second sets are used at non-overlapping times in the deployment phase for transmitting data to and/or receiving data from different groups of the radio devices 130.

The method 200 may comprise a Multi-Agent Learning Phase (or mode), e.g., according to the steps 202 and 204. This mode may be executed first, wherein the MA-DRL agents are trained to learn the BS beamformer and/or combiner and the RIS reflection beam codebook from UEs 130 with established links with minimal impact on the wireless system performance. Preferably, the multi-agent learning phase is executed in the background to collect information (e.g., according to the step 202) over a relatively long period of time. During the learning phase mode, the BS 110 and RIS 120 may first exploit the traditional predefined codebooks, then occasionally generate their own precoding matrices based on what is learned and validate their learning performance. In this regard, the method 200 may be implemented in a backward compatible manner and/or may operate together with legacy technologies and standards.

The method 200 may further comprise a Deployment Phase (or mode), e.g. according to below-mentioned steps 206 and/or 208. Once the learning has converged, the learned beamformer (i.e., the first set) and RIS codebook (i.e., the at least one second set) replace the initial (e.g., conventionally predefined) codebooks. Since the RIS 120 may be configured to one of the B beams at a time, the RIS 120 can serve one cluster at a time. Preferably, the BS 110 is responsible for scheduling the exploitation (i.e., the controlling or beam steering) of the RIS 120 based on the quality-of-service (QoS) demands.
During the deployment stage, although UEs 130 that have similar channels will probably be assigned to the same RIS reflection vector (i.e., the same second set), these UEs are assumed to be scheduled on different time or frequency resources to avoid possible interference between them. For example, the same RIS reflection vector (i.e., the same second set) may serve multiple UEs in different sub-bands or in different time slots. Moreover, since the RIS partitioning is virtual rather than physical, the BS 110 may divide the RIS into P=B partitions and use one partition to serve each cluster. In this case, P=B partitions can serve P=B clusters at the same time. Even if using the entire RIS for one cluster at a time may yield a narrower beam with higher gain, having P=B relatively wider beams with relatively lower beam gain may also be beneficial in certain cases. The proposed approach is applicable to both cases.
Fig.4 schematically illustrates a RAN 300 performing multi-agent deep reinforcement learning (MA-DRL) for jointly determining the beamforming weights 112 of a precoder and/or a combiner of the BS 110 and the reflection weights 122 of the RIS 120 according to an embodiment of the method 200.
The multi-agent learning phase (or mode) is illustrated in Fig.4. In substeps 202-0 and 204-0, a DRL 0 agent (i.e., the first agent) is used to learn the BS beamformer and/or the BS precoder (i.e., the first set 112). Once the first set 112 is learned, B second agents (e.g., DRL agents) are used in further substeps of the steps 202 and 204 to learn the RIS reflection vectors (i.e., the second sets 122) for the B clusters to form the overall RIS reflection codebook.
BS Precoder Design: In an embodiment, first a random RIS reflection is fixed for all UEs 130 while determining (e.g., optimizing) the first set 112 (i.e., the BS combining and/or precoding pattern) using the first agent (e.g., DRL 0 ).
It is worth mentioning that the BS beam pattern is common among all the UEs and/or clusters, since the BS-RIS channel is shared due to the fixed BS and RIS locations. Based on RSSI feedback, the base station would train the agent to achieve the maximal reward, which corresponds to the maximal beamforming gain. The feedback may be specified to be a reward, here denoted $r(t)$, which is the bi-level (or binary-valued) reward determined by the beamforming gain of the currently chosen precoder vector and that of the previously chosen one, here denoted $g(t)$ and $g(t-1)$. That is, $r(t) = +1$ if $g(t) > g(t-1)$, and $r(t) = -1$ otherwise.
RIS Reflection Codebook Design: In order to reduce complexity and required codebook storage at the RIS, the fact that some UEs 130 share similar channels to and/or from the RIS 120 may be leveraged (e.g., as illustrated in Fig.5). Therefore, instead of learning an individual reflection codebook for each UE 130, the determining of the second sets 122 may exploit B independent second agents (e.g., DRL agents) to learn the RIS reflection patterns of B user clusters, a collection of which forms the RIS reflection codebook (i.e., the B second sets).
Any embodiment may perform a step 500 of k-Means UE Clustering, e.g., including at least one of the following steps or features. Since the technique does not depend on explicit CSI, which is not readily available, a k-means classifier exploits a set of RIS sensing beams (cf. Fig.5) that are randomly sampled from the feasible set of (e.g., at least k) RIS reflection beams. First, we use the obtained BS beamforming vector, and then use randomly sampled RIS sensing beams (or reflection vectors) as illustrated in Fig.5. The purpose of utilizing these RIS sensing beams is to gather sensing information in the form of receive combining gain. This information is used to cluster those UEs 130, developing a rough sense of their distribution in the environment. Here, the sensing beams are reflected through the RIS 120 instead of being sent from the BS 110 directly to the UEs 130.
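As a minimal sketch of the clustering step 500, the following Python code groups UEs by their received-power vectors, with one entry per RIS sensing beam. The function names, the farthest-first centroid initialization, and the sample power values are assumptions made for illustration only; the description does not prescribe them.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Farthest-first initialization (an assumption; any k-means init would do)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each UE goes to the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical normalized received powers: 6 UEs x 4 RIS sensing beams.
power_vectors = [
    [0.9, 0.8, 0.1, 0.2], [1.0, 0.7, 0.2, 0.1], [0.8, 0.9, 0.1, 0.1],
    [0.1, 0.2, 0.9, 1.0], [0.2, 0.1, 0.8, 0.9], [0.1, 0.1, 1.0, 0.8],
]
labels, centroids = kmeans(power_vectors, k=2)
print(labels)
```

UEs whose power vectors peak on the same sensing beams end up in the same cluster, which is the property the B second agents rely on when each learns one reflection vector per cluster.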
The BS 110 listens to the RSSI feedback reported from the UEs 130 during the beam training stage and accumulates the received power vectors. Once enough beam training power vectors are accumulated, the clustering 500 can be executed to train a k-means classifier to group users. It is worth noting that a newly deployed RIS 120 might rely on a random reflection codebook or a pre-defined codebook to serve the users.
Any embodiment may perform a RIS Partitioning and/or Cascaded Learning, e.g. according to the following substeps 202-1 and 204-1 (i.e., a first stage) as well as 202-2 and 204-2 (as a second stage) for the determining of each of the second sets 122. Upon user clustering, the RIS reflection codebook design of the clusters is independently learned by B DRL agents (DRL 1 , …, DRL B ). Nonetheless, the large number of RIS elements still renders the task of learning a single reflection vector highly complex and time-consuming. Therefore, the RIS array may be (virtually) partitioned into multiple sub-arrays, and a cascaded DRL learning approach may be used to lower the computational complexity. Fig.6 schematically illustrates the partitioning (in the lower left) and the two stages. The cascaded approach proceeds with the following steps: (1) learning the RIS reflection of a small RIS sub-array in the substeps 202-1 and 204-1; (2) extending the learned reflection sub-array to the full-sized array in the substeps 202-2 and 204-2; and optionally (3) refining the learning to obtain the entire reflection vector of the whole RIS surface.
We also define a reward, here denoted $r(t)$, which is the bi-level reward determined at the BS based on the beamforming gain obtained with the currently chosen RIS reflection vector and that obtained with the previous one, here denoted $g(t)$ and $g(t-1)$. That is, $r(t) = +1$ if $g(t) > g(t-1)$, and $r(t) = -1$ otherwise. Since the reward is determined at the BS, the BS only needs to feed back the reward value $r(t)$ to the RIS controller.
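The bi-level reward described above can be sketched in a few lines of Python. The function names and the sample gain values are hypothetical; the only behavior taken from the description is that the reward is +1 when the gain exceeds the previous one and -1 otherwise.

```python
def bilevel_reward(gain_now, gain_prev):
    """Return +1 if the beamforming gain increased, -1 otherwise."""
    return 1 if gain_now > gain_prev else -1


def reward_sequence(gains):
    """Rewards for consecutive gain measurements (one per weight update)."""
    return [bilevel_reward(now, prev) for prev, now in zip(gains, gains[1:])]


# Hypothetical gains measured after successive reflection-vector updates.
rewards = reward_sequence([1.0, 1.5, 1.2, 1.2, 2.0])
print(rewards)
```

Because only the sign of the change is fed back, a single bit per update suffices on the link between the BS and the RIS controller. An unchanged gain is treated as not rewarding.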
Hence, the RIS controller of the RIS 120 knows whether the chosen learned RIS reflection vector is rewarding or not and may update the DRL agent accordingly, as illustrated in Fig.5. The BS 110 (or the first agent) may determine in the step 204 its precoder and/or combiner weights 112 using the first DRL agent.
Fig.5 schematically illustrates an embodiment of the step of UE clustering. The sensing beams (e.g., the reflection weights 122 of the sensing beams) may be random sensing beams to cluster similar channels (e.g., UEs 130). The RIS (or the B second agents) may determine the B second sets (e.g., a RIS reflection codebook) using B DRL agents.
Furthermore, we use cascaded DRL agents that consecutively learn the phases in two stages, as shown in Fig.6. In the first stage, the first DRL agent learns the phases of the first sub-array (only the first RIS partition is turned ON) while the rest of the RIS elements are kept OFF. In the second stage, all RIS elements are activated and the second DRL agent learns the partitions' phase shifts to form the phases of the full-dimensional array. Consequently, the size of the search space is significantly decreased due to the RIS partitioning, which helps the algorithm to converge faster.
Fig.6 schematically illustrates a two-stage implementation for the determining of the reflection weights 122, e.g., cascaded RIS reflection learning stages 204-1 and 204-2.
The determining 204 of the at least one second set of reflector weights comprises determining 204-1 the reflector weights for one contiguous partition of reflector elements of the reconfigurable reflector in a first stage and determining 204-2 partition weights for multiple partitions of the reflector elements in a second stage.
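One way to picture the two stages is that every RIS element receives the sub-array phase learned in the first stage plus the phase shift of its partition learned in the second stage. The Python sketch below assumes unit-modulus weights, a one-dimensional ordering with contiguous partitions of equal size, and q-bit discrete phases for the search-space count; these modeling choices and all names are illustrative assumptions rather than details given in the description.

```python
import cmath
import math


def full_reflection_weights(sub_phases, partition_phases):
    """Combine stage-1 sub-array phases with stage-2 partition phases.

    Element k of partition p gets weight exp(j*(sub_phases[k] + partition_phases[p])),
    i.e. the product of the sub-array weight and the partition weight.
    """
    sub = [cmath.exp(1j * th) for th in sub_phases]
    part = [cmath.exp(1j * ph) for ph in partition_phases]
    return [p * s for p in part for s in sub]


def search_space_sizes(q, k_elements, n_partitions):
    """Number of candidate phase patterns with q-bit phases (illustrative count).

    Joint search: every one of the k_elements * n_partitions elements is free.
    Cascaded search: stage 1 over one sub-array, then stage 2 over the partitions.
    """
    joint = 2 ** (q * k_elements * n_partitions)
    cascaded = 2 ** (q * k_elements) + 2 ** (q * n_partitions)
    return joint, cascaded


weights = full_reflection_weights([0.0, math.pi / 2], [0.0, math.pi])
joint, cascaded = search_space_sizes(q=1, k_elements=4, n_partitions=4)
print(len(weights), joint, cascaded)
```

Under these assumptions, a 16-element RIS with 1-bit phases has 65536 joint patterns but only 32 candidates across the two cascaded stages, which illustrates why the partitioning shrinks the search space.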
Reflector weights of all reflector elements of the reconfigurable reflector result from multiplying the reflector weights for the one contiguous partition with the partition weights for the multiple partitions.
Any embodiment may comprise at least one of the following features on RIS technology. This section describes potential variations of the underlying RIS technology of the RIS 120. The technique is not limited to any manufacturing technology or any specific type of RIS, which can be flexibly designed to fulfill different functions on various operational frequencies (e.g., microwave, mmWave, terahertz, etc.). The existence and/or design considerations of RIS layers lead to different RIS categories, which are discussed below.
Active RIS: The active RIS can serve as a transmitter, receiver, and reflector. The term ‘active’ refers to the power consumed for emitting RF signals in the transmitter mode or amplifying impinging RF signals towards a predetermined direction in the reflector mode. The layered structure of an active RIS is pictured in Fig.7, where the outermost layer is manufactured using a metamaterial to achieve the desired radiation pattern through the intervention of a sensor and actuator layer. The metamaterial consists of meta particles (a.k.a. atoms), a group of which is called an RIS element. The numbers of elements in the vertical and horizontal directions are denoted by $N_v$ and $N_h$, which yields a total of $N = N_v \times N_h$ elements. The RIS can be partitioned into RIS blocks, each with $K = K_v \times K_h$ elements. As shown in Fig.7, the atoms consist of metallic patches (whose shape is specifically designed based on phase shift requirements), a central continuous strip, and varactor diodes. By constantly and independently adjusting the voltage pattern of the varactors, the EM properties of the continuous RIS can be dynamically controlled. On the contrary, a discrete RIS controls the EM properties by turning the diodes at each atom on and off.
The on-off pattern of the diodes determines the overall EM properties of the discrete RISs. The number of diodes used for each element determines the phase quantization; for instance, an element with q diodes yields $2^q$ phase shifts.
An RIS controller orchestrates the sensors and actuators through a control layer. The RIS controller is responsible for running specific calculations and algorithms to obtain the control layer's configuration, which yields the desired EM behavior. Although the RIS controller manages the RIS independently, it may receive objective and constraint sets from a central unit placed locally in the BS or remotely in a cloud. The RIS controller can also coordinate with the wireless modem, especially to fulfill various necessary physical layer tasks (e.g., channel training, channel sounding, and channel estimation). The metamaterial layer and the sensor-actuator layer are in general jointly manufactured and isolated from the electromagnetic behavior of the bottom layers by a shield material. In transceiver mode, the RF signal is transmitted and received through an RF wave distribution network built on the communication interface layer. In the transmitter mode, for example, the reference wave fed by the wireless modem into the distribution network is transformed into an object wave towards the desired direction, which is possible by changing the surface pattern accordingly.
The obtaining 202 of the relative change 150 of the received signal strength comprises receiving a or the feedback from the radio devices 130. The feedback is indicative of whether the received signal strength has increased or decreased responsive to a or the changing of the first set and/or the at least one second set. The feedback may be binary-valued, i.e., the feedback may be only indicative of either an increase or a decrease of the received signal strength as the relative change.
The feedback may be received for each of the radio devices 130 or for each of the groups of radio devices 130. The first set and/or the at least one second set may be changed at a sequence of points in time, e.g. periodically. The feedback may be indicative of whether or not the received signal strength has increased, e.g. compared to the previous point in time or compared to a running average of multiple previous points in time.
Fig.7 schematically illustrates an embodiment of the Reconfigurable Intelligent Surface (RIS) 120.
Passive RIS: The passive RIS does not need a communication interface layer, as it basically behaves as a reflector, scatterer, absorber, or polarizer, based on the underlying RIS configuration pattern. The main advantage of a passive RIS is that its power consumption is only at the level of USB and PoE. In this respect, a passive RIS is a power-and-cost-efficient way of controlling the wireless environment in a full-duplex mode.
Based on the RIS classification, our invention is applicable to both active and passive RIS, to their hybridization (a.k.a. semi-passive RIS), and to other technologies in the future.
DRL Based Beam Pattern Design: DRL is a computational approach to learning from interaction, i.e., how to map situations to actions that maximize a numerical reward function. Trial-and-error search and reward calculation are the two most important distinguishing features of DRL. Hence, we explore the twin delayed deep deterministic policy gradient (TD3)-DRL to solve our joint beamforming and codebook design problem. As shown in Fig.8, TD3 comprises three deep neural networks (DNNs): a single actor network and two critic networks. The actor network takes the state as input and outputs a continuous proto-action. Since the proto-actions do not necessarily comply with the available phase quantization levels, the quantizer maps them into the corresponding quantized phase shifts.
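The quantizer step and the role of the two critics can be sketched as follows. The uniform spacing of the $2^q$ phase levels, the function names, and the numeric values are assumptions for illustration; the target computation and the soft update follow TD3 as it is commonly formulated (minimum of the two critic estimates, slowly tracking target parameters), not an implementation disclosed here.

```python
import math


def quantize_phases(proto_phases, q):
    """Map continuous proto-action phases to the nearest of 2**q uniform levels.

    Assumes levels at multiples of 2*pi / 2**q; phases wrap around modulo 2*pi.
    """
    levels = 2 ** q
    step = 2 * math.pi / levels
    quantized = []
    for phase in proto_phases:
        index = round((phase % (2 * math.pi)) / step) % levels
        quantized.append(index * step)
    return quantized


def td3_target(reward, gamma, q1_next, q2_next):
    """Critic target using the smaller of the two target-critic estimates."""
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau):
    """Let target parameters slowly track the trained (online) parameters."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


phases = quantize_phases([0.1, 3.0, 6.2], q=1)   # 1-bit RIS: levels 0 and pi
target = td3_target(reward=1.0, gamma=0.9, q1_next=2.0, q2_next=3.0)
new_target_params = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
print(phases, target, new_target_params)
```

Taking the minimum of the two estimates counteracts the tendency of a single critic to overestimate the Q-value, while a small `tau` keeps the targets stable during training.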
After that, the state and action are passed together to the critic networks. Since a critic network can overestimate the true Q-value, TD3 selects the minimum of the two estimates coming from the two critic networks to limit the bias on the Q-value estimates. The actor network is used to approximate the action. To ensure computational stability and avoid training divergence, the actor and critic networks have duplicates, referred to as the target actor and target critic networks. Unlike the actor and critic networks, they are not trainable, but they are utilized for calculating the targets. Although they are not trainable, the parameters of the target actor and target critic networks are updated in a soft manner to slowly track the original networks.
Fig.8 schematically illustrates a TD3 DRL agent for the beam pattern design.
Fig.9 shows a flowchart of an embodiment of the multi-agent learning mode, i.e. the step 204 (including the step 202).
Fig.10 shows a flowchart of an embodiment of the deployment mode, i.e. the steps 206 and/or 208 of deploying the determined beamforming and reflection weights 112 and 122, respectively.
The method 200 may further comprise deploying 206 the first set for precoding a data transmission and/or combining a data reception at the antenna array. Alternatively or in addition, the method 200 may comprise deploying 208 the at least one second set for reflecting a data transmission or a data reception at the reflector array. Optionally, the one of the B second sets is selected for one of the radio devices 130 based on a maximum received signal strength in the radio communication with the one of the radio devices 130. Herein, deploying the first and/or second sets may encompass applying the respective weights to the antenna array and/or the reflector array.
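The selection of one of the B second sets per radio device can be sketched as an argmax over the received signal strengths measured for each set. The function names, the UE identifiers, and the RSS values in dBm below are hypothetical and only illustrate the selection rule.

```python
def select_second_set(rss_per_set):
    """Return the index b of the reflection set with the largest reported RSS."""
    return max(range(len(rss_per_set)), key=lambda b: rss_per_set[b])


def assign_ues(rss_reports):
    """Group UE identifiers by the index of their best reflection set."""
    groups = {}
    for ue, rss_per_set in rss_reports.items():
        groups.setdefault(select_second_set(rss_per_set), []).append(ue)
    return groups


# Hypothetical RSS reports (dBm), one value per second set (B = 3).
reports = {
    "ue1": [-80.0, -72.5, -90.1],
    "ue2": [-70.0, -75.0, -88.0],
    "ue3": [-85.0, -71.0, -79.0],
}
groups = assign_ues(reports)
print(groups)
```

UEs that land in the same group share a reflection vector and would then be scheduled on different time or frequency resources.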
Fig.11 schematically illustrates an embodiment of the RAN 300 in the deployment mode (e.g., RIS beam steering).
Flowcharts of embodiments of the method 200 for the multi-agent training and deployment modes are presented in Figs.8, 9, and 10. Moreover, Fig.11 shows how the beam training (or steering) works in the deployment mode, in which the RIS would steer along the direction of the reflection vectors found in the designed codebook and the UE would select the best one and report it to the BS.
One important use case of the technique is the application of the MA-DRL approach to a cell-free mMIMO network, which depends on centralized RAN (CRAN) technology. If each RIS partition is considered as the antenna array at the BS, the invention can be extended to collaboratively learn how to serve clusters. In this case, the DRL agents can even be located in the cloud to offload the computational complexity from the BS. Another important use case is deploying passive RISs instead of power-hungry BSs and orchestrating a RIS-assisted cell-free mMIMO network, where CRAN facilitates learning how to coordinate BSs and RISs to serve user clusters. The proposed approach can also be implemented using a federated learning approach to distribute learning tasks efficiently.
The subject technique results in energy improvements, namely at the level of Node equipment, network level, and society (i.e., even if the energy improvement is small, the accumulated improvement on a global level is substantial).
Fig.12 shows a schematic block diagram for an embodiment of the device 100. The device 100 comprises processing circuitry, e.g., one or more processors 1204 for performing the method 200 and memory 1206 coupled to the processors 1204. For example, the memory 1206 may be encoded with instructions that implement at least one of the modules 102 and 104.
The one or more processors 1204 may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, microcode and/or encoded logic operable to provide, either alone or in conjunction with other components of the device 100, such as the memory 1206, beamforming transceiver (e.g., MIMO base station) functionality. For example, the one or more processors 1204 may execute instructions stored in the memory 1206. Such functionality may include providing various features and steps discussed herein, including any of the benefits disclosed herein. The expression "the device being operative to perform an action" may denote the device 100 being configured to perform the action. As schematically illustrated in Fig.12, the device 100 may be embodied by a base station 1200, e.g., functioning as a transmitting or receiving base station or a relay UE. The base station 1200 comprises a radio interface 1202 coupled to the device 100 for radio communication with one or more other stations, e.g., functioning as the radio devices (e.g., UEs). Fig.13 shows a schematic block diagram for an embodiment of the device 100. The device 100 comprises processing circuitry, e.g., one or more processors 1304 for performing the method 200 and memory 1306 coupled to the processors 1304. For example, the memory 1306 may be encoded with instructions that implement at least one of the modules 102 and 104. 
The one or more processors 1304 may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, microcode and/or encoded logic operable to provide, either alone or in conjunction with other components of the device 100, such as the memory 1306, reconfigurable reflector (e.g., RIS) functionality. For example, the one or more processors 1304 may execute instructions stored in the memory 1306. Such functionality may include providing various features and steps discussed herein, including any of the benefits disclosed herein. The expression "the device being operative to perform an action" may denote the device 100 being configured to perform the action. As schematically illustrated in Fig.13, the device 100 may be embodied by a receiving station 1300, e.g., functioning as a reconfigurable reflector 120. The reconfigurable reflector 1300 comprises a radio interface 1302 coupled to the device 100 for radio communication with one or more other stations, e.g., functioning as the beamforming transceiver 110 and/or the UEs 130. Fig.14 shows a schematic block diagram for an embodiment of the device 100. The device 100 comprises processing circuitry, e.g., one or more processors 1404 for performing the method 200 and memory 1406 coupled to the processors 1404. For example, the memory 1406 may be encoded with instructions that implement at least one of the modules 102 and 104.
The one or more processors 1404 may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, microcode and/or encoded logic operable to provide, either alone or in conjunction with other components of the device 100, such as the memory 1406, (e.g., core) network node or cloud computing or edge computing functionality. For example, the one or more processors 1404 may execute instructions stored in the memory 1406. Such functionality may include providing various features and steps discussed herein, including any of the benefits disclosed herein. The expression "the device being operative to perform an action" may denote the device 100 being configured to perform the action. As schematically illustrated in Fig.14, the device 100 may be embodied by at least one of a (e.g., core) network node 1400, a cloud computing node 1400, and an edge computing node 1400. The node 1400 comprises a radio interface 1402 coupled to the device 100 for radio communication with one or more other stations, e.g., functioning as the beamforming transceiver 110 and/or the UEs 130. With reference to Fig.15, in accordance with an embodiment, a communication system 1500 includes a telecommunication network 1510, such as a 3GPP-type cellular network, which comprises an access network 1511, such as a radio access network, and a core network 1514. The access network 1511 comprises a plurality of base stations 1512a, 1512b, 1512c, such as NBs, eNBs, gNBs or other types of wireless access points, each defining a corresponding coverage area 1513a, 1513b, 1513c. Each base station 1512a, 1512b, 1512c is connectable to the core network 1514 over a wired or wireless connection 1515.
A first user equipment (UE) 1591 located in coverage area 1513c is configured to wirelessly connect to, or be paged by, the corresponding base station 1512c. A second UE 1592 in coverage area 1513a is wirelessly connectable to the corresponding base station 1512a. While a plurality of UEs 1591, 1592 are illustrated in this example, the disclosed embodiments are equally applicable to a situation where a sole UE is in the coverage area or where a sole UE is connecting to the corresponding base station 1512. Any of the base stations 1512 may embody the device 100. The telecommunication network 1510 is itself connected to a host computer 1530, which may be embodied in the hardware and/or software of a standalone server, a cloud-implemented server, a distributed server or as processing resources in a server farm. The host computer 1530 may be under the ownership or control of a service provider, or may be operated by the service provider or on behalf of the service provider. The connections 1521, 1522 between the telecommunication network 1510 and the host computer 1530 may extend directly from the core network 1514 to the host computer 1530 or may go via an optional intermediate network 1520. The intermediate network 1520 may be one of, or a combination of more than one of, a public, private or hosted network; the intermediate network 1520, if any, may be a backbone network or the Internet; in particular, the intermediate network 1520 may comprise two or more sub-networks (not shown). The communication system 1500 of Fig.15 as a whole enables connectivity between one of the connected UEs 1591, 1592 and the host computer 1530. The connectivity may be described as an over-the-top (OTT) connection 1550. 
The host computer 1530 and the connected UEs 1591, 1592 are configured to communicate data and/or signaling via the OTT connection 1550, using the access network 1511, the core network 1514, any intermediate network 1520 and possible further infrastructure (not shown) as intermediaries. The OTT connection 1550 may be transparent in the sense that the participating communication devices through which the OTT connection 1550 passes are unaware of routing of uplink and downlink communications. For example, a base station 1512 need not be informed about the past routing of an incoming downlink communication with data originating from a host computer 1530 to be forwarded (e.g., handed over) to a connected UE 1591. Similarly, the base station 1512 need not be aware of the future routing of an outgoing uplink communication originating from the UE 1591 towards the host computer 1530. By virtue of the method 200 being performed by any one of the base stations 1512, the performance or range of the OTT connection 1550 can be improved, e.g., in terms of increased throughput and/or reduced latency. More specifically, the host computer 1530 may indicate to the RAN 300 or the device 100 (e.g., on an application layer) the QoS of the traffic as a trigger to perform the method 200. Example implementations, in accordance with an embodiment of the UE, base station and host computer discussed in the preceding paragraphs, will now be described with reference to Fig.16. In a communication system 1600, a host computer 1610 comprises hardware 1615 including a communication interface 1616 configured to set up and maintain a wired or wireless connection with an interface of a different communication device of the communication system 1600. The host computer 1610 further comprises processing circuitry 1618, which may have storage and/or processing capabilities.
In particular, the processing circuitry 1618 may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The host computer 1610 further comprises software 1611, which is stored in or accessible by the host computer 1610 and executable by the processing circuitry 1618. The software 1611 includes a host application 1612. The host application 1612 may be operable to provide a service to a remote user, such as a UE 1630 connecting via an OTT connection 1650 terminating at the UE 1630 and the host computer 1610. In providing the service to the remote user, the host application 1612 may provide user data, which is transmitted using the OTT connection 1650. The user data may depend on the location of the UE 1630. The user data may comprise auxiliary information or precision advertisements (also: ads) delivered to the UE 1630. The location may be reported by the UE 1630 to the host computer, e.g., using the OTT connection 1650, and/or by the base station 1620, e.g., using a connection 1660. The communication system 1600 further includes a base station 1620 provided in a telecommunication system and comprising hardware 1625 enabling it to communicate with the host computer 1610 and with the UE 1630. The hardware 1625 may include a communication interface 1626 for setting up and maintaining a wired or wireless connection with an interface of a different communication device of the communication system 1600, as well as a radio interface 1627 for setting up and maintaining at least a wireless connection 1670 with a UE 1630 located in a coverage area (not shown in Fig.16) served by the base station 1620. The communication interface 1626 may be configured to facilitate a connection 1660 to the host computer 1610.
The connection 1660 may be direct, or it may pass through a core network (not shown in Fig.16) of the telecommunication system and/or through one or more intermediate networks outside the telecommunication system. In the embodiment shown, the hardware 1625 of the base station 1620 further includes processing circuitry 1628, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The base station 1620 further has software 1621 stored internally or accessible via an external connection. The communication system 1600 further includes the UE 1630 already referred to. Its hardware 1635 may include a radio interface 1637 configured to set up and maintain a wireless connection 1670 with a base station serving a coverage area in which the UE 1630 is currently located. The hardware 1635 of the UE 1630 further includes processing circuitry 1638, which may comprise one or more programmable processors, application-specific integrated circuits, field programmable gate arrays or combinations of these (not shown) adapted to execute instructions. The UE 1630 further comprises software 1631, which is stored in or accessible by the UE 1630 and executable by the processing circuitry 1638. The software 1631 includes a client application 1632. The client application 1632 may be operable to provide a service to a human or non-human user via the UE 1630, with the support of the host computer 1610. In the host computer 1610, an executing host application 1612 may communicate with the executing client application 1632 via the OTT connection 1650 terminating at the UE 1630 and the host computer 1610. In providing the service to the user, the client application 1632 may receive request data from the host application 1612 and provide user data in response to the request data.
The OTT connection 1650 may transfer both the request data and the user data. The client application 1632 may interact with the user to generate the user data that it provides. It is noted that the host computer 1610, base station 1620 and UE 1630 illustrated in Fig.16 may be identical to the host computer 1530, one of the base stations 1512a, 1512b, 1512c and one of the UEs 1591, 1592 of Fig.15, respectively. This is to say, the inner workings of these entities may be as shown in Fig.16, and, independently, the surrounding network topology may be that of Fig.15. In Fig.16, the OTT connection 1650 has been drawn abstractly to illustrate the communication between the host computer 1610 and the UE 1630 via the base station 1620, without explicit reference to any intermediary devices and the precise routing of messages via these devices. Network infrastructure may determine the routing, which it may be configured to hide from the UE 1630 or from the service provider operating the host computer 1610, or both. While the OTT connection 1650 is active, the network infrastructure may further take decisions by which it dynamically changes the routing (e.g., on the basis of load balancing consideration or reconfiguration of the network). The wireless connection 1670 between the UE 1630 and the base station 1620 is in accordance with the teachings of the embodiments described throughout this disclosure. One or more of the various embodiments improve the performance of OTT services provided to the UE 1630 using the OTT connection 1650, in which the wireless connection 1670 forms the last segment. More precisely, the teachings of these embodiments may reduce the latency and improve the data rate and thereby provide benefits such as better responsiveness and improved QoS. A measurement procedure may be provided for the purpose of monitoring data rate, latency, QoS and other factors on which the one or more embodiments improve. 
There may further be an optional network functionality for reconfiguring the OTT connection 1650 between the host computer 1610 and UE 1630, in response to variations in the measurement results. The measurement procedure and/or the network functionality for reconfiguring the OTT connection 1650 may be implemented in the software 1611 of the host computer 1610 or in the software 1631 of the UE 1630, or both. In embodiments, sensors (not shown) may be deployed in or in association with communication devices through which the OTT connection 1650 passes; the sensors may participate in the measurement procedure by supplying values of the monitored quantities exemplified above, or supplying values of other physical quantities from which software 1611, 1631 may compute or estimate the monitored quantities. The reconfiguring of the OTT connection 1650 may include message format, retransmission settings, preferred routing etc.; the reconfiguring need not affect the base station 1620, and it may be unknown or imperceptible to the base station 1620. Such procedures and functionalities may be known and practiced in the art. In certain embodiments, measurements may involve proprietary UE signaling facilitating measurements of throughput, propagation times, latency and the like by the host computer 1610. The measurements may be implemented in that the software 1611, 1631 causes messages to be transmitted, in particular empty or "dummy" messages, using the OTT connection 1650 while it monitors propagation times, errors etc.
Fig.17 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Figs.15 and 16. For simplicity of the present disclosure, only drawing references to Fig.17 will be included in this paragraph.
In a first step 1710 of the method, the host computer provides user data. In an optional substep 1711 of the first step 1710, the host computer provides the user data by executing a host application. In a second step 1720, the host computer initiates a transmission carrying the user data to the UE. In an optional third step 1730, the base station transmits to the UE the user data which was carried in the transmission that the host computer initiated, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional fourth step 1740, the UE executes a client application associated with the host application executed by the host computer.
Fig.18 is a flowchart illustrating a method implemented in a communication system, in accordance with one embodiment. The communication system includes a host computer, a base station and a UE which may be those described with reference to Figs.15 and 16. For simplicity of the present disclosure, only drawing references to Fig.18 will be included in this paragraph. In a first step 1810 of the method, the host computer provides user data. In an optional substep (not shown) the host computer provides the user data by executing a host application. In a second step 1820, the host computer initiates a transmission carrying the user data to the UE. The transmission may pass via the base station, in accordance with the teachings of the embodiments described throughout this disclosure. In an optional third step 1830, the UE receives the user data carried in the transmission.
As has become apparent from the above description, at least some embodiments of the technique can avoid (as opposed to existing technologies) depending on (i.e., requiring for determining the weights) at least one or each of (a) channel state information (CSI), (b) predefined generic codebooks, and (c) UE locations.
Same or further embodiments of the technique use multi-agent RL (e.g., MA-DRL), which can allow site-specific beamforming (e.g., based on the determined first set) and reflection codebook design (i.e., the at least one second set) without requiring any CSI and/or UE location information. This is possible by means of a ±1 feedback from the UEs over a long-term observation. Same or further embodiments can determine codebooks (i.e., second sets) of a much smaller size than traditional DFT codebooks, which substantially reduces the beam training (i.e., beam steering) overhead in the deployment since the BS 110 and UE 130 are assigned the best beam according to a clustering. Moreover, embodiments of the technique (e.g., using the MA-DRL) are not constrained by the channel coherence time, since the first and second agents (e.g., DRL agents) can perform the learning (e.g., adjust their decision to choose phases) solely based on the relative change of the received signal strength (e.g., the UE feedback).
Many advantages of the present invention will be fully understood from the foregoing description, and it will be apparent that various changes may be made in the form, construction and arrangement of the units and devices without departing from the scope of the invention and/or without sacrificing all of its advantages. Since the invention can be varied in many ways, it will be recognized that the invention should be limited only by the scope of the following embodiments.
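By way of a non-limiting illustration, the feedback-driven learning summarized above (two agents that adjust their respective phase weights using only a ±1 reward derived from the relative change of the received signal strength) may be sketched as follows. The sketch is not the MA-DRL of the disclosure: it substitutes a simple accept-or-revert trial step for the deep RL agents, and the toy cascaded channel, the phase quantization and all function names (feedback_reward, received_power, agent_step, run) are assumptions introduced only for this example. The channel coefficients are used solely by the simulated environment; the agents never read them.

```python
import cmath
import math
import random


def feedback_reward(prev_rss, new_rss):
    """Return +1 if the received signal strength improved, else -1.

    Treating "no change" as -1 is an assumption of this sketch.
    """
    return 1 if new_rss > prev_rss else -1


def received_power(bs_phases, ris_phases, h_bs_ris, h_ris_ue):
    """Toy environment (assumption): BS array -> reflector -> one UE antenna."""
    total = 0j
    for n, theta in enumerate(ris_phases):
        # Signal arriving at reflector element n from all BS antennas.
        incident = sum(h_bs_ris[n][m] * cmath.exp(1j * phi)
                       for m, phi in enumerate(bs_phases))
        total += h_ris_ue[n] * cmath.exp(1j * theta) * incident
    return abs(total) ** 2


def agent_step(phases, measure, prev_rss, levels, rng):
    """One trial by one agent: change a single phase, keep it only if reward is +1.

    The agent's state is its own phase set; it sees only the +/-1 reward.
    """
    idx = rng.randrange(len(phases))
    old = phases[idx]
    phases[idx] = 2 * math.pi * rng.randrange(levels) / levels
    new_rss = measure()
    reward = feedback_reward(prev_rss, new_rss)
    if reward < 0:
        phases[idx] = old  # revert the unsuccessful action
        return prev_rss, reward
    return new_rss, reward


def run(num_bs=4, num_ris=8, levels=8, iterations=300, seed=0):
    """Alternate the first (beamforming) and second (reflector) agent."""
    rng = random.Random(seed)

    def unit_gain():
        # Unit-modulus random channel coefficient (hidden from the agents).
        return cmath.exp(1j * rng.uniform(0, 2 * math.pi))

    h_bs_ris = [[unit_gain() for _ in range(num_bs)] for _ in range(num_ris)]
    h_ris_ue = [unit_gain() for _ in range(num_ris)]
    bs_phases = [0.0] * num_bs    # first set: beamforming weights (phases)
    ris_phases = [0.0] * num_ris  # second set: reflection weights (phases)

    def measure():
        return received_power(bs_phases, ris_phases, h_bs_ris, h_ris_ue)

    initial = rss = measure()
    for _ in range(iterations):
        rss, reward = agent_step(bs_phases, measure, rss, levels, rng)
        rss, reward = agent_step(ris_phases, measure, rss, levels, rng)
    return initial, rss


if __name__ == "__main__":
    before, after = run()
    print(f"received power before: {before:.2f}, after: {after:.2f}")
```

Because an action is kept only when the reward is +1, the received power in this toy model never decreases from one trial to the next. A learning agent as described in the disclosure would instead update a policy or value function from the same reward signal.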