Title:
NON-LINEAR ADAPTIVE NEURAL NETWORK EQUALIZER IN OPTICAL COMMUNICATION
Document Type and Number:
WIPO Patent Application WO/2019/191099
Kind Code:
A1
Abstract:
We propose and validate a novel nonlinear artificial neural network (ANN) equalizer for PAM-8 transmission in an IM/DD system. Mini-batch gradient descent is introduced to efficiently train the ANN equalizer. Using the proposed ANN equalizer, we successfully transmit a 40 GBaud PAM-8 signal over 4-km SMF with BER under the threshold of 3.8 × 10⁻³ and over 10-km SMF with BER under the threshold of 1 × 10⁻². We also compare the proposed ANN equalizer in detail with other methods including an LMS equalizer, a Volterra equalizer and a look-up table (LUT). Experimental results indicate that the ANN achieves the best performance, slightly superior to the Volterra equalizer, with computational complexity exponentially reduced. To the best of our knowledge, this is the first use of mini-batch gradient descent to train an ANN equalizer in an IM/DD system.

Inventors:
YU JIANJUN (US)
XIAO XIN (US)
ZHANG YUN (CN)
Application Number:
PCT/US2019/024076
Publication Date:
October 03, 2019
Filing Date:
March 26, 2019
Assignee:
ZTE CORP (CN)
YU JIANJUN (US)
XIAO XIN (US)
ZHANG YUN (CN)
International Classes:
H04B10/516; G06N3/08; G06N20/00; H04B10/54; H04B14/02
Foreign References:
US8107826B22012-01-31
US20120195602A12012-08-02
US20160087747A12016-03-24
Other References:
MUTSAM A. JARAJREH ET AL.: "Artificial Neural Network Nonlinear Equalizer for Coherent Optical OFDM", IEEE PHOTONICS TECHNOLOGY LETTERS, vol. 27, no. 4, 15 February 2015 (2015-02-15), pages 387 - 390, XP011570917, doi:10.1109/LPT.2014.2375960
CHUNKAI ZHANG ET AL.: "An ANN's Evolved by a New Evolutionary System and Its Application", PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, vol. 4, December 2000 (2000-12-01), XP010536603
Attorney, Agent or Firm:
SATHE, Vinay (US)
Claims:
CLAIMS

What is claimed is what is described and illustrated, including:

1. A method of optical communication, comprising:

receiving an optical signal that includes information bits modulated using an intensity modulation scheme;

performing front end processing on the optical signal to obtain a digital signal in an electrical domain;

equalizing, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer; and

extracting the information bits from an output of the equalizer stage.

2. The method of claim 1, wherein the artificial neural network was trained, during a training phase, using a mean square error criterion using a machine learning algorithm, wherein the training phase and the validation phase are temporally non-overlapping.

3. The method of claim 2, wherein the machine learning algorithm includes:

a forward propagation step in which an output signal estimate is calculated from a number of samples of the digital signal using a weight vector, and a cost function is computed based on a sum of squares of errors between expected values of the number of samples of the digital signal and the output signal estimate; and

a back-propagation step in which a gradient descent is adopted to optimize the cost function as a function of the weight vector, and wherein a batch of k samples from the digital signal are used for the error propagation step, wherein k is an integer that is greater than 1 and less than the number of samples.

4. The method of claim 2, wherein the artificial neural network was trained using an activation function that has an unbiased distribution characteristic.

5. The method of claim 4, wherein the activation function is a hyperbolic tangent tanh function.

6. The method of claim 1, wherein the performing the front end processing includes generating a waveform in the electrical domain by converting the optical signal using a photodiode element.

7. The method of claim 6, further including:

sampling the waveform in the electrical domain to produce the digital signal in the electrical domain.

8. The method of claim 1, wherein the performing the front end processing includes: synchronizing the digital signal based on an estimate of carrier frequency in the optical signal.

9. The method of claim 1, wherein the intensity modulation scheme comprises a pulse amplitude modulation scheme.

10. The method of claim 9, wherein the extracting the information bits includes performing pulse amplitude demodulation on the output of the equalizer stage.

11. An optical communication apparatus, comprising:

an optical receiver configured to receive an optical signal that includes information bits modulated using an intensity modulation scheme;

a photodiode to convert the optical signal into an electrical signal;

an analog to digital converter configured to convert the electrical signal into a digital signal;

a processor configured to equalize, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer; and

extract the information bits from an output of the equalizer stage.

12. The apparatus of claim 11, wherein the artificial neural network was trained, during a training phase, using a mean square error criterion using a machine learning algorithm, wherein the training phase and the validation phase are temporally non-overlapping.

13. The apparatus of claim 12, wherein the machine learning algorithm includes:

a forward propagation step in which an output signal estimate is calculated from a number of samples of the digital signal using a weight vector, and a cost function is computed based on a sum of squares of errors between expected values of the number of samples of the digital signal and the output signal estimate; and

a back-propagation step in which a gradient descent is adopted to optimize the cost function as a function of the weight vector, and wherein a batch of k samples from the digital signal are used for the error propagation step, wherein k is an integer that is greater than 1 and less than the number of samples.

14. The apparatus of claim 11, further including:

a synchronizer configured to synchronize the digital signal based on an estimate of carrier frequency in the optical signal.

15. The apparatus of claim 11, wherein the intensity modulation scheme comprises a pulse amplitude modulation scheme.

16. The apparatus of claim 15, wherein the extracting the information bits includes performing pulse amplitude demodulation on the output of the equalizer stage.

17. The apparatus of claim 12, wherein the artificial neural network uses, for computing hidden nodes, an activation function that exhibits a distribution characteristic that is unbiased with respect to an operand of the activation function.

18. An optical receiver apparatus comprising a processor configured to: receive a digital signal that includes an information signal modulated using an intensity modulation scheme;

equalize, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer; and

extract the information bits from an output of the equalizer stage.

19. The apparatus of claim 18, wherein the artificial neural network was trained, during a training phase, using a mean square error criterion using a machine learning algorithm, wherein the training phase and the validation phase are temporally non-overlapping.

20. The apparatus of claim 19, wherein the machine learning algorithm includes:

a forward propagation step in which an output signal estimate is calculated from a number of samples of the digital signal using a weight vector, and a cost function is computed based on a sum of squares of errors between expected values of the number of samples of the digital signal and the output signal estimate; and

a back-propagation step in which a gradient descent is adopted to optimize the cost function as a function of the weight vector, and wherein a batch of k samples from the digital signal are used for the error propagation step, wherein k is an integer that is greater than 1 and less than the number of samples.

21. The apparatus of claim 18, wherein the performing the front end processing includes: synchronizing the digital signal based on an estimate of carrier frequency in the optical signal.

22. The apparatus of claim 18, wherein the intensity modulation scheme comprises a pulse amplitude modulation scheme.

23. The apparatus of claim 22, wherein the extracting the information bits includes performing pulse amplitude demodulation on the output of the equalizer stage.

Description:
NON-LINEAR ADAPTIVE NEURAL NETWORK EQUALIZER IN OPTICAL COMMUNICATION

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This patent document claims the benefit of priority of International Application No. PCT/CN2018/080504, filed on March 26, 2018. The entire content of the above referenced international patent application is incorporated by reference as a part of this patent document.

TECHNICAL FIELD

[0002] This patent document relates to digital communication, and, in one aspect, optical communication systems that use pulse amplitude modulation.

BACKGROUND

[0003] There is an ever-growing demand for data communication in application areas such as wireless communication, fiber optic communication and so on. The demands on core and access networks are growing because not only are user devices such as smartphones and computers using more and more bandwidth due to multimedia applications, but the total number of devices for which data is carried over the whole network is also increasing. For profitability and to meet increasing demand, equipment manufacturers and network operators are continually looking for ways in which operational and capital expenditure can be reduced.

SUMMARY

[0004] The present document discloses techniques that can be implemented by optical receivers for achieving high throughput reception of intensity modulated optical signals.

[0005] In one example aspect, an example method of optical communication is disclosed. The method includes receiving an optical signal that includes information bits modulated using an intensity modulation scheme, performing front end processing on the optical signal to obtain a digital signal in an electrical domain, equalizing, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer, and extracting the information bits from an output of the equalizer stage.

[0006] In yet another aspect, an optical communication receiver apparatus is disclosed. The apparatus includes an optical receiver configured to receive an optical signal that includes information bits modulated using an intensity modulation scheme, a photodiode to convert the optical signal into an electrical signal, an analog to digital converter configured to convert the electrical signal into a digital signal, and a processor configured to equalize, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer, and extract the information bits from an output of the equalizer stage.

[0007] In yet another example aspect, an optical communication apparatus comprising a processor is disclosed. The processor is configured to receive a digital signal that includes an information signal modulated using an intensity modulation scheme, equalize, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer, and extract the information bits from an output of the equalizer stage.

[0008] These, and other aspects, are disclosed in the present document.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 shows an example of an equalizer with one hidden layer.

[0010] FIG. 2 shows an example embodiment for intensity modulation and direct detection (IM/DD) 40 GBaud PAM-8 transmission over 2-10 km SMF with a detailed DSP block.

[0011] FIG. 3 shows example performance of an embodiment of the proposed ANN equalizer: (a) BER versus batch size, (b) MSE (mean square error) versus number of iterations with training set and validation set, (c) PAM8 constellation with ANN at BER = 1.4 × 10⁻³, (d) PAM8 constellation with Volterra at BER = 2.1 × 10⁻³.

[0012] FIG. 4 shows the measured BER versus received optical power (dBm) with different algorithms at BTB (back to back) case: (a) with LUT involved, (b) without LUT involved, for an example embodiment.

[0013] FIG. 5 shows the measured BER versus received optical power (dBm) with different algorithms for examples of (a) at 2-km case, (b) at 4-km and 10-km case embodiments.

[0014] FIG. 6 is a flowchart for an example method of optical communication.

[0015] FIG. 7 is a block diagram of an example optical communication network.

[0016] FIG. 8 is a block diagram of an example implementation of an optical transmitter or receiver apparatus.

DETAILED DESCRIPTION

[0017] The advent of cloud computing and big data applications has provided continuous thrust for short reach optical communication with increasing bandwidth and low cost operation. Among the diverse potential technological solutions for these types of applications, intensity modulation and direct detection (IM/DD) systems have attracted much attention for their low cost and low complexity in an optical access system. Meanwhile, several solutions based on advanced modulation formats that achieve rates beyond 100 Gb/s per lane have been reported, such as pulse amplitude modulation (PAM), carrier-less amplitude phase (CAP) modulation and discrete multi-tone (DMT). Among them, PAM is quite promising due to its simple configuration and low power consumption. However, the nonlinear impairments introduced by electro-optical components at the transmitter and receiver sides of an IM/DD system become the major barrier that limits the system performance.

[0018] Many schemes based on digital signal processing (DSP) have been reported to address these issues. Volterra series transfer function (VSTF) and digital back-propagation (DBP) with inverse fiber parameters are common methods to compensate nonlinear channel impairment. However, the computational complexity of these methods is exorbitant due to a high number of processing steps. Alternatively, look-up-table (LUT) pre-distortion and simplified Volterra-based equalizers have been proved effective in reducing the algorithm complexity in IM/DD systems. Machine learning is seen as a powerful technique for estimating parameters in noisy environments and recognizing complex mappings between input and output data. Some artificial neural network (ANN) equalizers have been proposed to solve the nonlinear problems that arise in the machine learning based parameter estimation. Some implementations may use a multi-level ANN equalizer for 16QAM and 64QAM in a 60 GHz radio-over-fiber (RoF) system. In some implementations, an adaptive ANN equalizer is applied in millimeter wave RoF systems. In some implementations, a combined FIR and functional link ANN equalizer is used to compensate linear and nonlinear distortion in nonlinear channels. An ANN-based equalizer has also been proposed to equalize the fiber nonlinearity-induced penalty in coherent optical orthogonal frequency-division multiplexing (CO-OFDM) systems.

[0019] The techniques described in the present document can be used in embodiments in which a nonlinear ANN equalizer is used in an IM/DD system, such as for PAM-8 transmission. To accelerate training and to obtain converged results in relatively few iterations, mini-batch gradient descent is introduced to train the ANN equalizer. With the aid of the ANN scheme, embodiments can be implemented to successfully transmit a 40 GBaud PAM-8 signal over 4-km SMF with BER under the threshold of 3.8 × 10⁻³ and over 10-km SMF with BER under the threshold of 1 × 10⁻². In order to further evaluate the nonlinear compensation performance, the inventors compared the ANN equalizer with other popular methods including an LMS equalizer, a Volterra equalizer and a look-up table (LUT). Experimental results indicate that the ANN achieves the best performance, slightly superior to the Volterra equalizer, with computational complexity dramatically reduced. The document further discloses details of using mini-batch gradient descent to train an ANN equalizer in an IM/DD system.

[0020] In the rest of the document, section headings are used to improve readability and embodiments and techniques described in each section are not limited to the respective section, but could be combined with other embodiments and techniques described elsewhere in the document.

[0021] I. PRINCIPLE OF ANN EQUALIZER WITH MINI-BATCH GRADIENT DESCENT

[0022] An ANN has great potential for learning an intricate nonlinear mapping function between input and output data. Based on this intuition, we adopt an ANN to address the nonlinear distortion problem in an IM/DD system.

[0023] FIG. 1 shows a tap-delay ANN equalizer structure with one hidden layer. The ANN in this example is composed of an input layer, a hidden layer and an output layer with some internal neurons within each layer. The activation function in each neuron determines the nonlinear mapping characteristic across the network. In general, any continuous function can be approximated with a neural network with one hidden layer. In FIG. 1, the boxes "D" represent unit delay in the digital domain.
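As a non-limiting illustration, the chain of unit delays feeding the input layer may be sketched in Python as a sliding window over the sampled signal. The function name `tap_delay_vectors`, the window length `M` as a parameter, and the choice to skip the first M-1 warm-up positions are assumptions of this sketch and are not taken from FIG. 1.

```python
from typing import List


def tap_delay_vectors(x: List[float], M: int) -> List[List[float]]:
    """Return, for every n >= M-1, the window [x(n), x(n-1), ..., x(n-M+1)].

    Each unit delay "D" shifts the sample stream by one position, so the
    input layer sees the newest sample first, followed by M-1 older samples.
    Skipping the warm-up positions n < M-1 is an assumption of this sketch.
    """
    if M < 1 or M > len(x):
        raise ValueError("M must satisfy 1 <= M <= len(x)")
    return [[x[n - i] for i in range(M)] for n in range(M - 1, len(x))]


samples = [0.1, 0.2, 0.3, 0.4, 0.5]
windows = tap_delay_vectors(samples, M=3)
print(windows[0])    # [0.3, 0.2, 0.1]
print(len(windows))  # 3
```

Each returned window would then be presented to the input nodes of the network, one window per time index.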

[0024] In some embodiments, the ANN equalizer may be trained using a mean-square-error (MSE) criterion with a back-propagation (BP) algorithm. The whole BP process may be carried out in two steps: a forward propagation step, and a back-propagation step.

[0025] The first step is forward propagation, which calculates the output from the input feed. The output signal is obtained by applying the weight vectors and the activation function to the input vector [Equation (1) is not reproduced in this text], where

$$X(n) = [\,x(n) \;\; x(n-1) \;\; \cdots \;\; x(n-M+1)\,] \tag{2}$$

is the input vector, and $W_1(n)$ and $W_2(n)$ are the weight vectors depicted in FIG. 1. The variable n is an integer representing an arbitrary time index.
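As a non-limiting illustration, a forward propagation step for a network with one hidden layer may be sketched in Python as follows. The exact form of the forward equation is not legible in this text, so the structure used here (a tanh hidden layer followed by a linear output node, without bias terms) is an assumption of this sketch, as are the names `forward`, `W1` and `W2`.

```python
import math
from typing import List, Tuple


def forward(X: List[float], W1: List[List[float]], W2: List[float]) -> Tuple[List[float], float]:
    """One-hidden-layer forward pass: returns (hidden activations, output).

    Assumed form (not taken from the source equation):
        h_j = tanh(sum_i W1[j][i] * X[i]),   y = sum_j W2[j] * h_j
    """
    h = [math.tanh(sum(w * x for w, x in zip(row, X))) for row in W1]
    y = sum(w * hj for w, hj in zip(W2, h))
    return h, y


# Tiny example: M = 3 input taps, N = 2 hidden nodes.
X = [0.5, -0.25, 0.0]
W1 = [[1.0, 0.0, 0.0],
      [0.0, 2.0, 0.0]]
W2 = [1.0, -1.0]
h, y = forward(X, W1, W2)
print(h, y)
```

In this example both hidden nodes see an argument of magnitude 0.5, so the output equals 2·tanh(0.5).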

[0026] The second step is back-propagation, which propagates the calculated error term from the output layer back to the input layer. The cost function of one training example, $J(X, d)$, is defined from the error between the output signal and the desired signal $d$ [equation not reproduced in this text]. The desired signal may, for example, be a reference signal that is used for training. The cost function of the whole training set is defined over all $m$ training examples [equation not reproduced in this text], where $m$ is the total number of examples in the training set.
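As a non-limiting illustration, a squared-error cost consistent with the mean-square-error criterion may be sketched in Python as follows. The factor of 1/2 in the per-example cost and the averaging over the training set are assumptions of this sketch, since the cost equations are not legible in this text. The function names are hypothetical.

```python
from typing import Sequence


def example_cost(y: float, d: float) -> float:
    """Squared error for one training example (the 1/2 factor is an assumption)."""
    return 0.5 * (d - y) ** 2


def training_set_cost(ys: Sequence[float], ds: Sequence[float]) -> float:
    """Average the per-example cost over all m training examples."""
    if len(ys) != len(ds) or not ys:
        raise ValueError("need two non-empty sequences of equal length")
    m = len(ys)
    return sum(example_cost(y, d) for y, d in zip(ys, ds)) / m


outputs = [0.9, -1.2, 3.0]   # equalizer outputs y
desired = [1.0, -1.0, 3.0]   # reference (desired) signal d
total = training_set_cost(outputs, desired)
print(total)
```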

[0027] Gradient descent may be adopted to optimize the cost function with respect to $W_1(n)$ and $W_2(n)$. Depending on how much data is used to calculate the cost function, gradient descent can be divided into three variants: (1) batch gradient descent, (2) stochastic gradient descent, and (3) mini-batch gradient descent.

[0028] Batch gradient descent (BGD) computes the gradient of the cost function with respect to the parameters for the entire training set. The weight vector at the next time index can be obtained by

$$W_i(n+1) = W_i(n) - \frac{\partial J(X, d)}{\partial W_i(n)}, \qquad i = 1, 2 \tag{8}$$

[0029] Because the gradient of the whole training set is computed to perform only one update step, such a scheme may be very slow to train the ANN when encountering a large dataset.

[0030] The stochastic gradient descent (SGD), in contrast, performs one parameter update for only one training example.

[0031] SGD is the basic idea behind adaptive algorithms such as decision-directed least mean square (DDLMS) and the cascaded multi-modulus algorithm (CMMA). It is usually much faster to run than BGD. However, SGD performs frequent updates with a high variance, which usually causes the cost function to fluctuate dramatically. This usually complicates convergence to the exact minimum.

[0032] Mini-batch gradient descent (MGD) strikes a trade-off between batch gradient descent and SGD and performs an update for every mini-batch of k training examples.

[0033] In this way, MGD reduces the variance of each parameter update, which can lead to a more stable convergence state. Therefore, as described further in the next section, training an ANN nonlinear equalizer with mini-batch gradient descent provides superior results. The number k may be selected to be a number that is greater than 1 and less than m.
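As a non-limiting illustration, the three gradient descent variants may be sketched in Python with a single training routine in which only the batch size k differs. The network form (tanh hidden layer, linear output, no bias terms), the learning rate `lr`, the per-epoch shuffling, the weight initialisation and the toy channel used to generate data are all assumptions of this sketch and are not taken from the described embodiments.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def forward(X: Vector, W1: List[Vector], W2: Vector) -> Tuple[Vector, float]:
    h = [math.tanh(sum(w * x for w, x in zip(row, X))) for row in W1]
    return h, sum(w * hj for w, hj in zip(W2, h))


def mean_cost(data, W1, W2) -> float:
    return sum(0.5 * (d - forward(X, W1, W2)[1]) ** 2 for X, d in data) / len(data)


def train(data, n_hidden: int, k: int, lr: float, epochs: int, seed: int = 0):
    """Gradient descent with batch size k.

    k == len(data) gives batch gradient descent, k == 1 gives stochastic
    gradient descent, and 1 < k < len(data) gives mini-batch gradient descent.
    """
    rng = random.Random(seed)
    M = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(M)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # shuffling each epoch is an assumption of this sketch
        for start in range(0, len(data), k):
            batch = data[start:start + k]
            g1 = [[0.0] * M for _ in range(n_hidden)]
            g2 = [0.0] * n_hidden
            for X, d in batch:
                h, y = forward(X, W1, W2)
                err = y - d  # dJ/dy for J = 0.5 * (d - y)^2
                for j in range(n_hidden):
                    g2[j] += err * h[j]
                    back = err * W2[j] * (1.0 - h[j] ** 2)  # tanh' = 1 - tanh^2
                    for i in range(M):
                        g1[j][i] += back * X[i]
            scale = lr / len(batch)  # average the gradient over the batch
            for j in range(n_hidden):
                W2[j] -= scale * g2[j]
                for i in range(M):
                    W1[j][i] -= scale * g1[j][i]
    return W1, W2


# Toy channel (hypothetical): mild inter-symbol interference plus a cubic term.
rng = random.Random(1)
s = [rng.uniform(-1.0, 1.0) for _ in range(201)]
r = [0.0] + [s[n] + 0.3 * s[n - 1] + 0.1 * s[n] ** 3 for n in range(1, 201)]
data = [([r[n], r[n - 1]], s[n]) for n in range(2, 201)]

cost_before = mean_cost(data, *train(data, n_hidden=4, k=10, lr=0.1, epochs=0))
cost_after = mean_cost(data, *train(data, n_hidden=4, k=10, lr=0.1, epochs=50))
print(cost_before, cost_after)
```

Calling `train` with `epochs=0` returns the initial weights for the same seed, which gives a baseline cost against which the trained network can be compared.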

[0034] II. AN EXAMPLE EMBODIMENT

[0035] FIG. 2 shows an example setup for an IM/DD 40 GBaud PAM-8 transmission over 2-10 km SMF with a detailed DSP block. In various embodiments, the transmitter-side may be implemented at an optical transmitter and the receiver-side may be implemented at a corresponding optical receiver. The following abbreviations are used in FIG. 2. DAC: digital to analog converter, EA: electrical amplifier, SMF: single mode fiber, VOA: variable optical attenuator, PD: photonic detector, EDFA: erbium doped fiber amplifier, OSC: oscilloscope.

[0036] At the transmitter side, the 40 GBaud PAM-8 signal is generated off-line and then uploaded into a high speed digital-to-analog converter (DAC) with 80-GSa/s sample rate and 20-GHz 3-dB bandwidth. In some embodiments, the PAM-8 signal may be generated in real-time and may modulate information bits for transmission over the optical link. The information bits may include, for example, application layer and user data such as digital audio, video, images, and other traffic.

[0037] In an experimental setup, a pseudorandom bit sequence is used for testing performance of the system. The pseudorandom bit sequence (PRBS, 2¹⁶) signal is first mapped into PAM-8 symbols, followed by nonlinear look-up table (LUT) pre-distortion. Then, a decision-directed least mean square (DDLMS) FIR pre-equalization is adopted to combat the DAC bandwidth limitation and nonlinear distortion. One external cavity laser (ECL) at 1550 nm is modulated by the 40 GBaud electrically amplified (25-dB gain) PAM-8 signal via a Mach-Zehnder modulator (MZM) with 30 GHz 3-dB bandwidth.
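As a non-limiting illustration, the mapping of a bit sequence into PAM-8 symbols may be sketched in Python as follows. The Gray labelling, the amplitude levels -7 to +7, and the function name `map_pam8` are assumptions of this sketch, since the mapping used in the experimental setup is not stated.

```python
from typing import List

# Gray-coded 3-bit labels for the 8 amplitude levels -7, -5, ..., +7.
# Both the labelling and the level values are assumptions of this sketch.
GRAY_LABELS = [0b000, 0b001, 0b011, 0b010, 0b110, 0b111, 0b101, 0b100]
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]
BITS_TO_LEVEL = dict(zip(GRAY_LABELS, LEVELS))


def map_pam8(bits: List[int]) -> List[int]:
    """Group bits in threes (most significant bit first) and map to PAM-8 levels."""
    if len(bits) % 3:
        raise ValueError("bit count must be a multiple of 3")
    symbols = []
    for i in range(0, len(bits), 3):
        label = (bits[i] << 2) | (bits[i + 1] << 1) | bits[i + 2]
        symbols.append(BITS_TO_LEVEL[label])
    return symbols


symbols = map_pam8([0, 0, 0, 0, 1, 1, 1, 0, 0])
print(symbols)  # [-7, -3, 7]
```

With Gray labelling, adjacent amplitude levels differ in exactly one bit, so a decision error to a neighbouring level causes a single bit error.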

[0038] The modulated PAM-8 signal is launched into the fiber link (single mode fiber, SMF, of length 2-10 km) after being amplified by an Erbium-doped fiber amplifier (EDFA). Then a variable optical attenuator (VOA) is applied to adjust the received optical power for sensitivity measurement. In operational networks, the VOA may not be needed.

[0039] At the receiver, optical to electrical (O/E) conversion may be performed via a photonic detector (PD) with 3-dB bandwidth of 50 GHz. In an experimental setup, the converted electrical signal is sent into a real-time oscilloscope with 120-GSa/s sample rate and 45 GHz electrical bandwidth. In other optical receivers, an oscilloscope is not typically used. For the off-line DSP process (receivers for real-time data reception may use real-time signal processing), the resampled signal is first synchronized. Then the proposed ANN, Volterra and LMS equalizers are compared to analyze their performance. In order to further improve the BER performance, DD-LMS (decision directed least mean square) equalization may be added for precise decision. Finally, the signal is de-mapped and the BER for PAM-8 signals is calculated. It is worth noting that the pre-equalization based on DD-LMS FIR taps and the LUT pre-distortion are initially obtained in the BTB (back to back) case. The optical spectrum of the modulated MZM output is depicted as an inset in FIG. 2. It will be appreciated by one of skill in the art that some of the above-described functional blocks are optional (e.g., non-linear distortion correction may be implemented using another technique, or simply be omitted) while other functional blocks may be further complemented by other techniques such as polarization domain multiplexing of optical signals, and so on.
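As a non-limiting illustration, a decision-directed LMS equalization stage may be sketched in Python as follows. The tap count, the step size `mu`, the pass-through initialisation and the PAM-8 levels -7 to +7 are assumptions of this sketch and do not describe the DD-LMS configuration of the experimental setup.

```python
from typing import List, Sequence

LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)  # assumed PAM-8 amplitude levels


def decide(y: float) -> int:
    """Hard decision: nearest PAM-8 level."""
    return min(LEVELS, key=lambda level: abs(level - y))


def dd_lms(x: Sequence[float], n_taps: int, mu: float) -> List[float]:
    """Decision-directed LMS FIR equalizer; returns the equalized samples."""
    w = [0.0] * n_taps
    w[0] = 1.0  # start from a pass-through filter (assumed initialisation)
    out = []
    for n in range(n_taps - 1, len(x)):
        window = [x[n - i] for i in range(n_taps)]
        y = sum(wi * xi for wi, xi in zip(w, window))
        e = decide(y) - y  # the decision replaces a known training symbol
        w = [wi + mu * e * xi for wi, xi in zip(w, window)]
        out.append(y)
    return out


clean = [1.0, -3.0, 5.0, 7.0, -7.0]
equalized = dd_lms(clean, n_taps=2, mu=1e-3)
print(equalized)  # [-3.0, 5.0, 7.0, -7.0]
```

When the input already sits exactly on the decision levels, the error is zero and the taps are left unchanged, which is why the example returns its input. Because the error is formed from the receiver's own decisions, such a stage is generally used after an initial equalizer has already opened the eye.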

[0040] III. EXPERIMENTAL RESULTS

[0041] In an example ANN equalizer scheme, the total PAM8 signals are divided temporally into a training set used in the training phase and a test set (or validation set) used in the validation phase. During the training phase, the ANN is trained with mini-batch gradient descent under the criterion of minimizing the mean square error (MSE). FIG. 3(a) (302) depicts the BER with various batch sizes in a 40 GBaud PAM8 IM/DD system. About 40,000 examples are used as the training sequence. As can be concluded from FIG. 3(a), the best BER of 1.4 × 10⁻³ is achieved when the batch size is 200. Batch gradient descent was also investigated as a trial, but it is too time-consuming to obtain a converged result when the entire training set is input to get only one gradient update. With SGD, the random fluctuation of the cost function prevents reasonable results from being obtained.
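As a non-limiting illustration, the temporal division into training and validation sets, and the number of weight updates per pass over the training set for each gradient descent variant, may be sketched in Python as follows. The training set size of 40,000 and the batch size of 200 are taken from the text above. The total record length of 50,000 samples and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def temporal_split(samples: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """The first n_train samples form the training set, the rest the validation set."""
    if not 0 < n_train < len(samples):
        raise ValueError("n_train must leave both sets non-empty")
    return list(samples[:n_train]), list(samples[n_train:])


def updates_per_epoch(n_train: int, k: int) -> int:
    """Number of gradient updates in one pass over the training set."""
    return math.ceil(n_train / k)


received = [0.0] * 50_000  # hypothetical record length
train_set, validation_set = temporal_split(received, 40_000)

for k in (1, 200, 40_000):  # SGD, mini-batch, batch gradient descent
    print(k, updates_per_epoch(len(train_set), k))
# 1 40000
# 200 200
# 40000 1
```

With a batch size of 200, one pass over 40,000 training examples yields 200 updates, compared with a single update for batch gradient descent and 40,000 noisier updates for SGD.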

[0042] During the test phase (validation phase), the MSE versus number of iterations is plotted for the training set and validation set (test set) in FIG. 3(b) (304). The MSE of the Volterra equalizer is also depicted for comparison. The tap number of the Volterra equalizer is set as 150 for effective convergence of PAM8. In order to make a fair comparison, the input nodes and hidden nodes of the ANN equalizer are set as 150 and 40, respectively (e.g., values of M and N in FIG. 1). The activation function of the ANN is tanh (hyperbolic tangent) for its unbiased distribution characteristic. Other activation functions may be used, such as a sigmoid function or a binary function, etc. After about 100 epochs of mini-batch gradient descent, the MSE of both the training set and validation set is below that of the Volterra equalizer. The PAM-8 constellations after the ANN and Volterra equalizers, with BER of 1.4 × 10⁻³ and 2.1 × 10⁻³, are shown in FIG. 3(c) and FIG. 3(d) (306 and 308), respectively.

[0043] As a further validation of the advantageous aspect of the proposed scheme, we compared the BER performance of 40 GBaud PAM8 with LMS, second-order Volterra equalizer, ANN equalizer and DD-LMS. When LUT is involved at the transmitter side, the BER performance of all the mentioned algorithms is depicted in FIG. 4(a). While the LMS equalizer achieves the poorest BER performance, the Volterra equalizer, with nonlinear terms added in its structure, performs slightly better than LMS. The ANN equalizer can further reduce the BER to 4.3 × 10⁻³ when received optical power (ROP) is at 1.6 dBm. DD-LMS is adopted to improve PAM8 decision precision. When DD-LMS is used with both, the ANN can obtain about 1-dB receiver sensitivity improvement at BER of 3.8 × 10⁻³ (7% HD-FEC) compared with the Volterra equalizer. When LUT is not involved, as shown in FIG. 4(b), the ANN and Volterra equalizers achieve similar performance. DD-LMS can further obtain about 1-dB receiver sensitivity improvement at BER of 3.8 × 10⁻³.

[0044] We also validated the BER performance of the above-mentioned algorithms for the 2-, 4- and 10-km cases. For the 2-km case, whether or not LUT is adopted, FIG. 5(a) shows that the ANN slightly outperforms Volterra as the ROP varies. The BER gap between ANN and Volterra narrows as the ROP is reduced. As shown in the upper graph in FIG. 5, at 4 km, receiver embodiments can still obtain about 0.5-dB receiver sensitivity improvement at BER of 3.8 × 10⁻³ when utilizing ANN instead. In addition, the upper graph in FIG. 5 shows that another 1-dB receiver sensitivity improvement at BER of 2.4 × 10⁻³ (20% SD-FEC) can be achieved when adopting ANN in the 10-km case.

[0045] The computational complexity of the ANN and the second-order Volterra equalizer is summarized in Table 1. The computational complexity can be roughly exponentially reduced compared with the Volterra equalizer under similar circumstances. Considering the same memory sequence length M for convergence efficiency, nearly $O(M^2)$ filter tap coefficients are trained when adopting the Volterra equalizer, whereas in the proposed ANN with one output node, only a number of weights that grows linearly with M is trained with mini-batch gradient descent. That is the significant difference between linear increase and exponential increase.

Table 1. Computational complexity of the Volterra and ANN equalizers
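As a non-limiting illustration, the number of trainable parameters of the two equalizers may be counted in Python as follows. The contents of Table 1 are not reproduced in this text, so the counts below rest on assumptions: the second-order Volterra equalizer is taken to have M linear taps plus one coefficient per unique product pair, and the ANN is taken to have M·N input-to-hidden weights plus N hidden-to-output weights with no bias terms. The function names are hypothetical.

```python
def volterra2_coefficients(M: int) -> int:
    """Linear taps plus unique second-order products x(n-i)*x(n-j) with i <= j."""
    return M + M * (M + 1) // 2


def ann_weights(M: int, N: int) -> int:
    """Input-to-hidden weights plus hidden-to-output weights (biases omitted)."""
    return M * N + N


# Values of M and N used in the example embodiment (150 input nodes, 40 hidden nodes).
print(volterra2_coefficients(150))  # 11475
print(ann_weights(150, 40))         # 6040

# Doubling the memory length roughly quadruples the Volterra count
# but only doubles the ANN count for a fixed number of hidden nodes.
print(volterra2_coefficients(300))  # 45450
print(ann_weights(300, 40))         # 12040
```

Under these assumptions the Volterra count grows quadratically with the memory length M, whereas the ANN count grows linearly with M for a fixed number of hidden nodes N.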

[0046] We propose and experimentally demonstrate a novel ANN equalizer for high-speed PAM-8 transmission in an IM/DD system. Mini-batch gradient descent is introduced to train the ANN equalizer and obtain converged results for nonlinear distortion. Using the proposed ANN equalizer, we successfully transmit a 40 GBaud PAM-8 signal over 4-km SMF with BER under the threshold of 3.8 × 10⁻³ and over 10-km SMF with BER under the threshold of 1 × 10⁻². We also compare the proposed ANN equalizer in detail with other methods including an LMS equalizer, a Volterra equalizer and a look-up table (LUT). Experimental results indicate that the ANN slightly outperforms the Volterra equalizer with computational complexity exponentially reduced. This shows the promise of state-of-the-art neural-network based methods for solving nonlinear distortion problems in optical transmission systems.

[0047] In some embodiments, an optical signal processing method 600 may be implemented by a receiver apparatus to receive an optical signal that includes information bits modulated using intensity modulation. The IM technique may be a PAM modulation scheme such as PAM-2, PAM-4, PAM-8, PAM-16 and so on.

[0048] FIG. 6 is a flowchart for an example method 600 of optical communication. The method 600 may include receiving (602) an optical signal that includes information bits modulated using pulse amplitude modulation. Other intensity modulation schemes may also be used. The PAM schemes used may include 4, 8 or 16 (or higher) constellations.

[0049] The method 600 may include performing front end processing (604) on the optical signal to convert into electrical domain and to generate a digital signal.

[0050] The method 600 may include equalizing (606), during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer. The artificial neural network may be operated as described in Section II of this document. For example, in certain embodiments, a mini-batch based training technique may be used where the length of data samples is 200 (plus or minus 10) samples. For example, the optical signal processing may be performed during a training phase in which a reference signal (or another signal that is known a priori) may be received and local results may be compared against the received signal. The validation phase may then follow, in which the information bits may be extracted using the training of the ANN performed in the training phase. In some embodiments, the training phase may be of a duration that is a relatively small percentage of the validation phase. For example, less than 10% of the time may be spent in the training phase. In some embodiments, the training phase may be repeated on a periodic basis, or on an as-needed basis (e.g., when the bit error rate is perceived to go up, training may be performed more often).

[0051] As described in the example embodiments described in Section II, the training of the artificial neural network may have been done using a mean square error criterion using a machine learning algorithm. The training may include a forward propagation step in which an output signal estimate is calculated from a number of samples of the digital signal using a weight vector, and a cost function is computed based on a sum of squares of errors between expected values of the number of samples of the digital signal and the output signal estimate; and a back-propagation step in which a gradient descent is adopted to optimize the cost function as a function of the weight vector, and wherein a batch of k samples from the digital signal are used for the error propagation step, wherein k is an integer that is greater than 1 and less than the number of samples.

[0052] In some embodiments, during the validation phase, no back-propagation is performed and the non-linear equalization, learned during the training phase, may be used to map the input digital signals to constellation points of the intensity modulation signals, which are then converted into information bits.
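As an illustration of the mapping from equalized samples to constellation points and then to information bits, the following Python sketch makes a nearest-level decision for PAM-8 and converts each decided level into three bits. The level values (-7 to 7) and the Gray-coded bit mapping are assumptions made for the example; this document does not specify the bit labeling.

```python
# Assumed PAM-8 amplitude levels (hypothetical normalization).
PAM8_LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]


def decide(sample):
    """Index of the PAM-8 level nearest to an equalized sample."""
    return min(range(len(PAM8_LEVELS)), key=lambda i: abs(sample - PAM8_LEVELS[i]))


def gray_bits(index, width=3):
    """Gray-coded bits (MSB first) for a level index; the labeling is an assumption."""
    g = index ^ (index >> 1)
    return [(g >> shift) & 1 for shift in range(width - 1, -1, -1)]


def demap(samples):
    """Convert equalizer outputs to a flat list of information bits."""
    bits = []
    for s in samples:
        bits.extend(gray_bits(decide(s)))
    return bits


bits = demap([-6.8, 0.9, 7.4])
```

With a Gray mapping, adjacent levels differ in exactly one bit, so a decision error to a neighboring level costs a single bit error.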

[0053] The method 600 may include extracting (608) the information bits from an output of the equalizer stage. This operation may be performed by demodulating levels of pulse amplitude modulation to the corresponding data bits.

[0054] FIG. 7 depicts an example optical communication system 700 that includes one or more optical transmitters 702, communicating through an optical communication medium 704 with one or more optical communication receivers 706.

[0055] FIG. 8 is an example of an optical communications apparatus 800 that includes an optical receiver 802, a photodiode 804, an analog-to-digital converter 806, and a processor 808. The optical receiver 802 is configured to receive an optical signal that includes information bits modulated using an intensity modulation scheme. The photodiode 804 is configured to convert the optical signal into an electrical signal. The analog-to-digital converter 806 is configured to convert the electrical signal into a digital signal. The processor 808 is configured to equalize, during a validation phase, the digital signal using an equalizer stage in which equalization is performed using an artificial neural network based equalizer, and to extract the information bits from an output of the equalizer stage. The processor 808 may train, during a training phase, the artificial neural network using a mean square error criterion using a machine learning algorithm, wherein the training phase and the validation phase are temporally non-overlapping. The processor 808 may perform further functions described in Section II. In some embodiments, the optical receiver 802 and the photodiode 804 may be implemented in the same device, which receives light signals over an optical link and converts them into electrical signals, where an intensity of the received light signal may be proportional to the voltage or current amplitude of the electrical signal.

[0056] In some embodiments, an optical receiver apparatus (e.g., FIG. 8) may comprise a processor that is configured to implement the method 600. The optical receiver apparatus may be embodied into the receiving function implemented at the optical transmitter 702 (e.g., a network-side equipment such as an optical line terminal) or the receiving function at the optical receiver 706 (e.g., an optical network unit). In some embodiments, both the transmitter 702 and the receiver 706 are able to transmit and receive optical signals in a two-way communication network.

[0057] It will be appreciated that a novel nonlinear artificial neural network (ANN) equalizer for PAM-8 transmission in an IM/DD system is disclosed. Mini-batch gradient descent is introduced to efficiently train the ANN equalizer. It will also be appreciated that the disclosed techniques can be used to build optical equipment that can transmit a 40Gbaud PAM-8 signal over 4-km SMF with BER under the threshold of 3.8 × 10⁻³ and over 10-km SMF with BER under the threshold of 1 × 10⁻². It will also be appreciated that the ANN equalizer achieves the best performance, slightly superior to that of the Volterra equalizer, with computational complexity exponentially reduced. To the best of our knowledge, this is the first time mini-batch gradient descent has been adopted to train an ANN equalizer in an IM/DD system.

[0058] The disclosed and other embodiments, modules and the functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.

[0059] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

[0060] The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

[0061] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

[0062] While this patent document contains many specifics, these should not be construed as limitations on the scope of an invention that is claimed or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.

[0063] Only a few examples and implementations are disclosed. Variations, modifications, and enhancements to the described examples and implementations and other implementations can be made based on what is disclosed.