- Laith Sharba

# Natural Language Processing: Role of Encoders, Decoders, Attention, and Transformers

Attention is arguably the most important concept in Transformers, which is why it receives so much emphasis, but encoders and decoders are equally important, and we need all of them. The encoder-decoder architecture for recurrent neural networks (RNNs) has proven very influential on a host of sequence-to-sequence prediction problems in natural language processing, such as machine translation and caption generation.

Attention is a technique that addresses the limitation of the encoder-decoder architecture on long sequences. It speeds up learning and improves the skill of the model on sequence-to-sequence prediction problems.

In the tutorial below, we will discover how to develop an encoder-decoder RNN with attention in Python using Keras.

You will learn how to design a small, configurable problem for evaluating encoder-decoder RNNs, and we will discuss the encoder-decoder network both with and without attention.

So let’s start the journey:

### Python Environment

**This paper assumes you have a Python 3 SciPy environment installed.**

**You also need Keras (2.0 or higher) installed with either the TensorFlow or the Theano backend.**

**The paper also assumes that Pandas, scikit-learn, NumPy, and Matplotlib are installed and that you are familiar with them.**

If you need help setting up your environment, see [1].

### Encoder-Decoder with Attention

The encoder-decoder model built from RNNs is designed for sequence-to-sequence prediction problems.

As its name suggests, it combines two sub-models: an encoder and a decoder. The encoder steps over the input time steps and encodes the entire sequence into a fixed-length vector, called the context vector. The decoder, in turn, steps over the output time steps while reading from the context vector.

However, this architecture performs poorly on long input or output sequences. The reason is the fixed-size internal representation used by the encoder: the whole input, however long, has to be squeezed into one vector of the same size.
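To make the fixed-length bottleneck concrete, here is a toy NumPy encoder. It is a plain Elman-style RNN with random, untrained weights (an illustrative assumption, not the Keras LSTM encoder used in [2]), and the name `toy_encoder` is hypothetical. Whether the input has 3 or 50 time steps, the encoder hands the decoder a vector of the same size.

```python
import numpy as np

def toy_encoder(inputs, hidden_size=4, seed=0):
    """Compress a (time_steps, features) array into one fixed-length vector.

    Illustrative only: random untrained weights, h_t = tanh(W x_t + U h_{t-1}).
    """
    rng = np.random.default_rng(seed)
    n_features = inputs.shape[1]
    W = rng.normal(scale=0.5, size=(hidden_size, n_features))   # input-to-hidden weights
    U = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
    h = np.zeros(hidden_size)
    for x_t in inputs:                # step over the input time steps
        h = np.tanh(W @ x_t + U @ h)  # fold the new step into the running state
    return h                          # final state acts as the fixed-length context vector

short_seq = np.random.default_rng(1).random((3, 5))   # 3 time steps, 5 features
long_seq = np.random.default_rng(2).random((50, 5))   # 50 time steps, 5 features
print(toy_encoder(short_seq).shape, toy_encoder(long_seq).shape)  # (4,) (4,)
```

Both sequences end up as four numbers, so the longer input has to share the same limited capacity.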

Attention is an extension to the architecture that addresses this limitation. It provides a richer context from the encoder to the decoder, along with a learning mechanism through which the decoder learns where to pay attention in that richer encoding when predicting each time step of the output sequence.
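The core idea can be sketched in a few lines of NumPy. This assumes simple dot-product scoring (one of several scoring functions in use; the attention layer in [2] may score differently) and has no learned parameters. Instead of one fixed vector, the decoder sees every encoder hidden state, scores each against its current state, and takes a softmax-weighted sum as the context for that output step. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dot_product_attention(encoder_states, decoder_state):
    """Return (context, weights) for one decoder time step.

    encoder_states: (time_steps, hidden) array, one hidden state per input step.
    decoder_state:  (hidden,) array, the decoder's current hidden state.
    """
    scores = encoder_states @ decoder_state   # one alignment score per input step
    weights = softmax(scores)                 # normalise scores into a distribution
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

encoder_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 input steps, hidden size 2
decoder_state = np.array([2.0, 0.0])
context, weights = dot_product_attention(encoder_states, decoder_state)
print(weights.round(3))  # steps 1 and 3 receive most of the attention
print(context.round(3))
```

Because the weights are recomputed at every output step, the decoder can focus on different parts of the input as it predicts each element.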

For more on attention in the encoder-decoder architecture, see [3] and [4].

### Test Problem for Attention

Before we develop models with attention, we first define a contrived, scalable test problem that we can use to determine whether attention provides any benefit. In this test problem, we generate a sequence of random integers as input and a matching output sequence made up of a subset of the integers from the input sequence.

For example, an input sequence might be [2, 7, 3, 8, 4], and the expected output sequence might be its first two integers, [2, 7].

We will define the problem so that the input and output sequences have the same length, padding the output sequence with "0" values as needed.
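As a worked example of this definition, the hypothetical helper below keeps the first `n_out` integers of the input and pads with zeros to the input length.

```python
def expected_output(input_seq, n_out):
    """Keep the first n_out integers of input_seq and pad with 0s to the same length."""
    target = list(input_seq[:n_out])
    return target + [0] * (len(input_seq) - n_out)

print(expected_output([2, 7, 3, 8, 4], 2))  # [2, 7, 0, 0, 0]
```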

First, we need a function to generate a sequence of random integers. We can use the Python randint() function to generate random integers between 0 and a maximum value, and use this range as the cardinality of the problem (the number of features, or an axis of difficulty). The function generate_sequence() generates a sequence of random integers of a fixed length with the specified cardinality.
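A minimal sketch of generate_sequence(), assuming it takes the sequence length and the cardinality (called `n_unique` here, a name chosen for illustration). Note that `random.randint(a, b)` includes both endpoints, hence the `- 1`.

```python
from random import randint

def generate_sequence(length, n_unique):
    """Return `length` random integers drawn uniformly from 0..n_unique-1."""
    return [randint(0, n_unique - 1) for _ in range(length)]

print(generate_sequence(5, 50))  # e.g. five integers in the range 0 to 49
```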

We also need to encode each integer sequence for the model and to decode an encoded sequence. Decoding turns a prediction from the model, or an encoded expected sequence, back into a sequence of integers that we can read and evaluate.

Both steps can be written as short Python functions; for an example, see [2].

So the workflow is: generate a random sequence of integers, encode it, and then decode it.
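The round trip can be sketched as follows, assuming one-hot encoding (each integer becomes a binary vector with a single 1 at its index). The function names are hypothetical. Decoding uses argmax, so it also works on a model's predicted probability vectors, not only on exact one-hot rows.

```python
import numpy as np

def one_hot_encode(sequence, n_unique):
    """Turn a list of integers into a (len(sequence), n_unique) binary array."""
    encoding = np.zeros((len(sequence), n_unique), dtype=int)
    encoding[np.arange(len(sequence)), sequence] = 1   # set one 1 per row
    return encoding

def one_hot_decode(encoded_seq):
    """Map each one-hot (or probability) vector back to its most likely integer."""
    return [int(np.argmax(vector)) for vector in encoded_seq]

sequence = [2, 7, 3, 8, 4]
encoded = one_hot_encode(sequence, 10)
decoded = one_hot_decode(encoded)
print(encoded.shape)  # (5, 10)
print(decoded)        # [2, 7, 3, 8, 4]
```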

Finally, we need a function that creates input and output sequence pairs to train and evaluate a model. We can call it get_pair(). For details, see [2].

The function get_pair() returns one input-output sequence pair, given the input length, the cardinality, and the output length. Both sequences have the same length, namely the length of the input sequence. However, the output sequence consists of the first n integers of the input sequence, padded with zero values up to the required length.

These integer sequences are then encoded and reshaped into the 3D format required by recurrent neural networks, with the dimensions samples, time steps, and features. Here, samples is always 1 because get_pair() generates only one input-output pair at a time, time steps is the length of the input sequence, and features is the cardinality of the problem.
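Putting these pieces together, get_pair() could look like the NumPy sketch below. The argument order `(n_in, n_out, cardinality)`, the optional `rng` parameter, and one-hot encoding via rows of an identity matrix are assumptions made here, not details taken from [2].

```python
import numpy as np

def get_pair(n_in, n_out, cardinality, rng=None):
    """Return one (X, y) pair, each shaped (1, n_in, cardinality)."""
    rng = np.random.default_rng() if rng is None else rng
    sequence_in = rng.integers(0, cardinality, size=n_in)   # random integers in 0..cardinality-1
    sequence_out = np.zeros(n_in, dtype=int)                 # zero padding...
    sequence_out[:n_out] = sequence_in[:n_out]               # ...after the first n_out integers
    eye = np.eye(cardinality, dtype=int)                     # row k is the one-hot vector for k
    X = eye[sequence_in].reshape(1, n_in, cardinality)       # (samples, time steps, features)
    y = eye[sequence_out].reshape(1, n_in, cardinality)
    return X, y

X, y = get_pair(5, 2, 50, rng=np.random.default_rng(42))
print(X.shape, y.shape)  # (1, 5, 50) (1, 5, 50)
```

Passing a seeded generator makes a pair reproducible, which helps when comparing models with and without attention on the same data.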

The above is an outline of how this NLP setup works. We will continue this paper next week, so stay tuned: our main objective is to help you learn neural networks and NLP. Over the next few weeks we will cover the complete details, followed by the Transformer. Until next Monday, you can read up on the topics above, and we will be back with more details next week. Until then, goodbye.

### References:

1. https://machinelearningmastery.com/setup-python-environment-machine-learning-deep-learning-anaconda/

2. https://machinelearningmastery.com/encoder-decoder-attention-sequence-to-sequence-prediction-keras/

3. https://machinelearningmastery.com/attention-long-short-term-memory-recurrent-neural-networks/

4. https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/