How to calculate the number of parameters of an LSTM network?

Is there a way to calculate the total number of parameters in an LSTM network?

I found an example, but I am not sure how correct it is or whether I have understood it correctly.

For instance, consider the following:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()


Output

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
====================================================================================================
lstm_1 (LSTM)                      (None, 256)         4457472     lstm_input_1[0][0]               
====================================================================================================
Total params: 4457472
____________________________________________________________________________________________________

As I understand it, n is the input vector length and m is the number of time steps, and in the example from that post the number of hidden layers is taken to be 1.

Hence, according to the formula in the post, 4(nm + n^2), with m = 16, n = 4096 and num_of_units = 256 in my example:

4*((4096*16)+(4096*4096))*256 = 17246978048

Why is there such a difference? Did I misunderstand the example, or is the formula wrong?
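
For comparison, here is a minimal sketch that counts the parameters under the assumption that the layer uses the standard LSTM parameterization: four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector. The function names `lstm_param_count` and `formula_from_post` are hypothetical helpers written for this check, not part of Keras. Under that assumption the result matches the 4457472 shown in the summary, and the number of time steps does not enter the count, since the same weights are reused at every step.

```python
def lstm_param_count(input_dim, units):
    """Parameter count of one LSTM layer, assuming the standard
    four-gate parameterization with a bias vector per gate."""
    input_weights = units * input_dim   # input-to-hidden matrix, per gate
    recurrent_weights = units * units   # hidden-to-hidden matrix, per gate
    biases = units                      # bias vector, per gate
    return 4 * (input_weights + recurrent_weights + biases)


def formula_from_post(n, m, units):
    """The calculation attempted in the question: 4 * (n*m + n^2) * units."""
    return 4 * (n * m + n * n) * units


keras_reported = 4457472  # "Param #" from model.summary() above

print(lstm_param_count(input_dim=4096, units=256))    # count under the stated assumption
print(formula_from_post(n=4096, m=16, units=256))     # value computed in the question
print(lstm_param_count(4096, 256) == keras_reported)  # do the two agree?
```

If this assumption holds, the mismatch would come from how the symbols in the post's formula were mapped onto this layer (treating n as the input length and m as the number of time steps), not from the Keras summary.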

Answers (4)
