TensorFlow 2 makes it easy to count the number of trainable and non-trainable parameters of a model. This can be useful when we want to improve the model structure, reduce the size of a model, reduce the time taken for model predictions, and so on.
Let's say we have a model with two trainable and two non-trainable Dense layers. We can use the summary method to print a summary of the model.
from tensorflow import keras
import numpy as np
model = keras.Sequential([
keras.layers.Dense(4, activation='relu', input_shape=(3,)),
keras.layers.Dense(5, activation='relu', trainable=False),
keras.layers.Dense(10, activation='relu', trainable=False),
keras.layers.Dense(2, activation='softmax'),
])
model.summary()
At the end of the summary we can see the number of trainable and non-trainable parameters of the model.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 4) 16
_________________________________________________________________
dense_1 (Dense) (None, 5) 25
_________________________________________________________________
dense_2 (Dense) (None, 10) 60
_________________________________________________________________
dense_3 (Dense) (None, 2) 22
=================================================================
Total params: 123
Trainable params: 38
Non-trainable params: 85
There is another way to count the number of parameters of the model. We can use the trainable_weights and non_trainable_weights attributes, which contain the lists of trainable and non-trainable variables. Loop through the variables, multiply the dimensions of each variable's shape, and finally sum all the products. Instead of trainable_weights and non_trainable_weights we can use the trainable_variables and non_trainable_variables attributes.
# For each variable, multiply the dimensions of its shape, then sum over variables
trainable_params = np.sum([np.prod(v.get_shape()) for v in model.trainable_weights])
non_trainable_params = np.sum([np.prod(v.get_shape()) for v in model.non_trainable_weights])
total_params = trainable_params + non_trainable_params
print(trainable_params)      # 38
print(non_trainable_params)  # 85
print(total_params)          # 123
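The same prod-and-sum logic can be expressed without TensorFlow if we know the weight shapes. Below is a minimal sketch using the shapes of the four Dense layers from the model above (each layer has a kernel of shape (input, output) and a bias of shape (output,)); count_params is a hypothetical helper name.

```python
import numpy as np

# Kernel and bias shapes of the model above (assumed from the layer sizes):
trainable_shapes = [(3, 4), (4,), (10, 2), (2,)]       # dense, dense_3
non_trainable_shapes = [(4, 5), (5,), (5, 10), (10,)]  # dense_1, dense_2

def count_params(shapes):
    # Multiply the dimensions of each shape, then sum over all variables
    return int(np.sum([np.prod(s) for s in shapes]))

trainable = count_params(trainable_shapes)          # 12 + 4 + 20 + 2 = 38
non_trainable = count_params(non_trainable_shapes)  # 20 + 5 + 50 + 10 = 85
```

This reproduces the totals from the summary: 38 trainable, 85 non-trainable, 123 in total.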
Below we provide the formulas, with worked examples, for counting the number of trainable and non-trainable parameters of a model without writing any code.
Dense layers
Formula:
num_params = (input_size + bias) * output_size
bias = 1
Model:
model = keras.Sequential([
keras.layers.Dense(5, activation='relu', input_shape=(3,)),
keras.layers.Dense(8, activation='relu', trainable=False),
keras.layers.Dense(10, activation='relu', trainable=False),
keras.layers.Dense(15, activation='relu'),
keras.layers.Dense(4, activation='softmax'),
])
Calculation:
dense = (3 + 1) * 5 = 20
dense_1 = (5 + 1) * 8 = 48
dense_2 = (8 + 1) * 10 = 90
dense_3 = (10 + 1) * 15 = 165
dense_4 = (15 + 1) * 4 = 64
Trainable params: dense + dense_3 + dense_4 = 20 + 165 + 64 = 249
Non-trainable params: dense_1 + dense_2 = 48 + 90 = 138
Total params: 249 + 138 = 387
Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 5) 20
_________________________________________________________________
dense_1 (Dense) (None, 8) 48
_________________________________________________________________
dense_2 (Dense) (None, 10) 90
_________________________________________________________________
dense_3 (Dense) (None, 15) 165
_________________________________________________________________
dense_4 (Dense) (None, 4) 64
=================================================================
Total params: 387
Trainable params: 249
Non-trainable params: 138
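The Dense formula above can be checked with a few lines of plain Python; dense_params is a hypothetical helper implementing num_params = (input_size + bias) * output_size.

```python
def dense_params(input_size, output_size, bias=1):
    # num_params = (input_size + bias) * output_size
    return (input_size + bias) * output_size

# (input_size, output_size) pairs of the five Dense layers above
layers = [(3, 5), (5, 8), (8, 10), (10, 15), (15, 4)]
counts = [dense_params(i, o) for i, o in layers]  # [20, 48, 90, 165, 64]
total = sum(counts)                               # 387
```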
Convolution layers
Formula:
num_params = (input_channels * kernel_height * kernel_width + bias) * output_channels
bias = 1
Model:
model = keras.Sequential([
keras.layers.Conv2D(64, (6, 6), activation='relu', input_shape=(28, 28, 3)),
keras.layers.MaxPooling2D(2, 2),
keras.layers.Conv2D(32, (4, 4), activation='relu'),
keras.layers.MaxPooling2D((2, 2)),
])
Calculation:
conv2d = (3 * 6 * 6 + 1) * 64 = 6976
conv2d_1 = (64 * 4 * 4 + 1) * 32 = 32800
Trainable params: conv2d + conv2d_1 = 6976 + 32800 = 39776
Non-trainable params: 0
Total params: 39776
Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 23, 23, 64) 6976
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 11, 11, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 8, 32) 32800
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 32) 0
=================================================================
Total params: 39,776
Trainable params: 39,776
Non-trainable params: 0
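The same check works for the convolution formula; conv2d_params is a hypothetical helper. Pooling layers have no parameters, so they contribute nothing.

```python
def conv2d_params(input_channels, kernel_h, kernel_w, output_channels, bias=1):
    # num_params = (input_channels * kernel_height * kernel_width + bias) * output_channels
    return (input_channels * kernel_h * kernel_w + bias) * output_channels

conv2d = conv2d_params(3, 6, 6, 64)     # (3*36 + 1) * 64 = 6976
conv2d_1 = conv2d_params(64, 4, 4, 32)  # (64*16 + 1) * 32 = 32800
total = conv2d + conv2d_1               # 39776
```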
Convolution and Dense layers
Model:
model = keras.Sequential([
keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 3)),
keras.layers.MaxPooling2D((2, 2)),
keras.layers.Conv2D(32, (3, 3), activation='relu'),
keras.layers.MaxPooling2D((2, 2)),
keras.layers.Conv2D(64, (3, 3), activation='relu'),
keras.layers.MaxPooling2D((2, 2)),
keras.layers.Flatten(),
keras.layers.Dense(512, activation='relu'),
keras.layers.Dense(1)
])
Calculation:
conv2d = (3 * 3 * 3 + 1) * 16 = 448
conv2d_1 = (16 * 3 * 3 + 1) * 32 = 4640
conv2d_2 = (32 * 3 * 3 + 1) * 64 = 18496
dense = (64 + 1) * 512 = 33280
dense_1 = (512 + 1) * 1 = 513
Trainable params: 448 + 4640 + 18496 + 33280 + 513 = 57377
Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 26, 26, 16) 448
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 11, 11, 32) 4640
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 3, 3, 64) 18496
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 1, 1, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 64) 0
_________________________________________________________________
dense_5 (Dense) (None, 512) 33280
_________________________________________________________________
dense_6 (Dense) (None, 1) 513
=================================================================
Total params: 57,377
Trainable params: 57,377
Non-trainable params: 0
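The only new step in this mixed model is finding the input size of the first Dense layer, which is the flattened output of the last pooling layer. A sketch of that trace, assuming 'valid' padding and stride 1 for the convolutions and non-overlapping 2x2 pooling (conv_out and pool_out are hypothetical helpers):

```python
def conv_out(size, kernel):  # 'valid' padding, stride 1
    return size - kernel + 1

def pool_out(size, pool):    # non-overlapping pooling
    return size // pool

# Trace the spatial size of a 28x28x3 input through the model above
s = 28
s = pool_out(conv_out(s, 3), 2)  # conv2d: 26, pool: 13
s = pool_out(conv_out(s, 3), 2)  # conv2d_1: 11, pool: 5
s = pool_out(conv_out(s, 3), 2)  # conv2d_2: 3, pool: 1
flatten_size = s * s * 64        # 1 * 1 * 64 = 64

total = ((3 * 3 * 3 + 1) * 16        # conv2d: 448
         + (16 * 3 * 3 + 1) * 32     # conv2d_1: 4640
         + (32 * 3 * 3 + 1) * 64     # conv2d_2: 18496
         + (flatten_size + 1) * 512  # dense: 33280
         + (512 + 1) * 1)            # dense_1: 513
```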
Recurrent layers
Formula:
num_params = num_ffns * (output_size * (output_size + input_size) + output_size)
num_ffns = 1 (SimpleRNN)
num_ffns = 3 (GRU)
num_ffns = 4 (LSTM)
Here num_ffns is the number of feed-forward networks (gates) inside the recurrent cell. If the reset_after parameter of the GRU layer is set to True (the default in TensorFlow 2), the number of trainable parameters is calculated with the formula:
num_params = num_ffns * (output_size * (output_size + input_size) + 2 * output_size)
Model:
model = keras.Sequential([
keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 5)),
keras.layers.GRU(32, return_sequences=True),
keras.layers.GRU(16, return_sequences=True, reset_after=False),
keras.layers.SimpleRNN(8)
])
Calculation:
lstm = 4 * (64 * (64 + 5) + 64) = 17920
gru = 3 * (32 * (32 + 64) + 2 * 32) = 9408
gru_1 = 3 * (16 * (16 + 32) + 16) = 2352
simple_rnn = 1 * (8 * (8 + 16) + 8) = 200
Trainable params: 17920 + 9408 + 2352 + 200 = 29880
Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, None, 64) 17920
_________________________________________________________________
gru (GRU) (None, None, 32) 9408
_________________________________________________________________
gru_1 (GRU) (None, None, 16) 2352
_________________________________________________________________
simple_rnn (SimpleRNN) (None, 8) 200
=================================================================
Total params: 29,880
Trainable params: 29,880
Non-trainable params: 0
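Both recurrent formulas fit in one function; rnn_params is a hypothetical helper, with the reset_after flag switching to the doubled-bias GRU variant.

```python
def rnn_params(num_ffns, input_size, output_size, reset_after=False):
    # reset_after=True (the TF2 GRU default) uses two bias terms per gate
    bias = 2 * output_size if reset_after else output_size
    return num_ffns * (output_size * (output_size + input_size) + bias)

lstm = rnn_params(4, 5, 64)                    # 17920
gru = rnn_params(3, 64, 32, reset_after=True)  # 9408
gru_1 = rnn_params(3, 32, 16)                  # 2352
simple_rnn = rnn_params(1, 16, 8)              # 200
```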
Bidirectional recurrent layers
Formula:
num_params = 2 * num_params_recurrent_layer
Note that a bidirectional layer concatenates the outputs of its forward and backward copies, so it outputs 2 * output_size features and the input size of the following layer doubles.
Model:
model = keras.Sequential([
keras.layers.Bidirectional(
keras.layers.LSTM(64, return_sequences=True),
input_shape=(None, 5)
),
keras.layers.Bidirectional(
keras.layers.GRU(32, return_sequences=True)
),
keras.layers.Bidirectional(
keras.layers.GRU(16, return_sequences=True, reset_after=False)
),
keras.layers.Bidirectional(
keras.layers.SimpleRNN(8)
)
])
Calculation:
lstm = 2 * 4 * (64 * (64 + 5) + 64) = 35840
gru = 2 * 3 * (32 * (32 + 128) + 2 * 32) = 31104
gru_1 = 2 * 3 * (16 * (16 + 64) + 16) = 7776
simple_rnn = 2 * 1 * (8 * (8 + 32) + 8) = 656
Trainable params: 35840 + 31104 + 7776 + 656 = 75376
Summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional (Bidirectional (None, None, 128) 35840
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 64) 31104
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 32) 7776
_________________________________________________________________
bidirectional_3 (Bidirection (None, 16) 656
=================================================================
Total params: 75,376
Trainable params: 75,376
Non-trainable params: 0
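The bidirectional case is just double the wrapped layer's count, but with the doubled input sizes flowing through; bidirectional_params is a hypothetical helper building on the recurrent formula.

```python
def bidirectional_params(num_ffns, input_size, output_size, reset_after=False):
    # Two copies (forward and backward) of the wrapped recurrent layer
    bias = 2 * output_size if reset_after else output_size
    return 2 * num_ffns * (output_size * (output_size + input_size) + bias)

# Each bidirectional layer outputs 2 * output_size features,
# so the next layer's input_size is doubled.
bi_lstm = bidirectional_params(4, 5, 64)                     # 35840
bi_gru = bidirectional_params(3, 128, 32, reset_after=True)  # 31104
bi_gru_1 = bidirectional_params(3, 64, 16)                   # 7776
bi_simple_rnn = bidirectional_params(1, 32, 8)               # 656
```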