Count Number of Parameters of Model in TensorFlow 2

TensorFlow 2 makes it easy to count the number of trainable and non-trainable parameters of a model. This can be useful when we want to improve the model structure, reduce the size of a model, reduce the time taken for model predictions, and so on.

Let’s say we have a model with two trainable and two non-trainable Dense layers. We can use the summary() method to print a summary of the model.

from tensorflow import keras
import numpy as np

model = keras.Sequential([
    keras.layers.Dense(4, activation='relu', input_shape=(3,)),
    keras.layers.Dense(5, activation='relu', trainable=False),
    keras.layers.Dense(10, activation='relu', trainable=False),
    keras.layers.Dense(2, activation='softmax'),
])

model.summary()

At the end of the summary we can see the number of trainable and non-trainable parameters of the model.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 4)                 16        
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 25        
_________________________________________________________________
dense_2 (Dense)              (None, 10)                60        
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 22        
=================================================================
Total params: 123
Trainable params: 38
Non-trainable params: 85

There is another way to count the number of parameters of the model. We can use the trainable_weights and non_trainable_weights attributes, which contain the lists of trainable and non-trainable variables. Loop through the variables, multiply the dimensions of each variable's shape, and finally sum all the products.

Instead of trainable_weights and non_trainable_weights we can use the equivalent trainable_variables and non_trainable_variables attributes.

# For each variable, multiply the dimensions of its shape, then sum over all variables
trainableParams = np.sum([np.prod(v.get_shape()) for v in model.trainable_weights])
nonTrainableParams = np.sum([np.prod(v.get_shape()) for v in model.non_trainable_weights])
totalParams = trainableParams + nonTrainableParams

print(trainableParams)     # 38
print(nonTrainableParams)  # 85
print(totalParams)         # 123

Below we provide the formulas and examples showing how to count the number of trainable and non-trainable parameters of a model without writing any code.

Dense layers

Formula:

num_params = (input_size + bias) * output_size
bias = 1

Model:

model = keras.Sequential([
    keras.layers.Dense(5, activation='relu', input_shape=(3,)),
    keras.layers.Dense(8, activation='relu', trainable=False),
    keras.layers.Dense(10, activation='relu', trainable=False),
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(4, activation='softmax'),
])

Calculation:

dense = (3 + 1) * 5 = 20
dense_1 = (5 + 1) * 8 = 48
dense_2 = (8 + 1) * 10 = 90
dense_3 = (10 + 1) * 15 = 165
dense_4 = (15 + 1) * 4 = 64

Trainable params: dense + dense_3 + dense_4 = 20 + 165 + 64 = 249
Non-trainable params: dense_1 + dense_2 = 48 + 90 = 138
Total params: 249 + 138 = 387
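The calculation above can be reproduced with a small helper function (dense_params is our own name for this sketch, not a Keras API):

```python
def dense_params(input_size, output_size, bias=1):
    """num_params = (input_size + bias) * output_size"""
    return (input_size + bias) * output_size

# (input_size, output_size) for each Dense layer in the model
layers = [(3, 5), (5, 8), (8, 10), (10, 15), (15, 4)]
counts = [dense_params(i, o) for i, o in layers]
print(counts)       # [20, 48, 90, 165, 64]
print(sum(counts))  # 387
```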

Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 5)                 20        
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 48        
_________________________________________________________________
dense_2 (Dense)              (None, 10)                90        
_________________________________________________________________
dense_3 (Dense)              (None, 15)                165       
_________________________________________________________________
dense_4 (Dense)              (None, 4)                 64        
=================================================================
Total params: 387
Trainable params: 249
Non-trainable params: 138

Convolution layers

Formula:

num_params = (input_channels * kernel_height * kernel_width + bias) * output_channels
bias = 1

Model:

model = keras.Sequential([
    keras.layers.Conv2D(64, (6, 6), activation='relu', input_shape=(28, 28, 3)),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(32, (4, 4), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
])

Calculation:

conv2d = (3 * 6 * 6 + 1) * 64 = 6976
conv2d_1 = (64 * 4 * 4 + 1) * 32 = 32800

Trainable params: conv2d + conv2d_1 = 6976 + 32800 = 39776
Non-trainable params: 0
Total params: 39776
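The same arithmetic as a sketch in code (conv2d_params is a hypothetical helper, not part of Keras):

```python
def conv2d_params(input_channels, kernel_h, kernel_w, output_channels, bias=1):
    """num_params = (input_channels * kernel_h * kernel_w + bias) * output_channels"""
    return (input_channels * kernel_h * kernel_w + bias) * output_channels

print(conv2d_params(3, 6, 6, 64))   # conv2d:   6976
print(conv2d_params(64, 4, 4, 32))  # conv2d_1: 32800
print(conv2d_params(3, 6, 6, 64) + conv2d_params(64, 4, 4, 32))  # 39776
```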

Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 23, 23, 64)        6976      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 11, 11, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 8, 8, 32)          32800     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 32)          0         
=================================================================
Total params: 39,776
Trainable params: 39,776
Non-trainable params: 0

Convolution and Dense layers

Model:

model = keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 3)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(1)
])

Calculation:

conv2d = (3 * 3 * 3 + 1) * 16 = 448
conv2d_1 = (16 * 3 * 3 + 1) * 32 = 4640
conv2d_2 = (32 * 3 * 3 + 1) * 64 = 18496
dense = (64 + 1) * 512 = 33280
dense_1 = (512 + 1) * 1 = 513

Trainable params: 448 + 4640 + 18496 + 33280 + 513 = 57377
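Combining the two formulas, the whole calculation can be sketched as follows (the helper names are our own):

```python
def conv2d_params(in_ch, kh, kw, out_ch):
    return (in_ch * kh * kw + 1) * out_ch

def dense_params(in_size, out_size):
    return (in_size + 1) * out_size

counts = [
    conv2d_params(3, 3, 3, 16),   # conv2d
    conv2d_params(16, 3, 3, 32),  # conv2d_1
    conv2d_params(32, 3, 3, 64),  # conv2d_2
    dense_params(64, 512),        # dense (Flatten output: 1 * 1 * 64 = 64)
    dense_params(512, 1),         # dense_1
]
print(counts)       # [448, 4640, 18496, 33280, 513]
print(sum(counts))  # 57377
```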

Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 26, 26, 16)        448       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 32)        4640      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 3, 64)          18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 1, 1, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 512)               33280     
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 513       
=================================================================
Total params: 57,377
Trainable params: 57,377
Non-trainable params: 0

Recurrent layers

Formula:

num_params = num_ffns * (output_size * (output_size + input_size) + output_size)
num_ffns = 1 (SimpleRNN)
num_ffns = 3 (GRU)
num_ffns = 4 (LSTM)

If the reset_after parameter of the GRU layer is set to True (the default in TensorFlow 2), the number of trainable parameters is calculated with the formula:

num_params = num_ffns * (output_size * (output_size + input_size) + 2 * output_size)

Model:

model = keras.Sequential([
    keras.layers.LSTM(64, return_sequences=True, input_shape=(None, 5)),
    keras.layers.GRU(32, return_sequences=True),
    keras.layers.GRU(16, return_sequences=True, reset_after=False),
    keras.layers.SimpleRNN(8)
])

Calculation:

lstm = 4 * (64 * (64 + 5) + 64) = 17920
gru = 3 * (32 * (32 + 64) + 2 * 32) = 9408
gru_1 = 3 * (16 * (16 + 32) + 16) = 2352
simple_rnn = 1 * (8 * (8 + 16) + 8) = 200

Trainable params: 17920 + 9408 + 2352 + 200 = 29880
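Both recurrent formulas fit in one helper function (rnn_params is our own name for this sketch):

```python
def rnn_params(num_ffns, input_size, output_size, reset_after=False):
    # A GRU with reset_after=True keeps two bias vectors per gate
    bias_terms = 2 if reset_after else 1
    return num_ffns * (output_size * (output_size + input_size)
                       + bias_terms * output_size)

print(rnn_params(4, 5, 64))                     # lstm:       17920
print(rnn_params(3, 64, 32, reset_after=True))  # gru:        9408
print(rnn_params(3, 32, 16))                    # gru_1:      2352
print(rnn_params(1, 16, 8))                     # simple_rnn: 200
```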

Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, None, 64)          17920     
_________________________________________________________________
gru (GRU)                    (None, None, 32)          9408      
_________________________________________________________________
gru_1 (GRU)                  (None, None, 16)          2352      
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 8)                 200       
=================================================================
Total params: 29,880
Trainable params: 29,880
Non-trainable params: 0

Bidirectional recurrent layers

Formula:

num_params = 2 * num_params_recurrent_layer

Note that a bidirectional layer concatenates the outputs of the forward and backward layers, so the input size of the following layer is doubled.

Model:

model = keras.Sequential([
    keras.layers.Bidirectional(
        keras.layers.LSTM(64, return_sequences=True),
        input_shape=(None, 5)
    ),
    keras.layers.Bidirectional(
        keras.layers.GRU(32, return_sequences=True)
    ),
    keras.layers.Bidirectional(
        keras.layers.GRU(16, return_sequences=True, reset_after=False)
    ),
    keras.layers.Bidirectional(
        keras.layers.SimpleRNN(8)
    )
])

Calculation:

lstm = 2 * 4 * (64 * (64 + 5) + 64) = 35840
gru = 2 * 3 * (32 * (32 + 128) + 2 * 32) = 31104
gru_1 = 2 * 3 * (16 * (16 + 64) + 16) = 7776
simple_rnn = 2 * 1 * (8 * (8 + 32) + 8) = 656

Trainable params: 35840 + 31104 + 7776 + 656 = 75376
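As a sketch in code, we simply double the recurrent formula and feed each layer the doubled input size (birnn_params is a hypothetical helper):

```python
def birnn_params(num_ffns, input_size, output_size, reset_after=False):
    bias_terms = 2 if reset_after else 1
    return 2 * num_ffns * (output_size * (output_size + input_size)
                           + bias_terms * output_size)

# Each bidirectional layer outputs 2 * output_size features,
# so the next layer's input_size is doubled.
print(birnn_params(4, 5, 64))                      # lstm:       35840
print(birnn_params(3, 128, 32, reset_after=True))  # gru:        31104
print(birnn_params(3, 64, 16))                     # gru_1:      7776
print(birnn_params(1, 32, 8))                      # simple_rnn: 656
```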

Summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional (Bidirectional (None, None, 128)         35840     
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 64)          31104     
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 32)          7776      
_________________________________________________________________
bidirectional_3 (Bidirection (None, 16)                656       
=================================================================
Total params: 75,376
Trainable params: 75,376
Non-trainable params: 0
