Input size $= n_h^{[l-1]} \times n_w^{[l-1]} \times n_c^{[l-1]}$
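Kernel size $= f_h^{[l]} \times f_w^{[l]} \times n_c^{[l-1]}$ per filter
Where $n_c^{[l-1]}$ is the number of input channels (each filter spans the full input depth) and $n_c^{[l]}$ is the number of filters in layer $l$, which equals the number of output channels.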
Stride size $= s_h^{[l]} \times s_w^{[l]}$
Padding size $= p_h^{[l]} \times p_w^{[l]}$
Output (activations $a^{[l]}$) size $= n_h^{[l]} \times n_w^{[l]} \times n_c^{[l]}$
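$n_h^{[l]} = \left\lfloor \dfrac{n_h^{[l-1]} + 2p_h^{[l]} - f_h^{[l]}}{s_h^{[l]}} \right\rfloor + 1$
$n_w^{[l]} = \left\lfloor \dfrac{n_w^{[l-1]} + 2p_w^{[l]} - f_w^{[l]}}{s_w^{[l]}} \right\rfloor + 1$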
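The output-size formula can be checked numerically with a small helper. This is a minimal sketch, not tied to any framework: the function name `conv_output_size` and its argument names are made up for illustration, and it assumes the padding value is applied on both sides of the axis (hence the factor of 2).

```python
def conv_output_size(n_in: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output length along one spatial axis: floor((n_in + 2p - f) / s) + 1.

    n_in: input size along the axis, f: kernel size,
    p: padding per side (assumed symmetric), s: stride.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    if n_in + 2 * p < f:
        raise ValueError("kernel is larger than the padded input")
    # Integer floor division implements the floor in the formula.
    return (n_in + 2 * p - f) // s + 1


# Height and width are computed independently with the same formula.
n_h = conv_output_size(32, f=5, p=0, s=1)   # 28
n_w = conv_output_size(224, f=7, p=3, s=2)  # 112
print(n_h, n_w)
```

Applying the helper once per axis gives the spatial part of the output shape; the channel dimension is simply the number of filters.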
Number of parameters:
Weights: $f_h^{[l]} \times f_w^{[l]} \times n_c^{[l-1]} \times n_c^{[l]}$
Bias: $n_c^{[l]}$
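As a quick sanity check on the parameter count, the two terms can be added up in a few lines. This is only a sketch under the assumption of one bias per filter, as stated above; `conv_param_count` and its argument names are hypothetical, chosen for this example.

```python
def conv_param_count(f_h: int, f_w: int, c_in: int, c_out: int,
                     use_bias: bool = True) -> int:
    """Trainable parameters in one convolutional layer.

    c_in is the number of input channels (n_c of layer l-1),
    c_out is the number of filters (n_c of layer l).
    """
    weights = f_h * f_w * c_in * c_out   # one f_h x f_w x c_in kernel per filter
    biases = c_out if use_bias else 0    # one bias per filter (assumption)
    return weights + biases


# 3x3 kernels, 3 input channels, 64 filters: 3*3*3*64 + 64
print(conv_param_count(3, 3, 3, 64))  # 1792
```

Note that the count does not depend on the input height or width, only on the kernel size and the channel counts.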