
Implementing Multi-task Learning in Deep Learning

Instead of a single output, we use a vector of binary outputs in the final layer. For any datapoint, any number of these outputs can be 0 or 1 at the same time. This is in contrast to the softmax activation function, where exactly one label can be true.
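As a minimal sketch (assuming PyTorch, a hypothetical 16-feature input, and 4 independent labels), the final layer simply emits one logit per label, and a sigmoid turns each logit into an independent probability:

```python
import torch
import torch.nn as nn

# Hypothetical multi-label setup: 16 input features, 4 independent binary labels.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),  # one logit per label, no softmax
)

x = torch.randn(8, 16)           # batch of 8 datapoints
probs = torch.sigmoid(model(x))  # shape (8, 4); each entry is an independent probability
preds = (probs > 0.5).int()      # any number of the 4 labels can be 1 simultaneously
```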

Because we have a series of (typically independent) outputs, each output unit uses a sigmoid activation, and the loss is the sum of the per-output binary cross-entropy (logistic) losses, just as in logistic regression, except that y now denotes a vector of binary outputs rather than a single value.
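Written out, the loss for m datapoints with K binary outputs is the summed binary cross-entropy (a standard formulation; the symbols m, K, and \hat{y} for the predicted probabilities are notation introduced here):

\mathcal{L} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log \hat{y}_k^{(i)} + \left(1 - y_k^{(i)}\right) \log\left(1 - \hat{y}_k^{(i)}\right) \right]

In PyTorch this corresponds to nn.BCEWithLogitsLoss, which applies the sigmoid and the per-output binary cross-entropy in one numerically stable step (a sketch, reusing the hypothetical 4-label shapes from above):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy per output

logits = torch.randn(8, 4, requires_grad=True)  # raw outputs for 8 datapoints
y = torch.randint(0, 2, (8, 4)).float()         # y is a vector of binary labels
loss = criterion(logits, y)                     # averaged over outputs and batch
loss.backward()                                 # gradients flow to every output unit
```

Note that the default reduction averages over the outputs as well as the datapoints, which differs from the summed form above only by a constant rescaling of the gradient.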


Updated 2021-04-07

Tags

Data Science