Jun 8, 2021
w1, w2, and w3 are weights of the network itself. The network may have been pre-trained, or you may have trained it yourself on the task. These attention maps are always calculated on a network that has finished training, i.e. its weights are finalized and will not change any more. So w1, w2, and w3 can be read directly out of your finalized network (they are parameters of that network). You don’t need to calculate them.
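To make "read directly" concrete, here is a minimal sketch in plain Python. It assumes the attention map is a weighted sum of feature maps, one weight per map, which is my reading of the setup and not something stated above. The dict `trained_params`, the key `final_layer_weights`, the helper names, and all numbers are hypothetical stand-ins for wherever your framework stores the trained parameters.

```python
# Hypothetical finalized network, with its parameters stored in a plain dict.
# Names and values are illustrative assumptions only.
trained_params = {
    # final-layer weights for one output class, one weight per feature map
    "final_layer_weights": [0.5, -1.0, 2.0],
}

# Hypothetical 2x2 feature maps A1, A2, A3 from a single forward pass.
feature_maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]


def read_weights(params):
    """Look the weights up; nothing is computed or re-trained here."""
    return list(params["final_layer_weights"])


def weighted_map(weights, maps):
    """Element-wise weighted sum of the feature maps: sum_k w_k * A_k."""
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, fmap in zip(weights, maps):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += w * fmap[i][j]
    return out


w1, w2, w3 = read_weights(trained_params)
attention_map = weighted_map([w1, w2, w3], feature_maps)
print(attention_map)
```

The only step that touches the weights is a lookup; the arithmetic happens afterwards, on the feature maps.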