This is similar to the Graph Attention Network (GAT) that we saw; the only difference is that the way we compute attention is slightly different, since we also need to incorporate the edge information!!
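To make the idea concrete, here is a minimal NumPy sketch of one attention head that also looks at edge features. The scoring rule used here (concatenate the projected target node, source node, and edge features, then apply a learned vector and a LeakyReLU) is an assumption borrowed from the usual GAT recipe, not necessarily the exact formula from the paper. The names `edge_aware_attention`, `W`, `W_e`, and `a` are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def edge_aware_attention(h, edge_index, edge_feat, W, W_e, a):
    """One attention head that uses edge features (assumed scoring rule).

    h:          (N, F)  node features
    edge_index: list of (src, dst) pairs, one per edge
    edge_feat:  (E, Fe) edge features, aligned with edge_index
    W:          (F, D)  node projection
    W_e:        (Fe, D) edge projection
    a:          (3*D,)  attention vector
    Returns (out, alpha): updated node features (N, D) and per-edge weights (E,).
    """
    z = h @ W                      # projected nodes, (N, D)
    ze = edge_feat @ W_e           # projected edges, (E, D)
    src = np.array([s for s, _ in edge_index])
    dst = np.array([d for _, d in edge_index])

    # Unnormalised score for the message src -> dst, now including the edge.
    scores = leaky_relu(np.concatenate([z[dst], z[src], ze], axis=1) @ a)

    out = np.zeros_like(z)
    alpha = np.zeros(len(edge_index))
    for i in range(h.shape[0]):
        mask = dst == i
        if not mask.any():
            continue               # node with no incoming edges keeps zeros
        s = scores[mask]
        w = np.exp(s - s.max())    # numerically stable softmax over neighbours
        w = w / w.sum()
        alpha[mask] = w
        out[i] = (w[:, None] * z[src[mask]]).sum(axis=0)
    return out, alpha
```

The only change from plain GAT in this sketch is the extra `ze` term inside the concatenation; the softmax over each node's incoming edges and the weighted sum stay the same.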
Above I've only shown single-head attention; we can similarly have multi-head attention!!
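A rough sketch of the multi-head version: run several independent heads, each with its own parameters, and then concatenate (or average) their outputs. The single-head scoring rule below is an assumed GAT-style rule with edge features added, and `attention_head` and `multi_head` are hypothetical names used only for illustration.

```python
import numpy as np

def attention_head(h, src, dst, edge_feat, W, W_e, a):
    """Single edge-aware head (assumed GAT-style scoring)."""
    z, ze = h @ W, edge_feat @ W_e
    s = np.concatenate([z[dst], z[src], ze], axis=1) @ a
    s = np.where(s > 0, s, 0.2 * s)          # LeakyReLU
    out = np.zeros_like(z)
    for i in np.unique(dst):                 # only nodes with incoming edges
        m = dst == i
        w = np.exp(s[m] - s[m].max())        # softmax over incoming edges
        out[i] = (w / w.sum()) @ z[src[m]]
    return out

def multi_head(h, src, dst, edge_feat, heads, average=False):
    """heads: list of (W, W_e, a) tuples, one per head."""
    outs = [attention_head(h, src, dst, edge_feat, *p) for p in heads]
    # Concatenate for hidden layers; averaging is common for a final layer.
    return np.mean(outs, axis=0) if average else np.concatenate(outs, axis=1)
```

With `K` heads of output size `D`, concatenation gives each node a `K * D` vector, while averaging keeps it at `D`.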
In some follow-up papers to this one, researchers have also tried adding positional encodings!!
The same aggregation methods that you saw in the non-learning approach apply here.
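As a quick reminder, here is a small sketch of what such aggregation can look like. It assumes the methods in question are the usual permutation-invariant ones (sum, mean, max); the helper name `aggregate` is hypothetical.

```python
import numpy as np

def aggregate(messages, how="sum"):
    """Combine a node's neighbour messages, shape (k, D), into one (D,) vector.

    Assumes the standard choices: sum, mean, or element-wise max.
    """
    messages = np.asarray(messages, dtype=float)
    if how == "sum":
        return messages.sum(axis=0)
    if how == "mean":
        return messages.mean(axis=0)
    if how == "max":
        return messages.max(axis=0)
    raise ValueError(f"unknown aggregation: {how}")
```

All three ignore the order of the neighbours, which is what we want since a graph has no natural neighbour ordering.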