SpeakerRecognition-ResNet-GhostVLAD
Utterance-level Aggregation For Speaker Recognition In The Wild: a speaker recognition model that uses a "thin-ResNet" trunk architecture to extract frame-level features and a dictionary-based NetVLAD or GhostVLAD layer to aggregate those features across time. The whole model can be trained end-to-end.
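
To make the aggregation step concrete, here is a minimal pure-Python sketch of GhostVLAD-style pooling. It is an illustration under stated assumptions, not this repository's code: the function names (`ghostvlad_pool`, `softmax`, `l2_normalize`) and the argument layout are hypothetical, and the cluster centers and assignment weights that would normally be learned are simply passed in as lists. Each frame descriptor is softly assigned to all clusters, residuals to the centers are summed over time, and the ghost clusters are dropped before normalization. Setting `num_ghost=0` gives plain NetVLAD-style pooling.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def ghostvlad_pool(frames, centers, weights, biases, num_ghost=0):
    """Aggregate T frame descriptors of size D into one fixed-length vector.

    frames:  T x D frame-level features (T may vary per utterance).
    centers: K x D cluster centers; the last `num_ghost` rows are ghosts.
    weights: K x D and biases: K, a linear layer giving assignment scores.
    Returns a vector of length (K - num_ghost) * D.
    """
    num_clusters = len(centers)
    dim = len(centers[0])
    kept = num_clusters - num_ghost
    residual_sums = [[0.0] * dim for _ in range(kept)]

    for x in frames:
        # Soft-assign the frame to ALL clusters, ghosts included.
        scores = [
            sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(weights, biases)
        ]
        assign = softmax(scores)
        # Accumulate weighted residuals for the non-ghost clusters only.
        for k in range(kept):
            for d in range(dim):
                residual_sums[k][d] += assign[k] * (x[d] - centers[k][d])

    # Normalize each cluster's residual, flatten, then normalize the whole vector.
    flat = []
    for row in residual_sums:
        flat.extend(l2_normalize(row))
    return l2_normalize(flat)


# Hypothetical toy setup: 2 real clusters + 1 ghost cluster, 3-dim features.
rng = random.Random(0)
D, K, G = 3, 3, 1
centers = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
weights = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
biases = [0.0] * K

short_utt = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(5)]
long_utt = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(40)]

short_vec = ghostvlad_pool(short_utt, centers, weights, biases, num_ghost=G)
long_vec = ghostvlad_pool(long_utt, centers, weights, biases, num_ghost=G)
print(len(short_vec), len(long_vec))  # both have length (K - G) * D = 6
```

The point of the sketch is that utterances of different lengths map to descriptors of the same size, which is what lets a fixed-size embedding be compared across utterances.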