# Speaker Recognition Papers

- Dataset
    - VoxCeleb
        - [VoxCeleb: A Large-Scale Speaker Identification Dataset](https://www.robots.ox.ac.uk/~vgg/publications/2017/Nagrani17/nagrani17.pdf)
        - [VoxCeleb2: Deep Speaker Recognition](https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf)
        - [VoxCeleb: Large-Scale Speaker Verification In The Wild](https://www.robots.ox.ac.uk/~vgg/publications/2019/Nagrani19/nagrani19.pdf)
    - CN-Celeb
        - [CN-Celeb: A Challenging Chinese Speaker Recognition Dataset](https://arxiv.org/pdf/1911.01799.pdf)
        - [CN-Celeb: Multi-Genre Speaker Recognition](https://arxiv.org/pdf/2012.12468.pdf)
- Architecture Design
    - [X-Vectors: Robust DNN Embeddings For Speaker Recognition](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf) (**x-vector**)
    - [BUT System Description To VoxCeleb Speaker Recognition Challenge 2019](https://arxiv.org/pdf/1910.12592.pdf) (**r-vector**)
    - [RawNet: Advanced End-To-End Deep Neural Network Using Raw Waveforms For Text-Independent Speaker Verification](https://arxiv.org/pdf/1904.08104.pdf) (**RawNet**)
    - [Speaker Recognition From Raw Waveform With SincNet](https://arxiv.org/pdf/1808.00158.pdf) (**SincNet**)
    - [ECAPA-TDNN: Emphasized Channel Attention, Propagation And Aggregation In TDNN Based Speaker Verification](https://arxiv.org/pdf/2005.07143.pdf) (**ECAPA-TDNN**)
- Optimization Objective
    - Classification Based Loss
        - [Exploring The Encoding Layer And Loss Function In End-To-End Speaker And Language Recognition System](https://arxiv.org/pdf/1804.05160.pdf)
        - [Angular Softmax For Short-Duration Text-Independent Speaker Verification](https://www.researchgate.net/publication/327389164)
        - [Ensemble Additive Margin Softmax For Speaker Verification](https://cs.nju.edu.cn/lwj/paper/ICASSP19_EAMS.pdf)
        - [Margin Matters: Towards More Discriminative Deep Neural Network Embeddings For Speaker Recognition](https://arxiv.org/pdf/1906.07317.pdf)
        - [Large Margin Softmax Loss For Speaker Verification](https://arxiv.org/abs/1904.03479)
    - End-to-End Loss
        - [End-To-End Text-Dependent Speaker Verification](https://arxiv.org/pdf/1509.08062.pdf)
        - [End-To-End Text-Independent Speaker Verification With Triplet Loss On Short Utterances](https://www.researchgate.net/publication/317416159)
        - [Generalized End-To-End Loss For Speaker Verification](https://arxiv.org/pdf/1710.10467.pdf)
- Pooling Method
    - [Attentive Statistics Pooling For Deep Speaker Embedding](https://arxiv.org/pdf/1803.10963.pdf)
    - [Multi-Resolution Multi-Head Attention In Deep Speaker Embedding](https://ieeexplore.ieee.org/abstract/document/9053217)
    - [Utterance-Level Aggregation For Speaker Recognition In The Wild](https://ieeexplore.ieee.org/abstract/document/8683120)
    - [A Novel Learnable Dictionary Encoding Layer For End-To-End Language Identification](https://arxiv.org/abs/1804.00385)
- Self-supervised Learning
    - [Augmentation Adversarial Training For Self-Supervised Speaker Recognition](https://arxiv.org/pdf/2007.12085.pdf)
    - [Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning](https://arxiv.org/pdf/2012.07178.pdf)
    - [Self-Supervised Speaker Recognition With Loss-Gated Learning](https://arxiv.org/pdf/2110.03869.pdf)