X-SepFormer

Paper

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Introduction

This is a demo for our paper X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion.

The speech examples below let you compare the outputs of our baseline system with those of our proposed methods.

If you have any questions or suggestions, please contact liukai89@huawei.com.

Examples

The examples are divided into three categories:

  1. Female-Male Mixtures

  2. Male-Male Mixtures

  3. Female-Female Mixtures

For easier viewing, each category is split into two groups.

1. Female-Male Mixtures (Groups 1 & 2)

Group 1: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*
Group 2: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*

2. Male-Male Mixtures (Groups 1 & 2)

Group 1: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*
Group 2: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*

3. Female-Female Mixtures (Groups 1 & 2)

Group 1: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*
Group 2: Mixture | Ground-Truth | Baseline | X-SepFormer(S2)* | X-SepFormer(S3)*

[Demo GitHub]