Wav2Vec2Pretrain (HFTransformersInterface implementation) samples padded values for mask_time_indices and negative_sample_indices (speechbrain issue, open)

porfirythelaw commented on September 24, 2024

Wav2Vec2Pretrain (HFTransformersInterface implementation) samples padded values for mask_time_indices and negative_sample_indices

Comments (3)

Adel-Moumen commented on September 24, 2024

Hey @TParcollet, could you please have a look?

porfirythelaw commented on September 24, 2024

My local fix is something like this (using features_padding_mask):

    # Padding mask at waveform resolution, reduced to feature-frame resolution.
    padding_mask = make_padding_masks(wav, wav_len=wav_lens)
    features_padding_mask = self.model._get_feature_vector_attention_mask(
        sequence_length, padding_mask, add_adapter=False
    )

    # 1. Compute the indices that will be masked, skipping padded frames.
    mask_time_indices = _compute_mask_indices(
        (batch_size, sequence_length),
        mask_prob=self.mask_prob,
        mask_length=self.mask_length,
        attention_mask=features_padding_mask,
    )
    torch_mask_time_indices = torch.tensor(
        mask_time_indices, device=wav.device, dtype=torch.long,
    )

    # 2. Sample the negatives from the non-padded part of the sequence.
    # Fairseq samples only from the masked indices, but this only works if you
    # have long sentences. For more versatility, we sample from every
    # non-padded frame.
    full_sentence_indices = np.ones((batch_size, sequence_length))  # unused with the fix

    negative_sample_indices = torch.tensor(
        transformers.models.wav2vec2.modeling_wav2vec2._sample_negative_indices(
            (batch_size, sequence_length),
            num_negatives=self.config.num_negatives,
            # mask_time_indices=full_sentence_indices,
            mask_time_indices=features_padding_mask.detach().cpu().numpy(),
        ),
        device=wav.device,
        dtype=torch.long,
    )
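
To illustrate what the snippet above guards against, here is a minimal, dependency-free sketch of sampling negatives only from non-padded frames. `feature_padding_mask`, `sample_negatives`, and `count_padded_hits` are hypothetical helpers written for this example, not SpeechBrain or Transformers functions, and deriving the number of valid frames by rounding a relative length is a simplifying assumption (the real model derives it from the convolutional feature extractor).

```python
import random


def feature_padding_mask(rel_lens, num_frames):
    """Build a per-frame boolean mask from relative lengths.

    Assumption: a relative length of 1.0 means "no padding", and the number
    of valid frames is simply round(rel * num_frames).
    True marks a real frame, False marks padding.
    """
    mask = []
    for rel in rel_lens:
        valid = max(1, min(num_frames, round(rel * num_frames)))
        mask.append([t < valid for t in range(num_frames)])
    return mask


def sample_negatives(mask, num_negatives, seed=0):
    """For every real frame, draw negatives among the other real frames of
    the same utterance. Padded frames receive an empty list."""
    rng = random.Random(seed)
    negatives = []
    for row in mask:
        valid = [t for t, keep in enumerate(row) if keep]
        row_negs = []
        for t, keep in enumerate(row):
            candidates = [v for v in valid if v != t]
            if keep and candidates:
                row_negs.append(
                    [rng.choice(candidates) for _ in range(num_negatives)]
                )
            else:
                row_negs.append([])
        negatives.append(row_negs)
    return negatives


def count_padded_hits(negatives, mask):
    """Count sampled indices that point at a padded frame."""
    return sum(
        1
        for row_negs, row in zip(negatives, mask)
        for frame_negs in row_negs
        for idx in frame_negs
        if not row[idx]
    )


if __name__ == "__main__":
    # Two utterances, 10 feature frames each; the second is half padding.
    mask = feature_padding_mask([1.0, 0.5], num_frames=10)
    negs = sample_negatives(mask, num_negatives=4)
    print(count_padded_hits(negs, mask))  # 0: padding is never sampled
```

A check like `count_padded_hits` could also be run on the arrays returned by the real sampling functions as a regression test, under the assumption that their indices are per-utterance frame positions.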

TParcollet commented on September 24, 2024

This is quite a late answer, but yes, this is certainly true. The reason is that we rely on HF functions here, and back when we wrote this code, I believe there was no alternative. @porfirythelaw, could you propose a PR with this fix? I will test it.

Many thanks.
