### Describe the bug
No ASR results are ever produced; after about 30 seconds of audio, the process crashes with a `TypeError`.
### To Reproduce

```shell
whispering --language en --model tiny --debug
```
### Logs

```
(whisper_streaming) fc@Claudios-MacBook-Pro whisper_streaming % whispering --language en --model tiny --debug
[2022-10-09 19:13:13,532] cli.get_wshiper:211 DEBUG -> WhisperConfig: model_name='tiny' device='cpu' language='en' fp16=True
[2022-10-09 19:13:14,103] transcriber._set_dtype:35 WARNING -> FP16 is not supported on CPU; using FP32 instead
Using cache found in /Users/fc/.cache/torch/hub/snakers4_silero-vad_master
[2022-10-09 19:13:16,014] cli.get_context:223 DEBUG -> Context: timestamp=0.0 buffer_tokens=[] buffer_mel=None vad=True temperatures=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] allow_padding=False patience=None compression_ratio_threshold=2.4 logprob_threshold=-1.0 no_captions_threshold=0.6 best_of=5 beam_size=5 no_speech_threshold=0.6 buffer_threshold=0.5 vad_threshold=0.5
[2022-10-09 19:13:16,014] cli.transcribe_from_mic:51 INFO -> Ready to transcribe
[2022-10-09 19:13:16,058] cli.transcribe_from_mic:62 DEBUG -> Audio #: 0, The rest of queue: 0
[2022-10-09 19:13:19,915] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:19,916] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:20,148] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:20,148] transcriber.transcribe:265 DEBUG -> mel.shape (375) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:20,148] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:20,148] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:20,148] cli.transcribe_from_mic:62 DEBUG -> Audio #: 1, The rest of queue: 0
[2022-10-09 19:13:23,595] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:23,595] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:23,785] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:23,785] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:23,785] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:23,785] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:23,785] transcriber.transcribe:265 DEBUG -> mel.shape (750) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:23,785] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:23,785] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:23,785] cli.transcribe_from_mic:62 DEBUG -> Audio #: 2, The rest of queue: 0
[2022-10-09 19:13:27,425] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:27,425] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:27,474] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:27,475] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 750])
[2022-10-09 19:13:27,475] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:27,475] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:27,475] transcriber.transcribe:265 DEBUG -> mel.shape (1125) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:27,475] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:27,475] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:27,475] cli.transcribe_from_mic:62 DEBUG -> Audio #: 3, The rest of queue: 0
[2022-10-09 19:13:31,115] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:31,115] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:31,160] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:31,161] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1125])
[2022-10-09 19:13:31,161] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:31,161] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:31,161] transcriber.transcribe:265 DEBUG -> mel.shape (1500) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:31,161] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:31,161] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:31,161] cli.transcribe_from_mic:62 DEBUG -> Audio #: 4, The rest of queue: 0
[2022-10-09 19:13:34,998] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:34,998] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:35,046] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:35,046] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1500])
[2022-10-09 19:13:35,046] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:35,046] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:35,046] transcriber.transcribe:265 DEBUG -> mel.shape (1875) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:35,046] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:35,047] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:35,047] cli.transcribe_from_mic:62 DEBUG -> Audio #: 5, The rest of queue: 0
[2022-10-09 19:13:38,689] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:38,689] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:38,737] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:38,737] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 1875])
[2022-10-09 19:13:38,737] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:38,737] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:38,737] transcriber.transcribe:265 DEBUG -> mel.shape (2250) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:38,737] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:38,737] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:38,737] cli.transcribe_from_mic:62 DEBUG -> Audio #: 6, The rest of queue: 0
[2022-10-09 19:13:42,368] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:42,369] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:42,415] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:42,415] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 2250])
[2022-10-09 19:13:42,416] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:42,416] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:42,416] transcriber.transcribe:265 DEBUG -> mel.shape (2625) - seek (0) < N_FRAMES (3000)
[2022-10-09 19:13:42,416] transcriber.transcribe:271 DEBUG -> No padding
[2022-10-09 19:13:42,416] transcriber.transcribe:326 DEBUG -> ctx.buffer_mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:42,416] cli.transcribe_from_mic:62 DEBUG -> Audio #: 7, The rest of queue: 0
[2022-10-09 19:13:46,251] cli.transcribe_from_mic:77 DEBUG -> Got. The rest of queue: 0
Analyzing[2022-10-09 19:13:46,251] transcriber.transcribe:235 DEBUG -> 60000
[2022-10-09 19:13:46,298] transcriber.transcribe:252 DEBUG -> Incoming new_mel.shape: torch.Size([80, 375])
[2022-10-09 19:13:46,298] transcriber.transcribe:256 DEBUG -> buffer_mel.shape: torch.Size([80, 2625])
[2022-10-09 19:13:46,299] transcriber.transcribe:259 DEBUG -> mel.shape: torch.Size([80, 3000])
[2022-10-09 19:13:46,299] transcriber.transcribe:263 DEBUG -> seek: 0
[2022-10-09 19:13:46,299] transcriber.transcribe:280 DEBUG -> seek=0, timestamp=0.0, mel.shape: torch.Size([80, 3000]), segment.shape: torch.Size([80, 3000])
[2022-10-09 19:13:46,299] transcriber._decode_with_fallback:103 DEBUG -> DecodeOptions: DecodingOptions(task='transcribe', language='en', temperature=0.0, sample_len=None, best_of=None, beam_size=5, patience=None, length_penalty=None, prompt=[], prefix=None, suppress_blank=True, suppress_tokens='-1', without_timestamps=False, max_initial_timestamp=1.0, fp16=False)
```
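The buffer growth in the log is consistent with Whisper's mel-spectrogram constants (this arithmetic is my inference, not from the whispering source): a 10 ms hop gives 100 frames per second, so each ~3.75 s chunk adds 375 frames, and decoding only begins once the buffer reaches `N_FRAMES = 3000` (30 s):

```python
# Sketch of the mel-buffer growth seen in the debug log.
# Assumption: 100 mel frames per second (10 ms hop), decode at N_FRAMES.
N_FRAMES = 3000
chunk_frames = 375  # frames contributed by each audio chunk in the log

sizes = [chunk_frames * (i + 1) for i in range(N_FRAMES // chunk_frames)]
print(sizes)       # [375, 750, 1125, 1500, 1875, 2250, 2625, 3000]
print(len(sizes))  # 8 chunks (Audio #0 .. #7) before the first decode attempt
```

This matches the log exactly: the first call to `_decode_with_fallback` happens on the eighth chunk, at which point the crash below occurs.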
```
Traceback (most recent call last):
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/bin/whispering", line 8, in <module>
    sys.exit(main())
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/cli.py", line 301, in main
    for text in transcribe_from_mic(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/cli.py", line 82, in transcribe_from_mic
    for chunk in wsp.transcribe(audio=audio, ctx=ctx):
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/transcriber.py", line 284, in transcribe
    result = self._decode_with_fallback(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whispering/transcriber.py", line 104, in _decode_with_fallback
    decode_result = self.model.decode(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 700, in decode
    result = DecodingTask(model, options).run(mel)
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 472, in __init__
    self.decoder = BeamSearchDecoder(
  File "/Users/fc/.local/share/virtualenvs/whisper_streaming-htMqhiJ1/lib/python3.10/site-packages/whisper/decoding.py", line 283, in __init__
    self.max_candidates: int = round(beam_size * (1.0 + patience))
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
```
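The crash is reproducible in isolation: `DecodingOptions(patience=None)` reaches `BeamSearchDecoder.__init__`, which evaluates `round(beam_size * (1.0 + patience))` without a `None` check. A minimal sketch (the guard at the end is a hypothetical workaround, not the project's actual patch):

```python
# Reproduce the failing expression from whisper/decoding.py in isolation.
beam_size = 5
patience = None  # the value whispering passes through DecodingOptions

try:
    max_candidates = round(beam_size * (1.0 + patience))
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'float' and 'NoneType'

# Hypothetical guard: coerce None to 0.0 so the expression
# reduces to beam_size, i.e. no extra beam-search patience.
effective_patience = 0.0 if patience is None else patience
max_candidates = round(beam_size * (1.0 + effective_patience))
print(max_candidates)  # 5
```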
### Environment
- OS: macOS Monterey
- Python version: 3.10.3
- Whispering version: 0.5.0