We take the audio data, trim it, and align it to the tempo (which is provided alongside the audio).
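How the alignment to the tempo is done is not detailed here; the loading and trimming part could look roughly like the sketch below (using librosa, which is an assumption, not necessarily what this project uses):

```python
import librosa

def load_and_trim(path: str):
    """Load the audio and strip leading/trailing silence before further processing."""
    samples, sample_rate = librosa.load(path, sr=None)  # keep the file's native sample rate
    trimmed, _ = librosa.effects.trim(samples)          # remove silence at both ends
    return trimmed, sample_rate
```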
Then we split the audio into segments of x samples, where x is the length of a thirty-second note at the specified tempo and the audio's sample rate.
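For illustration, the segment length follows directly from the tempo and sample rate (a sketch; the function name and example values are only illustrative):

```python
def samples_per_thirty_second_note(bpm: float, sample_rate: int) -> int:
    """Number of samples covered by one 1/32 note at the given tempo and sample rate."""
    seconds_per_beat = 60.0 / bpm              # one beat = one quarter note
    seconds_per_32nd = seconds_per_beat / 8.0  # a 1/32 note is 1/8 of a quarter note
    return round(seconds_per_32nd * sample_rate)

# e.g. 120 BPM at 44.1 kHz: round(0.0625 s * 44100) = 2756 samples per segment
```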
From each segment we compute a Mel spectrogram, which condenses the essential information of the audio into a much smaller representation.
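A minimal sketch of this step using librosa (the number of Mel bands and the log scaling are assumptions, not values taken from this project):

```python
import numpy as np
import librosa

def segment_to_mel(segment: np.ndarray, sample_rate: int, n_mels: int = 64) -> np.ndarray:
    """Log-scaled Mel spectrogram of one audio segment."""
    mel = librosa.feature.melspectrogram(y=segment, sr=sample_rate, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # compress the dynamic range
```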
We feed these spectrograms into a recurrent neural network, which generates the Beat Saber notes for each segment after having seen the previous segments.
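A rough sketch of such a sequence model in PyTorch (the GRU, the layer sizes, and the idea of one classification head per note attribute are illustrative assumptions; the actual network may be structured differently):

```python
import torch
import torch.nn as nn

class NoteGenerator(nn.Module):
    """Recurrent model over per-segment Mel features with one head per note attribute."""
    def __init__(self, n_mel_features: int, hidden_size: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mel_features, hidden_size, num_layers=2, batch_first=True)
        # head sizes match the enums listed below
        self.horizontal = nn.Linear(hidden_size, 4)     # HorizontalPosition
        self.vertical = nn.Linear(hidden_size, 3)       # VerticalPosition
        self.color = nn.Linear(hidden_size, 3)          # Color (left / right / bomb)
        self.cut_direction = nn.Linear(hidden_size, 9)  # CutDirection

    def forward(self, mel_segments: torch.Tensor):
        # mel_segments: (batch, n_segments, n_mel_features)
        hidden, _ = self.rnn(mel_segments)
        return (self.horizontal(hidden), self.vertical(hidden),
                self.color(hidden), self.cut_direction(hidden))
```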
Each note in Beat Saber has a horizontal and a vertical position (4 columns × 3 rows, i.e. 12 grid positions in total), one of 9 cut directions, and a color.
```csharp
enum HorizontalPosition // (_lineIndex)
{
    Left = 0,
    CenterLeft = 1,
    CenterRight = 2,
    Right = 3
}

enum VerticalPosition // (_lineLayer)
{
    Bottom = 0,
    Middle = 1,
    Top = 2
}

enum Color
{
    Left = 0,
    Right = 1,
    Bomb = 3 // value 2 is unused in the beatmap format
}

enum CutDirection
{
    Up = 0,
    Down = 1,
    Left = 2,
    Right = 3,
    UpLeft = 4,
    UpRight = 5,
    DownLeft = 6,
    DownRight = 7,
    Any = 8
}
```
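For reference, a predicted note can then be written out in the beatmap JSON layout implied by these enums (a sketch; the _type and _cutDirection field names come from the standard Beat Saber beatmap format, not from this section):

```python
def note_dict(time_in_beats: float, horizontal: int, vertical: int,
              color: int, cut_direction: int) -> dict:
    """One note in the beatmap JSON layout, using the enum values above."""
    return {
        "_time": time_in_beats,          # note position in beats
        "_lineIndex": horizontal,        # HorizontalPosition
        "_lineLayer": vertical,          # VerticalPosition
        "_type": color,                  # Color (0 = left, 1 = right, 3 = bomb)
        "_cutDirection": cut_direction,  # CutDirection
    }
```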
- random note placement based on beat detection - https://github.com/mindleaving/beatsabertools/tree/master/BeatSaberSongGenerator
- detecting BPM with neural networks - https://nlml.github.io/neural-networks/detecting-bpm-neural-networks/
- audio classification using fastai and on-the-fly frequency transforms - https://towardsdatascience.com/audio-classification-using-fastai-and-on-the-fly-frequency-transforms-4dbe1b540f89