Comments (5)
I also wondered about this (a similar question was raised for annotated-S4 and S5) and found a clue in the S4D paper: "the sum in (7) will implicitly include the conjugate pairs of A, B, C and therefore resolve to twice the real part of the original sum."
Also, check out the mathematical properties of complex conjugates.
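The conjugate-pair argument quoted above can be checked numerically. The following is a minimal sketch, not code from the S4D paper or the Mamba repository: the parameter values and the helper names `kernel_sum` and `with_conjugates` are made up for illustration. It sums C_n * exp(t * A_n) * B_n once over a full set of modes that includes the conjugates, and once over half the modes only, and compares the first result with twice the real part of the second.

```python
import cmath

def kernel_sum(A, B, C, t):
    """Sum C_n * exp(t * A_n) * B_n over all modes in the lists."""
    return sum(c * cmath.exp(t * a) * b for a, b, c in zip(A, B, C))

def with_conjugates(xs):
    """Append the complex conjugate of every entry (hypothetical helper)."""
    return xs + [x.conjugate() for x in xs]

# Hypothetical parameters for two modes; values chosen only for illustration.
A_half = [-0.5 + 1.0j, -0.5 + 3.0j]
B_half = [1.0 + 0.5j, 0.3 - 0.2j]
C_half = [0.7 - 0.1j, -0.4 + 0.6j]

t = 0.8  # a real time point
full = kernel_sum(with_conjugates(A_half), with_conjugates(B_half),
                  with_conjugates(C_half), t)
half = kernel_sum(A_half, B_half, C_half, t)

# The full sum is real (up to rounding) and equals twice the real part
# of the sum over half of the modes.
print(full, 2 * half.real)
```

This is why a model can store only one member of each conjugate pair and still produce a real-valued kernel by taking `2 * real(...)`.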
from mamba.
I also want to train a complex-valued Mamba model. Based on the state-space model papers I have read and the Mamba repository, I changed the initialization of A to the complex-valued one proposed in S4D.
real-valued initialization:
mamba/mamba_ssm/modules/mamba_simple.py
Lines 103 to 111 in 009bec5
mamba/mamba_ssm/modules/mamba_simple.py
Line 143 in 009bec5
complex-valued initialization:
# # S4D real initialization
# A = repeat(
#     torch.arange(1, self.d_state + 1, dtype=torch.float32, device=device),
#     "n -> d n",
#     d=self.d_inner,
# ).contiguous()
# A_log = torch.log(A)  # Keep A_log in fp32
# self.A_log = nn.Parameter(A_log)
# self.A_log._no_weight_decay = True

# S4D complex initialization
log_A_real = torch.log(0.5 * torch.ones(self.d_inner, self.d_state))
A_imag = math.pi * repeat(torch.arange(self.d_state), 'n -> h n', h=self.d_inner)
self.log_A_real = nn.Parameter(log_A_real)
self.log_A_real._no_weight_decay = True
self.A_imag = nn.Parameter(A_imag)
self.A_imag._no_weight_decay = True

# real-valued
# A = -torch.exp(self.A_log.float())  # (d_inner, d_state)
# complex-valued
A = -torch.exp(self.log_A_real) + 1j * self.A_imag
With this change, the state is computed in the complex domain instead of the real domain. However, the state-space model still maps a real-valued sequence u to a real-valued sequence y, so the inputs to Mamba are still real-valued. This paper may be helpful.
Hope this can help you.
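To make the "complex state, real input, real output" point concrete, here is a minimal sketch of a diagonal SSM recurrence in plain Python. It is not the Mamba kernel. Several things are assumptions for illustration: the function name `complex_diag_ssm`, the fixed (non-selective) `B` and `C`, the zero-order-hold discretization, and all parameter values. Only the parameterization `A = -exp(log_A_real) + 1j * A_imag` is taken from the snippet above.

```python
import cmath
import math

def complex_diag_ssm(u, log_A_real, A_imag, B, C, dt):
    """Run a diagonal SSM with a complex state on a real input sequence.

    Returns a list of real outputs, one per input step.
    """
    # Same parameterization as in the comment above.
    A = [-math.exp(lr) + 1j * im for lr, im in zip(log_A_real, A_imag)]
    # Zero-order-hold discretization (an assumption of this sketch).
    A_bar = [cmath.exp(dt * a) for a in A]
    B_bar = [(ab - 1) / a * b for ab, a, b in zip(A_bar, A, B)]

    x = [0j] * len(A)  # complex hidden state
    y = []
    for u_t in u:
        x = [ab * x_n + bb * u_t for ab, bb, x_n in zip(A_bar, B_bar, x)]
        # Taking 2 * real part accounts for the implicit conjugate modes.
        y.append(2 * sum(c * x_n for c, x_n in zip(C, x)).real)
    return y

d_state = 4
log_A_real = [math.log(0.5)] * d_state             # real part of A is -0.5
A_imag = [math.pi * n for n in range(d_state)]     # imaginary part is pi * n
B = [1 + 0j] * d_state                             # hypothetical values
C = [0.5 + 0.5j] * d_state                         # hypothetical values

u = [1.0, 0.0, -1.0, 0.5]                          # real-valued input
y = complex_diag_ssm(u, log_A_real, A_imag, B, C, dt=0.1)
print(y)  # real-valued output, same length as u
```

The state `x` is complex at every step, but both `u` and `y` are ordinary real sequences.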
@Tworan Did you encounter this error when modifying it like the above?
RuntimeError: B must have shape (batch_size, n_groups, dstate, !is_complex ? seqlen : seqlen * 2)
@hungdche Same problem here. There is a mismatch between the .cpp function interface and complex input. Have you solved the problem yet? Thanks!
@Lily-Le I have not. Unsure how to proceed
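The shape in the error message says that, when A is complex, B must have a last dimension of `seqlen * 2` rather than `seqlen`. One plausible reading, which is an assumption here and should be checked against the repository's reference implementation before relying on it, is that a complex B is expected as interleaved real and imaginary parts along the last axis. The sketch below uses NumPy only to illustrate that layout and the resulting shape; the helper name `pack_complex_last_dim` is hypothetical. In PyTorch, `torch.view_as_real(B).flatten(-2)` would produce the same interleaving.

```python
import numpy as np

def pack_complex_last_dim(z):
    """(..., L) complex -> (..., 2L) real, laid out as [re0, im0, re1, im1, ...]."""
    z = np.asarray(z, dtype=np.complex64)
    pairs = np.stack([z.real, z.imag], axis=-1)   # shape (..., L, 2)
    return pairs.reshape(*z.shape[:-1], -1)       # shape (..., 2L)

# Hypothetical sizes matching the names in the error message.
batch_size, n_groups, dstate, seqlen = 2, 1, 4, 6
rng = np.random.default_rng(0)
shape = (batch_size, n_groups, dstate, seqlen)
B = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

B_packed = pack_complex_last_dim(B)
print(B_packed.shape)  # last dimension is seqlen * 2
```

If this reading is right, an input-dependent C would need the same treatment. Whether the rest of the module (for example the projection that produces B and C) needs to change is not covered by this sketch.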