Comments (10)
We discussed this and came up with two possible solutions:
- `torch.Tensor`, `np.ndarray` and everything built upon them (e.g. `pandas` objects) are doomed to break: the former raises `RuntimeError`, the latter `ValueError`. We can suppress these exceptions in a method like `obj_in_list` and wait for feedback on whether that is sufficient.
- Replace `fetch_unique_objects` with `emit_objects`, as discussed earlier in this thread.
from quaterion.
Unfortunately, we can't just calculate a hash here, because objects may not be hashable, so we need to pass `key_extractor_fn` somehow.
Also, there can be several `key_extractor_fn`s because there can be several encoders, so we would have to apply either each of them or just one of them, which is debatable. And we do the same thing in `CacheDataLoader`.
My proposal is to replace `fetch_unique_objects` with just `emit_objects`, implemented approximately like this:
For `PairSimilarityDataLoader`:
@classmethod
def emit_object(cls, batch: List[SimilarityPairSample]) -> Any:
    for sample in batch:
        yield sample.obj_a
        yield sample.obj_b
For `GroupSimilarityDataLoader`:
@classmethod
def emit_object(cls, batch: List[SimilarityGroupSample]) -> Any:
    # group samples carry a single object, so the batch holds group samples, not pairs
    for sample in batch:
        yield sample.obj
from quaterion.
> replace `fetch_unique_objects` with just `emit_objects`
In this case, we will lose the whole functionality that prevents repeated calculation, won't we?
> we can't just calculate hash here,
OK, so what about this one:
def obj_in_list(obj, obj_list):
    try:
        return obj in obj_list
    except (RuntimeError, ValueError):
        # we caught the reported exception here, so assume obj is not in the list and add it anyway
        return False
Now we can use it like `if not obj_in_list(sample.obj, unique_objects):`.
This will work as usual unless we hit the reported bug, and if we do, it will still safely process that object. WDYT?
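To make the proposal easier to try without `torch` or `numpy` installed, here is a minimal sketch. `AmbiguousEq`, `_NoTruthValue` and `fetch_unique` are hypothetical names invented for this illustration; `AmbiguousEq` only imitates the way `==` on an array yields a result without a single truth value. It is not the library's implementation.

```python
class _NoTruthValue:
    """Result of an element-wise comparison: it has no single truth value."""

    def __bool__(self):
        raise ValueError("truth value of an element-wise comparison is ambiguous")


class AmbiguousEq:
    """Hypothetical stand-in for np.ndarray / torch.Tensor in this sketch."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return _NoTruthValue()


def obj_in_list(obj, obj_list):
    try:
        return obj in obj_list
    except (RuntimeError, ValueError):
        # The comparison failed, so treat obj as not present and keep it.
        return False


def fetch_unique(objects):
    unique_objects = []
    for obj in objects:
        if not obj_in_list(obj, unique_objects):
            unique_objects.append(obj)
    return unique_objects


a = AmbiguousEq([1, 2, 3])
b = AmbiguousEq([1, 2, 2])

plain = fetch_unique(["x", "y", "x"])   # ordinary objects are deduplicated as before
a_first = fetch_unique([a, b, a])       # the second `a` is found by identity
b_first = fetch_unique([b, a, a])       # comparing with `b` fails before `a` is reached
```

One thing this sketch suggests: deduplication of such objects becomes best-effort. In `b_first` the second `a` is kept, because the comparison with `b` raises before the identity check against the stored `a` can succeed. Nothing is lost, but some objects may be processed more than once.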
from quaterion.
> In this case, we will lose the whole functionality to prevent multiple calculation, won't we?
Actually, only part of it. Currently the flow is:
- fetch unique objects in the batch
- for every unique object, calculate its key via the key extractors
- if the calculated key has not been seen by the current dataloader, calculate its embeddings; otherwise do nothing
So `fetch_unique_objects` actually only saves us from repeated key calculation, which I guess is not that crucial.
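The flow above can be sketched roughly as follows. `fill_cache`, `fake_encoder` and the `dedupe` flag are hypothetical names for this illustration, and a plain `dict` stands in for the cache; the sketch only shows that skipping the first step changes the number of key calculations, not the number of encoder calls.

```python
from typing import Any, Callable, Dict, Hashable, List


def fill_cache(
    batch: List[Any],
    key_extractor_fn: Callable[[Any], Hashable],
    encode_fn: Callable[[Any], Any],
    cache: Dict[Hashable, Any],
    dedupe: bool = True,
) -> int:
    """Fill `cache` with embeddings and return how many keys were calculated."""
    objects = batch
    if dedupe:
        # Step 1: fetch unique objects in the batch (relies on `in`, i.e. on `==`).
        objects = []
        for obj in batch:
            if obj not in objects:
                objects.append(obj)

    key_calls = 0
    for obj in objects:
        # Step 2: calculate the key of every remaining object.
        key = key_extractor_fn(obj)
        key_calls += 1
        # Step 3: encode only if this key has not been seen yet.
        if key not in cache:
            cache[key] = encode_fn(obj)
    return key_calls


batch = ["cat.png", "dog.png", "cat.png"]
encoded: List[str] = []


def fake_encoder(obj: str) -> int:
    encoded.append(obj)  # record every encoder call
    return len(obj)


with_dedupe = fill_cache(batch, str.lower, fake_encoder, {}, dedupe=True)
without_dedupe = fill_cache(batch, str.lower, fake_encoder, {}, dedupe=False)
```

In both runs the encoder is called once per distinct key; only the key extractor is called one extra time without deduplication.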
from quaterion.
> Ok, so what about this one:
It could be a solution; we need to look into it more thoroughly.
from quaterion.
And here's the minimal code to reproduce this bug:
import numpy as np
import torch
l = []
t1 = torch.from_numpy(np.array([1, 2, 3])) # remove `torch.from_numpy()` for the numpy version
t2 = torch.from_numpy(np.array([1, 2, 2]))
ts = [t1, t2]
for t in ts:
    if t not in l:
        l.append(t)
print("everything fine")
from quaterion.
Another note on the strange behavior of tensors that we figured out: there is no hash collision even between two tensors with the same values, because `Tensor.__hash__` hashes by `id(tensor)`.
import numpy as np
import torch
# create two tensors with the same values
t1 = torch.from_numpy(np.array([1, 2, 3]))
t2 = torch.from_numpy(np.array([1, 2, 3]))
d = {hash(t1): "some value"}
print(hash(t2) in d) # this is False to our surprise
d = {t1: "some value"}
print(t2 in d) # this is also False
# only this one is True
print(t1 in d)
from quaterion.
What would definitely help this discussion: tests with examples for reproduction.
from quaterion.
The exception comes from the way the `in` operator works. It compares the new object with those already in the collection: for each pair it first checks whether they are the same object (`obj_a is obj_b`) and otherwise whether they are equal via `==`. That is where the exception occurs: for tensors and similar objects, `==` returns an element-wise result that cannot be reduced to a single `True` or `False`.
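The membership check described above can be reproduced in a few lines. For lists, the Python documentation describes `x in y` as equivalent to `any(x is e or x == e for e in y)`; the classes `ElementwiseEq` and `_Ambiguous` below are hypothetical stand-ins for a tensor, written only for this sketch.

```python
class _Ambiguous:
    """Element-wise comparison result without a single truth value."""

    def __init__(self, flags):
        self.flags = flags

    def __bool__(self):
        raise ValueError("truth value of an element-wise result is ambiguous")


class ElementwiseEq:
    """Hypothetical stand-in for a tensor-like object."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Compare element by element instead of returning one bool.
        return _Ambiguous([a == b for a, b in zip(self.values, other.values)])


def contains(collection, candidate):
    # Documented equivalent of `candidate in collection` for a list.
    return any(candidate is item or candidate == item for item in collection)


t1 = ElementwiseEq([1, 2, 3])
t2 = ElementwiseEq([1, 2, 2])

# Identity short-circuits, so `==` is never evaluated here.
same_object_found = contains([t1], t1)

try:
    contains([t1], t2)  # falls through to `==`, whose result has no truth value
    raised = False
except ValueError:
    raised = True
```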
If instead of a raw tensor we pass a `dict` like
d = {
    "value": torch.Tensor(...),
    "path_to_image": "source/path/to/image.png",
}
then, if `value` is compared first, it results in the same exception again.
So we can't fetch unique objects from a batch just by wrapping them in a `dict`; maybe we need a special class like the following to handle such cases.
class ComparableClass:
    def __init__(self, comparison_feature, value):
        self.comparison_feature = ...
        self.value = torch.tensor(...)

    def __eq__(self, other):
        return self.comparison_feature == other.comparison_feature

    # we can provide default hash implementation here as well
The alternative would be to drop the idea of fetching unique objects from the batch.
In that case we can handle complicated objects via `dict` and a custom `key_extractor_fn`, and still use the cache successfully.
The drawback is that we need to extract a key from each object in the batch for each encoder, but I don't think that is crucial.
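A rough sketch of that alternative, under stated assumptions: `path_key`, `cache_batch`, the encoder names and the per-encoder `dict` caches are all hypothetical, plain lists of integers stand in for tensors, and the built-ins `sum` and `max` stand in for encoders. No uniqueness check over the objects is performed; the key alone decides whether an embedding is calculated.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Record = Dict[str, Any]
Encoder = Tuple[Callable[[Record], Hashable], Callable[[Any], Any]]


def path_key(record: Record) -> Hashable:
    # Hypothetical key extractor: the image path identifies the object.
    return record["path_to_image"]


def cache_batch(
    batch: List[Record],
    encoders: Dict[str, Encoder],
    caches: Dict[str, Dict[Hashable, Any]],
) -> None:
    for name, (key_extractor_fn, encode_fn) in encoders.items():
        cache = caches.setdefault(name, {})
        for record in batch:
            # The key is extracted for every record and every encoder;
            # this is the extra cost mentioned above.
            key = key_extractor_fn(record)
            if key not in cache:
                cache[key] = encode_fn(record["value"])


batch = [
    {"value": [1, 2], "path_to_image": "source/a.png"},
    {"value": [3, 4], "path_to_image": "source/b.png"},
    {"value": [1, 2], "path_to_image": "source/a.png"},  # repeated object
]
encoders = {
    "sum_encoder": (path_key, sum),
    "max_encoder": (path_key, max),
}
caches: Dict[str, Dict[Hashable, Any]] = {}
cache_batch(batch, encoders, caches)
```

The records themselves are never compared with `==`, so the ambiguous tensor comparison is not triggered.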
from quaterion.
Fixed in #34
from quaterion.