
product1m's People

Contributors

zhanxlin


product1m's Issues

How can the model be used for retrieval when the input image contains multiple products?

Your paper is very valuable, but while reading it I ran into several questions about the implementation details, and I hope you can help clarify them.
First, you use the proposals produced by the RPN as the input image embeddings, but this raises a problem: you keep the top-10 to top-36 proposals by score thresholding without applying NMS, so there will inevitably be proposal regions with very high overlap. If some of them are masked out, can they really be reconstructed? For example, if the text describes 2 products and the RPN produces 10 regions of interest, and one of them is masked out directly, how is that region recovered? Or is my understanding off? Please correct me if so.
Second, I could not follow the description in Section 4.5 of the paper. In one place you say the element-wise product of the image and text tokens in the final Co-Transformer output serves as the representation of a single product, and in another place you say the outputs of the text/image transformers used for contrastive learning are concatenated as the retrieval feature. Which is it? (See the sketch below for the two readings I am comparing.)
Third, following on from the previous point: if the features are concatenated, then for an input containing multiple products each product should have its own representation, yet concatenating the [CLS] tokens only gives a representation of the whole image, doesn't it?
I would appreciate your guidance.
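To make the second question concrete, below is a minimal sketch (my own illustration, not the authors' code) of the two readings of Section 4.5 being compared. The tensor names `co_img_tokens`, `co_txt_tokens`, `img_cls`, and `txt_cls` are hypothetical placeholders for the Co-Transformer outputs and the [CLS] outputs of the contrastive image/text encoders, and the mean pooling used in the first reading is only a guess.

```python
import torch

def product_feature_cotransformer(co_img_tokens: torch.Tensor,
                                  co_txt_tokens: torch.Tensor) -> torch.Tensor:
    """Reading 1: element-wise product of the pooled image and text outputs
    of the Co-Transformer (one vector per image-text pair)."""
    img_pooled = co_img_tokens.mean(dim=1)   # (B, D); pooling choice is an assumption
    txt_pooled = co_txt_tokens.mean(dim=1)   # (B, D)
    return img_pooled * txt_pooled           # (B, D)

def product_feature_concat(img_cls: torch.Tensor,
                           txt_cls: torch.Tensor) -> torch.Tensor:
    """Reading 2: concatenate the [CLS] outputs of the separate image/text
    encoders trained with the contrastive loss."""
    return torch.cat([img_cls, txt_cls], dim=-1)  # (B, 2D)

if __name__ == "__main__":
    B, N, M, D = 2, 36, 32, 768
    f1 = product_feature_cotransformer(torch.randn(B, N, D), torch.randn(B, M, D))
    f2 = product_feature_concat(torch.randn(B, D), torch.randn(B, D))
    print(f1.shape, f2.shape)  # torch.Size([2, 768]) torch.Size([2, 1536])
```

Note that in Reading 2 the result is a single vector per input, which is exactly why the third question arises for multi-product images.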

Can the RPN model be open-sourced?

Hi, would it be possible to provide an open-source version of this model, so that everyone can reproduce and learn from it? Thank you!

About the performance gap between the CLIP* model and our reimplementation

Many thanks for your code. I implemented a CLIP-like architecture with vit-base-patch16 and bert-base-uncased, with an image input size of 224. The model is optimized with only a contrastive loss, and I use the whole image as input, just as ViT does (I suppose you use region features as input). Using your released evaluation code, I got a much higher score (mAP@10 = 87.7) than the CLIP* result in Table 2. (See the sketch below for a concrete version of this setup.)
Apart from the region input, are there any other differences between our implementation and the paper, such as initialization or model structure? Could you please share more details?
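For reference, here is a minimal sketch of the setup described above: a CLIP-style dual encoder built from Hugging Face `google/vit-base-patch16-224` and `bert-base-uncased`, trained with a symmetric in-batch contrastive (InfoNCE) loss on the whole image. The projection dimension, pooling via the [CLS] token, and the initial temperature are my assumptions and may differ from both the issue author's code and the paper's CLIP* baseline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import ViTModel, BertModel

class DualEncoder(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224")
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_proj = nn.Linear(self.image_encoder.config.hidden_size, embed_dim)
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.6592))  # log(1/0.07), as in CLIP

    def forward(self, pixel_values, input_ids, attention_mask):
        # Take the [CLS] token of each encoder, project, and L2-normalize.
        img = self.image_encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state[:, 0]
        img = F.normalize(self.image_proj(img), dim=-1)
        txt = F.normalize(self.text_proj(txt), dim=-1)
        return img, txt

def contrastive_loss(img_emb, txt_emb, logit_scale):
    # Symmetric InfoNCE over the in-batch image-text pairs.
    logits = logit_scale.exp() * img_emb @ txt_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

At retrieval time the two normalized embeddings would simply be compared by dot product (or concatenated, per the question in the first issue above).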

Also, would you release the model weights of CAPTURE, and especially the weights of your custom RPN, for reimplementation?

About the pretrained RPN for MultiProduct Detection

Thanks for sharing your wonderful work!

In the paper, you mentioned that you trained a new RPN for multi-product detection, but I could not find a pretrained checkpoint for this RPN module on the GitHub page. Will this RPN be open-sourced?

Thanks again, have a nice day!
