"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang.
Hi, thanks for your work and for releasing it! I have a few questions about the paper and look forward to your replies.
Does the model need to re-compute the per-head coefficients every time it generates a new sequence?
According to my understanding of the paper, the head coefficients are assigned using the query of the last token attending over the keys of the entire input. But in the code, it looks like only the first query and its key are used at generation time, so the computed attention is always 1 (the softmax is taken over a single element), and the head coefficients end up being assigned linearly in head order.
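To make the concern concrete, here is a minimal numpy sketch of the degenerate case I mean. The names and values (`num_heads`, `min_ratio`, `max_ratio`) are made up for illustration and are not taken from your repo; I am only showing that a softmax over a single key is always 1, so a score-based sort of heads becomes a no-op:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# With only one key visible (first query vs. first key), the softmax
# is over a length-1 vector and is exactly 1.0 regardless of the logit.
single_key_attn = softmax(np.array([3.7]))
print(single_key_attn)  # -> [1.]

# If every head therefore receives the same score, a stable sort by
# score preserves the original head order, and any coefficient range
# spread over the "sorted" heads reduces to a linear allocation.
# (hypothetical values, purely illustrative)
num_heads = 8
min_ratio, max_ratio = 1.2, 1.8
scores = np.ones(num_heads)                 # all heads tie at 1.0
order = np.argsort(scores, kind="stable")   # no-op permutation
coeffs = np.linspace(min_ratio, max_ratio, num_heads)[order]
print(coeffs)
```

If this reading of the generation path is right, the head ranking carries no information from the attention pattern at that step; please correct me if I am misreading the code.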
1. What device did you use for inference on the ZeroSCROLLS dataset?
2. What input prompt length did you use during inference? I could not find it mentioned in the paper.