Comments (6)
Thanks for your interest. FCM masking can be enabled by setting fcm_min_ratio=0 and fcm_max_ratio=0.15. It is recommended for large models.
from chain-of-hindsight.
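For readers unfamiliar with the two settings, here is a minimal sketch of how a min/max masking ratio could be applied, assuming FCM refers to forgetful causal masking (randomly hiding some past tokens from attention). The function names, the uniform sampling of the ratio, and the rule that a token can always see itself are assumptions made for illustration; this is not the repository's implementation.

```python
import random

def fcm_keep_mask(seq_len, fcm_min_ratio=0.0, fcm_max_ratio=0.15, rng=None):
    """Sample a mask ratio in [min, max] and decide which tokens stay visible.

    Hypothetical helper: the sampling scheme is an assumption, not the repo's code.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(fcm_min_ratio, fcm_max_ratio)
    # Each token is hidden from later positions with probability `ratio`.
    keep = [rng.random() >= ratio for _ in range(seq_len)]
    return ratio, keep

def attention_mask(keep):
    """Combine the causal constraint with the keep mask.

    Position i may attend to position j if j <= i and j is kept.
    Assumption: a token can always attend to itself.
    """
    n = len(keep)
    return [[(j <= i) and (keep[j] or j == i) for j in range(n)] for i in range(n)]

ratio, keep = fcm_keep_mask(8, rng=random.Random(0))
mask = attention_mask(keep)
```

With `fcm_max_ratio=0` nothing is hidden and the mask reduces to an ordinary causal mask.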
Ah, that's where it was. Thanks for the reply! Also, the paper only uses ascending sorting, but the code appears to implement a mix of ascending and descending order.
Did I understand it correctly?
Also, if I understood it correctly, the code concatenates all the data and slices it by chunk_size.
https://github.com/lhao499/CoH/blob/3949417c638834d9213f6395196db1a802e61d8c/coh/data/hf_data.py#L54
If the data is something like [I, am, a, boy, is, worse, than, I, am, a, girl] and chunk_size is 6, it will be sliced into [I, am, a, boy, is, worse] and [than, I, am, a, girl].
This means that the "I am a girl" part is conditioned only on "than".
Did I understand the code correctly?
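To make the question concrete, here is a minimal sketch of the concatenate-then-slice behavior described above. `chunk_tokens` is a hypothetical helper written for illustration, not the function at the linked line.

```python
def chunk_tokens(sequences, chunk_size):
    """Concatenate all sequences into one stream, then cut it into fixed-size chunks."""
    stream = [tok for seq in sequences for tok in seq]
    return [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]

example = ["I", "am", "a", "boy", "is", "worse", "than", "I", "am", "a", "girl"]
chunks = chunk_tokens([example], chunk_size=6)
print(chunks)
# [['I', 'am', 'a', 'boy', 'is', 'worse'], ['than', 'I', 'am', 'a', 'girl']]
```

The second chunk starts at "than", so "I am a girl" never sees the first six tokens.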
For the first question: yes, both ascending and descending order are used to increase diversity. We will make this clearer.
For the second question: splitting by chunk_size is more compute-efficient than splitting by sentence and adding padding masks. Using padding may give better results, but we did not try it.
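The efficiency argument can be illustrated with a small sketch of the padding alternative, where each example is kept separate and padded to chunk_size. The `PAD` token, the `pad_examples` helper, and the truncation of over-long examples are assumptions for illustration only; this approach is the one the authors say they did not try.

```python
PAD = "<pad>"  # hypothetical padding token

def pad_examples(sequences, chunk_size):
    """Pad (or truncate) each sequence to chunk_size and build a matching mask.

    Mask value 1 marks a real token, 0 marks padding.
    """
    padded, masks = [], []
    for seq in sequences:
        seq = seq[:chunk_size]  # assumption: over-long examples are truncated
        n_pad = chunk_size - len(seq)
        padded.append(seq + [PAD] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks

sequences = [["a", "b", "c"], ["d", "e", "f", "g", "h"], ["i", "j"]]
padded, masks = pad_examples(sequences, chunk_size=6)

# Positions spent on padding instead of real tokens.
wasted = sum(mask.count(0) for mask in masks)
total = sum(len(mask) for mask in masks)
```

In this toy case 8 of 18 positions are padding, whereas concatenating the same 10 tokens and slicing by 6 would fill two chunks with no padding at all. The trade-off is that padding keeps each example's context intact.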
Thank you for the reply! So, to get the performance reported in the paper, did you use both ascending and descending order?
I apologize for the late reply. Yes, both ascending and descending order were used.
In our more recent Koala chatbot project, we also ran this comparison and found that using both orders worked better.