I'm currently a Lead Deep Learning Engineer at Chattermill, previously a Research Engineer at Ontocord.ai
I also carry out machine learning research for LAION (Stability AI) on the Ezra-1 UltraCluster, LUMI and JUWELS supercomputers; I previously did work for BigScience and the BLOOM evaluation
I did my Master's in Machine Learning & AI at Imperial College London, carrying out work in natural language generation
I'm an active contributor to machine learning libraries such as Hugging Face Transformers and Gem-benchmark
I sometimes give talks for the NLP study group, the most popular NLP community on meetup.com
I'm currently working on mixture-of-experts models and the open-source chat agent OpenAssistant
Relative bias addition is a row-wise addition of permutations of a subset of the bias vector.
Need to find a way to get rid of the for loop and do this in a single vectorized operation; it is definitely parallelizable.
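One possible way to drop the loop, sketched in NumPy. This assumes the bias vector has length 2 * max_dist + 1 and is indexed by the clipped relative offset j - i; that layout and the function names are my assumptions, not something taken from the paper. The loop version is kept only as a reference to check against.

```python
import numpy as np

def relative_bias_loop(bias, seq_len):
    """Reference version: fill the (seq_len, seq_len) bias matrix entry by entry."""
    max_dist = (len(bias) - 1) // 2
    out = np.empty((seq_len, seq_len), dtype=bias.dtype)
    for i in range(seq_len):
        for j in range(seq_len):
            rel = int(np.clip(j - i, -max_dist, max_dist))
            out[i, j] = bias[rel + max_dist]
    return out

def relative_bias_vectorized(bias, seq_len):
    """Same result in one gather: build all offsets at once, then index the bias."""
    max_dist = (len(bias) - 1) // 2
    pos = np.arange(seq_len)
    # rel[i, j] = j - i, clipped to the range the bias vector covers
    rel = np.clip(pos[None, :] - pos[:, None], -max_dist, max_dist)
    return bias[rel + max_dist]
```

The result can then be added to the attention scores in one step, e.g. `scores + relative_bias_vectorized(bias, seq_len)`, with broadcasting taking care of any leading batch or head dimensions.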
Really not clear from the paper: 'computed as cosine similarity with annealing between the encodings h_x and h_y. It starts at 1 and ends at √d, linearly increasing over the first 10K training batches.'
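One reading of the quoted sentence, as a sketch: the cosine similarity is multiplied by a scale that grows linearly from 1 to √d over the first 10K batches and then stays there. Whether the authors mean a multiplicative scale is my assumption, and the names `cosine_scale` and `annealed_score` are made up for illustration.

```python
import math

def cosine_scale(step, dim, warmup_steps=10_000):
    """Scale that rises linearly from 1 (step 0) to sqrt(dim) (step >= warmup_steps)."""
    frac = min(max(step, 0), warmup_steps) / warmup_steps
    return 1.0 + frac * (math.sqrt(dim) - 1.0)

def annealed_score(h_x, h_y, step, warmup_steps=10_000):
    """Cosine similarity between h_x and h_y, multiplied by the annealed scale."""
    dot = sum(a * b for a, b in zip(h_x, h_y))
    norm_x = math.sqrt(sum(a * a for a in h_x))
    norm_y = math.sqrt(sum(b * b for b in h_y))
    return cosine_scale(step, len(h_x), warmup_steps) * dot / (norm_x * norm_y)
```

Under this reading the scale acts like an inverse softmax temperature: scores start in [-1, 1] and end in [-√d, √d].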
Implement BPE from scratch with unknown tokens hashed, although this may achieve worse results on downstream tasks, as it is perhaps not as general as bpemb's 25000.model.
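A minimal from-scratch sketch of what this could look like. The hashing scheme is an assumption on my part: symbols missing from the vocabulary are mapped into a fixed number of extra buckets with a stable CRC32 hash. The function names and the `</w>` end-of-word marker are illustrative choices, and none of this reproduces bpemb's model.

```python
import zlib
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself for determinism
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return merges

def build_vocab(words, merges):
    """Known symbols: every character seen, the end marker, and every merge output."""
    symbols = {ch for w in words for ch in w} | {"</w>"}
    symbols |= {a + b for a, b in merges}
    return {s: i for i, s in enumerate(sorted(symbols))}

def encode(word, merges, token_to_id, num_buckets=100):
    """Apply merges in learned order; hash unknown symbols into extra buckets."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    ids = []
    for s in symbols:
        if s in token_to_id:
            ids.append(token_to_id[s])
        else:
            # crc32 is stable across runs, unlike Python's salted built-in hash()
            ids.append(len(token_to_id) + zlib.crc32(s.encode("utf-8")) % num_buckets)
    return symbols, ids
```

With this layout the embedding table would need `len(token_to_id) + num_buckets` rows, and distinct unknown symbols can collide in the same bucket, which is one reason results may be worse than with a large pretrained vocabulary.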
Need to check that I've implemented label smoothing the same way the authors smoothed their objective, since the objective function includes negative sampling.
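For comparison, one common way to combine label smoothing with in-batch negatives, sketched in NumPy. This is not confirmed to be the authors' formulation: it assumes a (B, B) score matrix whose diagonal holds the positive pairs, and it spreads the smoothing mass `eps` evenly over the B - 1 negatives in each row. The function names and the default `eps` are illustrative.

```python
import numpy as np

def smoothed_targets(batch_size, eps):
    """Target matrix: 1 - eps on the diagonal, eps / (B - 1) on each negative."""
    targets = np.full((batch_size, batch_size), eps / (batch_size - 1))
    np.fill_diagonal(targets, 1.0 - eps)
    return targets

def log_softmax(scores):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def smoothed_in_batch_loss(scores, eps=0.1):
    """Mean cross-entropy between the smoothed targets and softmax over each row."""
    targets = smoothed_targets(scores.shape[0], eps)
    return float(-(targets * log_softmax(scores)).sum(axis=1).mean())
```

Setting `eps=0` recovers the plain softmax loss over in-batch negatives, which gives a quick sanity check against an unsmoothed implementation.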