
dutch-llms's People

Contributors

robinsmits


dutch-llms's Issues

Incompatible Matrix Multiplication

@RobinSmits I tried running the model polylm_13b_ft_alpaca_clean_dutch with the same sample data, but I am getting an error about an incompatible matrix multiplication.
I want to check the model's performance for Dutch. What changes would you suggest?

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[21], line 39
37 for item in val_data:
38 print(f'\n\n=== Voorbeeld: {counter} ======================================================================================')
---> 39 generate(item['instruction'], item['input'])
41 counter += 1
42 if counter > 5:

Cell In[21], line 11, in generate(instruction, input)
8 attention_masks = inputs.attention_mask.cuda()
10 # Generate output
---> 11 outputs = model.generate(input_ids = input_ids,
12 attention_mask = attention_masks,
13 max_new_tokens = 128,
14 do_sample = True,
15 top_p = 0.85,
16 top_k = 50,
17 temperature = 0.5,
18 repetition_penalty = 1.2,
19 length_penalty = -1.0,
20 num_return_sequences = 1,
21 pad_token_id = tokenizer.eos_token_id,
22 forced_eos_token_id = tokenizer.eos_token_id)
24 # Decode output
25 generated_output = tokenizer.decode(outputs[0], skip_special_tokens = True)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/peft/peft_model.py:975, in PeftModelForCausalLM.generate(self, **kwargs)
973 self.base_model.generation_config = self.generation_config
974 try:
--> 975 outputs = self.base_model.generate(**kwargs)
976 except:
977 self.base_model.prepare_inputs_for_generation = self.base_model_prepare_inputs_for_generation

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/generation/utils.py:1648, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1640 input_ids, model_kwargs = self._expand_inputs_for_generation(
1641 input_ids=input_ids,
1642 expand_size=generation_config.num_return_sequences,
1643 is_encoder_decoder=self.config.is_encoder_decoder,
1644 **model_kwargs,
1645 )
1647 # 13. run sample
-> 1648 return self.sample(
1649 input_ids,
1650 logits_processor=logits_processor,
1651 logits_warper=logits_warper,
1652 stopping_criteria=stopping_criteria,
1653 pad_token_id=generation_config.pad_token_id,
1654 eos_token_id=generation_config.eos_token_id,
1655 output_scores=generation_config.output_scores,
1656 return_dict_in_generate=generation_config.return_dict_in_generate,
1657 synced_gpus=synced_gpus,
1658 streamer=streamer,
1659 **model_kwargs,
1660 )
1662 elif generation_mode == GenerationMode.BEAM_SEARCH:
1663 # 11. prepare beam search scorer
1664 beam_scorer = BeamSearchScorer(
1665 batch_size=batch_size,
1666 num_beams=generation_config.num_beams,
(...)
1671 max_length=generation_config.max_length,
1672 )

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/generation/utils.py:2730, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2727 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2729 # forward pass to get next token
-> 2730 outputs = self(
2731 **model_inputs,
2732 return_dict=True,
2733 output_attentions=output_attentions,
2734 output_hidden_states=output_hidden_states,
2735 )
2737 if synced_gpus and this_peer_finished:
2738 continue # don't waste resources running the code we don't need

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module..new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:1076, in GPT2LMHeadModel.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1068 r"""
1069 labels (torch.LongTensor of shape (batch_size, sequence_length), optional):
1070 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set
1071 labels = input_ids Indices are selected in [-100, 0, ..., config.vocab_size] All labels set to -100
1072 are ignored (masked), the loss is only computed for labels in [0, ..., config.vocab_size]
1073 """
1074 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1076 transformer_outputs = self.transformer(
1077 input_ids,
1078 past_key_values=past_key_values,
1079 attention_mask=attention_mask,
1080 token_type_ids=token_type_ids,
1081 position_ids=position_ids,
1082 head_mask=head_mask,
1083 inputs_embeds=inputs_embeds,
1084 encoder_hidden_states=encoder_hidden_states,
1085 encoder_attention_mask=encoder_attention_mask,
1086 use_cache=use_cache,
1087 output_attentions=output_attentions,
1088 output_hidden_states=output_hidden_states,
1089 return_dict=return_dict,
1090 )
1091 hidden_states = transformer_outputs[0]
1093 # Set device for model parallelism

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:900, in GPT2Model.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions, output_hidden_states, return_dict)
890 outputs = torch.utils.checkpoint.checkpoint(
891 create_custom_forward(block),
892 hidden_states,
(...)
897 encoder_attention_mask,
898 )
899 else:
--> 900 outputs = block(
901 hidden_states,
902 layer_past=layer_past,
903 attention_mask=attention_mask,
904 head_mask=head_mask[i],
905 encoder_hidden_states=encoder_hidden_states,
906 encoder_attention_mask=encoder_attention_mask,
907 use_cache=use_cache,
908 output_attentions=output_attentions,
909 )
911 hidden_states = outputs[0]
912 if use_cache is True:

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module..new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:390, in GPT2Block.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
388 residual = hidden_states
389 hidden_states = self.ln_1(hidden_states)
--> 390 attn_outputs = self.attn(
391 hidden_states,
392 layer_past=layer_past,
393 attention_mask=attention_mask,
394 head_mask=head_mask,
395 use_cache=use_cache,
396 output_attentions=output_attentions,
397 )
398 attn_output = attn_outputs[0] # output_attn: a, present, (attentions)
399 outputs = attn_outputs[1:]

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module..new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:312, in GPT2Attention.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
310 attention_mask = encoder_attention_mask
311 else:
--> 312 query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
314 query = self._split_heads(query, self.num_heads, self.head_dim)
315 key = self._split_heads(key, self.num_heads, self.head_dim)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/peft/tuners/lora.py:1208, in Linear4bit.forward(self, x)
1207 def forward(self, x: torch.Tensor):
-> 1208 result = super().forward(x)
1210 if self.disable_adapters or self.active_adapter not in self.lora_A.keys():
1211 return result

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:248, in Linear4bit.forward(self, x)
245 x = x.to(self.compute_dtype)
247 bias = None if self.bias is None else self.bias.to(self.compute_dtype)
--> 248 out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
250 out = out.to(inp_dtype)
252 return out

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:579, in matmul_4bit(A, B, quant_state, out, bias)
577 return out
578 else:
--> 579 return MatMul4Bit.apply(A, B, out, bias, quant_state)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:516, in MatMul4Bit.forward(ctx, A, B, out, bias, state)
511 return torch.empty(A.shape[:-1] + B_shape[:1], dtype=A.dtype, device=A.device)
514 # 1. Dequantize
515 # 2. MatmulnN
--> 516 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
518 # 3. Save state
519 ctx.state = state

RuntimeError: mat1 and mat2 shapes cannot be multiplied (31x5120 and 15360x5120)
```
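The shapes in the error line up with GPT2's fused c_attn projection: hidden size 5120 and output size 3 * 5120 = 15360, with the dequantized 4-bit weight apparently ending up transposed relative to what torch.nn.functional.linear expects. A tiny standalone reproduction of the same error message (illustration only, not taken from the repository):

```python
# Reproduces the exact shape error from the traceback (illustration only).
import torch
import torch.nn.functional as F

x = torch.randn(31, 5120)                # 31 tokens, hidden size 5120
w_ok = torch.randn(15360, 5120)          # c_attn as a Linear weight: (3 * 5120, 5120)
w_transposed = torch.randn(5120, 15360)  # same weight stored transposed

print(F.linear(x, w_ok).shape)           # torch.Size([31, 15360])
F.linear(x, w_transposed)                # RuntimeError: mat1 and mat2 shapes cannot be
                                         # multiplied (31x5120 and 15360x5120)
```

For context, a minimal sketch of loading the base model in 4-bit and attaching the LoRA adapter; the model/adapter ids and quantization settings below are assumptions, not taken from the repository, and they would need to match the settings used during fine-tuning, since a mismatch there is a common source of this kind of shape error:

```python
# Hypothetical loading sketch; ids and 4-bit settings are assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "DAMO-NLP-MT/polylm-13b"                    # assumed base model id
adapter_id = "robinsmits/polylm_13b_ft_alpaca_clean_dutch"  # assumed adapter id

# Placeholder 4-bit settings; they must mirror the fine-tuning configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```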
