uie_pytorch's People

Contributors

heiheiyoyo

uie_pytorch's Issues

UIEPredictor raises an error during inference

UIEPredictor has no batch padding logic, which leads to this error:
File "/home/wangjiawei/baishen/UIE/uie_predictor.py", line 560, in _auto_joiner
for i in range(len(short_results[v])):
IndexError: list index out of range
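
For readers unfamiliar with the term, batch padding means extending every sequence in a batch to a common length. The following is only a generic sketch of that idea, not the code in uie_predictor.py; the function name `pad_batch` and the `pad_id` default are hypothetical.

```python
from typing import List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every sequence with pad_id up to the length of the longest one."""
    if not sequences:
        return []
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Three token-id lists of different lengths become one rectangular batch.
batch = pad_batch([[1, 2, 3], [4], [5, 6]])
```

Whether missing padding is really the root cause of the IndexError above is the reporter's claim and is not verified here.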

convert.py fails when run

Hello, the following problem occurs when I run the model conversion. What is the cause?
The installed transformers version is 4.20.0.
from transformers.utils import ModelOutput
ImportError: cannot import name 'ModelOutput'

KeyError occurs when selecting ernie_3.0_base_zh to convert

run:
python convert.py -i ernie-3.0-base-zh --no_validate_output

got:

2023-03-01 01:34:35.798449: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[2023-03-01 01:34:37,186] [ INFO] - Downloading resource files...
[2023-03-01 01:34:37,187] [ INFO] - Downloading ernie_3.0_base_zh.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh.pdparams
[2023-03-01 01:37:55,405] [ INFO] - Downloading ernie_3.0_base_zh_vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_base_zh_vocab.txt
[2023-03-01 01:37:55,798] [ INFO] - ====================save config file====================
[2023-03-01 01:37:55,800] [ INFO] - ====================save vocab file====================
[2023-03-01 01:37:55,801] [ INFO] - ====================extract weights====================
Traceback (most recent call last):
File "convert.py", line 468, in <module>
do_main()
File "convert.py", line 427, in do_main
extract_and_convert(args.input_model, args.output_model, verbose=True)
File "convert.py", line 297, in extract_and_convert
del paddle_paddle_params['StructuredToParameterName@@']
KeyError: 'StructuredToParameterName@@'
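
The traceback shows `del` being applied to a key that is absent from the loaded parameters. A minimal sketch of a tolerant removal follows, assuming the key is optional bookkeeping that is safe to skip when missing; that assumption is not confirmed by the repository. The `params` dict is a hypothetical stand-in for the loaded state.

```python
# Hypothetical stand-in for the parameter dict loaded from the .pdparams file.
params = {"encoder.weight": [0.1, 0.2]}

# dict.pop with a default does not raise when the key is missing,
# unlike `del params[key]`.
removed = params.pop("StructuredToParameterName@@", None)
```

If the key is present, `pop` removes it exactly as `del` would; if not, the conversion simply continues.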

Bug in ErnieMConverter Class

Using the uie-m-large version, I hit a bug in class ErnieMConverter(Converter):

Traceback (most recent call last):
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/src/run.py", line 23, in <module>
    ie = UIEPredictor(model='uie-m-large', schema=schema, device="cuda" if torch.cuda.is_available() else "cpu")
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 146, in __init__
    self._prepare_predictor()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 160, in _prepare_predictor
    self._tokenizer = ErnieMTokenizerFast.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 477, in __init__
    super().__init__(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1342, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 576, in __init__
    from transformers.utils import sentencepiece_model_pb2 as model_pb2
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
    _descriptor.EnumValueDescriptor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 789, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
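
The second workaround listed in the error message can be applied from Python, as long as the variable is set before anything imports protobuf. This is a sketch of that ordering only; as the message itself warns, pure-Python parsing is much slower.

```python
import os

# Must run before transformers (and therefore protobuf) is imported,
# otherwise the setting has no effect on the already-loaded module.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Imports that pull in protobuf would follow here, for example the
# UIEPredictor import from the traceback above.
```

The first workaround from the message, downgrading protobuf to 3.20.x or lower, avoids the slowdown.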

Error converting the PyTorch model to ONNX

Running the command
python export_model.py --model_path ./ckpt/anno_data_0210_ckpt_10words_ent/model_best/ --output_path ./export
raises an error.
[image]

Validation fails when converting the uie-m-large model: ValueError: Outputs values doesn't match between reference model and Pytorch converted model: Got max absolute difference of: 4.9104968638857827e-05

[2022-11-17 11:44:05,198] [ INFO] - Validating PyTorch model...
[2022-11-17 11:44:26,931] [ INFO] - -[✓] Pytorch model output names match reference model ({'start_prob', 'end_prob'})
[2022-11-17 11:44:26,935] [ INFO] - - Validating PyTorch Model output "start_prob":
[2022-11-17 11:44:26,937] [ INFO] - -[✓] (2, 512) matches (2, 512)
[2022-11-17 11:44:26,956] [ INFO] - -[x] values not close enough (atol: 1e-05)
Traceback (most recent call last):
File "/Users/momo/Documents/code/uie_pytorch/convert.py", line 468, in <module>
do_main()
File "/Users/momo/Documents/code/uie_pytorch/convert.py", line 452, in do_main
validate_model(tokenizer, model, paddle_model, model_type)
File "/Users/momo/Documents/code/uie_pytorch/convert.py", line 414, in validate_model
raise ValueError(
ValueError: Outputs values doesn't match between reference model and Pytorch converted model: Got max absolute difference of: 4.9104968638857827e-05
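
The log reports a maximum absolute difference of about 4.9e-05 against an absolute tolerance (atol) of 1e-05. The sketch below illustrates how such a check behaves with made-up numbers; `max_abs_diff` and the sample values are hypothetical and are not taken from convert.py.

```python
def max_abs_diff(reference, converted):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(r - c) for r, c in zip(reference, converted))


# Made-up outputs that differ by roughly 5e-05 in one position.
reference = [0.10000, 0.20000, 0.30000]
converted = [0.10000, 0.20005, 0.30000]

diff = max_abs_diff(reference, converted)
passes_strict = diff <= 1e-05  # same tolerance as in the log
passes_loose = diff <= 1e-04   # a looser tolerance
```

A difference of this size would fail the strict check but pass the looser one. Whether it is acceptable depends on the use case; the `--no_validate_output` flag shown in another issue on this page appears to skip the check entirely.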

Converting uie-m-base fails with AttributeError: 'ErnieMTokenizer' object has no attribute 'vocab'

Traceback (most recent call last):
File "/root/autodl-tmp/uie_pytorch/uie_predictor.py", line 679, in <module>
uie = UIEPredictor(model=args.model, task_path=args.task_path, schema_lang=args.schema_lang, schema=args.schema, engine=args.engine, device=args.device,
File "/root/autodl-tmp/uie_pytorch/uie_predictor.py", line 147, in __init__
self._prepare_predictor()
File "/root/autodl-tmp/uie_pytorch/uie_predictor.py", line 162, in _prepare_predictor
self._tokenizer = ErnieMTokenizerFast.from_pretrained(
File "/root/autodl-tmp/conda/envs/uie_torch_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
return cls._from_pretrained(
File "/root/autodl-tmp/conda/envs/uie_torch_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2049, in _from_pretrained
slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
File "/root/autodl-tmp/conda/envs/uie_torch_cpu/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/autodl-tmp/uie_pytorch/tokenizer.py", line 139, in __init__
super().__init__(
File "/root/autodl-tmp/conda/envs/uie_torch_cpu/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
self._add_tokens(
File "/root/autodl-tmp/conda/envs/uie_torch_cpu/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/root/autodl-tmp/uie_pytorch/tokenizer.py", line 185, in get_vocab
return dict(self.vocab, **self.added_tokens_encoder)
AttributeError: 'ErnieMTokenizer' object has no attribute 'vocab'

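Data preprocessing format: relation extraction and event extraction

Is the fine-tuning data format the same for relation extraction and event extraction? Do both look like the image below?
[image]

Is the difference that in event extraction the entities include a trigger word, and every entry in relations has that trigger word as from_id and the other roles as to_id?
And in relation extraction the entities may have no trigger word, so relations simply holds the relations between the different roles?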

Is fine-tuning supported for sentiment classification?

Training set:
{"content": "不错的上网本,外形很漂亮,操作系统应该是个很大的 卖点,电池还可以。整体上讲,作为一个上网本的定位,还是不错的。\t", "result_list": [{"text": "正向", "start": -7, "end": -5}], "prompt": "情感倾向[正向,负向]"}
{"content": "<荐书> 推荐所有喜欢<红楼>的红迷们一定要收藏这本书,要知道当年我听说这本书的时候花很长时间去图书馆找和借都没能如愿,所以这次一看到当当有,马上买了,红迷们也要记得备货哦!\t", "result_list": [{"text": "正向", "start": -4, "end": -2}], "prompt": "情感倾向[负向,正向]"}

Fine-tuning sentiment classification with this data fails with:
RequestsDependencyWarning)
Traceback (most recent call last):
File "finetune.py", line 253, in <module>
do_train()
File "finetune.py", line 35, in do_train
tokenizer = BertTokenizerFast.from_pretrained(args.model)
File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1706, in from_pretrained
local_files_only=local_files_only,
File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/transformers/utils/hub.py", line 711, in get_file_from_repo
use_auth_token=use_auth_token,
File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/transformers/utils/hub.py", line 292, in cached_path
local_files_only=local_files_only,
File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/transformers/utils/hub.py", line 563, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Docker deployment fails; no inference results

System: Kylin V10, ARMv8 aarch64
Image: FROM kumatea/pytorch

python3 /app/uie_pytorch/uie-backend-api.py

[2023-09-10 14:42:23,681] [ INFO] - >>> [PyTorchInferBackend] Creating Engine ...
[2023-09-10 14:42:39,516] [ INFO] - >>> [PyTorchInferBackend] Use CPU to inference ...
[2023-09-10 14:42:39,518] [ INFO] - >>> [PyTorchInferBackend] Engine Created ...
/usr/local/lib/python3.9/site-packages/transformers/modeling_utils.py:909: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(

Calling the service returns no result (502 Bad Gateway):
POST http://127.0.0.1:888/
Error: socket hang up
Request Headers
Content-Type: application/json
User-Agent: PostmanRuntime/7.32.3
Accept: */*
Postman-Token: 3f204252-7d5a-4732-8268-c60829276d57
Host: 127.0.0.1:888
Accept-Encoding: gzip, deflate, br
Connection: keep-alive

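Index out-of-bounds error when fine-tuning ernie_m

I tried fine-tuning ernie_m and hit an index out-of-bounds error. After investigating, I traced it to this piece of code (line 270 of ernie_m.py):
[image]

It causes the lookup into position_embedding to go out of bounds.

What is the purpose of this position_ids += 2? How should it be changed?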
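
The arithmetic behind the overflow can be sketched as follows: if every position id is shifted by 2, a table with N rows can serve at most N - 2 positions. This is an illustration only; the function names and the table size of 514 are hypothetical and are not taken from ernie_m.py or its config.

```python
def shifted_position_ids(seq_len: int, offset: int = 2) -> list:
    """Position ids 0..seq_len-1, each shifted by a fixed offset."""
    return [i + offset for i in range(seq_len)]


def max_safe_length(num_position_rows: int, offset: int = 2) -> int:
    """Longest sequence whose shifted ids still index inside the table."""
    return num_position_rows - offset


num_position_rows = 514  # hypothetical embedding table size
safe_len = max_safe_length(num_position_rows)

# At the safe length the largest id is still a valid row index.
largest_ok = max(shifted_position_ids(safe_len))
# One token more and the largest id equals the table size: out of bounds.
largest_bad = max(shifted_position_ids(safe_len + 1))
```

Under this reading, capping the maximum sequence length at the table size minus the offset would avoid the error without touching the shift itself. The original question of why the shift exists is left open here.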

Error when running evaluate.py

Modifying line 680 of utils.py as follows fixes this bug:

def get_relation_type_dict(relation_data):
    def compare(a, b):
        a = a[::-1]
        b = b[::-1]
        res = ''
        for i in range(min(len(a), len(b))):
            if a[i] == b[i]:
                res += a[i]
            else:
                break
        if res == "":
            return res
        elif res[::-1][0] == "的":
            return res[::-1][1:]
        return ""
    relation_type_dict = {}
    added_list = []
    for i in range(len(relation_data)):
        added = False
        if relation_data[i][0] not in added_list:
            for j in range(i + 1, len(relation_data)):
                match = compare(relation_data[i][0], relation_data[j][0])
                if match != "":
                    match = unify_prompt_name(match)
                    if relation_data[i][0] not in added_list:
                        added_list.append(relation_data[i][0])
                        relation_type_dict.setdefault(match, []).append(
                            relation_data[i][1])
                    added_list.append(relation_data[j][0])
                    relation_type_dict.setdefault(match, []).append(
                        relation_data[j][1])
                    added = True
            if not added:
                added_list.append(relation_data[i][0])
                suffix = relation_data[i][0].rsplit("的", 1)[1]
                suffix = unify_prompt_name(suffix)
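                # This branch seems to run only when there is a single object. The commented-out
                # assignment below would store a dict (rather than a list) in relation_type_dict.
                relation_type_dict.setdefault(suffix, []).append(
                    relation_data[i][1])
                # relation_type_dict[suffix] = relation_data[i][1]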
    return relation_type_dict
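
The difference between the two statements at the end of the function can be seen in isolation. In this small sketch the key and the values are made up for illustration.

```python
# setdefault(...).append(...) always keeps a list under the key.
with_setdefault = {}
with_setdefault.setdefault("suffix", []).append({"text": "a"})
with_setdefault.setdefault("suffix", []).append({"text": "b"})

# Plain assignment stores the value itself, so the key holds a dict, not a list.
with_assignment = {}
with_assignment["suffix"] = {"text": "a"}
```

Code that later iterates over each value expecting a list of dicts works with the first form and breaks with the second.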

Nested entities

Does this UIE implementation support nested entity extraction?
I tried uie_predictor and found that it could not extract nested entities.

uie_m_large_pytorch issue

Some weights of UIE were not initialized from the model checkpoint at uie_m_large_pytorch and are newly initialized: ['encoder.embeddings.token_type_embeddings.weight']
When loading uie_m_large_pytorch, this warning says that some weights could not be loaded.

uie pytorch config parameter question

In the config.json of uie-base there is no task_id value, so does task_type_embeddings have no effect at run time? Is the default value 0 always used?

Parameter error

Running the command python convert.py directly produces the following error:
TypeError: forward() got an unexpected keyword argument 'pos_ids'
[image]
What causes this?

Error: module 'paddle.fluid.dygraph' has no attribute 'load_dygraph'. How can this be resolved?

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ c:\Users\n\Desktop\uie_pytorch-main (1)\uie_predictor.py:680 in <module> │
│ │
│ 677 │ args.schema = ['航母'] │
│ 678 │ args.schema_lang = "en" │
│ 679 │ uie = UIEPredictor(model=args.model, task_path=args.task_path, schema_lang=args.sche │
│ ❱ 680 │ │ │ │ │ position_prob=args.position_prob, max_seq_len=args.max_seq_len, b │
│ 681 │ print(uie("印媒所称的“印度第一艘国产航母”—“维克兰特”号")) │
│ 682 │
│ │
│ c:\Users\n\Desktop\uie_pytorch-main (1)\uie_predictor.py:147 in __init__ │
│ │
│ 144 │ │ self._is_en = True if model in ['uie-base-en' │
│ 145 │ │ │ │ │ │ │ │ │ │ ] or schema_lang == 'en' else False │
│ 146 │ │ self.set_schema(schema) │
│ ❱ 147 │ │ self._prepare_predictor() │
│ 148 │ │
│ 149 │ def _prepare_predictor(self): │
│ 150 │ │ assert self._engine in ['pytorch', │
│ │
│ c:\Users\n\Desktop\uie_pytorch-main (1)\uie_predictor.py:158 in _prepare_predictor │
│ │
│ 155 │ │ │ if not os.path.exists(self._task_path): │
│ 156 │ │ │ │ from convert import check_model, extract_and_convert │
│ 157 │ │ │ │ check_model(self._model) │
│ ❱ 158 │ │ │ │ extract_and_convert(self._model, self._task_path) │
│ 159 │ │ │
│ 160 │ │ if self._multilingual: │
│ 161 │ │ │ from tokenizer import ErnieMTokenizerFast │
│ │
│ c:\Users\n\Desktop\uie_pytorch-main (1)\convert.py:292 in extract_and_convert │
│ │
│ 289 │ │ import paddle.fluid.dygraph as D │
│ 290 │ │ from paddle import fluid │
│ 291 │ │ with fluid.dygraph.guard(): │
│ ❱ 292 │ │ │ paddle_paddle_params, _ = D.load_dygraph( │
│ 293 │ │ │ │ os.path.join(input_dir, 'model_state')) │
│ 294 │ else: │
│ 295 │ │ paddle_paddle_params = pickle.load( │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'paddle.fluid.dygraph' has no attribute 'load_dygraph'

Bug at line 900 of utils.py

The order of the loops in the list comprehension:
all_relation_examples = [
r
for relation_example in relation_examples
for r in relation_example
]
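
In a nested list comprehension the `for` clauses are read left to right, like nested loops: the outer loop comes first and the inner loop second. This small sketch uses made-up data to show the order in the snippet above flattening a list of lists.

```python
# Made-up data: each inner list holds the relation examples of one document.
relation_examples = [["r1", "r2"], ["r3"]]

# Outer loop first (relation_example), then inner loop (r).
all_relation_examples = [
    r
    for relation_example in relation_examples
    for r in relation_example
]
```

Writing the two `for` clauses in the opposite order raises a NameError on a fresh run, because `relation_example` would be used before it is bound.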

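Model output question

I would like to get the last layer's hidden state and the pooler_output value, as with BERT's output. Is there anything wrong with writing the code like this?

model = UIE.from_pretrained(path)
last_hidden_state = model(**inputs).hidden_states[-1]
pooler_output = torch.max(model(**inputs).hidden_states[-1])

Also, what are the start_prob and end_prob in the model output?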
