ladan's Issues

OOM after running LADAN+MTL_large.py for a long time without finishing a single epoch

When I run LADAN+MTL_large.py, it still has not finished a single epoch after 20 hours, and then it reports an OOM error.
I have already made the batch size very small. What other factors affect GPU memory usage? I don't use TensorFlow often; is there a quick way to debug this kind of situation?

The error output looks roughly like this:

/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Using TensorFlow backend.
WARNING:tensorflow:From LADAN+MTL_large.py:187: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From LADAN+MTL_large.py:240: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:3794: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From LADAN+MTL_large.py:347: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

WARNING:tensorflow:From LADAN+MTL_large.py:349: arg_max (from tensorflow.python.ops.gen_math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.math.argmax` instead
WARNING:tensorflow:From LADAN+MTL_large.py:376: The name tf.losses.softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.softmax_cross_entropy instead.

WARNING:tensorflow:From LADAN+MTL_large.py:412: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.

WARNING:tensorflow:From LADAN+MTL_large.py:416: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From LADAN+MTL_large.py:420: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From LADAN+MTL_large.py:429: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2022-12-08 16:40:23.383015: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2022-12-08 16:40:23.426005: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1800000000 Hz
2022-12-08 16:40:23.430214: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f54978e0 executing computations on platform Host. Devices:
2022-12-08 16:40:23.430288: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2022-12-08 16:40:23.436761: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2022-12-08 16:40:26.878260: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f53210d0 executing computations on platform CUDA. Devices:
2022-12-08 16:40:26.878370: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2022-12-08 16:40:26.880622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:3b:00.0
2022-12-08 16:40:26.881237: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-12-08 16:40:26.884967: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-12-08 16:40:26.887944: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-12-08 16:40:26.888472: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-12-08 16:40:26.891360: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-12-08 16:40:26.893620: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-12-08 16:40:26.898651: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-12-08 16:40:26.899852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2022-12-08 16:40:26.899903: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-12-08 16:40:26.901230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-12-08 16:40:26.901252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2022-12-08 16:40:26.901259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2022-12-08 16:40:26.902515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3777 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2022-12-08 16:40:28.982252: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2022-12-08 16:41:14.202769: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-12-09 12:48:53.798908: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 172.85MiB (rounded to 181248000).  Current allocation summary follows.
2022-12-09 12:48:53.799598: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): 	Total Chunks: 87, Chunks in use: 86. 21.8KiB allocated for chunks. 21.5KiB in use in bin. 624B client-requested in use in bin.
2022-12-09 12:48:53.799637: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): 	Total Chunks: 21, Chunks in use: 20. 12.0KiB allocated for chunks. 11.2KiB in use in bin. 8.5KiB client-requested in use in bin.
2022-12-09 12:48:53.799654: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): 	Total Chunks: 81, Chunks in use: 80. 94.5KiB allocated for chunks. 93.2KiB in use in bin. 91.2KiB client-requested in use in bin.
2022-12-09 12:48:53.799669: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2048): 	Total Chunks: 13, Chunks in use: 12. 29.0KiB allocated for chunks. 25.2KiB in use in bin. 21.0KiB client-requested in use in bin.
2022-12-09 12:48:53.799683: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4096): 	Total Chunks: 9, Chunks in use: 8. 43.0KiB allocated for chunks. 38.0KiB in use in bin. 37.8KiB client-requested in use in bin.
2022-12-09 12:48:53.799699: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8192): 	Total Chunks: 95, Chunks in use: 66. 1.32MiB allocated for chunks. 979.5KiB in use in bin. 978.0KiB client-requested in use in bin.
(similar error lines in between omitted)
2022-12-09 12:48:53.799908: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (67108864): 	Total Chunks: 3, Chunks in use: 0. 259.28MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-09 12:48:53.799922: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (134217728): 	Total Chunks: 10, Chunks in use: 9. 1.59GiB allocated for chunks. 1.43GiB in use in bin. 1.43GiB client-requested in use in bin.
2022-12-09 12:48:53.799936: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2022-12-09 12:48:53.799953: I tensorflow/core/common_runtime/bfc_allocator.cc:780] Bin for 172.85MiB was 128.00MiB, Chunk State: 
2022-12-09 12:48:53.799981: I tensorflow/core/common_runtime/bfc_allocator.cc:786]   Size: 160.07MiB | Requested Size: 60.0KiB | in_use: 0 | bin_num: 19, prev:   Size: 172.85MiB | Requested Size: 172.85MiB | in_use: 1 | bin_num: -1
2022-12-09 12:48:53.799993: I tensorflow/core/common_runtime/bfc_allocator.cc:793] Next region of size 1780940800
2022-12-09 12:48:53.800006: I tensorflow/core/common_runtime/bfc_allocator.cc:800] Free  at 0x7fa1e4000000 next 5199 of size 24576000
2022-12-09 12:48:53.800018: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7fa1e5770000 next 620 of size 906240
2022-12-09 12:48:53.800029: I tensorflow/core/common_runtime/bfc_allocator.cc:800] InUse at 0x7fa1e584d400 next 2630 of size 226560
(similar error lines in between omitted)

2022-12-09 12:48:53.849287: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 1 Chunks of size 91136 totalling 89.0KiB
2022-12-09 12:48:53.849295: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 91392 totalling 178.5KiB
(a large number of similar error lines omitted)
2022-12-09 12:48:53.851402: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 2 Chunks of size 134217728 totalling 256.00MiB
2022-12-09 12:48:53.851410: I tensorflow/core/common_runtime/bfc_allocator.cc:812] 7 Chunks of size 181248000 totalling 1.18GiB
2022-12-09 12:48:53.851417: I tensorflow/core/common_runtime/bfc_allocator.cc:816] Sum Total of in-use chunks: 3.04GiB
2022-12-09 12:48:53.851426: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 3960930304 memory_limit_: 3960930304 available bytes: 0 curr_region_allocation_bytes_: 4294967296
2022-12-09 12:48:53.851444: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats: 
Limit:                  3960930304
InUse:                  3267110912
MaxInUse:               3686056448
NumAllocs:              2608654986
MaxAllocSize:            315457792

2022-12-09 12:48:53.852011: W tensorflow/core/common_runtime/bfc_allocator.cc:319] **__*_*_*********************************___********************************************************
2022-12-09 12:48:53.852070: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[118,256,15,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Program start time:
2022-12-08 16:39:35.666083
Model loaded succeed
Model loaded succeed
Traceback (most recent call last):
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[118,256,15,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node gradients/mul_grad/Mul_1-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Adam/update/_990]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[118,256,15,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node gradients/mul_grad/Mul_1-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "LADAN+MTL_large.py", line 490, in <module>
    loss_value, _, graph_chose_value= sess.run([loss_total, train_op, graph_chose_loss], feed_dict=feed_dict)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/wanghuijuan/anaconda3/envs/envtf114/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[118,256,15,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node gradients/mul_grad/Mul_1-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Adam/update/_990]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[118,256,15,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node gradients/mul_grad/Mul_1-1-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
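
(The hint printed in the log can be acted on directly. A minimal sketch, assuming the TF 1.x session, fetches, and feed_dict named in the traceback above; only the sess.run call in LADAN+MTL_large.py changes:)

import tensorflow as tf  # TF 1.x, matching the environment in the log

# Ask the runtime to list the live tensors when an OOM occurs, as the hint suggests.
run_options = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)
loss_value, _, graph_chose_value = sess.run(
    [loss_total, train_op, graph_chose_loss],
    feed_dict=feed_dict,
    options=run_options)

Besides batch size, activation memory also scales with the sequence-length settings (e.g. doc_len_fact, sent_len_fact) and the hidden sizes, so those are worth checking too.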

My changes to the original code should be minimal. As for the law_label2index_large.pkl file missing from the original GitHub project, I generated it from the new_law.txt obtained after preprocessing the CAIL-big dataset, with a script roughly like the following:

import argparse
import pickle as pk

parser = argparse.ArgumentParser()
parser.add_argument('--law_file')
parser.add_argument('--output_file')
args = parser.parse_args()

# Map each stripped line of the law file to its zero-based line index.
label2index = {}
with open(args.law_file) as f:
    for i, line in enumerate(f):
        label2index[line.strip()] = i

with open(args.output_file, 'wb') as f:
    pk.dump(label2index, f)
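
(A quick sanity check of the generated pickle; the file name is the one from the issue title:)

import pickle as pk

with open('law_label2index_large.pkl', 'rb') as f:
    law2index = pk.load(f)

# The mapping should send each stripped line of new_law.txt to its zero-based index.
print(len(law2index))                  # should equal the number of law articles
print(sorted(law2index.values())[:5])  # expect [0, 1, 2, 3, 4]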

The address is not accessible

Hello author, the address in the README is not accessible. If possible, could you send me a copy of the word embedding (cail_thulac.npy) and the datasets used (small/ and big/)?
My email is [[email protected]], or, if it's more convenient, a link to a network disk that stores these files.
Thanks.

file law.txt not found

When I run tongji_3.py, I get this error:

FileNotFoundError: [Errno 2] No such file or directory: './law.txt'

How can I get law.txt and accu.txt? Are they different from new_law.txt and new_accu.txt?

Law article prediction task

I would like to ask: in this paper, is law article prediction treated as a single-label multi-class classification task?

Data preprocessing question

import json
import thulac
import re

Cutter = thulac.thulac(seg_only=True)

# Build law-article <-> index mappings from law.txt.
flaw = open("./law.txt", 'r')
totallaw = 0
law2num = {}
num2law = {}

for line in flaw.readlines():
    law2num[line.strip()] = totallaw
    num2law[totallaw] = line.strip()
    totallaw += 1
print(totallaw)

# Build accusation <-> index mappings from accu.txt.
flaw = open("accu.txt", 'r', encoding='utf-8')
totalaccu = 0
accu2num = {}
num2accu = {}
for line in flaw.readlines():
    accu2num[line.strip()] = totalaccu
    num2accu[totalaccu] = line.strip()
    totalaccu += 1
print(totalaccu)

file1 = open("data_train.json", 'r', encoding='utf-8')
file2 = open("data_test.json", 'r', encoding='utf-8')
file3 = open("data_valid.json", 'r', encoding='utf-8')

strpass = '二审'
totalsample = 0
totlaw = [0] * totallaw
totaccu = [0] * totalaccu

# Count law/accusation frequencies over the training set, skipping
# second-instance (二审) cases and multi-label samples.
for line in file1.readlines():
    dic = json.loads(line)
    if (strpass in dic["fact"] != -1 or
            len(dic["meta"]["accusation"]) > 1 or len(dic["meta"]["relevant_articles"]) > 1):
        pass
    else:
        templaw = str(dic["meta"]["relevant_articles"][0])
        tempaccu = dic["meta"]["accusation"][0]
        totlaw[law2num[templaw]] += 1
        totaccu[accu2num[tempaccu]] += 1
        totalsample += 1

# Same counting over the validation set.
for line in file3.readlines():
    dic = json.loads(line)
    if (strpass in dic["fact"] != -1 or
            len(dic["meta"]["accusation"]) > 1 or len(dic["meta"]["relevant_articles"]) > 1):
        pass
    else:
        templaw = str(dic["meta"]["relevant_articles"][0])
        tempaccu = dic["meta"]["accusation"][0]
        totlaw[law2num[templaw]] += 1
        totaccu[accu2num[tempaccu]] += 1
        totalsample += 1

print(totalsample)
clearlaw = 0
clearaccu = 0
clearlawlist = []
clearacculist = []
clearlaw2num = {}
clearaccu2num = {}

lawfile = open("./new_law.txt", "w")
accufile = open("./new_accu.txt", "w")

# Keep only labels that occur at least 100 times.
for i in range(totallaw):
    if totlaw[i] >= 100:
        clearlawlist.append(i)
        clearlaw2num[str(num2law[i])] = clearlaw
        clearlaw += 1
        lawfile.write(num2law[i] + '\n')
for i in range(totalaccu):
    if totaccu[i] >= 100:
        clearacculist.append(i)
        clearaccu2num[num2accu[i]] = clearaccu
        clearaccu += 1
        accufile.write(num2accu[i] + '\n')

print(clearlaw, clearaccu)
print(clearlaw2num)

file1.close()
file2.close()
file3.close()
Hello, this code counts the frequency of every charge and law article over the data and then filters the charges and law articles. Why is the data in file2 (data_test.json) never counted?

law_label2index_large.pkl not found

When I run LADAN+TopJudge_large.py, the error says the file is not found under law_processed. Is it the same file as law_label2index.pkl?


Output of running tongji_3.py

Hello, when I run tongji_3.py on CAIL-small, the resulting train_cs.json has 101685 samples rather than the 101619 reported in the paper. I don't know where I went wrong; should running the script directly reproduce the numbers in your paper? Thanks.

u_fact = fact_graph_choose @ atten_tensor

Could you explain the u_fact = fact_graph_choose @ atten_tensor statement? I noticed that it appears in many of the runnable files, but this line raised an error when I ran it. Thanks ~
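
(For context, a sketch rather than the author's code: in Python 3.5+, a @ b calls a.__matmul__(b), which for tf.Tensor is tf.matmul, i.e. a batched matrix multiplication over the last two dimensions. On Python < 3.5 the @ operator is a SyntaxError, which is one common reason this line fails; a rank or shape mismatch between the two tensors is another. The shapes below are illustrative assumptions, not taken from the repo:)

import tensorflow as tf  # TF 1.x

fact_graph_choose = tf.ones([128, 1, 10])   # hypothetical [batch, 1, n_graphs]
atten_tensor = tf.ones([128, 10, 256])      # hypothetical [batch, n_graphs, hidden]

u_fact = fact_graph_choose @ atten_tensor                     # shape [128, 1, 256]
u_fact_explicit = tf.matmul(fact_graph_choose, atten_tensor)  # identical result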

The problem about the Number of Law Articles and Charges

Hi, I am interested in your work and have tried to reproduce the results on the CAIL-small dataset, but I ran into some problems while preprocessing the data.

The first problem concerns the files law.txt and accu.txt used by the script tongji_3.py. I do not see those two files in your git repository LADAN, but I found files named law.txt and accu.txt in the git repository https://github.com/china-ai-law-challenge/CAIL2018/tree/master/baseline . I tried to run tongji_3.py with these two files on the CAIL-small dataset, but it reports an error that some accusations in the data_train.json / data_test.json / data_valid.json of CAIL-small are not included in accu.txt. Meanwhile, that law.txt and accu.txt contain 183 law articles and 202 accusations, which does not match the statistics for either CAIL-small or CAIL-big (and I also think it should not match) in your paper Distinguish Confusing Law Articles for Legal Judgment Prediction. I really want to figure out whether my way of generating these files is right.

Given the first problem, I instead traversed all samples in all three files of CAIL-small and recorded, in my own accu.txt and law.txt, every accusation and law article that appears at least once; these contain 202 accusations and 183 law articles respectively. With these two files, tongji_3.py runs without the error from the first problem and generates new_law.txt and new_accu.txt by filtering out law articles and accusations that appear fewer than 100 times (following the paper). But my new_law.txt has 101 entries and my new_accu.txt has 117, both slightly fewer than the 103 law articles and 119 charges in the paper. I also find that the new_law.txt (103 entries) and new_accu.txt (119 entries) from your repository match the numbers in your paper exactly. So what is the right way to generate new_law.txt and new_accu.txt? A sketch of the traversal I mean follows.
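
(A minimal sketch of the traversal described above, with file names assumed from the CAIL-small release and an arbitrary output ordering, since the ordering of the original law.txt/accu.txt is unknown:)

import json

# Collect every law article and accusation that appears at least once
# across the three CAIL-small splits.
laws, accus = set(), set()
for path in ('data_train.json', 'data_valid.json', 'data_test.json'):
    with open(path, encoding='utf-8') as f:
        for line in f:
            meta = json.loads(line)['meta']
            laws.update(str(a) for a in meta['relevant_articles'])
            accus.update(meta['accusation'])

with open('law.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(sorted(laws, key=int)) + '\n')
with open('accu.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(sorted(accus)) + '\n')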

The third problem concerns generating train_cs.json, test_cs.json and valid_cs.json. I used my versions of accu.txt, law.txt, new_accu.txt and new_law.txt to produce the three *_cs.json files, but the sample counts in all three do not match the paper. I then ran data/make_Legal_basis_data.py on the three *_cs.json files to generate the final data for training the model; after this processing, the final numbers of training and testing samples are 96477 and 24832, both less than the 101619 training cases and 26749 testing cases in the paper.

I hope my wordy description does not bother you too much; I would be extremely grateful if you could point out the wrong steps in my data preprocessing that lead to the mismatches in the numbers of training samples, testing samples, law articles and accusations. Thanks a lot.

Problem with the LADAN_Topjudge_small results

I ran this model for 16 epochs, but I can't get the accuracy reported in the paper.
I used the original parameters.
Here are the hyperparameters:

epsilon=1e-9
batch_size = 128
max_epoch = 16
sent_len_fact = 100
doc_len_fact = 15
doc_len_law = 10
sent_len_law = 100
learning_rate = 1e-3

lstm_size = 128
clr_fc1_size = 512
clr_fc2_size = 256
law_relation_threshold = 0.3
vec_size = 200
shuffle = True

n_law = 103
n_accu = 119
n_term = 12

Am I using the wrong parameters or is there something wrong somewhere?

A copy of the data files used

Hi,
If possible, could you send me a copy of the word embedding (cail_thulac.npy) and the datasets used (small/ and big/)?
My email is [email protected], or, if it's more convenient, a link to a network disk that stores these files.
Thanks.

I would like to ask a grammar question

This code: strpass in dic["fact"] != -1
It seems intended to sift out cases where 二审 appears in the fact variable (a str object), but I wonder how this can work. I only know that the in operator returns True or False, which becomes 1 or 0 when converted to int, so how could -1 ever occur?
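
(Python's comparison chaining explains it; a minimal sketch, not from the repo:)

# `in` and `!=` are both comparison operators, so Python chains them:
#   strpass in fact != -1   ==   (strpass in fact) and (fact != -1)
# A str never equals -1, so the right-hand test is always True and the
# whole expression reduces to the membership test `strpass in fact`.
fact = '一审判决后被告人提出上诉,本案进入二审程序'
strpass = '二审'
assert (strpass in fact != -1) == ((strpass in fact) and (fact != -1))
assert (strpass in fact != -1) == (strpass in fact)

So no int conversion ever happens and -1 never enters the picture; the != -1 part is a harmless no-op (it looks like a leftover from str.find(), which does return -1 on failure).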
