LARQS (Legal Analogical Reasoning Questions Set)

An Evaluation Dataset for Chinese Codex Word Embedding Model

Word embedding is a modern approach to distributed word representation and is widely used in natural language processing tasks. Converting the vocabulary of legal documents into a word embedding model makes those documents amenable to machine learning, deep learning, and other algorithms, and thus to downstream natural language processing tasks such as document classification, contract review, and machine translation. The most common and practical way to evaluate the accuracy of a word embedding model is to use a benchmark set built on linguistic rules or relationships between words and to perform analogical reasoning on the word vectors through algebraic calculation. This paper proposes the Legal Analogical Reasoning Questions Set (LARQS), 1,256 questions built from a Chinese Codex corpus of 2,388 codes using five kinds of legal relations, and uses it to evaluate the accuracy of Chinese word embedding models. Moreover, we found that legal relations may be ubiquitous in word embedding models. The full paper is available here (https://aircconline.com/abstract/ijnlc/v11n3/11322ijnlc01.html)
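For reference, analogical reasoning "via algebraic calculation" is commonly formalized with the vector-offset (3CosAdd) rule below, the formulation typically used with the Google analogy set. That LARQS questions are scored exactly this way is an assumption here, not something stated in this README. For a question "a is to b as c is to d", with V the vocabulary and v_w the vector of word w:

```latex
\hat{d} = \operatorname*{arg\,max}_{w \in V \setminus \{a,\, b,\, c\}}
          \cos\!\left(\mathbf{v}_w,\; \mathbf{v}_b - \mathbf{v}_a + \mathbf{v}_c\right),
\qquad
\text{Accuracy} = \frac{\#\{\text{questions with } \hat{d} = d\}}{\#\{\text{questions answered}\}}
```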

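Word embedding models, which encode the words of a document as vectors, are currently the most common unsupervised word-encoding method. Converting legal vocabulary into a compact word embedding model makes it easier to apply machine learning, deep learning, and other natural language processing algorithms, which in turn supports downstream tasks on legal documents such as document classification, contract review, and machine translation. A common and effective way to evaluate the accuracy of a word embedding model is to build an evaluation dataset of word groups that exhibit linguistic regularities or lexical relations, and then to perform analogical reasoning on the model with those groups through algebraic calculation. From a corpus of 2,388 Chinese codes, this paper builds 1,256 legal analogical reasoning questions (the Legal Analogical Reasoning Questions Set, LARQS) based on five kinds of legal relations, and uses them to evaluate the accuracy of Chinese word vector models. The experiments also compare LARQS with the evaluation dataset that the CKIP group at Academia Sinica translated from the dataset released by Google, and with an evaluation dataset converted from Simplified Chinese CA8 into Traditional Chinese. Under the same word embedding model, the evaluation dataset built from the Chinese codex corpus reaches an accuracy of about 67% in the best case, and the results indicate that legal relations between words are relatively prevalent in word embedding models. The paper can be downloaded here (https://aircconline.com/abstract/ijnlc/v11n3/11322ijnlc01.html)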
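A minimal sketch of how an analogy question set of this kind can be scored against an embedding, assuming the vector-offset (3CosAdd) method and questions stored as (a, b, c, d) tuples. The toy vectors and the legal word pairs (原告/被告 plaintiff/defendant, 上訴人/被上訴人 appellant/appellee, 聲請人/相對人 applicant/respondent) are hypothetical illustrations, not entries taken from LARQS, and skipping out-of-vocabulary questions is one common convention rather than the paper's documented procedure.

```python
import math

# Hypothetical toy embedding: 3-d vectors chosen by hand so the
# plaintiff/defendant offset mirrors the appellant/appellee offset.
# These are NOT vectors from any trained model or from LARQS.
TOY = {
    "原告": [1.0, 0.0, 0.0],      # plaintiff
    "被告": [0.0, 1.0, 0.0],      # defendant
    "上訴人": [1.0, 0.0, 1.0],    # appellant
    "被上訴人": [0.0, 1.0, 1.0],  # appellee
    "契約": [1.0, 1.0, 0.0],      # contract (distractor)
}

# Hypothetical questions in (a, b, c, d) form: "a is to b as c is to d".
QUESTIONS = [
    ("原告", "被告", "上訴人", "被上訴人"),
    ("被告", "原告", "被上訴人", "上訴人"),
    ("原告", "被告", "聲請人", "相對人"),  # out of vocabulary here: skipped
]


def normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word w maximizing cos(w, b - a + c), excluding a, b, c."""
    unit = {w: normalize(v) for w, v in emb.items()}
    target = normalize([xb - xa + xc for xa, xb, xc in zip(unit[a], unit[b], unit[c])])
    best, best_score = None, -math.inf
    for w, v in unit.items():
        if w in (a, b, c):
            continue
        # Dot product of unit-length vectors is their cosine similarity.
        score = sum(x * y for x, y in zip(v, target))
        if score > best_score:
            best, best_score = w, score
    return best


def evaluate(emb, questions):
    """Return (accuracy, answered, skipped); questions with any OOV word are skipped."""
    correct = answered = skipped = 0
    for a, b, c, d in questions:
        if any(w not in emb for w in (a, b, c, d)):
            skipped += 1
            continue
        answered += 1
        if solve_analogy(emb, a, b, c) == d:
            correct += 1
    accuracy = correct / answered if answered else 0.0
    return accuracy, answered, skipped


if __name__ == "__main__":
    print(evaluate(TOY, QUESTIONS))  # (accuracy, answered, skipped)
```

On these toy vectors the script prints `(1.0, 2, 1)`: both in-vocabulary questions are answered correctly and the third is skipped. For real models, gensim's `KeyedVectors.evaluate_word_analogies` performs the same style of 3CosAdd evaluation over a file in the Google analogy format; whether LARQS is distributed in that format is an assumption here.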
