
citation-utils's People

Contributors

justmars


citation-utils's Issues

Clean modern dockets with slashes from corpus-extractor data

See the itemized test cases for the new way of citing dockets, and clean these citations for uniformity. Only when the docket citations are uniform can the PDF citations be merged with the e-library citations. In addition, see the (1) OCA IPI and (2) PET situations.

[Screenshot: 2023-02-23 at 4:25:01 PM]

Bad source content

The regex strings will need to be reconstructed at some point, since this source string causes the function to hang:

text = "Civil Cases Nos. 4247-L, 2395-L, 2367-L, 2812-L, 4160-L, 4550-L, 4470-L, 4475-L, 4442-L, 4362-L, 4377-L, 4394-L, 2581-L, 268-L, 2799-L, 4641-L, 2995-L, 3025-L, 3031-L, 3090-L, 3042-L, 2520-L, 4669-L, 4649-L, 4693-L, 4654-L, 4605-L, 4602-L, 2507-L, ASA-VII-05-1147, 2177-L, 2396-L, 4037-L, 2487-L, 2260-L, 2059-L, 2845-L, 4299-L, 4078-L, 4054-L, 4604-L, 4522-L, 2589-L, CAD Case No. 17, LRC Rec. No. 946, Lot Nos. 1741 and 1810, CAD Case No. 20, LRC Rec. No. 1004, Lot Nos. 4563, 4565-66, 4591, 4592, 4595, 4597-98, 4600-602 and 4604-06, CAD Case No. 20 LRC Rec. No.1004, Lot No. 4354, CAD Case No. 20 LRC Rec. No.1004, Lot No. 4998, CAD Case No. 20 LRC Rec. No.1008, Lot No. 3019, CAD Case No. 21 LRC Rec. No.1008, Lot No. 5545, CAD Case No. 21 LRC Rec. No.1008, Lot No. 5252, CAD Case No. 22 LRC Rec. No.1018, Lot No. 6233, CAD Case No. 22 LRC Rec. No.1081, Lot No. 6391, CAD Case No. 22 LRC Rec. No.1018, Lot No. 6025, 6073, 6060 and 6132, CAD Case No. 22 LRC Rec. No.1018, Lot Nos. 6669, 6671 and 6672, CAD Case No. 22 LRC Rec. No.1018, Lot No. 6354, CAD Case No. 19 LRC Rec. No.1003, Lot Nos. 2036 and 2042, CAD Case No. 19, LRC Rec. No.1003, Lot No. 2843, CAD Case No. 19, LRC Rec. No.1003, Lot No. 2948,Lot no. 5350, Lot No. 5511, Lot No. 5138, CAD Case No. 21, LRC Rec. No.1008, Lot No. 5619, CAD Case No. 19, LRC Rec. No.1003, Lot No. 2044, CAD Case No. 15 , Lot No. 3757, Lot No. 3389, Lot Nos. 3430, 3023 and 3176, CAD Case No. 2, LRC Rec. No.1018, Lot Nos. 6433, 6482 and 6514, CAD Case No. 17 LRC Rec. No.946, Lot No. 1793, and Lot No. 3013."
# Hangs on the string above (the original snippet passed an undefined name, subtext2).
list(CountedCitation.from_source(text))

Serial num cleaning

Expansive v. narrow scope of serial numbers.

Long citations are required in decisions; shorter citations are required for databases. The long citations must therefore be mapped to their short versions, e.g.

long                                          short (stored in db)
gr 111,112,113,114 and 115, sept. 17, 2011    gr 111, 2011-09-17

Characteristics of long citations

  1. Inconsistent, e.g. may use GR or G.R., and may use abbreviations like 111-15
  2. May contain asterisks and spaces

Characteristics of short citations

  1. Should always map to long ones, whatever their style
  2. Should be limited to alphanumeric characters, maybe a dash. No spaces
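The long-to-short mapping described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's actual implementation: the names `ShortCitation`, `shorten_serial` and `to_short` are hypothetical, only the G.R. category is handled, and only the first serial number is kept. Abbreviations such as 111-15 are passed through unchanged, since a dash can also be a legitimate part of a docket number.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Month lookup keyed on the first three letters, so "sept." and "September" both work.
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Assumption: a long citation looks like "<G.R.> [No(s).] <serials>, <month> <day>, <year>".
LONG_GR = re.compile(
    r"^\s*(?P<cat>g\.?\s*r\.?)\s*(?:nos?\.?)?\s*"
    r"(?P<serials>.+?),\s*"
    r"(?P<month>[a-z]+)\.?\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})\s*$",
    re.IGNORECASE,
)


class ShortCitation(NamedTuple):
    """Hypothetical database form: category, first serial, ISO date."""
    category: str
    serial: str
    date: str


def shorten_serial(raw: str) -> str:
    """Keep the first serial only, restricted to lowercase alphanumerics and dashes."""
    first = re.split(r",|\band\b|&", raw, maxsplit=1, flags=re.IGNORECASE)[0]
    return re.sub(r"[^a-z0-9-]", "", first.lower())


def to_short(long_citation: str) -> Optional[ShortCitation]:
    """Map a long G.R. citation to its short form, or None if it does not match."""
    m = LONG_GR.match(long_citation)
    if m is None:
        return None
    month = MONTHS.get(m["month"][:3].lower())
    if month is None:
        return None
    category = re.sub(r"[^a-z]", "", m["cat"].lower())  # "G.R." and "GR" become "gr"
    iso = date(int(m["year"]), month, int(m["day"])).isoformat()
    return ShortCitation(category, shorten_serial(m["serials"]), iso)


print(to_short("gr 111,112,113,114 and 115, sept. 17, 2011"))
```

Because every stylistic variant of the same long citation reduces to the same tuple, the tuple could serve as a database key, which is the property the short form is meant to have.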
