Comments (4)

yongzhuo commented on May 26, 2024

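A fairly simple approach would be: first use rules to extract the substrings that consist only of these numeral characters, then convert them uniformly to either Arabic or Chinese numerals, and finally convert the result to a number.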
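The first two steps of that approach could be sketched roughly as follows. The character sets and the function names `extract_candidates` and `normalise_digits` are assumptions made for illustration; they are not taken from the library, whose own tables may differ.

```python
import re

# Assumed character sets; the library's own tables may differ.
CN_DIGITS = "零〇一二两三四五六七八九"
CN_UNITS = "十百千万亿兆"

# A run of characters that can appear in a number (digits, units, point).
CANDIDATE = re.compile("[0-9.点" + CN_DIGITS + CN_UNITS + "]+")

# Chinese digits and 点 mapped to their ASCII counterparts.
TO_ARABIC = str.maketrans("零〇一二两三四五六七八九点", "001223456789.")


def extract_candidates(text):
    """Step 1: return the runs made up only of numeral characters."""
    return CANDIDATE.findall(text)


def normalise_digits(token):
    """Step 2: rewrite Chinese digits and 点 as ASCII; units are left alone."""
    return token.translate(TO_ARABIC)


for token in extract_candidates("花费三百四十一点五万元"):
    print(token, "->", normalise_digits(token))
# 三百四十一点五万 -> 3百4十1.5万
```

This character class is deliberately loose and will also pick up stray points such as "...", so a stricter pattern is still needed before the final conversion step.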

from tookit-sihui.

aeo000000 commented on May 26, 2024

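I ran some tests and found cases that are not handled, for example:
print(ctn.compose_decimal('三百四十一点五万')) --> 0

In the next two, the point is ignored in one case and honored in the other:
print(ctn.compose_decimal(".6万")) --> 60000
print(ctn.compose_decimal(".6")) --> 0.6

print(ctn.compose_decimal("一万四万")) --> 140000000.0
print(ctn.compose_decimal("一万四点三万")) --> raises an error
print(ctn.compose_decimal("14000万")) --> 0.0
print(ctn.compose_decimal("1400万")) --> 0.0

Forms like "1400万" (14 million) and '三百四十一点五万' (3.415 million) should be very common.

Originally I only wanted to sort file names, because they contain Chinese numerals such as 一, 二, and 十一, and a search led me here.
Handling mixed Chinese and Arabic-digit writing on top of natural language is very complicated; I have fallen into your rabbit hole :-)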

yongzhuo commented on May 26, 2024

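Numbers that mix Arabic and Chinese numerals are not supported. Only the case where the point is followed by a plain decimal fraction is supported, and a unit appearing before the point is not supported either.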

aeo000000 commented on May 26, 2024

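The rule I have in mind is this.
Take 12345.567万 as an example:

integer + point + fraction + unit
Integer part: your rule allows a leading 0, which is normally not valid.
Fraction part: digits 0~9 only; 百, 千, and 万 must not appear.
Unit: 万, 亿, 兆. (Some financial writing also uses 百万 and 千万, which yours does not support.)

Whenever a '点' (point) appears, treat the number as a decimal; otherwise treat it as an integer. The integer and fraction parts cannot both be missing.

三百四十一点五万 --> '341' , '.' , '5' , '万'
三千零七十八亿 can be read the same way --> '三千零七十八' , '' , '', '亿'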
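The four-part rule above might be sketched as follows. This is only an illustration under stated assumptions: `parse_number`, `small_int`, and the tables are hypothetical names, not the library's API; the integer part is limited to values below 10000 (digits plus 十/百/千); the leading-zero restriction is not enforced; and `Decimal` is used to avoid float rounding.

```python
import re
from decimal import Decimal

# Assumed tables for this sketch only.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
DIGITS.update({str(d): d for d in range(10)})
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8, "兆": 10**12}

# integer + point + fraction + unit
SHAPE = re.compile(
    r"^(?P<integer>[0-9零〇一二两三四五六七八九十百千]*)"
    r"(?:[.点](?P<fraction>[0-9零〇一二两三四五六七八九]+))?"
    r"(?P<unit>[万亿兆]*)$"
)


def small_int(text):
    """Convert an integer part below 10000 (digits and 十/百/千) to int."""
    if text == "":
        return 0
    if text.isascii() and text.isdigit():  # plain ASCII digits
        return int(text)
    total, pending = 0, None
    for ch in text:
        if ch in SMALL_UNITS:
            # A unit with no digit before it counts as one, e.g. 十一 = 11.
            total += (1 if pending is None else pending) * SMALL_UNITS[ch]
            pending = None
        else:
            pending = DIGITS[ch]
    return total + (pending or 0)


def parse_number(text):
    """Split text into the four parts and return its value as a Decimal."""
    m = SHAPE.match(text)
    if m is None or (not m["integer"] and not m["fraction"]):
        raise ValueError(f"not a supported number: {text!r}")
    integer = small_int(m["integer"])
    fraction = "".join(str(DIGITS[ch]) for ch in (m["fraction"] or ""))
    value = Decimal(f"{integer}.{fraction or '0'}")
    for ch in m["unit"]:  # stacked units such as 万亿 multiply together
        value *= BIG_UNITS[ch]
    return value


for sample in ("12345.567万", "三百四十一点五万", "三千零七十八亿", ".6"):
    print(sample, "->", parse_number(sample))
```

Inputs outside this shape, such as 一万四点三万 with a unit before the point, raise `ValueError` rather than returning a guessed value.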

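The regex rule I wrote:
#########################################################
import re
# Pick the numbers out of a sentence.
# Uses unit_dict_keys, digit_dict, and unit_step from chinese_to_number.
digit_dict_keys = digit_dict.keys()

# Leading and trailing zeros may not be omitted: .1234 and 1234. are not matched as written.
# The two * quantifiers are needed to get the output shown below (e.g. the single-character match '四').
number_rule1 = '[{0}][{0}{1}]*(?:[.点][{0}]+(?:{2})*){3}'.format(''.join(digit_dict_keys),''.join(unit_dict_keys),'|'.join(unit_step),'{0,1}')

print(re.findall(number_rule1,'在二0一一年,有四十二...人,花费一万3百4点五万四元,ww0.300千万,.4.40万45,44.万0.3万亿'))
#--> ['二0一一', '四十二', '一万3百4点五万', '四', '0.300', '4.40万', '45', '44', '0.3万亿']

# Translating these into Arabic numerals is up to you :). For ordinary use this is already enough, of course.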
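For anyone who wants to try the pattern without the library installed, here is a self-contained variant. The three tables are stand-ins chosen for this sketch, not the real `digit_dict`, `unit_dict_keys`, and `unit_step` from `chinese_to_number`, so results may differ from the library's. The pattern places a `*` after the second character class and after the unit group, which the output listed above appears to require.

```python
import re

# Stand-in tables (assumptions); the library's own contents may differ.
DIGIT_CHARS = "0123456789零〇一二两三四五六七八九"
UNIT_CHARS = "十百千万亿兆"
UNIT_STEP = ["万", "亿", "兆"]

NUMBER_RULE = re.compile(
    "[{d}][{d}{u}]*(?:[.点][{d}]+(?:{s})*)?".format(
        d=DIGIT_CHARS,           # a number must start with a digit
        u=UNIT_CHARS,            # units may follow inside the integer part
        s="|".join(UNIT_STEP),   # only 万/亿/兆 may trail the fraction
    )
)

sentence = "在二0一一年,有四十二...人,花费一万3百4点五万四元,ww0.300千万,.4.40万45,44.万0.3万亿"
matches = NUMBER_RULE.findall(sentence)
print(matches)
# ['二0一一', '四十二', '一万3百4点五万', '四', '0.300', '4.40万', '45', '44', '0.3万亿']
```

Using a trailing `?` instead of passing `'{0,1}'` through `str.format` is equivalent and avoids escaping the braces.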
