Comments (4)
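A fairly simple approach would be: first use rules to extract the strings that consist only of these characters, then normalize them all to Arabic digits or all to Chinese numerals, and finally convert the result to a number.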
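A minimal sketch of that extract, normalize, convert pipeline follows. Every name in it (`NUMERAL_RUN`, `cn_to_int`, `extract_numbers`) is hypothetical and is not part of chinese_to_number. It assumes integers only, handles only the units 十, 百, 千, 万 and 亿, and does not cover nested big units such as 一万亿.

```python
import re

# Assumed character tables for this sketch; not taken from chinese_to_number.
ARABIC_TO_CN = str.maketrans('0123456789', '零一二三四五六七八九')
CN_DIGITS = {ch: i for i, ch in enumerate('零一二三四五六七八九')}
SMALL_UNITS = {'十': 10, '百': 100, '千': 1000}
BIG_UNITS = {'万': 10**4, '亿': 10**8}
NUMERAL_RUN = re.compile('[0-9零一二三四五六七八九十百千万亿]+')


def cn_to_int(text):
    """Convert a Chinese-numeral integer string to an int."""
    # A run of digits with no units is read positionally, e.g. 二零一一 -> 2011.
    if all(ch in CN_DIGITS for ch in text):
        return int(''.join(str(CN_DIGITS[ch]) for ch in text))
    total = section = number = 0
    for ch in text:
        if ch in CN_DIGITS:
            number = CN_DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 at the start of a section counts as 一十.
            section += (number or 1) * SMALL_UNITS[ch]
            number = 0
        else:
            # 万 or 亿 closes the current section; other characters raise KeyError.
            total += (section + number) * BIG_UNITS[ch]
            section = number = 0
    return total + section + number


def extract_numbers(sentence):
    """Step 1: extract numeral runs. Step 2: normalize. Step 3: convert."""
    runs = NUMERAL_RUN.findall(sentence)
    return [cn_to_int(run.translate(ARABIC_TO_CN)) for run in runs]


print(extract_numbers('第十一章和第3百零二页,共一万四千人'))
print(sorted(['十一', '二', '一'], key=cn_to_int))
```

The same `cn_to_int` key can be used to sort names that contain Chinese numerals, once the numeral part has been extracted.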
from tookit-sihui.
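I ran some tests and found a few cases that are not handled, for example:
print(ctn.compose_decimal('三百四十一点五万')) --> 0
In the next two, the decimal point is ignored in one case and honored in the other:
print(ctn.compose_decimal(".6万")) --> 60000
print(ctn.compose_decimal(".6")) --> 0.6
print(ctn.compose_decimal("一万四万")) --> 140000000.0
print(ctn.compose_decimal("一万四点三万")) --> raises an error
print(ctn.compose_decimal("14000万")) --> 0.0
print(ctn.compose_decimal("1400万")) --> 0.0
Forms like "1400万" and '三百四十一点五万' should be very common.
Originally I only wanted to sort file names, because the names contain Chinese numerals such as 一, 二 and 十一, and a search brought me here.
Handling mixed Chinese and Western notation on top of natural language is very complicated. I have fallen into your rabbit hole :-)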
from tookit-sihui.
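Numbers that mix Arabic and Chinese numerals are not supported. Only the case where the part after the decimal point is a plain decimal fraction is supported, and a unit appearing before the point is not supported either.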
from tookit-sihui.
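The rule I have in mind is this:
Take 12345.567万 as an example:
integer + point + fraction + unit
Integer part: your rule allows a leading 0, which normally should not be allowed.
Fraction part: 0~9 only; 百, 千 and 万 must not appear.
Unit: 万, 亿, 兆. (Financial text also uses 百万 and 千万, which your code does not support.)
Whenever '点' appears, treat the number as a decimal; otherwise treat it as an integer. The integer part and the fraction part cannot both be missing.
三百四十一点五万 --> '341' , '.' , '5' , '万'
三千零七十八亿 can be read the same way --> '三千零七十八' , '' , '', '亿'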
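The split described above can be sketched as follows. This is only an illustration under assumptions: `NUMBER_RULE` and `split_number` are hypothetical names, the integer part is required (following the restriction that ".1234" is not accepted), the leading-zero check is left out, and the parts are returned as written rather than converted to Arabic digits.

```python
import re

DIGIT = '0-9零一二三四五六七八九'  # assumed digit set for this sketch

# integer + point + fraction + unit; the point/fraction pair and the unit are optional.
NUMBER_RULE = re.compile(
    rf'(?P<integer>[{DIGIT}][{DIGIT}十百千]*)'
    rf'(?:(?P<point>[.点])(?P<fraction>[{DIGIT}]+))?'
    rf'(?P<unit>[万亿兆]*)'
)


def split_number(text):
    """Return (integer, point, fraction, unit), or None if text does not fit the rule."""
    match = NUMBER_RULE.fullmatch(text)
    if match is None:
        return None
    # Groups that did not take part in the match come back as None; use '' instead.
    return tuple(match.group(name) or ''
                 for name in ('integer', 'point', 'fraction', 'unit'))


print(split_number('三百四十一点五万'))
print(split_number('三千零七十八亿'))
print(split_number('12345.567万'))
print(split_number('一万四万'))
```

Under these assumptions a malformed input such as 一万四万 is rejected instead of producing a number, because the trailing unit may only appear once at the end.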
The regex rule I wrote:
#########################################################
import re
# Pick the numbers out of a sentence.
# Uses unit_dict_keys, digit_dict and unit_step from chinese_to_number.
digit_dict_keys = digit_dict.keys()
# Leading and trailing zeros may not be dropped: ".1234" and "1234." are not accepted.
number_rule1 = '[{0}][{0}{1}]*(?:[.点][{0}]+(?:{2})*){3}'.format(''.join(digit_dict_keys), ''.join(unit_dict_keys), '|'.join(unit_step), '{0,1}')
print(re.findall(number_rule1, '在二0一一年,有四十二...人,花费一万3百4点五万四元,ww0.300千万,.4.40万45,44.万0.3万亿'))
# --> ['二0一一', '四十二', '一万3百4点五万', '四', '0.300', '4.40万', '45', '44', '0.3万亿']
# Converting these to Arabic numerals is up to you :). For ordinary use this is already enough, of course.
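For anyone who wants to try the rule without the library at hand, here is a self-contained sketch. The three tables are stand-ins chosen for illustration and are assumptions; the real `digit_dict`, `unit_dict_keys` and `unit_step` in chinese_to_number may hold different values.

```python
import re

# Stand-in tables (assumptions, not the library's actual contents).
DIGIT_CHARS = '0123456789零一二三四五六七八九'
UNIT_CHARS = '十百千万亿兆'
UNIT_STEP = ['万', '亿', '兆']

# A digit, then any digits or units, then optionally point + digits + trailing big units.
NUMBER_RULE = re.compile(
    '[{0}][{0}{1}]*(?:[.点][{0}]+(?:{2})*)?'.format(
        DIGIT_CHARS, UNIT_CHARS, '|'.join(UNIT_STEP)))


def find_numbers(sentence):
    """Return every substring of sentence that matches the number rule."""
    return NUMBER_RULE.findall(sentence)


sentence = '在二0一一年,有四十二...人,花费一万3百4点五万四元,ww0.300千万,.4.40万45,44.万0.3万亿'
print(find_numbers(sentence))
```

With these stand-in tables the sample sentence yields the same nine substrings listed in the comment above.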
from tookit-sihui.