situgong33 / wordextract-

Word extraction

Java 100.00%

Word extraction software

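Reads data from files in any plain-text format, removes everything that is not a word, keeps only the words, and saves the extracted result as a .txt file in exportNotHandledWordDir. Upload the generated .txt file to the 知米背单词 (Zhimi) vocabulary app to study the words.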

Program description

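Non-English characters are removed with a regular expression.
The regular expression used is [^a-zA-Z ] (any character that is not an ASCII letter or a space).
New words (words not yet mastered) are extracted from the text.
The extracted new words are imported into 知米背单词.
The export format, one word per line, matches the import format that 知米背单词 requires.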
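A minimal Java sketch of the cleaning and extraction steps above. It assumes each match of [^a-zA-Z ] is replaced with a space rather than deleted (so words separated only by punctuation or a line break are not glued together) and that words are compared in lowercase; the README does not say how either case is handled. `NewWordFilter`, `toWords` and `newWords` are hypothetical names, not taken from the project's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class NewWordFilter {
    // Same character class as the README: anything that is not an ASCII letter or a space.
    private static final String NON_WORD = "[^a-zA-Z ]";

    /** Replaces non-letters with spaces and splits the result into lowercase words. */
    static List<String> toWords(String text) {
        List<String> words = new ArrayList<>();
        // Assumption: replace with " " instead of "" so "end.Start" or "a\nb" stay separate words.
        // After the replacement only letters and spaces remain, so splitting on spaces is enough.
        for (String token : text.replaceAll(NON_WORD, " ").split(" +")) {
            if (!token.isEmpty()) { // a leading space produces one empty token
                words.add(token.toLowerCase(Locale.ROOT)); // assumption: case-insensitive matching
            }
        }
        return words;
    }

    /** Keeps only the words that are not in the known-word set, i.e. the new words. */
    static List<String> newWords(String text, Set<String> knownWords) {
        List<String> result = new ArrayList<>();
        for (String word : toWords(text)) {
            if (!knownWords.contains(word)) {
                result.add(word);
            }
        }
        return result;
    }
}
```

Because the apostrophe is outside the character class, contractions are split: "it's" becomes "it" and "s".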

Directory structure


├── exportNotHandledWordDir
├── handledWordDir
└── notHandledWordDir

exportNotHandledWordDir
Exported result files:
- UTF-8 text files
- One word per line
- Line separator is \n
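A sketch of writing a file in this export format, using a hypothetical `ExportWriter` helper; the output file name is left to the caller. The lines are joined by hand with '\n' because `Files.write(Path, Iterable, Charset)` ends each line with the platform separator, which is \r\n on Windows. Ending the last line with \n as well is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ExportWriter {
    /** Builds the export text: one word per line, every line terminated by "\n". */
    static String toExportText(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word).append('\n'); // always LF, regardless of the operating system
        }
        return sb.toString();
    }

    /** Writes the export text as UTF-8, creating parent directories and overwriting the file. */
    static void write(Path file, List<String> words) throws IOException {
        Files.createDirectories(file.toAbsolutePath().getParent());
        Files.write(file, toExportText(words).getBytes(StandardCharsets.UTF_8));
    }
}
```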
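handledWordDir
Words you have already mastered. The text can be in any format; the only requirement is that words are separated by spaces.
- UTF-8 text files
- Separator: whitespace (e.g. spaces) between words
- Any number of lines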
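A sketch of loading the mastered words, assuming every regular file directly inside handledWordDir is read (subdirectories are ignored) and that words are lowercased for case-insensitive comparison; `KnownWordLoader` is a hypothetical name. Splitting on \s+ treats spaces, tabs and line breaks alike, which is why a file can hold any number of lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class KnownWordLoader {
    /** Splits text on runs of whitespace (spaces, tabs, line breaks) into lowercase words. */
    static Set<String> parse(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.split("\\s+")) {
            // A leading separator yields one empty token; skip it.
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT)); // assumption: case-insensitive
            }
        }
        return words;
    }

    /** Reads every regular file directly inside dir as UTF-8 and merges their words. */
    static Set<String> load(Path dir) throws IOException {
        Set<String> known = new HashSet<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path file : entries) {
                if (Files.isRegularFile(file)) {
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    known.addAll(parse(text));
                }
            }
        }
        return known;
    }
}
```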
notHandledWordDir
Text to be analyzed and extracted from, with the following format requirements:
- UTF-8 text files
- Separator: whitespace (e.g. spaces) between words
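Because the input must be UTF-8, it can help to check files before processing them. `new String(bytes, StandardCharsets.UTF_8)` silently replaces invalid byte sequences with U+FFFD, which the regex would then strip without any warning. The hypothetical `Utf8Check` helper below instead uses a `CharsetDecoder` configured to report errors. It can only show that bytes are not valid UTF-8; pure ASCII text, for example, passes regardless of the encoding it was saved in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true only if the bytes decode as UTF-8 without malformed or unmappable sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```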
Other notes
- The output is deduplicated automatically.
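A sketch of the deduplication, assuming each word should keep the position of its first occurrence; the README does not specify the output order. `Dedup.distinctInOrder` is a hypothetical helper. Swapping `LinkedHashSet` for `TreeSet` would give alphabetical order instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

final class Dedup {
    /** Removes repeated words while keeping the order in which each word first appeared. */
    static List<String> distinctInOrder(Collection<String> words) {
        // LinkedHashSet drops duplicates and remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(words));
    }
}
```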
