This repository is the authors' implementation of the paper
MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition.
The repo implements a Multi-Orientation Sharing Experts framework that allows you to handle seen and unseen scene text written in various orientations.
The implementation has been tested on a Lenovo P360 Ultra with an RTX 4060 Ti GPU connected through a Thunderbolt dock (TH3P4G2).
Software: Manjaro Linux, Python 3.11
Some dependencies need to be compiled, so please follow the guide below.
https://github.com/lancercat/make_envNG
The training samples include English and Chinese word crops taken from ART [28], RCTW [29], CTW [30], LSVT [31], and the Latin and Chinese sections of the MLT [32] dataset. Note that samples containing characters other than the 3755 Tier-1 Chinese characters, the 26 English letters, and the 10 digits (3791 classes in total) are excluded from the training set to avoid label leakage. The testing set includes the Japanese subsets of the MLT dataset.
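The character-set filter described above can be sketched as follows. This is a minimal illustration, not the repo's actual preprocessing code: the function names are hypothetical, the Chinese character list is a tiny stand-in for the real 3755 Tier-1 characters, and case-insensitive matching of English letters is an assumption.

```python
import string

# Hypothetical stand-in: the real list of 3755 Tier-1 Chinese characters is
# not reproduced here, so a handful of characters is used for illustration.
TIER1_CHINESE_SAMPLE = set("中国文字")

# Class count stated in the README: 3755 Chinese + 26 letters + 10 digits.
EXPECTED_NUM_CLASSES = 3755 + 26 + 10


def build_charset(chinese_chars):
    """Union of the given Chinese characters, 26 English letters, and 10 digits."""
    return set(chinese_chars) | set(string.ascii_lowercase) | set(string.digits)


def filter_samples(samples, charset):
    """Keep (sample_id, label) pairs whose labels use only characters in charset.

    Assumption: labels are compared case-insensitively, so "Exit" is kept.
    Empty labels are dropped.
    """
    kept = []
    for sample_id, label in samples:
        if label and all(ch in charset for ch in label.lower()):
            kept.append((sample_id, label))
    return kept


if __name__ == "__main__":
    charset = build_charset(TIER1_CHINESE_SAMPLE)
    samples = [("a", "Exit"), ("b", "中国"), ("c", "カフェ"), ("d", "No.7")]
    # "カフェ" (kana) and "No.7" (contains ".") fall outside the charset.
    print(filter_samples(samples, charset))
```

Dropping out-of-charset samples entirely, rather than masking the offending characters, matches the "excluded from the training set" wording above.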
You can download them from the following links (you need to download both datasets)
All models are released in the following repo:
- All Models(45G): https://www.kaggle.com/datasets/object300/moose-models-release
Our training logs are released with a visualizer.
- Logs: https://www.kaggle.com/datasets/object300/log-moose-release
- Log Visualizer: https://github.com/lancercat/minijinja

Note that the log visualizer includes SSH-related behavior, as it is part of a home-grown server management library used for autonomously monitoring, issuing commands to, and collecting results from a server fleet.
The results and a detailed manual are included in the manul.pdf file.
(The filename is an intentional pun: manuls, also known as Pallas's cats, are the oldest cats.)
The methods are configured and can be launched from the neko_2023_NGNW/project_moose/[methodname] directory.
For training, launch neko_2023_NGNW/project_moose/[methodname]/train_osr_hv_cat_wandb.py
For testing, launch neko_2023_NGNW/project_moose/[methodname]/eval_osr_hv.py
For more details, please consult the manual.
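As a rough sketch of the layout described above, the script path for a given method can be assembled like this. The helper names `script_path` and `launch_command` are hypothetical and not part of the repo; running from the repository root with no extra arguments is also an assumption, so check the manual for the actual requirements.

```python
import sys
from pathlib import Path

# Directory and script names as given in this README.
PROJECT_DIR = Path("neko_2023_NGNW") / "project_moose"
SCRIPTS = {
    "train": "train_osr_hv_cat_wandb.py",
    "eval": "eval_osr_hv.py",
}


def script_path(method_name, mode):
    """Return the path of the train or eval script for one method directory."""
    if mode not in SCRIPTS:
        raise ValueError(f"mode must be one of {sorted(SCRIPTS)}, got {mode!r}")
    return PROJECT_DIR / method_name / SCRIPTS[mode]


def launch_command(method_name, mode):
    """Build an argv list that runs the script with the current interpreter.

    Assumption: the script takes no required command-line arguments.
    """
    return [sys.executable, str(script_path(method_name, mode))]


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run(cmd, check=True)
    # from the repository root to actually launch it.
    print(launch_command("sharebbn_62_ld_long", "train"))
```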
This section lists the correspondences between names in the paper and codenames in the repo.

| Name in paper | Experts | Codename in repo |
| --- | --- | --- |
| Horizontal | 1 horizontal | hori_sharebbn_s_05_ld_long |
| Horizontal-MoE | 2 horizontal | hori_sharebbn_62_ld_long |
| Single-Horizontal | 1 horizontal 1 vertical | sharebbn_05_ld_long |
| Rotated | 2 horizontal, but rotates vertical samples and pipes them to horizontal experts | sharebbn_62RS_ld_long |
| Share All | 2 horizontal 1 vertical | shareall_62_ld_long |
| Share None | 2 horizontal 1 vertical | sharenone_62_ld_long |
| MOoSE | 2 horizontal 1 vertical | sharebbn_62_ld_long |
| MOoSE-XL | 2 horizontal 1 vertical | yukon_sharebbn_62_ld_long_XL |

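For scripting, the name correspondences listed above can be kept in a small lookup table. This is an illustrative sketch only: the `CODENAMES` dictionary and the `codename_for` helper are hypothetical and do not exist in the repo.

```python
# Paper name -> codename, copied from the correspondences in this README.
CODENAMES = {
    "Horizontal": "hori_sharebbn_s_05_ld_long",
    "Horizontal-MoE": "hori_sharebbn_62_ld_long",
    "Single-Horizontal": "sharebbn_05_ld_long",
    "Rotated": "sharebbn_62RS_ld_long",
    "Share All": "shareall_62_ld_long",
    "Share None": "sharenone_62_ld_long",
    "MOoSE": "sharebbn_62_ld_long",
    "MOoSE-XL": "yukon_sharebbn_62_ld_long_XL",
}


def codename_for(paper_name):
    """Return the repo codename for a method name used in the paper."""
    try:
        return CODENAMES[paper_name]
    except KeyError:
        known = ", ".join(CODENAMES)
        raise KeyError(f"unknown method {paper_name!r}; known names: {known}") from None


if __name__ == "__main__":
    print(codename_for("MOoSE"))
```

The returned codename is the value to substitute for [methodname] in the project_moose directory path.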
- Chang Liu [email protected], [email protected]
- Simon Corbillé [email protected]
- Elisa H. Barney Smith [email protected]