
HiGAN+

Introduction

This is a PyTorch implementation of the paper "HiGAN+: Handwriting Imitation GAN with Disentangled Representations" (authored by Ji Gan, Weiqiang Wang*, Jiaxu Leng, Xinbo Gao*).

HiGAN+ can generate diverse and realistic handwritten text images (with 64-pixel height) conditioned on arbitrary textual contents and calligraphic styles.

Overview of HiGAN+

[Figure: overview of the HiGAN+ architecture]

Installation & requirements

The current version of the code has been tested with the following environment:

  • Ubuntu 20.04 or 22.04
  • Python 3
  • PyTorch 1.11.0

To use the code, download the repository and change into it:

git clone https://github.com/ganji15/HiGANplus.git

cd HiGANplus

You need to apply for the IAM dataset at http://www.fki.inf.unibe.ch/databases/iam-handwriting-database and then extract the handwriting images. For convenience, we provide the processed h5py files trnvalset_words64_OrgSz.hdf5 and testset_words64_OrgSz.hdf5, which should be put into the ./data/iam/ directory.
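To sanity-check the downloaded files, you can list their contents with h5py. A minimal sketch (it simply prints whatever dataset keys the files actually contain, so no key names are assumed):

    import h5py

    # Print every dataset stored in the training file, with shape and dtype.
    with h5py.File('./data/iam/trnvalset_words64_OrgSz.hdf5', 'r') as f:
        for key in f.keys():
            print(key, f[key].shape, f[key].dtype)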

Training & Test

Training HiGAN on the IAM dataset

python train.py --config ./configs/gan_iam.yml

Quantitative Test

python test.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --guided True

  • Main arguments:
    • --config: the configuration file of HiGAN
    • --ckpt: the path of checkpoint, which is stored in the ./runs/ directory after training.
    • --guided: whether to extract styles from reference images. If --guided False, the styles of generated images will be randomly sampled from the standard normal distribution.
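For reference, sampling unguided styles from the standard normal distribution amounts to something like the following sketch (the style dimension here is a hypothetical placeholder; the real value comes from the configuration file):

    import torch

    style_dim = 128                      # hypothetical; taken from the config in practice
    styles = torch.randn(16, style_dim)  # one N(0, I) style vector per generated image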

Qualitative Evaluation

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode style

  • Main arguments:
    • --config: the configuration file of HiGAN
    • --ckpt: the path of checkpoint, which is stored in the ./runs/ directory after training.
    • --mode: [ rand | style | interp | text ].

Latent-guided synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode rand

Reference-guided synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode style

Text synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode text

Style interpolation

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode interp

On-the-fly plots during training

With this code it is possible to track progress during training with on-the-fly plots. This feature requires Tensorboard, which should be started from the command line:

tensorboard --logdir=./runs

The TensorBoard server is then running and can be accessed at http://localhost:6006.

Some on-the-fly plots are shown below:

[Figures: training loss curves and generated samples]

Citation

If you find our research helpful, please remember to cite our paper:

@article{gan2022higanplus,
  author  = {Gan, Ji and Wang, Weiqiang and Leng, Jiaxu and Gao, Xinbo},
  title   = {HiGAN+: Handwriting Imitation GAN with Disentangled Representations},
  year    = {2022},
  volume  = {42},
  number  = {1},
  journal = {ACM Trans. Graph.},
  url     = {https://doi.org/10.1145/3550070},
  doi     = {10.1145/3550070}
}

License

HiGAN+ is free for academic research purposes.


Issues

The lbs in hdf5 file

@ganji15

I downloaded your training-set hdf5 file and found that the labels are stored as ASCII codes. How do I match the ASCII codes with the 0-80 character codes?

In the trnval set, the first text label is "Mr."; when I print the lbs I get [77, 114, 46], meaning the code for "M" is 77 and the code for "r" is 114.

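If the stored lbs are raw ASCII codes, mapping them onto 0-80 indices is just a lookup into the model's alphabet. A sketch (the alphabet string below is a hypothetical placeholder; use the lexicon actually defined in the HiGANplus source):

    # Hypothetical alphabet; replace with the one defined in the repository.
    alphabet = ' !"#&\'()*+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

    lbs = [77, 114, 46]                       # ASCII codes for "Mr."
    text = ''.join(chr(c) for c in lbs)       # -> "Mr."
    codes = [char_to_idx[ch] for ch in text]  # indices in [0, len(alphabet))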

Does this model perform zero-shot handwriting imitation?

As described in the title, I would like to know whether the model can take unseen handwriting images and transfer their style to generate new text.
My understanding is that the dimension of the writer embedding is set to the number of writers in the training set, i.e., the IAM dataset, so it is not a zero-shot model. However, the idea and structure of the model seem robust enough to support this.

How are the h5py files generated?

Hello, authors of HiGAN+:
If I use a different handwriting dataset, how should I generate the h5py files required for training?
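The exact layout that the loaders expect should be checked against lib/datasets.py, but writing an hdf5 file of height-normalized word images with h5py looks roughly like this sketch (the keys 'imgs', 'img_lens', and 'texts' are assumptions for illustration only):

    import h5py
    import numpy as np

    # Two fake 64-pixel-high word images of different widths, plus their labels.
    imgs = [np.zeros((64, 200), dtype=np.uint8), np.zeros((64, 150), dtype=np.uint8)]
    texts = ['hello', 'world']

    with h5py.File('./data/mydata/trnvalset_words64.hdf5', 'w') as f:
        max_w = max(im.shape[1] for im in imgs)
        # Pad every image to a common width and remember the true widths.
        padded = np.stack([np.pad(im, ((0, 0), (0, max_w - im.shape[1]))) for im in imgs])
        f.create_dataset('imgs', data=padded)
        f.create_dataset('img_lens', data=np.array([im.shape[1] for im in imgs]))
        f.create_dataset('texts', data=np.array(texts, dtype=object),
                         dtype=h5py.string_dtype(encoding='ascii'))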

Why is the PSNR low?

I read the paper and have a question about the metrics. I have not done deep research on font image generation, but I found that the PSNR in the paper is very low (less than 20). Does this mean that the generated images are not very similar to the original images?

Able to reproduce one specific calligraphic style?

Hello.

I have images of one particular calligraphic style which I am trying to reproduce (so that the text I put in produces an image of the same words in that calligraphic style).

The section in your paper on "Imitating Handwriting in the Wild" made me wonder if this might be possible somehow using HiGANplus.

Would it be possible?

(And if so, would I use a model trained from the GNHK dataset, or would I produce my own model using the images of the style I wish to reproduce?

And if I would use a model from the GNHK dataset, do you have an already trained model available?)

Questions about model training

How can I do multi-GPU training? I have four GPUs and modified the configuration file accordingly, but it reported an error.

I also want to know how to continue training from a previous checkpoint.
Thank you!

Why does the background of each character in the synthesized images look inconsistent?

Hi, @ganji15 thanks for your contribution.
As shown in the picture, "Sweat" is the real style input image, and the others are images synthesized with the pretrained model.
I'm confused about the inconsistent background of each character in the synthesized images.
It also looks like the style transfer failed to imitate the style of the input image; the synthesized characters resemble styles from the IAM training set. I'm wondering how to improve the style-transfer ability of HiGAN+ in a few-shot setting.
By the way, can I use the model pretrained on a word-level dataset to generate images of text lines or sentences?
Could you please give some kind advice?


Discriminator loss

@ganji15
Hi!
I observed that the discriminator loss in your code uses a different formulation rather than BCE loss.

This is different from what is described in the paper; can you explain what this loss means, and why there are "1 + x" and "1 - x" terms?
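For context: the "1 - x" / "1 + x" pattern matches the standard hinge loss for GAN discriminators (popularized by geometric GAN and BigGAN) rather than BCE. A minimal sketch of that formulation, not a copy of the HiGAN+ code:

    import torch
    import torch.nn.functional as F

    def d_hinge_loss(d_real, d_fake):
        # Push real scores above +1 and fake scores below -1;
        # scores already beyond the margin contribute zero gradient.
        return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

    def g_hinge_loss(d_fake):
        # The generator simply maximizes the discriminator's score on fakes.
        return -d_fake.mean()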

Is there any purpose in inverting the color of the images?

In lib/datasets.py:

        ....

        for fn in listOfFiles:
            img = cv2.imread(fn, cv2.IMREAD_GRAYSCALE)

            # Read image labels
            label_text = os.path.basename(fn).split('.')[0]

            # Normalize image-height
            h, w = img.shape[:2]
            r = self.ImgHeight / float(h)
            new_w = max(int(w * r), int(self.ImgHeight / 4 * len(label_text)))
            dim = (new_w, self.ImgHeight)
            if new_w < w:
                resize_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
            else:
                resize_img = cv2.resize(img, dim, interpolation=cv2.INTER_LINEAR)
            # Invert colors: white paper becomes 0, dark ink becomes bright
            res_img = 255 - resize_img

            all_imgs.append(res_img)
            all_texts.append(label_text)

        ....

The color inverts again when output in /networks/model.py:

        gen_imgs = self.models.G(rand_styles, fake_lbs, fake_lb_lens)
        # Invert back for display (the generator output appears to lie in [-1, 1])
        gen_imgs = (1 - gen_imgs).squeeze().cpu().numpy() * 127

Is there any purpose in inverting the color of the images before training? Would it improve the performance? Thanks.
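One plausible motivation (an assumption here, not confirmed by the authors) is that inversion makes the paper background zero, so zero-padding variable-width batches blends into the background and the ink carries the signal. A round-trip sketch under that assumption:

    import numpy as np

    # Synthetic grayscale word image: paper is white (255), ink is dark.
    img = np.full((64, 128), 255, dtype=np.uint8)
    img[20:40, 30:90] = 30

    inv = 255 - img           # paper -> 0, ink -> bright; zero-padding now matches paper
    norm = inv / 127.5 - 1.0  # assumed [-1, 1] normalization: paper -> -1

    # Back to a displayable image, mirroring (1 - gen_imgs) * 127 in model.py.
    disp = ((1.0 - norm) * 127).clip(0, 255).astype(np.uint8)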

Problems running the code

Can the uploaded code be run with the indicated commands? Why do I get the following error here?
RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass. You should either properly implement new_empty() for your subclass or override __deepcopy__() if it is intended behavior for new_empty() to return an instance of a different type.

Why does real_imgs have -1 during training?

@ganji15
I'm training HiGAN+; I printed real_imgs during training and found -1 values in it. The upper part is recn_img and the lower half is real_img. Why do these -1s exist?

Also, can you explain the recn_l1_loss(img1, img2, img_lens) function? What I know so far is that it is a variable-length L1 loss; I would like to know more details.

Thanks very much!
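If the inverted images are normalized to [-1, 1] (as the (1 - gen_imgs) * 127 output code suggests), the blank background naturally sits at exactly -1, which would explain the values you see; that is an inference from the code, not an authoritative answer. As for a variable-length L1 loss, the usual idea is to average only over the valid (unpadded) width of each image. An illustrative sketch, not the actual recn_l1_loss implementation:

    import torch

    def masked_l1(img1, img2, img_lens):
        # img1, img2: (B, C, H, W) batches padded to a common width W;
        # img_lens[b] is the true pixel width of sample b.
        B, C, H, W = img1.shape
        mask = (torch.arange(W, device=img1.device)[None, :] < img_lens[:, None]).float()
        mask = mask[:, None, None, :].expand(B, C, H, W)
        diff = (img1 - img2).abs() * mask
        return diff.sum() / mask.sum().clamp(min=1.0)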

Why is the image generated by HiGAN+ inconsistent with the original image?

Hi! @ganji15

I have two questions. First, why is the segmentation of the generated image different from the original image? In the original image, "r" and "y" are not split, but in the generated image "r" and "y" are split.

Second, why is the shading of the generated image different from the original image?

I'm looking forward to your reply!

RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass.

I got this error, please help!

Traceback (most recent call last):
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\train.py", line 25, in <module>
    model.train()
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\networks\model.py", line 718, in train
    self.y.sample_()
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\networks\rand_dist.py", line 63, in sample_
    return deepcopy(self).detach()
  File "C:\Users\hoang\AppData\Local\Programs\Python\Python39\lib\copy.py", line 153, in deepcopy
    y = copier(memo)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 84, in __deepcopy__
    return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\overrides.py", line 1551, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 1295, in __torch_function__
    ret = func(*args, **kwargs)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 170, in __deepcopy__
    raise RuntimeError(
RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass. You should either properly implement new_empty() for your subclass or override __deepcopy__() if it is intended behavior for new_empty() to return an instance of a different type.
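This error comes from a behavior change in newer PyTorch releases (roughly 1.12 and later): deepcopy on a Tensor subclass now requires the subclass to support new_empty()/__deepcopy__. One commonly suggested workaround, sketched here as an untested assumption rather than an official fix, is to avoid deepcopy in sample_():

    # networks/rand_dist.py, inside sample_() -- sketch of a possible workaround:
    # return deepcopy(self).detach()   # original line that triggers the error
    return self.clone().detach()       # clone() copies the values without __deepcopy__

Alternatively, sticking to the tested PyTorch 1.11.0 environment listed above avoids the behavior change entirely.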

How to train StyleBackbone, or StyleEncoder?

I modified the structure of the StyleBackbone part, which affects the StyleEncoder and WriterIdentification parts.
Since the recipe only provides the config for training the GAN part, how can I retrain the StyleBackbone and all the related parts? Thanks!

Is gan_iam.yml only training a GAN network?

Hello, authors of HiGAN+:
I am a beginner, so please forgive a very basic question. Is gan_iam.yml only training a GAN network (the G and D in the paper)? Are the R and I (reusing E) described in the paper pretrained? Can the other yml files, such as ocr_iam.yml and wid_iam.yml, be used directly to train R and I (reusing E), and do they require additional datasets? Also, what command should I use if I want to train an I on its own?
Thanks!

Error for large imaginary component in fid_kid_is.py

Hi,

Thanks for sharing the code.

I am training the HiGAN+ network from scratch on the IAM dataset, and it gave me an error about a large imaginary component in fid_kid_is.py.

The error is raised by the following line:

raise ValueError('Imaginary component {}'.format(m))

Could you share any insight into what may cause this error? Thanks.
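That ValueError is characteristic of the standard FID computation, where scipy's matrix square root of the product of the two covariance matrices can acquire imaginary parts from numerical error; large ones usually indicate a degenerate covariance (too few samples or collapsed features). A sketch of the usual formulation, based on the common FID reference implementation rather than the exact HiGAN+ code:

    import numpy as np
    from scipy import linalg

    def frechet_distance(mu1, sigma1, mu2, sigma2):
        # sqrtm of the covariance product; may come back complex due to numerical error.
        covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
        if np.iscomplexobj(covmean):
            if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
                m = np.max(np.abs(covmean.imag))
                raise ValueError('Imaginary component {}'.format(m))
            covmean = covmean.real  # tiny imaginary parts are numerical noise
        diff = mu1 - mu2
        return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)

Evaluating on more samples, or adding a small epsilon to the covariance diagonals before the square root, typically resolves this.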
