
HiGAN+

Introduction

This is a PyTorch implementation of the paper "HiGAN+: Handwriting Imitation GAN with Disentangled Representations" (authored by Ji Gan, Weiqiang Wang*, Jiaxu Leng, Xinbo Gao*).

HiGAN+ can generate diverse and realistic handwritten text images (with 64-pixel height) conditioned on arbitrary textual contents and calligraphic styles.

Overview of HiGAN+

[Figure: overview of the HiGAN+ architecture]

Installation & requirements

The current version of the code has been tested with the following environment:

  • Ubuntu 20.04 or 22.04
  • Python 3
  • PyTorch 1.11.0

To use the code, download the repository and change into it:

git clone https://github.com/ganji15/HiGANplus.git

cd HiGANplus

You need to apply for the IAM dataset at http://www.fki.inf.unibe.ch/databases/iam-handwriting-database and then extract the handwriting images. For convenience, we provide the processed h5py files trnvalset_words64_OrgSz.hdf5 and testset_words64_OrgSz.hdf5, which should be put into the ./data/iam/ directory.
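To sanity-check the downloaded files, you can list their contents with h5py. A minimal sketch (it simply prints whatever dataset keys the files actually contain, so no key names are assumed):

    import h5py

    # Print every dataset stored in the training file, with shape and dtype.
    with h5py.File('./data/iam/trnvalset_words64_OrgSz.hdf5', 'r') as f:
        for key in f.keys():
            print(key, f[key].shape, f[key].dtype)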

Training & Test

Training HiGAN on the IAM dataset

python train.py --config ./configs/gan_iam.yml

Quantitative Test

python test.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --guided True

  • Main arguments:
    • --config: the configuration file of HiGAN
    • --ckpt: the path of checkpoint, which is stored in the ./runs/ directory after training.
    • --guided: whether to extract styles from reference images. If --guided False, the styles of generated images will be randomly sampled from the standard normal distribution.
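For reference, sampling unguided styles from the standard normal distribution amounts to something like the following sketch (the style dimension here is a hypothetical placeholder; the real value comes from the configuration file):

    import torch

    style_dim = 128                      # hypothetical; taken from the config in practice
    styles = torch.randn(16, style_dim)  # one N(0, I) style vector per generated image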

Qualitative Evaluation

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode style

  • Main arguments:
    • --config: the configuration file of HiGAN
    • --ckpt: the path of checkpoint, which is stored in the ./runs/ directory after training.
    • --mode: [ rand | style | interp | text ].

Latent-guided synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode rand

Reference-guided synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode style

Text synthesis

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode text

Style interpolation

python eval_demo.py --config ./configs/gan_iam.yml --ckpt ./pretrained/deploy_HiGAN+.pth --mode interp

On-the-fly plots during training

With this code it is possible to track progress during training with on-the-fly plots. This feature requires Tensorboard, which should be started from the command line:

tensorboard --logdir=./runs

The TensorBoard server is then running and can be accessed at http://localhost:6006.

Some on-the-fly plots are shown below:

[Figures: training loss curves and generated samples]

Citation

If you find our research helpful, please remember to cite our paper:

@article{gan2022higanplus,
  author  = {Gan, Ji and Wang, Weiqiang and Leng, Jiaxu and Gao, Xinbo},
  title   = {HiGAN+: Handwriting Imitation GAN with Disentangled Representations},
  year    = {2022},
  volume  = {42},
  number  = {1},
  journal = {ACM Trans. Graph.},
  url     = {https://doi.org/10.1145/3550070},
  doi     = {10.1145/3550070}
}

License

HiGAN+ is free for academic research purposes.


Issues

The lbs in hdf5 file

@ganji15

I downloaded your training-set hdf5 file and found that the labels are stored as ASCII codes. How do I match the ASCII codes with the 0-80 character codes?

In the trnval set, the first text label is "Mr."; when I print the lbs I get [77, 114, 46], meaning the code for "M" is 77 and the code for "r" is 114.

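If the stored lbs are raw ASCII codes, mapping them onto 0-80 indices is just a lookup into the model's alphabet. A sketch (the alphabet string below is a hypothetical placeholder; use the lexicon actually defined in the HiGANplus source):

    # Hypothetical alphabet; replace with the one defined in the repository.
    alphabet = ' !"#&\'()*+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
    char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

    lbs = [77, 114, 46]                       # ASCII codes for "Mr."
    text = ''.join(chr(c) for c in lbs)       # -> "Mr."
    codes = [char_to_idx[ch] for ch in text]  # indices in [0, len(alphabet))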

Does this model perform zero-shot handwriting imitation?

As described in the title, I would like to know whether the model can take unseen handwriting images and transfer their style to generate new text.
My understanding is that the dimension of the writer embedding is set to the number of writers in the training set, i.e., the IAM dataset, so it is not a zero-shot model. However, the idea and structure of the model seem robust enough to support this.

How are the h5py files generated?

Hello, authors of HiGAN+:
If I use a different handwriting dataset, how should I generate the h5py files required for training?
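The exact layout that the loaders expect should be checked against lib/datasets.py, but writing an hdf5 file of height-normalized word images with h5py looks roughly like this sketch (the keys 'imgs', 'img_lens', and 'texts' are assumptions for illustration only):

    import h5py
    import numpy as np

    # Two fake 64-pixel-high word images of different widths, plus their labels.
    imgs = [np.zeros((64, 200), dtype=np.uint8), np.zeros((64, 150), dtype=np.uint8)]
    texts = ['hello', 'world']

    with h5py.File('./data/mydata/trnvalset_words64.hdf5', 'w') as f:
        max_w = max(im.shape[1] for im in imgs)
        # Pad every image to a common width and remember the true widths.
        padded = np.stack([np.pad(im, ((0, 0), (0, max_w - im.shape[1]))) for im in imgs])
        f.create_dataset('imgs', data=padded)
        f.create_dataset('img_lens', data=np.array([im.shape[1] for im in imgs]))
        f.create_dataset('texts', data=np.array(texts, dtype=object),
                         dtype=h5py.string_dtype(encoding='ascii'))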

Why is the PSNR low?

I read the paper and have a question about the metrics. I have not done deep research on font image generation, but I found that the PSNR in the paper is very low (less than 20). Does this mean that the generated images are not very similar to the original images?

Able to reproduce one specific calligraphic style?

Hello.

I have images of one particular calligraphic style which I am trying to reproduce (so that the text I put in produces an image of the same words in that calligraphic style).

The section in your paper on "Imitating Handwriting in the Wild" made me wonder if this might be possible somehow using HiGANplus.

Would it be possible?

(And if so, would I use a model trained from the GNHK dataset, or would I produce my own model using the images of the style I wish to reproduce?

And if I would use a model from the GNHK dataset, do you have an already trained model available?)

Questions about model training

How can I do multi-GPU training? I have four GPUs and modified the configuration file accordingly, but it reported an error.

I also want to know how to continue training from a previous checkpoint.
Thank you!

Why does the background of each character in the synthesized images look inconsistent?

Hi, @ganji15 thanks for your contribution.
As shown in the picture, "Sweat" is the real style input image, and the others are images synthesized with the pretrained model.
I'm confused about the inconsistent background of each character in the synthesized images.
It also looks like the style transfer failed to imitate the style of the input image; the synthesized characters resemble styles from the IAM training set. I'm wondering how to improve the style-transfer ability of HiGAN+ in a few-shot setting.
By the way, can I use the model pretrained on a word-level dataset to generate images of text lines or sentences?
Could you please give some kind advice?


Discriminator loss

@ganji15
Hi!
I observed that the discriminator loss in your code uses a different formulation rather than BCE loss.

This is different from what is described in the paper; can you explain what this loss means, and why there are "1 + x" and "1 - x" terms?
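For context: the "1 - x" / "1 + x" pattern matches the standard hinge loss for GAN discriminators (popularized by geometric GAN and BigGAN) rather than BCE. A minimal sketch of that formulation, not a copy of the HiGAN+ code:

    import torch
    import torch.nn.functional as F

    def d_hinge_loss(d_real, d_fake):
        # Push real scores above +1 and fake scores below -1;
        # scores already beyond the margin contribute zero gradient.
        return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

    def g_hinge_loss(d_fake):
        # The generator simply maximizes the discriminator's score on fakes.
        return -d_fake.mean()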

Is there any purpose in inverting the color of the images?

In lib/datasets.py:

        ....

        for fn in listOfFiles:
            img = cv2.imread(fn, cv2.IMREAD_GRAYSCALE)

            # Read image labels
            label_text = os.path.basename(fn).split('.')[0]

            # Normalize image-height
            h, w = img.shape[:2]
            r = self.ImgHeight / float(h)
            new_w = max(int(w * r), int(self.ImgHeight / 4 * len(label_text)))
            dim = (new_w, self.ImgHeight)
            if new_w < w:
                resize_img = cv2.resize(img, dim, interpolation=cv2.INTER_AREA)
            else:
                resize_img = cv2.resize(img, dim, interpolation=cv2.INTER_LINEAR)
            # Invert colors: white paper becomes 0, dark ink becomes bright
            res_img = 255 - resize_img

            all_imgs.append(res_img)
            all_texts.append(label_text)

        ....

The color inverts again when output in /networks/model.py:

        gen_imgs = self.models.G(rand_styles, fake_lbs, fake_lb_lens)
        # Invert back for display (the generator output appears to lie in [-1, 1])
        gen_imgs = (1 - gen_imgs).squeeze().cpu().numpy() * 127

Is there any purpose in inverting the color of the images before training? Would it improve the performance? Thanks.
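One plausible motivation (an assumption here, not confirmed by the authors) is that inversion makes the paper background zero, so zero-padding variable-width batches blends into the background and the ink carries the signal. A round-trip sketch under that assumption:

    import numpy as np

    # Synthetic grayscale word image: paper is white (255), ink is dark.
    img = np.full((64, 128), 255, dtype=np.uint8)
    img[20:40, 30:90] = 30

    inv = 255 - img           # paper -> 0, ink -> bright; zero-padding now matches paper
    norm = inv / 127.5 - 1.0  # assumed [-1, 1] normalization: paper -> -1

    # Back to a displayable image, mirroring (1 - gen_imgs) * 127 in model.py.
    disp = ((1.0 - norm) * 127).clip(0, 255).astype(np.uint8)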

Problems running the code

Can the uploaded code be run with the indicated commands? Why do I get the following error here?
RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass. You should either properly implement new_empty() for your subclass or override __deepcopy__() if it is intended behavior for new_empty() to return an instance of a different type.

Why does real_imgs have -1 during training?

@ganji15
I'm training HiGAN+; I printed real_imgs during training and found -1 values in it. The upper part is recn_img and the lower half is real_img. Why do these -1s exist?

Also, can you explain the recn_l1_loss(img1, img2, img_lens) function? What I know so far is that it is a variable-length L1 loss; I would like to know more details.

Thanks very much!
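If the inverted images are normalized to [-1, 1] (as the (1 - gen_imgs) * 127 output code suggests), the blank background naturally sits at exactly -1, which would explain the values you see; that is an inference from the code, not an authoritative answer. As for a variable-length L1 loss, the usual idea is to average only over the valid (unpadded) width of each image. An illustrative sketch, not the actual recn_l1_loss implementation:

    import torch

    def masked_l1(img1, img2, img_lens):
        # img1, img2: (B, C, H, W) batches padded to a common width W;
        # img_lens[b] is the true pixel width of sample b.
        B, C, H, W = img1.shape
        mask = (torch.arange(W, device=img1.device)[None, :] < img_lens[:, None]).float()
        mask = mask[:, None, None, :].expand(B, C, H, W)
        diff = (img1 - img2).abs() * mask
        return diff.sum() / mask.sum().clamp(min=1.0)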

Why is the image generated by HiGAN+ inconsistent with the original image?

Hi! @ganji15

I have two questions. First, why is the segmentation of the generated image different from the original image? In the original image, "r" and "y" are not split, but in the generated image "r" and "y" are split.

Second, why is the shading of the generated image different from the original image?

I'm looking forward to your reply!

RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass.

I got this error, please help!

Traceback (most recent call last):
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\train.py", line 25, in <module>
    model.train()
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\networks\model.py", line 718, in train
    self.y.sample_()
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\networks\rand_dist.py", line 63, in sample_
    return deepcopy(self).detach()
  File "C:\Users\hoang\AppData\Local\Programs\Python\Python39\lib\copy.py", line 153, in deepcopy
    y = copier(memo)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 84, in __deepcopy__
    return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\overrides.py", line 1551, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 1295, in __torch_function__
    ret = func(*args, **kwargs)
  File "C:\Users\hoang\Dropbox\AI\Handwriting\HiGANplus\win-env\lib\site-packages\torch\_tensor.py", line 170, in __deepcopy__
    raise RuntimeError(
RuntimeError: The default implementation of __deepcopy__() for non-wrapper subclasses only works for subclass types that implement new_empty() and for which that function returns another instance of the same subclass. You should either properly implement new_empty() for your subclass or override __deepcopy__() if it is intended behavior for new_empty() to return an instance of a different type.
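This error comes from a behavior change in newer PyTorch releases (roughly 1.12 and later): deepcopy on a Tensor subclass now requires the subclass to support new_empty()/__deepcopy__. One commonly suggested workaround, sketched here as an untested assumption rather than an official fix, is to avoid deepcopy in sample_():

    # networks/rand_dist.py, inside sample_() -- sketch of a possible workaround:
    # return deepcopy(self).detach()   # original line that triggers the error
    return self.clone().detach()       # clone() copies the values without __deepcopy__

Alternatively, sticking to the tested PyTorch 1.11.0 environment listed above avoids the behavior change entirely.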

How to train StyleBackbone, or StyleEncoder?

I modified the structure of the StyleBackbone part, which affects the StyleEncoder and WriterIdentification parts.
Since the recipe only provides the config for training the GAN part, how can I retrain the StyleBackbone and all the related parts? Thanks!

Is gan_iam.yml only training a GAN network?

Hello, authors of HiGAN+:
I am a beginner, so please forgive a very basic question. Is gan_iam.yml only training a GAN network (the G and D in the paper)? Are the R and I (reusing E) described in the paper pretrained? Can the other yml files, such as ocr_iam.yml and wid_iam.yml, be used directly to train R and I (reusing E), and do they require additional datasets? Also, what command should I use if I want to train an I on its own?
Thanks!

Error for large imaginary component in fid_kid_is.py

Hi,

Thanks for sharing the code.

I am training the HiGAN+ network from scratch on the IAM dataset, and it gave me an error about a large imaginary component in fid_kid_is.py.

The error is raised by the following line:

raise ValueError('Imaginary component {}'.format(m))

Could you share any insight into what may cause this error? Thanks.
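That ValueError is characteristic of the standard FID computation, where scipy's matrix square root of the product of the two covariance matrices can acquire imaginary parts from numerical error; large ones usually indicate a degenerate covariance (too few samples or collapsed features). A sketch of the usual formulation, based on the common FID reference implementation rather than the exact HiGAN+ code:

    import numpy as np
    from scipy import linalg

    def frechet_distance(mu1, sigma1, mu2, sigma2):
        # sqrtm of the covariance product; may come back complex due to numerical error.
        covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
        if np.iscomplexobj(covmean):
            if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
                m = np.max(np.abs(covmean.imag))
                raise ValueError('Imaginary component {}'.format(m))
            covmean = covmean.real  # tiny imaginary parts are numerical noise
        diff = mu1 - mu2
        return diff.dot(diff) + np.trace(sigma1) + np.trace(sigma2) - 2 * np.trace(covmean)

Evaluating on more samples, or adding a small epsilon to the covariance diagonals before the square root, typically resolves this.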
