Giter VIP home page Giter VIP logo

Comments (30)

herobd avatar herobd commented on August 10, 2024 1

This is just trying to train the handwriting recognition network, right?
Things to try:

  1. Be sure the images are getting resized correctly (64 pix height)
  2. GPU size. I'd always have nvidia-smi as the first cell is your Colab notebooks so you know what GPU has been assigned to you. I trained the model with an 11GB GPU, although the CVL dataset may have longer images.
  3. Adjust the max_width in the data_loader of the config. The crash occurs on a particularly long image; this parameter will force the resizing to reduce the width to atleast this (it defaults to 3000). BE CAREFUL though, as these adjusted images now are a bit outside of the distribution of the rest of the images. (It may be better to just throw them out...)
  4. Reduce the batch size. Although be careful as the config has the CRNN using batch norm. You can change this to groupnorm if you need to drop the batch size significantly.
  5. Make the model smaller. As the handwriting recognition network is only an auxilary network, reducing the number of channels shouldn't be terrible. Change the nm variable in handwriting_line_generation/model/cnn_only_hwr.py (Assuming you are using the CNN only handwriting network)

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024 1

The charset file defines the characters the model can see and produce. Besure each model component gets data_loader: char_file updated in the config and are trained. Also be sure to update the text file (trainer: text_data in config) when training the generator to have text from the desired output language.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Thank you very much, I'll try those things and response to you soon.
Have a good day!

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Hello again! You're right, the problem is the width of cvl. There was the line bounding box of cvl. Some of lines were not bounding correctly and when pass into the model, their width explode significantly. So I remove all of them out of database and the model worked well.
Sorry for this inconvenient, Could you give me some advice for using latin language like vietnamese? I added a VN_charset.json into folder data which contains english & latin characters and theirs idx.
Thank you very much.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Thank you so much, Sir

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Sorry for disturbing again, sir. Now, I have done training the model but I don't know how to calculate FID scores as in your paper. Could you give me some instruction please. THank for your help.

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

I had modified someones FID code to resize handwriting images properly: https://github.com/herobd/pytorch-fid-handwriting

You'll need to generate a bunch of handwriting images (which I think was the point of the Random option in generate.py and then have another directory with the (testing) dataset line images.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Thank you very much, Sir.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Hello again, I've change the charset.json file with the character of my language handwritten text database. Then I started the recognizer training process, but I got all loss is nan at the first iteration and the process stopped. The encoder has the same result as recognizer. Could you give me some advices, please. Thank you in advance.

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

Hmm, double check the charset.json to be sure it looks like mine (especially starting at index of 1 instead of 0). If nothing's wrong I would print the inputs to the CTC loss call and be sure they look fine. You could even run my original set up to get an example of what the inputs should be. Also, this is unlikely, but if the predictions given to the CTC loss is shorter than the target, that causes problems (CTC can't be computed properly as it assumes a longer input).

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

This is my charset.json : {
"char_to_idx": {
" ": 1, "!": 2, """: 3, "#": 4, "&": 5, "'": 6, "(": 7, ")": 8, "": 9, "+": 10, ",": 11, "-": 12, ".": 13, "/": 14, "0": 15, "1": 16, "2": 17, "3": 18, "4": 19, "5": 20, "6": 21, "7": 22, "8": 23, "9": 24, ":": 25, ";": 26, "?": 27, "A": 28, "B": 29, "C": 30, "D": 31, "E": 32, "F": 33, "G": 34, "H": 35, "I": 36, "J": 37, "K": 38, "L": 39, "M": 40, "N": 41, "O": 42, "P": 43, "Q": 44, "R": 45, "S": 46, "T": 47, "U": 48, "V": 49, "W": 50, "X": 51, "Y": 52, "Z": 53, "a": 54, "b": 55, "c": 56, "d": 57, "e": 58, "f": 59, "g": 60, "h": 61, "i": 62, "j": 63, "k": 64, "l": 65, "m": 66, "n": 67, "o": 68, "p": 69, "q": 70, "r": 71, "s": 72, "t": 73, "u": 74, "v": 75, "w": 76, "x": 77, "y": 78, "z": 79, "\u00e1": 80, "\u00e0": 81, "\u1ea3": 83, "\u00e3": 84, "\u1ea1": 85, "\u0103": 86, "\u1eaf": 87, "\u1eb1": 88, "\u1eb3": 89, "\u1eb5": 90, "\u1eb7": 91, "\u00e2": 92, "\u1ea5": 93, "\u1ea7": 94, "\u1ea9": 95, "\u1eab": 96, "\u1ead": 97, "\u00ed": 98, "\u00ec": 99, "\u1ec9": 100, "\u0129": 101, "\u1ecb": 102, "\u00fa": 103, "\u00f9": 104, "\u1ee7": 105, "\u0169": 106, "\u1ee5": 107, "\u01b0": 108, "\u1ee9": 109, "\u1eeb": 110, "\u1eed": 111, "\u1eef": 112, "\u1ef1": 113, "\u00f3": 114, "\u00f2": 115, "\u1ecf": 116, "\u00f5": 117, "\u1ecd": 118, "\u00f4": 119, "\u1ed1": 120, "\u1ed3": 121, "\u1ed5": 122, "\u1ed7": 123, "\u1ed9": 124, "\u01a1": 125, "\u1edb": 126, "\u1edd": 127, "\u1edf": 128, "\u1ee1": 129, "\u1ee3": 130, "\u00e9": 131, "\u00e8": 132, "\u1ebb": 133, "\u1ebd": 134, "\u1eb9": 135, "\u00ea": 136, "\u1ebf": 137, "\u1ec1": 138, "\u1ec3": 139, "\u1ec5": 140, "\u1ec7": 141, "\u00fd": 142, "\u1ef3": 143, "\u1ef7": 144, "\u1ef9": 145, "\u1ef5": 146, "\u0111": 147, "\u00c1": 148, "\u00c0": 149, "\u1ea2": 151, "\u00c3": 152, "\u1ea0": 153, "\u0102": 154, "\u1eae": 155, "\u1eb0": 156, "\u1eb2": 157, "\u1eb4": 158, "\u1eb6": 159, "\u00c2": 160, "\u1ea4": 161, "\u1ea6": 162, "\u1ea8": 163, "\u1eaa": 164, "\u1eac": 165, "\u00cd": 166, "\u00cc": 167, "\u1ec8": 168, "\u0128": 169, "\u1eca": 170, "\u00da": 171, "\u00d9": 172, "\u1ee6": 173, "\u0168": 174, "\u1ee4": 175, "\u01af": 176, "\u1ee8": 177, "\u1eea": 178, "\u1eec": 179, "\u1eee": 180, "\u1ef0": 181, "\u00d3": 182, "\u00d2": 183, "\u1ece": 184, "\u00d5": 185, "\u1ecc": 186, "\u00d4": 187, "\u1ed0": 188, "\u1ed2": 189, "\u1ed4": 190, "\u1ed6": 191, "\u1ed8": 192, "\u01a0": 193, "\u1eda": 194, "\u1edc": 195, "\u1ede": 196, "\u1ee0": 197, "\u1ee2": 198, "\u00c9": 199, "\u00c8": 200, "\u1eba": 201, "\u1ebc": 202, "\u1eb8": 203, "\u00ca": 204, "\u1ebe": 205, "\u1ec0": 206, "\u1ec2": 207, "\u1ec4": 208, "\u1ec6": 209, "\u00dd": 210, "\u1ef2": 211, "\u1ef6": 212, "\u1ef8": 213, "\u1ef4": 214, "\u0110": 215},
"idx_to_char": {
"1": " ", "2": "!", "3": """, "4": "#", "5": "&", "6": "'", "7": "(", "8": ")", "9": "
", "10": "+", "11": ",", "12": "-", "13": ".", "14": "/", "15": "0", "16": "1", "17": "2", "18": "3", "19": "4", "20": "5", "21": "6", "22": "7", "23": "8", "24": "9", "25": ":", "26": ";", "27": "?", "28": "A", "29": "B", "30": "C", "31": "D", "32": "E", "33": "F", "34": "G", "35": "H", "36": "I", "37": "J", "38": "K", "39": "L", "40": "M", "41": "N", "42": "O", "43": "P", "44": "Q", "45": "R", "46": "S", "47": "T", "48": "U", "49": "V", "50": "W", "51": "X", "52": "Y", "53": "Z", "54": "a", "55": "b", "56": "c", "57": "d", "58": "e", "59": "f", "60": "g", "61": "h", "62": "i", "63": "j", "64": "k", "65": "l", "66": "m", "67": "n", "68": "o", "69": "p", "70": "q", "71": "r", "72": "s", "73": "t", "74": "u", "75": "v", "76": "w", "77": "x", "78": "y", "79": "z", "80": "\u00e1", "81": "\u00e0", "82": "\u1ea3", "83": "\u1ea3", "84": "\u00e3", "85": "\u1ea1", "86": "\u0103", "87": "\u1eaf", "88": "\u1eb1", "89": "\u1eb3", "90": "\u1eb5", "91": "\u1eb7", "92": "\u00e2", "93": "\u1ea5", "94": "\u1ea7", "95": "\u1ea9", "96": "\u1eab", "97": "\u1ead", "98": "\u00ed", "99": "\u00ec", "100": "\u1ec9", "101": "\u0129", "102": "\u1ecb", "103": "\u00fa", "104": "\u00f9", "105": "\u1ee7", "106": "\u0169", "107": "\u1ee5", "108": "\u01b0", "109": "\u1ee9", "110": "\u1eeb", "111": "\u1eed", "112": "\u1eef", "113": "\u1ef1", "114": "\u00f3", "115": "\u00f2", "116": "\u1ecf", "117": "\u00f5", "118": "\u1ecd", "119": "\u00f4", "120": "\u1ed1", "121": "\u1ed3", "122": "\u1ed5", "123": "\u1ed7", "124": "\u1ed9", "125": "\u01a1", "126": "\u1edb", "127": "\u1edd", "128": "\u1edf", "129": "\u1ee1", "130": "\u1ee3", "131": "\u00e9", "132": "\u00e8", "133": "\u1ebb", "134": "\u1ebd", "135": "\u1eb9", "136": "\u00ea", "137": "\u1ebf", "138": "\u1ec1", "139": "\u1ec3", "140": "\u1ec5", "141": "\u1ec7", "142": "\u00fd", "143": "\u1ef3", "144": "\u1ef7", "145": "\u1ef9", "146": "\u1ef5", "147": "\u0111", "148": "\u00c1", "149": "\u00c0", "150": "\u1ea2", "151": "\u1ea2", "152": "\u00c3", "153": "\u1ea0", "154": "\u0102", "155": "\u1eae", "156": "\u1eb0", "157": "\u1eb2", "158": "\u1eb4", "159": "\u1eb6", "160": "\u00c2", "161": "\u1ea4", "162": "\u1ea6", "163": "\u1ea8", "164": "\u1eaa", "165": "\u1eac", "166": "\u00cd", "167": "\u00cc", "168": "\u1ec8", "169": "\u0128", "170": "\u1eca", "171": "\u00da", "172": "\u00d9", "173": "\u1ee6", "174": "\u0168", "175": "\u1ee4", "176": "\u01af", "177": "\u1ee8", "178": "\u1eea", "179": "\u1eec", "180": "\u1eee", "181": "\u1ef0", "182": "\u00d3", "183": "\u00d2", "184": "\u1ece", "185": "\u00d5", "186": "\u1ecc", "187": "\u00d4", "188": "\u1ed0", "189": "\u1ed2", "190": "\u1ed4", "191": "\u1ed6", "192": "\u1ed8", "193": "\u01a0", "194": "\u1eda", "195": "\u1edc", "196": "\u1ede", "197": "\u1ee0", "198": "\u1ee2", "199": "\u00c9", "200": "\u00c8", "201": "\u1eba", "202": "\u1ebc", "203": "\u1eb8", "204": "\u00ca", "205": "\u1ebe", "206": "\u1ec0", "207": "\u1ec2", "208": "\u1ec4", "209": "\u1ec6", "210": "\u00dd", "211": "\u1ef2", "212": "\u1ef6", "213": "\u1ef8", "214": "\u1ef4", "215": "\u0110"
}}
I just append our charset into your original charset.json and save in another file. The first index of course is 1. And I only replace the new charset.json in the config files. The setup is the same, and the result is nan of all losses.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

I've went to model/loss.py/CTCloss and print input_length and target_length and got this result:
input_len: tensor([262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262, 262], dtype=torch.int32) target_len: tensor([32, 32, 7, 48, 23, 30, 34, 23, 7, 19, 30, 4, 7, 41, 35, 25], dtype=torch.int32
It's always input_length > target_length. Could you explain what are input length and target length and why target length is so short. Is the target_length number fine ?

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

And some of input is nan

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

I've run debug mode on PyCharm in recognizer training and I found out that problem is the predict label. After prediction and converting, label2string_single, the output character list only contain characters in English despite we're using a latin language database. I think that the model for IAM dataset not have capability for learning the latin character. And we consider to use the model for RIMES dataset beacause of our langague has some French characters. Another way is that we'll add some layers to the model to make it more deeply. Could you give me some advices. Thank you in advance

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

Input length is the image width (after downsampling), the target length is the length of the target string. The model predicts something for each image unit (after the network downsamples it), which should be more predictions than the target string (as each written character is multiple image units long).

The capacity of the network won't be causing the NaNs. Trace where those NaNs back to where they originate.

The model for the IAM and RIMES datasets are identical with the exception of the output classes. Be sure in the config that model -> num_class is set to the number of characters in charset.json. (looks like 216 for yours?)

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Oh, I didn't notice that num_class in the config. I changed the num_class to 216 and the model run smoothly. I really appreciate it. I'm really grateful.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Hello, we have to disturb you again. We have train both the recog and the encoder, they're ok, but when we train generator at 40k iterations, the recon_sample and recon_gt_mask are all blank, only the recon_gt has text. Could you give us some advice for this problem. Thank you very much.

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

You're referring to the output images? Your're sure it's writing new images? I've never seen it generate blank images. Generally at the beginning of training it generates weird blurry images.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

<img width="655" alt="Capture1" src="https://user-images.githubusercontent.com/91046245/145511900-eaa63e97-0e9b-4dfd-a284-5ecce2644c84.
Capture2
PNG">
Only the recon_gt have words

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Capture1

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

this is our text file

VN1.txt

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

What do the losses look like? You can use python graph.py -c path/to/latest/snapshot.pth to graph them.

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

Also, do you have any of the generated ("samples") images from the beginning of training?

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

The image below is generated images.
image

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

Maybe turn down the learning rate? It would be good to see what the loss is doing.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

this is the result after running code python graph.py -c path/to/latest/snapshot.pth

loaded iteration 40750
summed
loss max: 12.960875740402116, min 0.0
autoLoss max: 0.4286576807498932, min 0.0250049140304327
perceptualLoss max: 3.558936834335327, min 1.9109687805175781
reconRecogLoss max: 9.085136844078079e-05, min 4.423267000674969e-06
generatorLoss max: 9.976031303405762, min -0.29183000326156616
CER max: 0, min 0
WER max: 0, min 0
discriminatorLoss max: 0.013461818918585777, min 0.0
sec_per_iter max: 2.9882827183921474, min 2.3992390221759705
avg_loss max: 6.829719831926175, min 2.0683593205882085
avg_genRecogLoss max: 0.000307868676725775, min 0.00013851409091148524
avg_generatorLoss max: 5.6224011726379395, min 0.9937937784194947
avg_CER max: 0.0, min 0.0
avg_WER max: 0.0, min 0.0
avg_autoLoss max: 0.10185332185029984, min 0.01183332721143961
avg_perceptualLoss max: 0.9597496643066407, min 0.6026059198379516
avg_reconRecogLoss max: 5.132893413247075e-06, min 2.0161328866379337e-06
avg_discriminatorLoss max: 0.07236941555517842, min 0.0
avg_countLoss max: 0.6470937192440033, min 0.23567486727237702
genRecogLoss max: 0.0012050500372424722, min 0.00044575537322089076
countLoss max: 11.75268268585205, min 0.5220853686332703
val_loss max: 5.792953810724987, min 4.854049627309792
val_CER max: 0.0, min 0.0
val_WER max: 0.0, min 0.0
val_autoLoss max: 0.1761924303182871, min 0.04305112739081989
val_countLoss max: 2.304927395859567, min 2.159969479435741
val_perceptualLoss max: 3.311824187193767, min 2.6480215297375533
val_reconRecogLoss max: 1.3890943866481573e-05, min 8.966553432945605e-06

image
image

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

can you give me advice about this? thank you so much

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

The losses seem more chaotic than what the IAM model had (you can use graph.py on the pre-trained snapshot to see). I would definitly turn down the learning rate and see if that helps.

from handwriting_line_generation.

AICampB4 avatar AICampB4 commented on August 10, 2024

Hello, I've dropped the lr 10 times but after about 30k iters, the gen_sample images become blank. Those are lr I've change and the gen_sample images
MicrosoftTeams-image (1)
Capture3

from handwriting_line_generation.

herobd avatar herobd commented on August 10, 2024

Ok, so it is generating good things at the beginning. Try droping the main optimizer's learning rate, but keep the discriminators the same. And the reverse.
The model is hitting a very common problem with GANs which is unstable training. I'll admit I didn't run into this much after settling on good hyperparameters, so I'm surprised it's happening with a new dataset.
In general, getting the right balance of the learning rates as well as how often the discriminator is getting updated should do the trick.

from handwriting_line_generation.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.