Comments (6)

pbcquoc commented on July 19, 2024

When working on ML, you absolutely must keep in mind that the training data and the data seen in real use have to be collected by the same method.
Since you trained on self-generated data and tested on real data, poor results are to be expected.
If you want good results in real use, you need to label your own real-world data and train on it.
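
One way to act on this advice is to hold out part of the hand-labeled real images as the evaluation set and train on the rest together with the synthetic images. The sketch below is only an illustration: the function name `build_splits`, the `(image_path, label)` tuple layout, and the 20% hold-out ratio are assumptions made for the example, not part of vietocr.

```python
import random

def build_splits(synthetic, real, real_val_ratio=0.2, seed=0):
    """Return (train, val) lists of (image_path, label) pairs.

    Validation uses only real images, so the reported accuracy reflects
    the data seen in real use; the remaining real images are added to
    the synthetic ones for training.
    """
    rng = random.Random(seed)
    real = list(real)
    rng.shuffle(real)
    n_val = max(1, int(len(real) * real_val_ratio))
    val = real[:n_val]
    train = list(synthetic) + real[n_val:]
    rng.shuffle(train)
    return train, val

# Hypothetical example data (paths and labels are made up).
synthetic = [(f"synth/{i}.png", "Cho hàm số") for i in range(8)]
real = [(f"real/{i}.jpg", "f=ax+by-cz") for i in range(5)]

train, val = build_splits(synthetic, real)
print(len(train), len(val))  # 12 1
```

Keeping the validation set purely real means a high score on it is evidence about real-use quality, which a score on generated images cannot provide.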

from vietocr.

pbcquoc commented on July 19, 2024

Hi,
OCR essentially relies on what is in the image: whatever text the image contains is what gets recognized,
so sentences that have no meaning are fine.
How many sentences does your training data contain?
Is the test data sampled from the same set you originally generated?

Please post your training log here so I can take a look.

chauthehan commented on July 19, 2024

The problem I'm working on is OCR for math problem statements. I trained on 4 million images, split into 3.2 million for training and 0.8 million for testing. I trained the model for 100,000 iterations, and the final model reaches a full_seq accuracy of 0.98. I also saved a few checkpoints at 0.96 and 0.97. When I test on real images, it gives good results for Vietnamese sentences, but on images containing math formulas such as "Cho hàm số f=ax+by-cz" ("Given the function f=ax+by-cz"), only the Vietnamese part comes out well; the rest is mostly nonsense, or nothing is output at all.
Some causes I have considered:

  • I may have generated less data for symbols than for letters
  • 4 million images is still too few
  • The generated data does not look like real data (though I think what I generated looks quite good)

Are these causes correct? Could there be any other cause?
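
The first suspected cause (symbols being under-represented compared with letters) can be checked directly on the label text before retraining. The following is a minimal sketch; the helper names `char_frequencies` and `rare_chars`, the 1% threshold, and the sample labels are all assumptions made for illustration.

```python
from collections import Counter

def char_frequencies(labels):
    """Count how often each non-space character appears across all labels."""
    counts = Counter()
    for text in labels:
        counts.update(ch for ch in text if not ch.isspace())
    return counts

def rare_chars(labels, min_share=0.01):
    """Return the characters whose share of all characters is below min_share."""
    counts = char_frequencies(labels)
    total = sum(counts.values())
    return sorted(ch for ch, n in counts.items() if n / total < min_share)

# Hypothetical labels; in practice these would be read from the annotation file.
labels = ["Cho hàm số", "Cho hàm số", "Tính giá trị", "f=ax+by-cz"]

counts = char_frequencies(labels)
print({ch: counts[ch] for ch in "=+-"})  # {'=': 1, '+': 1, '-': 1}
```

If operators and variables turn out to be far rarer than Vietnamese letters, generating more formula-heavy samples is one option to consider, although it does not address the gap between generated and real images.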

Here is the training log (for some reason only the first 24,000 iterations were saved):

iter: 000200 - train loss: 3.432 - lr: 4.09e-05 - load time: 0.39 - gpu time: 38.92
iter: 000400 - train loss: 2.748 - lr: 4.38e-05 - load time: 0.03 - gpu time: 41.04
iter: 000600 - train loss: 2.468 - lr: 4.85e-05 - load time: 0.03 - gpu time: 38.90
iter: 000800 - train loss: 2.157 - lr: 5.51e-05 - load time: 0.03 - gpu time: 39.53
iter: 001000 - train loss: 1.827 - lr: 6.35e-05 - load time: 0.03 - gpu time: 38.88
iter: 001200 - train loss: 1.542 - lr: 7.37e-05 - load time: 0.03 - gpu time: 38.13
iter: 001400 - train loss: 1.254 - lr: 8.57e-05 - load time: 0.03 - gpu time: 38.78
iter: 001600 - train loss: 1.073 - lr: 9.94e-05 - load time: 0.03 - gpu time: 39.26
iter: 001800 - train loss: 0.973 - lr: 1.15e-04 - load time: 0.03 - gpu time: 38.79
iter: 002000 - train loss: 0.918 - lr: 1.32e-04 - load time: 0.03 - gpu time: 40.40
iter: 002200 - train loss: 0.868 - lr: 1.50e-04 - load time: 0.03 - gpu time: 38.69
iter: 002400 - train loss: 0.853 - lr: 1.70e-04 - load time: 0.03 - gpu time: 40.65
iter: 002600 - train loss: 0.834 - lr: 1.91e-04 - load time: 0.03 - gpu time: 38.31
iter: 002800 - train loss: 0.833 - lr: 2.14e-04 - load time: 0.03 - gpu time: 36.86
iter: 003000 - train loss: 0.814 - lr: 2.38e-04 - load time: 0.03 - gpu time: 38.69
iter: 003000 - valid loss: 0.729 - acc full seq: 0.8856 - acc per char: 0.9896
iter: 003200 - train loss: 0.797 - lr: 2.63e-04 - load time: 0.03 - gpu time: 38.66
iter: 003400 - train loss: 0.790 - lr: 2.89e-04 - load time: 0.03 - gpu time: 38.32
iter: 003600 - train loss: 0.769 - lr: 3.16e-04 - load time: 0.03 - gpu time: 39.69
iter: 003800 - train loss: 0.772 - lr: 3.43e-04 - load time: 0.03 - gpu time: 38.44
iter: 004000 - train loss: 0.771 - lr: 3.72e-04 - load time: 0.03 - gpu time: 38.25
iter: 004200 - train loss: 0.765 - lr: 4.01e-04 - load time: 0.03 - gpu time: 38.78
iter: 004400 - train loss: 0.761 - lr: 4.30e-04 - load time: 0.03 - gpu time: 37.27
iter: 004600 - train loss: 0.779 - lr: 4.60e-04 - load time: 0.03 - gpu time: 38.40
iter: 004800 - train loss: 0.763 - lr: 4.90e-04 - load time: 0.03 - gpu time: 39.14
iter: 005000 - train loss: 0.756 - lr: 5.20e-04 - load time: 0.03 - gpu time: 38.45
iter: 005200 - train loss: 0.757 - lr: 5.50e-04 - load time: 0.03 - gpu time: 37.63
iter: 005400 - train loss: 0.758 - lr: 5.80e-04 - load time: 0.03 - gpu time: 38.01
iter: 005600 - train loss: 0.755 - lr: 6.10e-04 - load time: 0.03 - gpu time: 37.31
iter: 005800 - train loss: 0.754 - lr: 6.39e-04 - load time: 0.03 - gpu time: 37.25
iter: 006000 - train loss: 0.750 - lr: 6.68e-04 - load time: 0.03 - gpu time: 38.14
iter: 006000 - valid loss: 0.702 - acc full seq: 0.9303 - acc per char: 0.9942
iter: 006200 - train loss: 0.758 - lr: 6.97e-04 - load time: 0.03 - gpu time: 37.63
iter: 006400 - train loss: 0.757 - lr: 7.24e-04 - load time: 0.03 - gpu time: 38.62
iter: 006600 - train loss: 0.748 - lr: 7.51e-04 - load time: 0.03 - gpu time: 39.25
iter: 006800 - train loss: 0.748 - lr: 7.77e-04 - load time: 0.03 - gpu time: 38.97
iter: 007000 - train loss: 0.747 - lr: 8.02e-04 - load time: 0.03 - gpu time: 37.96
iter: 007200 - train loss: 0.738 - lr: 8.26e-04 - load time: 0.03 - gpu time: 38.16
iter: 007400 - train loss: 0.750 - lr: 8.49e-04 - load time: 0.03 - gpu time: 38.21
iter: 007600 - train loss: 0.743 - lr: 8.70e-04 - load time: 0.03 - gpu time: 39.11
iter: 007800 - train loss: 0.741 - lr: 8.90e-04 - load time: 0.03 - gpu time: 37.39
iter: 008000 - train loss: 0.740 - lr: 9.08e-04 - load time: 0.03 - gpu time: 40.33
iter: 008200 - train loss: 0.740 - lr: 9.25e-04 - load time: 0.03 - gpu time: 37.79
iter: 008400 - train loss: 0.733 - lr: 9.41e-04 - load time: 0.03 - gpu time: 40.04
iter: 008600 - train loss: 0.741 - lr: 9.54e-04 - load time: 0.03 - gpu time: 40.67
iter: 008800 - train loss: 0.740 - lr: 9.66e-04 - load time: 0.03 - gpu time: 38.95
iter: 009000 - train loss: 0.746 - lr: 9.77e-04 - load time: 0.03 - gpu time: 37.91
iter: 009000 - valid loss: 0.708 - acc full seq: 0.9010 - acc per char: 0.9846
iter: 009200 - train loss: 0.740 - lr: 9.85e-04 - load time: 0.03 - gpu time: 39.01
iter: 009400 - train loss: 0.734 - lr: 9.92e-04 - load time: 0.03 - gpu time: 39.29
iter: 009600 - train loss: 0.733 - lr: 9.96e-04 - load time: 0.03 - gpu time: 39.85
iter: 009800 - train loss: 0.733 - lr: 9.99e-04 - load time: 0.03 - gpu time: 39.14
iter: 010000 - train loss: 0.736 - lr: 1.00e-03 - load time: 0.03 - gpu time: 38.72
iter: 010200 - train loss: 0.734 - lr: 1.00e-03 - load time: 0.03 - gpu time: 38.29
iter: 010400 - train loss: 0.732 - lr: 1.00e-03 - load time: 0.03 - gpu time: 38.06
iter: 010600 - train loss: 0.729 - lr: 1.00e-03 - load time: 0.03 - gpu time: 39.43
iter: 010800 - train loss: 0.735 - lr: 1.00e-03 - load time: 0.03 - gpu time: 38.14
iter: 011000 - train loss: 0.729 - lr: 1.00e-03 - load time: 0.03 - gpu time: 38.27
iter: 011200 - train loss: 0.728 - lr: 1.00e-03 - load time: 0.03 - gpu time: 39.04
iter: 011400 - train loss: 0.727 - lr: 9.99e-04 - load time: 0.03 - gpu time: 39.32
iter: 011600 - train loss: 0.733 - lr: 9.99e-04 - load time: 0.03 - gpu time: 37.89
iter: 011800 - train loss: 0.726 - lr: 9.99e-04 - load time: 0.03 - gpu time: 37.57
iter: 012000 - train loss: 0.726 - lr: 9.99e-04 - load time: 0.03 - gpu time: 39.84
iter: 012000 - valid loss: 0.693 - acc full seq: 0.9542 - acc per char: 0.9954
iter: 012200 - train loss: 0.729 - lr: 9.99e-04 - load time: 0.03 - gpu time: 37.93
iter: 012400 - train loss: 0.728 - lr: 9.98e-04 - load time: 0.03 - gpu time: 38.03
iter: 012600 - train loss: 0.730 - lr: 9.98e-04 - load time: 0.03 - gpu time: 39.20
iter: 012800 - train loss: 0.723 - lr: 9.98e-04 - load time: 0.03 - gpu time: 39.24
iter: 013000 - train loss: 0.728 - lr: 9.97e-04 - load time: 0.03 - gpu time: 35.80
iter: 013200 - train loss: 0.723 - lr: 9.97e-04 - load time: 0.03 - gpu time: 39.52
iter: 013400 - train loss: 0.728 - lr: 9.96e-04 - load time: 0.03 - gpu time: 38.86
iter: 013600 - train loss: 0.727 - lr: 9.96e-04 - load time: 0.03 - gpu time: 38.69
iter: 013800 - train loss: 0.720 - lr: 9.96e-04 - load time: 0.03 - gpu time: 38.68
iter: 014000 - train loss: 0.722 - lr: 9.95e-04 - load time: 0.03 - gpu time: 38.09
iter: 014200 - train loss: 0.727 - lr: 9.95e-04 - load time: 0.03 - gpu time: 38.30
iter: 014400 - train loss: 0.726 - lr: 9.94e-04 - load time: 0.03 - gpu time: 38.84
iter: 014600 - train loss: 0.718 - lr: 9.94e-04 - load time: 0.03 - gpu time: 39.89
iter: 014800 - train loss: 0.725 - lr: 9.93e-04 - load time: 0.03 - gpu time: 37.52
iter: 015000 - train loss: 0.725 - lr: 9.92e-04 - load time: 0.03 - gpu time: 38.47
iter: 015000 - valid loss: 0.692 - acc full seq: 0.9578 - acc per char: 0.9965
iter: 015200 - train loss: 0.719 - lr: 9.92e-04 - load time: 0.04 - gpu time: 40.23
iter: 015400 - train loss: 0.723 - lr: 9.91e-04 - load time: 0.04 - gpu time: 40.97
iter: 015600 - train loss: 0.720 - lr: 9.90e-04 - load time: 0.03 - gpu time: 40.14
iter: 015800 - train loss: 0.729 - lr: 9.90e-04 - load time: 0.03 - gpu time: 38.65
iter: 016000 - train loss: 0.717 - lr: 9.89e-04 - load time: 0.03 - gpu time: 42.21
iter: 016200 - train loss: 0.714 - lr: 9.88e-04 - load time: 0.04 - gpu time: 40.79
iter: 016400 - train loss: 0.723 - lr: 9.88e-04 - load time: 0.03 - gpu time: 40.93
iter: 016600 - train loss: 0.719 - lr: 9.87e-04 - load time: 0.03 - gpu time: 40.13
iter: 016800 - train loss: 0.721 - lr: 9.86e-04 - load time: 0.03 - gpu time: 39.91
iter: 017000 - train loss: 0.722 - lr: 9.85e-04 - load time: 0.03 - gpu time: 39.95
iter: 017200 - train loss: 0.717 - lr: 9.84e-04 - load time: 0.03 - gpu time: 40.17
iter: 017400 - train loss: 0.719 - lr: 9.83e-04 - load time: 0.03 - gpu time: 39.13
iter: 017600 - train loss: 0.724 - lr: 9.83e-04 - load time: 0.03 - gpu time: 40.76
iter: 017800 - train loss: 0.723 - lr: 9.82e-04 - load time: 0.03 - gpu time: 38.27
iter: 018000 - train loss: 0.717 - lr: 9.81e-04 - load time: 0.03 - gpu time: 41.83
iter: 018000 - valid loss: 0.689 - acc full seq: 0.9647 - acc per char: 0.9979
iter: 018200 - train loss: 0.717 - lr: 9.80e-04 - load time: 0.03 - gpu time: 38.36
iter: 018400 - train loss: 0.722 - lr: 9.79e-04 - load time: 0.03 - gpu time: 38.00
iter: 018600 - train loss: 0.718 - lr: 9.78e-04 - load time: 0.03 - gpu time: 38.51
iter: 018800 - train loss: 0.718 - lr: 9.77e-04 - load time: 0.03 - gpu time: 38.38
iter: 019000 - train loss: 0.716 - lr: 9.76e-04 - load time: 0.03 - gpu time: 39.44
iter: 019200 - train loss: 0.717 - lr: 9.74e-04 - load time: 0.03 - gpu time: 38.02
iter: 019400 - train loss: 0.717 - lr: 9.73e-04 - load time: 0.03 - gpu time: 39.61
iter: 019600 - train loss: 0.714 - lr: 9.72e-04 - load time: 0.03 - gpu time: 39.49
iter: 019800 - train loss: 0.715 - lr: 9.71e-04 - load time: 0.03 - gpu time: 39.07
iter: 020000 - train loss: 0.712 - lr: 9.70e-04 - load time: 0.03 - gpu time: 39.55
iter: 020200 - train loss: 0.719 - lr: 9.69e-04 - load time: 0.03 - gpu time: 37.91
iter: 020400 - train loss: 0.715 - lr: 9.67e-04 - load time: 0.03 - gpu time: 38.70
iter: 020600 - train loss: 0.719 - lr: 9.66e-04 - load time: 0.03 - gpu time: 38.38
iter: 020800 - train loss: 0.720 - lr: 9.65e-04 - load time: 0.03 - gpu time: 39.47
iter: 021000 - train loss: 0.715 - lr: 9.64e-04 - load time: 0.03 - gpu time: 38.76
iter: 021000 - valid loss: 0.689 - acc full seq: 0.9629 - acc per char: 0.9968
iter: 021200 - train loss: 0.714 - lr: 9.62e-04 - load time: 0.03 - gpu time: 40.00
iter: 021400 - train loss: 0.717 - lr: 9.61e-04 - load time: 0.03 - gpu time: 38.70
iter: 021600 - train loss: 0.712 - lr: 9.60e-04 - load time: 0.03 - gpu time: 37.76
iter: 021800 - train loss: 0.715 - lr: 9.58e-04 - load time: 0.03 - gpu time: 38.93
iter: 022000 - train loss: 0.719 - lr: 9.57e-04 - load time: 0.03 - gpu time: 37.76
iter: 022200 - train loss: 0.709 - lr: 9.55e-04 - load time: 0.03 - gpu time: 38.47
iter: 022400 - train loss: 0.710 - lr: 9.54e-04 - load time: 0.03 - gpu time: 39.21
iter: 022600 - train loss: 0.715 - lr: 9.52e-04 - load time: 0.03 - gpu time: 37.64
iter: 022800 - train loss: 0.715 - lr: 9.51e-04 - load time: 0.03 - gpu time: 39.90
iter: 023000 - train loss: 0.710 - lr: 9.49e-04 - load time: 0.03 - gpu time: 39.43
iter: 023200 - train loss: 0.715 - lr: 9.48e-04 - load time: 0.03 - gpu time: 38.09
iter: 023400 - train loss: 0.713 - lr: 9.46e-04 - load time: 0.03 - gpu time: 39.07
iter: 023600 - train loss: 0.714 - lr: 9.45e-04 - load time: 0.03 - gpu time: 38.08
iter: 023800 - train loss: 0.717 - lr: 9.43e-04 - load time: 0.03 - gpu time: 37.75
iter: 024000 - train loss: 0.712 - lr: 9.41e-04 - load time: 0.03 - gpu time: 38.76
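
The validation lines are easy to lose among the training lines in the log above. A small parser like the one below can pull them out to show the trend. This is a sketch that assumes the log keeps exactly the line format shown here; `parse_validation` is a hypothetical helper, not a vietocr function. Note that these metrics are presumably computed on the generated validation split, so they say nothing about accuracy on real photos.

```python
import re

# Matches only the validation lines of the log format shown above.
VALID_RE = re.compile(
    r"iter:\s*(\d+)\s*-\s*valid loss:\s*([\d.]+)\s*-\s*"
    r"acc full seq:\s*([\d.]+)\s*-\s*acc per char:\s*([\d.]+)"
)

def parse_validation(log_text):
    """Extract (iteration, valid_loss, acc_full_seq, acc_per_char) tuples."""
    rows = []
    for m in VALID_RE.finditer(log_text):
        rows.append((int(m.group(1)), float(m.group(2)),
                     float(m.group(3)), float(m.group(4))))
    return rows

# A few lines copied from the log above.
log_text = """\
iter: 003000 - train loss: 0.814 - lr: 2.38e-04 - load time: 0.03 - gpu time: 38.69
iter: 003000 - valid loss: 0.729 - acc full seq: 0.8856 - acc per char: 0.9896
iter: 006000 - valid loss: 0.702 - acc full seq: 0.9303 - acc per char: 0.9942
iter: 009000 - valid loss: 0.708 - acc full seq: 0.9010 - acc per char: 0.9846
"""

for it, loss, full_seq, per_char in parse_validation(log_text):
    print(f"{it:>6}  loss={loss:.3f}  full_seq={full_seq:.4f}  per_char={per_char:.4f}")
```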

pbcquoc commented on July 19, 2024

So the training images are generated, and the test images are real photos/scans taken with a phone, right?

chauthehan commented on July 19, 2024

That's right. I generate images for training, then evaluate on real images.

chauthehan commented on July 19, 2024

Okay, thank you.
