
pgstosrt's People

Contributors

joooostb, nmcglohon, opaetzel, robertbaker, segator, tentacule, tzvetkoff, vertigo235


pgstosrt's Issues

read_params_file: parameter not found:

When executing
dotnet D:\pgstosrt\PgsToSrt.dll --input "D:\23.sup" --output "D:\23.srt" --tesseractlanguage tha
on a specific sup file, I got:
2019/12/07 17:49:26.754|INFO|Starting OCR for 8 items...
read_params_file: parameter not found:

Yes, I have the tha Tesseract language data installed.
You can find the sup file here:
23.zip

Exception occurs on Ubuntu 20.04

$ dotnet PgsToSrt.dll --input test.sup --output test.srt --tesseractlanguage eng
PgsToSrt 1.3.0.0

2021/01/31 13:38:00.145|INFO|Detected tesseract language data for language 'eng'.
2021/01/31 13:38:00.180|INFO|Starting OCR for 285 items...
2021/01/31 13:38:00.228|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at System.Activator.CreateInstance(Type type, Object[] args)
at Tncl.NativeLoader.NativeInstance.CreateInstance(NativeLoader loader, Type interfaceType)
at PgsToSrt.TesseractApi.Initialize()
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.

I am using:

  • Ubuntu 20.04.2 LTS
  • libtesseract4
  • dotnet 5.0.102

$ ldconfig -p -v | grep libdl
libdl.so.2 (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libdl.so.2
libdl.so.2 (libc6, OS ABI: Linux 3.2.0) => /lib/i386-linux-gnu/libdl.so.2
libdl.so (libc6,x86-64, OS ABI: Linux 3.2.0) => /lib/x86_64-linux-gnu/libdl.so

Similar to issue #6, caused by the missing libtesseract3 package.
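Since the stack trace ends in the native loader, one way to narrow this down is to check whether the expected shared libraries appear in the dynamic loader cache. The sketch below is only an illustration: `missing_libs` is a hypothetical helper, and the library name prefixes passed to it (`libtesseract`, `liblept`) are assumptions about what PgsToSrt tries to load on a given system.

```shell
# Print every library name prefix that does NOT appear in an `ldconfig -p`
# listing. The listing is read from stdin so saved output can be checked too.
missing_libs() {
    listing=$(cat)
    for name in "$@"; do
        # Entries look like "libdl.so.2 (libc6,x86-64, ...) => /lib/..."
        printf '%s\n' "$listing" | grep -q "^[[:space:]]*$name\.so" || echo "$name"
    done
}
```

Typical use would be `ldconfig -p | missing_libs libtesseract liblept`; any name it prints is not visible to the loader.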

Bulk Conversion

Is it possible to tell PgsToSrt to convert all files in a directory? Does it depend on the container?
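Until something like this is built in, a wrapper loop can call PgsToSrt once per file. This is a minimal sketch using only the `--input`, `--output` and `--tesseractlanguage` options shown in other issues here; `convert_all_sup` and the `PGSTOSRT_DLL` variable are hypothetical names, and the sketch assumes the subtitles are already extracted as .sup files.

```shell
# Path to the published PgsToSrt.dll (hypothetical variable, adjust as needed).
PGSTOSRT_DLL="${PGSTOSRT_DLL:-PgsToSrt.dll}"

# Convert every .sup file in a directory, writing a .srt next to each one.
convert_all_sup() {
    dir="$1"            # directory containing .sup files
    lang="${2:-eng}"    # Tesseract language code
    for sup in "$dir"/*.sup; do
        # Skip the unexpanded pattern when the directory has no .sup files.
        [ -e "$sup" ] || continue
        dotnet "$PGSTOSRT_DLL" --input "$sup" --output "${sup%.sup}.srt" \
            --tesseractlanguage "$lang"
    done
}
```

For example, `convert_all_sup /video/input deu` would process each .sup file in /video/input in turn.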

Question

Hello,
I have used your creation, and it works very well.
I admit I used it for my software "TAO-MKV", but I have since switched to another process based on Tesseract 5.0.X LSTM.
However, even though your program is not fast, it remains very effective AND above all lightweight (I was able to shrink it to 19 MB versus 250 MB for the official one, "without the tessdata, of course").
I have no idea how to extract the subtitle BMPs and the timestamps (as text), but if you would like to bring your know-how, speed and efficiency to TAO-MKV, https://github.com/serpafi/TAO-MKV
we would be delighted with your work, which will not be misappropriated but showcased (texts, links or anything else will be published directly in the software).
Regards

How to Use with MKS?

I see the examples for MKV and sup files, but I used MKVToolNix GUI to extract my subtitles, which gave me MKS files. Also, how can I have PgsToSrt convert all tracks in the MKS file?
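One possible workaround, sketched here under assumptions, is to extract each PGS track from the Matroska file to its own .sup file with MKVToolNix and then convert those files. It relies on the English output of `mkvmerge -i`, which labels PGS tracks as `subtitles (HDMV PGS)`; `pgs_track_ids` and `extract_pgs_tracks` are hypothetical helper names, and whether PgsToSrt can read an MKS file directly is not established here.

```shell
# Print the IDs of PGS subtitle tracks found in `mkvmerge -i` output (stdin).
pgs_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (HDMV PGS).*/\1/p'
}

# Extract each PGS track of a Matroska file to <name>.<id>.sup.
extract_pgs_tracks() {
    file="$1"
    for id in $(mkvmerge -i "$file" | pgs_track_ids); do
        mkvextract tracks "$file" "${id}:${file%.*}.${id}.sup"
    done
}
```

Each resulting .sup file can then be passed to PgsToSrt with `--input`, as in the sup examples.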

Problem with Ubuntu 23.04 and libtesseract4

I wanted to use PgsToSrt under Ubuntu 23.04, but it ships with libtesseract5 and I could not find a way to install libtesseract4. Is it possible to make PgsToSrt use libtesseract5 instead of libtesseract4?

Update upstream dependencies, consider re-doing your customizations on top of the original code instead.

You changed the code-style of the original code which makes merging upstream changes a lot harder on yourself.
In one case, IMO get => TimeSpan.Milliseconds; is more readable than what you changed it to:

get
{
    return TimeSpan.Milliseconds;
}

You can refactor your modifications so you override functionality and thus you can use the upstream nuget package and make updates a lot easier for those dependencies. Git submodules can be utilized if you must have a clone of the code in your project.

Fix OCR errors option?

Do you plan on adding a "Fix OCR errors" option, like the one in Subtitle Edit, to clean up badly OCR'd text?

Get dotnet error during execution

Hi!

I am seeking your help today because I see the following error when executing:

dotnet PgsToSrt.dll --input /video/input/Sieben.mkv --track 3 --output /video/input/test.srt
PgsToSrt 1.1.0.0

2020/06/15 10:09:20.485|INFO|Detected tesseract language data for language 'deu'.
2020/06/15 10:09:21.191|INFO|Starting OCR for 1729 items...
2020/06/15 10:09:21.244|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at Tncl.NativeLoader.NativeInstance.CreateInstance[T](NativeLoader loader)
at Tesseract.Interop.TessApi.Initialize(NativeLoader loader) in /root/PgsToSrt/Tesseract/Interop/BaseApi.cs:line 355
at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables) in /root/PgsToSrt/Tesseract/TesseractEngine.cs:line 66
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.

My system details are:

[root@nvidia out]# tesseract --version
tesseract 3.04.00
leptonica-1.72
libgif 4.1.6(?) : libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7 : libwebp 0.3.0

[root@nvidia out]# uname -a
Linux nvidia.home 3.10.0-1127.10.1.el7.x86_64 #1 SMP Wed Jun 3 14:28:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@nvidia out]# dotnet --version
2.1.807

Do you have any idea about the root cause?
Am I missing something?

BR

Model used for trained data

It doesn't seem like the choice of Tesseract trained data set (i.e. 'fast' vs 'best') is configurable, and as far as I can tell, you are using 'fast'. Is that the case?

There may also be corruption somewhere in the trained data you have (at least, for eng) as I just noticed totally nonsensical series of characters in the conversion of a single basic word when it is multi-line. Something like...

The brown fox jumps over the lazy
qj2]a%sLo1

Error during execution

Hi - Trying to use your script and got the following error:
PgsToSrt 1.0.0.0

2019/11/26 19:57:26.699|INFO|Detected tesseract language data for language 'spa'.
2019/11/26 19:57:26.783|INFO|Detected tesseract language data for language 'eng'.
2019/11/26 19:57:27.011|INFO|Starting OCR for 606 items...
2019/11/26 19:57:27.114|ERROR|Error: Exception has been thrown by the target of an invocation. Exception has been thrown by the target of an invocation.

Not sure how to generate more debug info. Any assistance is appreciated. Thanks.

Linux Docker build fails due to .NET 6

The Dockerfile in the current master branch uses a .NET SDK 5.0 base image, which can't build projects that target .NET 6.

Relevant output for docker build -t pgstosrt .

Step 4/9 : RUN cd /src &&     dotnet restore  &&     dotnet publish -c Release -o /src/PgsToSrt/out &&     mv /src/entrypoint.sh /entrypoint.sh && chmod +x /entrypoint.sh &&     mv /src/PgsToSrt/out /app
 ---> Running in 94e0c48afbe5
  Determining projects to restore...
/usr/share/dotnet/sdk/5.0.101/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.TargetFrameworkInference.targets(141,5): error NETSDK1045: The current .NET SDK does not support targeting .NET Core 6.0.  Either target .NET Core 5.0 or lower, or use a version of the .NET SDK that supports .NET Core 6.0. [/src/PgsToSrt/PgsToSrt.csproj]
The command '/bin/sh -c cd /src &&     dotnet restore  &&     dotnet publish -c Release -o /src/PgsToSrt/out &&     mv /src/entrypoint.sh /entrypoint.sh && chmod +x /entrypoint.sh &&     mv /src/PgsToSrt/out /app' returned a non-zero code: 1

Steps to reproduce:

git clone https://github.com/Tentacule/PgsToSrt.git
cd PgsToSrt

# checkout the latest release v1.4.2 or master at 38fd03e57f
git checkout v1.4.2

docker build -t pgstosrt .

Quick fix:

Change the .NET SDK base image to 6.0.
Specify net6.0 as the target framework (the project lists both 5.0 and 6.0 as potential targets, so one must be chosen explicitly).

Alternatively, leave the .NET SDK base image at 5.0.101.
Specify net5.0 as the target framework.
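Choosing the framework explicitly can be done with the `-f` (`--framework`) option of `dotnet publish`. The sketch below mirrors the publish step from the failing RUN line with that option added; `publish_for` is a hypothetical wrapper, and the Dockerfile's base image line is not shown in this issue, so only the publish command is illustrated.

```shell
# Publish against one explicit target framework.
# Pass net6.0 when building with a .NET 6 SDK, net5.0 with the 5.0 SDK.
publish_for() {
    framework="$1"
    outdir="${2:-/src/PgsToSrt/out}"   # same output path as the original RUN step
    dotnet publish -c Release -f "$framework" -o "$outdir"
}
```

Inside the Dockerfile this would correspond to replacing `dotnet publish -c Release -o /src/PgsToSrt/out` with the same command plus `-f net6.0` (or `-f net5.0`).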

Error using docker

Hi,

I want to convert .sup (PGS) to .srt with your library using Docker:

docker run -it -v /share/CACHEDEV1_DATA/Multimedia/Movies/Test:/data -e INPUT=/data/Mission.Impossible.Fallout.2018.MULTi.TRUEFRENCH.2160p.UHD.BluRay.REMUX.DV.HEVC-BEO.6.en.sup -e LANGUAGE=eng tentacule/pgstosrt

But I get this error:

2020/12/11 11:36:28.418|ERROR|Error: Exception has been thrown by the target of an invocation. at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at System.Activator.CreateInstance(Type type, Object[] args)
at Tncl.NativeLoader.NativeInstance.CreateInstance(NativeLoader loader, Type interfaceType)
at PgsToSrt.TesseractApi.Initialize()
at PgsOcr.DoOcr() Exception has been thrown by the target of an invocation.

I have a QNAP NAS.

Thanks in advance for your help.

Erwan

Accept other names for leptonica

On Arch Linux, the required library liblept.so is actually called libleptonica.so. I had to create the following symlink to make PgsToSrt load it: ln -s /usr/lib/libleptonica.so.6 /usr/lib/liblept.so.5. It would be good if the program could look for the name libleptonica.so by itself.
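The manual workaround can be wrapped in a small helper that does not hard-code the leptonica version suffix. This is a sketch only: `link_leptonica` is a hypothetical name, the alias `liblept.so.5` is taken from the symlink above, and running it against /usr/lib requires root.

```shell
# Create a liblept.so.5 alias for the first libleptonica.so.* found in a directory.
link_leptonica() {
    libdir="${1:-/usr/lib}"
    for lib in "$libdir"/libleptonica.so.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$lib" ] || continue
        # -s: symbolic link, -f: replace a stale link of the same name
        ln -sf "$lib" "$libdir/liblept.so.5"
        return 0
    done
    echo "no libleptonica.so.* found in $libdir" >&2
    return 1
}
```

Calling `link_leptonica` with no argument targets /usr/lib, matching the paths in the report.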
