Comments (8)
I've got one working here: https://hub.docker.com/r/dibz15/marker_docker
It works, but right now it re-downloads the necessary resources on each run. If someone figures out how to get those to cache, that'd be great! Edit: never mind, the HF models are now cached in the image!
from marker.
@agarwalshashank95 You can build on top of their image, e.g. with a Dockerfile like this:
FROM dibz15/marker_docker:latest
RUN pip install ray
RUN pip uninstall -y torch torchvision torchaudio
RUN pip3 install torch torchvision
COPY local.env /usr/src/app/marker/marker/local.env
RUN mkdir /.cache && chmod -R 777 /.cache
with a local.env file in the same directory as the Dockerfile, containing
TORCH_DEVICE="cuda"
and, after building the image as marker:latest, run it with
USER_ID=$(id -u)
GROUP_ID=$(id -g)
docker run --shm-size=10.24gb --gpus all -v "$PDF_DIR_SANITIZED":/pdfs --user $USER_ID:$GROUP_ID marker:latest python convert.py /pdfs/ /pdfs/
That said, it would be great if there were a repo-managed Dockerfile that we could all reference ...
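The Dockerfile and run command above can be tied together in one small script. This is only a sketch under stated assumptions: the helper names `run` and `convert_pdfs` and the `DRY_RUN` switch are made up for illustration, the Dockerfile and local.env are assumed to sit in the current directory, and the image tag defaults to the `marker:latest` used in the run command. The docker flags themselves are the ones from the comment above.

```shell
#!/usr/bin/env bash
# Sketch: build the derived image, then convert a folder of PDFs with it.
# Hypothetical helpers; set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: show the command that would have been executed.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

convert_pdfs() {
  local pdf_dir="$1"
  local image="${2:-marker:latest}"   # assumed tag, matching the run command above

  if [ ! -d "$pdf_dir" ]; then
    echo "not a directory: $pdf_dir" >&2
    return 1
  fi

  # Bind mounts need an absolute host path.
  pdf_dir="$(cd "$pdf_dir" && pwd)"

  # Build from the Dockerfile in the current directory (local.env must be next to it).
  run docker build -t "$image" . || return 1

  # Run as the calling user so the output files are not owned by root.
  run docker run --shm-size=10.24gb --gpus all \
    -v "$pdf_dir":/pdfs \
    --user "$(id -u):$(id -g)" \
    "$image" python convert.py /pdfs/ /pdfs/
}
```

Calling `DRY_RUN=1 convert_pdfs ./pdfs` first is a cheap way to check the assembled commands before anything is built or run on the GPU.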
Here's the repo where I host the Dockerfile. I had forgotten to make it public.
@Dibz15 Would it be possible to share the Dockerfile so it can be built locally? The multi-file conversion script, convert.py, doesn't seem to work, probably because of a missing dependency.
I started a repo here that uses @Dibz15's Docker image to generate Markdown.
@robinsonkwame Thanks a ton! I didn't realize I could use the existing Docker image and build on top of it. This would work perfectly for my use case.
But yes, I agree: there should be an official Docker image that we can all refer to.
Hey, sorry, I lost track of this. I didn't plan to run mine on a system with CUDA support, so I didn't even think about that. Looks like it's been taken care of, though.
How do I add FastAPI to this app?