Comments (9)
As a workaround, you can run `sudo start ecs` on the instance to get it to progress out of the runnable state.
from raster-vision.
Tagging @azavea/operations in case it is of interest.
Hm. It is possible that the root cause is buried in another log (possibly the Docker agent log given this agent's inability to connect).
There has also been a flurry of updates to the ECS agent. Are we on a current version (one that supports Docker 17)?
   __|  __|  __|
   _|  (   \__ \   Amazon ECS-Optimized Amazon Linux AMI 2017.03.g
 ____|\___|____/
[ec2-user@ip-172-31-38-99 ~]$ docker --version
Docker version 1.12.6, build 7392c3b/1.12.6
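The version question raised above can be checked mechanically. The following is a minimal sketch, not part of the original thread: it parses the `docker --version` line reported by the instance and compares it against Docker 17. The helper name `parse_docker_version` is hypothetical, and the comparison assumes that "supports Docker 17" means a major version of 17 or higher.

```python
import re


def parse_docker_version(line):
    """Return (major, minor, patch) parsed from a `docker --version` line."""
    match = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized version line: {line!r}")
    return tuple(int(part) for part in match.groups())


# Output copied from the instance in the comment above.
reported = "Docker version 1.12.6, build 7392c3b/1.12.6"
version = parse_docker_version(reported)

# Docker moved to year-based version numbers with 17.03, so a major
# version below 17 predates that scheme.
is_docker_17_or_newer = version >= (17, 0, 0)
print(version, is_docker_17_or_newer)
```

For the output shown in this thread, the check reports that the pre-baked AMI is still on a pre-17 Docker release.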
This is one of the problems of running off of a pre-baked AMI instead of the official ECS AMI with cloud-init installing everything else. If we could do the latter, we could simply bump the ECS AMI ID to the latest version; what we should do now is re-bake our custom AMI off of the latest ECS AMI.
We can probably take a stab at putting something more reproducible together if you can point us to the current steps.
The AMI we are using for Batch instances was generated as follows:
- Start an instance with a 128 GB root volume and 22 GB data volume using the ECS-optimized AMI with id ami-275ffe1 (see http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI_launch_latest.html). There needs to be a 22 GB data volume or Docker will not work for reasons I don't understand. Having the root volume be 128 GB seems wasteful, but it was the easiest way of increasing the disk space, since Batch doesn't provide a way of explicitly setting the disk space for a compute environment.
- Install NVIDIA drivers 375.51 using instructions in http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html
- Run `sudo stop ecs` and then `sudo rm /var/lib/ecs/data/ecs_agent_data.json`
- Create the AMI
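As a starting point for making the steps above more reproducible, here is a rough sketch, under stated assumptions, of the launch parameters expressed as data. The function name `build_launch_request`, the `p2.xlarge` instance type, the `gp2` volume type, and the device names `/dev/xvda` (root) and `/dev/xvdcz` (Docker data volume) are assumptions and do not come from the thread. The device names follow what the ECS-optimized Amazon Linux AMI documentation of that period describes, which may also be why Docker expects the separate 22 GB volume. The resulting dictionary is shaped like the arguments to boto3's `run_instances`; no AWS call is made here.

```python
def build_launch_request(ami_id, instance_type, root_gb=128, data_gb=22):
    """Build keyword arguments shaped like an EC2 RunInstances request."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            # Root volume, enlarged because Batch offers no disk-size setting.
            {"DeviceName": "/dev/xvda",
             "Ebs": {"VolumeSize": root_gb, "VolumeType": "gp2"}},
            # Data volume for Docker (device name is an assumption).
            {"DeviceName": "/dev/xvdcz",
             "Ebs": {"VolumeSize": data_gb, "VolumeType": "gp2"}},
        ],
    }


# Commands from the steps above, to run on the instance before imaging
# so the new AMI does not carry the old agent state.
PRE_IMAGE_COMMANDS = [
    "sudo stop ecs",
    "sudo rm /var/lib/ecs/data/ecs_agent_data.json",
]

# AMI ID as written in the thread; the instance type is a placeholder.
request = build_launch_request("ami-275ffe1", "p2.xlarge")

# With credentials configured, this could be passed to boto3 (not run here):
#   boto3.client("ec2").run_instances(**request)
print(request["BlockDeviceMappings"])
```

Keeping the volume sizes as parameters means a re-bake off a newer ECS AMI only needs a new `ami_id`; the NVIDIA driver installation would still have to be scripted separately.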
The above process was based on the recommendations in http://docs.aws.amazon.com/batch/latest/userguide/batch-gpu-ami.html, although we are not using their recipe for exposing the GPU to the container. Instead, we are using the recipe in https://blog.cloudsight.ai/deep-learning-image-recognition-using-gpus-in-amazon-ecs-docker-containers-5bdb1956f30e. We might want to go with the officially recommended way.
I created #74 to capture the task. I agree that the recommended way makes sense to chase. Please comment in that issue's thread with any additional tweaks we may need.