ucy-linc-lab / fogify
A Fog Computing Emulation Framework
License: Apache License 2.0
I'm trying to set CPU and RAM restrictions for my docker containers but they don't seem to really apply. As far as I can tell, fogify only checks the sum of resources against the host system on startup, instead of actually enforcing them. There are multiple issues with this.
This is also connected to #3 I guess.
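One way to check whether the limits reach the container runtime at all is to read them back from Docker. The following is only a minimal sketch, assuming the Docker SDK for Python (the docker package) is installed and that the limits show up under HostConfig.Memory, HostConfig.NanoCpus or HostConfig.CpuQuota/CpuPeriod in the inspect output. The helper names are hypothetical and not part of Fogify.

```python
def limits_from_attrs(attrs):
    """Extract (memory_bytes, cpus) from a `docker inspect`-style dict.

    Docker reports 0 for a limit that is not set, so 0 becomes None here.
    """
    host_config = attrs.get("HostConfig", {})
    memory = host_config.get("Memory") or None
    nano_cpus = host_config.get("NanoCpus") or 0
    cpu_quota = host_config.get("CpuQuota") or 0
    cpu_period = host_config.get("CpuPeriod") or 0
    if nano_cpus:
        cpus = nano_cpus / 1e9
    elif cpu_quota > 0 and cpu_period > 0:
        cpus = cpu_quota / cpu_period
    else:
        cpus = None
    return memory, cpus


def report_running_containers():
    """Print the limits of every running container (needs a Docker daemon)."""
    import docker  # imported lazily so the helper above works without it

    client = docker.from_env()
    for container in client.containers.list():
        memory, cpus = limits_from_attrs(container.attrs)
        print(container.name, "memory:", memory, "cpus:", cpus)
```

If report_running_containers() prints None for a container that has limits in the topology file, the limits were probably never handed to Docker for that container.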
Hi,
This is a really great project!
I'm trying to run a Kubernetes cluster in Fogify. For this purpose, I've set up a docker-compose file and shell scripts (see here) that initialize a Kubernetes cluster using the Docker images and bootstrapping procedure from the kind project. Running the cluster with docker-compose works, but with Fogify, the containers keep crashing, because they are not executed in privileged mode.
Steps to reproduce the problem:
docker ps --all
control-plane container. They will contain the following:
WARN: /dev/kmsg does not exist, nor does /dev/console!
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
mount: /sys: permission denied.
Using a docker-compose file with just the kind base image should allow reproducing the problem as well. When using this option, it is important that the container is configured with the following options:
volumes:
  # Ensures that pods, logs etc. are not on the container filesystem.
  - "/var"
  # Some K8s things want to read /lib/modules.
  - "/lib/modules:/lib/modules:ro"
tmpfs:
  - "/tmp" # various things depend on working /tmp
  - "/run" # systemd wants a writable /run
privileged: true
security_opt:
  - "seccomp=unconfined"
  - "apparmor=unconfined"
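To confirm which of these options a container actually ended up with, the inspect data can be compared against the options above. This is a rough sketch, assuming the Docker SDK for Python is installed and that the runtime reports the flags under HostConfig.Privileged and HostConfig.SecurityOpt. I have not verified how Fogify passes these options on, and the function names are made up for illustration.

```python
def privilege_summary(attrs):
    """Return the privilege-related settings from a `docker inspect`-style dict."""
    host_config = attrs.get("HostConfig", {})
    return {
        "privileged": bool(host_config.get("Privileged", False)),
        "security_opt": list(host_config.get("SecurityOpt") or []),
    }


def missing_options(attrs, wanted_security_opt):
    """List the requested options that the container does not have."""
    summary = privilege_summary(attrs)
    missing = []
    if not summary["privileged"]:
        missing.append("privileged")
    for opt in wanted_security_opt:
        if opt not in summary["security_opt"]:
            missing.append(opt)
    return missing


def check_container(name, wanted_security_opt):
    """Inspect one container by name or ID (needs a Docker daemon)."""
    import docker  # imported lazily so the helpers above work without it

    container = docker.from_env().containers.get(name)
    return missing_options(container.attrs, wanted_security_opt)
```

For the kind node image, check_container("<container name>", ["seccomp=unconfined", "apparmor=unconfined"]) should return an empty list when started through docker-compose. A non-empty list for the same service started through Fogify would point at the options being dropped on the way.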
[ui 4/4] RUN pip install /FofigySDK && fix-permissions /opt/conda && fix-permissions /home/jovyan:
11.03 Processing /FofigySDK
11.05 Preparing metadata (setup.py): started
12.48 Preparing metadata (setup.py): finished with status 'done'
12.49 Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (from FogifySDK==0.0.3) (2.1.1)
12.49 Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from FogifySDK==0.0.3) (2.31.0)
12.49 Requirement already satisfied: pyyaml in /opt/conda/lib/python3.11/site-packages (from FogifySDK==0.0.3) (6.0.1)
12.49 Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (from FogifySDK==0.0.3) (3.8.0)
12.51 Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (1.1.1)
12.51 Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (0.12.1)
12.52 Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (4.43.1)
12.52 Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (1.4.5)
12.53 Requirement already satisfied: numpy<2,>=1.21 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (1.24.4)
12.53 Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (23.2)
12.53 Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (10.1.0)
12.53 Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (3.1.1)
12.54 Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.11/site-packages (from matplotlib->FogifySDK==0.0.3) (2.8.2)
12.64 Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas->FogifySDK==0.0.3) (2023.3.post1)
12.64 Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.11/site-packages (from pandas->FogifySDK==0.0.3) (2023.3)
12.66 Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->FogifySDK==0.0.3) (3.3.0)
12.66 Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->FogifySDK==0.0.3) (3.4)
12.66 Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->FogifySDK==0.0.3) (2.0.7)
12.66 Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->FogifySDK==0.0.3) (2023.7.22)
12.80 Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib->FogifySDK==0.0.3) (1.16.0)
12.83 Building wheels for collected packages: FogifySDK
12.83 Building wheel for FogifySDK (setup.py): started
14.53 Building wheel for FogifySDK (setup.py): finished with status 'error'
14.55 error: subprocess-exited-with-error
14.55
14.55 × python setup.py bdist_wheel did not run successfully.
14.55 │ exit code: 1
14.55 ╰─> [5 lines of output]
14.55 running bdist_wheel
14.55 running build
14.55 running build_py
14.55 creating build
14.55 error: could not create 'build': Permission denied
14.55 [end of output]
14.55
14.55 note: This error originates from a subprocess, and is likely not a problem with pip.
14.55 ERROR: Failed building wheel for FogifySDK
14.55 Running setup.py clean for FogifySDK
14.90 Failed to build FogifySDK
14.90 ERROR: Could not build wheels for FogifySDK, which is required to install pyproject.toml-based projects
failed to solve: process "/bin/bash -o pipefail -c pip install /FofigySDK && fix-permissions $CONDA_DIR && fix-permissions /home/$NB_USER" did not complete successfully: exit code: 1
Hi,
I've been having trouble getting fogify to properly deploy docker containers. I keep getting a constraints error based on placement. If I run a basic deployment with 3 nodes, I observe the following after deployment
> sudo docker stack ps fogify
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ythsey7iev4oaz2xjibsbta80 fogify_node-1.1 taxi-exp:0.0.1 Running Pending 33 seconds ago "no suitable node (scheduling constraints not satisfied on 1 node)"
phz53b4hxunbn10g1kc5dwot7 fogify_node-2.1 taxi-exp:0.0.1 Running Pending 34 seconds ago "no suitable node (scheduling constraints not satisfied on 1 node)"
um5iw455ixsfebs94wi8wf0nt fogify_node-3.1 taxi-exp:0.0.1 Running Pending 34 seconds ago "no suitable node (scheduling constraints not satisfied on 1 node)"
In order to fix it, I removed the placement constraints specified in DockerBasedConnectors.node_representation. Specifically, I removed the ['placement'] constraints and the if statements about the main_cluster_node. I don't know if this is the proper fix, so I'm happy to hear alternatives.
Thanks,
Tom Ebergen
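Before removing the constraints, it may help to see which one the node fails. The sketch below evaluates Swarm-style constraints of the form key==value or key!=value against the data that docker node inspect returns. It only covers node.role, node.hostname, node.id and node.labels.*, and the label used in the usage note is hypothetical; I don't know which constraints Fogify actually generates.

```python
def node_value(node_attrs, key):
    """Look up the value a constraint key refers to in `docker node inspect` data."""
    if key == "node.role":
        return node_attrs.get("Spec", {}).get("Role")
    if key == "node.hostname":
        return node_attrs.get("Description", {}).get("Hostname")
    if key == "node.id":
        return node_attrs.get("ID")
    if key.startswith("node.labels."):
        labels = node_attrs.get("Spec", {}).get("Labels") or {}
        return labels.get(key[len("node.labels."):])
    return None


def unmet_constraints(node_attrs, constraints):
    """Return the constraints that this node does not satisfy."""
    unmet = []
    for constraint in constraints:
        if "!=" in constraint:
            key, expected = (part.strip() for part in constraint.split("!=", 1))
            satisfied = node_value(node_attrs, key) != expected
        else:
            key, expected = (part.strip() for part in constraint.split("==", 1))
            satisfied = node_value(node_attrs, key) == expected
        if not satisfied:
            unmet.append(constraint)
    return unmet
```

With the Docker SDK for Python, the node data can be obtained from docker.from_env().nodes.list() and each node's attrs passed in, for example unmet_constraints(node.attrs, ["node.labels.example_label==True"]).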
While trying to enforce bandwidth/latency limits, I'm not sure how the settings finally apply. It certainly seems that latency plays a big role with respect to bandwidth as well. Latency, by itself, is however mostly OK. The problem is the effect it has on bandwidth. I'm running the following setups with 4 containers, deployed as two server/client pairs. These are only some examples that show this problem, I've run a lot more similar setups. I'm using iperf3 to calculate the actual bandwidth between containers.
bandwidth: 100Mbps (bidirectional) and delay: 3ms for all 4 containers. Ping results between containers are indeed ~3ms, but bandwidth is actually ~750Mbits/sec
bandwidth: 100Mbps (bidirectional) and delay: 30ms for all 4 containers. Ping results between containers are indeed ~30ms, but bandwidth is actually ~710Mbits/sec
bandwidth: 100Mbps (bidirectional) and delay: 60ms for all 4 containers. Ping results between containers are indeed ~3ms, but bandwidth is actually ~380 Mbits/sec
bandwidth: 1000Mbps (bidirectional) and delay: 3ms for all 4 containers. Ping results between containers are indeed ~3ms, but bandwidth is actually ~7.10 Gbits/sec
bandwidth: 1000Mbps (bidirectional) and delay: 30ms for all 4 containers. Ping results between containers are indeed ~30ms, but bandwidth is actually ~790Mbits/sec
bandwidth: 10000Mbps (bidirectional) and delay: 3ms for all 4 containers. Ping results between containers are indeed ~3ms, but bandwidth is actually ~7.10 Gbits/sec
bandwidth: 10000Mbps (bidirectional) and delay: 30ms for all 4 containers. Ping results between containers are indeed ~30ms, but bandwidth is actually ~780Mbits/sec
bandwidth: 10000Mbps (bidirectional) and delay: 60ms for all 4 containers. Ping results between containers are indeed ~3ms, but bandwidth is actually ~380 Mbits/sec
If you want to try similar setups, you can look into the start-network-test.sh script in this repo: https://github.com/Datalab-AUTH/fogify-db-benchmarks
I should note that on my system, connecting 2 docker containers with no traffic shaping whatsoever (and without fogify) results in actual bandwidth measurements of ~27Gbits/sec, so there is no bottleneck on the host system.
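One possible reading of these numbers: the measured throughput with 30ms and 60ms delay is roughly what a single TCP stream achieves when it is limited by a fixed amount of data in flight (throughput is about window / round-trip time), which would mean the configured bandwidth cap is not what limits the transfer. The sketch below just does that arithmetic on the figures reported above. It assumes the round-trip time equals the configured delay, including for the 60ms setups where the ping figure is listed as ~3ms.

```python
# (round-trip time in ms, measured iperf3 throughput in Mbit/s) from the
# setups above. Assumption: the round-trip time equals the configured delay.
MEASUREMENTS = [
    (30, 710),
    (30, 790),
    (30, 780),
    (60, 380),
]


def implied_window_bytes(rtt_ms, throughput_mbit_s):
    """Bytes that must be in flight to sustain the throughput at this RTT."""
    bits_in_flight = throughput_mbit_s * 1_000_000 * rtt_ms // 1000
    return bits_in_flight // 8


for rtt_ms, rate in MEASUREMENTS:
    window = implied_window_bytes(rtt_ms, rate)
    print(f"{rtt_ms} ms at {rate} Mbit/s -> about {window / 1e6:.2f} MB in flight")
```

All four cases come out at a similar value of a little under 3 MB, regardless of whether 100Mbps, 1000Mbps or 10000Mbps was configured. That would be consistent with the delay being applied and the bandwidth limit not, but it is only an inference from the measurements, not something I have checked in the traffic shaping rules.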
When running deploy() I get a message that everything has been deployed, but that may not be true, at least not yet.
This is an issue that is not the same as described in #3. It may occur, even if (eventually) all containers can be started.
The problem is that, especially with a large number of containers to be started, the process of actually starting them is not instant. In fact, it might take several minutes until all containers are up and running.
In many cases, I can run deploy(), which returns with a "deployed" message, but if I then run fogify.info(), I may even get a blank response, or more usually not see all containers up yet. Simply by waiting a few minutes and running fogify.info() again, I eventually get the correct deployment info with all containers up.
This problem is not apparent when running step by step in the Python REPL or in Jupyter, or with a small number of containers, but it becomes apparent when trying to automate things with a larger number of containers.
I had to add a lot of time.sleep() statements in my code to actually have everything up as it should (like wait 20 seconds for each additional container), but I think the best way to handle this is for deploy() to return with a successful message only after all specified containers are actually up. I guess a timeout parameter can be added after which we can assume that deployment has failed and deploy() returns a failing message, while at the same time undeploying all containers that happened to already be up.
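Until something like this exists in deploy() itself, the fixed sleeps can be replaced by polling with a timeout. This is a generic sketch, not Fogify code: the predicate is whatever check tells you the deployment is complete, and the sleep and clock parameters exist only so the helper can be tested without waiting.

```python
import time


def wait_until(predicate, timeout_s=300.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_s seconds have passed.

    Returns True if the predicate succeeded, False if the timeout expired.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A call could look like wait_until(lambda: count_running() >= expected, timeout_s=600), where count_running is a hypothetical function that counts the running containers, for example from the output of fogify.info() or from Docker directly. If it returns False, the caller can run undeploy() and report the failure.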
Hi,
I tried running a fresh installation of the fogify framework, but ran into issues with certain Python packages no longer being supported. For instance, after running
sudo docker-compose build
sudo docker-compose -p fogemulator up
I would see the following error messages
controller_1 | Traceback (most recent call last):
controller_1 | File "/code/fogify/main.py", line 2, in <module>
controller_1 | from agent.agent import Agent
controller_1 | File "/code/fogify/agent/agent.py", line 3, in <module>
controller_1 | from connectors import get_connector
controller_1 | File "/code/fogify/connectors/__init__.py", line 2, in <module>
controller_1 | from . import materialized_connectors
controller_1 | File "/code/fogify/connectors/materialized_connectors/__init__.py", line 1, in <module>
controller_1 | from .DockerBasedConnectors import SwarmConnector, DockerComposeConnector
controller_1 | File "/code/fogify/connectors/materialized_connectors/DockerBasedConnectors.py", line 9, in <module>
controller_1 | from flask_api import exceptions
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask_api/__init__.py", line 1, in <module>
controller_1 | from flask_api.app import FlaskAPI
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask_api/app.py", line 4, in <module>
controller_1 | from flask._compat import reraise, string_types, text_type
controller_1 | ModuleNotFoundError: No module named 'flask._compat'
...
agent_1 | Traceback (most recent call last):
agent_1 | File "/code/fogify/main.py", line 2, in <module>
agent_1 | from agent.agent import Agent
agent_1 | File "/code/fogify/agent/agent.py", line 3, in <module>
agent_1 | from connectors import get_connector
agent_1 | File "/code/fogify/connectors/__init__.py", line 2, in <module>
agent_1 | from . import materialized_connectors
agent_1 | File "/code/fogify/connectors/materialized_connectors/__init__.py", line 1, in <module>
agent_1 | from .DockerBasedConnectors import SwarmConnector, DockerComposeConnector
agent_1 | File "/code/fogify/connectors/materialized_connectors/DockerBasedConnectors.py", line 9, in <module>
agent_1 | from flask_api import exceptions
agent_1 | File "/usr/local/lib/python3.7/site-packages/flask_api/__init__.py", line 1, in <module>
agent_1 | from flask_api.app import FlaskAPI
agent_1 | File "/usr/local/lib/python3.7/site-packages/flask_api/app.py", line 4, in <module>
agent_1 | from flask._compat import reraise, string_types, text_type
agent_1 | ModuleNotFoundError: No module named 'flask._compat'
To fix this, I pinned exact versions in the requirements.txt file, so that it looks like this:
Flask==1.1.2
Werkzeug==1.0.1
Flask-API==2.0
requests==2.25.1
docker==5.0.0
Flask-SQLAlchemy==2.5.1
pyyaml==5.4.1
python-dateutil==2.8.1
psutil==5.8.0
py-cpuinfo==8.0.0
netifaces==0.11.0
nsenter==0.2
uWSGI==2.0.19.1
Let me know if you have any questions.
--Tom Ebergen
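To check whether an existing image really has the pinned versions installed, the pins can be compared with what is installed in the environment. A minimal sketch, assuming Python 3.8 or newer for importlib.metadata (the tracebacks above show Python 3.7, where the importlib_metadata backport provides the same functions). It only understands name==version lines, and the function names are made up for this example.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Map package name to pinned version for every `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, get_version=installed_version):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = get_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Running find_mismatches(parse_pins(open("requirements.txt").read())) inside the controller or agent container should return an empty dict when the image was built from the pinned file.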
I'm trying to run command(), but no matter what I do it doesn't seem to work. I get an OK message, but the command I specified never really runs on any of the containers. For example, when I run a touch /testfile command, I get an OK response, but the file never appears in any of the running containers.
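For anyone trying to reproduce this, the check can be automated by asking Docker directly whether the file exists in each container. The sketch assumes the Docker SDK for Python, whose container objects provide exec_run() and a name attribute. The helper names are hypothetical and the code goes around Fogify entirely.

```python
def file_exists_in_container(container, path):
    """Run `test -e path` inside the container; exit code 0 means it exists."""
    result = container.exec_run(["test", "-e", path])
    return result.exit_code == 0


def containers_missing_file(containers, path):
    """Return the names of the containers in which path does not exist."""
    return [c.name for c in containers if not file_exists_in_container(c, path)]
```

With a Docker daemon available, containers_missing_file(docker.from_env().containers.list(), "/testfile") lists the containers where the touch command had no effect. It requires the test utility to be present in the image.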
Here is an example scenario: on a host with 16GB RAM, I'm trying to run 20 containers with 1GB RAM allocated each.
When running fogify.deploy() I get an OK message saying that everything is deployed. But that is not true. In fact, at least 4 of my containers are never started, and the choice seems to be random. It appears that fogify checks the sum of resources needed against the host system and does not start any containers that exceed the host system resources (but not the actual available host resources, i.e. it checks against the total of 16GB, but not against the 8GB that may possibly be available at that point; I guess that's another issue?).
I'm not sure if not starting the containers at all is the right course of action here. After all, I am able to start as many (or more) such containers in my system without fogify, or even by using fogify and setting lower RAM limits. But my main issue is that it all happens silently and the user is left to believe that everything went well and all containers are up.
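The numbers in this scenario fit a simple sum check against total host memory. The sketch below is only my guess at the arithmetic, not Fogify's actual code: it computes how many containers fit and how many are left over, which is the information I would expect deploy() to report.

```python
GIB = 1024 ** 3


def split_by_capacity(host_total_bytes, per_container_bytes, requested):
    """Return (containers that fit, containers left over) for a sum check."""
    fit = min(requested, host_total_bytes // per_container_bytes)
    return fit, requested - fit


# The scenario above: 16GB host, 20 containers with 1GB each.
fit, left_over = split_by_capacity(16 * GIB, 1 * GIB, 20)
if left_over:
    print(f"warning: only {fit} of {fit + left_over} containers fit in host memory")
```

A warning of this kind, or a failing return value, would already remove the main problem described here, which is that the shortfall is silent.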
In a scenario such as the one described in #3, not all specified containers are started. However, if I run undeploy(), it reports that all containers have been undeployed (for example 20/20, while only 16/20 were actually deployed in the first place). This misrepresents what actually happened.
Hi!
I am trying to deploy fogify to run an experiment, and I'm trying to run it through docker compose rather than on bare metal.
However, even though I set the variables in a .env file and pass that to the containers, the IP of the host is not used to run the agent and controller.
Is that expected behavior, or does it mean I missed some setting for the deployment?
Thank you!
Namespace(agent=True, agent_ip='192.168.1.102', controller=False)
Running on http://0.0.0.0:5500/ (Press CTRL+C to quit)
Hi,
I have been working on getting Fogify to work on an Ubuntu virtual image running in Vagrant. When attempting to deploy a topology, I was getting the following error from the controller:
controller_1 | During handling of the above exception, another exception occurred:
controller_1 |
controller_1 | Traceback (most recent call last):
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
controller_1 | rv = self.dispatch_request()
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
controller_1 | return self.view_functions[rule.endpoint](**req.view_args)
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask/views.py", line 89, in view
controller_1 | return self.dispatch_request(*args, **kwargs)
controller_1 | File "/usr/local/lib/python3.7/site-packages/flask/views.py", line 163, in dispatch_request
controller_1 | return meth(*args, **kwargs)
controller_1 | File "/code/fogify/controller/views.py", line 277, in post
controller_1 | res.append(Communicator(get_connector()).agents__perform_action(commands, instance_type=instance_type))
controller_1 | File "/code/fogify/utils/inter_communication.py", line 50, in agents__perform_action
controller_1 | File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 119, in post
controller_1 | return request('post', url, data=data, json=json, **kwargs)
controller_1 | File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
controller_1 | return session.request(method=method, url=url, **kwargs)
controller_1 | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
controller_1 | resp = self.send(prep, **send_kwargs)
controller_1 | File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
controller_1 | r = adapter.send(request, **kwargs)
controller_1 | File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
controller_1 | raise ConnectionError(e, request=request)
controller_1 | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=5500): Max retries exceeded with url: /actions/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f51bc61f910>: Failed to establish a new connection: [Errno 111] Connection refused'))
I tracked it down to the utils/Communicator class. It looks like agents__perform_action defaults to 0.0.0.0 for the address of the agents if everything is hosted on one machine. I think this should be changed to agent so that the internal docker DNS can resolve the URL properly. In Vagrant, 0.0.0.0 doesn't work, as the virtual machine adopts a different host IP (not 100% sure how it works). Anyway, my fix was to check if socket.gethostbyname(i) returns 0.0.0.0. If so, the host name is changed to agent.
Let me know if there are any questions. This issue is probably more confusing than the others.
--Tom Ebergen
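The workaround described above can be sketched roughly as follows. This is not the actual patch: the function name and the resolver parameter are made up for illustration, and "agent" is assumed to be the name under which Docker's internal DNS resolves the agent container.

```python
import socket


def resolve_agent_host(name, fallback="agent", resolver=socket.gethostbyname):
    """Return fallback when name resolves to 0.0.0.0, otherwise name unchanged."""
    if resolver(name) == "0.0.0.0":
        return fallback
    return name
```

The returned host name would then be used to build the agent URL on port 5500, instead of the address that resolved to 0.0.0.0.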