
nm-exp-active-netrics's Issues

iperf3 parse error

     --- NETWORK MEASUREMENTS ---
    
     --- iperf Bandwidth and Jitter ---
    upload bandwidth: 42.2 Mb/s
    upload jitter: 0.407 ms
    Traceback (most recent call last):
      File "./src/netrics.py", line 291, in <module>
        output['iperf'] = test.iperf3_bandwidth(client=server, port=port)
      File "/usr/local/src/nm-exp-active-netrics/src/netrics/netson.py", line 482, in iperf3_bandwidth
        measured_bw[direction] = iperf_res.split()[6]
    IndexError: list index out of range

This is probably caused by the fragile awk NR filtering at https://github.com/chicago-cdac/nm-exp-active-netrics/blob/main/src/netrics/netson.py#L476

E.g., a line like the one below may or may not appear in the output:
[SUM] 0.0-10.0 sec 31 datagrams received out-of-order

@kyle-macmillan I'd need confirmation of the precise information you're looking for in the output. Can we filter it via grep for sender/receiver? Thanks, G.
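
A possible way to avoid the fragile positional parsing altogether is to ask iperf3 for JSON output (the -J/--json flag) and read the summary fields by name. A minimal sketch, assuming a UDP test and that the summary keys below (end.sum.bits_per_second, end.sum.jitter_ms) match the iperf3 version on the boxes:

    import json
    import subprocess

    def iperf3_udp_summary(server, port, reverse=False):
        """Run a UDP iperf3 test and return (bandwidth_mbps, jitter_ms).

        Using -J lets us read named JSON fields instead of splitting text
        lines, so extra lines (e.g. out-of-order datagram summaries) in the
        plain-text output can no longer break the parse.
        """
        cmd = ["iperf3", "-c", server, "-p", str(port), "-u", "-J"]
        if reverse:
            cmd.append("-R")  # measure download instead of upload
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        res = json.loads(out.stdout)
        summary = res["end"]["sum"]
        return summary["bits_per_second"] / 1e6, summary["jitter_ms"]

This would also expose lost_percent directly, if we want packet loss from the same run.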

Add personalized survey link to local dashboard

The survey needs to be personalized (install ID) and should be launched from the local dashboard.

We are using Qualtrics for the survey; here is a link about personal links.

We will collect the participants' contact information during a screening questionnaire and can add this and their device ID to the survey link. I'll also give you edit access to the survey. Let me know if you need anything else.
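
A minimal sketch of building the personalized link; the survey base URL and query-parameter names are hypothetical placeholders for whatever the Qualtrics setup ends up defining:

    from urllib.parse import urlencode

    # Hypothetical Qualtrics survey URL and embedded-data field names; the real
    # survey ID and parameter names come from the Qualtrics configuration.
    SURVEY_BASE = "https://uchicago.co1.qualtrics.com/jfe/form/SV_exampleSurveyId"

    def personalized_survey_url(install_id, device_id):
        """Build the per-participant survey link shown on the local dashboard."""
        return f"{SURVEY_BASE}?{urlencode({'install_id': install_id, 'device_id': device_id})}"

    # e.g. the "Take the survey" button target:
    # personalized_survey_url("nm-0042", "rpi-0042")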

Grafana Dashboard Layout/Configuration

Description: Define a dashboard layout to serve as a "model" to be replicated/standardized across the deployment for both test subjects and sampled subjects.
Goal: Defining this sooner saves us time later, as we won't need to go out and change every single dashboard's settings :-)

sample data inspection

Please validate:

  • 20210526-20210604-nm-mngd-sample6.tgz
  • 20210604-20210610-nm-mngd-sample7.tgz

Check what we can add to the JSON to facilitate conversion to CSV and other formats (a conversion sketch follows the list below):

  • timestamp?
  • public IP, local IP?
  • GeoIP location, ISP?

Check whether we have sensitive info to remove and, of course, any missing data, malformed records, or errors in the output.
Question: should we use gz instead of pkl?
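
As a starting point for the CSV conversion, a minimal sketch; the input layout (one JSON object per measurement file) and the candidate extra fields are assumptions to validate against the actual sample archives:

    import csv
    import json
    from pathlib import Path

    def json_records_to_csv(json_dir, csv_path):
        """Flatten one-JSON-object-per-measurement files into a single CSV.

        Candidate fields to add to each record before this step: timestamp,
        public/local IP, GeoIP location, ISP (pending the sensitivity review).
        Assumes flat records; nested output would need flattening first.
        """
        records = [json.loads(p.read_text()) for p in Path(json_dir).glob("*.json")]
        fieldnames = sorted({key for rec in records for key in rec})
        with open(csv_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(records)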

thanks,

G

Deployment Workflow Documentation

Description: Divided into 2 documents:

  • Network Measurement Box Assembly
  • Device Deployment and Activation

Requirements:

  • cms, active-netrics, passive-netrics, and local dash packages tested and validated
  • The definition of both hardware and software to be (pre-)installed
  • The definition of "if" or "when" to link the random ID to the participant's email (at box assembly time? or via the GUI survey?)
  • The definition of the box layout
  • Content (?)

Goals:

  • With this documentation, someone can start assembling and delivering the measurement boxes

latency under load iperf error


 --- google ping latency under load ---
Packet Loss Under Load: 0.0%
Average RTT Under Load: 187.713 (ms)
Minimum RTT Under Load: 111.525 (ms)
Maximum RTT Under Load: 256.142 (ms)
RTT Std Dev Under Load: 42.515 (ms)
iperf3: error - unable to send control message: Connection reset by peer

 --- google ping latency under load ---
Packet Loss Under Load: 0.0%
Average RTT Under Load: 13.847 (ms)
Minimum RTT Under Load: 11.297 (ms)
Maximum RTT Under Load: 20.74 (ms)
RTT Std Dev Under Load: 2.701 (ms)
{'speedtest_ookla_download': 578.77356, 'speedtest_ookla_upload': 16.84809, 'speedtest_ookla_jitter': 8.005, 'speedtest_ookla_latency': 15.397, 'speedtest_ookla_pktloss': 0, 'speedtest_ndt7_download': 425.99939920796345, 'speedtest_ndt7_upload': 19.084749348277708, 'speedtest_ndt7_downloadretrans': 0.0, 'speedtest_ndt7_minrtt': 6.937, 'google_packet_loss_pct_under_ul': 0.0, 'google_rtt_min_ms_under_ul': 111.525, 'google_rtt_max_ms_under_ul': 256.142, 'google_rtt_avg_ms_under_ul': 187.713, 'google_rtt_mdev_ms_under_ul': 42.515, 'google_packet_loss_pct_under_dl': 0.0, 'google_rtt_min_ms_under_dl': 11.297, 'google_rtt_max_ms_under_dl': 20.74, 'google_rtt_avg_ms_under_dl': 13.847, 'google_rtt_mdev_ms_under_dl': 2.701}

IRB Revisions

  • Clarify PII question
  • Add RCR training from GM
  • Upload Survey, flyer, and website page

ookla speedtest crash

Description: Sometimes speedtest just crashes.

Retrieving speedtest.net configuration...
Testing from Comcast Cable ([REDACTED])...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Comcast (Chicago, IL) [8.08 km]: 25.077 ms
Testing download speed................................................................................
Download: 170.67 Mbit/s
Testing upload speed............................................................Exception in thread Thread-71:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3/dist-packages/speedtest.py", line 905, in run
    f = self._opener(request)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3/dist-packages/speedtest.py", line 537, in http_open
    return self.do_open(
  File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 289, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ûD]¾à]ÁSq~=äÍ3v4ÒHTTP/1.1 200 OK

..........................................
Traceback (most recent call last):
  File "/usr/bin/speedtest", line 11, in <module>
    load_entry_point('speedtest-cli==2.1.2', 'console_scripts', 'speedtest')()
  File "/usr/lib/python3/dist-packages/speedtest.py", line 1986, in main
    shell()
  File "/usr/lib/python3/dist-packages/speedtest.py", line 1951, in shell
    speedtest.upload(
  File "/usr/lib/python3/dist-packages/speedtest.py", line 1664, in upload
    self.results.bytes_sent = sum(finished)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Expected:
We should handle these crashes around the Popen execution and log them with an "ERROR:" prefix (log.error); a sketch is below.
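
A minimal sketch of such a wrapper, assuming the module's existing logger is available as log; the function name and call sites (speedtest, iperf, ping) are illustrative:

    import logging
    import subprocess

    log = logging.getLogger("netrics")

    def run_cmd(cmd, timeout=120):
        """Run an external measurement command, logging crashes instead of raising.

        Returns the command's stdout on success, or None if the command could
        not start, timed out, or exited non-zero.
        """
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            log.error("ERROR: %s failed to run: %s", cmd[0], exc)
            return None
        if proc.returncode != 0:
            log.error("ERROR: %s exited with %d: %s",
                      cmd[0], proc.returncode, proc.stderr.strip())
            return None
        return proc.stdout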

netrics returning a position on a perf distribution

In an attempt to make netrics more popular with the general public, or even "viral", by democratizing some of the analytics :-) :

After running netrics -s -u (speedtest):

-- Results --
Down: 910Mbps
Up: 10Mbps
Your performance ranked in the:
- top 55% of users in your speed tier (global);
- top 45% of users in your city (Chicago, IL);
- top 68% of users from your ISP (Comcast);

This can be accomplished by processing data from the DB and making it available through a simple serverless query (current WIP).
We'd release the nm-mgmt-collect-http package to be installed on any RPi / Jetson Nano alongside exp-active-netrics, and keep that data separate from the managed data (ours).

Challenge: protect the anonymous perf DB from bogus data.
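
For the ranking itself, the percentile computation is simple once we can pull comparable download measurements (same speed tier, city, or ISP) out of the anonymous perf DB; a minimal sketch, with the serverless query layer left out:

    from bisect import bisect_left

    def top_percent(user_mbps, peer_mbps):
        """Return the "top N%" position of a user's speed within a cohort.

        peer_mbps is a list of recent download measurements for the chosen
        cohort (global speed tier, city, or ISP) from the anonymous perf DB.
        """
        peers = sorted(peer_mbps)
        below = bisect_left(peers, user_mbps) / max(len(peers), 1)  # fraction strictly below
        return round((1 - below) * 100) or 1  # e.g. 45 -> "top 45%"

    # run once per cohort: global speed tier, city (Chicago, IL), ISP (Comcast)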

@JamesSaxon @feamster

discrepancy between ookla tcp speedtest vs iperf udp

Description: this is a Comcast 1 Gbps down / 40 Mbps up connection. iperf UDP looks very close, but Ookla speedtest is not even half of it. Speedtest via the web gives me ~650 Mbps.


Expected: speedtest-cli to match the performance we get from the web test (speedtest.net).

production servers

Description: the setup of the 2 new servers (from Princeton)
Requirements:

  • installation of Ubuntu (18.x or 20.x)
  • Docker
  • accessible via the internet
  • Science DMZ required?

Goals:

  • to transfer server components (RabbitMQ and SaltStack) from staging (tt) to a Phase 1 deployment

stop/suspend button

A GUI feature: we need to give users the ability to suspend speedtest measurements to prevent a degraded user experience.
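
One possible mechanism, sketched below: the dashboard's button writes or removes a suspend flag file, and the test runner checks it before launching bandwidth-heavy tests. The flag path and check are assumptions, not existing netrics behavior:

    from pathlib import Path

    # Hypothetical flag written/removed by the local dashboard's stop/suspend button.
    SUSPEND_FLAG = Path("/var/run/netrics/suspend")

    def measurements_suspended():
        """Return True if the user has asked to pause bandwidth-heavy tests."""
        return SUSPEND_FLAG.exists()

    # In the test runner, before speedtest/ndt7/iperf:
    # if measurements_suspended():
    #     log.info("measurements suspended by user; skipping speed tests")
    #     return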

hardware evaluation

Description: we need to determine the best hardware for the Phase 1 deployment, with both passive and active measurements enabled.
Requirements:

  • MikroTik
  • Ubiquiti ER-X
  • TP-Link 5-port switch
  • Netgear 5-port switch
  • Wi-Fi routers (ideally Wi-Fi 6)
  • 1 Gbps connection

Goals:

  • Decide MikroTik vs ER-X
  • Decide TP-Link vs Netgear switches
  • Decide on the best Wi-Fi router
  • The ideal hardware must produce stable lines for longitudinal speedtest/ndt7 measurements with <1% retransmission rate

move crontab to toml

Description: the crontab / test schedule is currently defined in init.d. It should be defined in the TOML config instead and only be written to the actual cron.d after issuing a command like netrics -S --schedule.
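
A minimal sketch of what netrics -S --schedule could do, assuming a [schedule] table in the TOML config; the config path, table layout, and per-test CLI flags are assumptions rather than the current format:

    import toml  # assuming the project's existing TOML dependency

    CRON_PATH = "/etc/cron.d/cron-nm-exp-active-netrics"

    def write_schedule(config_path="/etc/nm-exp-active-netrics/netrics.toml"):
        """Render the [schedule] table of the TOML config into cron.d.

        Hypothetical config layout:
            [schedule]
            speedtest = "17 * * * *"
            ping      = "*/5 * * * *"
        """
        cfg = toml.load(config_path)
        lines = [
            f"{cron_expr} root /usr/local/bin/netrics --{test} >>/var/log/netrics.log 2>&1"
            for test, cron_expr in cfg.get("schedule", {}).items()
        ]
        with open(CRON_PATH, "w") as fh:
            fh.write("\n".join(lines) + "\n")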

hops-to-backbone measurement error on ATT

I think what you're trying to do here is specific to Comcast and probably won't work with other ISPs.
https://github.com/chicago-cdac/nm-exp-active-netrics/blob/main/src/netrics/netson.py#L324

traceroute -m 15 -N 32 -w3 www.google.com | grep -m 1 ibone
 6  be-32221-cs02.350ecermak.il.ibone.comcast.net (96.110.40.53)  17.727 ms be-32231-cs03.350ecermak.il.ibone.comcast.net (96.110.40.57)  17.889 ms  17.135 ms

Here's our AT&T output:

    Traceback (most recent call last):
      File "./src/netrics.py", line 269, in <module>
        output['hops_to_backbone'] = test.hops_to_backbone(args.backbone)
      File "/usr/local/src/nm-exp-active-netrics/src/netrics/netson.py", line 331, in hops_to_backbone
        hops = int(tr_res_s[0])
    ValueError: invalid literal for int() with base 10: ''
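
At a minimum, the parse should treat "no backbone hop found" as a valid outcome instead of crashing on int(''); a minimal sketch, not the exact netson.py code:

    def parse_backbone_hop(tr_output, backbone_token="ibone"):
        """Return the hop number of the first backbone hop, or None if absent.

        On Comcast the backbone hops contain "ibone"; on AT&T and other ISPs
        that token may never appear, so a match cannot be assumed.
        """
        for line in tr_output.splitlines():
            if backbone_token in line:
                fields = line.split()
                if fields and fields[0].isdigit():
                    return int(fields[0])
        return None  # caller logs "backbone not identified" instead of crashing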

run netrics as non-root user

That can be achieved by creating a user with no home directory and no password in the .deb postinstall script, and adjusting the crontab to run the jobs as that user.

better test scheduling

Ideal solution, IMHO: acquire/wait on a test-execution lock to prevent multiple executions at the same time (a sketch is at the end of this issue).

The --get-times option generates a random time (minute) within a fixed 6-minute range, calculated using netrics.json as the initial "feed".
https://github.com/chicago-cdac/nm-exp-active-netrics/blob/main/src/nmexpactive/experiment.py#L144

Maybe there's a better / more configurable way of doing this, considering the fact that we moved netrics.json to /docs for easy access/editing. Maybe the numbers in .. + random.randrange(-3,3) should come from netrics.json.

If this is "too much" to cover right now, maybe we can start by fixing this 1 * * * * entry; it looks like the South Shore speedtest server can get hammered at that minute.
echo "1 * * * * root /usr/local/bin/netrics -s ${upload} >>${log} 2>&1" >> /etc/cron.d/cron-nm-exp-active-netrics
https://github.com/chicago-cdac/nm-exp-active-netrics/blob/main/etc/init.d/nm-exp-active-netrics#L39
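
For the lock idea, a minimal sketch using an fcntl file lock, so a run that starts while another is still in progress either waits or bails out; the lock path is an assumption:

    import fcntl
    import sys

    LOCK_PATH = "/var/lock/nm-exp-active-netrics.lock"  # assumed location

    def acquire_test_lock(blocking=True):
        """Acquire an exclusive lock for a test run.

        Returns the open lock file (keep it open for the duration of the run);
        if blocking is False and another run holds the lock, exit quietly.
        """
        lock_file = open(LOCK_PATH, "w")
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(lock_file, flags)
        except BlockingIOError:
            print("another netrics run is in progress; skipping", file=sys.stderr)
            sys.exit(0)
        return lock_file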

influxdb insert error

    {'speedtest_ookla_download': 119.317368, 'speedtest_ookla_upload': 5.895528, 'speedtest_ookla_jitter': 0.502, 'speedtest_ookla_latency': 8.373, 'speedtest_ookla_pktloss': 1.4598540145985401, 'speedtest_ndt7_download': 115.76130482429214, 'speedtest_ndt7_upload': 6.192435825506721, 'speedtest_ndt7_downloadretrans': 0.0, 'speedtest_ndt7_minrtt': 4.724, 'google_packet_loss_pct_under_ul': 0.0, 'google_rtt_min_ms_under_ul': 8.237, 'google_rtt_max_ms_under_ul': 14.122, 'google_rtt_avg_ms_under_ul': 9.361, 'google_rtt_mdev_ms_under_ul': 1.702, 'google_packet_loss_pct_under_dl': 0.0, 'google_rtt_min_ms_under_dl': 8.224, 'google_rtt_max_ms_under_dl': 9.48, 'google_rtt_avg_ms_under_dl': 8.834, 'google_rtt_mdev_ms_under_dl': 0.365}
    Traceback (most recent call last):
      File "./src/netrics.py", line 301, in <module>
        upload(test.results, test.results)
      File "./src/netrics.py", line 193, in upload
        ret = creds.write_points([{"measurement": "networks",
      File "/usr/local/src/nm-exp-active-netrics/venv/lib/python3.8/site-packages/influxdb/client.py", line 603, in write_points
        return self._write_points(points=points,
      File "/usr/local/src/nm-exp-active-netrics/venv/lib/python3.8/site-packages/influxdb/client.py", line 681, in _write_points
        self.write(
      File "/usr/local/src/nm-exp-active-netrics/venv/lib/python3.8/site-packages/influxdb/client.py", line 413, in write
        self.request(
      File "/usr/local/src/nm-exp-active-netrics/venv/lib/python3.8/site-packages/influxdb/client.py", line 378, in request
        raise InfluxDBClientError(err_msg, response.status_code)
    influxdb.exceptions.InfluxDBClientError: 400: {"error":"partial write: field type conflict: input field \"speedtest_ookla_pktloss\" on measurement \"networks\" is type float, already exists as type integer dropped=1"}
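
The error says speedtest_ookla_pktloss was first written as an integer and is now arriving as a float, which InfluxDB rejects once a field's type is set. A minimal sketch of normalizing numeric fields before write_points; whether to cast every numeric field or only the known offenders is a judgment call:

    def normalize_fields(fields):
        """Cast numeric measurement values to float before writing to InfluxDB.

        InfluxDB fixes a field's type on first write, so a value that is
        sometimes 0 (int) and sometimes 1.46 (float), like
        speedtest_ookla_pktloss, triggers "field type conflict" errors.
        """
        return {k: float(v) if isinstance(v, (int, float)) and not isinstance(v, bool) else v
                for k, v in fields.items()}

    # e.g. creds.write_points([{"measurement": "networks",
    #                           "fields": normalize_fields(test.results)}])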

(R3) Netrics users exceeding data cap

Description: Netrics is most likely causing the "You’ve used 75% of your data this month" warning, where "75%" can be even higher.
Goal: design a test schedule (cron) to reduce the amount of transferred data while keeping a representative number of data points.
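
To size such a schedule, a back-of-the-envelope estimate helps; every per-test volume below is a placeholder to be replaced with measured numbers, not an actual netrics figure:

    # Rough monthly data-usage estimate for a candidate schedule.
    # All per-test volumes are hypothetical; measure the real ones per test.
    TESTS = {
        # name:             (MB per run, runs per day)
        "speedtest_ookla":  (250, 24),
        "ndt7":             (250, 24),
        "iperf_udp":        (60, 24),
        "ping_suites":      (1, 288),
    }

    monthly_gb = sum(mb * runs for mb, runs in TESTS.values()) * 30 / 1000
    print(f"~{monthly_gb:.0f} GB/month")  # compare against the ISP's cap (e.g. ~1.2 TB on Comcast)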

speedtest version

Hi @kyle-macmillan and @ggmartins --

I just noticed that the speedtest call is running speedtest --json rather than speedtest -f json. The former is the format for the speedtest-cli version, which is having issues today and is not officially supported. The latter is from Ookla.

https://github.com/chicago-cdac/nm-exp-active-netrics/blob/b11feffa13a1e369b736fd34df3bd931d5b04557/src/netrics/netson.py#L105

So I suspect we're still using the sivel version --

https://github.com/sivel/speedtest-cli

I confirm that here:

https://github.com/chicago-cdac/nm-exp-active-netrics/blob/b11feffa13a1e369b736fd34df3bd931d5b04557/requirements.txt#L14

We should switch to this one:

https://www.speedtest.net/apps/cli
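
For reference, invoking the official CLI and pulling the fields we already record would look roughly like the sketch below; the JSON key names reflect my reading of the Ookla CLI output and should be double-checked against a real run:

    import json
    import subprocess

    def speed_ookla_official():
        """Run the official Ookla CLI (speedtest -f json) and return key metrics."""
        out = subprocess.run(
            ["speedtest", "-f", "json", "--accept-license", "--accept-gdpr"],
            capture_output=True, text=True, timeout=120)
        res = json.loads(out.stdout)
        return {
            # bandwidth is reported in bytes/s; convert to Mb/s
            "speedtest_ookla_download": res["download"]["bandwidth"] * 8 / 1e6,
            "speedtest_ookla_upload": res["upload"]["bandwidth"] * 8 / 1e6,
            "speedtest_ookla_latency": res["ping"]["latency"],
            "speedtest_ookla_jitter": res["ping"]["jitter"],
            "speedtest_ookla_pktloss": float(res.get("packetLoss", 0)),
        }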

@kyle-macmillan, do you have time to switch, or @ggmartins could you? I am trying to get a talk written for next week.

Thanks --

Jamie

Add NDT7 Speedtests

We should add the ndt7 tests.

ndt7-client requires Go >= 1.12; Ubuntu 18.04 ships 1.10 by default, so check the available versions.
Then, per the Go instructions here, download and install:

wget https://golang.org/dl/go1.16.3.linux-arm64.tar.gz
tar -C /usr/local -xzf go1.16.3.linux-arm64.tar.gz

Then install the ndt7-client per these instructions

git clone https://github.com/m-lab/ndt7-client-go.git
export GO111MODULE=on
cd ndt7-client-go/
go get ./cmd/ndt7-client

In my bashrc, I put

export GOPATH=~/.bin/go/
export PATH=$PATH:/usr/local/go/bin:~/.bin/go/bin

Then the test is just

ndt7-client -format json -quiet

The output is JSON, so this should be "trivial" to add in the Python (a parsing sketch is below).

I don't have time to do this for a week or so, since I'm not as confident with the build system. @ggmartins, if you could incorporate the ndt7-client, I'm happy to add the python.
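
A minimal sketch of the Python side, assuming ndt7-client -format json -quiet prints a single JSON summary; its key names should be confirmed against a real run before mapping them onto our result fields:

    import json
    import subprocess

    def speed_ndt7():
        """Run ndt7-client and return its parsed JSON summary as a dict."""
        out = subprocess.run(
            ["ndt7-client", "-format", "json", "-quiet"],
            capture_output=True, text=True, timeout=120)
        summary = json.loads(out.stdout)
        # Map to our field names once the real keys are confirmed, e.g.:
        # results["speedtest_ndt7_download"] = <download throughput in Mb/s>
        return summary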

ndev device scan failed to run

    Failed to resolve "default".
    WARNING: No targets were specified, so 0 hosts scanned.
    
     --- NETWORK MEASUREMENTS ---
...
     --- Number of Devices ---
    Number of active devices: 1
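
The Failed to resolve "default" line suggests the scanner was handed the literal string "default" as its target rather than the LAN subnet. One possible fix, sketched under that assumption (the route-parsing details are illustrative):

    import ipaddress
    import subprocess

    def lan_scan_target():
        """Derive a CIDR target for the device scan from the default interface.

        Parses the default route for its interface, then that interface's IPv4
        address, instead of passing a literal "default" to the scanner.
        """
        route = subprocess.run(["ip", "route", "show", "default"],
                               capture_output=True, text=True).stdout.split()
        iface = route[route.index("dev") + 1]
        addr = subprocess.run(["ip", "-o", "-4", "addr", "show", iface],
                              capture_output=True, text=True).stdout.split()
        cidr = addr[addr.index("inet") + 1]               # e.g. "192.168.1.23/24"
        return str(ipaddress.ip_interface(cidr).network)  # e.g. "192.168.1.0/24"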

Add local ndt7

@ggmartins, @kyle-macmillan, @tarunmangla, @feamster for discussion:

We should add the ndt7 Docker container, to allow for Wi-Fi tests.

On the Jetson, adapting the instructions and README:

git clone https://github.com/m-lab/ndt-server.git
cd ndt-server
docker build . -t ndt-server

# Start up docker for the jetson:
sudo apt install docker.io
sudo usermod -aG docker $USER
install -d certs datadir
./gen_local_test_certs.bash

# -d for daemon; turn off for testing.
docker run -d --network=host                            \
           --volume `pwd`/certs:/certs:ro         \
           --volume `pwd`/datadir:/datadir        \
           --read-only                            \
           --user `id -u`:`id -g`                 \
           --cap-drop=all                         \
           ndt-server                                 \
           -cert /certs/cert.pem                  \
           -key /certs/key.pem                    \
           -datadir /datadir                      \
           -ndt7_addr 192.168.1.4:33007             \
           -ndt7_addr_cleartext 192.168.1.4:38080

Obviously, 192.168.1.4 is the local IP of the server here, which I set as static, from the router.

Then we need a little bit of JavaScript, adapted from here:

Note that the ndt7 server side drops json.gz files in the datadir, so we could also just parse and ship them, if there are any (a sketch is below).
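
A minimal sketch of that parse-and-ship path, assuming each json.gz under the datadir holds a single JSON document; shipping (or feeding the local dashboard) is left to the caller:

    import glob
    import gzip
    import json

    def collect_local_ndt7_results(datadir="./datadir"):
        """Yield parsed result dicts from the json.gz files ndt-server writes."""
        for path in glob.glob(f"{datadir}/**/*.json.gz", recursive=True):
            with gzip.open(path, "rt") as fh:
                yield json.load(fh)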

We do still probably want to allow user input, i.e., "my connection is bad," which will require something dynamic (django/flask) beyond the ndt server.

measurement device for Kyle, Nicole and team.

Description: "standalone" raspberry pi 8GB to be direct connected to the router.
Requirements:

  • netrics installed
  • influxdb credentials for direct data ingestion
  • no cms, no mq

Goals:

  • see if the current test schedule is affecting QoE
  • demo the dashboard
  • @JamesSaxon @kyle-macmillan see if you have other things to add there

machine reboot resets crontab

On machine reboot, the crontab /etc/cron.d/cron-nm-exp-active-netrics reverts to a default one, instead of either the user's default or (I would prefer) whatever was edited & installed.

Nicole's speedtest performance drop

Description: Nicole's speedtest performance dropped from ~900 Mbps down to ~70 Mbps after 3 executions.


Running the ndt7-client command directly gives me the same result. I wasn't able to run the Ookla speedtest command directly via the CMS, but through netrics it gives me the same result.

speedtest netson code JSONDecodeError

 ==============================================================================
    
    You may only use this Speedtest software and information generated
    from it for personal, non-commercial use, through a command line
    interface on a personal computer. Your use of this software is subject
    to the End User License Agreement, Terms of Use and Privacy Policy at
    these URLs:
    
    	https://www.speedtest.net/about/eula
    	https://www.speedtest.net/about/terms
    	https://www.speedtest.net/about/privacy
    
    ==============================================================================
    
    License acceptance recorded. Continuing.
    
    
     --- NETWORK MEASUREMENTS ---
    
     --- Ookla speed tests ---
    Download:	476.695408 Mb/s
    Upload:		23.909872 Mb/s
    Latency:	14.055 ms
    Jitter:		2.48 ms
    PktLoss:	0 Total Count
    Traceback (most recent call last):
      File "./src/netrics.py", line 245, in <module>
        output['ookla'], output['ndt7'] = test.speed()
      File "/usr/local/src/nm-exp-active-netrics/src/netrics/netson.py", line 165, in speed
        return self.speed_ookla(True), self.speed_ndt7(True)
      File "/usr/local/src/nm-exp-active-netrics/src/netrics/netson.py", line 143, in speed_ndt7
        res_json = json.loads(output)
      File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
        raise JSONDecodeError("Extra data", s, end)
    json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 81)
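
The "Extra data" error means stdout contained more than one JSON document (e.g. one JSON object per line, or a stray banner line before the summary). A minimal sketch of a more tolerant parse for speed_ndt7, not the exact netson.py code:

    import json

    def parse_ndt7_output(output):
        """Parse ndt7-client stdout that may contain more than one JSON document.

        json.loads() on the whole buffer raises "Extra data" when several
        objects are present; instead decode line by line and keep the last
        valid object (the summary), skipping lines that are not JSON.
        """
        summary = None
        for line in output.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                summary = json.loads(line)
            except json.JSONDecodeError:
                continue
        if summary is None:
            raise ValueError("no JSON object found in ndt7-client output")
        return summary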

throughput tcp retransmissions

The ndt7-client retransmissions output was crucial to debugging the 5-port switch problem. I think we should definitely incorporate it into ours.
