
Introduction

gxadmin: Make Galaxy administration magical 🌈

A command line tool for Galaxy administrators to run common queries against our Postgres databases. It additionally includes some code for managing Galaxy and associated features, and other utilities.

Mostly gxadmin acts as a repository for the common queries we all run regularly but fail to share with each other.

It comes with a wide array of useful queries that you might want to run, but you can also easily add more to your installation with local functions. gxadmin attempts to be a very readable bash script and avoids using fancy new bash features.

Subcommand Functions
query 104
server 22
mutate 22
meta 12
galaxy 12
report 5
filter 3
config 2
cluster 1

This script requires a PostgreSQL database; there are no plans to support MySQL or SQLite3.

If we do support SQLite3 in the future, it will be on an "if you're lucky" basis: we won't be modifying queries to remove Postgres-isms.

Installation

curl -L https://github.com/galaxyproject/gxadmin/releases/latest/download/gxadmin > /usr/bin/gxadmin
chmod +x /usr/bin/gxadmin

Contributing

See CONTRIBUTING.md

Changelog

See the Changelog

License

GPLv3

Configuration

gxadmin does not have much configuration; it is mostly driven by environment variables, and functions will complain if the ones they need are not set.

Postgres

Queries can be run with normal Postgres table, CSV, or TSV output as you need. Just use gxadmin query, gxadmin tsvquery, or gxadmin csvquery as appropriate.

You should have a ~/.pgpass with the database connection information, and set PGDATABASE, PGHOST, and PGUSER in your environment.

Example .pgpass:

<pg_host>:5432:*:<pg_user>:<pg_password>
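
For example (host, user, and database names below are placeholders — substitute your own):

```shell
# Placeholder connection values -- substitute your own.
export PGHOST=galaxy-db.example.org
export PGUSER=galaxy
export PGDATABASE=galaxy

# libpq silently ignores ~/.pgpass unless it is readable only by you.
touch ~/.pgpass
chmod 0600 ~/.pgpass
```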

InfluxDB

In order to use functions like gxadmin meta influx-post, gxadmin requires a few environment variables to be set. Namely

  • INFLUX_URL
  • INFLUX_PASS
  • INFLUX_USER
  • INFLUX_DB
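
A minimal sketch of setting these (all values are placeholders for your own InfluxDB instance):

```shell
# Placeholder values -- point these at your own InfluxDB instance.
export INFLUX_URL=https://influxdb.example.org:8086
export INFLUX_USER=gxadmin
export INFLUX_PASS=changeme
export INFLUX_DB=galaxy
```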

Other Environment Variables

Some subcommands may ask you to set these environment variables, if they aren't already.

Variable                   Example Value
GALAXY_CONFIG_DIR          /srv/galaxy/config/
GALAXY_CONFIG_FILE         /srv/galaxy/config/galaxy.yml
GALAXY_LOG_DIR             /var/log/galaxy/
GALAXY_MUTABLE_CONFIG_DIR  /srv/galaxy/var/
GALAXY_ROOT                /srv/galaxy/server/
VIRTUAL_ENV                /srv/galaxy/venv/

GDPR

You may want to set GDPR_MODE=1. Please determine your own legal responsibilities; the authors take no responsibility for anything you may have done wrong.

If you want to publish data, first be careful! Second, the GDPR_MODE variable is part of the hash, so you can set it to something like GDPR_MODE=$(openssl rand -hex 24 2>/dev/null) gxadmin query ... for hashing while "throwing away the key".
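
A sketch of that pattern (queue-detail is used here purely as an example query):

```shell
# Generate a random key for this run; once the shell exits, nobody can
# recreate the hashes -- effectively "throwing away the key".
GDPR_MODE=$(openssl rand -hex 24 2>/dev/null)

# 24 random bytes, hex-encoded = 48 characters
echo "${#GDPR_MODE}"

# Hypothetical usage -- identifying columns in the output are hashed
# with this key:
# GDPR_MODE="$GDPR_MODE" gxadmin query queue-detail
```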

Local Functions

If you want to add some site-specific functions, you can do this in ~/.config/gxadmin-local.sh (location can be overridden by setting $GXADMIN_SITE_SPECIFIC)

You should write a bash script that looks like the following. All functions must be prefixed with local_:

local_cats() { ## : Makes cat noises
	handle_help "$@" <<-EOF
		Here is some documentation on this function
	EOF

	echo "Meow"
}

This can then be called with gxadmin like:

$ gxadmin local cats --help
gxadmin local functions usage:

    cats   Makes cat noises

help / -h / --help : this message. Invoke '--help' on any subcommand for help specific to that subcommand
$ gxadmin local cats
Meow
$

If you prefix the function name with query-, e.g. local_query-cats, then it will be run as a database query. CSV/TSV/Influx queries are not currently supported.
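
For instance, a hypothetical site-specific query function might look like this. The table and column names follow Galaxy's schema as used elsewhere in this document; gxadmin itself supplies handle_help and executes whatever ends up in $QUERY:

```shell
# Hypothetical local function: distinct users who ran jobs, per month.
# gxadmin provides handle_help and runs the contents of $QUERY.
local_query-monthly-users() { ## : Count distinct job-running users per month
	handle_help "$@" <<-EOF
		Number of distinct users who ran at least one job, by month.
	EOF

	read -r -d '' QUERY <<-EOF
		SELECT
			date_trunc('month', create_time)::date AS month,
			count(DISTINCT user_id) AS users
		FROM job
		GROUP BY month
		ORDER BY month DESC
	EOF
}
```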

Contributors

abretaud, afgane, anuprulez, bgruening, cat-bro, gavindi, gmauro, gregvonkuster, guerler, hexylena, hujambo-dunia, jmchilton, kysrpex, lldelisle, mira-miracoli, mvdbeek, natefoo, nuwang, pauldg, pavanvidem, sanjaysrikakulam, selten, simonbray, slugger70


Issues

The function report_user-info() retrieves the wrong user sometimes

When querying with an email address that is not in the database and contains numbers, it retrieves the wrong user.

Example: [email protected]

If I understood the WHERE clause correctly:
galaxy_user.email = '$1' -> FALSE, because it's not in the database
or
galaxy_user.username = '$1' -> FALSE, because it's not in the database
or
galaxy_user.id = CAST(REGEXP_REPLACE(COALESCE('$1','0'), '[^0-9]+', '0', 'g') AS INTEGER) -> Here I think it finds the user whose id is the number embedded in the input ('123'), even though that is not the intended user, because the queried user doesn't exist.

Does anyone know what is the use case for the third condition?
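
For what it's worth, the surprising match can be reproduced outside Postgres. The regex collapses every run of non-digits into a literal '0' (the input 'bea123' below is purely illustrative):

```shell
# Mimic REGEXP_REPLACE(..., '[^0-9]+', '0', 'g'): every run of
# non-digit characters becomes a single literal '0'.
input='bea123'
collapsed=$(printf '%s' "$input" | sed -E 's/[^0-9]+/0/g')
echo "$collapsed"        # 0123
# The integer cast ignores the leading zero, so this matches user id 123.
echo $((10#$collapsed))  # 123
```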

Thanks,

Bea

Idea for a new queue-detail query with DRM-dependent information on cores and mem.

In Galaxy Australia we use Slurm as the DRM for all of our nodes. We have a local query which shows the number of cores and mem requested for each job as well as the handler id, runner id and destination id.

It would be nice to be able to add this to gxadmin with a flag for different DRMs such as Torque or HTCondor.

Here is the query we currently use:

local_query-queue() { ##? [--all] [--seconds] [--since-update]: Detailed overview of running and queued jobs with cores/mem info
	# This is a copy of "gxadmin query queue-detail" with job destination info
	# (cores/mem/partition) added, and runner_id/count removed.
	# Note: heredocs below use <<- and therefore require tab indentation.
	handle_help "$@" <<-EOF
		$ gxadmin local query-queue
		  state  |  id  | extid |                          tool_id                          | username | time_since_creation |       handler       | cores | mem  | partition | destination_id
		---------+------+-------+-----------------------------------------------------------+----------+---------------------+---------------------+-------+------+-----------+-----------------
		 running | 4385 | 4011  | upload1                                                   | cat      | 00:01:01.518932     | main.job-handlers.2 | 2     | 6144 |           | slurm
		 queued  | 4387 | 4012  | toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 | cat      | 00:00:24.377336     | main.job-handlers.2 | 1     | 3072 |           | slurm
		 queued  | 4388 | 4388  | toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 | cat      | 00:00:13.254505     | main.job-handlers.1 | 1     | 3072 |           | pulsar-nci-test
		 queued  | 4389 | 4013  | toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2 | cat      | 00:00:01.834048     | main.job-handlers.2 | 1     | 3072 |           | slurm
	EOF

	fields="count=9"
	tags="state=0;id=1;extid=2;tool_id=3;username=4;handler=6;job_runner_name=7;destination_id=8"

	d=""
	nonpretty="("
	time_column="job.create_time"
	time_column_name="time_since_creation"

	if [[ -n "$arg_all" ]]; then
		d=", 'new'"
	fi
	if [[ -n "$arg_seconds" ]]; then
		fields="$fields;time_since_creation=5"
		nonpretty="EXTRACT(EPOCH FROM "
	fi
	if [[ -n "$arg_since_update" ]]; then
		time_column="job.update_time"
		time_column_name="time_since_update"
	fi

	username=$(gdpr_safe galaxy_user.username username "Anonymous User")

	read -r -d '' QUERY <<-EOF
		SELECT
			job.state,
			job.id,
			job.job_runner_external_id as extid,
			job.tool_id,
			$username,
			$nonpretty now() AT TIME ZONE 'UTC' - $time_column) as $time_column_name,
			job.handler,
			(REGEXP_MATCHES(encode(job.destination_params, 'escape'), 'ntasks=(\d+)'))[1] as cores,
			(REGEXP_MATCHES(encode(job.destination_params, 'escape'), 'mem=(\d+)'))[1] as mem,
			(REGEXP_MATCHES(encode(job.destination_params, 'escape'), 'partition=(\w+)'))[1] as partition,
			COALESCE(job.destination_id, 'none') as destination_id
		FROM job
		FULL OUTER JOIN galaxy_user ON job.user_id = galaxy_user.id
		WHERE
			state in ('running', 'queued'$d)
		ORDER BY
			state desc,
			$time_column_name desc
	EOF
}

find heavily used tools with large runtime

I would like to start optimizing some tools. I think we can find some BA and MA students to optimize our tool stack and thereby waste fewer resources.

A query like gxadmin query tool-usage, but with min/avg/max runtime columns, would be super useful.
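
A rough, untested sketch of such a query, assuming the job and job_metric_numeric tables used by the other queries in this document:

```sql
-- Sketch: per-tool run counts with min/avg/max runtime, most-run first.
SELECT
    job.tool_id,
    count(*) AS runs,
    min(jmn.metric_value) AS min_runtime_s,
    round(avg(jmn.metric_value), 1) AS avg_runtime_s,
    max(jmn.metric_value) AS max_runtime_s
FROM job
JOIN job_metric_numeric jmn
  ON jmn.job_id = job.id AND jmn.metric_name = 'runtime_seconds'
WHERE job.state = 'ok'
GROUP BY job.tool_id
ORDER BY runs DESC;
```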

psql query which could join gxadmin

Hi there,
At the imaging meeting we were wondering whether we could monitor the usage of a list of tools, both in terms of CPU hours and in terms of the number of distinct users.
I came up with this query (which I tested for bowtie2 and cufflinks):

tool_list="('bowtie2', 'cufflinks')"
psql -c "COPY
(SELECT
   date_trunc('month', job.create_time  AT TIME ZONE 'UTC')::date as month,
   round(sum((a.metric_value * b.metric_value) / 3600 ), 2) as cpu_hours,
   regexp_replace(regexp_replace(job.tool_id, '/[0-9.a-z+-]+$', '')::TEXT, '.*toolshed.*/repos/[^/]*/[^/]*/', '') as tool_name,
   COUNT (DISTINCT job.user_id) as nb_users
FROM
   job_metric_numeric a,
   job_metric_numeric b,
   job
WHERE
   b.job_id = a.job_id
   AND a.job_id = job.id
   AND a.metric_name = 'runtime_seconds'
   AND b.metric_name = 'galaxy_slots'
   AND job.create_time < NOW() - interval '1 year'
   AND regexp_replace(regexp_replace(job.tool_id, '/[0-9.a-z+-]+$', '')::TEXT, '.*toolshed.*/repos/[^/]*/[^/]*/', '') in $tool_list
GROUP BY
   month, tool_name
ORDER BY
   month DESC)
TO '$PWD/info_tool.csv' WITH (FORMAT CSV, HEADER) ;"

I am not an expert in SQL, so this query is probably not optimal.
I don't know whether it would be interesting to add to gxadmin.

Update short syntax for default vs options

@natefoo wanted to do [foo|bar|baz] (very reasonable), but we're using | for default assignment, which is odd to say the least. We should change this to := to keep default assignment 'shellish'.

2432:query_tool-popularity() { ##? [months|24]: Most run tools by month (tool_predictions)
4514:query_tool-new-errors() { ##? [weeks|4] [--short-tool-id]: Summarize percent of tool runs in error over the past weeks for "new tools"
4562:query_tool-errors() { ##? [--short-tool-id] [weeks|4]: Summarize percent of tool runs in error over the past weeks for all tools that have failed (most popular tools first)
4610:query_tool-likely-broken() { ##? [--short-tool-id] [weeks|4]: Find tools that have been executed in recent weeks that are (or were due to job running) likely substantially broken
4659:query_user-recent-aggregate-jobs() { ##? <user> [days|7]: Show aggregate information for jobs in past N days for user (by email/id/username)

query_user-disk-usage() size differs from galaxy_user.disk_usage

I am looking at the disk usage by user. For that, I've used the function query_user-disk-usage(), which computes the size. I have also seen a field in the galaxy_user table called disk_usage.
I was expecting size and disk_usage to show the same value, but they don't. Should they match? Is disk_usage the value displayed to users in Galaxy? I'm a bit confused :) Any help? Thanks!

Functions to add

  • Given a list of job ids (newline separated):
    • why they are in the new state
    • which host each ran on (maybe extend to other metrics, so info like runtime / max mem usage can be aggregated for a specific list of jobs)
  • job info needs to display the job ID for input jobs in its table
  • hda-info, dataset-info commands
  • quota info
  • history contents should display time + file size
  • maybe a user-history list command with last modified / size
  • IPs → potential users

purge job metrics table from time to time

# Delete text job metrics for successful or old jobs (not useful for debugging).
# Delete numeric job metrics for old unsuccessful jobs (not useful for statistics or debugging).
DELETE FROM job_metric_text USING job WHERE job_id=job.id AND (job.state='ok' OR CURRENT_TIMESTAMP - job.update_time > '1 year');
VACUUM FULL job_metric_text;
DELETE FROM job_metric_numeric USING job WHERE job_id=job.id AND job.state IN ('deleted', 'error') AND CURRENT_TIMESTAMP - job.update_time > '1 year';
VACUUM FULL job_metric_numeric;

From @nsoranzo

Display user quota in "user-info"

When users request quota increase I check their actual disk usage with this very useful command:

gxadmin report user-info EMAIL

It would be cool to see also their current quota, which would help to check whether the quota was properly increased. Is it something that could be easily added? Or maybe: Is there a better way to check the quota for a specific user?

Thank you :)

iquery should not output empty tags

e.g. with __DATA_FETCH__ running as a celery task, there is no destination_id, so with gxadmin iquery queue --by destination you can get the following error from telegraf, which results in no collection of any of the measurements output:

E! [inputs.exec] Error in plugin: metric parse error: expected tag at 25:51: "queue_by_destination,state=running,destination_id= count=0"

This has happened in a few other cases as well. But basically, the iquery formatter should probably just fill in any null tags as unknown or something even more obvious like __NULL__.
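
Until something like that lands in the iquery formatter, one hypothetical workaround is post-processing the line protocol before telegraf sees it. The sed pattern here is illustrative and assumes tag values never legitimately begin with a comma or space:

```shell
# Rewrite empty Influx line-protocol tag values (a tag= immediately
# followed by a comma or by the space before the field set) as __NULL__
# so telegraf can parse the line.
fix_null_tags() {
	sed -E 's/([A-Za-z_]+)=(,| )/\1=__NULL__\2/g'
}

echo 'queue_by_destination,state=running,destination_id= count=0' | fix_null_tags
# queue_by_destination,state=running,destination_id=__NULL__ count=0
```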

query queue-detail uses hardcoded timezone..

When using gxadmin query queue-detail in Australia, I noticed that jobs just created were being displayed as having run for 8 hours.. I realised this was the difference between our two timezones..

The timezone for Germany is hardcoded into the query.

SELECT job.state, job.id, job.job_runner_external_id as extid, job.tool_id, $username, date_trunc('hour', (now() - job.create_time - '2 hours'::interval)) as time_since_creation
FROM job, galaxy_user
WHERE state in ('running', 'queued'$d) and job.user_id = galaxy_user.id
ORDER BY state desc, time_since_creation desc

Maybe we need to detect the timezone modifier and do the appropriate math in the query..

It was easy enough to edit my local copy and change the '2 hours'::interval to '10 hours'::interval.

This occurs as the job creation time in the database is stored in UTC whilst now() returns the current time in the local timezone.
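
A possibly cleaner fix than a per-site offset is to normalise now() to UTC in the query itself, as some other gxadmin queries already do. An untested sketch:

```sql
-- Convert now() to UTC instead of subtracting a hardcoded offset,
-- so the elapsed time is correct in any server timezone.
SELECT date_trunc('minute', (now() AT TIME ZONE 'UTC') - job.create_time)
       AS time_since_creation
FROM job;
```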

Just as an aside: I've also changed the resolution to minute and added job destination, handler and runner to my version of the query so I can see finer grain times and where each job is ending up..

        handler       | job_runner_name |   destination_id
 ---------------------+-----------------+--------------------
  main.job-handlers.4 | slurm           | slurm_2slots
  main.job-handlers.3 | slurm           | slurm_2slots
  main.job-handlers.3 | pulsar_rest     | pulsar-nci_mid
  main.job-handlers.1 | slurm           | slurm_1slot_upload

Gets added on to the end of my table.. :)

Report needs to support ##?

galaxy@sn06:~$ gxadmin report data-info 110124855
ERROR:  syntax error at or near "OR"
LINE 18: AND (d.id= OR hda.id=)) to STDOUT with CSV DELIMITER E'\t'

query latest-users syntax error

I got a syntax error running a query tool

galaxy@sn04:~$ /usr/bin/gxadmin query latest-users
ERROR:  syntax error at or near "username"
LINE 5: COALESCE(username::text, '__UNKNOWN__') as  as username,

time queries?

./gxadmin time meta whatsnew

doesn't seem to work currently

interface stability guarantees

Currently there are approximately zero stability guarantees: we can change anything at any time, functions might disappear, etc.

Given that admins might use gxadmin in pipelines I think I should implement something that will give them more stability. The current idea is something like:

query_blah-blah_v1() { ... }
query_blah-blah_v2() { ... }

which could be called as query blah-blah (for the latest version) and maybe blah-blah_v2 for a specific version. When writing a pipeline that should pin a specific version, you'd use the versioned identifier of the function name, and we'd be free to update/change the behaviour of functions.

And for the 'older' version of these functions we'd just move them to a 'graveyard' since we'll never touch them again.
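
A hypothetical sketch of the version dispatch (the function names and _vN convention are illustrative; compgen is a bash builtin):

```shell
# Two versions of the same query; callers without a version suffix get
# routed to the newest one.
query_blah-blah_v1() { echo "v1 behaviour"; }
query_blah-blah_v2() { echo "v2 behaviour"; }

resolve_query() {
	local name="$1" latest
	# List every query_<name>_vN function and pick the highest version.
	latest=$(compgen -A function "query_${name}_v" | sort -V | tail -n 1)
	"$latest"
}

resolve_query blah-blah   # v2 behaviour
```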

Fix server-groups-allocated-[cg]pu


@selten your new queries are taking more than 40 hours to execute; I can start to see why you added the feature to select which slurps you wanted.

We should try to find a way to fix this query so it can scale.

gxadmin report cwd <job_id>

A command that returns the job_working_directory would be pretty cool for debugging.

Alternatively, this could be added to gxadmin report job-info.

track special tool usage

Something like this would be useful to have, especially as an iquery:

gxadmin query queue-detail | awk -F '|' '$4 ~ "interactive_"' | wc -l

include a few invocation queries/reports

Given an invocation ID, it would be nice to have:

  • user
  • workflow_id
  • since when it is scheduled
  • in which history it is running, maybe even with a link to that history

report info about a dataset on disk

It would be nice to get some information about a dataset on disk. This could work with the UUID, e.g. a2fab2r9-a694-49f4-823f-74a52b3307aa.

report data-info or report uuid-info could be an entrypoint to get this data.

Add Linter for Tabs (EOF problem)

It took me some time to understand that my command failed because of spaces instead of tabs in the heredoc (EOF) block.
It would be nice to have a linter for this :)
