
biomake's People

Contributors

cmungall · ihh · sjackman


biomake's Issues

Inclusion of phony targets, and more informative error messages.

Hi,

I'm trying to build a more practical workflow for testing, but I'd like support for .PHONY targets:

# Programs
APP_FASTQC='/sw/apps/bioinfo/fastqc/0.11.5/milou/fastqc'
APP_KRAKEN='/sw/apps/bioinfo/Kraken/0.10.5-beta/milou/kraken'
APP_KAT='/sw/apps/bioinfo/KAT/2.1.1/milou/bin/kat'
DB_KRAKEN='/sw/data/uppnex/Kraken/latest'
# Filenames
FASTQ_FILES = bacteria_Str1_R1.fastq  bacteria_Str1_R2.fastq
FASTQC_FILES = $(FASTQ_FILES:.fastq=_fastqc.html)

fastqc : ${FASTQC_FILES}
        @echo "FastQC completed on all files"

%_fastqc.html : %.fastq
        @echo "FastQC -> $^"
        $(APP_FASTQC) -t 6 $^ 
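For reference, in GNU Make this would be declared with `.PHONY` (biomake would presumably need to recognize the same special target):

```make
# Mark fastqc as phony: always considered out of date, and never
# confused with a real file named "fastqc".
.PHONY: fastqc

fastqc : ${FASTQC_FILES}
	@echo "FastQC completed on all files"
```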

When I run biomake fastqc, I just get the message
fastqc FAILED
Could a more informative error message be given, please?

Also, when running biomake -d fastqc, I can't tell whether -d isn't working or whether the error simply has no message:

Warning: verbose: no matching debug topic (yet)
fastqc FAILED

Regards,
Mahesh.

DCGs declared in the prolog/endprolog block don't work

This doesn't work:

prolog
match_test --> ['t','e','s','t'].
endprolog

test { string_chars(TARGET,C), phrase(match_test,C) }:
	echo got here

It gives the following error:

Exception: error(existence_error(procedure,biomake:match_test/2),context($dcg:call_dcg/3,_6756))

This, however, works:

prolog
match_test(['t','e','s','t'],[]).
endprolog

test { string_chars(TARGET,C), phrase(match_test,C) }:
	echo got here

Not sure why; my understanding is that both versions should be theoretically identical, since a DCG rule like match_test --> ['t','e','s','t'] is expanded at load time into a match_test/2 clause. The existence error suggests the expanded clause isn't being asserted into the biomake module.

export requires a blank line after

export FOO=bar

all:
❯❯❯ biomake 2>&1 | head -n1
Exception: error(syntax_error(GNU makefile parse error at line 1 of file Makefile: export FOO=bar),_1534)

@ihh Help?

Problem with swi-prolog 8?

When I try to run biomake with SWI-Prolog 8.1.12, it does not work; I only get the usage message from SWI-Prolog:

` root@787e0d9980b6:/home# biomake hello.head
swipl: Usage:
1) swipl [options] prolog-file ... [-- arg ...]
2) swipl [options] [-o executable] -c prolog-file ...
3) swipl --help Display this message (also -h)
4) swipl --version Display version information
5) swipl --arch Display architecture
6) swipl --dump-runtime-variables[=format]
Dump link info in sh(1) format

Options:
-x state Start from state (must be first)
-g goal Run goal (may be repeated)
-t toplevel Toplevel goal
-f file User initialisation file
-F file Site initialisation file
-l file Script source file
-s file Script source file
-p alias=path Define file search path 'alias'
-O Optimised compilation
--tty[=bool] (Dis)allow tty control
--signals[=bool] Do (not) modify signal handling
--threads[=bool] Do (not) allow for threads
--debug[=bool] Do (not) generate debug info
--quiet[=bool] (-q) Do (not) suppress informational messages
--traditional Disable extensions of version 7
--home=DIR Use DIR as SWI-Prolog home
--stack_limit=size[BKMG] Specify maximum size of Prolog stacks
--table_space=size[BKMG] Specify maximum size of SLG tables
--shared_table_space=size[BKMG] Maximum size of shared SLG tables
--pce[=bool] Make the xpce gui available
--pldoc[=port] Start PlDoc server [at port]

Boolean options may be written as --name=bool, --name, --no-name or --noname`

This is using the latest version of the SWI-Prolog Docker container. If I load a container for swipl:7.7.25, then biomake works fine.

Explore ways in which to pass prolog clauses into Make syntax

Currently if you want to use LP extensions then you need to use prolog/Makeprog syntax for the entire Makefile.

There are a variety of ways to smuggle in prolog syntax to the Makefile, but these have varying degrees of elegance.

The most straightforward is to embed the prolog in comments, e.g.

# PROLOG: mammals <-- Deps, {findall( t([X,'.',Y,'.alignment']),  mammal_pair(X,Y),  Deps )}.

This should be straightforward to implement.

But this is a little unsatisfying, as it obscures the new target.

A more makefile-esque way might be:

MYDEPS := $prolog( findall( t([X,'.',Y,'.alignment']),  mammal_pair(X,Y),  % ) )

...

mammals: $(MYDEPS)

The challenge here is that Prolog doesn't return values; it unifies. So we need a way to state that the third argument of this findall/3 goal should be bound to the stated variable. We could just use a symbol like % as I have above, or some other convention.

Perhaps this is clearer if the pattern is specified in our $prolog(...) pseudo-make function:

MYDEPS := $prolog( %, findall( t([X,'.',Y,'.alignment']),  mammal_pair(X,Y),  % ) )

SLURM with multiple dependencies

There appears to be an errant space character before the comma in
--dependency=afterok:165337 ,afterok:165339

❯❯❯ biomake -Q slurm
…
Submitting job: sbatch -o /home/sjackman/work/redcedar/.biomake/slurm/out/all -e /home/sjackman/work/redcedar/.biomake/slurm/err/all   -n16 --dependency=afterok:165337 ,afterok:165339  --parsable /home/sjackman/work/redcedar/.biomake/slurm/script/all >/home/sjackman/work/redcedar/.biomake/slurm/job/all
sbatch: error: Unable to open file ,afterok:165339
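The fix is presumably just a matter of joining the dependency clauses with commas and no intervening spaces; sbatch otherwise treats the text after the space as a filename. A minimal sketch of the intended flag construction (the function name is hypothetical, not biomake's actual code):

```python
def slurm_dependency_flag(job_ids):
    """Build an sbatch --dependency flag from a list of job IDs.

    Clauses must be comma-separated with no spaces, otherwise sbatch
    parses the text after the space as a separate (file) argument.
    """
    clauses = ",".join("afterok:%s" % jid for jid in job_ids)
    return "--dependency=" + clauses

print(slurm_dependency_flag(["165337", "165339"]))
# --dependency=afterok:165337,afterok:165339
```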

Oddities with use of MD5 hashes instead of timestamps

Makefile-simple:

a:
	echo a > $@

b: a
	echo b > $@

first iteration works as expected:

$ biomake -f Makefile-simple -H b
 echo a > a
Target b not materialized - build required
echo b > b
b built

changing the timestamp but leaving contents unmodified for upstream target:

$ touch a
$ biomake -f Makefile-simple -H b
 echo a > a
b is up to date

This isn't what I expected: the contents of a are unchanged, so there should be no need to re-execute the command that makes a.

I am seeing similar behavior with -C. biomake seems to be regenerating the targets rather than simply recomputing the MD5, which is what I believe is intended.

I can look into this further.
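For what it's worth, the behaviour I expected is captured by this sketch (hypothetical helper names, not biomake's actual code):

```python
import hashlib
import os
import tempfile

def md5_of(path):
    """MD5 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def needs_rebuild(target, dep, hash_store):
    """Rebuild only if the dependency's *contents* changed.

    hash_store maps paths to the MD5 recorded at the last build; a
    touched-but-unmodified dependency should NOT trigger a rebuild.
    """
    if not os.path.exists(target):
        return True
    return md5_of(dep) != hash_store.get(dep)

d = tempfile.mkdtemp()
a, b = os.path.join(d, "a"), os.path.join(d, "b")
with open(a, "w") as f:
    f.write("a\n")
store = {a: md5_of(a)}   # hashes recorded at the first build
with open(b, "w") as f:
    f.write("b\n")
os.utime(a)              # like `touch a`: timestamp changes, contents do not
print(needs_rebuild(b, a, store))  # False: no rebuild needed
```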

Set environment variables with export foo=bar [feature request]

A GNU Makefile can set an environment variable like so:

export foo=bar
all:
	sh -c 'echo $$foo'
❯❯❯ make
sh -c 'echo $foo'
bar
❯❯❯ biomake
Exception: error(syntax_error(GNU makefile parse error at line 1 of file Makefile: export foo=bar),_1650)

More Prolog recipes for extending rules

The ability to add Prolog clauses to rules is the unsung feature of biomake: potentially quite powerful, but lacking clear use cases & recipes.

How about a few examples in the README? Using re_match in the rule, as you can now do because biomake includes pcre, is one quick & powerful way to use this feature. We could put this into the README along with other examples e.g.

  • rule only fires if TARGET file is writable
  • rule only fires if TARGET file is older than one week
  • rule only fires if all DEPS files are under 1Mb in size
  • results go in directory X unless source file is Y, in which case they go into Z (use prolog/endprolog with a little prolog database that specifies the results dir)
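For example, the "older than one week" rule could look something like this in biomake's Prolog-in-Makefile syntax (a sketch using the standard SWI-Prolog predicates time_file/2 and get_time/1; the exact brace placement should be checked against the README):

```
# Only fire if old.log is more than one week (604800 s) old
old.log.gz { exists_file('old.log'),
             time_file('old.log', T),
             get_time(Now),
             Now - T > 604800 }: old.log
	gzip -c old.log > old.log.gz
```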

Refactor the README into readthedocs

The README has gotten a bit lengthy, and the ordering is a bit odd: I think the Prolog exegesis should probably come after the more pragmatic material. I was about to do the reordering, but then thought we may as well go all the way to ReadTheDocs.

biomake -Q slurm prints an error when the subdirectory does not exist

Even though it prints this noisy error message, it seems to complete successfully. It looks like it may be failing to create the file foo/.biomake/slurm/job/bar.

foo/bar:
	mkdir -p $(@D)
	touch $@
$ biomake -Q slurm foo/bar
Target foo/bar not materialized - build required
Exception: error(existence_error(directory,/home/sjackman/tmp/foo/.biomake),context(system:make_directory/1,No such file or directory))
  [25] backtrace(99) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/library/prolog_stack.pl:444
  [24] prolog_exception_hook('<garbage_collected>',_528,539,501) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:107
  [23] <meta call>
  [22] make_directory("/home/sjackman/tmp/foo/.biomake") <foreign>
  [21] catch(utils:make_directory("/home/sjackman/tmp/foo/.biomake"),error(existence_error(directory,"/home/sjackman/tmp/foo/.biomake"),context(...,'No such file or directory')),utils:fail) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/init.pl:371
  [20] utils:safe_make_directory("/home/sjackman/tmp/foo/.biomake") at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:314
  [19] utils:biomake_make_subdir_list('/home/sjackman/tmp/foo','<garbage_collected>') at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:306
  [18] utils:biomake_private_filename_dir_exists('foo/bar',[slurm,"job"],_894) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:300
  [17] queue:run_execs_with_qsub(slurm,rb('foo/bar',[],true,["mkdir -p foo"|...],v(_1012,'foo/bar',[],...)),[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/queue.pl:59
  [15] biomake:dispatch_run_execs('<garbage_collected>',[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:468
  [14] biomake:run_execs_and_update('<garbage_collected>',[],'<garbage_collected>') at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:451
  [13] biomake:build('foo/bar',[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:164
  [11] '$apply':forall('<garbage_collected>',user:build(_1344,...)) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/apply.pl:51
  [10] build_toplevel([queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/cli.pl:41
   [9] main at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/cli.pl:17
   [8] '<meta-call>'('<garbage_collected>') <foreign>
   [7] catch(user:(...,...),_1568,'$toplevel':true) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/init.pl:371

Note: some frames are missing due to last-call optimization.
Re-run your program in debug mode (:- debug.) to get more detail.
Submitting job: sbatch -o /home/sjackman/tmp/foo/.biomake/slurm/out/bar -e /home/sjackman/tmp/foo/.biomake/slurm/err/bar     --parsable /home/sjackman/tmp/foo/.biomake/slurm/script/bar >/home/sjackman/tmp/foo/.biomake/slurm/job/bar
foo/bar queued for rebuild

Retain the job number

I'd like to retain a record of the job number, so that I can later use sacct -j ID to fetch the run time and memory usage for that job. Could the .biomake/slurm/job/ file be moved after the job completes, rather than being deleted? Perhaps moved from .biomake/slurm/job/ to .biomake/slurm/done/.

Observe the SHELL environment variable

Changing the shell to either bash -o pipefail or zsh -o pipefail is useful for catching a failed intermediate stage of a pipe. For example, by default cat non-existent-file | gzip >$@ succeeds because the gzip is successful, even though cat failed. Setting -o pipefail causes this pipe to fail correctly.
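The difference is easy to demonstrate directly from the shell (assuming bash is on the PATH and non-existent-file does not exist):

```shell
# Without pipefail, a pipeline's exit status is that of its LAST command,
# so gzip's success hides cat's failure.
bash -c 'cat non-existent-file | gzip > /dev/null; echo "exit: $?"'
# exit: 0

# With pipefail, any failing stage fails the whole pipeline.
bash -c 'set -o pipefail; cat non-existent-file | gzip > /dev/null; echo "exit: $?"'
# exit: 1
```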

It's also useful to use zsh to time every command that's run by biomake like so:

export SHELL=zsh -opipefail
export REPORTTIME=1
export TIMEFMT=time user=%U system=%S elapsed=%E cpu=%P memory=%M job=%J

Usually SHELL is used to change the shell from the default /bin/sh to, say, zsh -o pipefail, but here's a fun example changing it to python.

SHELL=python

all:
	print("1 + 2 = "); print(1 + 2)
❯❯❯ make
print("1 + 2 = "); print(1 + 2)
1 + 2 = 
3
❯❯❯ biomake
Target all not materialized - build required
print("1 + 2 = "); print(1 + 2)
sh: -c: line 0: syntax error near unexpected token `"1 + 2 = "'
sh: -c: line 0: `print("1 + 2 = "); print(1 + 2)'
While building all: Error 2 executing print("1 + 2 = "); print(1 + 2)

Consider eliminating the -l option

The -l option ("Iterates through directory writing metadata on each file found") is a little odd and dates back to the earliest (pre-GNU-compatible) version of biomake. I'm not sure what it's for; do we want to keep it, @cmungall? I'm adding more options through the CLI, single-character options are at a premium, and I don't think there's a good test for this option anyway.

Bio-Linux compatibility

I tried BioMake under Bio-Linux 8.0.8 (based on Ubuntu 14.04.5 LTS) but encountered problems:

download and install

root@wildcat:/usr/local/src/bioinformatics/BioMake#
git clone https://github.com/evoldoers/biomake.git
apt install swi-prolog
prolog --version
SWI-Prolog version 6.6.4 for amd64

test

root@wildcat:/usr/local/src/bioinformatics/BioMake/biomake#
make |& tee make.log
git clean -fd t/target || true
/usr/local/src/bioinformatics/BioMake/biomake/bin/swipl_wrap -q -t test -l prolog/test/test
ERROR: /usr/lib/swi-prolog/library/prolog_stack.pl:458:
dynamic/1: No permission to modify static procedure `prolog_exception_hook/4'
Defined at /usr/local/src/bioinformatics/BioMake/biomake/prolog/test/test.pl:23
[8] backtrace(99) at /usr/lib/swi-prolog/library/prolog_stack.pl:376
[7] prolog_exception_hook(error(...,...),_G810,97,none) at /usr/local/src/bioinformatics/BioMake/biomake/prolog/test/test.pl:25
[6]
[5] string_chars([70|...],_G867)
[4] announce([70|...]) at /usr/local/src/bioinformatics/BioMake/biomake/prolog/test/test.pl:319
[3] test at /usr/local/src/bioinformatics/BioMake/biomake/prolog/test/test.pl:36
ERROR: announce/1: Undefined procedure: string_chars/2
Exception: (5) string_chars([70, 65, 73, 76, 85, 82, 69, 32|...], _G775) ? EOF: exit

check versions of SWI-Prolog available in the Ubuntu repos

root@wildcat:/usr/local/src/bioinformatics/BioMake/biomake# apt-cache policy swi-prolog
swi-prolog:
Installed: 6.6.4-2ubuntu1
Candidate: 6.6.4-2ubuntu1
Version table:
*** 6.6.4-2ubuntu1 0
500 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
100 /var/lib/dpkg/status
5.10.4-3ubuntu1 0
500 http://archive.ubuntu.com/ubuntu/ precise/universe amd64 Packages

Any ideas?

Tony.

Cyclic dependency makes biomake sad

Typing biomake xyz with the following Makefile causes biomake to bail out with the error "Exceeds maximum length of dependency chain". GNU Make does not exhibit this behavior.

%: %.foo
	touch $@

plugin idea: automatic metadata annotation

Reproducibility and provenance are increasingly important.

Makefiles and Makefile-like solutions such as biomake help with reproducibility: if the recipe and input files are provided in a GitHub repo, then in theory it is easy to re-execute them and hopefully get the same answer.

However, if the final output files are submitted to a data repository, the provenance may not be immediately obvious. Initiatives such as BD2K are emphasizing the importance of metadata on all digital objects, which includes analysis results. Of course it is possible to manually annotate these artefacts, but why do that when it can be automated?

It should be possible for any file derived from biomake to immediately see a graph of objects used to derive it, together with complete metadata on each; this includes standard filesystem metadata e.g. timestamp but additional metadata too. See also https://github.com/W3C-HCLSIG/HCLSDatasetDescriptions

This may be a heavyweight feature so may be best implemented as some kind of plugin.

Fragile line number tracking in Makefile parser

The gnumake_parser.pl DCG attempts to track the number of lines of each grammar clause, to track the current line number of the file (for the purposes of error reporting). This cumulative approach to line-number counting is extremely fragile, and broken for some clauses. Most notably, newline characters in code_list and char_list (which are low-level primitives used all over the place in gnumake_parser.pl) are not tracked.

As a concrete example, biomake will report a parse error at line 5 of the following Makefile, when the error is actually at line 10:

$(warning This is a single line)

$(warning\
This\
is\
a\
split\
line)

BROKEN

test:
	echo OK

Arguably this is not a major problem, since it only occurs in certain function contexts, and not in the main contexts (recipes and assignments). However, it would be good to fix it, and have robust line-number reporting for parse errors.

One way to fix it would be to count newlines in char_list and code_list, and propagate them upwards through the DCG. That is probably the quickest way to fix things, but deepens the technical debt. The more robust way to handle this issue is to lex the Makefile before parsing it, adding in line-number tokens at the start of every line - but that would require more refactoring.

Question about job submissions on PBS using biomake

This is more of a usability question; I am just trying to learn how to use biomake with job submissions.

I have the following Makeprog file:

'$(Base).noheader.csv' <-- '../../withdrawls/$(Base).csv',
    'tail -n +2 ../../withdrawls/$(Base).csv > $(Base).noheader.csv'.

'$(Base).plain' <-- '$(Base).noheader.csv',
    'sed "s/,//g" $(Base).noheader.csv > $(Base).plain'.

'both.cat' <-- ['trip_quad.plain',
    'consent_withdrawn.plain'],
    'cat consent_withdrawn.plain trip_quad.plain > both.cat'.  %line 38 cat needs new line

'wocs.txt' <-- ['both.cat'],
    'uniq -u both.cat > wocs.txt'.

This works fine to concatenate two CSV files (consent_withdrawn.csv and trip_quad.csv) and turn them into a plain list file called wocs.txt. I can run this with:

biomake wocs.txt

I have tried to submit the job to our queuing system with:

biomake -Q pbs wocs.txt

This makes a set of jobs:

`newblue4:/var/spool/mail$ qstat -u myusername
master.cm.cluster:
Reqd Reqd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time

8867488.master.c alspacdata veryshor trip_quad.nohead -- -- -- -- 01:00:00 Q --
8867489.master.c alspacdata veryshor trip_quad.plain -- -- -- -- 01:00:00 H 01:00:00
8867490.master.c alspacdata veryshor consent_withdraw -- -- -- -- 01:00:00 Q --
8867491.master.c alspacdata veryshor consent_withdraw -- -- -- -- 01:00:00 H 01:00:00
8867492.master.c alspacdata veryshor both.cat -- -- -- -- 01:00:00 H 01:00:00
8867493.master.c alspacdata veryshor wocs.txt -- -- -- -- 01:00:00 H 01:00:00 `

But the jobs all get deleted, and I do not get any successful builds. For example, this message:

`
From [email protected] Tue Nov 5 17:31:41 2019
Return-Path: [email protected]
X-Original-To: alspacdata@newblue1
Delivered-To: [email protected]
Received: by newmaster2.acrc.bris.ac.uk (Postfix, from userid 0)
id CBD6932436E; Tue, 5 Nov 2019 17:31:41 +0000 (GMT)
To: [email protected]
Subject: PBS JOB 8867489.master.cm.cluster
Precedence: bulk
Message-Id: [email protected]
Date: Tue, 5 Nov 2019 17:31:41 +0000 (GMT)
From: [email protected] (root)

PBS Job Id: 8867489.master.cm.cluster
Job Name: trip_quad.plain
Aborted by PBS Server
Job deleted as result of dependency on job 8867488.master.cm.cluster
`

What am I doing wrong in the job submissions for the builds? How is this meant to work?

passing qsub/sbatch parameters to individual make rules.

Hi,

I couldn't find it in the documentation (or I just didn't understand it): how can one pass SLURM parameters to specific make rules? As I understand it, one currently passes sbatch parameters on the command line, but then all make targets are processed with those options. I don't see how one can, say, tell this rule to run with 4 cores and a time of 2-00:00:00, while this other rule should be submitted with -n 16 and -t 3-12:00:00. Is there some Prolog rule one can include with the make rule?

Regards,
Mahesh Binzer-Panchal.

Tag a stable release

There have been a number of improvements to Biomake since the last stable release, 0.1.3. Could you please tag a new stable release?

Implement multiple rules per target

See https://www.gnu.org/software/make/manual/html_node/Multiple-Rules.html

In the following example, biomake does not create the file data.tsv.

all: report.html

report.rmd:
	printf '```{r}\nlibrary(ggplot2)\nlibrary(readr)\nggplot(read_tsv("data.tsv"), aes(x=x, y=y)) + geom_point()\n```\n' >$@

data.tsv:
	printf "x\ty\n1\t1\n2\t4\n3\t9\n" >$@

%.html: %.rmd
	Rscript -e 'rmarkdown::render("$<", "html_document", "$@")'

report.html: data.tsv
❯❯❯ make -n
printf '```{r}\nlibrary(ggplot2)\nlibrary(readr)\nggplot(read_tsv("data.tsv"), aes(x=x, y=y)) + geom_point()\n```\n' >report.rmd
printf "x\ty\n1\t1\n2\t4\n3\t9\n" >data.tsv
Rscript -e 'rmarkdown::render("report.rmd", "html_document", "report.html")'
❯❯❯ biomake -n
  printf '```{r}\nlibrary(ggplot2)\nlibrary(readr)\nggplot(read_tsv("data.tsv"), aes(x=x, y=y)) + geom_point()\n```\n' >report.rmd
 Rscript -e 'rmarkdown::render("report.rmd", "html_document", "report.html")'
Target all not materialized - build required
❯❯❯ biomake
…
Error: 'data.tsv' does not exist in current working directory

Execution halted
While building report.html: Error 1 executing Rscript -e 'rmarkdown::render("report.rmd", "html_document", "report.html")'

make test succeeds even if tests fail

We have Travis checks now; see #6.

But it seems make test always passes, even if some tests fail.

Also, swipl is forgiving of compilation errors like this:

ERROR: /Users/cjm/repos/plmake/prolog/biomake/biomake.pl:27:
        Exported procedure parser_utils:to_strings/2 is not defined

md5 is BSD specific

The md5 tests run md5 on the command line. This seems specific to BSD/macOS; Ubuntu has md5sum (not sure if the args are the same). This causes the Travis jobs to fail on 79.

Not sure whether the best option is to include a wrapper script and/or make the code test for the availability of either and use whichever is installed. Thoughts, @ihh?
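One possible wrapper, sketched below; `md5 -q` is the BSD/macOS flag for bare-hash output, and I've only thought this through, not tested it on both platforms:

```shell
# Print just the 32-char MD5 hex digest of stdin, whichever tool is installed.
portable_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | cut -d' ' -f1        # GNU coreutils (Linux)
    else
        md5 -q                        # BSD / macOS
    fi
}

printf 'hello' | portable_md5
# 5d41402abc4b2a76b9719d911017c592
```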

Mirror S3 bucket using aws s3 sync

Keep it simple:

  • biomake --sync s3://mybucket/my/path
    • aws s3 sync --delete s3://mybucket/my/path . before building any targets
    • aws s3 sync --delete . s3://mybucket/my/path after building all targets
  • later can add other URIs to --sync

Prefer more specific rules over more general ones

When multiple pattern rules exist, GNU Make prefers the more specific rule over more general ones, that is to say, the pattern rule with the shortest stem (the shortest string matched by %). See the last few sentences of https://www.gnu.org/software/make/manual/html_node/Pattern-Match.html

In the following example, GNU make uses the combined sort | gzip rule, whereas biomake uses the separate sort and gzip rules.

all: words.sort.txt.gz

.SECONDARY:

words.txt: /usr/share/dict/words
	cp $< $@

%.sort.txt: %.txt
	sort -o $@ $<

%.txt.gz: %.txt
	gzip -c $< >$@

%.sort.txt.gz: %.txt
	sort $< | gzip >$@
❯❯❯ make -n
cp /usr/share/dict/words words.txt
sort words.txt | gzip >words.sort.txt.gz
❯❯❯ biomake -n
   cp /usr/share/dict/words words.txt
  sort -o words.sort.txt words.txt
 gzip -c words.sort.txt >words.sort.txt.gz
Target all not materialized - build required

The workaround in this case is to put the %.sort.txt.gz: %.txt rule above the other two rules.
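The GNU-compatible selection logic is simple to state: among all matching pattern rules, pick the one whose stem (the text matched by %) is shortest. A sketch of that selection (hypothetical helpers, not biomake's code):

```python
def stem(pattern, target):
    """Return the text matched by % if pattern matches target, else None."""
    prefix, _, suffix = pattern.partition("%")
    if (target.startswith(prefix) and target.endswith(suffix)
            and len(target) > len(prefix) + len(suffix)):  # stem must be non-empty
        return target[len(prefix):len(target) - len(suffix)]
    return None

def best_rule(patterns, target):
    """Pick the matching pattern with the shortest stem, GNU Make style."""
    matches = [(p, stem(p, target)) for p in patterns]
    matches = [(p, s) for p, s in matches if s is not None]
    return min(matches, key=lambda ps: len(ps[1]))[0] if matches else None

rules = ["%.txt.gz", "%.sort.txt.gz"]
print(best_rule(rules, "words.sort.txt.gz"))  # %.sort.txt.gz (stem "words")
```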

biomake does not actually abort on ^C.

It is incredibly difficult to get biomake to exit on SIGINT (^C). You basically have to kill it, along with all of the subprocesses biomake spawns.

Presumably there is some way to override Prolog's interrupt handler so that it exits sanely.

plugin/library idea: parsers and/or sniffers for common bioinformatics file formats

It would be useful to be able to auto-detect file formats in a Makefile, and condition the rules on the formats of the files rather than just relying on the filenames.

This could be achieved by including a library of parsers for standard formats (GFF, FASTA, Newick, etc.). Some of these are already in blipkit...

Of course, parsing a big file every time it tries to match a rule would probably make Biomake grind to a halt. Maybe there is a way to optimize it, though, e.g. caching the result as metadata in the .biomake directory.
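A cheap version of this already helps: sniff the format from the first non-blank line rather than fully parsing the file. The format names and heuristics below are assumptions for illustration, not an existing biomake API:

```python
def sniff_format(first_line):
    """Guess a bioinformatics file format from its first line."""
    s = first_line.lstrip()
    if s.startswith(">"):
        return "fasta"
    if s.startswith("@"):
        return "fastq"   # could also be a SAM header; a real sniffer reads more
    if s.startswith("##gff-version"):
        return "gff"
    if s.startswith("#NEXUS"):
        return "nexus"
    if s.startswith("("):
        return "newick"
    return "unknown"

print(sniff_format(">seq1 description"))  # fasta
print(sniff_format("##gff-version 3"))    # gff
```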

Pattern rules with multiple targets

Pattern rules may have more than one target. Unlike normal rules, this does not act as many different rules with the same prerequisites and recipe. If a pattern rule has multiple targets, make knows that the rule’s recipe is responsible for making all of the targets. The recipe is executed only once to make all the targets. When searching for a pattern rule to match a target, the target patterns of a rule other than the one that matches the target in need of a rule are incidental: make worries only about giving a recipe and prerequisites to the file presently in question. However, when this file’s recipe is run, the other targets are marked as having been updated themselves.

See https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html

In this example, the command touch ook.a ook.b should be run only once. biomake runs it twice.

all: foo bar ook.ab

foo bar ook:
	touch $@

%.a %.b: %
	touch $*.a $*.b

%.ab: %.a %.b
	cat $^ >$@
❯❯❯ make -n
touch foo
touch bar
touch ook
touch ook.a ook.b
cat ook.a ook.b >ook.ab
❯❯❯ biomake -n
 touch foo
 touch bar
   touch ook
  touch ook.a ook.b
  touch ook.a ook.b
 cat ook.a ook.b >ook.ab

SnakeMake parser

Implement a parser to convert SnakeMake rules into prolog rules, as the current GNU makefile parser does for GNU makefiles (and the trivial prolog parser for prolog).

Implement by changing shell to python, via #54

Use If-Modified-Since & Content-MD5 HTTP headers to examine remote dependencies

In thinking about using biomake to build a continuous-integration data aggregation pipeline (@cmungall), I'm imagining a model for database adapters that basically gives you a macro of the form:

BIOMAKE_CURL(downloaded_file_path,url_of_file)

At a crude level this can just be imagined as expanding into the following Makefile recipe (indeed, one could retain legacy compatibility if using this with GNU Make by defining BIOMAKE_CURL to expand into something like this):

downloaded_file_path:
    curl --output $@ url_of_file

However, behind the scenes, biomake will attempt to propagate the dependency check across the connection, either by

  • using the HTTP If-Modified-Since header with a GET method to check if the remote file has a later modification time than the local copy, or
  • (if running with --md5-hash) using a HEAD method to retrieve the HTTP header, and comparing the Content-MD5 header field to the locally stored MD5 hash.

The Content-MD5 idea is a bit dicey because it may not be well-supported (e.g. Apache can do it but only by computing the MD5 hash every time; it doesn't cache it). We could pretty easily whip up a node-express plugin that would cache the hash, I expect.
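The If-Modified-Since check needs no server cooperation beyond a correct 304 response. A sketch of the client side using only the standard library (the function is illustrative, not biomake code, and the example makes no network call):

```python
import email.utils
import os
import tempfile
import urllib.request

def conditional_request(url, local_path):
    """Build a GET request asking the server to skip the body (HTTP 304)
    unless the remote file is newer than our local copy."""
    req = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # RFC 7232 wants an RFC 1123 date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT"
        req.add_header("If-Modified-Since",
                       email.utils.formatdate(mtime, usegmt=True))
    return req

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
req = conditional_request("http://example.com/data.fa", tmp.name)
print(req.get_header("If-modified-since"))  # an RFC 1123 date ending in "GMT"
```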

Biomake currently expands % characters in recipe bodies

These should only be expanded in the target & dependency lists (if the stem is required in the recipe body, the user can use $*).

Can probably fix this with a minor variation to the toks DCG in biomake.pl - create a variant of this for expanding exec rules
