sakerbuild / saker.build
A modern build system focusing on fast and incremental builds for all project sizes
Home Page: https://saker.build
License: GNU General Public License v3.0
This issue serves as a place of discussion for the build trace related implementation.
A build trace is a collection of data about a build execution. It collects information about various aspects of the build and presents it to the user in a way that allows easier debugging of performance and build issues.
The build trace should provide the following information:
Implementation
The build trace implementation should consist of two parts.
If the build is configured to have the build executor machine present as a cluster, then the build may deadlock, as the machine will not be able to initialize as a cluster.
This is caused by the common locking scheme that is used for both execution and cluster initialization. The build actually finishes successfully; however, the cluster initialization of the executor machine halts.
Don't configure the executor machine as a cluster.
Fix the locking scheme in the SakerProjectCache class.
The deadlock detection mechanism of saker.build depends on the ThreadGroup API. The code itself is not robust, as lingering threads may cause a deadlock to go unnoticed. Additionally, issue #2 is also a side effect of the current mechanism.
As per https://mail.openjdk.java.net/pipermail/loom-dev/2020-July/001471.html, the intention is for the ThreadGroup API to be deprecated over time. It also involves heavy synchronization and may not be reliable enough for deadlock detection. The solution is to revise the current implementation and come up with a different mechanism for deadlock detection. This may include placing additional restrictions on the build tasks.
One solution might be to only allow waiting for other tasks on the main thread of a task. The main thread is the one that Task.run() is invoked on. This could make deadlocks easier to detect, as only a single thread per task needs to be kept in check.
In order for this to work, additional APIs may need to be added to allow waiting for multiple tasks at once. Existing task implementations also need to be checked and tested for conformance.
With this solution the build will be considered as deadlocked if all running main threads are in waiting state.
This would also make the deadlock detection faster, as no polling would be required: the deadlock would be detected instantaneously when the last thread enters the waiting state.
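The counting scheme described above could look roughly like the following minimal sketch. It assumes that every task main thread reports when it starts, finishes, begins waiting and stops waiting. The class and method names are hypothetical and are not part of the saker.build API. Wake-up ordering and inner tasks are deliberately left out.

```java
// Hypothetical detector (not a saker.build API): counts the running task main
// threads and how many of them are currently blocked waiting for other tasks.
final class MainThreadDeadlockDetector {
    private int running;
    private int waiting;
    private boolean deadlocked;

    synchronized void taskStarted() {
        running++;
    }

    synchronized void taskFinished() {
        running--;
    }

    // Called by a task main thread right before it blocks on other tasks.
    // No polling is involved: the check happens exactly when a thread starts
    // waiting, so the last running thread entering this method trips it.
    // Returns whether the build is deadlocked after this call.
    synchronized boolean enterWaiting() {
        waiting++;
        if (waiting == running) {
            deadlocked = true;
        }
        return deadlocked;
    }

    // Called when the awaited tasks have finished and the thread resumes.
    synchronized void exitWaiting() {
        waiting--;
    }

    synchronized boolean isDeadlocked() {
        return deadlocked;
    }
}
```

A real implementation would also need to treat the main threads of inner tasks the same way, as the issue notes.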
The solution would still allow retrieving task results without waiting on worker threads, e.g. getFinished should work.
Inner task threads also need to be checked for deadlock. Starting new tasks should also be constrained to the main task threads or main inner task threads.
Saker.build primarily uses case-sensitive representation of files and their hierarchies. This may cause some synchronization or other file management issues when dealing with case-insensitive file systems.
In general, we expect developers to use proper casing when referencing files. That is, they should reference files in a case-sensitive way on case-insensitive file systems as well. A case-insensitive file conflict during the build is an acceptable error.
However, there may be issues when developers change the capitalization of files, thereby triggering build tasks and various other file related operations (e.g. synchronization). These effects and the possible scenarios should be examined as part of this issue.
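Files whose names differ only in letter casing could be grouped with a helper like the one below. This would make a case-insensitive conflict easy to report as an error. The sketch is hypothetical. Lower-casing with `Locale.ROOT` only approximates real file system rules, because NTFS and APFS have their own case-folding behaviour.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: finds paths that differ only in letter casing, i.e. paths
// that would refer to the same file on a case-insensitive file system.
final class CaseConflicts {
    static Map<String, Set<String>> find(List<String> paths) {
        Map<String, Set<String>> byFolded = new TreeMap<>();
        for (String path : paths) {
            // Locale.ROOT avoids locale-specific mappings such as the Turkish dotless i.
            String folded = path.toLowerCase(Locale.ROOT);
            byFolded.computeIfAbsent(folded, k -> new TreeSet<>()).add(path);
        }
        // Keep only the groups that contain more than one distinct spelling.
        byFolded.values().removeIf(spellings -> spellings.size() < 2);
        return byFolded;
    }
}
```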
Feature description
Tasks in the build script could be declared relative to the enclosing task context. This should reduce the boilerplate surrounding a task invocation. E.g.:
some.task(
Configuration: .config(data),
# note the two dots
Parameter: ..foo.bar(baz),
)
The above would be equivalent to:
some.task(
Configuration: some.task.config(data),
Parameter: some.foo.bar(baz),
)
Starting a task declaration with a dot can mean that the following task identifier is to be interpreted against the enclosing task.
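The example suggests a concrete rule. A single leading dot resolves against the full enclosing name, so `some.task` plus `.config` becomes `some.task.config`. Each additional dot drops one trailing component, so `..foo.bar` becomes `some.foo.bar`. The hypothetical resolver below assumes that reading and rejects names with more dots than the enclosing name can absorb.

```java
import java.util.Arrays;

// Hypothetical resolver for the proposed leading-dot syntax.
final class RelativeTaskNames {
    // Assumed rule, inferred from the example: a single leading dot appends the
    // rest of the name to the full enclosing task name, and every additional dot
    // first removes one trailing component of the enclosing name.
    static String resolve(String enclosingTask, String name) {
        int dots = 0;
        while (dots < name.length() && name.charAt(dots) == '.') {
            dots++;
        }
        if (dots == 0) {
            return name; // already fully qualified, nothing to resolve
        }
        String rest = name.substring(dots);
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("Missing task name after dots: " + name);
        }
        String[] components = enclosingTask.split("\\.");
        int keep = components.length - (dots - 1);
        if (keep < 1) {
            throw new IllegalArgumentException("Too many leading dots in: " + name);
        }
        return String.join(".", Arrays.copyOf(components, keep)) + "." + rest;
    }
}
```

Because this is purely a rewrite of the identifier before lookup, it stays a syntactic enhancement: name resolution itself is unchanged.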
Workarounds
Use fully qualified task names.
Use-case
For long task names this can reduce the surrounding boilerplate. This is most prevalent when tasks are configured with other tasks that share part of the same name.
Non-goals
The solution should not take multiple levels of task declarations into account. The solution should only be a syntactic enhancement, and the behaviour of actual task invocation and name resolution should not change.
Other aspects
We should also examine how this feature integrates with task names: whether we should allow multiple dots at the start of task identifiers, and what the limit should be. In any case, we may not want to allow too many preceding dots, as that would hurt readability.
Is there a chance to change the license of Saker and related projects to a permissive license such as AL2.0 and/or MIT?
Given the following scenario:
The build execution can be stopped by manually interrupting the execution thread. This can be done inside an IDE; however, it's unclear how to do it for command line execution.
This behaviour can be disruptive for CI builds, as the build will never stop.
In general, advising task authors to wait for their input tasks first and do the work last should be enough. This is already the recommended workflow for build task implementations, so there won't be much change.
Another solution is to allow the above transitive waiting, but require build task authors to delegate the Thread.join call through the build system. In this case we can detect the number of waiting threads.
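Delegating the join could be as simple as the wrapper below, which counts threads while they are blocked. `BuildThreadJoiner` is a hypothetical name, not an existing saker.build API. Tasks would call it instead of `Thread.join()` directly.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical build-system helper: tasks call join(...) here instead of
// Thread.join() directly, so the build system knows how many threads wait.
final class BuildThreadJoiner {
    private final AtomicInteger waitingThreads = new AtomicInteger();

    void join(Thread thread) throws InterruptedException {
        waitingThreads.incrementAndGet();
        try {
            thread.join();
        } finally {
            // Always decrement, even if the wait was interrupted.
            waitingThreads.decrementAndGet();
        }
    }

    int waitingCount() {
        return waitingThreads.get();
    }
}
```

The waiting count could then be compared against the number of running task threads, in the same way as for task waits.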
There's a chance that this issue may remain open for a prolonged amount of time and be a known bug of saker.build. Generally, this is a rarely occurring bug that can be mitigated by proper build task implementations. Delegating the join through the build system is still a viable partial solution with a high chance of being implemented.
We should be able to call a build target in the same file directly as a build task rather than using the include task.
That is, the following:
build() {
$compile = include(compile, Input: 123)
}
compile(
in input
) {
# ...
}
Should turn into this:
build() {
$compile = compile(Input: 123)
}
compile(
in input
) {
# ...
}
Reasoning
It's much simpler. The intention is clear, and it can't really be confused with tasks that come from other places.
We already reserve task names that consist of only a single component for the scripting language. This enhancement takes advantage of this, as it automatically makes the declared build targets directly callable.
Conflicts
If the user declares a build target whose name conflicts with an existing builtin task, then name resolution would be ambiguous.
In this case the build target should not shadow the builtin task. That is, builtin tasks cannot be replaced by a build target.
In cases where the user declares a build target with the same name as a builtin task, a warning can be emitted.
Reasons:
The main reason why we don't allow shadowing builtin tasks is that modifying one part of the build script should not affect the behaviour of another part without clear indication. This makes the build scripts more maintainable.
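The precedence rule can be sketched as a small resolver for single-component names. Builtin tasks always win. Build targets are only considered otherwise, and each colliding declaration produces a warning. The class and the warning text are hypothetical. `include` and `print` in the test are builtin tasks that appear elsewhere in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical resolver for single-component task names in a build script.
final class SingleNameResolver {
    enum Kind { BUILTIN_TASK, BUILD_TARGET }

    private final Set<String> builtinTasks;
    private final Set<String> buildTargets;

    SingleNameResolver(Set<String> builtinTasks, Set<String> buildTargets) {
        this.builtinTasks = builtinTasks;
        this.buildTargets = buildTargets;
    }

    // One warning per build target that has the same name as a builtin task.
    List<String> shadowingWarnings() {
        List<String> warnings = new ArrayList<>();
        for (String target : new TreeSet<>(buildTargets)) {
            if (builtinTasks.contains(target)) {
                warnings.add("Build target '" + target
                        + "' has the same name as a builtin task; direct calls resolve to the builtin.");
            }
        }
        return warnings;
    }

    Kind resolve(String name) {
        // Builtin tasks take precedence, build targets never shadow them.
        if (builtinTasks.contains(name)) {
            return Kind.BUILTIN_TASK;
        }
        if (buildTargets.contains(name)) {
            return Kind.BUILD_TARGET; // behaves like include(name, ...)
        }
        throw new IllegalArgumentException("Unknown task: " + name);
    }
}
```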
CI builds are often performed from a clean state and rarely use incremental builds. This means there is no reason for the build system to store and persist the incremental state between builds, as it isn't going to be used anyway.
There should be an option that instructs the build system to discard the incremental state created during the build, not write it to disk, and not attempt to load it at the start of the build. This could improve build performance and somewhat lower memory usage.
This configuration option should also be available for build tasks to query, as they can optimize their own behaviour based on it.
It is important to note that the build system shouldn't automatically configure itself for CI mode by querying environment variables or similar sources. This option should be explicit.
Relevancy
This option could increase the performance of the saker.java.test task, as it would need less instrumentation to run. It would still need some instrumentation, as file synchronization needs to be performed for the test cases. That could also be avoided if there was an option that marks the tests as not using files. In that case, all instrumentation could be avoided.
This issue serves as a place of discussion for the build cache related implementation.
As of the current state (2020.01.12.), there is a basic implementation of the build cache that passes tests. It is not finished: the build daemons don't support this feature yet, and there is no persistence behind the build caches. There is a memory based implementation that is used for testing only.
For the implementation, we should consider the following.
Task requirements
Evaluate what requirements we impose on tasks for them to be cacheable.
Communication
Communication with the build cache can be done in two ways. Either using the saker.rmi library, or using a more common protocol.
Since using the build cache for purposes other than with the saker.build system is not a design goal, the saker.rmi solution seems more appropriate.
Either way, the build cache will be accessible through an abstract interface and the protocol could be replaced without disruption.
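The abstract interface could be as small as a lookup and a publish operation over byte blobs, with the in-memory variant used for tests. A saker.rmi or other protocol-based client would then be just another implementation. The sketch below is hypothetical. The interface name, the String keys and the method signatures are assumptions for illustration, not the existing saker.build types.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical protocol-independent build cache abstraction working with byte blobs.
interface BuildCache {
    Optional<byte[]> lookup(String key);

    void publish(String key, byte[] data);
}

// In-memory implementation, suitable for tests only (no persistence).
final class InMemoryBuildCache implements BuildCache {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<byte[]> lookup(String key) {
        byte[] data = entries.get(key);
        // Return a copy so callers cannot mutate the stored blob.
        return data == null ? Optional.empty() : Optional.of(data.clone());
    }

    @Override
    public void publish(String key, byte[] data) {
        entries.put(key, data.clone());
    }
}
```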
Security
The build caches will usually run on a shared server that is accessible from outside. There needs to be some security measures that ensure that only the authorized clients can access data from the cache, and only the authorized clients can publish to the build cache.
The authorization could be implemented using certificates that will be used for an SSL connection with the build cache server. The server examines this certificate, and provides access to the features that the client is allowed to use.
The certificates don't need to be issued by a known provider; they can be managed in-house by the maintainer of the build cache. In general, there should be read and write certificates that the server recognizes.
Read certificates can be used to download content from the build cache, but don't allow publishing. These can be used by the developers. The write certificate allows publishing to the cache. It should be used on CI servers that publish results to the cache. These results can later be retrieved by the clients of the cache.
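On the server side, the read/write split could be modelled as a table from certificate fingerprint to granted permissions, consulted after the TLS handshake has verified the client certificate. This is a hypothetical sketch. A fingerprint could, for example, be the SHA-256 digest of `Certificate.getEncoded()` computed with `MessageDigest`. Whether a write certificate also implies read access is left to the configuration.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical authorization table of a build cache server, keyed by the
// fingerprint of an already verified client certificate.
final class CacheAuthorizer {
    enum Permission { READ, WRITE }

    private final Map<String, Set<Permission>> byFingerprint;

    CacheAuthorizer(Map<String, Set<Permission>> byFingerprint) {
        this.byFingerprint = byFingerprint;
    }

    // Unknown certificates get no permissions at all.
    boolean isAllowed(String fingerprint, Permission permission) {
        Set<Permission> granted = byFingerprint.getOrDefault(fingerprint, EnumSet.noneOf(Permission.class));
        return granted.contains(permission);
    }
}
```

A developer certificate would map to `READ` only, while a CI certificate might map to both `READ` and `WRITE`.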
Performance
We need to determine the suitable use cases for the build cache during build execution. If the build cache is contacted for small incremental changes, it can degrade performance. However, if we only use the build cache for clean project builds, then it may be used too rarely to provide an advantage.
This part is open for discussion. We should probably use heuristics to decide when to try the cache.
Settings
In order for the build cache to work, one very likely needs hash-based file change tracking.
When a build is run with a build cache, we could change the default mechanism to be hash-based instead of relying on file attributes. The user could still override it; only the default would change.
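Hash-based tracking matters for a shared cache because file attributes such as modification times usually differ between a CI checkout and a developer machine. The content hashes of identical files, by contrast, match. The following sketch of content-based change detection uses only standard JDK APIs. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of hash-based change detection: a file counts as changed only if its
// SHA-256 digest differs, regardless of its modification time.
final class ContentHashes {
    static byte[] sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }

    static boolean changed(byte[] previousHash, Path file) throws IOException {
        return !MessageDigest.isEqual(previousHash, sha256(file));
    }
}
```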
Persistence
The published build cache data should be persisted by the server. The frequently queried data could be kept in memory.
A mechanism for efficient lookup and storage should be implemented. The build cache works with byte blobs rather than structured data. We could either use some third-party library/software, or implement our own solution. Using third-party software may impose license and other maintenance related restrictions on the build system.
Some resources of the classpaths loaded by the build system are not released. This may include script and repository classpaths.
Some tests of the build system sometimes fail when the loaded classpath metric is asserted. Each test should clean up the loaded classpaths after itself; however, the tests still fail occasionally.
The tests are flaky; they don't always fail. This might be an issue with the test metric instrumentation, or with the classpath loading.
This is not a breaking issue for using the build system. If confirmed, it may cause higher resource usage for long running daemons, or locked file-system resources if classpaths are loaded from JARs.
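For JAR-based classpaths, releasing the resources mostly means closing the class loader. `URLClassLoader` implements `Closeable` since Java 7, and closing it releases the JAR files it opened. On Windows, a JAR that is still open typically cannot be deleted or replaced. The hypothetical handle below pairs this with a simple open-count metric that a test could assert to be zero after cleanup.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper around a loaded classpath that must be closed when unloaded.
final class ClassPathHandle implements AutoCloseable {
    // Handles opened but not yet closed; a test can assert this is zero at the end.
    private static final AtomicInteger OPEN = new AtomicInteger();

    private final URLClassLoader loader;
    private final AtomicBoolean closed = new AtomicBoolean();

    ClassPathHandle(URL[] classPath, ClassLoader parent) {
        this.loader = new URLClassLoader(classPath, parent);
        OPEN.incrementAndGet();
    }

    ClassLoader loader() {
        return loader;
    }

    @Override
    public void close() throws IOException {
        // Closing twice must not decrement the metric twice.
        if (closed.compareAndSet(false, true)) {
            OPEN.decrementAndGet();
            // Releases the JAR files opened by this loader.
            loader.close();
        }
    }

    static int openCount() {
        return OPEN.get();
    }
}
```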
When can this issue be closed if it is fixed?
Well, let's say 2 months after the fix, if the issue no longer surfaces. Due to the flakiness of the bug, we can't reliably determine whether it has actually been fixed, or create a test that reproduces it.
If an input parameter is specified for a build task, and that task doesn't recognize that parameter, a warning should be issued during build.
This should be done by enhancing the TaskUtils.initParametersOfTask method to warn about the unrecognized parameters.
This enhancement is only applicable if a task uses the default parameter initialization behaviour provided by TaskUtils (i.e. the @SakerInput and related annotations). If a task initializes its parameters manually by overriding ParameterizableTask.initParameters, then it is the responsibility of the task to issue warnings.
The issued warnings should report the position of the build task in the script.
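The check itself reduces to a set difference between the given and the declared parameter names. In the hypothetical sketch below, the declared names stand for whatever `TaskUtils` collects from the `@SakerInput` annotations. The actual `initParametersOfTask` signature is not reproduced here. The position string imitates the `saker.build:line:column` format seen in build traces.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical check comparing call-site parameter names with the declared ones.
final class UnrecognizedParameters {
    static List<String> warnings(String taskName, String scriptPosition,
            Set<String> declaredNames, Collection<String> givenNames) {
        List<String> warnings = new ArrayList<>();
        for (String given : givenNames) {
            if (!declaredNames.contains(given)) {
                // Report the script position so the user can find the invocation.
                warnings.add(scriptPosition + ": Unrecognized parameter '" + given
                        + "' for task " + taskName);
            }
        }
        return warnings;
    }
}
```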
There are many cases where configuration may be repeated. In build scripts where the same tasks are invoked multiple times with slightly different inputs, some task inputs need to be duplicated.
An example:
saker.maven.resolve(
Dependencies: #...
Configuration: {
Repositories: #...
}
)
saker.maven.resolve(
Dependencies: # different from above
Configuration: # same as above, but repeated
)
A quick solution for this is to export the value of the recurring configuration into a static or global variable:
static(THE_CONFIGURATION) = # ...
saker.maven.resolve(
Dependencies: #...
Configuration: static(THE_CONFIGURATION)
)
saker.maven.resolve(
Dependencies: # different from above
Configuration: static(THE_CONFIGURATION)
)
This works, but developers still need to assign the same parameter for each task invocation. This makes the build script harder to write and maintain.
A solution should be devised to avoid repeating the same inputs for the tasks.
Option 1: Add support for this in the core build system. This could cross scripting language boundaries, and needs a unified way to define default inputs for given tasks.
This is harder to achieve, as it may limit the development of different script languages.
Option 2: Add support in each scripting language. This has the advantage that the default values can be handled by the scripting language implementation, so the values can be defined in a way consistent with the script language.
The location of the build script that contains the default values could be set with a scripting option. This allows full customization from the build language side.
Go with option 2: each build language should implement the defaults support itself.
For SakerScript:
A new script option is added that specifies the path to the defaults file.
E.g.
-SOdefaults.file=defaults.build
The option value is interpreted relative to the working directory of the build execution.
The defaults file should be part of the script language configuration. That is, the language configuration wildcard should match the defaults file as well.
The syntax of the defaults file is similar to normal build script files, with the following additional rules:
The defaults() build task is available. See below.
defaults() task
This built-in task can only be used in the defaults build file.
This task has an unnamed first parameter that is a literal task name or a list of literal task names that the defaults apply to. Each additional parameter specifies the default value of that parameter for the given tasks.
The task can only be used as a top level expression. That is, it cannot be used in ifs, foreaches, or as a nested expression. It has no return value.
The defaults file:
defaults(
my.task,
Parameter: val1,
Default2: 123
)
With a build file:
my.task(
Second: foo,
Default2: abc
)
is equivalent to writing the build file:
my.task(
Second: foo,
Default2: abc,
Parameter: val1
)
Any parameter that was set as default in the defaults file can be overridden at the call site.
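The merge rule is that call-site parameters are kept as written and a default is only added when the call site doesn't set it. It can be expressed with an insertion-ordered map, which also reproduces the parameter order of the example above. This is a hypothetical sketch that models parameters as a plain Java map. It is not how the SakerScript implementation represents them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of applying defaults: call-site parameters win, defaults fill the gaps.
final class TaskDefaults {
    static Map<String, Object> apply(Map<String, Object> defaults, Map<String, Object> callSite) {
        // Start from the call site so its values and ordering are preserved.
        Map<String, Object> result = new LinkedHashMap<>(callSite);
        for (Map.Entry<String, Object> e : defaults.entrySet()) {
            // Only added if the call site didn't specify this parameter.
            result.putIfAbsent(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

With the defaults `Parameter: val1` and `Default2: 123` and the call site `Second: foo` and `Default2: abc`, the result is `Second: foo`, `Default2: abc`, `Parameter: val1`.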
This should not require any modification to the build system API. The defaults file should be read as part of a new internal build task. This ensures that a single defaults file is read at most once per build execution, and is not re-read in case of incremental builds.
The defaults file should report no build targets.
The script positions in the defaults file should be correctly reported in case of exceptions.
The script modelling implementation should be modified to add proposals for the newly added defaults() task, but only in the defaults file. Information retrieval for the defaults() task should be available in non-defaults files as well.
Add syntax for switch expressions in build scripts.
One commonly occurring scenario is to assign some variable or perform some action based on the value of another variable. Currently this can be done using if-else statements:
$platform = # ...
if $platform == win32 {
# ...
} else if $platform == macos || $platform == ios {
# ...
} else {
abort("Unrecognized platform { $platform }")
}
This is hard to maintain, as the variable name needs to be repeated, and it is uncomfortable to edit.
This can be solved by introducing switch expressions.
switch $platform {
case win32 {
# ...
}
case macos, ios {
# ...
}
default {
abort("Unrecognized platform { $platform }")
}
}
There is no fallthrough between the case blocks.
Multiple values to match for can be separated using a comma.
The statement checks the value of the subject for equality with each case label in order, and executes the first one that matches.
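The execution order is the following:
The case labels are checked in order, and the block of the first matching case is executed.
If no case matches, the default branch is executed, if present.
Further improvements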
The switch expression can also be improved to return a value. In this case the block may be omitted, and the returned value follows a separating colon (:), similarly to the foreach expression:
$value = switch $platform {
case win32: 123
case macos, ios: 456
default: 999
}
It is not a syntax error to omit the result value for a given label; in that case the expression returns no value, which results in an exception when it is dereferenced.
The result value can be combined with blocks:
$value = switch $platform {
case win32 {
print("platform is win32")
}: 123
case macos, ios: 456
default {
abort("Unrecognized platform { $platform }")
}
}
Essentially, both the block and the result value are optional, but at least one of them must be present (similarly to foreach).
No local variables can be declared for switch expression blocks. (As they don't run in a loop, target-level variables can be used.)
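The proposed semantics can be modelled in Java to pin them down. Cases are checked in declaration order, only the first match runs, and there is no fallthrough. A branch without a result yields "no value", which fails when dereferenced. The `SwitchModel` class is a hypothetical illustration of the semantics, not a proposed implementation. An empty `Optional` stands in for the script's no-value result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of the proposed switch expression semantics.
final class SwitchModel {
    private static final class Case {
        final List<Object> labels;
        final Supplier<Optional<Object>> branch;

        Case(List<Object> labels, Supplier<Optional<Object>> branch) {
            this.labels = labels;
            this.branch = branch;
        }
    }

    private final List<Case> cases = new ArrayList<>();
    private Supplier<Optional<Object>> defaultBranch;

    // Several labels correspond to "case macos, ios".
    SwitchModel when(List<Object> labels, Supplier<Optional<Object>> branch) {
        cases.add(new Case(labels, branch));
        return this;
    }

    SwitchModel otherwise(Supplier<Optional<Object>> branch) {
        defaultBranch = branch;
        return this;
    }

    // Runs only the first matching branch (no fallthrough), else the default.
    // An empty Optional models "no value"; calling get() on it throws.
    Optional<Object> evaluate(Object subject) {
        for (Case c : cases) {
            for (Object label : c.labels) {
                if (Objects.equals(subject, label)) {
                    return c.branch.get();
                }
            }
        }
        return defaultBranch != null ? defaultBranch.get() : Optional.empty();
    }
}
```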
Analyze whether rerunning a subset of the tasks for a given set of inputs is possible.
Example:
The build system should support dynamic reallocation of the computation tokens of duplicated inner tasks to increase concurrency.
A scenario when this matters:
When task A is started first, it could quickly exhaust all of the computation tokens available on a PC. This causes task B to wait until A is finished, as it cannot start due to the lack of computation tokens.
With some additional timing information, we can see how this affects build times.
Task A: 25 CPUmin
Task B: 1 CPUmin
Task C: 4 CPUmin
If the current PC has 5 computation tokens, it will run as follows without token reallocation:
Task A, Task B, Task C = 5 + 1 + 4 = 10 min
With token reallocation, Task B and C can run alongside Task A (as 1 token from A is reallocated to B and C); however, Task A takes longer:
5 min for running 20 CPUmin of Task A (4 tokens) and 1 + 4 CPUmin of Task B and C (1 token).
1 min for running the remaining 5 CPUmin of Task A (5 tokens).
This totals 6 minutes, which is much less than the 10 minutes without reallocation.
The solution is to dynamically reduce the allocated inner task computation tokens of a given task, down to a minimum of 1 (so that it still runs), and reallocate them to new tasks.
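The bookkeeping side of this could look like the hypothetical pool below. When no token is free, a new task takes one from the task holding the most tokens, but never below one token per task. The sketch only tracks the counts. It assumes the donor task simply doesn't start a replacement inner task when one of its running inner tasks finishes, so the actual concurrency reduction happens gradually.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical token pool supporting reallocation to newly started tasks.
final class TokenPool {
    private int free;
    private final Map<String, Integer> allocated = new LinkedHashMap<>();

    TokenPool(int total) {
        this.free = total;
    }

    // Grants up to 'wanted' tokens from the free ones; returns the granted amount.
    int allocate(String task, int wanted) {
        int granted = Math.min(wanted, free);
        free -= granted;
        allocated.merge(task, granted, Integer::sum);
        return granted;
    }

    // Acquires one token for a new task, reclaiming one from the largest holder if needed.
    boolean acquireForNewTask(String task) {
        if (free == 0) {
            String donor = null;
            for (Map.Entry<String, Integer> e : allocated.entrySet()) {
                // Only tasks with more than 1 token can give one away.
                if (e.getValue() > 1 && (donor == null || e.getValue() > allocated.get(donor))) {
                    donor = e.getKey();
                }
            }
            if (donor == null) {
                return false; // every task already runs with a single token
            }
            allocated.merge(donor, -1, Integer::sum);
            free++;
        }
        free--;
        allocated.merge(task, 1, Integer::sum);
        return true;
    }

    // Returns all tokens of a finished task to the pool.
    void release(String task) {
        Integer n = allocated.remove(task);
        if (n != null) {
            free += n;
        }
    }

    int tokensOf(String task) {
        return allocated.getOrDefault(task, 0);
    }
}
```

In the scenario above with 5 tokens, A initially takes all 5. B then reclaims 1 from A, and after B finishes, C reuses B's token while A keeps 4.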
Currently we require commas to be used in parameter lists, map entries, list entries, and in other places. Examine if these commas could be omitted in some cases.
In general, if the parameters, entries, or elements are declared on separate lines, using commas is not strictly necessary, as the line break could be interpreted as a separator.
Example:
my.task(
Param1: 123,
Param2: 456,
)
$list = [
1,
2,
3,
]
$map = {
Key1: val1,
Key2: val2,
}
buildtarget(
in inparam,
in defin = 123,
out outparam,
out defoutparam = 456,
)
Becomes:
my.task(
Param1: 123
Param2: 456
)
$list = [
1
2
3
]
$map = {
Key1: val1
Key2: val2
}
buildtarget(
in inparam
in defin = 123
out outparam
out defoutparam = 456
)
This form is as readable as the one above, and it can be easier to edit in an IDE.
Inserting a new element in the enumeration doesn't require the developer to insert a new comma first; they can simply add the entry on a new line.
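The separator rule can be illustrated with a naive splitter that treats both commas and line breaks as separators and skips empty entries. Skipping empty entries is what makes the comma-terminated and the comma-less forms parse identically. This hypothetical sketch ignores quoting and nested brackets. A real parser would also have to treat a line break as a separator only when the entry before it is syntactically complete, so that expressions spanning multiple lines keep working.

```java
import java.util.ArrayList;
import java.util.List;

// Naive sketch: splits the body of a list, map or parameter list into entries,
// using both commas and line breaks as separators. No quoting or nesting support.
final class EntrySplitter {
    static List<String> split(String body) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == ',' || c == '\n') {
                addIfNotBlank(entries, current);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(entries, current);
        return entries;
    }

    // Consecutive separators (e.g. a trailing comma followed by a line break)
    // produce blank entries, which are dropped here.
    private static void addIfNotBlank(List<String> entries, StringBuilder current) {
        String entry = current.toString().trim();
        if (!entry.isEmpty()) {
            entries.add(entry);
        }
        current.setLength(0);
    }
}
```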
Describe the bug
I was following the guide on how to publish packages to Maven, and the following error occurred.
To Reproduce
To reproduce, follow this guide.
Expected behavior
The JAR is successfully built into build/saker.jar.create/output.jar.
Environment information
Bug nature
Build execution
$ java -jar saker.build.jar -bd build export
java.lang.NoSuchFieldError: VERSION_FULL_COMPOUND
at saker.build:9:14-79
at saker.build:7:11-152
at saker.build:7:2-152
at saker.build:4:1-182
at saker.build:20:11-26
at saker.build:20:11-33
at saker.build:20:2-33
at saker.build:17:1-173
at saker.maven.classpath.main.MavenClassPathTaskFactory$1.run(MavenClassPathTaskFactory.java:99)
Exception in thread "main" saker.build.exception.BuildExecutionFailedException: saker.build.task.exception.MultiTaskExecutionFailedException: BuildTargetBootstrapperTaskIdentifier[buildFilePath=wd:/saker.build, buildTargetName=export, workingDirectory=wd:, buildDirectory=]
at saker.build.launching.BuildCommand.runBuild(BuildCommand.java:661)
at saker.build.launching.BuildCommand.call(BuildCommand.java:558)
at saker.build.launching.Launcher.lambda$parse$1(Launcher.java:1404)
at saker.build.launching.Launcher.callCommand(Launcher.java:1807)
at saker.build.launching.Launcher.main(Launcher.java:1785)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at saker.build.launching.Main.main(Main.java:67)
Build configuration
https://github.com/Sipkab/example-gh-maven, cloned to https://github.com/unlimitedcoder2/example-gh-maven so that I could upload the package to the GitHub Maven registry.
Project Loom introduces virtual threads for the JVM. It doesn't support unscheduling virtual threads that are inside a synchronized block. As virtual threads will be a very important feature for the build system and can greatly improve performance, it is necessary to switch to using locking primitives instead of synchronized blocks where appropriate.
Usages of synchronized blocks should be refactored where I/O or RMI operations are expected to happen.
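A typical refactoring replaces the synchronized block with a `java.util.concurrent.locks.ReentrantLock` and a try/finally pair. The class below is a hypothetical example, and `performIo` is a placeholder for the real I/O or RMI call.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a synchronized block guarding I/O rewritten to use a ReentrantLock,
// so a virtual thread blocked in the I/O call can be unmounted from its carrier.
final class GuardedResource {
    private final ReentrantLock lock = new ReentrantLock();
    private int writes;

    // Previously: synchronized (this) { performIo(data); writes++; }
    void write(String data) {
        lock.lock();
        try {
            performIo(data);
            writes++;
        } finally {
            lock.unlock(); // always released, even if the I/O call throws
        }
    }

    int writes() {
        lock.lock();
        try {
            return writes;
        } finally {
            lock.unlock();
        }
    }

    private void performIo(String data) {
        // Placeholder for the actual I/O or RMI operation.
    }
}
```

Later JDKs relax this limitation: JEP 491 (delivered in JDK 24) allows virtual threads to unmount inside synchronized blocks, so the refactoring mainly matters on earlier releases.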