adoptium / aqa-test-tools

Home of Test Results Summary Service (TRSS) and PerfNext. These tools are designed to improve our ability to monitor and triage tests at the Adoptium project. The code is generic enough that it is extensible for use by any project that needs to monitor multiple CI servers and aggregate their results.

License: Apache License 2.0

JavaScript 8.59% HTML 1.12% CSS 0.13% Shell 0.06% Batchfile 0.01% Ruby 0.06% Jupyter Notebook 89.72% Python 0.21% EJS 0.09% Dockerfile 0.01% Groovy 0.01%

aqa-test-tools's People

Contributors

2001asjad, amandanguyeen, andixiong, avishreekh, awsafsakif, dependabot[bot], dhlee49, gdams, ichristod, jiayaolinn, joeyleeeeeee97, karianna, llxia, longyuzhang, momanmalik, nadeensami, oscarqq, piyush286, prajwalborkar, renfeiw, say-droid427, smlambert, snyk-bot, sophiaxu0424, suhaasya, tommytwm, variony, xius666, yanshan24, yuehan-lin


aqa-test-tools's Issues

Add Parsers & Perf Graph for AcmeAir

Problem Description

Parsers need to be added for the following benchmarks so that they can be monitored using the existing tools (Perf Graphs) and tools under development (Tabular View), allowing the summarization and visualization of the perf results.

Benchmark Parsers to Add

Acmeair: https://github.com/acmeair/acmeair
Octane: https://chromium.github.io/octane/

Steps for Adding a Parser

I'm adding some steps here that I provided in the internal GitHub so that others can refer to this documentation when adding more parsers in the future.

Our parsers in Test Result Summary Service (https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/master/TestResultSummaryService/parsers) expect all the results to be in the Jenkins output so that we don't need to parse results from different files. Hence, we might need to copy the results from some log file into the main Jenkins output. If TRSS is monitoring that Jenkins pipeline and it's marked as "Perf" in the build monitoring list (as we discussed in our meeting), it will parse all the results once the builds are done.

All Jenkins builds should output the following info (Testci string, Benchmark Name, Benchmark Variant, Product) at the start of each benchmark build.

Sample Output:

echo "********** START OF NEW TESTCI BENCHMARK JOB **********"
echo "Benchmark Name: LibertyStartupDT Benchmark Variant: 17dev-4way-0-256-qs"
echo "Benchmark Product: jdk8u181-b13-openj9-0.9.0"

We use the following regexes in TRSS for parsing that info as shown below:

const benchmarkDelimiterRegex = /[\r\n]\*\*\*\*\*\*\*\*\*\* START OF NEW TESTCI BENCHMARK JOB \*\*\*\*\*\*\*\*\*\*[\r\n]/;
const benchmarkNameRegex = /[\r\n]Benchmark Name: (.*) Benchmark Variant: .*[\r\n]/;
const benchmarkVariantRegex = /[\r\n]Benchmark Name: .* Benchmark Variant: (.*)[\r\n]/;
const benchmarkProductRegex = /[\r\n]Benchmark Product: (.*)[\r\n]/;

For startup and footprint, we start an app multiple times (usually 8 times). Before each run, we print "Warm run <Iteration#>" (i.e. outerRegex), which works as an outer regex to split the output into smaller sections to parse. Then we use another regex (i.e. regex) to parse the metric value from each smaller section, so we can record results for each iteration. You can see sample parsed data here (#73 (comment)).

For throughput, we just use one regex (i.e. "Throughput: ") for AcmeAir. For startup, AcmeAir can use the parser that is already in the code.

Snippet for Liberty Startup

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/TestResultSummaryService/parsers/BenchmarkMetric.js#L13-L28
 

Sample Liberty AcmeAir Startup Job:

Warm run 0 //Outer Regex
Footprint (kb)=148944 //Inner Regex
Startup time: 4530 //Inner Regex
...
Warm run 1 //Outer Regex
Footprint (kb)=148120 //Inner Regex
Startup time: 4134 //Inner Regex

...
# Other Iterations

Sample Liberty AcmeAir Throughput Job:

Throughput: 2424.19 //Regex
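
Putting the pieces above together, here is a minimal sketch of the two-level parsing: the outer regex splits the log into per-iteration sections, and inner regexes pull each metric out of every section. The regexes and metric names below are illustrative, not the exact ones in BenchmarkMetric.js.

// Sketch only: split on the iteration delimiter, then match each metric per section.
const outerRegex = /Warm run \d+/;
const metricRegexes = {
    "Footprint in kb": /Footprint \(kb\)=(\d+)/,
    "Startup time in ms": /Startup time: (\d+)/,
};

function parseIterations(output) {
    // Everything before the first "Warm run" line is dropped by slice(1).
    const sections = output.split(outerRegex).slice(1);
    return sections.map(section => {
        const metrics = {};
        for (const [name, regex] of Object.entries(metricRegexes)) {
            const match = section.match(regex);
            metrics[name] = match ? parseFloat(match[1]) : null;
        }
        return metrics;
    });
}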

To Add a Perf Graph

For adding a perf graph widget, you can refer to one of the existing perf graphs such as ODM: https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/master/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx

Simplify Benchmark Parser Design

Problem Description

Currently, the benchmark parser design is slightly complicated since it uses terminology such as regexRepeat and outerRegex, which requires one to have a good understanding of BenchmarkParser.js in order to add parsers for any new benchmarks.

We want to simplify the design as much as possible so that one can easily add parsers with minimal knowledge of the codebase. Simplifying the parser design will become significantly more helpful once we start adding more and more benchmarks to PerfNext and the Openjdk-tests framework.

Proposed Changes

1) Remove regexRepeat

It's confusing to decide whether regexRepeat should be set to true or false, as different benchmarks require different values, as shown below.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/TestResultSummaryService/parsers/BenchmarkMetric.js#L13-L19

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/TestResultSummaryService/parsers/BenchmarkMetric.js#L29-L34

In BenchmarkParser.js, we should just use the regex to split the output into blocks and check whether there are multiple values for that metric.

2) Factor out outerRegex

Currently, each metric has its own outerRegex, which is redundant. The benchmark should hold the outerRegex instead of each metric, since the outerRegex is the same for all metrics under a benchmark. We'll stick to this design to simplify things.

As shown below, both the Footprint and Startup metrics have the same outerRegex, so it could be moved up to the benchmark level. This step would significantly reduce code size once we add more combinations of benchmarks and metrics.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/TestResultSummaryService/parsers/BenchmarkMetric.js#L13-L25
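
One possible shape for the simplified definition (a sketch only; the property names are hypothetical): the outerRegex lives once at the benchmark level, each metric carries only its own regex, and regexRepeat disappears because the parser simply splits on outerRegex and collects however many values it finds.

// Hypothetical simplified structure, not the current BenchmarkMetric.js format.
const LibertyStartup = {
    outerRegex: /Warm run \d+/,                       // one delimiter for the whole benchmark
    metrics: {
        "Footprint in kb": /Footprint \(kb\)=(\d+)/,
        "Startup time in ms": /Startup time: (\d+)/,
    },
};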

3) Soft-code Liberty

Currently, the parser for Liberty throughput is hard-coded to run #5. This could change depending on the number of warmup and measure runs. It also doesn't store the values from multiple measure runs. We need to store the value of each measure run in the value array, similar to how we do it for the Startup and Footprint metrics.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/TestResultSummaryService/parsers/BenchmarkMetric.js#L8

4) Remove redundant checks from BenchmarkParser.js

Some checks might not be needed.

5) Add detailed comments

This would save time for someone who isn't familiar with the Benchmark parser code and just wants to add regexes for new benchmarks. Also, we wouldn't need to provide instructions every time as done here: #119 (comment)

Assigned Contributors

I'll be working with Dong (@dhlee49) on the new design.

Support for Third Party Code Used in PerfNext

Before PerfNext was open-sourced, we were using a couple of third party libraries. While open-sourcing, we had to remove them when the code was moved to the open AdoptOpenJDK openjdk-test-tools repo since we shouldn't maintain third party code here.

Due to other higher priority issues and resource constraints, we had just removed the third party libraries without adding proper support for some of them.

Currently, one would be required to get those libraries from various places in order to run or deploy PerfNext directly from the openjdk-test-tools repo. Hence, we need to add proper support for that third party code by using either online-hosted libraries or npm modules so that it's easy for anyone to use PerfNext.

I'll be working on adding this support.

TRSS install throws error on got-2

I am trying to use TRSS for our daytrader3 application. When I do an npm install, I get the following error:

npm ERR! code E404
npm ERR! 404 Not Found: [email protected]

npm ERR! A complete log of this run can be found in:
npm ERR! $HOME/.npm/_logs/2018-09-24T05_17_54_494Z-debug.log

The log file says:
error code E404
error 404 Not Found: [email protected]

The node version is v10.8.0 and the npm version is v6.4.1.

TRS Server can't use Jenkins Password that has Special Characters

Relevant Code Snippet

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/414ec78d4edadf6542ba2a53357e73037494ba21/TestResultSummaryService/JenkinsInfo.js#L64-L77

JenkinsInfo.js can't use a password that has special characters such as "!" to communicate with Jenkins properly, giving a 401 Unauthorized error.

Helpful Links

https://github.com/jansepar/node-jenkins-api
We tried replacing the original code with this snippet but didn't have much luck:

// Password that needs to be %-encoded
const { URL } = require('url');
const jenkinsapi = require('jenkins-api');   // node-jenkins-api module (assumed to be installed as jenkins-api)
const jenkinsUrl = new URL('https://[email protected]');
jenkinsUrl.password = 'some_weirdPASSWORD123!@#$%^&*()~`\\/;\'';
var jenkins = jenkinsapi.init(jenkinsUrl.href);

Workaround

  1. Use a password that doesn't have special characters.
  2. Use a token instead of a password: https://stackoverflow.com/questions/45466090/how-to-get-the-api-token-for-jenkins.
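
A third option worth trying (an untested sketch, assuming the node-jenkins-api module is installed as the jenkins-api npm package): percent-encode the credentials with encodeURIComponent before embedding them in the Jenkins URL, instead of assigning the raw password to URL.password.

const jenkinsapi = require('jenkins-api');

// Sketch: encode user and password so characters like "!" survive the URL.
// The host and credentials below are placeholders.
function buildJenkinsUrl(host, user, password) {
    return 'https://' + encodeURIComponent(user) + ':' +
        encodeURIComponent(password) + '@' + host;
}

const jenkins = jenkinsapi.init(buildJenkinsUrl('jenkins.example.com', 'user', 'p@ss!word'));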

Define API for aggregate dashboard data

From #16, data that matters:
[ # pass, # fail, # excluded, # lastPass, # lastFail, # lastExcluded, platform, impl, version, testGroup ]

definitions:
TA = totalsArray is [totalPass, totalFail, totalSkipped, totalExcludes]
sdkID = combo of Jenkins Job/BuildID info and SHAs that uniquely define the sdk binary being tested (shas of CL, VM, OMR, ...)

API:
TA = getTotals(sdkID)
TA = getTotalsForGroup(testGroup, sdkID)
TA = getTotalsForPlatform(platform, sdkID)
TA = getTotalsForJDKImpl(impl, sdkID)
TA = getTotalsForLevel(impl, sdkID)
TA = getTotalsForPlatformAndGroup(platform, testGroup, sdkID)
TA = getTotalsForLevelPlatformImpl(level, platform, impl, sdkID)
TA = getTotalsForLevelGroupPlatformImpl(level, group, platform, impl, sdkID)
sdkID = getPreviousBuildID(sdkID)
sdkID = getPreviousReleaseID(sdkID)

The underlying implementation of this API can be optimized and reuse common code (most methods do the same thing: pull pass/fail/skip/excludes data from the DB and add it up to create the totalsArray). In the initial MVP, excludes data is not populated.

Note: we will use the combo of url, buildName and BuildNum as sdkID until we update the parser code to store shas of CL, VM, OMR.
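
A sketch of how the methods could share one implementation (the collection name, field names and sdkID mapping below are assumptions about the schema, not the actual one):

// Sketch: every getTotalsFor* call funnels into one query builder.
const sdkIDToQuery = ({ url, buildName, buildNum }) => ({ url, buildName, buildNum });

async function getTotals(db, sdkID, filters = {}) {
    // filters may carry platform, impl, level, testGroup, etc.
    const query = { ...sdkIDToQuery(sdkID), ...filters };
    const docs = await db.collection('testResults').find(query).toArray();
    return docs.reduce(
        (ta, d) => [ta[0] + (d.passed || 0), ta[1] + (d.failed || 0),
                    ta[2] + (d.skipped || 0), ta[3] + (d.excluded || 0)],
        [0, 0, 0, 0]   // TA = [totalPass, totalFail, totalSkipped, totalExcludes]
    );
}

const getTotalsForPlatform = (db, platform, sdkID) => getTotals(db, sdkID, { platform });
const getTotalsForGroup = (db, testGroup, sdkID) => getTotals(db, sdkID, { testGroup });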

Store Build Configuration Output For Each Benchmark Run

Problem Description

Originally, we were running multiple iterations of the benchmarks in one Jenkins build (i.e. the old design). Now, we're moving to a new design in #24 in which a parent build launches multiple child builds, each child build acting as a single benchmark iteration. Hence, we'll be running multiple child jobs instead of one Jenkins build with multiple iterations inside, so that we can interleave Jenkins builds.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/TestResultSummaryService/parsers/BenchmarkParser.js#L5

Currently, we're just getting the output for all the iterations and not storing any output before the first ********** START OF NEW TESTCI BENCHMARK JOB **********. This is a bug with the old design since we would only show the output of the first iteration (value.tests[0]._id) when the user clicks on a Jenkins build, even though that Jenkins build may have multiple iterations, which won't be displayed. This problem will be fixed with the new design.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/test-result-summary-client/src/Build/TopLevelBuilds.jsx#L73-L86

Proposed Changes

We should store the complete output or some useful info before the start of each benchmark (i.e. before ********** START OF NEW TESTCI BENCHMARK JOB ********** is printed). This would give us access to the output of the steps used to download and configure all the benchmark material before the actual benchmark run.

Support Plugin structure

Different users may want to add additional functions or store additional information in the database. The current design is not flexible enough to allow users to do so without changing the core structure. We need to create a plugin structure to allow users to add their own functions and features. The idea is that the application can run its main features with or without the plugins. Also, the plugin files may or may not need to be stored in this repo. The application will search the plugin folder and run all plugins available at runtime.
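
A minimal sketch of such a loader (the folder name and the plugin hook name are assumptions): the application scans a plugins folder at startup, loads whatever is there, and still runs normally when the folder is empty or missing.

const fs = require('fs');
const path = require('path');

// Sketch: load every .js file found in a plugins/ folder at runtime.
function loadPlugins(pluginDir = path.join(__dirname, 'plugins')) {
    if (!fs.existsSync(pluginDir)) return [];          // main features run fine without plugins
    return fs.readdirSync(pluginDir)
        .filter(f => f.endsWith('.js'))
        .map(f => require(path.join(pluginDir, f)));
}

// Hypothetical hook: a plugin may export onBuildParsed(buildData) to store extra data.
async function runPlugins(plugins, buildData) {
    for (const plugin of plugins) {
        if (typeof plugin.onBuildParsed === 'function') {
            await plugin.onBuildParsed(buildData);
        }
    }
}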

Add logo on the header

It would be nice to have the Adopt logo added on the header.

  • Color needs to be changed to white as the header is black
  • It should be clickable and link to the main page

BlueOcean link does not work for builds under folders

Dynamically Fetch Perf Pipeline Names & Support Multiple Metrics in Perf Graphs

Problem Description

Currently, Perf Graph uses hard-coded names for pipelines. We should dynamically get this list from the database by looking at all the perf pipelines stored in it, as is done for Tabular View.

I've shown relevant snippets from ODM graphs, but other graphs have the same issue.

Hard-coded Pipeline Names
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/628e800b2e8a7005e80a156dd0ebf2b41ae39eb4/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx#L11-L13

Also, Perf Graphs currently support the display of only one metric. We should display all metrics related to a benchmark run on the graphs. We should also have the option to switch the display of different metrics on and off, so we can limit the data for easier visualization.

Support for one metric only
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/628e800b2e8a7005e80a156dd0ebf2b41ae39eb4/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx#L111-L116
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/628e800b2e8a7005e80a156dd0ebf2b41ae39eb4/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx#L185

Furthermore, we currently assume that the supported metric is at index 0. That assumption was only valid initially, when we were parsing just one metric for ODM. As we add more metrics in the parser, the order could be different.

Assumption that no longer holds
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/628e800b2e8a7005e80a156dd0ebf2b41ae39eb4/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx#L90

Proposed Changes

  1. To fetch pipeline names, we can use the same or a similar query to the one used in Tabular View (see the sketch below).
  2. To display all metrics, we'll need to loop through the metrics array and show their data on different lines.
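
For item 1, the query could look roughly like this (a sketch; the field name follows the getTabularData.js snippet shown later in this dump, and the filter that marks a pipeline as perf is an assumption):

// Sketch: fetch distinct perf pipeline names from the database instead of hard-coding them.
async function getPerfPipelineNames(db) {
    return db.distinct("buildName", { type: "Perf" });
}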

This issue will be looked at after we add the support for aggregated data in #106 and clean up the Perf Graph code in #118.

Store HW Specific Benchmark Variables for Perf Testing

Currently, for internal testing, we use master_machine_list.xml, a file with info about each machine's HW and machine-specific benchmark variables, with PerfNext to dynamically populate HW-specific benchmark variables. We would need a similar file to enable perf testing on AdoptOpenJDK.

Some Options for Storing HW Specific Benchmark Variables

1) Use Environment Variables for Jenkins Node

https://<JenkinsURL>/computer/<machineName>/api/

Pros:
  • Everything in one place
  • Can possibly be integrated with the new machines data file generated from openjdk-jenkins-helper for PerfNext: adoptium/jenkins-helper#25
  • PerfNext and Openjdk-tests Framework can use just one file that has all info about the machine and the HW specific benchmark variables

Cons:
  • Hacky! We won't really be using the environment vars as expected. Instead of using the vars in bash (i.e. $VAR_NAME), we'll be storing the XML data and parsing that.
  • "Agent Config History" doesn't keep track of changes to environment vars.
  • The UI isn't friendly for editing configs

2) Use some existing Git Repo

Pros:
  • Clean approach
  • Git takes care of versioning
  • Can be easily edited

Cons:
  • Won't integrate with the new machines data file generated from openjdk-jenkins-helper for PerfNext: adoptium/jenkins-helper#25
  • PerfNext and Openjdk-tests Framework would need to use 2 separate files instead of just one that has all the info about the machine and the HW specific benchmark variables

Ignore invalid build url when monitoring builds

Currently, we do not check the exception message. We try to connect to the build url 5 times regardless of what kind of exception we get, and then set the status to Done.

In fact, if the error is 404 (invalid url), we need to ignore this url. But for other errors (i.e., ESOCKETTIMEDOUT), we should retry and keep the build status as NotDone. That way, we can try it again in the next round.
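
A sketch of the intended behaviour (how the error surfaces its status code is an assumption about the Jenkins client):

// Sketch: decide whether to drop a build url or retry it in the next round.
function classifyMonitorError(err) {
    const message = String((err && err.message) || err);
    if (message.includes('404')) {
        // Invalid url: ignore it and mark the build Done so we stop retrying.
        return { status: 'Done', retry: false };
    }
    // Transient errors (i.e., ESOCKETTIMEDOUT): keep the build NotDone and retry later.
    return { status: 'NotDone', retry: true };
}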

Extend Functionality for Dynamically Fetching HW Specific Environment Variables

Problem Description

For running any benchmark, we need to fetch the HW-specific variables that change depending on the machine selected. We also populate these HW-specific variables in the PerfNext GUI so that users can change the affinity and other commands if needed.


Currently, environment variables are hard-coded for just one machine. So you would need to manually change environment variables such as CLIENT, DB_MACHINE, LIBERTY_HOST and AFFINITY while running some benchmarks, especially the ones that require more than one machine, such as Liberty DayTrader.

One example of hard-coded configs:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/config/benchmarks/data_simple/Liberty.xml#L18-L31

Proposed Solution

Task 1:

We should remove all machine-specific environment variables from the various XML config files under PerfNext/config/benchmarks/data_simple, since they are already present in /config/master_machine_list.xml, the file with all the machine-specific configs that the user provides while deploying PerfNext.

Snippet from master_machine_list.xml

                    <capability id="41" name="LibertyDayTrader" bits="all">
                        <property id="1" name="client">perfxposh10G</property>
                        <property id="2" name="dbMachine">perfxposh10G</property>
                        <property id="3" name="dbHome">/home/db2inst1/</property>
                        <property id="4" name="dbName">day30r</property>
                        <property id="5" name="dbUserName">db2inst1</property>
                        <property id="6" name="appServer">DayTrader3</property>
                        <property id="7" name="dbPort">50000</property>
                        <property id="8" name="libertyPort">9080</property>
                        <property id="9" name="scriptName">tradelite.sh</property>
                        <property id="10" name="clientWorkDir">/java/perffarm/liberty</property>
                        <property id="11" name="libertyHost">kermit10G</property>
                    </capability>

Task 2:

While PerfNext was being developed internally, I had already added the majority of the functionality for fetching HW-specific environment variables, such as CPU affinity variables, by coding various functions (shown below) in PerfNext/public/lib/js/util.js. We need to extend this functionality to fetch other HW-specific environment variables that might be missing.


Snippet Code from PerfNext/public/lib/js/util.js:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/public/lib/js/util.js#L423-L635

Assigned Contributors

Awsaf (@pinicman) from my team would work on adding this functionality.

Performance Analysis Tools (Proposal from Developer JumpStart Tech Challenge)

Proposal Name: Performance Analysis Tools
Proposal Owner: Piyush Gupta / Shelley Lambert / Lan Xia
Technical Mentor: Piyush Gupta
Team: @AdamNatale @Variony @acarnegie @armaanfarhadi @kguirguis

Temporary Branch for the Development of these Features:

https://github.com/AdoptOpenJDK/openjdk-test-tools/tree/pat

Blurb (Short overview of the Proposal):

Since performance is crucial for any product that we use in our lives, developers are always striving to evaluate and boost our performance on various workloads by running the latest releases and development builds against different benchmarks and by identifying opportunities for compiler optimizations. As part of this JumpStart Challenge, we’re looking for people to brainstorm and develop a new solution that would enhance our capabilities to spot performance issues with ease.

Currently, the Performance Measurement & Analysis (PMA) team and the Runtimes test team collaborate to build new tools and infrastructure to adapt to the changing requirements of users and to open-source development concepts. These tools, such as PerfNext and Test Result Summary Service (TRSS), have been pushed to the open AdoptOpenJDK repo: https://github.com/AdoptOpenJDK/openjdk-test-tools. Under TRSS, we have a dashboard for displaying performance results from daily runs. While this dashboard has some basic functionality for displaying numbers, we could add new features for identifying and monitoring regressions and automating the investigation of these issues. This in turn would improve the efficiency of our performance monitoring and drive faster turnaround as issues are detected.

Please describe the business problem your customers (e.g. external clients, internal team, etc.) are experiencing OR the improvement/opportunity that could be brought to them.

Our PMA team manages the performance monitoring and problem investigation for Eclipse OpenJ9 (https://www.eclipse.org/openj9/) and Java releases from AdoptOpenJDK (https://adoptopenjdk.net/) on all supported hardware platforms. We are also responsible for publishing official performance scores for each Java release. Since performance is of paramount importance to the Java customers, we strive to evaluate and boost our performance on various workloads by running our latest releases and development builds against different benchmarks and by identifying opportunities for compiler optimizations.

Due to the large number of benchmark variants and platforms, it's challenging to identify performance regressions and gains. Currently, we rely on a tool called Traffic Lights that helps display the results from the performance runs. This tool is old, not very flexible, and lacks performance monitoring abilities. As a result, we need to develop a new solution that would enhance our capabilities to spot performance issues with ease.

Developers need to know quickly whether their changes cause performance regressions. The sooner this is discovered, the ‘cheaper’ it is to correct and fix the code that introduced the regression. Developers depend on PMA team to run benchmarks, measure and analyze performance results. The PMA team is understaffed and cannot possibly keep up with the growing number of requests from dev team. An effort has begun to make it MUCH easier for developers to run benchmarks and analyze results themselves. Easy-to-use tools empower developers, making them more autonomous and our projects more agile.

What is the key issue (customer pain) or benefit that motivates the need for this project?

Key issue: Performance testing is hard and not standardized, making it difficult for developers.
Key benefit: With easier tooling and approaches, we ‘crowd-source’ the task of performance measurement, empower the development team and make projects more agile.

We have some features we already want to see incorporated, which we understand are common tasks manually done by developers. Some of these include the use of profiling tools and looking at additional inputs (such as JIT or GC logs) to gather and correlate more data for problem determination.

Better data visualization of results is also an area of great interest: given the data we gathered, what is the most compelling way to represent it so that it's quickly communicated and shared with interested parties?

We need to brainstorm the features that need to be added to TRSS and then choose and implement the ones that would provide most benefit to all developers. Currently, PMA team members would be required to look at the graphs and carry out further investigations by launching some more runs and identifying the commit that might be responsible for the regression.

Developing these new tools would benefit everyone since we’ll be able to triage new regressions more easily. Having automated monitoring abilities would significantly reduce PMA team’s workload, allowing it to go deeper into the code issues and to help developers to resolve issues faster.

How might the results of the project be used after the Challenge?

Results of this challenge would be reviewed and potentially incorporated into our live tools.

What are the key technical and business goals?

Technical: Design and develop new features that would help in identifying and investigating Java performance regressions with ease

Business: Display performance results and identify regressions such that the PMA team can improve efficiency while scaling up on Eclipse OpenJ9 performance coverage. Easily articulate the benefits of our products to potential customers.

What specialized skills might be beneficial for the project?

  • Experience with web development (React, Node.js, JavaScript)
  • Ability to contribute to the new Jenkins based performance infrastructure and to develop new features to meet performance analysis needs
  • Data visualization experience
  • Statistical analysis

Store build information in a file

BuildStatus.jsx contains information about the builds that TRSS needs to monitor in the dashboard (not insert into the db).

 OpenJ9: {
        url: "https://ci.eclipse.org/openj9/",
        builds: [
            "Pipeline-Build-Test-All",
            "Pipeline-Build-Test-JDK8-linux_390-64_cmprssptrs",
...

Instead of keeping this information in a *.js file, we should keep it in a .json file. That way, changing the information does not require a rebuild, and each user can have their own file.
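
A sketch of that change (the file name and shape are assumptions; the JSON simply mirrors today's object):

// Sketch: read the monitored-build list from a JSON file at runtime, so editing it
// needs no rebuild and each user can keep their own copy.
const fs = require('fs');

function loadMonitoredBuilds(file = './buildInfo.json') {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// buildInfo.json would look like:
// { "OpenJ9": { "url": "https://ci.eclipse.org/openj9/",
//               "builds": ["Pipeline-Build-Test-All", "..."] } }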

New feature to add brief notes to a particular pipeline or job

To aid concurrent triage efforts (where several people are triaging builds at the same time), add the ability to annotate jobs in TRSS.

This can also be a location where automated search for existing issues dumps links to found issues, to further aid triage.

Extract Perf Results From Adopt Perf Builds

Currently, TRSS can't parse the perf builds running at Adopt (https://ci.adoptopenjdk.net/view/Test_perf/) as perf builds using the perf parser (i.e. BenchmarkParser). We mainly need to do the following in order to achieve that:

  1. Decide on a convention for benchmark name and variant, since Adopt tests (https://github.com/AdoptOpenJDK/openjdk-tests/tree/master/perf) just use testCaseName and don't print out the benchmark name and variant strings expected by TRSS's regexes. Only the Liberty test does so.

Since benchmarks are classified in folders on Adopt, maybe we can use the folder name or something corresponding to that for benchmark name and we can use the testCaseName as the variant. I need to look into that more.

Related Issue: adoptium/aqa-tests#1144

  2. Add missing parsers for benchmarks such as Dacapo and others.

Use Aggregated Perf Results in Perf Compare

Problem Description

With #73, we've added the ability to aggregate the perf results from multiple iterations. We should update Perf Compare to use and display aggregated perf results in order to comply with the new design.

Proposed Changes

Perf Compare should use the aggregated data for any build that's passed in the input for test or baseline build.

It should be able to support the following comparisons:

  1. Master Build vs Master Build
  2. Child Build vs Child Build
  3. Master Build vs Child Build

Since we'll be using the aggregated data for each build, Perf Compare will get a significant speed boost when displaying the results.

Data to Display:

For parent build:

Display the aggregated data for the parent and the raw data for each of its child builds.

For child build:

Display the aggregated data and the raw data for each test iteration.

Assigned Contributors

I'll be working with Sophia (@sophiaxu0424) from my team to add this feature.

Add Machine Schema for Validating Machine Data File for PerfNext

Currently, PerfNext fetches the latest machine list from a server as shown below.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/PerfNext/app/apis/machines.js#L8-L12

PerfNext expects the machine data file to be in a certain format so that it can be parsed accordingly and be used for populating the machine list and hardware environment variables associated with them.

We need to add the machine schema that's expected by PerfNext so that it can be used to validate any machine data generated on any Jenkins server before being used with PerfNext. For example, this machine schema could be used by adoptium/jenkins-helper#25 for validation.

Add Parsers for Various Benchmark Variants from PerfNext, Perffarm & Adopt

Problem Description

Currently, TRSS supports parsing of both PerfNext and Perffarm jobs. While PerfNext builds that use Jenkins are parsed right away after runs are done, Perffarm builds are parsed when Perf Compare is used.

While TRSS does have some parsers, it doesn't have benchmark parsers for many of the benchmark variants that exist on PerfNext and Perffarm, configurations that are used frequently. As a result, we need to manually parse them for the time being.

Furthermore, even if we have parsers for some configs, we parse only some metrics and haven't added support for others, which might be important to get full performance evaluation of a build.

Some of these configs were newly added to launcher tools in order to support new variants and benchmark versions. Also, parsers could have been missed for some of the less popular benchmark variants, something that we should still add in order to extend coverage.

Benchmark Parsers to Add

Perffarm & PerfNext:

  • Different Startup and throughput variants for Liberty
  • Various ODM variants
  • HiBench
  • SPECjbb2015
  • Any other missing

Adopt:

  • BumbleBench
  • Dacapo
  • Idle Micro
  • Liberty
  • Renaissance

Proposed Changes

Assigned Contributors

I'll be working with Dong (@dhlee49) from my team on this design.

Update test parser logic

Recently some of the builds produced large output (~150M-200M).

For example:
https://ci.adoptopenjdk.net/view/all/job/openjdk11_j9_external_extended_tomcat_x86-64_linux/6/console

With #59, TRSS only stores the last 12M of output per test/build. But before storing, TRSS will process the whole output (in this case ~150M-200M) to figure out the related info. If it is a test build, TRSS will split the whole output per test and store each test's output individually.

In this case, TRSS needs to process ~150M-200M of output to figure out how many tests are within it. The test parser uses regex to find matches for tests, which is very CPU intensive when the output is large. Currently, 100% CPU is used and the UI stops responding to other requests.

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7492 root      20   0 1896376 732992  26264 R 100.0  9.0 507:41.88 node
 7259 mongodb   20   0  307916  70356  24696 S   0.7  0.9   3:35.06 mongod

Instead of running regexes over the whole output to find test output, we may need to update the test parser logic to process the output line by line and, when a test is found, store the test-related info.

If needed, we can further update the logic to stream the output and process it block by block.
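
A sketch of the line-by-line direction using Node's readline over a stream (the marker regex that starts a new test is illustrative, not the parser's real one):

const readline = require('readline');

// Sketch: walk the output one line at a time, cutting it into per-test chunks,
// instead of running regexes over the entire 150M-200M string at once.
async function splitTestsByLine(inputStream, testStartRegex) {
    const rl = readline.createInterface({ input: inputStream, crlfDelay: Infinity });
    const tests = [];
    let current = null;
    for await (const line of rl) {
        const match = line.match(testStartRegex);
        if (match) {
            if (current) tests.push(current);          // store the previous test's info
            current = { name: match[1], output: [] };
        } else if (current) {
            current.output.push(line);
        }
    }
    if (current) tests.push(current);
    return tests;
}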

Aggregate and sub-aggregate tests dashboard

Nice enhancement idea from Martijn:

Use the external tests to track Java version support amongst popular libraries and frameworks so we can also identify which ones need help.

Display a name-of-project-java-version-tested-against-<pass|fail>

Matrix such as this (with hyperlinks to actual builds):
application      status  jdkversion  implementation  platform
scala-jdk8_j9    pass    8           j9              x64_linux_docker
scala-jdk9_hs    pass    etc...

where we have columns for application, jdkversion, implementation, platform and status (so they can be sorted by each), covering all apps (elasticsearch, wildfly, etc.) for jdk8, 9, 10 and 11.

Fetch Latest Machines Data File

Problem Description

Currently, PerfNext expects the machine data file, master_machine_list.xml, to be placed at /config/master_machine_list.xml. PerfNext's backend has an API called '/api/machinesInfo', as shown below, which is used by PerfNext's frontend to get all the HW-related info about the machines available on PerfNext for performance runs. More details regarding HW specific environment variables can be found in #32.

API Call from Frontend:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/public/lib/js/benchmarks.js#L41-L45

Backend API:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/app/apis/machines.js#L14-L19

Not fetching this file directly from the original source (i.e. the build server) means it has to be updated manually every time a new machine is added to the build server or a machine-specific configuration changes. Otherwise, PerfNext would not reflect the most up-to-date machine data.

Proposed Solution

Instead of storing that machine file (i.e. master_machine_list.xml), PerfNext should download it directly from the build server (i.e. Jenkins, Axxon or something else) or wherever that machine data file might be hosted, just as PerfNext does for fetching build info in PerfNext/app/apis/builds.js.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/app/apis/builds.js#L12-L33

This solution would make sure that PerfNext always accesses and uses the latest machine data file from whatever location is specified by the PerfNext host.

The user should provide the URL for the machine data file in PerfNext/config/APP_DATA.json and set any credentials required to download it in the PerfNext/config/credentials.json file.
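
A sketch of that flow (the config keys and the use of basic auth are assumptions):

const fs = require('fs');
const https = require('https');

// Sketch: download master_machine_list.xml from the URL configured by the PerfNext host.
function fetchMachineList() {
    const appData = JSON.parse(fs.readFileSync('./config/APP_DATA.json', 'utf8'));
    const credentials = JSON.parse(fs.readFileSync('./config/credentials.json', 'utf8'));
    // machineListUrl, user and password are hypothetical key names.
    const options = { auth: credentials.user + ':' + credentials.password };
    return new Promise((resolve, reject) => {
        https.get(appData.machineListUrl, options, res => {
            let body = '';
            res.on('data', chunk => (body += chunk));
            res.on('end', () => resolve(body));
        }).on('error', reject);
    });
}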

Assigned Contributors

Awsaf (@pinicman) from my team would work on adding this functionality.

Use Common Benchmark Metric Router Design for All Perf Tools

Problem Description

Currently, TRSS has 2 parsing mechanisms for extracting benchmark metric values.

The first mechanism is used for Jenkins (used by PerfNext and Adopt builds) to parse and store results automatically when a build finishes.

The second mechanism is used for the Axxon scheduler (currently used by Perffarm) to extract numbers from a CSV results file generated by Perffarm. Some files related to this mechanism are also used in all frontend perf tools, such as Tabular View and Perf Compare.

Maintaining 2 mechanisms adds too much redundancy, making it harder to maintain and support the different tools.

Proposed Changes

  • Unify both the parsing mechanisms.
  • Add higherbetter and units to BenchmarkMetric. Currently, BenchmarkMetric doesn't have that information and we rely on BenchmarkVariants for it.
  • Get rid of BenchmarkVariants file.
  • Make an API so that all perf tools can request the BenchmarkMetric file from the backend and use it accordingly (see the sketch below).
  • Update perf tools to use that API to fetch BenchmarkMetric file instead of using BenchmarkVariants file.
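
For the API item above, a sketch of what the endpoint could return (assuming an Express-style route handler like the others under TestResultSummaryService/routes; the require path is illustrative):

// Sketch: serve BenchmarkMetric from the backend so every perf tool reads one source of truth.
const benchmarkMetric = require('../parsers/BenchmarkMetric');

module.exports = async (req, res) => {
    // higherbetter and units would now live inside BenchmarkMetric itself.
    res.json(benchmarkMetric);
};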

Assigned Contributors

I'll be working with Dong (@dhlee49) from my team on this design.

Move Duplicate Code for Perf Graphs to Common Utils Library

Problem Description

Currently, we have 3 Perf widgets for the Dashboard: DayTrader, ODM and SPECjbb2015. There is significant duplication of code between those 3 widgets, since each new widget was created by copying and modifying the code of the first widget instead of using a common library.

All perf graphs have the same purpose of displaying perf results for different benchmarks run on different platforms. Besides some benchmark-specific data, everything else is the same among those widgets.


The graphs have some minor feature differences because some of the features added by #84 were not extended to all 3 perf widgets.

As a result, it's not easy to add new widgets for new benchmarks without duplicating code from some existing widget.

Proposed Changes

We should use a library to keep the common code, in order to avoid duplicating code across different benchmark widgets. For example, utils.js currently has just one function, parseSHA, which is used in multiple widgets, but there is still plenty of scope to clean up code by moving more common code into this library.

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/test-result-summary-client/src/Dashboard/Widgets/Graph/utils.js#L1-L2

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/test-result-summary-client/src/Dashboard/Widgets/Graph/ODM.jsx#L159-L160

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/test-result-summary-client/src/Dashboard/Widgets/Graph/DayTrader3.jsx#L151-L152

TRS Code Incorrectly Uses Jenkins "Build" instead of "Project" At Various Places

Test Result Summary uses the Jenkins term "build" incorrectly instead of "project" in some places, so it's easy to get confused about whether the code is actually referring to a "project" or a "build".

Sorry to be pedantic. I know it's not a big deal, but it would be good to update the wrong references so that the code is easier to understand and maintain.

Terminology

https://jenkins.io/doc/book/glossary/

Project
A user-configured description of work which Jenkins should perform, such as building a piece of software, etc.
Example: PerfNext-Pipeline, Grinder, Daily-ODM

Build:
Result of a single execution of a Project
Example: Different Builds: 1, 2, ..., 99, 100...

Job
A deprecated term, synonymous with Project.

Some references that should use "project" instead of "build":

TRSS

  • BuildMonitor.js (File Name)
  • EventHandler.js (Function Name: monitorBuild())
  • getTopLevelBuildNames.js (File Name & var: "buildName")
  • getBuildHistory.js (File Name)

TRSC

  • TopLevelBuilds.jsx (/api/getBuildHistory?buildName=${buildName}&url=${url} & builds[url][buildName] in updateData())

MongoDB

  • buildList (Collection Name)

Node Packet Manager for TRSC (yarn vs npm)

Currently, README.md for TRSS says that yarn should be used for installing modules (i.e. yarn install) but we actually have test-result-summary-client/package-lock.json, which we get when we use npm (i.e. npm install).

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/cd7fa61f2dc94b7ce67ff78275ffd38911c52cec/TestResultSummaryService/README.md#L33

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/cd7fa61f2dc94b7ce67ff78275ffd38911c52cec/test-result-summary-client/package-lock.json#L1-L100

We should either update the README.md from yarn install to npm install, or add yarn.lock and remove package-lock.json if we want to use yarn install.

Remove React Workshop Guide from TRSC README.md

README.md for TRSC (https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/master/test-result-summary-client/README.md) has been copied from https://github.com/reach/react-fundamentals-workshop/blob/master/3-state/README.md. We should remove this guide from our repo since it's already outdated, and add a hyperlink to the original README.md from reach/react-fundamentals-workshop so that everyone can refer to the most up-to-date info. This change would also free up our README.md, which the guide currently clutters, so that developers can focus on the info related to this specific project.


Ability to Configure User Specific Data such as Build Server URL

Currently, PerfCompare uses a hard-coded string http://perffarmServer as a sample Build Server URL at the following places:

  • openjdk-test-tools/test-result-summary-client/src/PerfCompare/PerfCompare.jsx
  • /openjdk-test-tools/TestResultSummaryService/routes/getPerffarmRunCSV.js

We need to use a variable instead of the http://perffarmServer placeholder in order to set the Build Server URL, which is used to fetch benchmark results from the build server. One solution could be to read user-specific information, such as URLs to various servers, from a configuration file into placeholder variables. We might need a slightly different way of doing so for the TRSS client.

This could be done similarly to how PerfNext deals with user-specific configs: PerfNext reads the user-specific data from PerfNext/config/APP_DATA.json in /openjdk-test-tools/PerfNext/app.js when the server starts. That's just an example, so we can explore better solutions.
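
A sketch of the server-side half (the config file name and key are assumptions, mirroring how PerfNext reads APP_DATA.json):

// Sketch: read the build server URL from a config file instead of the
// hard-coded http://perffarmServer placeholder.
const fs = require('fs');

const userConfig = JSON.parse(fs.readFileSync('./config/trssConfig.json', 'utf8'));
const perffarmServerUrl = userConfig.perffarmServerUrl || 'http://perffarmServer';

// getPerffarmRunCSV.js could then build its request from perffarmServerUrl
// rather than from the literal placeholder string.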

Sophia Xu (@sophiaxu0424) would be working on this feature.

Dynamically Populate Machine List

Problem Description

Currently, PerfNext displays a static list of machines available for launching benchmark runs, as shown below. PerfNext's frontend uses the /api/machines API to get the list of machines. The PerfNext host is required to put this machine list inside /config/machines.json.

API Call from Frontend:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/public/lib/js/benchmarks.js#L35-L39

Backend API:
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/app/apis/machines.js#L10-L12

Static Population of Machine List
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/public/lib/js/util.js#L4-L13


Proposed Solution

PerfNext should dynamically populate the machine-list dropdown menu depending on the selected platform, using the machine data file to search for all available machines that meet the platform and benchmark requirements. More info about the machine data file can be found in #33.
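
A sketch of the filtering step, once the machine data file has been parsed into objects (the object shape is hypothetical):

// Sketch: keep only the machines that match the platform the user selected
// and that advertise the capability the benchmark needs.
function machinesForPlatform(machines, platform, requiredCapability) {
    return machines
        .filter(m => m.platform === platform)
        .filter(m => !requiredCapability || (m.capabilities || []).includes(requiredCapability))
        .map(m => m.name);
}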

Assigned Contributors

Awsaf (@pinicman) from my team would work on adding this functionality.

Optimize Tabular View Code

In the Tabular View (#37), there are a few places where the code seems pretty costly to run. In the interest of time, and considering that this was the first commit for Tabular View, we'll revisit the code to see how we can optimize it in order to reduce CPU usage.

For example, we are using distinct to get unique values for platforms and benchmarks. distinct is expensive to run.

Snippet from getTabularData.js

const platforms = await db.distinct("buildName", query);
const benchmarks = await db.distinct("aggregateInfo.benchmarkName", query);

For more details, please refer to #131 (comment).
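
One possible direction (an unmeasured sketch, assuming the db wrapper also exposes aggregate() like the native driver): collect both sets in a single aggregation pass instead of two separate distinct calls.

// Sketch: one pass over the matching documents instead of two distinct() scans.
const [result] = await db.aggregate([
    { $match: query },
    { $group: {
        _id: null,
        platforms: { $addToSet: "$buildName" },
        benchmarks: { $addToSet: "$aggregateInfo.benchmarkName" },
    } },
]).toArray();
const platforms = result ? result.platforms : [];
const benchmarks = result ? result.benchmarks : [];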

Aggregate Perf Results From Multiple Benchmark Iterations

Problem Description

Currently, we don't aggregate numbers for multiple benchmark iterations when each Jenkins build is stored in the database. As a result, all results such as average, median and confidence interval need to be calculated when Perf Compare is used to compare 2 builds. This design is not preferred due to the following reasons:
1) It takes time to generate Perf Reports through Perf Compare.
2) Aggregated results are not stored so they would need to be generated every time they are needed even though they don't change.
3) It requires more CPU time and puts unnecessary pressure on the database.

These issues should be resolved with the proposed changes mentioned below. This would significantly improve the speed of getting results, which would be needed for different views such as Dashboard (#28) and Tabular View (#37).

Proposed Changes

  1. Move the math library from frontend to backend: https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/eb9c0d302759787e15afe96fe8d24b0e2b1f907c/test-result-summary-client/src/PerfCompare/lib/BenchmarkMath.js
  2. Generate all the aggregated results for a master build that may have a single child job or multiple child jobs, and store them in the parent object in the testResults collection (see the sketch after this list).
  • Additional data that needs to be added to Parent Object: benchmarkName, benchmarkVariant, benchmarkProduct, testData.
  • Additional data in testData for Parent Object: Aggregated numbers for all metrics: Mean, Median, Confidence Interval, Min, Max, StdDev
  • Note: For Liberty startup, there will only be 1 index in testData.metrics.[0].value for parent object.
  3. Instead of Perf Compare generating the perf numbers, it should just make a request to the backend to query the database and fetch the stored results.
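
A sketch of the per-metric aggregation itself (BenchmarkMath.js already has equivalents; this only illustrates what would be stored on the parent object, and the confidence interval here uses a normal approximation, which may differ from BenchmarkMath.js):

// Sketch: aggregate one metric's values across all child iterations.
function aggregateMetric(values) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    const sorted = [...values].sort((a, b) => a - b);
    const median = n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    const stddev = n > 1
        ? Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1))
        : 0;
    const ci = 1.96 * stddev / Math.sqrt(n);   // 95% confidence interval half-width
    return { mean, median, min: sorted[0], max: sorted[n - 1], stddev, ci };
}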

Assigned Contributors

Sophia (@sophiaxu0424) from my team will work on this feature.

Tabular View for Comparing Baseline and Test Builds for Selected Platforms & Metrics

Background About Benchmarking

For benchmarking, we always launch several iterations of a benchmark with a specific build to get performance results for various metrics such as throughput and startup time. These raw numbers are not very useful on their own since they could change when the benchmark is run on another platform, when the machine state isn't identical or when the configs are slightly different. Hence, we always use a baseline to gauge the performance of a newer test build.

While comparing baseline and test builds, it's important to use a relative number (Build 1 Score / Build 2 Score) instead of an absolute number (Build 1 Score - Build 2 Score) to look at the performance gap, since the absolute difference doesn't mean much on its own, could change, and could have a significantly varying range.

We usually use this formula for the comparison:

Scenario           Example Metrics            Comparison Formula
Higher is better   Throughput                 Test Build / Baseline Build
Lower is better    Startup time, Footprint    Baseline Build / Test Build
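
In code form, the comparison reduces to one small helper (a sketch):

// Sketch: relative score in percent, following the table above.
// A result above 100% means the test build beats the baseline.
function relativeScore(testScore, baselineScore, higherIsBetter) {
    const ratio = higherIsBetter ? testScore / baselineScore : baselineScore / testScore;
    return ratio * 100;
}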

Details about the Proposed Feature

Test Result Summary (TRS) should have the ability to create and show tabular views for comparing baseline and test builds. Each view should show the relative comparison between the baseline and test build as a percentage, with one result cell per metric and platform. These result cells should be colored to classify the performance according to the color scheme below.

Color Scheme for Result Cells
These tabular views would be extremely helpful in finding regressions. I'm going to show the benefits of these tabular views with 2 examples.

Example 1:

SPEC Benchmarks

The tabular view above shows the results of all the SPEC benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

  1. x64 Regression (OS: Linux, Windows, macOS) for SPECjEnterprise & SPECjbb2015
  2. Multi-benchmark Regression (SPECjEnterprise & SPECjbb2015) for x64 (Same as the 1st)
  3. Single Platform Regression (Linux s390x) for SPECjbb2005

Example 2

Micro Benchmarks

The tabular view above shows the results of all the micro benchmarks run on different platforms. From the results above, we can identify the following regressions easily:

  1. Cross-platform Linux Regression (HW: x64, ppcle64 & s390x) for ILOG ODM
  2. Power Regression (OS: Linux & AIX) for HiBench

Requirements for Tabular Views

Basic requirements of this tabular comparison view:

  1. Should show the relative comparison percent between baseline and test build for all platforms and metrics selected for that specific view.
  2. Should show the results of the latest test build against the latest runs for the baseline.
  3. Should identify the score ranges with different colors.
  4. Hovering over a result cell should show some basic information such as Java versions, confidence intervals and average scores.
  5. Open a new window with a unique URL when a result cell is clicked to show the full details of the runs, such as the score of each iteration for all metrics from those runs for both baseline and test build. This detailed view should be the same as the one shown when one clicks on the detailed view URL from the graph view, being developed for issue #28.
  6. Should be configurable to show different platforms and metrics that are selected by the user for that specific view.
  7. Ability to show the historic data for all previous weeks. This ability would help in finding the first build that showed a regression.
  8. Ability to use one baseline build with different test builds of the same platform, even though that baseline build may not have been interleaved (more details about interleaving here: adoptium/aqa-tests#850 & #24) with any of those test builds. Say you want two table views: one comparing OpenJDK8-OpenJ9 GA vs OpenJDK8-Hotspot Latest, and another comparing OpenJDK8-OpenJ9 GA vs OpenJDK8-OpenJ9 Latest. Both views use the same baseline, OpenJDK8-OpenJ9 GA. While running these 3 builds, we could have interleaved the baseline build with one of the two test builds (i.e. OpenJDK8-OpenJ9), so we wouldn't want to run the baseline again with the second test build (OpenJDK8-Hotspot) since the baseline would essentially give the same score, which saves significant machine time.

Advanced Requirements for Tabular Views

  1. Ability to show the "Best So Far" build from all the data (to be included in the Graph Timeline view as well)
  2. Ability to monitor a specific cell
  3. Ability to show the difference between the current and previous week for all cells
  4. Ability to show only the result cells that have changed since last week
  5. Ability to check and uncheck a specific cell to monitor for possible regression
  6. Ability to link a GitHub issue to one or more cells

Assigned Contributors

My team would work on adding this functionality.

Handle document size > 16M

Max document size is 16M in MongoDB https://docs.mongodb.com/manual/reference/limits/

Some of the builds produce very large output (~57M) for 4 tests:
https://ci.adoptopenjdk.net/view/all/job/openjdk11_hs_externaltest_x86-64_linux/169/console

Even though TRSS only stores one test's output per document, it is still too large to insert.

2:12:56 PM - debug: update newData url=https://ci.adoptopenjdk.net, buildNameStr=openjdk11_hs_externaltest_x86-64_linux, buildNum=169, _bsontype=ObjectID, 0=92, 1=139, 2=226, 3=78, 4=160, 5=198, 6=233, 7=26, 8=158, 9=73, 10=192, 11=212, type=Test, status=Done, timestamp=1552592655173, buildUrl=https://ci.adoptopenjdk.net/job/openjdk11_hs_externaltest_x86-64_linux/169/, buildDuration=46983740, buildResult=FAILURE, parserType=Test, machine=Jenkins, total=0, executed=0, passed=0, failed=0, skipped=0, startBy=upstream project "build-scripts/jobs/jdk11u/jdk11u-linux-x64-hotspot" build number 141, artifactory=null
2:12:56 PM - error: Exception in BuildProcessor:  message=document is larger than the maximum size 16777216, name=MongoError, stack=MongoError: document is larger than the maximum size 16777216
    at Function.MongoError.create (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb-core/lib/error.js:31:11)
    at toError (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/utils.js:139:22)
    at addToOperationsList (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/bulk/unordered.js:154:51)
    at UnorderedBulkOperation.raw (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/bulk/unordered.js:387:7)
    at bulkWrite (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/collection.js:646:12)
    at /Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/collection.js:540:5
    at new Promise (<anonymous>)
    at Collection.insertMany (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/collection.js:539:10)
    at Collection.insert (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/node_modules/mongodb/lib/collection.js:835:15)
    at OutputDB.populateDB (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/Database.js:16:25)
    at Promise.all.tests.map (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/DataManager.js:101:51)
    at Array.map (<anonymous>)
    at DataManager.updateBuildWithOutput (/Users/lanxia/workspace/ttss/openjdk-test-tools/TestResultSummaryService/DataManager.js:98:55)
    at process.internalTickCallback (internal/process/next_tick.js:77:7), driver=true

We are actively working on reducing the test output and splitting the large builds into smaller ones: adoptium/aqa-tests#834

I think TRSS should handle this case gracefully. For now, we should store the last ~12M of output (leaving some space for other data).

Parsing issue with latest build

TRSS cannot display the Test build page. I suspect this is related to a test output change. We need to check the parser code and make sure the console output can be parsed and displayed properly.

Update Benchmark Versions & Add Profiling Parameters to PerfNext Launcher

Problem Description

Currently, all benchmark configs are using older versions of the benchmarks. We should update them to the latest versions listed below.

Benchmark      Current Version    Version to Use
Liberty        19.0.0.2           19.0.0.9
ODM            8.8.1              8.10.0
SPECjbb2015    SPECjbb2015GMR     specjbb2015v101_jaxb24

Also, we recently extended the support for profiling tools from Liberty to ODM. In the future, this profiling ability will be extended to other benchmarks. As part of that work, we also added the capability in our scripts to use the profiling parameters passed by the user, or to set them to defaults if nothing is specified.

These profiling parameters should be added to the relevant benchmark configs so that users can tune these params according to their needs.

We should add all these parameters mentioned below and anything else that I might have missed:

  1. PROFILING_TOOL
  2. PROFILING_JAVA_OPTIONS
  3. PROFILING_PROFILE_TIME
  4. PROFILING_SLEEP_TIME
  5. PERF_SAMPLING_PERIOD
  6. PERF_EVENT

Proposed Work

  • Update benchmark versions
  • Add any new or missing profiling parameters to PerfNext Launcher

Selected jobs are the ones that we care about:

https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b3510949c0d6f24650f48d393769ec5d10c35c2d/PerfNext/config/benchmarks/data_simple/Liberty.xml#L3
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b3510949c0d6f24650f48d393769ec5d10c35c2d/PerfNext/config/benchmarks/data_simple/Liberty.xml#L239
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b3510949c0d6f24650f48d393769ec5d10c35c2d/PerfNext/config/benchmarks/data_simple/ODM.xml#L3
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b3510949c0d6f24650f48d393769ec5d10c35c2d/PerfNext/config/benchmarks/data_simple/ODM.xml#L27

  • After updating the versions and adding new profiling options, we should test our changes with both ODM and Liberty for all the supported profiling tools for each benchmark. We should make sure to test one (i.e. DT7, any ODM) of the 2 jobs from each benchmark suite extensively (i.e. Run all the 7 currently supported profiling tools as mentioned below). For the other jobs (i.e. DT3, other ODM job), we could do some sanity testing to make sure things work.

Default Testing: Avoid setting profiling parameters unless required. You might just need to set PROFILING_TOOL. Leave the non-profiling options at their defaults.

Non-default Testing: Explicitly set all profiling parameters to some value similar to their default but not the same as the default value. For example, if the default value for PROFILING_SLEEP_TIME is 60, test it with 30 or something.

Currently Supported Profiling Tools:

  1. jprof tprof
  2. jprof scs
  3. jprof callflow
  4. jprof calltree
  5. jprof rtarcf
  6. perf stat
  7. perf record

Increase the TRSS Server efficiency by running multiple node in parallel

With the current code, we are using a single node process to run the TRSS server. As we monitor more and more projects, the TRSS server can become too busy processing Jenkins outputs. As a result, the TRSS server may not be able to respond to client requests in a timely manner.

With the current structure, we can easily split it into two node servers:

  • frontend - for responding to client requests (mainly for querying the database)
  • backend - for querying Jenkins and inserting into/updating the database.

We do not want to use fork() for this split because it would complicate the logic and add lots of if conditions, and the two tasks are very different.

Once the two node servers are created, we can further improve efficiency by leveraging the multiple processors on the machine: use fork() to create multiple backend workers and process Jenkins jobs in parallel.

Note: a flag is needed to keep the worker id and to track which job is being processed by which backend worker. Also, we should set a timeout after which we assume a worker is dead and restart it. A sketch of this worker management is shown below.
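A minimal sketch of the backend worker management, assuming a hypothetical ./backendWorker.js script, message shape, and timeout value (none of this is existing TRSS code):

// Minimal sketch only; the worker script, message protocol and timeout are assumptions.
const { fork } = require('child_process');

const WORKER_COUNT = 4;
const JOB_TIMEOUT_MS = 10 * 60 * 1000; // assume a worker is dead after 10 minutes

const workers = new Map(); // worker id -> { worker, currentJob, startedAt }

function startWorker(id) {
    const worker = fork('./backendWorker.js'); // hypothetical worker script
    workers.set(id, { worker, currentJob: null, startedAt: null });

    worker.on('message', msg => {
        if (msg.type === 'done') {
            const entry = workers.get(id);
            entry.currentJob = null; // job finished, free the worker
            entry.startedAt = null;
        }
    });

    worker.on('exit', () => startWorker(id)); // restart crashed or killed workers
}

function dispatch(id, job) {
    const entry = workers.get(id);
    entry.currentJob = job;
    entry.startedAt = Date.now();
    entry.worker.send(job); // hand the Jenkins job to the backend worker
}

// watchdog: kill workers that exceed the timeout; the 'exit' handler respawns them
setInterval(() => {
    for (const entry of workers.values()) {
        if (entry.startedAt && Date.now() - entry.startedAt > JOB_TIMEOUT_MS) {
            entry.worker.kill();
        }
    }
}, 60 * 1000);

for (let i = 0; i < WORKER_COUNT; i++) startWorker(i);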

image

In summary, there are the following steps:

  • Break node server into two: frontend and backend

  • configure forever service on the server

  • create multiple backend workers

  • update readme

Support build deletion

In order to avoid having too much data in the database, we need to support build deletion. Users need to provide # Builds to Keep when they add a new build to the build monitor list.

screen shot 2018-08-29 at 11 31 46 am

The Delete button in the build monitor list will not delete historical builds. It only removes the build from the build monitor list.

Before we insert any build into the database, the program will check the number of builds already stored. If that number exceeds # Builds to Keep, older builds will be deleted. Otherwise, deletion is skipped.
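A minimal sketch of the pruning step, assuming a MongoDB collection with hypothetical buildName and timestamp fields (not the actual TRSS schema):

// Minimal sketch only; the collection name and field names are assumptions.
async function pruneOldBuilds(db, buildName, numBuildsToKeep) {
    const collection = db.collection('testResults');
    const count = await collection.countDocuments({ buildName });
    if (count <= numBuildsToKeep) return; // nothing to delete

    // find the oldest builds beyond the keep limit and remove them
    const stale = await collection
        .find({ buildName })
        .sort({ timestamp: 1 })
        .limit(count - numBuildsToKeep)
        .toArray();
    await collection.deleteMany({ _id: { $in: stale.map(b => b._id) } });
}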

Ability to Interleave Performance Runs for Baseline & Test Builds

Background About Interleaved Runs

While doing the benchmarking comparison between different builds, it's crucial to interleave runs for both baseline and test builds in order to get the most consistent and reliable results.

For benchmarking, we always launch several iterations of a benchmark with a specific build to get performance results for various metrics such as throughput and startup time. These numbers in isolation are not very useful, since they can change when the benchmark is run on another platform, when the machine state isn't identical, or when the configs are slightly different. Hence, we always use a baseline to gauge the performance of a newer test build.

There are various machine factors that could affect the numbers between iterations, even when they run with the same configs. Interleaving runs helps avoid those issues and ensures that the same factors affect both the baseline and the test build runs. In order to keep this issue short, I won't get into the benefits and scenarios of interleaved runs.

If T = Test Build; B = Baseline Build, # = Iteration

Interleaving Run Pattern:
Do alternate iteration of each baseline and test build in a ping-pong fashion.
T1, B1, T2, B2, T3, B3

Non-interleaved Run Pattern:
Do all iterations for one build and then do all iterations for another.
T1, T2, T3, B1, B2, B3
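A minimal sketch of how the launch order for child builds could be generated (the function name is hypothetical):

// Minimal sketch only; generates the child-build launch order.
// With a baseline, test (T) and baseline (B) iterations alternate.
function launchOrder(iterations, withBaseline) {
    const order = [];
    for (let i = 1; i <= iterations; i++) {
        order.push(`T${i}`);
        if (withBaseline) order.push(`B${i}`);
    }
    return order;
}

console.log(launchOrder(3, true));  // [ 'T1', 'B1', 'T2', 'B2', 'T3', 'B3' ]
console.log(launchOrder(3, false)); // [ 'T1', 'T2', 'T3' ]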

Related Issue for Openjdk-tests Framework

adoptium/aqa-tests#850

Background About PerfNext

Currently, PerfNext does not have the capability to launch interleaved runs. It uses non-interleaved runs and launches all iterations under one Jenkins job using an iteration loop, as shown below:

iteration=0
while [ "$iteration" -lt 3 ]
do
    echo "Start of iteration $iteration"
    echo ""
    echo "********** START OF NEW TESTCI BENCHMARK JOB **********"
    echo "Benchmark Name: LibertyStartupDT Benchmark Variant: 17dev-4way-0-256-qs"
    echo "Benchmark Product: pxa6480sr6-20190123_02"
    echo ""
    # Export benchmark vars
    export JDK_OPTIONS="-Xmx256m"

    ## HW Specific Environment Vars ##
    # Export HW vars
    bash ./bin/sufp_benchmark.sh
    echo "End of iteration $iteration"
    iteration=$((iteration+1))
done

Background About TRS

Once the Jenkins build is done, TestResultSummaryService (TRSS) stores the raw Jenkins build output, which contains the output for all benchmark iterations of a build, in MongoDB. TRSS then parses the data for each iteration (in /openjdk-test-tools/TestResultSummaryService/parsers/BenchmarkParser.js) by matching the delimiter string echo "********** START OF NEW TESTCI BENCHMARK JOB **********", which is printed in the Jenkins output for each iteration, and stores the results in another MongoDB collection. These parsed numbers for the various benchmark metrics are used to calculate aggregates such as average, confidence interval, min, max and median for each build, enabling test-result-summary-client (TRSC) to display performance charts and Perf Compare to compare two builds.

Proposed Features for PerfNext:

  1. When baseline is not checked, launch the test build using a parent pipeline build that launches a child build for each iteration.
    For example, PerfNext -> Jenkins parent pipeline -> launches child pipeline builds for the test build with the sequence: T1, T2, T3
  2. When baseline is checked, launch both the test and baseline builds using a parent pipeline build that launches child builds for alternating iterations of the baseline and test builds in a ping-pong fashion.
    For example, PerfNext -> Jenkins parent pipeline -> launches child pipeline builds with the sequence: T1, B1, T2, B2, T3, B3
  3. Output the parent pipeline build URL to the user

Proposed Features for TRS:

  1. Redesign the parser to use the parent pipeline build URL to browse through each child pipeline build and parse data for each iteration
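A minimal sketch of how the redesigned parser could walk the parent pipeline; getChildBuilds() and parseIterationOutput() are hypothetical helpers passed in as parameters, not the actual TRSS API:

// Minimal sketch only; the helpers are injected so the example is self-contained.
async function parseParentPipeline(parentBuildUrl, { getChildBuilds, parseIterationOutput }) {
    const childBuilds = await getChildBuilds(parentBuildUrl); // one child build per iteration
    const iterations = [];
    for (const child of childBuilds) {
        const output = await child.getConsoleOutput();
        iterations.push(parseIterationOutput(output)); // reuse the existing per-iteration parsing
    }
    return iterations;
}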

Assigned Contributors

Members from my team, Awsaf (@pinicman) and Sophia (@sophiaxu0424), will start working on the PerfNext and TRS features, respectively.

Enable PerfNext To Use Openjdk-tests Framework

Background

Currently, PerfNext uses its own pipeline to launch benchmark jobs, a process requiring several setup steps such as downloading and setting up benchmark and SDK packages.

Proposal

PerfNext should use the Openjdk-tests framework (https://github.com/AdoptOpenJDK/openjdk-tests) via its pipeline scripts, which already provide much of the functionality PerfNext needs. This move would streamline the execution of performance tests into the same CI pipelines used for other system and functional tests.

Currently, PerfNext has an API called /api/benchengine/submit (aka BenchEngine) that generates the necessary scripts and submits the request to Jenkins.
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/b528a63ea146080f4bb1430740a734c2af87df7e/PerfNext/app/apis/BenchEngine/parser.js#L42-L54

As shown above, the /api/benchengine/submit API currently sends two scripts to the Jenkins pipeline:
  • setupScript: performs all the setup tasks, such as downloading the benchmark and SDK packages.
  • benchmarkScript: exports all the necessary environment variables and runs the main benchmark script.

We could get rid of setupScript since the Openjdk-tests framework can already handle that. We would still need the ability to generate benchmarkScript, so that developers can change default configs and PerfNext can pass the custom benchmark script to the Openjdk-tests framework to run.

Details

I'll be adding more details to this issue soon.

Show Aggregated Perf Results in Perf Graph View

Problem Description

With #73, we've added the ability to aggregate the perf results from multiple iterations. We should update Perf Graph View to use and display aggregated perf results in order to comply with the new design.

Proposed Changes

Besides the info that's already displayed, show all aggregated data for each build:

  • max
  • min
  • median
  • stddev
  • CI
  • iteration
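A minimal sketch of how these aggregates could be computed from the per-iteration values; the 95% confidence interval below uses a normal approximation expressed as a percentage of the mean, which is an assumption rather than TRSS's exact formula:

// Minimal sketch only; `values` is a plain array of per-iteration metric values.
function aggregate(values) {
    const n = values.length;
    const mean = values.reduce((a, b) => a + b, 0) / n;
    const sorted = [...values].sort((a, b) => a - b);
    const median = n % 2 ? sorted[(n - 1) / 2]
                         : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    const stddev = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
    const ci = (1.96 * stddev) / Math.sqrt(n) / mean * 100; // half-width, as % of mean
    return { max: Math.max(...values), min: Math.min(...values),
             mean, median, stddev, ci, iteration: n };
}

console.log(aggregate([101.2, 99.8, 100.5, 98.9]));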

Assigned Contributors

I'll be working with Sophia (@sophiaxu0424) from my team to add this feature.

Delay Generation of HW Specific Variables

Problem Description

Currently, we fetch the latest machine data file every time PerfNext is loaded. If a benchmark is selected before the machine data has been fetched (fetching can take a few seconds), we can get an error while generating the HW environment variables.

Error

image

parser.js: Entering generateHWENV()
/Users/piyush/Work/Git/openjdk-test-tools/PerfNext/app/apis/BenchEngine/parser.js:453
	    var envVar = HW_ENV[property].$.name;
	                                    ^

TypeError: Cannot read property 'name' of undefined
    at generateHWENV (/Users/piyush/Work/Git/openjdk-test-tools/PerfNext/app/apis/BenchEngine/parser.js:453:38)
    at /Users/piyush/Work/Git/openjdk-test-tools/PerfNext/app/apis/BenchEngine/parser.js:393:23
    at /Users/piyush/Work/Git/openjdk-test-tools/PerfNext/app/apis/BenchEngine/parser.js:423:9
    at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `node app.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/piyush/.npm/_logs/2019-04-01T23_40_41_512Z-debug.log
Piyushs-MacBook-Pro:PerfNext piyush$

Related Snippets

Frontend: Call to Backend for Getting Machine List
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/PerfNext/public/lib/js/benchmarks.js#L34-L40

Backend: Get machine list
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/PerfNext/app/apis/machines.js#L7-L24

Snippet to Generate HW Specific Variables
https://github.com/AdoptOpenJDK/openjdk-test-tools/blob/524d16e3784c17f4af6cee75f9105eb929397792/PerfNext/app/apis/BenchEngine/parser.js#L448-L460

Proposed Changes

We need a workaround to prevent this issue, for example by delaying the generation of the HW-specific environment variables until the machine data has been fetched.
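A minimal sketch of one possible approach; fetchMachineData() is a hypothetical function passed in as a parameter and the HW_ENV property access mirrors the error above, so this is not the actual PerfNext code:

// Minimal sketch only; the fetch function is injected so the example is self-contained.
let machineDataPromise = null;

function getMachineData(fetchMachineData) {
    // start the fetch once and let every caller await the same promise
    if (!machineDataPromise) {
        machineDataPromise = fetchMachineData();
    }
    return machineDataPromise;
}

async function generateHWENVSafely(property, fetchMachineData) {
    const HW_ENV = await getMachineData(fetchMachineData); // wait for the machine data
    if (!HW_ENV[property]) {
        throw new Error(`Unknown HW property: ${property}`); // fail clearly instead of crashing
    }
    return HW_ENV[property].$.name;
}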
