samasri / omr

This project forked from eclipse/omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes

License: Other

CMake 0.63% Makefile 1.32% HTML 7.16% C++ 78.94% C 9.25% Assembly 1.17% SourcePawn 0.10% TeX 0.87% M4 0.07% Lex 0.01% Yacc 0.01% Shell 0.06% Objective-C 0.05% PowerShell 0.01% Ruby 0.01% Python 0.22% JavaScript 0.02% CSS 0.01% Perl 0.08% sed 0.01%

omr's Issues

Automate the creation of the database using a python file

In order to use Python to run the SQL queries, an extra package needs to be installed. In addition, we will need to add extra configuration for the user; that is: providing the username and password of the database. Hence, for now I will keep all the SQL queries in one file and the user would have to run these queries in their database.

Make (Namespace, ClassName) Unique in Class Table

In the Class table, make the pair (Namespace, ClassName) unique so that no duplicate records can be inserted.
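
A minimal sketch of such a constraint, assuming an SQLite database and the column names id, namespace and className (the real schema and database engine may differ). The helper try_insert is hypothetical and only exists to show how a duplicate pair is rejected:

```python
import sqlite3


def create_class_table(conn):
    # Column names are assumptions; only the UNIQUE pair matters here.
    conn.execute(
        """
        CREATE TABLE Class (
            id INTEGER PRIMARY KEY,
            namespace TEXT NOT NULL,
            className TEXT NOT NULL,
            UNIQUE (namespace, className)
        )
        """
    )


def try_insert(conn, namespace, class_name):
    """Return True if the row was inserted, False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO Class (namespace, className) VALUES (?, ?)",
            (namespace, class_name),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the UNIQUE constraint is violated.
        return False


conn = sqlite3.connect(":memory:")
create_class_table(conn)
```

With this in place, inserting ("OMR", "CodeGenerator") twice succeeds only the first time, while ("OMR::X86", "CodeGenerator") is still accepted because the namespace differs.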

Implicit function declarations are not separated from other declarations

std still appearing in the output sometimes

When running the tool only on CodeGenerator.cpp, std classes were blocked from the hierarchy. However, when running on all source files in amd64 and i386, std classes still appear.

Adapt project to a newer Clang version

After reviewing the recently documented bug, I found out that it was fixed in newer versions of clang. Hence, I am planning to move OMRStatistics to a newer clang version.

Class name disappears when there is no namespace

When a class that is not inside a namespace is processed, an error happens.

Overloads should not have methods that are not overloaded

Find an example class hierarchy to virtualize

The objective is to try virtualizing a class hierarchy that has a minimal number of overrides, even though we still do not have accurate data about the call sites. The plan is as follows:

Suggest to Xiaoli and Robert the 3 hierarchies that have the fewest overrides between their classes.
Xiaoli and Robert pick the one of these 3 hierarchies that is least critical in OMR, in order to convince developers that it is a good idea to virtualize it and test out the performance advantage.
Samer virtualizes all functions in the picked hierarchy
CI tests are performed
Developer tests are performed to make sure no parts of the OMR library got broken by our changes
Performance tests are performed to measure impact on performance done by this virtualization

Hierarchy Visualization - Not the proper UML format

I need to make the following edits:

Make boxes go from bottom to top
Change borders to rectangles instead of circles
Make arrows similar to the inheritance arrows in UML figures

Adapt the schema and database to the conventions.

Right now the database has only primary keys and no foreign keys. Also, the schema has no relations between its tables. We need to fix that.

Tool does not handle cases with classes having multiple parents

Overloads Visualization - Make the first-occurrence signature of the function a clickable link

Virtualizing CodeGenerator fails

Tasks

Remove OMR_API from PRs
Try to build it locally and fix issues by creating default implementations

Description of problem

The PR to virtualize CodeGenerator is failing. The main problem seems to be from virtualizing generateSwitchToInterpreterPrePrologue(TR::Instruction*, uint8_t, uint8_t).
Error message:

../../libtestjit_base.a(OMRCodeGenerator.o):(.data.rel.ro._ZTVN3OMR3X865AMD6413CodeGeneratorE[_ZTVN3OMR3X865AMD6413CodeGeneratorE]+0x1d0): undefined reference to `OMR::X86::CodeGenerator::generateSwitchToInterpreterPrePrologue(TR::Instruction*, unsigned char, unsigned char)'

This function is declared in OMR::X86::CodeGenerator however it is not defined in the OMR project. It is, however, defined in J9::X86::CodeGenerator. Hence, when compiling the OMR project independently of OpenJ9 (as in Travis CI), the linker would not be able to find a definition for the virtual function which causes an error.

I tried adding = 0 to the end of the function's declaration to make it pure virtual, since pure virtual functions are allowed to have no definition. This caused another error:

../../compiler/compile/OMRCompilation.cpp:188:56: error: invalid new-expression of abstract class type ‘TR::CodeGenerator’
    return new (comp->trHeapMemory()) TR::CodeGenerator();

Virtualize CodeGenerator

This issue is to track the progress while virtualizing CodeGenerator. Since this virtualization would need a change in OMR and OpenJ9 projects, a related issue has been opened in samasri/openj9#1. Also, the setup section explains how to build openj9 locally. After that, I will be following the virtualization steps I documented in my wiki earlier.

Setup

In order to start virtualizing the functions, I have to build openJ9 and rebuild it after every change I do; I built it initially by doing the following:

1. Copy the Dockerfile that is specific to Linux host machines.
2. Navigate to the directory where you want to build and run: docker build -t openj9 -f Dockerfile .
3. docker run -it -v [HOST_PATH]:/root/openj9 openj9
   - Replace HOST_PATH with the absolute path of the current directory followed by /docker. This command starts a Docker container from the image built in the previous step and mounts ./docker to a directory inside the container, so that we can extract the built openj9 files later.
4. You should now be inside the Docker container. Run: git clone https://github.com/ibmruntimes/openj9-openjdk-jdk8 ./openj9
5. cd openj9
6. bash ./get_source.sh
7. bash ./configure --with-freemarker-jar=/root/freemarker.jar
8. make all
9. exit

The built openj9 files are now in ./docker/build/. You can edit the files as you wish and then rebuild by executing the following 2 commands:

The same command as in step 3 (executed from the same directory as before)
From inside the Docker container: make all

Step 0: Add OMR_API

In 1060174, I added OMR_API as an annotation

Step 1: Removing OMR_EXTENSIBLE

I inline-commented out OMR_EXTENSIBLE from the following places:
- OMR::CodeGenerator
- OMR::X86::CodeGenerator
- OMR::X86::AMD64::CodeGenerator
- OMR::X86::I386::CodeGenerator
- OMR::P::CodeGenerator
- OMR::Z::CodeGenerator
- J9::CodeGenerator
- J9::X86::CodeGenerator
- J9::X86::AMD64::CodeGenerator
- J9::X86::I386::CodeGenerator
- J9::P::CodeGenerator
- J9::Z::CodeGenerator
- TR::CodeGenerator

Getting the list of functions to virtualize

I executed this query on the database:

SELECT bc.namespace as 'BaseNamespace', bc.className as 'BaseClassName', bf.signature as 'Signature', oc.namespace as 'OverridingNamespace', oc.className as 'OverridingClassName'
FROM Override as o
INNER JOIN Function as bf on o.baseFunctionID = bf.id
INNER JOIN Function as of on o.overridingFunctionID = of.id
INNER JOIN Class as bc on bf.classID = bc.id
INNER JOIN Class as oc on of.classID = oc.id
WHERE bc.className='CodeGenerator';

I wrote the result to a file and processed it with a Python script to keep only the unique function signatures. Some signatures are repeated because the Override table stores the following two overrides as different records, even though they have the same signature:

Base Namespace	Base ClassName	Signature	Overriding Namespace	Overriding ClassName
OMR	CodeGenerator	CodeGenerator()	OMR::X86	CodeGenerator
OMR::X86	CodeGenerator	CodeGenerator()	OMR::X86::AMD64	CodeGenerator
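
The de-duplication step could look roughly like the sketch below. It assumes the query result was exported as tab-separated text with a header row whose column names match the query aliases; the function names read_query_output and unique_signatures are hypothetical and are not taken from the original script:

```python
import csv
import io


def read_query_output(text):
    # Assumption: tab-separated export with a header row.
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def unique_signatures(rows):
    """Return each signature once, in the order it was first seen."""
    seen = set()
    result = []
    for row in rows:
        signature = row["Signature"]
        if signature not in seen:
            seen.add(signature)
            result.append(signature)
    return result


# The two override records from the table above.
sample = (
    "BaseNamespace\tBaseClassName\tSignature\tOverridingNamespace\tOverridingClassName\n"
    "OMR\tCodeGenerator\tCodeGenerator()\tOMR::X86\tCodeGenerator\n"
    "OMR::X86\tCodeGenerator\tCodeGenerator()\tOMR::X86::AMD64\tCodeGenerator\n"
)
```

Both sample records collapse into the single signature CodeGenerator(), which is the behaviour described above.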

Functions to virtualize

The signatures of CodeGenerator functions that are overridden in openJ9 are found below; I will be ticking them off after virtualizing them, rebuilding the project to make sure it still works, and committing them to my openJ9 fork and omr fork.

PS: I am assuming we should not virtualize the constructors

Plugin is giving segfault when class is provided with no hierarchy

Run OMR on OpenJ9

Use the instructions to build OpenJ9 in Docker.
- PS: If make failed: execute git checkout f66506e7a528340d176260c3af47c63cf66979ec inside openj9-openjdk-jdk9/openj9 and try again
Use Nazim's instructions of running BruteClang on OpenJ9 to get the OpenJ9 source file list that is supposed to be processed by OMRStatistics
Automate the process of getting this source file list by editing the OMRStatistics build system to automatically get the source file list from the OpenJ9 build system
Get list of includes and rest of configuration used in OMRChecker to use when running OMRStatistics
- Find all the required variables to be defined for OMRChecker to run independently from the whole build system
- Define these variables in the build system of OMRStatistics
- Run it on all archs
Fix errors

Use CallGraph to track function calls

Following up from #46, the plan is to use CallGraph to track function calls instead of finding CXXMemberCallExpr in function bodies.

Some classes show that they have parent classes, but the parent classes would have a null pointer

Classes visited when processing call functions are different than the ones visited in VisitCXXRecord

How the Function Table is Populated

The Function table is filled from information obtained from VisitCXXRecord. In VisitCXXRecord, classes are filtered by the following if statements before being processed by recordParents(std::string, std::string) (signature is specified since there are two recordParents functions):

if(!decl || !decl->isClass() || !decl->hasDefinition()) return true;
if(HMConsumer::shouldIgnoreClassName(decl)) continue;

Class declarations are retrieved in one of two ways:

The VisitCXXRecord function is called by RecursiveASTVisitor, so the decl is provided by that class.
We traverse the bases of a declaration provided by VisitCXXRecord, which we get in the form of QualType, and try to convert them to CXXRecordDecl. If the conversion succeeds (the result is not null), the base points to a parent class decl. This method excludes some (but not all) template classes (as described in #18). At this stage, the function declarations come from the AST. Hence, in the case of template classes, the signatures of functions that depend on generic types still have generic parameters.

How the FunctionCall Table is Populated

When processing function calls (in the processCallExpressions function), the class declaration comes from the CXXMemberCallExpr, which processes all templates. At this stage, the generic types of templates are already resolved. Hence, the signatures of functions that depend on generic classes have concrete class types, since the receiver is an instance of a template whose generic types are already resolved.

The problem

Consider the following hypothetical function:

template <class T> void setValue(T value);
setValue<int>(5);

In the Function table, the signature of that function is recorded as: void setValue(T).
However, in the FunctionCall table, the signature is recorded as: void setValue(int).
Hence, the same function has different signatures in FunctionCall depending on how it is instantiated. Therefore, some functions in FunctionCall cannot be linked to FunctionIDs in Function, since their signatures cannot be matched against the Function table data.

Test5 is not passing

The tool does not report any overloaded functions

Tool is not guaranteed to be reading all the classes in the source code

Summary: non-class structures (like structs and unions) and some template classes are not processed

In the VisitCXXRecord function, we ignore all CXX records that are not classes (hence, unions and structs are ignored).
In addition, when iterating through the parents of a CXXRecordDecl, we use an iterator that goes from bases_begin() to bases_end(). The iterator points to a QualType, which is a more specific Type. The Type can sometimes be converted to a CXXRecordDecl. OMRStatistics, following the OMRChecker code, ignores all Type objects that cannot be converted to CXXRecordDecl objects. However, after some tinkering with these ignored Type objects, we realized that they might be useful.

Override Visualizations do not work on Google Chrome

Override Visualizations are implemented using JavaScript; the script uses AJAX to read a local file, then processes it one record at a time to create the visualization on a web page. Unfortunately, using AJAX on a local file is not allowed in Google Chrome. More information and possible solutions are found on this link

Add Function IDs back to the Database

Add UNIQUE constraint on Function(Signature and ClassID)
Record all functions in the Function table
Re-add ID to functions
Change Override table to have only 2 columns: BaseFunctionID, OverridingFunctionID

Find example to Support Matthew's Theory in the SPLC Paper

Edit the plugin to find function calls to support the issue #16 in SPLC paper

Some hierarchies are being printed with extra '-->' at the end

After fixing issue #42, some hierarchies are now printed with an extra '-->' at the end. Need to find the reason and fix it.

Some hierarchies do not show in the output

Example:
This hierarchy exists: TR::AutomaticSymbol --> OMR::AutomaticSymbol --> TR::RegisterMappedSymbol --> OMR::RegisterMappedSymbol --> TR::Symbol --> OMR::Symbol
Another hierarchy that exists is: TR::ParameterSymbol --> OMR::ParameterSymbol --> TR::RegisterMappedSymbol --> OMR::RegisterMappedSymbol --> TR::Symbol --> OMR::Symbol

The second hierarchy does not show in the output.

More generally, whenever 2 hierarchies have different bases but the same top (that is, they merge at some point), one of them does not show in the output.
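
The expected behaviour can be sketched as follows: one chain is printed per class that nobody inherits from, even when several chains share the same upper part. This is only an illustration of the desired output, not the plugin's actual algorithm; it assumes single inheritance and no cycles, and the names parent_of and hierarchies are hypothetical:

```python
def hierarchies(parent_of):
    """Return one 'child --> parent --> ... --> top' chain per leaf class."""
    parents = set(parent_of.values())
    # A leaf is a class that has a parent but is nobody's parent.
    leaves = [cls for cls in parent_of if cls not in parents]
    chains = []
    for leaf in sorted(leaves):
        chain = [leaf]
        while chain[-1] in parent_of:
            chain.append(parent_of[chain[-1]])
        chains.append(" --> ".join(chain))
    return chains


# Child -> parent relations taken from the two hierarchies in the example.
parent_of = {
    "TR::AutomaticSymbol": "OMR::AutomaticSymbol",
    "OMR::AutomaticSymbol": "TR::RegisterMappedSymbol",
    "TR::ParameterSymbol": "OMR::ParameterSymbol",
    "OMR::ParameterSymbol": "TR::RegisterMappedSymbol",
    "TR::RegisterMappedSymbol": "OMR::RegisterMappedSymbol",
    "OMR::RegisterMappedSymbol": "TR::Symbol",
    "TR::Symbol": "OMR::Symbol",
}
```

Both the TR::AutomaticSymbol chain and the TR::ParameterSymbol chain are produced, although they merge at TR::RegisterMappedSymbol.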

Not all function calls are visited

The way the plugin works

The tool extracts function calls by visiting every class and processing the body of each function in the class. If the body contains a CXXMemberCallExpr statement, the statement is processed and function call information is extracted from it. Other statements are ignored.

Problem

If the statement is an assignment that contains a CXXMemberCallExpr on its right-hand side, the call is not processed, since the main statement is not of type CXXMemberCallExpr. Fixing this specific case is simple; however, a CXXMemberCallExpr can be nested inside any type of statement, so handling it case by case is not practical.

Examples

self->a() is processed by the plugin
int a = self->a() is not processed by the plugin
if(cond) int a = self->a() is not processed by the plugin (and is an example of a CXXMemberCallExpr statement nested in other statements)
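
One general way around the case-by-case handling is to walk the whole statement tree recursively and collect member calls at any depth. The sketch below illustrates the idea on a toy tree; the Node class and its kind strings are invented for this illustration and are not Clang's AST classes:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Toy stand-in for an AST node; not a Clang type.
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def collect_member_calls(node):
    """Collect every member call below node, however deeply it is nested."""
    found = []
    if node.kind == "CXXMemberCallExpr":
        found.append(node.text)
    for child in node.children:
        found.extend(collect_member_calls(child))
    return found


def call():
    return Node("CXXMemberCallExpr", "self->a()")


# A function body holding the three statements from the examples above.
body = Node("CompoundStmt", children=[
    call(),
    Node("DeclStmt", children=[call()]),
    Node("IfStmt", children=[Node("DeclStmt", children=[call()])]),
])
```

Checking only the direct children of body would find one call; the recursive walk finds all three.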

Keep track of whether a hierarchy is extensible or not

Edit the output such that the .hierarchy file would have a field specifying if a hierarchy is extensible or not

Merging Databases between OpenJ9 and OMR

When updating both databases together, I got some conflicts. This issue is to track how I am solving them.

Conflicts

For example, when adding records from a table in OMR and from the same table in OpenJ9, I would get different records with the same ID. OMRStatistics runs independently on each project and starts the ID counter from 0 in both, so ID#1 ends up linked to one record in OMR and to a different record in OpenJ9. This was a general problem across most tables (File table, Class table, Function table, and Hierarchy table).
Other conflicts are reported below:

When merging the File table: No other conflicts
When merging the Class table: (namespace, className) were being repeated between both projects, so I had to skip the records with duplicate (namespace, className). However, the records I am dropping are being referenced by records in the Function table. So I had to keep track of the class IDs I am dropping as duplicates in order to replace them with the right IDs when being referenced in the Function table.
When merging the Function table: After replacing the duplicate Class IDs with the original ones, I got duplicate records that I removed. I also had to keep track of omitted function IDs, since they are being referenced in the Override table.
When merging the Override table: Same as Function table processing, I replaced function IDs with their original ones based on my tracking of removed function IDs in the Function table. After that I removed duplicate records.

Example

Consider the following example (from the OMR/OpenJ9 source code)

In Class table (problem 1)

In OMR

ID	Namespace	ClassName	isExtensible
469	OMR	Linkage	1
1345	TR	S390zOSSystemLinkage	0

In OpenJ9

ID	Namespace	ClassName	isExtensible
1946	OMR	Linkage	1
2942	TR	S390zOSSystemLinkage	0

Here we have 2 examples of classes, each having 2 different IDs depending on whether it is processed in OMR or OpenJ9. When merging both databases together, we get an error that (namespace, className) pairs are repeated.

In Function table (problem 2)

In OMR

ID	Name	Signature	ClassID	isVirtual	isImplicit	FileID
13000	isSpecialNonVolatileArgumentRegister	isSpecialNonVolatileArgumentRegister(int8_t)	469	1	0	1216
29676	isSpecialNonVolatileArgumentRegister	isSpecialNonVolatileArgumentRegister(int8_t)	1345	1	0	1975

In OpenJ9

ID	Name	Signature	ClassID	isVirtual	isImplicit	FileID
46407	isSpecialNonVolatileArgumentRegister	isSpecialNonVolatileArgumentRegister(int8_t)	1946	1	0	59391
69010	isSpecialNonVolatileArgumentRegister	isSpecialNonVolatileArgumentRegister(int8_t)	2942	1	0	60138

Here, we have 2 examples of functions that technically reference the same class but reference different class IDs. The first function (first record in each table) references the class OMR::Linkage; however, since that class has a different ID in each project (shown in the description of problem 1), the two records do not match. The same applies to the second function (second record in each table), which references TR::S390zOSSystemLinkage.

In Override table (problem 3)

OMR

BaseFunctionID	OverridingFunctionID
13000	29676

OpenJ9

BaseFunctionID	OverridingFunctionID
46407	69010

Here, both records imply that OMR::Linkage::isSpecialNonVolatileArgumentRegister(int8_t) is overridden by TR::S390zOSSystemLinkage::isSpecialNonVolatileArgumentRegister(int8_t). However, due to the problems above, we end up with 2 records.
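
The ID tracking described in the Conflicts section can be sketched on the example rows above. This is an in-memory illustration under assumptions, not the actual merge script: rows are plain dicts and tuples, the OMR IDs are kept as the reference, and the function names are hypothetical:

```python
def merge_classes(omr_classes, openj9_classes):
    """Merge Class rows; return the rows and a map from OpenJ9 ID to merged ID."""
    merged = list(omr_classes)
    id_by_key = {(c["namespace"], c["className"]): c["id"] for c in merged}
    next_id = max((c["id"] for c in merged), default=0) + 1
    remap = {}
    for c in openj9_classes:
        key = (c["namespace"], c["className"])
        if key in id_by_key:
            remap[c["id"]] = id_by_key[key]  # duplicate: reuse the OMR ID
        else:
            remap[c["id"]] = id_by_key[key] = next_id
            merged.append({**c, "id": next_id})
            next_id += 1
    return merged, remap


def merge_functions(omr_functions, openj9_functions, class_remap):
    """Merge Function rows after rewriting their class IDs."""
    merged = list(omr_functions)
    id_by_key = {(f["signature"], f["classID"]): f["id"] for f in merged}
    next_id = max((f["id"] for f in merged), default=0) + 1
    remap = {}
    for f in openj9_functions:
        class_id = class_remap[f["classID"]]
        key = (f["signature"], class_id)
        if key in id_by_key:
            remap[f["id"]] = id_by_key[key]
        else:
            remap[f["id"]] = id_by_key[key] = next_id
            merged.append({**f, "id": next_id, "classID": class_id})
            next_id += 1
    return merged, remap


def merge_overrides(omr_overrides, openj9_overrides, function_remap):
    """Rewrite function IDs, then drop duplicate (base, overriding) pairs."""
    merged = set(omr_overrides)
    for base_id, overriding_id in openj9_overrides:
        merged.add((function_remap[base_id], function_remap[overriding_id]))
    return sorted(merged)


SIG = "isSpecialNonVolatileArgumentRegister(int8_t)"
omr_classes = [
    {"id": 469, "namespace": "OMR", "className": "Linkage"},
    {"id": 1345, "namespace": "TR", "className": "S390zOSSystemLinkage"},
]
openj9_classes = [
    {"id": 1946, "namespace": "OMR", "className": "Linkage"},
    {"id": 2942, "namespace": "TR", "className": "S390zOSSystemLinkage"},
]
omr_functions = [
    {"id": 13000, "signature": SIG, "classID": 469},
    {"id": 29676, "signature": SIG, "classID": 1345},
]
openj9_functions = [
    {"id": 46407, "signature": SIG, "classID": 1946},
    {"id": 69010, "signature": SIG, "classID": 2942},
]
```

Each table is merged in dependency order (Class, then Function, then Override), and every step passes its ID map to the next one, so the two override records from the example collapse into a single (13000, 29676) row.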

Task Summary

Merge SQL databases with a python script
- Errors are happening: for example classes are repeated between the omr and openj9 database.
- Merge File table
- Merge Class table
- Merge Function table
- Merge Override table

Update database to have more information about the hierarchies

Edit the Polymorphism table to include a hierarchyID
Add a table that links between each hierarchyID and its base class

After these additions, the database would have all the information needed to reproduce the hierarchies from it.

allFunctions output has duplicate entries

Some functions have 2 entries in allFunctions; the only difference between the entries is the value of isFirstOccurence or isVirtual.
Examples from allFunctions:

FunctionName; FunctionSignature; IsFirstOccurence; Namespace; ClassName; isImplicit; isVirtual

operator delete;operator delete(void *,TR::Region &);0;TR;VPConstraint;0;0
operator delete;operator delete(void *,TR::Region &);1;TR;VPConstraint;0;0

rexBits, rexBits(); 1; OMR::X86; Instruction; 0; 1
rexBits, rexBits(); 1; OMR::X86; Instruction; 0; 0

The tool gives a segmentation fault on source code of this fork

To try the tool, I have to build OMR from its source code. Hence, I keep a separate copy of the built OMR in my workspace, different from the source code I commit from. Apparently, the built OMR was from an earlier version of the source base, so when I recently cloned this repo, built it, and ran the tool, I got seg faults.
The seg fault seems to happen when calling collectMethodInfo() in HandleTranslationUnit(). Surprisingly, the code works if I put a print statement before and after collectMethodInfo().

Namespaces and Class Names in allClasses and hierarchy outputs are generated using different methods, hence the class names do not match

In the hierarchy output, the namespaces and class names are printed from the classHierarchy map, which is filled in HMRecorder::recordParents(). That function extracts the qualified class name from the visited CXXRecordDecl, which naturally consists of the namespace and the class name.
In the allClasses output, namespaces and class names are printed from isExtensible, which is also filled in HMRecorder::recordParents(); however, they are then passed to HMConsumer::shouldIgnoreClassName(), which separates the namespace from the class name.

That function splits classes like TR_X into namespace: TR and className: X.
Right now, the Python file output/getDatabaseSQL.py applies the same convention as shouldIgnoreClassName() in order to produce consistent outputs (and to be able to get classIDs from them); however, this should be fixed.

Change list of classes that OMRStatistics runs on

Right now, I am copying the class lists from the ones used for OMRChecker and changing the variable names to make them work. This is not a good idea because the class list does not change automatically when OMRStatistics is moved to a different version of OMR. Hence, I will need to connect the tool to the list of classes found in omr/fvtest/compilertest/build/files/

Organize project outputs

Move the python script that generates the database to the /omrstatistics/database/ directory (instead of /omrstatistics/output/)
Edit the Makefile so that inside the /omrstatistics/output/ directory, all architecture-related outputs are collected in one directory

Output CSV files for overloads and overrides should be delimited by ";"

Some method signatures contain commas, so a comma cannot be used as the delimiter.
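
On the consuming side, a ";"-delimited file can be handled with Python's csv module. The sketch below is only an assumption about how the database script might read and write such records; write_rows and read_rows are hypothetical helpers:

```python
import csv
import io


def write_rows(rows):
    """Serialize rows with ';' as the field delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse ';'-delimited text back into lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter=";"))


# A signature that contains a comma, taken from the allFunctions examples.
rows = [["operator delete", "operator delete(void *,TR::Region &)", "TR", "VPConstraint"]]
```

The comma inside the signature stays within one field after a write and read round trip, which would not be the case with a comma delimiter and no quoting.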

When recording methods, parameter names are taken and not their type

For example, when a function declaration with the following signature is processed:

f(int a, std::string b)

The tool records f(a, b), whereas it should record f(int, std::string) instead.

Tool should differentiate between virtual and non-virtual functions

Implementation: an isVirtual field will be added in the MethodTracker, and it will be a mandatory parameter in the constructor.

Printing: an isVirtual field will be added to the output csv columns. If the first occurrence of a function was virtual, all records of the function in overrides and overloads outputs will have the isVirtual field containing true

Visualization: color code will be added to virtual functions. For now, implicit functions will have a priority in coloring over virtual function. In other words, if a function is both, implicit and virtual, it will have the color of an implicit function.

Functions that only appear once in the code still appear in the overloads output

Add outputs in order to create database and deal with anomalies

I will be editing the tool in order for it to output the following extra files:
weirdHierarchies: hierarchies in which some classes are extensible and the others are not.
allFunctions: csv file that has all functions
allClasses: csv file that has all classes
The allFunctions and allClasses outputs are needed because the csv files we have only show overloaded/overridden functions, and only show classes that have parents (in the hierarchy). For the completeness of the database, we need all functions and classes in the source code.

Get number of overrides in each hierarchy

We need this information to write the paper. So I will be editing the hierarchy and adding this information as a field of the hierarchy CSV records. Hence, the hierarchy CSV will become: NumberOfOverrides;isExtensible;Hierarchy

When having subnamespaces, the output shows only first part in the namespace section

For example, OMR::X86::CodeGenerator would result in:
Namespace: OMR
Class: X86::CodeGenerator
This is because the way namespace is taken is by separating by the first "::" found.
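
Splitting on the last "::" instead of the first gives the expected result. A minimal sketch of that rule in Python (the function name split_qualified_name is hypothetical, and names whose template arguments contain "::" are not handled):

```python
def split_qualified_name(qualified_name):
    """Split 'A::B::C' into ('A::B', 'C') using the last '::'."""
    # rpartition returns ('', '', name) when there is no separator,
    # so a class without a namespace gets an empty namespace.
    namespace, _, class_name = qualified_name.rpartition("::")
    return namespace, class_name
```

For OMR::X86::CodeGenerator this yields the namespace OMR::X86 and the class CodeGenerator; a class with no namespace keeps its full name and gets an empty namespace.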

Change primary key in Overrides table

At the moment, the primary key (PK) in Overrides consists of all 3 columns: FunctionSig, BaseClassID, and OverridingClassID.
The plan is to change the PK to (FirstClassID, FunctionSig): the pair would be a foreign key to the pair (Signature, ClassID) in the Function table. This is one of the consequences of removing the id field in issue #32.

Separate locations in FunctionCall table

Currently, the FunctionCall table contains 3 columns: CallerFunctionID, CalledFunctionID, CallSite.
In case of having macros, the CallSite column would have 2 locations: the actual location and the spelling location.

For example:
File1.cpp would have the following line: CREATE
File2.cpp would have the following line: #define CREATE void a() {b->a()}

In that case, the first location in the CallSite column would be File1.cpp, and the spelling location would be File2.cpp. The full value in the column would be: File1.cpp <Spelling=File2.cpp>. Since this format is hard to use when analyzing data in SQL, the plan is to create another column called SpellingLocation in FunctionCall, which is null when the call does not come from a macro.
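
Separating the two locations could be done as in the sketch below. It assumes the combined value always has the form shown above, with the spelling part wrapped in " <Spelling=" and ">"; the function name split_call_site is hypothetical:

```python
def split_call_site(call_site):
    """Return (location, spelling_location); the second is None without a macro."""
    marker = " <Spelling="
    location, found, rest = call_site.partition(marker)
    if found and rest.endswith(">"):
        return location, rest[:-1]  # drop the closing '>'
    # No spelling part: the whole value is the call site location.
    return call_site, None
```

The None value maps naturally to a NULL SpellingLocation when the record is inserted into the database.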

IDs are not consistent

Suppose we have the following 2 hierarchies:
C --> B --> A
F --> E --> D
Each of these classes has a function a()
The following is the functions table:

ID	Signature	Class (first occurrence)
1	a()	A
2	a()	D

The overrides table would be the following:

FunctionID	BaseClassName	OverridingClassName
1	A	B
1	B	C
1	D	E
1	E	F

However, for the last 2 rows in the overrides table, the ID should be 2.
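
One way to get consistent IDs is to key each function ID on the signature together with the top class of the hierarchy in which it first occurs. The sketch below illustrates this on the example; it assumes single inheritance and that every class in a chain declares the function, and the names top_of and build_tables are hypothetical:

```python
def top_of(cls, parent_of):
    """Follow parent links until reaching the class with no parent."""
    while cls in parent_of:
        cls = parent_of[cls]
    return cls


def build_tables(parent_of, signature):
    """Return the function IDs and (FunctionID, Base, Overriding) override rows."""
    function_ids = {}
    overrides = []
    for overriding, base in parent_of.items():
        # The ID depends on the signature AND the hierarchy's top class.
        key = (signature, top_of(base, parent_of))
        function_id = function_ids.setdefault(key, len(function_ids) + 1)
        overrides.append((function_id, base, overriding))
    return function_ids, overrides


# Child -> parent relations for C --> B --> A and F --> E --> D.
parent_of = {"B": "A", "C": "B", "E": "D", "F": "E"}
```

With this keying, the overrides in the second hierarchy receive ID 2 instead of reusing ID 1.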

Duplicate Hierarchies in the output

Sometimes the output will give a hierarchy and duplicate part of it in another hierarchy. An example is as follows:

TR::CodeGenerator --> TestCompiler::CodeGenerator --> OMR::X86::AMD64::CodeGenerator --> OMR::X86::CodeGenerator --> OMR::CodeGenerator
OMR::X86::AMD64::CodeGenerator --> OMR::X86::CodeGenerator --> OMR::CodeGenerator

Not all classes are being printed in allFunctions

To print the output in allFunctions, the classes in each hierarchy are traversed and the functions in each class are output. This method skips all classes that have no parents or children and hence are not recorded in any hierarchy.
Solution:

Add each class with no parent to an empty hierarchy so that it can be processed with the hierarchies.
Add isSingle field to Hierarchy
When printing hierarchies, filter them so that single hierarchies are not printed

Clean the code from _isFirstOccurrence_ fields and adapt visualization

After the meeting on Wednesday March 7th, 2018, we decided to remove the Overloads table, so the isFirstOccurrence tracking is of no use anymore. Although the table has been removed from the database and schema, we still need to remove any trace of firstOccurrence from the code.
For visualization, we just put the function name at the beginning instead of the whole signature

function signatures contain the keyword "class" in the parameter list

I find functions like: CodeGenerator(const class OMR::CodeGenerator &) but I cannot find such a signature in the code. This problem has to do with how clang reads the functions and how I am forming the signature: I get the function name and the list of parameters from clang and then build the signature manually, which might be flawed.

Create a FunctionCall table in the database to record all function calls

Record function calls from OMR and OpenJ9 namespaces and find a function in an extensible class with the lowest number of calls.
Take into consideration nested functions, that is: if a function a() is inside a function b(), if b() is called twice this means that a() is called twice also.
Create a table in the database for all call sites.
