OK then, I will check whether the tests pass after this optimization and open a pull request. If they don't pass, I'll report that here.
Actually, I forgot to mention a couple of things about this optimization. First, it is better for the cache. Second, when 'I' is the contiguous (innermost) index, these nested loops are easier to vectorize. As far as I remember, the function spends more than 50% of its time in this loop nest, so it is a hot spot. Of course, this depends on the architecture, on how well the other functions are optimized, and on the order of the input matrices, but in my experience it is a hot spot. I also have the opportunity to check performance on a couple of architectures and can share the results here. Either way, I think the loop order should be changed here.
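To make the cache and vectorization argument concrete, here is a small Python sketch (not LAPACK code) that lists the memory offsets each loop order touches in a column-major array. The sizes `m`, `n`, `lda` and all function names are hypothetical. The point is only that with I innermost, consecutive accesses are one element apart, whereas with J innermost they are `lda` elements apart.

```python
def offset(i, j, lda):
    """0-based offset of element (i, j) in column-major storage with leading dimension lda."""
    return i + j * lda


def visit_order_ij(m, n, lda):
    """Offsets touched when the row index I drives the outer loop (current order)."""
    return [offset(i, j, lda) for i in range(m) for j in range(n)]


def visit_order_ji(m, n, lda):
    """Offsets touched when the column index J drives the outer loop (proposed order)."""
    return [offset(i, j, lda) for j in range(n) for i in range(m)]


def inner_strides(order, inner_len):
    """Set of memory distances between consecutive accesses inside each inner-loop run."""
    strides = set()
    for start in range(0, len(order), inner_len):
        run = order[start:start + inner_len]
        strides.update(b - a for a, b in zip(run, run[1:]))
    return strides


m, n, lda = 4, 4, 6
print(inner_strides(visit_order_ij(m, n, lda), n))  # {6}: each step jumps a whole column
print(inner_strides(visit_order_ji(m, n, lda), m))  # {1}: unit stride, contiguous memory
```

Both orders visit exactly the same elements; only the stride of the inner loop differs, which is what determines cache-line reuse and whether the compiler can emit unit-stride vector loads.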
Thanks for your answer
These are three good points. (1) Replacing this (I,J) loop with a (J,I) loop should give better performance for column-major matrices. (2) Swapping the loops (from (I,J) to (J,I)) might change the chosen pivot when two entries tie, and so might change the permutation. (3) These outputs, while different, are equally valid complete-pivoting factorizations P A Q = L U.
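Points (2) and (3) can be illustrated with a toy complete-pivoting elimination in Python. This is a sketch under stated assumptions, not the LAPACK routine under discussion: the function names, the `order` parameter, and the strict `>` comparison (so the first maximum in scan order wins) are all hypothetical, and there is no safeguard for zero or tiny pivots. On a matrix whose largest entries tie in magnitude, the two scan orders pick different pivots, yet both results satisfy P A Q = L U.

```python
def lu_complete_pivoting(rows, order="ji"):
    """Toy Gaussian elimination with complete pivoting on a small nonsingular matrix.

    rows  -- the matrix as a list of row lists
    order -- "ij" scans the active submatrix row by row, "ji" column by column
    Returns (prow, pcol, L, U) such that rows[prow[i]][pcol[j]] == (L @ U)[i][j].
    """
    n = len(rows)
    a = [[float(x) for x in r] for r in rows]
    prow, pcol = list(range(n)), list(range(n))
    for k in range(n - 1):
        # Pivot search over a[k:, k:]; strict ">" keeps the first maximum in scan order.
        if order == "ij":
            cells = [(i, j) for i in range(k, n) for j in range(k, n)]
        else:
            cells = [(i, j) for j in range(k, n) for i in range(k, n)]
        best, ip, jp = -1.0, k, k
        for i, j in cells:
            if abs(a[i][j]) > best:
                best, ip, jp = abs(a[i][j]), i, j
        # Move the pivot to (k, k): swap whole rows (multipliers included), then columns.
        a[k], a[ip] = a[ip], a[k]
        prow[k], prow[ip] = prow[ip], prow[k]
        for r in a:
            r[k], r[jp] = r[jp], r[k]
        pcol[k], pcol[jp] = pcol[jp], pcol[k]
        # Eliminate below the pivot, storing multipliers in the strict lower part.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    L = [[a[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return prow, pcol, L, U


def max_residual(rows, prow, pcol, L, U):
    """Largest |(P A Q)[i][j] - (L U)[i][j]| over all entries."""
    n = len(rows)
    return max(
        abs(rows[prow[i]][pcol[j]] - sum(L[i][t] * U[t][j] for t in range(n)))
        for i in range(n) for j in range(n)
    )


A = [[1.0, 5.0], [-5.0, 2.0]]  # |A(1,2)| == |A(2,1)|: a tie for the first pivot
for order in ("ij", "ji"):
    prow, pcol, L, U = lu_complete_pivoting(A, order)
    print(order, prow, pcol, U[0][0], max_residual(A, prow, pcol, L, U) < 1e-12)
# ij [0, 1] [1, 0] 5.0 True
# ji [1, 0] [0, 1] -5.0 True
```

The row-wise scan picks A(1,2) = 5 as the first pivot while the column-wise scan picks A(2,1) = -5; the permutations differ, but both reproduce P A Q = L U to rounding error. Had the comparison been `>=` instead, the last maximum would win, and the two orders would still disagree on this matrix.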
It is not clear how much performance gain there would be, if any. That said, I feel that, whenever possible in LAPACK, we want to write our loops with column-major storage in mind, so for the sake of consistency alone I think a (J,I) loop is better than an (I,J) loop here. It would be nice to know whether there is a gain in practice.
It is not clear how problematic a (J,I) loop would be in the current software stack. For example, would the (J,I) variant pass our own LAPACK test suite? More generally, would it be a problem for applications that expect the (I,J) behavior in case of a tie? I do not know.
My opinion: all in all, I would be fine with reversing the loops from (I,J) (current) to (J,I) (proposed). If it passes the LAPACK test suite, I think that should be fine and we could merge this.
Fixed with #1023.