kimwalisch / primecount Goto Github PK

View Code? Open in Web Editor NEW

309.0 23.0 41.0 7.96 MB

🚀 Fast prime counting function implementations

License: BSD 2-Clause "Simplified" License

Shell 5.58% C++ 87.49% CMake 3.91% C 2.44% Roff 0.47% Batchfile 0.10%

prime-numbers math number-theory openmp primes avx512 arm-sve primepi

primecount's People

Contributors

Stargazers

Watchers

primecount's Issues

Integer overflow in phi(x, a)

It is a nice improvement, thank you!

I also saw your main loop in src/phi.cpp stops recursing when we know all further answers come from the cache. That's a big help. I further split it to stop looping entirely when xp <= primes[i] (we can just add in the right number of -SIGN values).

Something I noticed earlier:

time ./primecount --phi 1e9 100858536
1

real	0m0.579s
user	0m0.532s
sys	0m0.046s

time ./primecount --phi 1e9 900858536
-850011001

real	0m5.578s
user	0m5.143s
sys	0m0.422s

I think this is both performance and correctness. Both these should take essentially zero time to return 1.

If you can get a reasonable prime count upper bound or nth prime lower bound, you can use it in the main phi(x,a,threads) function before generating all the primes, to quickly decide you can return 1. E.g.
if (a >= prime_count_upper(x)) return 1;

Whether this is a case worth optimizing is up to you. The correctness problem with negative returns seems to be primes[a] being an int32_t so overflowing to negative values. Therefore the if statement after that doesn't get used like it should. Perhaps some limit checks before generating the primes.

Originally posted by @danaj in danaj/Math-Prime-Util#48 (comment)

Formulas for segmenting Gourdon's C formula

uint64_t max_b = pi[x_star];

// For primes[b] > sqrt(z):

// derived from high
uint64_t x_div_y = x / y;
uint64_t min_1st_prime1 = min(x_div_y / high, primes[pi_x_star]);
uint64_t min_index_1st_prime1 = pi[min_1st_prime1] + 1;

// derived from low
uint64_t min_1st_prime2 = min(isqrt(low), primes[pi_x_star]);
uint64_t min_index_1st_prime2 = pi[min_1st_prime2] + 1;

// Just like in the A formula the result of the formula above may be
// off by 1. Below is a workaround to fix this issue. I am currently
// unsure if the result could ever be off by more than 1.
uint64_t prime2 = primes[min_index_1st_prime2];
uint64_t m2 = max(prime2, x / ipow<T>(prime2, 3));
if (m2 < primes.back() && x / (prime2 * primes[pi[m2] + 1]) < low)
    min_index_1st_prime2 += 1;

// Another option would be to add a check when computing the
// second prime inside the actual algorithm
if (x / low < primes[b] * primes[min_m + 1])
    continue;

// derived from m > (x / prime^3)
uint64_t min_1st_prime3 = min(iroot<3>(x_div_y), primes[pi[x_star]]);
uint64_t min_index_1st_prime3 = pi[min_1st_prime3] + 1;

// derived from m > (z / prime)
uint64_t min_index_1st_prime4 = pi[isqrt(z)] + 1
uint64_t min_index_1st_prime = max(min_index_1st_prime1, ..2, ..3, ..4)

Question regarding PhiTiny test

primecount/test/phi_tiny.cpp

Line 54 in 49b286f

std::vector<char> sieve(size, 1);

@kimwalisch
Could you explain this line to me, please? I mean a step-by-step explanation of all the pieces of the line. And another thing: could you set up discussions, so I wouldn't need to open issues, please? I would really appreciate.

need to set LD_LIBRARY_PATH for default install on Debian 11

Not 100% sure why, but I cannot run primecount installed with the default procedure into /usr/local/ on Debian 11.
Specifically, I see

$ ldd /usr/local/bin/primecount
        linux-vdso.so.1 (0x00007ffc34ff6000)
        libprimecount.so.7 => not found
        libprimesieve.so.9 => not found
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4428ba7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4428a63000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4428a49000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4428884000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4428d9d000)

Does one need to set some cmake options (for buidling shared libraries/exacutables with rpath?) while building in such a case?

(Normally speaking one ought to be able to produce an executable knowing location of each dylib - I know that cmake goes its own strange way there, something to do with https://gitlab.kitware.com/cmake/community/-/wikis/doc/cmake/RPATH-handling - but I don't get details).

Add C API

Hi, @kimwalisch.
Can you add C API as in the case of primesieve?
It's easier for making language bindings with C API.
Thank you.

Infinite loop in Li(x) & RiemannR(x) on Linux i386

These 3 tests cause an infinite loop on Linux i386 (e.g. Debian 12.5 i386):

test/Li.cpp
test/RiemannR.cpp
test/nth_prime.cpp

Test results.txt on Windows

I have written a basename() function which needs to be tested on Windows using both MSVC and GCC:

string basename(string path)
{
  while (path.back() == '/' ||
         path.back() == '\\')
  {
    path.pop_back();
  }

  size_t pos = path.find_last_of("/\\");

  if (pos != string::npos)
    return path.substr(pos + 1);

  return path;
}

The basename() function should produce entries like the one below (primecount without any slashes or backslashes) in results.txt:

primecount 1e17 -s --S2_trivial

Reason for nth prime limit?

Hey! Thanks for making primecount, it's amazingly fast. Just curious about one thing--is there any particular reason as to why 216289611853439384 was chosen as a limit?

EDIT: Whoops, meant to comment as JoshuaRLi, not my work account, no biggie though :P

GCC-15 compatibility

A few days ago, GCC merged some stricter template checks into the 15.x branch:

https://trofi.github.io/posts/322-gcc-15-template-checking-improvements.html

Gentoo's CI caught a build failure that looks related in https://bugs.gentoo.org/936498. I've copy/pasted some of the compiler output below:

FAILED: CMakeFiles/libprimecount.dir/src/deleglise-rivat/S2_easy_libdivide.cpp.o 
/usr/bin/x86_64-pc-linux-gnu-g++ -DENABLE_MULTIARCH_AVX512_BMI2 -Dlibprimecount_EXPORTS -I/var/tmp/portage/sci-mathematics/primecount-7.13/work/primecount-7.13/include  -O2 -pipe -march=native -fno-diagnostics-color -fPIC -fno-diagnostics-color -Wno-uninitialized -MD -MT CMakeFiles/libprimecount.dir
In file included from /var/tmp/portage/sci-mathematics/primecount-7.13/work/primecount-7.13/src/deleglise-rivat/S2_easy_libdivide.cpp:35:
/var/tmp/portage/sci-mathematics/primecount-7.13/work/primecount-7.13/include/libdivide.h: In member function 'bool libdivide::divider<T, ALGO>::operator==(const libdivide::divider<T, ALGO>&) const':
/var/tmp/portage/sci-mathematics/primecount-7.13/work/primecount-7.13/include/libdivide.h:2018:41: error: 'const class libdivide::divider<T, ALGO>' has no member named 'denom'
 2018 |         return div.denom.magic == other.denom.magic &&
      |                                         ^~~~~

There's still a long way to go before GCC-15 is released, but we're starting early this time so that all the complaints don't flood in on the same day.

build requires cmake >= 3.7

Building under Ubuntu 16.04 with cmake version 3.5.1 yielded the error:

   if given arguments:

     "CMAKE_VERSION" "VERSION_GREATER_EQUAL" "3.9"

   Unknown arguments specified

According to this thread:

Dav1dde/glad#134

VERSION_GREATER_EQUAL is only supported in versions 3.7 or later.

Suggest altering the minimum required version of cmake to cmake_minimum_required(VERSION 3.7).

add pkg-config config?

To facilitate things for non-cmake setups, such as checking the version, setting include and library paths, it would be great to have primecount generating configuration, a .pc file, and installing it in the appropriate location (often it's subdirectory pkgconfig of the directory where the library is installed).

Incorrect Common Lisp bindings link

Hi, Kim.
I just realize you add a language bindings section in README.md.
But the link to the common lisp repo should be cl-primecount.
Good to see you keep improving and inspiring.

Rename interval to distance

Just like in primesieve-5.7.0 rename most occurences of "interval" to "distance".

Broken WITH_POPCNT logic

see https://trac.sagemath.org/ticket/33082

Error: pi_gourdon_64(125)=31 instead of 30

$ ./primecount 125 -s -g

=== pi_gourdon_64(x) ===
pi(x) = A - B + C + D + phi0 + Sigma
x = 125
y = 7
z = 7
k = 4
alpha_y = 1.527
alpha_z = 1.000
threads = 8

=== Sigma(x, y) ===
Status: 100%                                      
Sigma = 8
Seconds: 0.000

=== Phi0(x, y, z) ===
Status: 100%                                      
phi0 = 28
Seconds: 0.000

=== B(x, y) ===
Status: 100%                                      
B = 5
Seconds: 0.000

=== AC(x, y) ===
Status: 100%                                      
A + C = 0
Seconds: 0.000

=== D(x, y) ===
Status: 100%                                      
D = 0
Seconds: 0.000

31
Seconds: 0.001

# B is correct
For prime 7 > && prime <= 11:
   sum += pi[125 / prime]

-> pi[125/11]
-> pi[11] = 5

AC(x, y) and D(x, y) are most likely correct as well since they only apply for primes > primes[k] (which is primes[4]=7) && primes <= y.

Legendre O(x)

Please forgive me since I'm not great at figuring out these formulas, but
your table says that pi_legendre solves 10^11 in .03 seconds, but the comments in pi_legendre say:

/// Count the number of primes <= x using Legendre's formula.
/// Run time: O(x)
/// Memory usage: O(x^(1/2))

I may not be interpreting O() correctly, but I would assume O(x) means x operations. If I write a program that just iterates to 10^11, it takes a long time, many seconds. Is this because I am misinterpreting how O() works or is it because my program is single threaded and slower than the computer used to produce the results in the table?

Thanks in advance!

Will

Please create discussions

@kimwalisch Hi, I am Jakub Drozd again, sorry for bothering you again, but I would like you to set up discussions, so I wouldn't have to open issues. Could you set up the button, please?

xavier gourdon's paper

I can't seem to find reference 7 anywhere on the internet. Does anyone have a link I can go to or a pdf I can read to get a better understanding of the algorithm?

Question regarding PhiTiny _sieve array

Hi @kimwalisch, I would like to see the full content of the PhiTiny sieve array, because it seems that it is all zero. Could you send it to me? I would be grateful.

explain in docs how to run tests

Apparently there is a testsuite in test/, but no info on how to actually use it is provided.

Potential bug

In P2.cpp we have the following comment:

 // \sum_{i=a+1}^{b} pi(x / primes[i])

But then I use the following code at line 78:

// a = pi(y)
int64_t start = (int64_t) max(x / z + 1, y);

This code has been used since 2013 and I have never seen an erroneous result but I think the condition is an off by 1 error and it should be fixed by:

int64_t start = (int64_t) max(x / z + 1, y + 1);

It is possible the I can even simplify the code to:

int64_t start = (int64_t) (x / z + 1);

Riemann R approximation is a little off

The output of --Ri gives different numerical results than Wolfram Alpha and python's mpmath. For example,

For powers of 10 starting at 10^0 primecount --Ri gives:

0
4
25
168
1226
9587
78527
664667
5761551
50847455
455050683
4118052494
37607910542
346065531065
3204941731601
29844570495886
279238341360977
2623557157055978
24739954284239494
234057667300228940
2220819602556027016
21127269485932299722
201467286689188773984
1925320391607837270272
18435599767347541916672
176846309399141934432256
1699246750872419992338432
16352460426841662907482112
157589269275973235320553472
1520698109714271829697232896

Using mpmath riemannr I get:

from mpmath import riemannr, mp
mp.dps = 40 # set high deliberately

for i in range(0, 30):
print(riemannr(10**i))

1.0
4.564583141005090239865774695584049809675
25.66163326692418259322679794035569814997
168.3594462811673480649133109867329108466
1226.93121834343310855421625817211837034
9587.431738841973414351612923908229431098
78527.39942912770485887029214095925103488
664667.4475647477679853466998874388326848
5761551.86732016956230886495973466773615
50847455.42772142751394887577256049489582
455050683.3068469244631532415819991388608
4118052494.63140044176104610770875738881
37607910542.22591023474569601742946144013
346065531065.8260271978929257301899631105
3204941731601.689034750500754116280826964
29844570495886.92737822228672779202881061
279238341360977.1872302539272988649936199
2623557157055978.003875460015662127309679
24739954284239494.40252165144480328003146
234057667300228940.2346566885611062528803
2220819602556027015.401217592243516061431
21127269485932299723.73386404400837512661
201467286689188773625.1590118748480817499
1925320391607837268776.080252873251190086
18435599767347541878146.80335901928241809
176846309399141934626965.8309690195779633
1699246750872419991992147.2218584438395
16352460426841662910939464.57821529157634
157589269275973235652219770.5690766206166
1520698109714271830281953370.160723380532

Ignoring the few first values, these differ in the following ways,

a) The values are much the same until 2220819602556027015.4 which ends with a 6 in the primecount version.
b) From there the values start to drift apart. For example the final value 1520698109714271830281953370.16 is 584720474.16 far from the primecount version.

I checked wolfram alpha and it gives results very close to mpmath and hence quite far from primecount.

prime_pi of a negative number fails with Bus error

As reported by @remyoudompheng at sagemath/sage#35332, prime_pi crashes on negative input. (I think this is true of the 128 bit variant too.)

$ cat sage_bug_35332.c
#include <primecount.h>
#include <stdio.h>

int main()
{
    int64_t pix = primecount_pi(-1);
    printf("primes <= -1: %ld\n", pix);

    return 0;
}
$ gcc sage_bug_35332.c -lprimecount && ./a.out
Bus error (core dumped)
(exit 135)

Build on windows using mingw fail

cmake .

-- The CXX compiler identification is GNU 8.1.0
-- Check for working CXX compiler: W:/xx_zd/x86_64-8.1.0-release-win32-sjlj-rt_v6-rev0/mingw64/bin/g++.exe
CMake Error: Generator: execution of make failed. Make command was: nmake -f Makefile cmTC_e23fb/fast &&
-- Check for working CXX compiler: W:/xx_zd/x86_64-8.1.0-release-win32-sjlj-rt_v6-rev0/mingw64/bin/g++.exe - broken
CMake Error at W:/python388/Lib/site-packages/cmake/data/share/cmake-3.20/Modules/CMakeTestCXXCompiler.cmake:59 (message):
  The C++ compiler

    "W:/xx_zd/x86_64-8.1.0-release-win32-sjlj-rt_v6-rev0/mingw64/bin/g++.exe"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: W:/cl/primecount/CMakeFiles/CMakeTmp

    Run Build Command(s):nmake -f Makefile cmTC_e23fb/fast 
    Generator: execution of make failed. Make command was: nmake -f Makefile cmTC_e23fb/fast &&




  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:2 (project)


-- Configuring incomplete, errors occurred!
See also "W:/cl/primecount/CMakeFiles/CMakeOutput.log".
See also "W:/cl/primecount/CMakeFiles/CMakeError.log".

I try this command:
cmake . -G "MingGW Makefiles"

CMake Error: Could not create named generator MingGW Makefiles

Generators
  Visual Studio 16 2019        = Generates Visual Studio 2019 project files.
                                 Use -A option to specify architecture.
  Visual Studio 15 2017 [arch] = Generates Visual Studio 2017 project files.
                                 Optional [arch] can be "Win64" or "ARM".
  Visual Studio 14 2015 [arch] = Generates Visual Studio 2015 project files.
                                 Optional [arch] can be "Win64" or "ARM".
  Visual Studio 12 2013 [arch] = Generates Visual Studio 2013 project files.
                                 Optional [arch] can be "Win64" or "ARM".
  Visual Studio 11 2012 [arch] = Generates Visual Studio 2012 project files.
                                 Optional [arch] can be "Win64" or "ARM".
  Visual Studio 10 2010 [arch] = Generates Visual Studio 2010 project files.
                                 Optional [arch] can be "Win64" or "IA64".
  Visual Studio 9 2008 [arch]  = Generates Visual Studio 2008 project files.
                                 Optional [arch] can be "Win64" or "IA64".
  Borland Makefiles            = Generates Borland makefiles.
* NMake Makefiles              = Generates NMake makefiles.
  NMake Makefiles JOM          = Generates JOM makefiles.
  MSYS Makefiles               = Generates MSYS makefiles.
  MinGW Makefiles              = Generates a make file for use with
                                 mingw32-make.
  Green Hills MULTI            = Generates Green Hills MULTI files
                                 (experimental, work-in-progress).
  Unix Makefiles               = Generates standard UNIX makefiles.
  Ninja                        = Generates build.ninja files.
  Ninja Multi-Config           = Generates build-<Config>.ninja files.
  Watcom WMake                 = Generates Watcom WMake makefiles.
  CodeBlocks - MinGW Makefiles = Generates CodeBlocks project files.
  CodeBlocks - NMake Makefiles = Generates CodeBlocks project files.
  CodeBlocks - NMake Makefiles JOM
                               = Generates CodeBlocks project files.
  CodeBlocks - Ninja           = Generates CodeBlocks project files.
  CodeBlocks - Unix Makefiles  = Generates CodeBlocks project files.
  CodeLite - MinGW Makefiles   = Generates CodeLite project files.
  CodeLite - NMake Makefiles   = Generates CodeLite project files.
  CodeLite - Ninja             = Generates CodeLite project files.
  CodeLite - Unix Makefiles    = Generates CodeLite project files.
  Eclipse CDT4 - NMake Makefiles
                               = Generates Eclipse CDT 4.0 project files.
  Eclipse CDT4 - MinGW Makefiles
                               = Generates Eclipse CDT 4.0 project files.
  Eclipse CDT4 - Ninja         = Generates Eclipse CDT 4.0 project files.
  Eclipse CDT4 - Unix Makefiles= Generates Eclipse CDT 4.0 project files.
  Kate - MinGW Makefiles       = Generates Kate project files.
  Kate - NMake Makefiles       = Generates Kate project files.
  Kate - Ninja                 = Generates Kate project files.
  Kate - Unix Makefiles        = Generates Kate project files.
  Sublime Text 2 - MinGW Makefiles
                               = Generates Sublime Text 2 project files.
  Sublime Text 2 - NMake Makefiles
                               = Generates Sublime Text 2 project files.
  Sublime Text 2 - Ninja       = Generates Sublime Text 2 project files.
  Sublime Text 2 - Unix Makefiles
                               = Generates Sublime Text 2 project files.

with this cmd message.
Some error with setting of c/c++ compiler?

results.txt: incorrect seconds

The primecount-backup version prints the correct elapsed seconds to the screen after stop & resume:

$ ./primecount 1e19 --S2_hard -s

=== S2_hard(x, y) ===
Computation of the hard special leaves
x = 10000000000000000000
y = 66422883
z = 150550526390
c = 6
alpha = 30.831
threads = 8

Status: 93%^C
$ ./primecount 1e19 --S2_hard -s

=== S2_hard(x, y) ===
Computation of the hard special leaves
x = 10000000000000000000
y = 66422883
z = 150550526390
c = 6
alpha = 30.831
threads = 8

Resume from primecount.backup
Status: 100%                                      
S2_hard = 254486506300390351
Seconds: 86.761

But in the results.txt file the seconds are not the total seconds, it guess it is the seconds since the resume:

primecount 1e19 --S2_hard -s
Result: 254486506300390351
Threads: 8
Seconds: 26.758
Date: 2017-07-12 16:36:47

AddressSanitizer: undefined-behavior

Using the LLVM/clang sanitizer I found a minor issue:

src/S2LoadBalancer.cpp:175:49: runtime error: division by zero
SUMMARY: AddressSanitizer: undefined-behavior src/S2LoadBalancer.cpp:175

By 2025 change default setting to WITH_DIV32=OFF in CMakeLists.txt

The current default WITH_DIV32=ON increases branch mispredictions and only makes sense on CPUs with slow 64-bit integer division.

Enhance code sections in README.md

Instead of:

$ ./configure
$ make
$ sudo make install

Use:

./configure
make
sudo make install

The rationale of this change is to make copy - pasting more convenient.

Formulas for segmenting Gourdon's A formula

Xavier Gourdon's algorithm requires looking up PrimePi(x) values up to x^(1/2). Since a PrimePi(x) lookup table of size x^(1/2) is too large we need to segment the PrimePi(x) lookup to a size of y (which is about x^(1/3) * log(x)).

For any segment bounded by [low, high[ here are the formulas to compute the minimum and maximum values of the 1st prime used in the algorithm:

min_index_1st_prime = pi[x_star] + 1
max_index_1st_prime = pi[min(sqrt(x / low), x^(1/3))]

For any segment bounded by [low, high[ here are the formulas to compute the minimum and maximum values of the 2nd prime used in the algorithm:

1st_prime = primes[min_index_1st_prime]
max_index_2nd_prime = pi[min(x / (low * primes[b]), sqrt(x / primes[b]))]

Data Races in LoadBalancerS2.cpp

Using the Coderrect Scanner https://coderrect.com/ I have discovered three data race pairs occurring in LoadBalancerS2.cpp.

Line 96 vs itself

 94|{
 95|  LockGuard lockGuard(lock_);
>96|  sum_ += thread.sum;
 97|
 98|  if (is_print_)

Line 107 vs Line 114

 105| update_load_balancing(thread);
 106|
>107| thread.low = low_;
 108| thread.segments = segments_;
 109| thread.segment_size = segment_size_;
 110| thread.sum = 0;
 111| thread.secs = 0;
 112| thread.init_secs = 0;
 113|
>114| low_ += segments_ * segment_size_;
 115| bool is_work = thread.low < sieve_limit_;
 116|

Line 122 vs Line 124

 120|void LoadBalancerS2::update_load_balancing(const ThreadSettings& thread)
 121|{
>122|  if (thread.low > max_low_)
 123|  {
>124|   max_low_ = thread.low;
 125|   segments_ = thread.segments;
 126|

Multi-threading data races

There are a few non-critical data races in primecount where multiple threads write/read to some variable without using a mutex or lock. I am not aware of any CPUs where this could cause a segmentation fault but it is still best to fix these data races.

$ valgrind --tool=drd ./primecount --test

==23554== Conflicting store by thread 1 at 0x0036c411 size 1
==23554==    at 0x128080: primecount::set_print(bool) (in /home/kim/Desktop/primecount-master/build3/primecount)
==23554==    by 0x11AEDA: primecount::pi_legendre(long, int) (in /home/kim/Desktop/primecount-master/build3/primecount)

==23554== Conflicting store by thread 1 at 0x0036c418 size 4
==23554==    at 0x14DE88: primesieve::set_num_threads(int) (in /home/kim/Desktop/primecount-master/build3/primecount)
==23554==    by 0x11B950: primecount::pi_primesieve(long, int) (in /home/kim/Desktop/primecount-master/build3/primecount)

Multiple threads are allowed to read the same variable without locking. But multiple threads are not allowed to write/read the same variable without locking. Both primecount::set_print(bool) and primesieve::set_num_threads(int) are write accesses without locking which is not allowed and which causes a data race.

Idea for improvement to LMO/pi_lmo5.cpp

I was playing around with the timing of your LMO implementation;

primecount/src/lmo/pi_lmo5.cpp

Line 56 in a7e8351

int64_t segment_size = Sieve::get_segment_size(isqrt(limit));

I noticed that setting segment_size = 2 * segment_size seems to run ~2-5% faster

A couple of random tests suggest that for 1e11 - 1e16 4 * segment_size is the sweet spot.

Looking at the code I suspect that there's a trade off between overhead of starting sieve.cross_off_count and better localized memory access.

Maybe get_segment_size could be augmented with 'suggested_segment_size(limit)which handles finding a nice power of two or similar number aroundsqrt(limit)`.

Before I went further I thought I should seek your opinion in case this has been tested or there are other issues I'm not aware of.

make install fail

with

cmake . -DCMAKE_INSTALL_PREFIX=local -DBUILD_STATIC_LIBS=OFF

I obtain

[primecount-4.2] CMake Error at lib/primesieve/cmake_install.cmake:95 (file):
[primecount-4.2]   file INSTALL cannot find
[primecount-4.2]   "/opt/sage/local/var/tmp/sage/build/primecount-4.2/src/primesieve.pc".
[primecount-4.2] Call Stack (most recent call first):
[primecount-4.2]   cmake_install.cmake:90 (include)

Enable GCC/Clang function multiversioning for ARM SVE once the compilers support it

We have ARM SVE support in Sieve_count.cpp however by default it is not enabled since the GCC/Clang compilers do not support function multiversioning for ARM yet.

Once GCC/Clang compilers support it, I can enable it using:

Create multiarch_arm_sve.cmake CMake script similar to multiarch_avx512_vbmi2.cmake.
Use multiarch_arm_sve.cmake in CMakeLists.txt.
Add new ENABLE_MULTIARCH_ARM_SVE macro to all CMakeLists.txt files.
Add new ENABLE_MULTIARCH_ARM_SVE macro to Sieve.hpp and Sieve_count.cpp.

Get rid of MAX_THREADS constant

Just like in primesieve-5.7.0 get rid of MAX_THREADS constant which is defined as -1 and hence dubious.

describing how to build dynamically linked (lib)primecount without root

Playing with cmake a bit, I found how to set up libprimecount and primecount without root, something that's valuable in a number of settings, e.g. on HPC clusters, other shared machines - or building libprimecount extensions (e.g. Python ones).
Should this be added to docs?

The following is only tested on Linux so far.

set the desired location, e.g. export WDIR=$HOME/tmp
install libprimesieve as follows:

cd primesieve
cmake -DBUILD_STATIC_LIBS=OFF -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=$WDIR .
make -j  install     # no sudo!

install primecount as follows

cd primecount
cmake -DBUILD_STATIC_LIBS=OFF -DBUILD_SHARED_LIBS=ON \
       -DCMAKE_INSTALL_PREFIX=$WDIR \
      -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=TRUE  -DCMAKE_INSTALL_RPATH=$WDIR \
      -DCMAKE_SKIP_BUILD_RPATH=FALSE  -DBUILD_LIBPRIMESIEVE=OFF  \
      -DCMAKE_FIND_ROOT_PATH=$WDIR/lib/cmake  \
      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY .
make -j install # no sudo!

done, now you can do $WDIR/bin/primecount 1000000 and it will work as it should.

primecount-backup prints incorrect time elapsed

primecount-backup uses Gourdon's algorithm only above 1e7. If one first computes e.g. pi(1e10) using Gourdon's algorithm and then e.g. pi(100) primecount-backup will print the time elapsed of the previous pi(1e10) computation.

brew install on macOS installs primecount 7.0 and no library

Please update the brew recipe, I have got the following error on M1 Pro Max, Monterey 12.1b3

primecount -n 1000
dyld[11851]: Library not loaded: @rpath/libprimecount.7.dylib
Referenced from: /opt/homebrew/Cellar/primecount/7.0/bin/primecount
Reason: tried: '/usr/local/lib/libprimecount.7.dylib' (no such file), '/usr/lib/libprimecount.7.dylib' (no such file)

By 2027 change default setting to WITH_LIBDIVIDE=OFF in CMakeLists.txt

We currently use libdivide because integer division is very slow on most x64 CPUs. However integer division will likely become much faster in near future and then WITH_LIBDIVIDE=OFF is expected to improve primecount's performance. But carefully benchmark some large inputs e.g. 1e21 --AC -s.

Primecount fails at runtime with GCC-12 & MinGW/MSYS2 (Windows 10)

This seems to be an OpenMP issue of GCC-12 & MinGW/MSYS2. I have debugged the issue and found that the error occurs on exit of an omp parallel section in P2.cpp:

  std::cout << "P2_OpenMP()5" << std::endl;

  #pragma omp parallel num_threads(threads) reduction(+:sum)
  {
  std::cout << "P2_OpenMP()6" << std::endl;

    int64_t low, high;
    while (loadBalancer.get_work(low, high))
      sum += P2_thread(x, y, low, high);

  std::cout << "P2_OpenMP()7" << std::endl;
  }

  std::cout << "P2_OpenMP()8" << std::endl;

cd primecount/build
cmake .. -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTS=ON
./primecount.exe 20000 --P2

...
P2_OpenMP()5
P2_OpenMP()6
P2_OpenMP()7

Since P2_OpenMP()8 is not printed the exit clearly happens when the OpenMP parallel section finishes, so this seems to be a MinGW OpenMP issue. Note that primesieve's test suite completes successfully using the lastest GCC-12 & MinGW/MSYS2, hence this does not seem to be a libprimesieve issue. I have also tried running an older primecount release i.e. v6.4 from 2020 and it causes the same issue, hence the issue is not caused by a bug that I have recently introduced.

TODO: test with c < PhiTiny::get_c(y)

While working on primesum I tested this briefly and the results were not correct! So it is possible that there is a bug.

// Current code
int64_t c = PhiTiny::get_c(y);

// Test with e.g.
int64_t c = 2;

kimwalisch / primecount Goto Github PK

primecount's People

Contributors

Stargazers

Watchers

Forkers

primecount's Issues

Recommend Projects

Recommend Topics

Recommend Org