Giter VIP home page Giter VIP logo

bmcdonald3 / chapel Goto Github PK

View Code? Open in Web Editor NEW

This project forked from chapel-lang/chapel

1.0 1.0 0.0 1.04 GB

a Productive Parallel Programming Language

Home Page: https://chapel-lang.org

License: Other

Makefile 0.38% C++ 28.54% Perl 0.32% C 8.32% Shell 1.56% LLVM 0.05% Python 3.10% Emacs Lisp 0.03% Gnuplot 0.03% Chapel 57.36% TeX 0.05% Cuda 0.01% Fortran 0.05% Zimpl 0.01% Mathematica 0.01% CSS 0.01% HTML 0.01% JavaScript 0.14% Batchfile 0.01% Lex 0.03%

chapel's People

Contributors

aconsroe-hpe avatar arezaii avatar ben-albrecht avatar benharsh avatar bhavanijayakumaran avatar bmcdonald3 avatar bradcray avatar cassella avatar danilafe avatar daviditen avatar dlongnecke-cray avatar e-kayrakli avatar gbtitus avatar jabraham17 avatar jeremiah-corrado avatar jhh67 avatar kyle-b avatar lydia-duncan avatar mppf avatar noakesmichael avatar npadmana avatar riftember avatar ronawho avatar shreyaskhandekar avatar spartee avatar stonea avatar sungeunchoi avatar thomasvandoren avatar tomyhoi avatar vasslitvinov avatar

Stargazers

 avatar

chapel's Issues

Parquet remaining work

Reading:

  • Get size of parquet files (without having to read the entire table)
  • Read in multiple parquet files to block distributed array
  • Read in multiple parquet files to cyclic distributed array
  • Integrate into Arkouda (with all the weird logging/error checking)

Writing:

  • Understand Arkouda HDF5 writing (hopefully in Arkouda weekly call)
  • Copy HDF5 writing style for Parquet

General:

  • Get Arrow/Parquet checking to be generalized and not hardcoded
  • Create testing/get running on nightly
  • Convert Arrow module to feel like the C_HDF5 module
  • Create slides and prep for SF call

Load factor setter for `map`

Summary

Our hash table probing method was switched from using regular quadratic probing to triangular numbers probing. A property of quadratic probing is the table must not exceed 50% capacity to guarantee finding a slot, but triangular numbers probing guarantees finding a slot if there is an open one, regardless of capacity (it checks each slot exactly once). This means that we can give more flexibility to users of map by exposing a maxLoadFactor (to be consistent with other languages) to allow them to decide when is best for their map to grow.

Proposed solutions

Proposal 1

var m = new map(int, int);
m.setMaxLoadCapacity(0.75);

maxLoadCapacity is set through a setter method and takes a fraction that corresponds to the % full that the table can reach before resizing

Proposal 2

var m = new map(int, int);
m.setMaxLoadCapacity(2);

Identical to proposal 1, but the maxLoadCapacity is set by an integer whose reciprocal (1/val) is the max capacity (i.e., passing 2 means your max capacity is 1/2 or 50%)

Proposal 3

var m = new map(int, int, maxLoadCapacity=.75);

maxLoadCapacity is a keyword argument for the map constructor

Comparisons to other languages

C++ unordered_map

  • C++ exposes a max_load_factor() which allows setting how full the map will be before resizing
  • takes a integer number instead of a fraction (e.g., max_load_factor = 2 = 1/2, so map can be 50% full before resize; max_load_factor = 3 = 1/3 so map can be 33% full before resize)

Usage:

std::unordered_map<std::int, std::int> m;
m.max_load_factor(2); // map can now be up to 50% full (numslots/loadfactor)

(thanks for pointing this one out Andrew)

Java Hashtable

  • the Java Hashtable constructor can take an initial capacity and a loadFactor
  • loadFactor in Java is a fraction, rather than an integer (e.g., loadFactor = .75 means table can be 75% full)

Usage:

Hashtable<int, int> m = new Hashtable<int, int>(10, 0.50); // initial capacity 10, table can be up to 50% full

Other languages

  • Rust doesn't have a concept of a loadFactor, but instead allows setting of an initial capacity, which may have the same effect in many cases
  • Python, Julia, etc., do not have a concept of this, just like our map now

Other stuff that you may not be particularly interested in

Motivated by: chapel-lang#18427 and https://github.com/Cray/chapel-private/issues/2536
PR with proposal 1: chapel-lang#18523
C++ load factor: https://en.cppreference.com/w/cpp/container/unordered_map/max_load_factor
Java hashtable: https://docs.oracle.com/javase/8/docs/api/java/util/Hashtable.html
Rust hashmap: https://doc.rust-lang.org/std/collections/struct.HashMap.html

Ability to use user-defined hash function for Map

Summary

Currently, the Chapel Map uses the default hash function in all scenarios, so even if the user wanted to use their own hash function, they are not able to. Many languages have hash table implementations that allow users to provide their own hash functions, and I also have observed performance improvements in certain instances with this change. I am proposing that we add this functionality to the map API.

Proposed Solutions

Proposal 1: Detect if there is a hash method provided on the key type

  proc int.hash() {
    return calculateHash();
  }
  var myMap:map(int,string);
  // will use the above hash since it is defined on int, even though not explicitly stated by user
  myMap[1] = "hello";

Proposal 2: Allow map to accept a hash record or first class function

proc doHash(arg, seed:uint): uint {
  return calculateHash();
}
var myMap2: map(int, string, doHash);
myMap2[1] = "hello 2";

Pros/cons of proposed solutions

  1. Detect if there is a hash method provided on the key type:
    a. Pros: Simple to use, not much extra work by user
    b. Cons: Since hash method now has special meaning, it might no longer be used in applications for other purposes, not very intuitive; no one would know how to use it unless they had seen the documentation
  2. Allow map to accept a hash record or first class function:
    a. Pros: Other implementations do this, such as the C++ one that our program is based on so there is precedence
    b. Cons: Changes the Map type to include the "hasher" type when specified, also, a little bit more work to program for a user in my opinion

More info for the highly interested

I ran into this while running performance tests on the k-nucleotide benchmark, where the C++ implementation that our implementation is based on allows a user to pass a hash function. The k-nucleotide implementation does the hashing with a required hash function before using the map, so we are double hashing in that case. If you want more info on the result of using the built in/user defined hash, see: https://github.com/Cray/chapel-private/issues/2365

C++ implementation usage:

struct T {
  ... other stuff ...
   struct hash{
     uint64_t operator()(const T& t)const{ return t.data; }
   };
};

cc_hash_table<T,unsigned,T::hash> myMap;

Todos

PREPARE FOR ARKOUDA MEETING ON WEDNESDAY:

  • benefits of C++
    • we can assign straight to the pointer (so less memory), we have more control over reader options (so better performance), we can (potentially) use our own memory allocators to avoid having to copy at all
  • progress with C++
    • should be at same place as GLib on the Chapel side by EOD Tuesday (more confident in implementation, just haven't brought it into Arkouda yet)
  • drawbacks of C/GLib
    • hard to install, not supported very widely, terrible documentation, things you just can't do

Second batch:

  • experiment with Elliot's make for Arrow
  • make a branch with the C++ Arrow in Arkouda
  • open up draft PR for the slides of Arrow branch
  • get the C++ Arrow to the same place as the GLib arrow
  • merge those last PR's
  • fix the c2chapel testing thing
  • add a member variable for setting the grow size of Map()
  • finish my slides
  • review Engin's slides
  • do performance experiments with the Arrow stuff
  • EASY: get a review on my c2chapel changes chapel-lang#18355

Short term

  • figure out what's going on with those failing tests (something with associative domains)
  • slides due Sept 24
  • collect perf numbers for different times to resize tables with new map
  • figure out why the c2chapel testing isn't working with the spawn module (didn't figure out, but worked around it)

Sort of short term

  • Arkouda draft PR with changes
  • prep slides for Oct 14
  • figure out testing issues with Arrow
  • Parquet string reading (less urgent)

Should a load factor setter for map be added?

Summary

Exposing a maxLoadFactor (name chosen to be consistent with other languages, open for discussion) to allow users to decide when is best for their map to grow can help improve both performance and memory in specific cases. Choosing a higher load factor can help save memory on unnecessary map allocations if memory is a concern. Choosing a lower load factor can help improve performance because it can reduce the number of slots probed when there are collisions.

Proposed solutions

Proposal 1: Setter with fractional argument

var m = new map(int, int);
m.setMaxLoadCapacity(0.75);

Proposal 2: Setter with integer argument

var m = new map(int, int);
m.setMaxLoadCapacity(2);

Proposal 3: Keyword argument for map constructor

var m = new map(int, int, maxLoadCapacity=.75);

Pros/cons of proposals

  1. Setter with fractional argument
    a. Accepting a fraction is easy to understand IMO (table can be this % full)
    b. Setter method won't make constructors more complicated for something most users won't use
    c. Not consistent with the C++ implementation
  2. Setter with integer argument
    a. Consistent with C++ implementation
    b. Confusing to wrap your head around IMO (table can be 1/value full -- hard to intuit)
    c. Limiting in what can be achieved
  3. Keyword argument for map constructor
    a. Doesn't require adding another method
    b. Consistent with Java implementation
    c. Fixed for life of map

Comparisons to other languages

C++ unordered_map

std::unordered_map<std::int, std::int> m;
m.max_load_factor(2); // map can now be up to 50% full (numslots/loadfactor)

Java Hashtable

Hashtable<int, int> m = new Hashtable<int, int>(10, 0.50); // initial capacity 10, table can be up to 50% full

Other languages

  • Rust doesn't have a concept of a loadFactor, but instead allows setting of an initial capacity, which may have the same effect in many cases
  • Python, Julia, etc., do not have a concept of this, just like our map now

Links to related information

Motivated by: chapel-lang#18427 and https://github.com/Cray/chapel-private/issues/2536
PR with proposal 1: chapel-lang#18523
C++ load factor: https://en.cppreference.com/w/cpp/container/unordered_map/max_load_factor
Java hashtable: https://docs.oracle.com/javase/8/docs/api/java/util/Hashtable.html
Rust hashmap: https://doc.rust-lang.org/std/collections/struct.HashMap.html

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.