man-group / arcticdb

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

Home Page: http://arcticdb.io

License: Other

arcticdb's Introduction



🌎 ArcticDB Website | 📒 ArcticDB Blog | 📣 Press Release | 👥 Community


ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem. Launched in March 2023, it is the successor to Arctic.

ArcticDB offers an intuitive Python-centric API enabling you to read and write Pandas DataFrames to S3 or LMDB utilising a fast C++ data-processing and compression engine.

ArcticDB's key features:

  • Pandas in, Pandas out: Read and write Pandas DataFrames, NumPy arrays and native types to S3 and LMDB without leaving Python.
  • Built for time-series data: Efficiently index and query time-series data across billions of rows
  • Time travel: Travel back in time to see previous versions of your data and create customizable snapshots of the database
  • Schemaless Database: Append, update and modify data without being constrained by the existing schema
  • Optimised for streaming data: Built in support for efficient sparse data storage
  • Powerful processing: Filter, aggregate and create new columns on-the-fly with a Pandas-like syntax
  • C++ efficiency: Accelerate analytics through concurrency in the C++ data-processing engine

ArcticDB handles data that is big in both row count and column count, so a 20-year history of more than 400,000 unique securities can be stored in a single symbol. Each symbol is maintained as a separate entity with no shared data which means ArcticDB can scale horizontally across symbols, maximising the performance potential of your compute, storage and network.

ArcticDB is designed from the outset to be resilient; there is no single point of failure, and persistent data structures in the storage mean that once a version of a symbol has been written, it can never be corrupted by subsequent updates. Pulling compressed data directly from storage to the client means that there is no server to overload, so your data is always available when you need it.

Quickstart

Prebuilt binary availability

Platform             PyPI (Python 3.6 - 3.11)   conda-forge (Python 3.8 - 3.11)
Linux (Intel/AMD)    ✔️                         ✔️
Windows (Intel/AMD)  ✔️                         ➖
MacOS                ➖                         Beta

For conda-forge see the release-info.

Storage compatibility

Storage              Linux   Windows   Mac
S3                   ✔️      ✔️        ✔️
LMDB                 ✔️      ✔️        ✔️
Azure Blob Storage   ✔️      ✔️        ➖

We have tested against the following S3 backends:

  • AWS S3
  • Ceph
  • MinIO on Linux
  • Pure Storage S3
  • Scality S3
  • VAST Data S3

Installation

Install ArcticDB:

$ pip install arcticdb

or using conda-forge

$ conda install -c conda-forge arcticdb

Import ArcticDB:

>>> import arcticdb as adb

Create an instance on your S3 storage (with or without explicit credentials):

# Leave AWS to derive credential information
>>> ac = adb.Arctic('s3://MY_ENDPOINT:MY_BUCKET?aws_auth=true')

# Manually specify creds
>>> ac = adb.Arctic('s3://MY_ENDPOINT:MY_BUCKET?region=YOUR_REGION&access=ABCD&secret=DCBA')
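
To avoid hard-coding credentials in source files, the connection string can be assembled at runtime. The sketch below is one possible approach and is not part of ArcticDB: build_s3_uri is a hypothetical helper, and the environment variable names are invented for this example. Values are joined verbatim, so it assumes they need no escaping.

```python
import os


def build_s3_uri(endpoint, bucket, **options):
    """Assemble an 's3://ENDPOINT:BUCKET?key=value&...' connection string."""
    uri = f"s3://{endpoint}:{bucket}"
    if options:
        # Values are joined as-is; no URL-escaping is applied here.
        query = "&".join(f"{key}={value}" for key, value in options.items())
        uri = f"{uri}?{query}"
    return uri


# Hypothetical environment variable names; the fallbacks are the
# placeholders used in the text above.
uri = build_s3_uri(
    "MY_ENDPOINT",
    "MY_BUCKET",
    region=os.environ.get("MY_S3_REGION", "YOUR_REGION"),
    access=os.environ.get("MY_S3_ACCESS_KEY", "ABCD"),
    secret=os.environ.get("MY_S3_SECRET_KEY", "DCBA"),
)
print(uri)
```

The resulting string would then be passed to adb.Arctic(...) in the same way as the literal strings above.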

Or create an instance on your local disk:

>>> ac = adb.Arctic("lmdb:///<path>")

Create your first library and list the libraries in the instance:

>>> ac.create_library('travel_data')
>>> ac.list_libraries()

Create a test dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> NUM_COLUMNS=10
>>> NUM_ROWS=100_000
>>> df = pd.DataFrame(
...     np.random.randint(0, 100, size=(NUM_ROWS, NUM_COLUMNS)),
...     columns=[f"COL_{i}" for i in range(NUM_COLUMNS)],
...     index=pd.date_range('2000', periods=NUM_ROWS, freq='h'),
... )

Get the library, write some data to it, and read it back:

>>> lib = ac['travel_data']
>>> lib.write("my_data", df)
>>> data = lib.read("my_data")

To find out more about working with data, visit our docs


Documentation

The source code for the ArcticDB docs is located in the docs folder, and the docs are hosted at docs.arcticdb.io.

License

ArcticDB is released under a Business Source License 1.1 (BSL).

BSL features are free to use and the source code is available, but users may not use ArcticDB for production use or for a Database Service without agreement with Man Group Operations Limited.

Use of ArcticDB in production or for a Database Service requires a paid-for license from Man Group Operations Limited and is licensed under the ArcticDB Software License Agreement. For more information please contact [email protected].

The BSL is not certified as an open-source license, but most of the Open Source Initiative (OSI) criteria are met. Please see version conversion dates in the below table:

ArcticDB Version   License                       Converts to Apache 2.0
1.0                Business Source License 1.1   Mar 16, 2025
1.2                Business Source License 1.1   May 22, 2025
1.3                Business Source License 1.1   Jun 9, 2025
1.4                Business Source License 1.1   Jun 23, 2025
1.5                Business Source License 1.1   Jul 11, 2025
1.6                Business Source License 1.1   Jul 25, 2025
2.0                Business Source License 1.1   Aug 29, 2025
3.0                Business Source License 1.1   Sep 13, 2025
4.0                Business Source License 1.1   Sep 27, 2025
4.1                Business Source License 1.1   Nov 1, 2025
4.2                Business Source License 1.1   Nov 12, 2025
4.3                Business Source License 1.1   Feb 7, 2026
4.4                Business Source License 1.1   Apr 5, 2026

Code of Conduct

This project has adopted a Code of Conduct. If you have any concerns about the Code, or behaviour that you have experienced in the project, please contact us at [email protected].

Contributing/Building From Source

We welcome your contributions to help us improve and extend this project!

Please refer to the Contributing page and feel free to open issues on GitHub.

We are also always looking for feedback from our dedicated community! If you have used ArcticDB, please let us know. We would love to hear about your experience!

Our release process is documented here.

Community

We would love to hear how your ArcticDB journey evolves. Email us at [email protected] or come chat to us on Twitter!

Interested in learning more about ArcticDB? Head over to our blog!

Do you have any questions or issues? Chat to us and other users through our dedicated Slack Workspace - sign up for Slack access on our website.

arcticdb's People

Contributors

alexowens90, arcticdb-service-user, athakur91, derthorsten, drnickclarke, eeaston, g-d-petrov, gemdot-neubla, hind-m, ianthomas23, ivodd, jamesmunro, jjerphan, jmunro-mangroup, joe-iddon, johanmabille, klaim, lucyclark2, mehertz, ms041223, muhammadhamzasajjad, octogenary, phoebusm, poodlewars, qc00, vasil-pashov, willdealtry

arcticdb's Issues

Make the static key map static

It's a bit sad at the moment that we use a semimap but it's in a runtime context so it just reverts to std::unordered_map behaviour. Fix it so that it is genuinely compile-time.

Apply BSL License

  • Add the BSL header to all source files
  • Add the BSL license text to the license/ directory

FAQ - Common error messages

This should include details on how to configure ArcticDB logging and errors, such as:

  • How to configure the different logging sink levels
  • How to configure the AWS SDK logging level
  • Information on how the Python logs relate to the C++ logs

Misc small docs improvements

  • Link to arcticdb.com from the first "ArcticDB" string on the getting started page
  • Version maps are not compacted: say that accessing early versions when there are many will be slower than accessing more recent versions, with a compaction API coming soon
  • Ability to change some lib config settings after lib creation time is coming soon
  • Get rid of the LibraryConfiguration docs page and just link to the LibraryOptions API docs
  • Explanation of NaN handling
  • Explanation of Categoricals handling
  • Coming soon: compaction API (solves the fragmented-data problem)

GitHub Actions-based Build Tooling (Releases)

  • Wheel as artifact
  • Runs auditwheel
  • Ensure build exports debug symbols but builds in release mode
  • Test that a core dump can be retrieved from the release build, and debugged using the debug build
  • Ensure we can release off of GitHub

Releases should:

  • Only take place on release branches
  • Tag the release

Increase Python Pickle protocol to 4

We currently have the pickle protocol set as 2 for Python 2 compat.

We should be able to increase to 4 as a performance improvement.

(See internal ticket RAP-6273).
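
The effect of the protocol argument can be checked with the standard library alone. The sketch below is not ArcticDB code: it pickles a made-up payload with protocols 2 and 4 and confirms that both round-trip to the same object, which gives a starting point for comparing sizes or timings on your own data.

```python
import pickle

# Made-up payload standing in for a pickled object.
payload = {"symbol": "my_data", "values": list(range(1000)), "note": "x" * 100}

blob_v2 = pickle.dumps(payload, protocol=2)  # Python 2 compatible
blob_v4 = pickle.dumps(payload, protocol=4)  # requires Python 3.4 or later

# Both protocols restore an equal object; only the byte encoding differs.
restored_v2 = pickle.loads(blob_v2)
restored_v4 = pickle.loads(blob_v4)
print(restored_v2 == restored_v4 == payload)

# Print the serialised sizes for comparison on this payload.
print(len(blob_v2), len(blob_v4))
```

pickle.loads detects the protocol from the data itself, so blobs written with protocol 2 remain readable after the writer switches to protocol 4.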

Docs accepted data/index types

  • What index types are supported
  • What data types are supported
  • Which combinations of types are supported by which operations
