dat-main's Introduction

dat

dat is an open-source data integration platform. The platform enables seamless replication and transformation of data into various vector databases, making it an ideal solution for applications involving machine learning, search engines, and AI-driven analytics.

Key Features

  1. Open Source: Fully open-source, promoting community contributions and enabling users to tailor the platform to their specific needs.
  2. Extensive Connectors: Provides a comprehensive library of connectors for various vector storage systems, facilitating easy data integration from multiple sources.
  3. Custom Connector Development: Users can create and manage their own connectors, ensuring compatibility with any vector database.
  4. Automated Scheduling: Includes robust scheduling capabilities to automate and manage recurring data replication tasks, ensuring data is always up-to-date.
  5. Monitoring and Alerting: Offers built-in monitoring and alerting features to track the health of data pipelines and quickly address any issues that arise.

dat aims to simplify the integration and management of data within vector storage environments, providing a scalable and user-friendly platform for data engineers, machine learning practitioners, and developers. By using our platform, users can focus on applying their data to advanced analytics and AI applications without worrying about the complexities of data integration.

Join us in building a powerful and flexible data integration solution for the vector storage ecosystem!

Running locally 🚀

First build and run

  1. Ensure that you have docker installed.
  2. Download and run
curl -sSL https://raw.githubusercontent.com/dat-labs/dat-main/main/run-dat-platform.sh | bash -s -- --rebuild=false
  3. Wait for the build to complete and this message to show:

     _      _     _         _ _    _                     _     _       _ 
  __| |__ _| |_  | |__ _  _(_) |__| |  __ ___ _ __  _ __| |___| |_ ___| |
 / _` / _` |  _| | '_ \ || | | / _` | / _/ _ \ '  \| '_ \ / -_)  _/ -_)_|
 \__,_\__,_|\__| |_.__/\_,_|_|_\__,_| \__\___/_|_|_| .__/_\___|\__\___(_)
                                                   |_|                   

  4. Visit http://localhost:3000 in your browser.

Press Ctrl + C to stop dat.
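
Checking the docker prerequisite and waiting for the UI to come up can also be scripted. The following is a minimal Python sketch, not part of dat itself: it checks that a docker executable is on the PATH and polls the UI address given above until something answers. The function names, the timeout, and the polling interval are assumptions chosen for illustration.

```python
import shutil
import time
import urllib.error
import urllib.request


def docker_available() -> bool:
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


def wait_for_ui(url: str = "http://localhost:3000",
                timeout: float = 300.0,
                interval: float = 5.0) -> bool:
    """Poll `url` until it returns any HTTP response.

    Returns False if nothing answers within `timeout` seconds.
    The default timeout and interval are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and retry.
            time.sleep(interval)
    return False
```

A typical use would be to call docker_available() before starting the build and wait_for_ui() in a second terminal while the build runs.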

Subsequent runs

To run dat again, navigate to the dat dir and run docker compose up.

cd dat && docker compose up

Update

To update the source files to the latest revision:

  1. Navigate to the dat dir and run:
curl -sSL https://raw.githubusercontent.com/dat-labs/dat-main/main/update-dat-platform.sh | bash -s
  2. Execute the following docker command:
docker compose build --no-cache
  3. Then restart the containers using:
docker compose down && docker compose up
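
The update sequence above can be wrapped in a small helper that stops at the first failing command. This is a hedged sketch rather than a script shipped with dat: the commands are copied from the steps above, while `run_steps`, `UPDATE_STEPS`, and the injectable `runner` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from typing import Callable, Sequence

UPDATE_URL = (
    "https://raw.githubusercontent.com/dat-labs/dat-main/main/"
    "update-dat-platform.sh"
)

# The commands from the update steps, in the order they are listed.
UPDATE_STEPS: list[list[str]] = [
    ["bash", "-c", f"curl -sSL {UPDATE_URL} | bash -s"],
    ["docker", "compose", "build", "--no-cache"],
    ["docker", "compose", "down"],
    ["docker", "compose", "up"],
]


def run_steps(steps: Sequence[Sequence[str]],
              cwd: str = ".",
              runner: Callable = subprocess.run) -> int:
    """Run each command inside `cwd`, stopping at the first failure.

    Returns the number of commands that completed successfully.
    `runner` defaults to subprocess.run and can be swapped for a
    stand-in when trying the helper without docker.
    """
    completed = 0
    for cmd in steps:
        result = runner(list(cmd), cwd=cwd)
        if result.returncode != 0:
            break
        completed += 1
    return completed
```

Calling run_steps(UPDATE_STEPS, cwd="dat") from the directory that contains the dat dir would run the same commands as the manual steps. Note that docker compose up stays in the foreground until it is stopped.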

Contributing 🐱‍💻

Verified Connectors

  1. Clone this repo.
  2. Ensure that you have docker installed.
  3. Run the development script from the cloned repo:
./dev-dat-platform.sh --rebuild=false

Press Ctrl + C to stop dat.

Integration

Assuming that you have built an actor and you now wish to integrate it into the locally running dat instance, follow these steps.

For developing actors, please refer to the detailed guide for verified-generators.

Steps:

  1. Create a virtualenv (minimum Python 3.10) and activate it.
  2. Install poetry and install required dependencies.
pip install poetry && poetry install
  3. Set up the repo for the actor you wish to develop and/or integrate. This will:
    • Delete the following repositories (ones which were cloned from dat-labs, if present):
      • selected verified-* actor directory
      • dat-api
      • dat-orchestrator
    • git clone your forked repository in its place (if not already cloned)
    • git checkout your feature branch (if provided)
    • Generate stub source files and tests (if they do not already exist)
    python cli/main.py init

Stub files have been generated inside the (cloned) verified-* dir.

  4. Create a virtualenv (minimum Python 3.10) inside the actor directory and activate it. Install dependencies.
cd verified-{actor} && pip install poetry && poetry install
  5. Develop your verified-* actor and ensure tests pass. Detailed dev guides are given here: verified-sources, verified-generators, verified-destinations

    pytest verified_*/{your-actor}/tests/test_{your-actor}.py 
  6. To add your actor to the local database for local integration testing:

    • You might have added some poetry dependencies to your actor. These need to be installed in the api and orchestrator containers, which can be done by rebuilding them:
      cd /path/to/dat-main/dat-dev
      docker compose build api orchestrator --no-cache
    • Once the above is done, ensure your local dat is running:
      docker compose up
    • Execute the cli command to add your actor to the local backend database.
      cd /path/to/dat-main
      python cli/main.py add-to-db
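
The virtualenv steps above require Python 3.10 or newer. Before running poetry install, the interpreter can be checked with a few lines of Python. This is a minimal sketch; `python_ok` and `in_virtualenv` are hypothetical helper names, not part of the dat CLI.

```python
import sys

# Minimum interpreter version stated in the steps above.
MIN_PYTHON = (3, 10)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment.

    Inside an environment created by venv (or a recent virtualenv),
    sys.prefix points at the environment while sys.base_prefix points
    at the interpreter it was created from.
    """
    return sys.prefix != sys.base_prefix
```

If python_ok() returns False, recreate the virtualenv with a newer interpreter before installing dependencies. If in_virtualenv() returns False, activate the environment first.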

Subsequent runs

To run dat again, navigate to the dat-dev dir and run docker compose up.

cd /path/to/dat-main/dat-dev
docker compose up

Troubleshooting

Try looking for your issue under BUGS. There is a good chance that someone else from the community encountered the issue and found a solution.

Additional resources and further instructions right up to your PR can be found at CONTRIBUTING.md.

Contributors

riju-dc, rijumone, ankit-dc, info-datachannel, suryaanshrai
