dat is an open-source data integration platform. It enables seamless replication and transformation of data into various vector databases, making it an ideal solution for applications involving machine learning, search engines, and AI-driven analytics.
- Open Source: Fully open-source, promoting community contributions and enabling users to tailor the platform to their specific needs.
- Extensive Connectors: Provides a comprehensive library of connectors for various vector storage systems, facilitating easy data integration from multiple sources.
- Custom Connector Development: Users can create and manage their own connectors, ensuring compatibility with any vector database.
- Automated Scheduling: Includes robust scheduling capabilities to automate and manage recurring data replication tasks, ensuring data is always up-to-date.
- Monitoring and Alerting: Offers built-in monitoring and alerting features to track the health of data pipelines and quickly address any issues that arise.
dat aims to simplify the integration and management of data within vector storage environments, providing a scalable and user-friendly platform for data engineers, machine learning practitioners, and developers. With the complexities of data integration handled by the platform, users can focus on applying their data to advanced analytics and AI applications.
Join us in building a powerful and flexible data integration solution for the vector storage ecosystem!
- Ensure that you have `docker` installed.
- Download and run:

```shell
curl -sSL https://raw.githubusercontent.com/dat-labs/dat-main/main/run-dat-platform.sh | bash -s -- --rebuild=false
```
- Wait for the build to complete and for the `dat build complete!` ASCII banner to appear.
- Visit http://localhost:3000 in your browser.

Press `Ctrl + C` to stop dat.

To run dat again, navigate to the `dat` directory and run `docker compose up`:

```shell
cd dat && docker compose up
```
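If you script the startup, it can be handy to wait until the UI actually answers before opening the browser. A minimal sketch in Python — the `http://localhost:3000` endpoint comes from the step above; the helper itself is ours, not part of dat:

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not up yet; try again shortly
    return False
```

For example, `wait_for_ui("http://localhost:3000")` returns `True` once the UI is reachable, and `False` if it never comes up within the timeout.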
To update the source files to the latest revision:

- Navigate to the `dat` directory and run:

```shell
curl -sSL https://raw.githubusercontent.com/dat-labs/dat-main/main/update-dat-platform.sh | bash -s
```

- Rebuild the images without cache:

```shell
docker compose build --no-cache
```

- Then restart the containers:

```shell
docker compose down && docker compose up
```
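If you run updates often, the three steps above can be chained from one small script. A sketch — the commands are exactly those listed above, and the `runner` parameter is our own addition so the sequence can be exercised without touching docker; run it from the `dat` directory:

```python
import subprocess

# The three update steps from above, in order.
UPDATE_STEPS = [
    "curl -sSL https://raw.githubusercontent.com/dat-labs/dat-main/main/update-dat-platform.sh | bash -s",
    "docker compose build --no-cache",
    "docker compose down && docker compose up",
]


def run_update(steps=UPDATE_STEPS, runner=subprocess.run):
    """Run each shell step in order, aborting on the first failure."""
    for cmd in steps:
        runner(cmd, shell=True, check=True)
```

Calling `run_update()` executes the real commands; passing a stub `runner` lets you dry-run the sequence.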
- Clone this repo.
- Ensure that you have `docker` installed.
- Download and run:

```shell
./dev-dat-platform.sh --rebuild=false
```

Press `Ctrl + C` to stop dat.
Assuming that you have built an actor and now wish to integrate it into the locally running dat instance, follow these steps. For developing actors, please refer to the detailed guide given here for `verified-generators`.
The following steps will:

- Create a virtualenv (minimum Python 3.10) and activate it.
- Install `poetry` and the required dependencies:

```shell
pip install poetry && poetry install
```
- Set up the repo for the actor you wish to develop and/or integrate. This will:
  - Delete the following repositories (the ones which were cloned from dat-labs, if present): the selected `verified-*` actor directory, `dat-api`, and `dat-orchestrator`.
  - `git clone` your forked repository in its place (if not already cloned).
  - `git checkout` your feature branch (if provided).
  - Generate stub source files and tests (if they do not already exist).

```shell
python cli/main.py init
```
Stub files have been generated inside the (cloned) `verified-*` directory.

- Create a virtualenv (minimum Python 3.10), activate it, and install dependencies:

```shell
cd verified-{actor} && pip install poetry && poetry install
```
- Develop your `verified-*` actor and ensure the tests pass. Detailed dev guides are given here: verified-sources, verified-generators, verified-destinations.

```shell
pytest verified_*/{your-actor}/tests/test_{your-actor}.py
```
- To add your actor to the local database for local integration testing:
  - You might have added some `poetry` dependencies in your developed actor. These need to be installed in the `api` and `orchestrator` containers, which can be achieved by running:

    ```shell
    cd /path/to/dat-main/dat-dev
    docker compose build api orchestrator --no-cache
    ```

  - Once the above is done, ensure your local dat is running:

    ```shell
    docker compose up
    ```

  - Execute the CLI command to add your actor to the local backend database:

    ```shell
    cd /path/to/dat-main
    python cli/main.py add-to-db
    ```
To run dat again, navigate to the `dat-dev` directory and run `docker compose up`:

```shell
cd /path/to/dat-main/dat-dev
docker compose up
```
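As a rough reference, the per-actor pytest files invoked above (`test_{your-actor}.py`) might look like the sketch below. All class and key names here are illustrative stand-ins, not dat's actual actor API — follow the verified-sources/generators/destinations dev guides for the real contract:

```python
# Hypothetical shape of a verified-* actor test file; the real dat actor
# interface may differ from this sketch.


class MyActor:
    """Stand-in for an actor implementation under verified_*/{your-actor}/."""

    def spec(self) -> dict:
        # Actors typically advertise a name and a configuration schema.
        return {
            "name": "my-actor",
            "connection_specification": {"type": "object"},
        }


def test_spec_has_required_keys():
    spec = MyActor().spec()
    assert spec["name"] == "my-actor"
    assert "connection_specification" in spec
```

pytest discovers any `test_*` function in the file, so keeping one assertion-focused function per behavior keeps failures easy to localize.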
Try looking for your issue under BUGS. There is a good chance that someone else from the community has encountered the issue and found a solution.
Additional resources and further instructions right up to your PR can be found at CONTRIBUTING.md.