The project combines research (audio signal processing, keyword spotting, ASR), development (audio data processing, deep neural network training, evaluation) and deployment (building model artifacts, web app development, Docker, cloud PaaS), tied together with CI/CD pipelines that run automated tests and releases.
Implement a data management pipeline covering data extraction, validation, data version control, etc.
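As a minimal sketch of the extraction and validation steps, the pipeline could start with two small functions like the ones below. The directory layout, file format and size threshold are assumptions for illustration, not part of the project's actual implementation:

```python
from pathlib import Path

def extract(data_dir: str) -> list[Path]:
    """Extraction step: collect raw audio files from a data directory."""
    return sorted(Path(data_dir).glob("*.wav"))

def validate(files: list[Path], min_bytes: int = 44) -> list[Path]:
    """Validation step: drop files too small to even hold a WAV header."""
    return [f for f in files if f.stat().st_size >= min_bytes]
```

Data version control would then track the validated set (e.g. with a tool like DVC) rather than re-validating on every run.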
Use cloud storage services such as an Amazon S3 bucket to store data, artifacts and predictions.
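One way to keep S3 storage organized is to build versioned, dated object keys and pass them to the upload call. The helper below is a hypothetical sketch (the bucket name, prefix and versioning scheme are assumptions); the commented-out lines show the corresponding boto3 upload:

```python
from datetime import date

def s3_key(prefix: str, filename: str, version: str) -> str:
    """Build a deterministic, versioned object key,
    e.g. 'artifacts/v1/2024-06-01/model.pt'."""
    return f"{prefix}/{version}/{date.today().isoformat()}/{filename}"

# Uploading with boto3 would then look like (bucket name is an assumption):
# import boto3
# boto3.client("s3").upload_file(
#     "model.pt", "kws-project-bucket",
#     s3_key("artifacts", "model.pt", "v1"))
```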
Even though exception handling is implemented in the code, it is equally important to write separate test cases covering the different failure scenarios.
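The idea can be sketched with the stdlib unittest module. The loader function here is hypothetical (the project's actual handlers differ); the point is that the test pins down the exception behaviour instead of trusting the try/except alone:

```python
import unittest

def load_audio(path: str) -> bytes:
    """Hypothetical loader: wraps file I/O with a descriptive error."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError as err:
        raise FileNotFoundError(f"audio file not found: {path}") from err

class TestLoadAudio(unittest.TestCase):
    def test_missing_file_raises(self):
        # The handler re-raises with a clearer message; the test pins that behaviour.
        with self.assertRaises(FileNotFoundError):
            load_audio("no/such/file.wav")
```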
Orchestrate the entire workflow as an automated pipeline using orchestration tools such as Airflow or Kubeflow. Since this is a small personal project with a static dataset, the pipeline can be built from normal function calls, but it is crucial to replace them with an orchestration tool for large, scalable, real-time workflows.
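The "normal function calls" version of the pipeline can be sketched as below. The step names and placeholder return values are assumptions for illustration; in an orchestrated setup, each function would become an Airflow task or Kubeflow component with the same data flow between steps:

```python
def ingest() -> list[str]:
    """Placeholder for the static dataset."""
    return ["clip_001.wav", "clip_002.wav"]

def preprocess(files: list[str]) -> list[str]:
    """Keep only valid audio inputs."""
    return [f for f in files if f.endswith(".wav")]

def train(files: list[str]) -> dict:
    """Stand-in for model training; returns a summary dict."""
    return {"model": "kws-v0", "n_samples": len(files)}

def run_pipeline() -> dict:
    # Plain function composition: ingest -> preprocess -> train.
    return train(preprocess(ingest()))
```

An orchestrator adds what plain calls lack: scheduling, retries, parallelism and monitoring per step.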
Implement a Continuous Training (CT) pipeline alongside CI/CD, so the model is retrained automatically when new data arrives.
When using Poetry instead of pip, dependencies can be added and installed with poetry add. But what if multiple dependencies are already specified in a requirements.txt file (the conventional way)?
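One common approach is to feed each line of requirements.txt to poetry add via xargs. The requirements shown are sample values, and echo is included so the command is printed rather than actually run; remove echo to perform the install:

```shell
# Hypothetical requirements.txt contents, for illustration only.
printf 'requests==2.31.0\nnumpy>=1.24\n' > requirements.txt

# Pipe every requirement into `poetry add` via xargs.
# Drop `echo` to actually install the packages.
xargs echo poetry add < requirements.txt
```

This passes all requirements to a single poetry add invocation, so Poetry resolves them together and records them in pyproject.toml.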