srishtisingh3895

followers: 1 following: 7 repos: 10 gists: 0

Name: Srishti Singh

Type: User

Company: University of Rochester

Location: Washington, USA

Blog: https://www.linkedin.com/in/srishtisingh03

Srishti Singh's Projects

datalakewithspark_udacity

This project creates a data lake using AWS S3, EMR and Spark to build an ETL pipeline for a music database.

datamodelingwithapachecassandra_udacity

Modeled song data using Apache Cassandra and designed an ETL pipeline.

datamodelingwithpostgres_udacity

This project was done as a part of Udacity's Data Engineering Nanodegree Program.

datapipelineswithairflow_udacity

Created the DAGs and designed an ETL pipeline using custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks on the data as the final step. This project was completed as a part of Udacity's Data Engineering Nanodegree.

datawarehousingonaws_udacity

This project builds a data warehouse using Amazon Redshift and S3, and was completed as a part of Udacity's Data Engineering Nanodegree program.

frequentpatternmining

Comparative study of frequent pattern mining algorithms on Adult Census Data.

hackathon-summer-2020

Data and details for University of Rochester Biomedical Data Science Hackathon

online-music-streaming-app

In this project we created a music database that could form part of a much larger online music streaming application. The database records all the songs and their properties, as well as all the artists and their details. It also keeps track of all the users, their playlists, and the songs in those playlists. The idea is to capture each user’s taste in music by storing details such as song name, song genre, artist, and the number of times the user listened to a song, so that analysts can use this data to design a recommender system and to improve the song base. The amount of data that can be collected for a music library is quite large. For the purpose of this project, we used two Kaggle datasets of Spotify top tracks and artists to create the database, and we generated synthetic data for user details, playlists, etc. A database management system is necessary for this application because the data consumes a huge amount of space and many users access it at the same time from various locations. The database is administered by admins, who can add or delete songs as well as users from their region. Thus, there are two login pages, one for users and one for admins, with different functions associated with each type of account. Admin functions include inserting new songs, deleting old songs, and deleting and monitoring users. User functions include viewing or searching songs and artists, creating playlists, and deleting playlists.
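The entities described above (songs, artists, users, playlists, and per-user listen counts) can be sketched as a small relational schema. The following is only an illustrative sketch using Python's built-in `sqlite3` module; every table name, column name, and sample row here is a hypothetical assumption and is not taken from the project repository, which may use a different database system and layout.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the project's actual design.
SCHEMA = """
CREATE TABLE artists (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE songs (
    song_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    genre     TEXT,
    artist_id INTEGER NOT NULL REFERENCES artists(artist_id)
);
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    region  TEXT
);
CREATE TABLE playlists (
    playlist_id INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    name        TEXT NOT NULL
);
CREATE TABLE playlist_songs (
    playlist_id INTEGER NOT NULL REFERENCES playlists(playlist_id),
    song_id     INTEGER NOT NULL REFERENCES songs(song_id),
    PRIMARY KEY (playlist_id, song_id)
);
CREATE TABLE listens (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    song_id    INTEGER NOT NULL REFERENCES songs(song_id),
    play_count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, song_id)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO artists VALUES (1, 'Artist A')")
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?, ?, ?)",
        [(1, "Song X", "pop", 1), (2, "Song Y", "rock", 1)],
    )
    conn.execute("INSERT INTO users VALUES (1, 'demo_user', 'US')")
    conn.executemany(
        "INSERT INTO listens VALUES (?, ?, ?)",
        [(1, 1, 5), (1, 2, 2)],
    )
    conn.commit()
    return conn


def top_genres(conn, user_id):
    """Return (genre, total plays) pairs for one user, most played first."""
    return conn.execute(
        """
        SELECT s.genre, SUM(l.play_count) AS plays
        FROM listens AS l
        JOIN songs AS s ON s.song_id = l.song_id
        WHERE l.user_id = ?
        GROUP BY s.genre
        ORDER BY plays DESC, s.genre
        """,
        (user_id,),
    ).fetchall()
```

A query such as `top_genres(build_demo_db(), 1)` shows the kind of per-user aggregate an analyst might feed into a recommender; the sample rows are synthetic placeholders only.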

spotifydataanalysis

Analysis of the popularity of the top 100 Spotify songs based on their musical attributes.

usimmigrationdatalakeetl

The project aims to create a data lake for US immigration data and to develop an ETL pipeline that builds it using data from various sources. The project was completed as a part of Udacity's Data Engineering Nanodegree program.
