publicidconverter's Introduction

Hi there 👋

Here is some information to help you get to know me. Let's go!

TL;DR: Check out the PDF or image version of my CV.

Experience

  • 01/2022 to present: Data Engineer at MoMo (M_service), joined through the MoMo Talents Program.

Education


Skills

  • Agile / Scrum concept
  • Programming Languages (C/C++, Java, Kotlin, Python, SQL, ...)
  • MS SQL Server / Oracle OCI / BigQuery / Vertica / Trino
  • Open Table Format (Delta Lake / Apache Iceberg)
  • Command Line (with or without Linux/Unix system)
  • Git and Version Control
  • CI / CD
  • Shell / Linux
  • Docker
  • Kubernetes
  • ETL / ELT
  • Spark Application
  • Data modeling
  • Data Observability / Data Quality / Data Catalog / Data Security
  • Data Governance
  • Google Cloud Platform (BigQuery / Pub/Sub / Dataproc / GKE / GCS / Cloud Functions / Resource monitoring / Looker / GCP gRPC API)
  • Oracle APEX
  • Scikit-learn
  • Machine Learning Algorithms
  • Generative AI
  • MS Office
  • Kubectl / Helm / Skaffold
  • Bazel
  • Infrastructure as code (IaC) with Pulumi
  • Policy as code

Tools

Contributions


Project

Company Projects

  • Golden Record - Process to achieve a high-value Data Mart at MoMo
    Build tools and services on top of open-source projects to control the data model's quality, freshness, and extensibility. Golden Record currently serves many dataflows, such as events and transactions of the MoMo Super App.
    Used: dbt, Great Expectations, Airflow, GitLab, Kubernetes, Oracle OCI, and Oracle APEX.

  • Cost Optimization - Reduce cost on GCP
    Help other teams optimize queries and move services, ETL, and ELT jobs to on-premises Kubernetes. Attempt to shift from BigQuery to Vertica. Manage GCP resources for each team at MoMo following the divide-and-conquer principle.
    Result: 40% cost saved without any stalled workloads.
    Fluent in: BigQuery, Vertica, Kubernetes, Oracle APEX, GCP gRPC API.

  • Data Observability - Data Governance
    A project that helps end users monitor the five pillars of data observability: Freshness, Volume, Quality, Schema, and Lineage. It aims to reduce the data-platform team's workload in responding to data-related information requests and incidents.
    Fluent in: DataHub, dbt, Great Expectations, Airflow.

  • Data Lakehouse
    Collaborate with the team to build a lakehouse solution that reduces the cost of all workloads at MoMo. Trino and Spark run on GKE as query engines to process large batch data stored in GCS. Reduces cost per workload by up to 70% thanks to Spot instances without any data SLA.
    Fluent in: Trino, Spark, GKE, GCS, BigQuery Storage, dbt, Airflow, Apache Ranger, Delta Lake, Apache Iceberg.

  • Data Pipeline Migration
    Build a transpiling tool on top of open-source projects to migrate SQL end to end from the current production environment to the Lakehouse, reducing the human cost of the migration phase at MoMo by up to 90%.
    Fluent in: SQLGlot, Trino/Presto, BigQuery, Airflow.

University projects


Badges

I have earned many badges (in AI, Machine Learning, Deep Learning, and Data Science) based on Google Cloud Platform.

Check out my Qwiklabs Public Profile.

Programming Languages

Top Langs

Duy's GitHub stats


Contact

Website

GitHub Page: viplazylmht.github.io
