Welcome to my GitHub Profile!

👋 About Me

Hello! I'm Denis Moura, a Senior Data Engineer with over 4 years of experience building scalable, resilient data pipelines and data platforms. My passion lies in working with data and using modern technologies to solve complex data problems. I have extensive experience with data models, SQL, and AWS, and a deep love for Python. You can find me on LinkedIn and explore my projects here on GitHub, although I haven't been very active on my public projects lately.

🚀 Skills and Technologies

  • Programming Languages: Python, SQL, JavaScript
  • Big Data Technologies: Hive, S3, Google Storage, Presto, Athena, BigQuery, Spark
  • Data Warehousing & ETL: Snowflake, Airflow, AWS Glue, AWS Step Functions, Lambda Functions, Kafka
  • Cloud Platforms: AWS, GCP
  • Data Modeling & Quality: Data Lakes, Delta Lakes, Data Governance, Data Validations
  • Tools: Terraform, Docker, Kubernetes, Git, GitHub Actions
  • Data Visualization: Sigma Computing, PowerBI, Looker Studio, Metabase
  • Methodologies: Agile/Scrum

💼 Professional Experience

Lead Data Engineer @ Dexian Disys (2022 – Present)

  • Developed a data migration platform with custom, reusable operators in Airflow for large-scale ETL batch processes, reducing AWS costs significantly.
  • Managed multiple data projects migrating data from on-premises solutions and third-party APIs to Snowflake, handling up to 10 million records per day using Kafka for batch and streaming pipelines.
  • Created various reports and dashboards using Spotfire, Sigma Computing, and PowerBI.

Technologies: Python, Airflow, Snowflake, AWS, Git, GitHub Actions, Terraform, Docker, Kubernetes, Kafka

Data Engineer @ Varsomics – Hospital Israelita Albert Einstein (2021 – 2022)

  • Led a data migration project to create a data lake for 70 Terabytes of genomic data on AWS, employing S3, Glue, Athena, Lake Formation, and EMR to build a custom Delta Lake structure.
  • Developed and maintained numerous reports and dashboards for internal clients, streamlining genomics pipeline monitoring and end-user results analysis.

Technologies: Python, AWS Glue, AWS Step Functions, AWS Athena, Terraform, Docker, Git, PySpark

Software Engineer @ PickCells (2020 – 2021)

  • Developed and maintained a microscopy solution, automating robot movements and enhancing camera focus using Python and C libraries.
  • Spearheaded an international data science project using Python for COVID-19 network analysis and led an on-premises to cloud data lake migration project using Airflow and AWS.

Technologies: Python, Airflow, AWS, Network Science, Kubernetes, Deep Learning, Computer Vision

🎓 Education

  • Ph.D. in Applied Biology (Bioinformatics), Universidade Federal de Pernambuco, 2022
  • M.Sc. in Applied Biology (Neuroscience & Bioinformatics), Universidade Federal de Pernambuco, 2018
  • B.Sc. in Biology, Universidade Federal de Pernambuco, 2015

🌟 Projects and Achievements

  • Data Migration Platform: Developed a reusable data migration platform in Airflow, optimizing cost and performance for a global client.
  • Genomic Data Lake: Led the creation of a genomic data lake, enhancing data governance and compliance with legislation.
  • On-premises to AWS Data Lake: Led the creation of a data lake in AWS, moving daily and near-real-time data from on-premises systems to AWS using Airflow.
  • Microscopy Automation: Built and maintained an automated microscopy solution, contributing to advanced research capabilities.

📫 Get in Touch

Feel free to reach out via Email or connect with me on LinkedIn.


Thanks for visiting my GitHub profile! Explore my repositories and feel free to contribute or reach out if you have any questions or collaboration ideas.

Denis Moura's Projects

assembler

A read assembler Python script implementing the greedy Shortest Common Superstring algorithm.
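
For readers unfamiliar with the technique, the following is a minimal sketch of greedy Shortest Common Superstring assembly. It illustrates the general idea and is not the code from the repository; the names `overlap` and `greedy_scs` and the `min_length` parameter are hypothetical. The greedy strategy repeatedly merges the two reads with the longest suffix-prefix overlap, which yields a valid superstring but not necessarily the shortest one.

```python
from itertools import permutations


def overlap(a: str, b: str, min_length: int = 1) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_scs(reads: list[str], min_length: int = 1) -> str:
    """Greedily merge the pair of reads with the largest overlap until one remains."""
    # Drop duplicates, then drop reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]

    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_length)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b

        if best_len == 0:
            # No overlaps remain: concatenating is still a valid superstring.
            return "".join(reads)

        reads.remove(best_a)
        reads.remove(best_b)
        # Merge, keeping the overlapping region only once.
        reads.append(best_a + best_b[best_len:])

    return reads[0] if reads else ""
```

For example, `greedy_scs(["ACGT", "GTAA", "CG"])` discards `"CG"` (already contained in `"ACGT"`) and merges the remaining two reads on their two-base overlap.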

covid19stats

A COVID-19 statistics tracking app distributed through Docker.

dice_roller

A dice rolling simulator to teach Python modules, object-oriented programming, system variables, text file operations, and list comprehensions.

dsp_etl_viz_project_gcp

A repository for Sauter's DSP project containing a Spotipy ETL job orchestrated by Apache Airflow, and connections to GCP.

functional-variant-caller

A functional annotation and variant calling pipeline for DNA-Seq. Pipeline orchestration in Python. Uses bwa, samtools, picard, freebayes and snpeff.

genotypeanalysis

This program extracts information from Genographic and FamilyTree DNA genotyping data, translating it into clinically relevant data.

megasena

This program draws Mega Sena lottery tickets based on historical frequency of drawn numbers.
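
One way to draw numbers "based on historical frequency" is weighted sampling without replacement, sketched below. This is an assumption about the approach, not the repository's implementation; `draw_ticket`, its parameters, and the sample history are hypothetical. Note that in this sketch only numbers that appear in the supplied history can be drawn.

```python
import random
from collections import Counter


def draw_ticket(history, picks=6, rng=None):
    """Draw `picks` distinct numbers, weighted by how often each appeared in `history`."""
    rng = rng or random.Random()

    # Count how many times each number was drawn historically.
    counts = Counter(n for draw in history for n in draw)
    numbers = list(counts)
    weights = [counts[n] for n in numbers]

    ticket = []
    for _ in range(min(picks, len(numbers))):
        # Pick one number proportionally to its weight, then remove it
        # so the same number cannot appear twice on the ticket.
        (choice,) = rng.choices(numbers, weights=weights, k=1)
        idx = numbers.index(choice)
        numbers.pop(idx)
        weights.pop(idx)
        ticket.append(choice)

    return sorted(ticket)


# Hypothetical history of past draws (not real lottery results).
sample_history = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 7, 8, 9]]
ticket = draw_ticket(sample_history, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the draw reproducible, which is convenient for testing.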

mirpfbc

In this project I will be analyzing microRNAs involved with PFBC-causing genes.

python_algorithms

This repo stores Python implementations of shortest path algorithms, such as Dijkstra and A*.
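
As an illustration of the kind of algorithm stored there, here is a minimal sketch of Dijkstra's algorithm using the standard-library `heapq` module. It is not taken from the repository; the function name `dijkstra`, the adjacency-dictionary graph format, and the example graph are assumptions made for this sketch. Edge weights must be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from `source` to every reachable node.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]

    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale heap entries left over from earlier, longer paths.
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))

    return dist


# Hypothetical example graph.
example_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
distances = dijkstra(example_graph, "A")
```

A* follows the same structure, but orders the heap by the distance so far plus a heuristic estimate of the remaining distance to the goal.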

seirsplus

Models of SEIRS epidemic dynamics with extensions, including network-structured populations, testing, contact tracing, and social distancing.

stgallen_loess

A Python module to extract symptom percentages and produce an interactive HTML plot from them.
