UPDATE April 15, 2024: This repository has been archived and is superseded by my personal website (dinhanhthi.com) and the repositories on my GitHub account. I've also compiled a note about all the resources I've followed in the field of Data Science.
A list of the things I've finished so far while teaching myself Machine Learning and Data Science.
- My raw notes: rawnote.dinhanhthi.com (quickly capture ideas from the courses).
- My main notes: dinhanhthi.com/notes (well-written notes, not only for me).
- My learning log.
- Setting up a café in Ho Chi Minh City -- finding the best place to set up a new business -- article -- source.
- Titanic: Machine Learning from Disaster (from Kaggle) -- predicting which passengers survived the Titanic shipwreck -- source.
I also do some mini-projects to understand the concepts. The HTML files (exported from the corresponding Jupyter Notebook files) and the "Open in Colab" files for the mini-projects below are available here.
- Image compression using K-Means -- source -- Open in Colab -- my note
- Example to understand the idea of PCA -- source -- Open in Colab.
- Image compression using PCA -- source -- Open in Colab.
- PCA without scikit-learn -- source -- Open in Colab.
- Face Recognition using SVM -- source -- Open in Colab.
- XOR problem using SVM to see the effect of gamma and C in the case of using RBF kernel -- source -- Open in Colab.
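As a rough illustration of the idea behind the first mini-project above (image compression using K-Means), here is a minimal sketch in plain Python. It is not the code from the linked notebook: the function name `kmeans_quantize`, the 6-pixel toy "image", and the deterministic initialization (the first `k` distinct colors) are assumptions made for this example. Real implementations usually use random or k-means++ initialization and a library such as scikit-learn.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_quantize(pixels, k, iterations=10):
    """Cluster RGB pixels into k colors; return (centroids, labels)."""
    # Assumption for reproducibility: start from the first k distinct colors.
    centroids = list(dict.fromkeys(pixels))[:k]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        for i, p in enumerate(pixels):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        # Update step: move each centroid to the mean of its assigned pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels


# Hypothetical toy "image": three reddish and three bluish pixels.
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 250), (20, 15, 240), (5, 20, 245)]

centroids, labels = kmeans_quantize(pixels, k=2)

# Replace every pixel by the (rounded) color of its cluster.
compressed = [tuple(round(v) for v in centroids[lab]) for lab in labels]
print(labels)      # cluster index per pixel
print(compressed)  # image rebuilt from only k colors
```

The compression comes from storing one small cluster index per pixel plus a shared palette of `k` colors, instead of a full RGB triple per pixel.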
- Anomaly Detection. -- my note
- Data Aggregation -- my note
- Data Overview. -- my note
- Data Visualization.
- Model evaluation.
- Preprocessing (texts, images, dates & times, structured data). -- my note
- Testing. -- my note
- Web Scraping.
- GraphQL -- an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data.
- Python -- an interpreted, high-level, general-purpose programming language -- my note.
- R -- a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.
- Scala -- a general-purpose programming language providing support for functional programming and a strong static type system.
- SQL -- a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system.
- Apache Airflow -- my note
- Docker -- a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers -- my note
- Google Colab -- a free cloud service, based on Jupyter Notebooks for machine-learning education and research -- my note.
- Google Kubernetes
- Hadoop -- a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
- Kaggle -- an online community of data scientists and machine learners, owned by Google.
- PostgreSQL (Postgres) -- a free and open-source relational database management system emphasizing extensibility and technical standards compliance.
- Spark -- an open-source distributed general-purpose cluster-computing framework.
- Bash -- my note
- Git -- a distributed version-control system for tracking changes in source code during software development -- my note.
- Markdown -- a lightweight markup language with plain text formatting syntax -- my note.
- Jupyter Notebook -- an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text -- my note.
- Trello -- a web-based Kanban-style list-making application.
A "ticked" library doesn't mean that I know or understand all of it, only that I can easily use it with its documentation!
- D3js -- a JavaScript library for producing dynamic, interactive data visualizations in web browsers.
- Keras -- an open-source neural-network library written in Python.
- Matplotlib -- a plotting library for the Python programming language and its numerical mathematics extension NumPy. -- my note
- Numpy -- a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. -- my note
- OpenCV -- a library of programming functions mainly aimed at real-time computer vision.
- Pandas -- a software library written for the Python programming language for data manipulation and analysis. -- my note
- Plotly -- the front-end for ML and data science models.
- PyTorch -- my note
- Seaborn -- a Python data visualization library based on matplotlib.
- Scikit-learn -- a free software machine learning library for the Python programming language.
- TensorFlow -- a free and open-source software library for dataflow and differentiable programming across a range of tasks.
The "non-checked" courses are still in progress!
- Advanced Data Science with IBM Specialization on Coursera.
- Advanced Machine Learning with TensorFlow on Google Cloud Platform Specialization by Google Cloud Training on Coursera.
- Advanced Statistics for Data Science Specialization by Johns Hopkins University on Coursera.
- Anomaly Detection in Time Series Data with Keras by Coursera Project Network. -- my certificate
- CS231n: Convolutional Neural Networks for Visual Recognition by Stanford.
- Data Science Path on Codecademy. It contains 27 sub-courses covering the essential knowledge of data science -- my certificate -- notes & codes.
- Data Scientist path & Data Engineer path on Dataquest. Both contain many sub-courses covering Data Science -- my note -- my certificate
- Deep Learning Specialization by Andrew Ng on Coursera. It contains 5 courses covering the foundations of Deep Learning (CNN, RNN, LSTM, Adam, Dropout, BatchNorm, Xavier/He initialization,...) and includes many case-study projects -- my note -- my certificate.
- fast.ai's courses for Machine Learning and Deep Learning.
- IBM AI Engineering Professional Certificate on Coursera.
- IBM Data Science Professional Certificate on Coursera. It contains 9 sub-courses covering fundamental knowledge about data science -- my note -- my certificate.
- Introduction to Statistics with NumPy on Codecademy -- my certificate.
- Learn Python 3 on Codecademy -- my note -- my certificate.
- Learn SQL on Codecademy -- my certificate.
- Machine Learning by Andrew Ng on Coursera. It introduces the general ideas of ML and some commonly used algorithms -- my note -- my certificate.
- Machine Learning Crash Course by Google.
- Machine Learning with TensorFlow on Google Cloud Platform Specialization by Google Cloud Training on Coursera.
- MIT Deep Learning
- Natural Language Processing by HSE University on Coursera. -- my note -- my certificate
- Natural Language Processing Specialization by deeplearning.ai on Coursera.
- Shervine Amidi's courses about Machine Learning, Deep Learning, AI, Stats (Stanford University)
- TensorFlow: Data and Deployment Specialization by deeplearning.ai on Coursera.
- TensorFlow in Practice Specialization by deeplearning.ai on Coursera. -- my note -- my certificate.
- TensorFlow Tutorials.
The "non-checked" books are still in progress!
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani.
- Deep Learning with Python by François Chollet.
- Dive into Deep Learning -- An interactive deep learning book with code, math, and discussions, based on the NumPy interface. -- Github.
- Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd edition) by Aurélien Géron.
- Machine Learning Yearning by Andrew Ng.
- Practical Machine Learning: A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
- Awesome lists:
  - Awesome Anomaly Detection -- A curated list of awesome anomaly detection resources.
  - Awesome Big Data -- A curated list of awesome big data frameworks, resources and other awesomeness.
  - Awesome Data Engineering -- A curated list of data engineering tools for software developers.
  - Awesome Deep Learning -- A curated list of awesome Deep Learning tutorials, projects and communities.
  - Awesome Deep learning papers and other resources -- Deep Learning and deep reinforcement learning research papers and some codes.
  - Awesome Machine Learning -- A curated list of awesome Machine Learning frameworks, libraries and software.
  - Awesome Public Datasets -- A topic-centric list of HQ open datasets.
- 120 Data Science Interview Questions -- Answers to 120 commonly asked data science interview questions.
- A Machine Learning Course with Python -- Machine Learning Course with Python. Refer to the course page for step-by-step explanations.
- Python Data Science Handbook -- Python Data Science Handbook: full text in Jupyter Notebooks.
- Homemade Machine Learning -- Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained.
- TensorFlow-Course -- Simple and ready-to-use tutorials for TensorFlow.
- Machine Learning & Deep Learning Tutorials -- ML and DL tutorials, articles and other resources.
- 100-Days-Of-ML-Code.
- Data science blogs -- A curated list of data science blogs.
- data-science-ipython-notebooks -- DS Python notebooks: DL (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
- Papers With Code -- a free and open resource with Machine Learning papers, code and evaluation tables.
- Chris Albon's notes -- Notes On Using Data Science & Artificial Intelligence To Fight For Something That Matters.
- Seeing Theory -- A visual introduction to probabilities and statistics.
- Collection of useful articles for understanding concepts in ML, AI and DS.
The descriptions of terms on this site are borrowed from Wikipedia.