Dataware

This project is my personal path to learning and practising on topics related to Data Science, Machine Learning, and Deep Learning. The plan includes the following goals:

  • ☑ Training on Machine Learning and Deep Learning. I completed the Machine Learning course and the Deep Learning specialization on Coursera. These courses are great and I highly recommend them, even if the former is a bit outdated. I gained a solid theoretical grounding in the math and “artistry” required to get the models working, and exposure to the basic Python stack for Data Science (Pandas, NumPy, Matplotlib, Scikit-Learn, TensorFlow, Keras).

  • ☑ Reading the Python for Data Analysis book and going through the exercises in detail. The goal is to solidify my expertise in using the Pandas-NumPy-Matplotlib stack.

  • ☞ Reading the Hands-On Machine Learning with Scikit-Learn & TensorFlow book and going through selected exercises in detail. The goal is to solidify my expertise in these two well-known Python-based tools. This goal is my current focus.

  • ☞ Developing models of different kinds to tackle real-world data sets and infer valuable information from them. This effort will go from working on my own with entry-level data sets such as Titanic and Iris (yes, I know, these data sets are too basic) to participating in bigger projects. Details will follow as things evolve, likely in parallel with my book-based learning efforts. Ongoing.

Changelog

  • ☑ [Mar 25, 2019] Wrote Titanic Machine Learning Models, a Jupyter notebook about the Titanic dataset. This is the last kernel in the series. While completing the Titanic series I finished reading the Python for Data Analysis book; objective achieved.

  • [Feb 18, 2019] Wrote Titanic Data Exploration, a Jupyter notebook about the Titanic dataset. This is the second notebook in the series.

  • [Sep 28, 2018] Wrote Titanic Data Cleanup, a Jupyter notebook about the Titanic dataset from the competition hosted at Kaggle. This is the first in a series of posts about the Titanic which will grow until I submit my model to the competition.

  • [Sep 23, 2018] Finished reading chapter 12 of the Python for Data Analysis book. It is an interesting chapter on time series. My reading was lighter than for previous chapters, as I approached this one with a “reference guide” mindset. That is, I wanted to become aware of what is available rather than master the techniques in detail, which only comes with experience. I was impressed with the capabilities for handling dates and date periods, which I had no idea existed.

  • [Sep 21, 2018] Finished reading chapters 8, 10, and 11 of the Python for Data Analysis book. These chapters are a lot to get through when read cover to cover like a story book, following every step in detail on my laptop. Halfway through my reading I realized that:

    1. I won’t memorize all the NumPy and pandas tricks by reading the book. Only experimenting with different datasets will give me useful expertise.

    2. Nevertheless, it is advisable to read the book while following the details on my laptop. This helps me build a mental picture of the tools’ capabilities, which I wouldn’t get easily otherwise.

    3. In the long term, the book must be treated as a reference guide whose content can be accessed at random to consult any individual topic as required.

    I skipped chapter 9 about matplotlib graphics for now as I am more interested in the core data wrangling with pandas first; I’ll come back to it later.

  • [Sep 13, 2018] Finished reading chapter seven of the Python for Data Analysis book. The most interesting topics to me were the pandas tools for handling missing data, discretization and binning, and vectorized string functions.

  • [Sep 10, 2018] Finished reading chapter six of the Python for Data Analysis book. This is a small chapter on saving and loading data from Python, with a few topics new to me such as the HDF5 file format.

  • [Sep 09, 2018] Finished reading chapter five of the Python for Data Analysis book. This chapter is about the pandas Series and DataFrame objects. I typed in all the worked examples instead of just reading them, which is a better way to get the concepts to sink in. There are no additional exercises for this chapter; I’ll leave them for the next chapters.

  • [Sep 01, 2018] Completed work on chapter four of the Python for Data Analysis book. This chapter is all about NumPy, with a very brief intro to matplotlib. I wrote a program to exercise the topics discussed in this chapter. It generates and visualizes two different 2D random walks.
    Programs: random-walk

  • [Aug 24, 2018] Read the first three chapters of the Python for Data Analysis book. Pretty easy reading, as the last two of them are just a review of the Python language.

  • [Jul 31, 2018] Verified and updated my previous installation of IPython and Anaconda. Everything is up-to-date now.

  • ☑ [Feb 19, 2018] Completed the Deep Learning specialization. This included the following five courses:

    • Neural Networks and Deep Learning (four weeks)
    • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization (three weeks)
    • Structuring Machine Learning Projects (two weeks)
    • Convolutional Neural Networks (four weeks)
    • Sequence Models (three weeks)
  • ☑ [Dec 23, 2017] Completed the Machine Learning course. This course is spread over 11 weeks. It covers multiple topics, including linear and logistic regression with single and multiple variables, Support Vector Machines, unsupervised learning, neural networks, dimensionality reduction, anomaly detection, recommender systems, and several others. The course exercises are done in GNU Octave, which is interesting but not of much use in data science anymore.
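
The Sep 01, 2018 entry mentions a program that generates and visualizes two different 2D random walks. That program is not reproduced here. The following is only a minimal sketch of the idea using NumPy, assuming the two walks are a lattice walk (unit steps on a grid) and a walk with Gaussian steps. The function names `lattice_walk`, `gaussian_walk`, and `plot_walks`, the step count, and the seed are all illustrative choices, not taken from the original program.

```python
import numpy as np


def lattice_walk(n_steps, rng):
    """2D walk on a grid: each step moves one unit up, down, left, or right."""
    moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
    # Pick one of the four moves at random for every step.
    choices = rng.integers(0, 4, size=n_steps)
    steps = moves[choices]
    # Prepend the origin, then accumulate the steps into positions.
    origin = np.zeros((1, 2), dtype=steps.dtype)
    return np.vstack([origin, steps.cumsum(axis=0)])


def gaussian_walk(n_steps, rng, scale=1.0):
    """2D walk whose x and y increments are independent normal draws."""
    steps = rng.normal(loc=0.0, scale=scale, size=(n_steps, 2))
    return np.vstack([np.zeros((1, 2)), steps.cumsum(axis=0)])


def plot_walks(walks, labels):
    """Draw each walk as a path; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for walk, label in zip(walks, labels):
        ax.plot(walk[:, 0], walk[:, 1], label=label)
    ax.set_aspect("equal")
    ax.legend()
    plt.show()


# Assumed parameters: 1000 steps per walk and a fixed seed for repeatability.
rng = np.random.default_rng(seed=42)
lattice = lattice_walk(1000, rng)
gaussian = gaussian_walk(1000, rng)
```

Calling `plot_walks([lattice, gaussian], ["lattice", "gaussian"])` would draw both paths on the same axes. The cumulative sum over the step array is what replaces an explicit Python loop, which is the vectorized style that chapter four of the book emphasizes.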