If you are interested in practical data science or data analysis, below
is a recipe to get started. We suggest reading the material in the first
links (1-8); after that, you can skip around based on interest.
You may notice that the two most widely used programming languages in data science and data analysis are Python and R (with smaller fractions using MATLAB, SAS, and Julia). We welcome any language; however, most of us at SIG-Data use Python for its popularity, simplicity, plotting aesthetics, and range of methods. Hence, most content here will be for Python.
The recommended course of study below is a practical one. Though theory is important, in data science, learning data exploration and plotting should come before more advanced methods. If you can’t see the pattern you hypothesize in the data with your own eyes, then it’s probably not real!
As exciting as machine learning and drawing inferences are, you must first master the basics: data wrangling, plotting, summarizing, and manual inference and hypothesis testing on data.
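To make "summarizing" and "manual hypothesis testing" concrete, here is a minimal sketch using only the Python standard library. The two groups of numbers are made up for illustration, and the helper names `summarize` and `permutation_test` are hypothetical, not taken from any of the linked material. It summarizes each group, then runs a simple permutation test for a difference in means.

```python
import random
import statistics

# Hypothetical measurements for two groups (made-up numbers for illustration).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2]
group_b = [4.6, 4.8, 5.0, 4.7, 5.1, 4.5, 4.9, 5.3]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            hits += 1
    return hits / n_permutations


summary_a = summarize(group_a)
summary_b = summarize(group_b)
p_value = permutation_test(group_a, group_b)

print(summary_a)
print(summary_b)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here only says that a gap this large rarely appears when the group labels are shuffled at random; plotting the two groups side by side is still the first check.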
A refresher in Python with a data analytics perspective:
If you just need a quick review of Python:
For a great overview of the full data science stack in Python:
The first step to great data science is visualization!
Narrowing down to data-science-specific tasks:
A great book on data analysis in Python:
Read the pandas tutorials; most of us at SIG-Data prefer pandas to R data frames.
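For a first taste of what the pandas tutorials cover, the sketch below builds a small `DataFrame`, drops rows with missing values, and summarizes by group. The column names and values are invented for illustration, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "year": [2015, 2016, 2015, 2016, 2016],
    "sales": [10.0, 12.0, 30.0, None, 35.0],
})

# Wrangle: drop rows where the sales value is missing.
clean = df.dropna(subset=["sales"])

# Summarize: mean sales and number of rows per store.
per_store = clean.groupby("store")["sales"].agg(["mean", "count"])

print(per_store)
```

The same drop-then-group pattern applies to real datasets; only the column names change.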
Once you have mastered the basics above, you can move on to more advanced machine learning methods for your data:
If you are interested in application-specific tasks like vision or natural language processing (NLP):
If you are interested in crawling the web, for example to get NLP or financial data:
And finally, if you want to start playing around with real datasets and seeing what is out there, check out Kaggle:
If you would like to learn R for Data Science/Analytics: