Now we’re getting to the “most interesting” part.
In my opinion, you need to know the fundamentals well to be a strong data scientist. That doesn’t mean you have to be a nerd, but a good understanding of the main principles will help you both on the job and in interviews.
In this roadmap, I suggest you learn only the most commonly used algorithms, but you have to know them very well. With this knowledge, you can then move on to other algorithms.
Now, let’s go.
This is a perfect course to get an overview of what machine learning is and of the two most common problems ML solves: regression and classification. Don’t go through the tons of other intro courses; just take this one.
Note: by default, Coursera is not free, but you can apply for financial aid, and they will grant it after review. I did that several times back in my student days.
Step 1: 3Blue1Brown videos on linear algebra
Step 2: Python linear algebra tutorial by Pablo Caceres
Step 1: Statistics Crash Course by Adriene Hill
Step 2: Learn Statistics with Python by Ethan Weed
There are a huge number of algorithms, but you will hardly use even 20% of them. I suggest you learn the following list and then continue with the rest using the knowledge you gain.
There will be some overlap with Andrew Ng’s course, but it won’t hurt to go a bit deeper and get different implementations and perspectives on the same material.
Intro theory: Nando de Freitas lectures at UBC
Python Implementation
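A minimal companion sketch (our own, not taken from the linked lectures): ordinary least squares fitted with NumPy’s `np.linalg.lstsq` on synthetic data. The data, the true coefficients, and the helper name `fit_linear_regression` are made up for illustration.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares. Returns (intercept, weights).

    Hypothetical helper: it prepends a column of ones so the intercept
    is estimated together with the feature weights.
    """
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||X_design @ beta - y||^2 without inverting X^T X explicitly
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic data: y = 2 + 3*x1 - 1*x2 + small noise (made-up coefficients)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 + X @ np.array([3.0, -1.0]) + rng.normal(scale=0.1, size=200)

intercept, weights = fit_linear_regression(X, y)
print(intercept, weights)  # should be close to 2 and [3, -1]
```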
Regularization in linear regression
Regularization is an important concept to understand, and it is easiest to grasp with linear models. You will get a lot of interview questions about it, so make sure you know it well.
Step 1: Nando de Freitas lectures at UBC
Step 2: Visual explanation with code
Sklearn tutorial with Lasso model
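To see what a penalty does to the weights, here is a minimal NumPy sketch of ridge (L2) regularization using its closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. Ridge is swapped in for the Lasso from the sklearn tutorial only because it has a closed form; Lasso’s L1 penalty needs an iterative solver such as coordinate descent. The data and the λ values are arbitrary, and the intercept is left out because the synthetic data has none.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Closed-form ridge regression without intercept: (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solve the linear system instead of inverting A explicitly
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
true_w = np.array([4.0, -2.0, 0.0, 0.0, 1.0])  # made-up ground truth
y = X @ true_w + rng.normal(scale=0.5, size=50)

# Larger lambda -> stronger penalty -> smaller weight norm
for lam in [0.0, 1.0, 10.0, 100.0]:
    w = ridge_weights(X, y, lam)
    print(f"lambda={lam:>6}: ||w||_2 = {np.linalg.norm(w):.3f}")
```

As λ grows, the weight norm shrinks; at λ = 0 the result coincides with plain least squares.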
Logistic regression is a baseline algorithm for classification tasks. Since it is closely related to the linear regression model, you don’t need to learn it from scratch, but you do need to understand some important concepts about it.
Intro: Logistic regression topic of mlcourse.ai
Selected topic: odds ratio as weights interpretability
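The odds-ratio reading of the weights follows directly from the model: the log-odds are linear in the features, so increasing feature j by one unit (others fixed) multiplies the odds by exp(w_j). A small sketch that checks this numerically; the weights, bias, and input are hypothetical values, not fitted to any data.

```python
import numpy as np

def predict_proba(x, w, b):
    """Logistic regression probability P(y=1 | x) = sigmoid(w.x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def odds(p):
    return p / (1.0 - p)

w = np.array([0.7, -1.2])   # hypothetical weights
b = -0.3                    # hypothetical bias
x = np.array([1.5, 0.4])    # an arbitrary input

x_plus = x.copy()
x_plus[0] += 1.0            # increase feature 0 by one unit

ratio = odds(predict_proba(x_plus, w, b)) / odds(predict_proba(x, w, b))
print(ratio, np.exp(w[0]))  # both are approximately 2.014
```

So a weight of 0.7 means each extra unit of that feature roughly doubles the odds of the positive class, holding the other features fixed.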
You have to know this one by heart, I’m sorry. Here are some good resources to start with.
Step 1: Gradient Boosting topic of mlcourse.ai
Step 2: Gradient Boosting, deeper dive
I personally learned a lot from the original XGBoost paper, but Natekin’s paper is very detailed and always great to come back to when you forget things.
Step 3: Demo playground by Alex Rogozhnikov
Another genius made a great visualization for us regular folks. By the way, check out his entire blog. It’s simply amazing.
Another essential algorithm to know by heart. Please understand the difference between Random Forest and Gradient Boosting; I bet you’ll get this question in 30–40% of interviews.
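To make the “each new model fixes what the previous ones got wrong” idea concrete, here is a minimal from-scratch sketch of gradient boosting for squared-error regression on a single feature, with depth-1 trees (stumps) as weak learners. It is a toy under simplifying assumptions (one feature, squared loss, exhaustive split search) and not how XGBoost or scikit-learn implement it; all function names are our own.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump on 1-D input x for targets r (least squares).

    Returns (threshold, left_value, right_value).
    """
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss boosting: each stump fits the residuals y - F, which are the
    negative gradient of 0.5 * (y - F)**2 with respect to the prediction F."""
    f0 = y.mean()
    F = np.full(len(y), f0)
    stumps, train_mse = [], [np.mean((y - F) ** 2)]
    for _ in range(n_rounds):
        stump = fit_stump(x, y - F)
        # Take a small step in the direction of the fitted negative gradient
        F = F + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        train_mse.append(np.mean((y - F) ** 2))
    return (f0, learning_rate, stumps), train_mse

def predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=100)  # synthetic 1-D regression data

model, train_mse = gradient_boosting(x, y)
print(train_mse[0], train_mse[-1])  # training MSE drops as stumps are added
```

The learning rate shrinks each stump’s contribution, which is why many small steps are needed; that trade-off is exactly what the playground above lets you explore.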
Step 1: Lectures by Nando de Freitas
Step 2: Bagging topic on mlcourse.ai
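One concrete piece of the Random Forest vs Gradient Boosting question: a random forest trains each tree independently on a bootstrap sample (rows drawn with replacement), and on average such a sample contains only about 1 − 1/e ≈ 63.2% of the distinct training rows; the remaining “out-of-bag” rows can serve for validation. Boosting, by contrast, trains its trees sequentially, each one correcting the previous ones. A quick NumPy simulation of the bootstrap fraction (the sample size and the number of draws are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 1000        # arbitrary training-set size
n_bootstraps = 200   # number of simulated bootstrap samples (think: trees)

unique_fractions = []
for _ in range(n_bootstraps):
    # Sample row indices with replacement, as bagging does for each tree
    sample = rng.integers(0, n_rows, size=n_rows)
    unique_fractions.append(len(np.unique(sample)) / n_rows)

mean_fraction = np.mean(unique_fractions)
print(f"in-bag: {mean_fraction:.3f}, out-of-bag: {1 - mean_fraction:.3f}")
print(f"theory: {1 - (1 - 1 / n_rows) ** n_rows:.3f}")  # tends to 1 - 1/e as n grows
```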
You have grown up, my friend. You’re ready to learn things from data without knowing the true label/value. Let’s see how.
- PCA: Material from the one and only Sebastian Raschka
- t-SNE
– What is it and how to run it in Python
– How to use t-SNE effectively (with great visualizations)
- UMAP
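To see what PCA actually computes, here is a minimal sketch via the singular value decomposition of the centered data: the right singular vectors are the principal directions, and the squared singular values are proportional to the variance each one explains. The correlated two-feature dataset is synthetic and made up for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD. Returns (projected data, components, explained variance ratio)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    return X_centered @ components.T, components, ratio[:n_components]

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Two strongly correlated features plus a little noise (synthetic data)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=500)])

Z, components, ratio = pca(X, n_components=2)
print(ratio)          # the first component explains almost all the variance
print(components[0])  # roughly proportional to [1, 2] / sqrt(5), up to sign
```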
Feature selection is one of the most important topics when you really want to improve your model, make it more transparent, and understand the WHYs behind its predictions.
Feature importance
Linear methods: Chapter 5 of the Interpretable Machine Learning book
Tree-based methods: YouTube lecture by Raschka
Permutation feature importance: Chapter 8 of the Interpretable Machine Learning book
SHAP: SHAP library documentation
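Permutation feature importance is simple enough to sketch by hand: shuffle one feature column, re-score the already fitted model, and measure how much the error grows. Below is a minimal NumPy version using a least-squares linear model and mean squared error; the data (one informative feature, one pure-noise feature), the model, and the helper names are assumptions for illustration. For real projects, scikit-learn provides `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled (model is not refit)."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=300)  # only feature 0 matters

# Fit a linear model (with intercept) by least squares
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def predict(X_new):
    return beta[0] + X_new @ beta[1:]

importances = permutation_importance(predict, X, y)
print(importances)  # feature 0 is far more important; feature 1 is near zero
```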
Model evaluation metrics
Okay, you fit the model, but then what? Even more importantly, which metric do you choose for your problem? The following links give a good overview of the pros and cons of the main regression and classification metrics. You will also often get questions about these metrics in interviews.
Regression metrics: H2O blog tutorial
Classification metrics: Evidently AI blog tutorial
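For a feel of what these metrics compute, here is a small NumPy sketch of a few standard ones on tiny hand-made arrays (the numbers are arbitrary): MAE, RMSE and R² for regression, and precision, recall and F1 for binary classification.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, 1.0 - ss_res / ss_tot

def classification_metrics(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # No guard against zero denominators; fine for this toy example
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

mae, rmse, r2 = regression_metrics(np.array([3, -0.5, 2, 7]), np.array([2.5, 0.0, 2, 8]))
print(mae, rmse, r2)  # MAE = 0.5, RMSE ≈ 0.612, R² ≈ 0.949

precision, recall, f1 = classification_metrics(
    np.array([1, 0, 1, 1, 0, 1]), np.array([1, 0, 0, 1, 1, 0])
)
print(precision, recall, f1)  # precision ≈ 0.667, recall = 0.5, F1 ≈ 0.571
```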
Understanding cross-validation is important for detecting and effectively avoiding overfitting.
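A minimal sketch of k-fold cross-validation with NumPy only: shuffle the row indices, split them into k folds, train on k−1 folds, score on the held-out one, and average. The model (least-squares linear regression), the synthetic data, and k=5 are assumptions for illustration; in practice scikit-learn’s `KFold` and `cross_val_score` do this for you.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle row indices and split them into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def fit_ols(X, y):
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def cross_validate(X, y, k=5):
    """Return the validation MSE of each fold."""
    scores = []
    folds = k_fold_indices(len(X), k)
    for i, val_idx in enumerate(folds):
        # Training rows are all folds except the i-th one
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = fit_ols(X[train_idx], y[train_idx])
        residuals = y[val_idx] - predict_ols(beta, X[val_idx])
        scores.append(np.mean(residuals ** 2))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.3, size=100)

scores = cross_validate(X, y, k=5)
print(scores, scores.mean())  # per-fold MSE and its average (noise variance is 0.09)
```

Every row is used for validation exactly once, so the averaged score estimates performance on unseen data instead of on the rows the model was trained on.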
There are tons of resources on neural networks. It’s THE hottest topic, especially with all the hype around LLMs. In my opinion, Andrew Ng’s specialization is still great for an intro to the subject. He goes step by step, and I guarantee you’ll understand the concepts. From there, you can go deeper depending on the area you are interested in.
It has 5 courses in it, so take a deep breath.
Optimization is a relatively hard, math-heavy topic, but it is used in many practical applications. I highly advise you to learn this topic gradually, as it will open great career opportunities.
This is an AWESOME resource on numerical optimization: clear examples in Python with mathematical derivations of the basics.
Bayesian Optimization
Bayesian optimization is a set of methods for optimizing black-box functions using only input-output samples.
Source 1: Awesome playground with theory explanation by distill.pub
Source 2: Tutorial with deep theory dive by Nando de Freitas and Co.
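A compact sketch of the core loop, under heavy simplifications that are our own choices: a 1-D toy objective, a zero-mean Gaussian process surrogate with a fixed RBF kernel (length scale 0.2, no hyperparameter fitting), a lower-confidence-bound acquisition evaluated on a grid, and noise-free observations. Real libraries fit the kernel hyperparameters and optimize the acquisition function properly; this only illustrates the fit-surrogate, pick-next-point, evaluate cycle.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and std of a zero-mean GP at x_query (noise-free data)."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    # Prior variance of this RBF kernel is 1 at every point
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def objective(x):
    """Toy 'black box' (an assumption): minimum at x = 0.3."""
    return (x - 0.3) ** 2

def bayesian_optimization(n_iterations=15, kappa=2.0):
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = np.array([0.0, 0.5, 1.0])  # initial design (arbitrary)
    y_obs = objective(x_obs)
    for _ in range(n_iterations):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        # Lower confidence bound: favors low predicted values (exploit)
        # or high uncertainty (explore)
        lcb = mean - kappa * std
        x_next = grid[np.argmin(lcb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best]

x_best, y_best = bayesian_optimization()
print(x_best, y_best)
```

The parameter `kappa` controls the exploration/exploitation balance; the distill.pub playground above is a good place to build intuition for it.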
Optimization with SciPy
There are many Python libraries you can use for optimization, and SciPy is one of the most commonly used. If you come across the need to use SciPy for this, check out these resources:
Sometimes it’s useful to play with parameters and see how the algorithm works. Here is a great playground with a couple of methods.
More resources
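As a small taste of the SciPy route mentioned above, here is `scipy.optimize.minimize` on the Rosenbrock function, which SciPy ships as `scipy.optimize.rosen` along with its gradient `rosen_der`. The starting point and the two methods compared are arbitrary choices; swapping the `method` argument is an easy way to play with the parameters in the spirit of the playground.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])  # classic hard starting point for Rosenbrock

# Gradient-based quasi-Newton method, using the analytic gradient
result_bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

# Derivative-free simplex method for comparison
result_nm = minimize(rosen, x0, method="Nelder-Mead")

for name, res in [("BFGS", result_bfgs), ("Nelder-Mead", result_nm)]:
    # The global minimum of the Rosenbrock function is at (1, 1)
    print(name, res.x, res.fun, res.nfev)
```

Comparing `nfev` (the number of function evaluations) shows the practical payoff of supplying a gradient when you have one.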
Signal processing is often an important part of ML projects because you need to be able to filter noise, outliers, and other dirty stuff out of your data.
Paid resource:
I highly recommend a paid course by Mike Cohen. For this price and quality, I consider it basically free. I’ve completed the course myself and liked it a lot. Since then, I’ve applied several methods from the course in practice.
Free resources:
If you want completely free alternatives, here are some links on filtering and the Fourier transform.
Mean filter
Median filters
Exponential smoothing
Gaussian filter
Fourier transform
High- and low-pass filters
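Several of the filters listed above fit in a few lines of NumPy. The sketch below (our own minimal versions, not code from the linked resources) applies a moving-average (mean) filter, a median filter, exponential smoothing, and an FFT-based low-pass filter to a noisy sine with one outlier spike. The window sizes, smoothing factor, and cutoff frequency are arbitrary choices; for production use, `scipy.signal` and `scipy.ndimage` provide tested implementations.

```python
import numpy as np

def mean_filter(x, window=5):
    """Moving average; mode='same' keeps the length (edges see zero padding)."""
    return np.convolve(x, np.ones(window) / window, mode="same")

def median_filter(x, window=5):
    """Sliding median with edge padding; robust to isolated spikes."""
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

def exponential_smoothing(x, alpha=0.3):
    """s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    s = np.empty(len(x))
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

def fft_low_pass(x, sample_rate, cutoff_hz):
    """Zero out all frequency components above cutoff_hz (an ideal low-pass filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

sample_rate = 100.0                  # samples per second (arbitrary)
t = np.arange(200) / sample_rate     # 2 seconds of samples
clean = np.sin(2 * np.pi * 2 * t)    # 2 Hz signal
rng = np.random.default_rng(0)
noisy = clean + rng.normal(scale=0.3, size=len(t))
noisy[50] += 5.0                     # a single outlier spike

for name, filtered in [
    ("mean", mean_filter(noisy)),
    ("median", median_filter(noisy)),
    ("exponential", exponential_smoothing(noisy)),
    ("low-pass", fft_low_pass(noisy, sample_rate, cutoff_hz=5.0)),
]:
    rmse = np.sqrt(np.mean((filtered - clean) ** 2))
    print(f"{name:>12}: RMSE vs clean = {rmse:.3f}")
```

Note the trade-offs: the median filter removes the spike instead of smearing it like the mean filter, while exponential smoothing is causal (it only uses past values) and therefore lags behind the signal.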