Episode 13: Handling Imbalanced Classes in Geospatial Data
In real-world geospatial datasets, not all classes are created equal. For example, in land cover classification, “urban” or “water” areas might occupy only a small fraction of the data compared with “vegetation.” This imbalance can mislead your model into always predicting the dominant class and still reporting “high” accuracy. Let’s fix that.
The Problem
Imbalanced datasets bias models toward the majority class. For spatial problems, this can mean missing critical minority zones (like flooded areas, disease hotspots, or rare soil types).
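To make the problem concrete, here is a minimal sketch using only the standard library. The label counts (950 vegetation, 30 water, 20 urban) are made up for illustration, and the "model" is just a baseline that always predicts the most common class.

```python
from collections import Counter

# Hypothetical land-cover labels: 950 vegetation pixels, 30 water, 20 urban.
y_true = ["vegetation"] * 950 + ["water"] * 30 + ["urban"] * 20

# A baseline "model" that always predicts the most common class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

# Overall accuracy: share of samples where prediction matches the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(label):
    """Fraction of true `label` samples that were predicted as `label`."""
    hits = sum(t == p == label for t, p in zip(y_true, y_pred))
    return hits / y_true.count(label)


per_class_recall = {label: recall(label) for label in sorted(set(y_true))}

print(f"accuracy: {accuracy:.2f}")
print(per_class_recall)
```

Overall accuracy looks strong while recall for the water and urban classes is zero, which is why per-class metrics are worth checking alongside accuracy.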
✅ 1. Resampling Methods
- Oversampling: Duplicate or synthesize additional samples from minority classes. Use SMOTE from imblearn:
# SMOTE creates synthetic minority samples by interpolating between neighbors
from imblearn.over_sampling import SMOTE
X_res, y_res = SMOTE().fit_resample(X, y)
- Undersampling: Randomly reduce the number of majority samples.
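Random undersampling is simple enough to sketch by hand. The helper below, `random_undersample`, is a hypothetical illustration rather than a library function, and the toy features and labels are invented. It trims every class down to the size of the rarest one.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the rarest class's count."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    indices_by_class = defaultdict(list)
    for i, label in enumerate(y):
        indices_by_class[label].append(i)

    # Every class is cut down to the size of the smallest class.
    target = min(len(idx) for idx in indices_by_class.values())
    keep = []
    for idx in indices_by_class.values():
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original sample order

    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: each row is a hypothetical (band_1, band_2) feature pair.
X = [(i, i * 2) for i in range(12)]
y = ["vegetation"] * 9 + ["water"] * 3

X_res, y_res = random_undersample(X, y)
print(len(y_res), sorted(set(y_res)))
```

In practice, imblearn's `RandomUnderSampler` (in `imblearn.under_sampling`) offers the same idea through the familiar `fit_resample(X, y)` call. Keep in mind that undersampling discards data, so it tends to suit cases where the majority class is very large.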
✅ 2. Class Weights
Penalize misclassification of minority classes more heavily:
from sklearn.ensemble import RandomForestClassifier
# 'balanced' weights each class inversely to its frequency in y
model = RandomForestClassifier(class_weight='balanced')
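To see what 'balanced' does, the sketch below reproduces the heuristic documented by scikit-learn, `n_samples / (n_classes * count_per_class)`, in plain Python. The function name `balanced_class_weights` and the label counts are assumptions made for illustration.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical land-cover labels with a strong majority class.
y = ["vegetation"] * 950 + ["water"] * 30 + ["urban"] * 20

weights = balanced_class_weights(y)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

The rarest class receives the largest weight, so an error on an urban pixel costs the model far more than an error on a vegetation pixel. If you need custom penalties, `class_weight` also accepts a dictionary mapping labels to weights.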