Scraped 5,000+ listings from 99acres for Gurgaon utilizing IP-rotation, HTML parsing, and customized dealing with for blocked requests. Cleaned inconsistencies, dealt with lacking values through contextual logic, and utilized regex-based location normalization.
Crafted highly effective derived options:
- Worth per sqft
- Luxurious Rating utilizing KMeans Clustering
- Possession Age Buckets
Used 8+ characteristic significance strategies to finalize ~12 high predictors — guaranteeing each interpretability and efficiency.
Examined a number of regressors — Random Forest, XGBoost, Additional Timber — and tuned them throughout 12,800 mixtures utilizing 10-fold Cross-Validation.
Achieved:
Constructed 3 fashions centered on:
- Location Proximity (utilizing radius + regex-cleaned locality names)
- Facility Scoring (based mostly on quantity and sort of facilities)
- Cosine Similarity (characteristic vector matching)
Mixed them through a weighted hybrid system permitting user-controlled priorities (e.g., proximity vs worth vs options).
Crafted interactive dashboards:
- GeoMap Visualizations (scraped lat-long knowledge)
- Sankey Diagrams
- Heatmaps & Charts
All wrapped inside a clear, mobile-responsive Flask internet utility.