Overview
Market Analytics is a real estate market data analytics platform built for a leading market research firm in Vietnam. The system collects and analyzes property transaction data from multiple sources, providing market reports and price trend intelligence for investors, banks, and property developers.
The standout capability is processing and visualizing over 1 million transaction records with sub-2-second query times — giving analysts the freedom to explore data interactively without waiting.
The Challenge
Real estate data in Vietnam is fragmented across many sources: online listing portals, provincial land price bulletins, auction records, and brokerage firm reports. Each source uses different formats, data quality is inconsistent, and address representations vary widely — the same district might appear as "Q.1, HCM", "Quận 1, TP.HCM", or "District 1, Ho Chi Minh City" across different sources.
Normalizing addresses was the core analytical challenge: all of these variations needed to be recognized as the same geographic entity for accurate cross-source comparison.
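A minimal sketch of how such variants could be collapsed into one entity, using only the Python standard library. The canonical list, the alias rules, and the 0.85 cutoff are illustrative assumptions, not the production gazetteer or thresholds.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical canonical entities; a real gazetteer would list every ward and district.
CANONICAL = ["quan 1, ho chi minh", "quan 3, ho chi minh", "ba dinh, ha noi"]

# Hypothetical alias rules, applied in order to lowercased, accent-free text.
ALIASES = [
    (r"\bq\.\s*", "quan "),
    (r"\bdistrict\b", "quan"),
    (r"\btp\.\s*", ""),
    (r"\bhcm\b", "ho chi minh"),
    (r"\bho chi minh city\b", "ho chi minh"),
]


def clean(raw):
    """Lowercase, strip Vietnamese diacritics, and expand common abbreviations."""
    text = raw.lower().replace("đ", "d")  # "đ" has no Unicode decomposition
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    for pattern, replacement in ALIASES:
        text = re.sub(pattern, replacement, text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_address(raw, cutoff=0.85):
    """Return the closest canonical entity, or None if nothing is similar enough."""
    matches = get_close_matches(clean(raw), CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for variant in ["Q.1, HCM", "Quận 1, TP.HCM", "District 1, Ho Chi Minh City"]:
    print(variant, "->", normalize_address(variant))
```

All three variants resolve to the same canonical string. Note that pure string similarity can confuse entities that differ by a single digit (for example two numbered districts), so a production matcher would likely compare district numbers exactly and reserve fuzzy matching for names.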
Our Solution
Ventra Rocket built a Python ETL pipeline with four distinct stages: scraping data from the sources, cleansing it with fuzzy matching for address normalization, geocoding to standardize geographic coordinates, and loading the result into a Snowflake star schema optimized for analytical queries.
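To illustrate the star-schema idea, the sketch below splits flat, cleansed records into a location dimension and a transaction fact table. The field names (`ward`, `district`, `city`, `date`, `price_vnd`, `area_m2`) and the surrogate-key scheme are assumptions made for the example, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    ward: str
    district: str
    city: str


def to_star(records):
    """Split flat records into a location dimension and a transaction fact table."""
    dim_location = {}  # Location -> surrogate key (1, 2, 3, ...)
    fact_rows = []
    for r in records:
        loc = Location(r["ward"], r["district"], r["city"])
        # Reuse the key if this location was seen before, otherwise assign the next one.
        key = dim_location.setdefault(loc, len(dim_location) + 1)
        fact_rows.append(
            {
                "location_key": key,
                "date": r["date"],
                "price_vnd": r["price_vnd"],
                "area_m2": r["area_m2"],
            }
        )
    return dim_location, fact_rows


records = [
    {"ward": "Ben Nghe", "district": "Quan 1", "city": "Ho Chi Minh",
     "date": "2024-01-05", "price_vnd": 5_200_000_000, "area_m2": 80},
    {"ward": "Ben Nghe", "district": "Quan 1", "city": "Ho Chi Minh",
     "date": "2024-02-11", "price_vnd": 6_100_000_000, "area_m2": 95},
    {"ward": "Dien Bien", "district": "Ba Dinh", "city": "Ha Noi",
     "date": "2024-02-12", "price_vnd": 4_300_000_000, "area_m2": 60},
]
dim_location, fact_rows = to_star(records)
```

Keeping descriptive attributes in the dimension and only keys and measures in the fact table is what lets analytical queries aggregate over narrow fact rows and join to locations only when needed.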
The React + D3.js frontend provides interactive dashboards with a real estate price heatmap by ward and district, price trend charts over time, and tools to compare market indicators across regions.
Key Features
- Automated Data Collection: Crawlers automatically collect data from 15+ sources daily, processing 50,000+ new listings per week with intelligent deduplication.
- Address Normalization Engine: A fuzzy matching algorithm normalizes addresses with 96% accuracy and automatically geocodes them to WGS84 coordinates for map visualization.
- Interactive Heatmap: Real estate price heatmap by ward across Ho Chi Minh City and Hanoi, with drill-down to individual projects and a 5-year historical timeline.
- Price Trend Analysis: Trend charts with seasonality decomposition, early detection of potential price bubbles, and comparison against macroeconomic indicators.
- Custom Report Builder: Self-service market report generator by region and segment, auto-exporting professional PDF reports on a configurable schedule.
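The deduplication step mentioned under Automated Data Collection could, in its simplest form, fingerprint each listing on fields that should agree across reposts. The sketch below is one such approach under assumed field names (`address`, `area_m2`, `price_vnd`) and assumed rounding tolerances; the actual deduplication logic is not described in this case study.

```python
import hashlib


def listing_fingerprint(listing):
    """Build a stable key from fields that should agree across reposts of one listing."""
    parts = (
        listing["address"].strip().lower(),
        str(round(listing["area_m2"])),              # tolerate 79.6 vs 80
        str(int(round(listing["price_vnd"], -8))),   # nearest 100 million VND
    )
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(listings):
    """Keep the first listing seen for each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for listing in listings:
        key = listing_fingerprint(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique


listings = [
    {"address": "12 Le Loi, Quan 1", "area_m2": 79.6, "price_vnd": 5_230_000_000},
    {"address": " 12 le loi, quan 1 ", "area_m2": 80, "price_vnd": 5_190_000_000},
    {"address": "45 Hai Ba Trung, Quan 3", "area_m2": 120, "price_vnd": 9_800_000_000},
]
unique = deduplicate(listings)
```

Here the first two listings collapse into one. Fixed rounding buckets miss near-duplicates that fall on opposite sides of a bucket boundary, so a fuller implementation would likely compare candidates within a tolerance instead.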
Impact & Results
The system processes analytical queries across 1M+ records in under 2 seconds, enabled by Snowflake's optimized clustering keys and materialized views. Weekly market report preparation time fell from 2 days to 3 hours.
Data quality improved significantly: the share of addresses successfully normalized rose from 60% to 96%. Clients gained the ability to analyze price trends by individual street, a level of granularity previously unavailable in Vietnam's real estate research market.
Tech Stack Details
- Python with Pandas and GeoPandas: Handles the geospatial ETL pipeline. GeoPandas is particularly well suited to spatial analysis and coordinate system transformations.
- Snowflake Time Travel: Allows querying data as it existed at any point within the past 90 days, which is invaluable for auditing and reproducing historical analyses.
- React with D3.js: Builds custom visualizations that off-the-shelf charting libraries cannot provide, particularly choropleth maps and multi-dimensional scatter plots.
- Mapbox GL: Renders interactive maps at high performance, even with tens of thousands of data points on screen at once.
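As an illustration of the Time Travel point, the helper below builds a query that reads a table as it existed a given number of days ago, using Snowflake's `AT(OFFSET => <seconds>)` clause. The function name and the table name `fact_transaction` are hypothetical, and the 90-day bound simply mirrors the retention window quoted above.

```python
def time_travel_query(table, days_ago):
    """Return SQL that reads `table` as it existed `days_ago` days in the past."""
    if not 0 < days_ago <= 90:
        raise ValueError("days_ago must be between 1 and 90")
    # Snowflake expects a negative offset in seconds relative to the current time.
    offset_seconds = -days_ago * 24 * 60 * 60
    # `table` is interpolated directly, so it must come from trusted code, not user input.
    return f"SELECT * FROM {table} AT(OFFSET => {offset_seconds})"


sql = time_travel_query("fact_transaction", 7)
print(sql)  # SELECT * FROM fact_transaction AT(OFFSET => -604800)
```

The resulting string can be passed to whichever Snowflake client the pipeline uses, which makes it straightforward to rerun a past report against the data as it stood when the report was first produced.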