Snowflake Data Modeling: Star Schema, Clustering, and Cost Optimization
Best practices for Snowflake data warehouse design — star schema, clustering keys, materialized views, virtual warehouse sizing, and credit cost optimization.
Snowflake separates compute from storage, changing how you think about data modeling. This guide covers patterns and cost optimization from Ventra Rocket's Snowflake implementations.
1. Star Schema
Star schema reduces join complexity and enables effective result caching.
-- Fact table (high volume, append-only)
-- Note: PRIMARY KEY and REFERENCES are informational only in Snowflake — they are not enforced
CREATE TABLE fact_orders (
    order_id NUMBER PRIMARY KEY,
    order_date_id NUMBER REFERENCES dim_date(date_id),
    customer_id NUMBER REFERENCES dim_customer(customer_id),
    product_id NUMBER REFERENCES dim_product(product_id),
    quantity NUMBER,
    unit_price NUMBER(12,2),   -- fixed-point, not FLOAT, for monetary values
    total_amount NUMBER(12,2)
)
CLUSTER BY (order_date_id);
-- Dimension table (Type 2 SCD: is_current / valid_from / valid_to track history)
CREATE TABLE dim_customer (
    customer_id NUMBER PRIMARY KEY,
    full_name VARCHAR(200),
    email VARCHAR(200),
    country VARCHAR(100),
    segment VARCHAR(50),
    is_current BOOLEAN DEFAULT TRUE,
    valid_from DATE,
    valid_to DATE
);
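The is_current / valid_from / valid_to columns support Type 2 history. A minimal sketch of applying a change, assuming a hypothetical staging table stg_customer that holds the latest source attributes (in a full SCD 2 design a surrogate key usually replaces customer_id as the primary key, since each customer accrues multiple rows):

```sql
-- Step 1: close out the current row for customers whose attributes changed
-- (stg_customer is a hypothetical staging table with the latest values)
UPDATE dim_customer
SET is_current = FALSE,
    valid_to   = CURRENT_DATE()
FROM stg_customer s
WHERE dim_customer.customer_id = s.customer_id
  AND dim_customer.is_current
  AND (dim_customer.segment <> s.segment
       OR dim_customer.country <> s.country);

-- Step 2: insert a new current row for customers with no open version
-- (covers both brand-new customers and the rows closed in step 1)
INSERT INTO dim_customer
    (customer_id, full_name, email, country, segment,
     is_current, valid_from, valid_to)
SELECT s.customer_id, s.full_name, s.email, s.country, s.segment,
       TRUE, CURRENT_DATE(), NULL
FROM stg_customer s
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customer d
    WHERE d.customer_id = s.customer_id AND d.is_current
);
```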
2. Clustering Keys
Clustering co-locates rows with similar key values in the same micro-partitions, so queries that filter on the key can prune partitions and scan less data.
-- Time-series: cluster by date
ALTER TABLE fact_orders CLUSTER BY (order_date_id);
-- Multi-dimension: most filtered column first
ALTER TABLE fact_events CLUSTER BY (event_date, event_type);
-- Check average clustering depth (lower is better; low single digits indicate good clustering)
SELECT SYSTEM$CLUSTERING_DEPTH('fact_orders', '(order_date_id)');
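Pruning only kicks in when queries filter on the clustering key. The range below assumes order_date_id is a YYYYMMDD-style integer key (an assumption, not stated in the schema above); SYSTEM$CLUSTERING_INFORMATION returns fuller statistics than the single depth number:

```sql
-- Range filter on the clustering key → only matching micro-partitions are scanned
SELECT SUM(total_amount)
FROM fact_orders
WHERE order_date_id BETWEEN 20260301 AND 20260331;  -- assumes YYYYMMDD date keys

-- JSON with partition counts, average depth, and an overlap histogram
SELECT SYSTEM$CLUSTERING_INFORMATION('fact_orders', '(order_date_id)');
```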
3. Materialized Views
-- Note: Snowflake materialized views (Enterprise Edition) must select from a
-- single table — no joins, and no COUNT(DISTINCT) — so pre-aggregate the fact
-- table alone and join to dimensions at query time
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    order_date_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM fact_orders
GROUP BY order_date_id;
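When an aggregation genuinely needs joins across the star schema, one option is a dynamic table, which Snowflake refreshes automatically toward a target lag; a sketch (table name and lag are illustrative):

```sql
CREATE DYNAMIC TABLE dt_daily_revenue
    TARGET_LAG = '1 hour'        -- how stale the result is allowed to get
    WAREHOUSE  = analytics_wh    -- compute used for refreshes
AS
SELECT
    d.year,
    d.month,
    c.country,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    SUM(o.total_amount)           AS revenue,
    AVG(o.total_amount)           AS avg_order_value
FROM fact_orders o
JOIN dim_date d ON o.order_date_id = d.date_id
JOIN dim_customer c ON o.customer_id = c.customer_id
GROUP BY 1, 2, 3;
```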
4. Virtual Warehouse Sizing
| Size | Credits/Hour | Use Case |
|------|--------------|----------|
| XS   | 1            | Dev, small queries |
| S    | 2            | Light production |
| M    | 4            | Standard analytics |
| L    | 8            | Complex joins, large scans |
| XL   | 16           | ETL, data loading |
CREATE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND = 300           -- seconds of inactivity before suspending
    AUTO_RESUME = TRUE
    MAX_CLUSTER_COUNT = 3        -- multi-cluster requires Enterprise Edition
    SCALING_POLICY = 'ECONOMY';
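To check whether a warehouse's size and auto-suspend settings are actually paying off, the account usage views report real credit burn (ACCOUNT_USAGE data can lag by a few hours):

```sql
-- Credits consumed per warehouse over the last 7 days
SELECT warehouse_name,
       SUM(credits_used) AS credits_7d
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_7d DESC;
```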
5. Cost Optimization
-- Zero-copy cloning — no storage cost for unchanged data
CREATE TABLE orders_test CLONE fact_orders;
-- Time travel for debugging
SELECT * FROM fact_orders AT (OFFSET => -7200); -- 2 hours ago
-- Find queries served from the result cache (no partitions scanned, no warehouse credits)
-- Note: ACCOUNT_USAGE views can lag by up to ~45 minutes
SELECT query_text, partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE partitions_scanned = 0
ORDER BY start_time DESC
LIMIT 20;
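A resource monitor puts a hard ceiling on spend; the quota and thresholds below are illustrative, and creating one requires the ACCOUNTADMIN role:

```sql
CREATE RESOURCE MONITOR analytics_monthly_cap
    WITH CREDIT_QUOTA = 500              -- illustrative monthly budget
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80 PERCENT DO NOTIFY     -- warn at 80% of quota
             ON 100 PERCENT DO SUSPEND;  -- stop new queries at 100%

ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_monthly_cap;
```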
6. Bulk Data Loading
COPY INTO fact_orders
FROM @s3_stage/orders/2026/03/
FILE_FORMAT = (
    TYPE = PARQUET
    COMPRESSION = SNAPPY
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = 'SKIP_FILE_10%';  -- skip any file where more than 10% of rows error
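After a load, COPY_HISTORY shows per-file outcomes, including files skipped by the ON_ERROR policy:

```sql
-- Per-file load results for fact_orders over the last hour
SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'FACT_ORDERS',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));
```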
Conclusion
Effective Snowflake usage combines star schemas, clustering keys chosen from common filter columns, materialized views for repeated aggregations, and right-sized warehouses per workload. Ventra Rocket typically achieves a 60-80% credit reduction after applying these optimizations.