Snowflake Data Modeling: Star Schema, Clustering, and Cost Optimization
Best practices for Snowflake data warehouse design — star schema, clustering keys, materialized views, virtual warehouse sizing, and credit cost optimization.
Snowflake separates compute from storage, changing how you think about data modeling. This guide covers patterns and cost optimization from Ventra Rocket's Snowflake implementations.
1. Star Schema
Star schema reduces join complexity and enables effective result caching.
-- Fact table (high volume, append-only)
-- Note: PRIMARY KEY and REFERENCES are informational only in Snowflake — they are not enforced
CREATE TABLE fact_orders (
    order_id NUMBER PRIMARY KEY,
    order_date_id NUMBER REFERENCES dim_date(date_id),
    customer_id NUMBER REFERENCES dim_customer(customer_id),
    product_id NUMBER REFERENCES dim_product(product_id),
    quantity NUMBER,
    unit_price NUMBER(12,2),   -- fixed-point, not FLOAT, for monetary values
    total_amount NUMBER(12,2)
)
CLUSTER BY (order_date_id);
-- Dimension table (Type 2 SCD: is_current / valid_from / valid_to track history)
CREATE TABLE dim_customer (
    customer_id NUMBER PRIMARY KEY,
    full_name VARCHAR(200),
    email VARCHAR(200),
    country VARCHAR(100),
    segment VARCHAR(50),
    is_current BOOLEAN DEFAULT TRUE,
    valid_from DATE,
    valid_to DATE
);
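The is_current / valid_from / valid_to columns support Type 2 history. A minimal sketch of applying a change, assuming a hypothetical staging table stg_customer that holds the latest source attributes (in a full SCD 2 design a surrogate key usually replaces customer_id as the primary key, since each customer accrues multiple rows):

```sql
-- Step 1: close out the current row for customers whose attributes changed
-- (stg_customer is a hypothetical staging table with the latest values)
UPDATE dim_customer
SET is_current = FALSE,
    valid_to   = CURRENT_DATE()
FROM stg_customer s
WHERE dim_customer.customer_id = s.customer_id
  AND dim_customer.is_current
  AND (dim_customer.segment <> s.segment
       OR dim_customer.country <> s.country);

-- Step 2: insert a new current row for customers with no open version
-- (covers both brand-new customers and the rows closed in step 1)
INSERT INTO dim_customer
    (customer_id, full_name, email, country, segment,
     is_current, valid_from, valid_to)
SELECT s.customer_id, s.full_name, s.email, s.country, s.segment,
       TRUE, CURRENT_DATE(), NULL
FROM stg_customer s
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customer d
    WHERE d.customer_id = s.customer_id AND d.is_current
);
```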
2. Clustering Keys
Clustering co-locates rows with similar key values in the same micro-partitions, so queries that filter on the key can prune partitions and scan less data.
-- Time-series: cluster by date
ALTER TABLE fact_orders CLUSTER BY (order_date_id);
-- Multi-dimension: most filtered column first
ALTER TABLE fact_events CLUSTER BY (event_date, event_type);
-- Check average clustering depth (lower is better; low single digits indicate good clustering)
SELECT SYSTEM$CLUSTERING_DEPTH('fact_orders', '(order_date_id)');
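Pruning only kicks in when queries filter on the clustering key. The range below assumes order_date_id is a YYYYMMDD-style integer key (an assumption, not stated in the schema above); SYSTEM$CLUSTERING_INFORMATION returns fuller statistics than the single depth number:

```sql
-- Range filter on the clustering key → only matching micro-partitions are scanned
SELECT SUM(total_amount)
FROM fact_orders
WHERE order_date_id BETWEEN 20260301 AND 20260331;  -- assumes YYYYMMDD date keys

-- JSON with partition counts, average depth, and an overlap histogram
SELECT SYSTEM$CLUSTERING_INFORMATION('fact_orders', '(order_date_id)');
```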
3. Materialized Views
-- Note: Snowflake materialized views (Enterprise Edition) must select from a
-- single table — no joins, and no COUNT(DISTINCT) — so pre-aggregate the fact
-- table alone and join to dimensions at query time
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    order_date_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM fact_orders
GROUP BY order_date_id;
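When an aggregation genuinely needs joins across the star schema, one option is a dynamic table, which Snowflake refreshes automatically toward a target lag; a sketch (table name and lag are illustrative):

```sql
CREATE DYNAMIC TABLE dt_daily_revenue
    TARGET_LAG = '1 hour'        -- how stale the result is allowed to get
    WAREHOUSE  = analytics_wh    -- compute used for refreshes
AS
SELECT
    d.year,
    d.month,
    c.country,
    COUNT(DISTINCT o.customer_id) AS unique_customers,
    SUM(o.total_amount)           AS revenue,
    AVG(o.total_amount)           AS avg_order_value
FROM fact_orders o
JOIN dim_date d ON o.order_date_id = d.date_id
JOIN dim_customer c ON o.customer_id = c.customer_id
GROUP BY 1, 2, 3;
```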
4. Virtual Warehouse Sizing
| Size | Credits/Hour | Use Case |
|------|--------------|----------|
| XS   | 1            | Dev, small queries |
| S    | 2            | Light production |
| M    | 4            | Standard analytics |
| L    | 8            | Complex joins, large scans |
| XL   | 16           | ETL, data loading |
CREATE WAREHOUSE analytics_wh
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND = 300           -- seconds of inactivity before suspending
    AUTO_RESUME = TRUE
    MAX_CLUSTER_COUNT = 3        -- multi-cluster requires Enterprise Edition
    SCALING_POLICY = 'ECONOMY';
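To check whether a warehouse's size and auto-suspend settings are actually paying off, the account usage views report real credit burn (ACCOUNT_USAGE data can lag by a few hours):

```sql
-- Credits consumed per warehouse over the last 7 days
SELECT warehouse_name,
       SUM(credits_used) AS credits_7d
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_7d DESC;
```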
5. Cost Optimization
-- Zero-copy cloning — no storage cost for unchanged data
CREATE TABLE orders_test CLONE fact_orders;
-- Time travel for debugging
SELECT * FROM fact_orders AT (OFFSET => -7200); -- 2 hours ago
-- Find queries served from the result cache (no partitions scanned, no warehouse credits)
-- Note: ACCOUNT_USAGE views can lag by up to ~45 minutes
SELECT query_text, partitions_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE partitions_scanned = 0
ORDER BY start_time DESC
LIMIT 20;
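A resource monitor puts a hard ceiling on spend; the quota and thresholds below are illustrative, and creating one requires the ACCOUNTADMIN role:

```sql
CREATE RESOURCE MONITOR analytics_monthly_cap
    WITH CREDIT_QUOTA = 500              -- illustrative monthly budget
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS ON 80 PERCENT DO NOTIFY     -- warn at 80% of quota
             ON 100 PERCENT DO SUSPEND;  -- stop new queries at 100%

ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_monthly_cap;
```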
6. Bulk Data Loading
COPY INTO fact_orders
FROM @s3_stage/orders/2026/03/
FILE_FORMAT = (
    TYPE = PARQUET
    COMPRESSION = SNAPPY
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
ON_ERROR = 'SKIP_FILE_10%';  -- skip any file where more than 10% of rows error
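After a load, COPY_HISTORY shows per-file outcomes, including files skipped by the ON_ERROR policy:

```sql
-- Per-file load results for fact_orders over the last hour
SELECT file_name, status, row_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'FACT_ORDERS',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));
```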
Conclusion
Effective Snowflake usage combines star schemas, clustering keys chosen from common filter columns, materialized views for repeated aggregations, and right-sized warehouses per workload. Ventra Rocket typically achieves a 60-80% credit reduction after applying these optimizations.