
Data Pipeline Best Practices with Snowflake

A comprehensive guide to building high-performance ETL pipelines on Snowflake — from Medallion architecture and Streams to cost control and monitoring.

By Ventra Rocket
Published on 15 March 2026
Tags: Snowflake, ETL, Data Pipeline, Cloud Data Warehouse

Snowflake has fundamentally changed how enterprises build data warehouses. By separating compute from storage, it lets each scale elastically and independently, so concurrent workloads no longer contend for the same cluster. This article consolidates best practices the Ventra Rocket team has refined across multiple enterprise data pipeline deployments.

1. Design Schemas Using the Medallion Architecture

The Medallion pattern (Bronze → Silver → Gold) has become a standard foundation for modern data pipelines. Rather than writing raw data directly to final tables, separate your data into three clearly defined layers.

Bronze layer — Raw, unprocessed data preserved exactly as received from the source. This is your "source of truth" for replay when needed.

Silver layer — Cleaned, schema-normalised data with duplicates removed. Apply basic business rules at this layer.

Gold layer — Aggregated data ready for BI tools and reporting, optimised for query performance.

-- Create databases per Medallion layer
CREATE DATABASE IF NOT EXISTS bronze_db;
CREATE DATABASE IF NOT EXISTS silver_db;
CREATE DATABASE IF NOT EXISTS gold_db;

-- Schemas must exist before tables can reference them
CREATE SCHEMA IF NOT EXISTS bronze_db.sales;

-- Bronze: raw ingest from S3, kept as semi-structured VARIANT
CREATE TABLE IF NOT EXISTS bronze_db.sales.orders_raw (
  raw_data VARIANT,
  ingested_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
  source_file STRING
);

2. Leverage Time Travel and Fail-safe

Snowflake provides Time Travel of up to 90 days on Enterprise edition (the default retention is 1 day), enabling recovery of accidentally deleted or modified data. Combined with the 7-day Fail-safe window, this is a powerful protection layer.

-- Query data from 2 days ago
SELECT * FROM silver_db.sales.orders
AT (OFFSET => -60*60*24*2);

-- Recover a dropped table
UNDROP TABLE silver_db.sales.orders;

-- Zero-copy clone for testing; storage is only billed for data that later diverges
CREATE TABLE silver_db.sales.orders_backup
CLONE silver_db.sales.orders;
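
Retention beyond the 1-day default must be configured explicitly, for example at the table level:

-- Extend Time Travel retention on a critical table (Enterprise edition supports up to 90 days)
ALTER TABLE silver_db.sales.orders
  SET DATA_RETENTION_TIME_IN_DAYS = 30;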

3. Optimise Virtual Warehouses

Compute is the largest cost driver in Snowflake. Key optimisation principles:

Auto-suspend and Auto-resume

Always enable AUTO_SUSPEND with a timeout that matches the workload. For batch pipelines, 60 seconds is sufficient. For BI dashboards that need fast response, a longer setting such as 300 seconds works better, because keeping the warehouse alive preserves its local cache between queries.

CREATE WAREHOUSE etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  MAX_CLUSTER_COUNT = 3  -- Multi-cluster for concurrent load
  MIN_CLUSTER_COUNT = 1;

Choose the Right Warehouse Size

Bigger is not always faster. Each size step doubles the compute, and the credits billed per hour, so scale up only when a query actually benefits: long scans and heavy joins that spill to disk on MEDIUM will often finish in well under half the time on LARGE, while small queries see no improvement at twice the cost. Check query profiles for spilling before resizing.
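
One way to spot an undersized warehouse is to look for remote spilling in recent query history; a sketch using the INFORMATION_SCHEMA.QUERY_HISTORY table function:

-- Queries that spilled to remote storage are strong candidates for a larger warehouse
SELECT query_id,
       warehouse_name,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE bytes_spilled_to_remote_storage > 0
ORDER BY bytes_spilled_to_remote_storage DESC;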

4. Incremental Load with Streams and Tasks

Instead of full table reloads, use Snowflake Streams to capture only changed data (CDC).

-- Create a stream tracking changes on the bronze table
CREATE STREAM bronze_db.sales.orders_stream
  ON TABLE bronze_db.sales.orders_raw;

-- Task that runs every 5 minutes, but only when the stream has new data
CREATE TASK silver_db.etl.process_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('bronze_db.sales.orders_stream')
AS
INSERT INTO silver_db.sales.orders
SELECT
  raw_data:order_id::STRING AS order_id,
  raw_data:customer_id::STRING AS customer_id,
  raw_data:amount::DECIMAL(18,2) AS amount,
  raw_data:created_at::TIMESTAMP_LTZ AS created_at
FROM bronze_db.sales.orders_stream
WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended; start the task explicitly
ALTER TASK silver_db.etl.process_orders RESUME;
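
The task above only consumes inserts. If the source also emits updates, a MERGE keyed on the business key can apply both in one pass; a minimal sketch (columns are illustrative):

-- Sketch: upsert stream changes into the Silver table
MERGE INTO silver_db.sales.orders AS tgt
USING (
  SELECT raw_data:order_id::STRING        AS order_id,
         raw_data:amount::DECIMAL(18,2)   AS amount
  FROM bronze_db.sales.orders_stream
  WHERE METADATA$ACTION = 'INSERT'
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET tgt.amount = src.amount
WHEN NOT MATCHED THEN INSERT (order_id, amount) VALUES (src.order_id, src.amount);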

5. Monitoring and Alerting

Set up proactive monitoring to detect issues before they impact the business.

-- Find pipes that have stopped ingesting
-- (ACCOUNT_USAGE views can lag by up to ~2 hours)
SELECT pipe_name,
       MAX(last_load_time) AS last_ingested_time,
       DATEDIFF('minute', MAX(last_load_time), CURRENT_TIMESTAMP()) AS delay_minutes
FROM snowflake.account_usage.copy_history
WHERE pipe_name IS NOT NULL
GROUP BY pipe_name
HAVING DATEDIFF('minute', MAX(last_load_time), CURRENT_TIMESTAMP()) > 30
ORDER BY delay_minutes DESC;

Combine Snowflake alerts with PagerDuty or Slack webhooks for immediate notification when pipelines fail.
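
Snowflake's built-in alerts can drive those notifications. The sketch below assumes a verified email notification integration named my_email_integration and the etl_wh warehouse from earlier:

-- Alert that fires when any pipe has been silent for over 30 minutes
CREATE ALERT etl_pipe_delay_alert
  WAREHOUSE = etl_wh
  SCHEDULE = '30 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM snowflake.account_usage.copy_history
    WHERE pipe_name IS NOT NULL
    GROUP BY pipe_name
    HAVING DATEDIFF('minute', MAX(last_load_time), CURRENT_TIMESTAMP()) > 30
  ))
  THEN CALL SYSTEM$SEND_EMAIL('my_email_integration', 'oncall@example.com',
                              'Pipe ingestion delayed', 'Check Snowpipe status.');

-- Alerts, like tasks, are created suspended
ALTER ALERT etl_pipe_delay_alert RESUME;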

6. Cost Management with Resource Monitors

Resource monitors cap credit spend per billing cycle; note they can only be created and assigned by an account administrator.

CREATE RESOURCE MONITOR monthly_budget
  WITH CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75 PERCENT DO NOTIFY
    ON 90 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE etl_wh
  SET RESOURCE_MONITOR = monthly_budget;
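
To see where the budget actually goes, per-warehouse credit consumption can be pulled from ACCOUNT_USAGE:

-- Credit consumption per warehouse over the last 7 days
SELECT warehouse_name,
       SUM(credits_used) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;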

Conclusion

Building quality data pipelines on Snowflake is more than writing SQL. It requires a sound architecture (Medallion), leveraging platform features (Streams, Tasks, Time Travel), and proactive operations (monitoring, cost control). Ventra Rocket is ready to consult and implement enterprise data solutions tailored to your needs.
