Customer’s challenges
The project addressed several key challenges:
Outdated ML approach with poor accuracy
the existing complaint classification process relied on legacy Word2Vec embeddings and LSTM models hosted on the Azure ML service and delivered inadequate accuracy (54% for failure mode classification, 63% for mortality classification).
Manual error correction
low classification accuracy forced quality analysts to manually review and reclassify numerous incorrectly categorized complaints, consuming valuable time and creating bottlenecks in the complaint handling process.
Platform fragmentation
data was stored in Snowflake while ML models ran on Azure ML, which created unnecessary complexity, data movement overhead, and integration challenges between the platforms.
Lack of reusable ML infrastructure and processes
no standardized framework existed for deploying machine learning projects on Snowflake, meaning each new initiative would require building infrastructure from scratch.
Solution
The solution leveraged Snowflake’s native ML capabilities to classify hundreds of complaints daily against two different sets of categories (failure mode and mortality). To achieve the business objectives, we implemented an integrated ML solution on Snowflake:
Snowflake feature store-based ML pipeline
a production-ready, programmatic framework for managing the ML lifecycle on Snowflake, including feature view creation, training dataset generation, and model deployment with version control.
Modern transformer-based classification
replaced the outdated Word2Vec+LSTM approach with a microsoft/deberta-v3 transformer model, delivering significant accuracy improvements through state-of-the-art natural language understanding.
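As a rough illustration of how a fine-tuned transformer classifier assigns a category to a complaint, the sketch below uses the Hugging Face transformers library. It is not the project's actual code: the `classify` function, its `checkpoint` argument (a path or model id of a fine-tuned sequence classifier), and the helper names are assumptions made for this example. The source names only the microsoft/deberta-v3 model family, so no specific checkpoint is filled in here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier logits into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(texts: List[str], checkpoint: str) -> List[Tuple[str, float]]:
    """Classify complaint texts with a fine-tuned sequence classifier.

    `checkpoint` is hypothetical: supply the path or id of your own
    fine-tuned model. One such model per category set would be needed.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (len(texts), num_labels)

    return [top_label(row.tolist(), model.config.id2label) for row in logits]
```

Because the complaints are classified against two category sets, a setup along these lines would presumably call `classify` once per fine-tuned model, one for failure mode and one for mortality.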
Unified platform architecture
consolidated data storage, feature engineering, the ML model registry (with model versioning and observability), and inference on a single platform, eliminating cross-platform data movement and reducing architectural complexity.
Reusable Python framework
developed a component-based deployment system that enables rapid development of future ML projects through standardized patterns for entities, feature views, datasets, models, and stored procedures.
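One way to picture a component-based deployment system is as a registry of named components that are deployed in dependency order. The sketch below is plain Python and purely illustrative: the `Component` and `Project` classes, their fields, and the deployment callbacks are hypothetical and are not taken from the framework described here. In a real setup each `deploy` callback would wrap the corresponding Snowflake operation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Component:
    """One deployable unit: entity, feature view, dataset, model, or stored procedure."""
    name: str
    kind: str
    deploy: Callable[[], None]          # hypothetical callback doing the actual work
    depends_on: Tuple[str, ...] = ()    # names of components that must exist first


@dataclass
class Project:
    components: Dict[str, Component] = field(default_factory=dict)

    def add(self, component: Component) -> None:
        if component.name in self.components:
            raise ValueError(f"duplicate component: {component.name}")
        self.components[component.name] = component

    def deployment_order(self) -> List[str]:
        """Order components so that each one follows its dependencies."""
        order: List[str] = []
        state: Dict[str, str] = {}  # name -> "visiting" or "done"

        def visit(name: str) -> None:
            if state.get(name) == "done":
                return
            if state.get(name) == "visiting":
                raise ValueError(f"dependency cycle at: {name}")
            state[name] = "visiting"
            # An unknown dependency name raises KeyError here.
            for dep in self.components[name].depends_on:
                visit(dep)
            state[name] = "done"
            order.append(name)

        for name in self.components:
            visit(name)
        return order

    def deploy_all(self) -> List[str]:
        """Deploy every component in dependency order and return that order."""
        order = self.deployment_order()
        for name in order:
            self.components[name].deploy()
        return order
```

With this shape, a new ML project only declares its components and their dependencies, and the same `deploy_all` logic is reused across projects.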
Automated training and inference pipelines
implemented end-to-end automation from feature extraction through model deployment to batch inference via Snowflake stored procedures, reducing manual intervention and ensuring consistent model performance.
Streamlit application for classification reviews
developed a custom Streamlit application running in Snowflake that lets quality analysts review classifications efficiently and make manual corrections more quickly.
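The review step can be thought of as a simple override rule: the model's prediction stands unless an analyst has supplied a correction. The sketch below shows that rule in plain Python, independent of Streamlit or Snowflake. The `Classification` record, its field names, and the `apply_corrections` helper are assumptions made for illustration, not the application's actual data model.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Classification:
    complaint_id: str
    predicted_label: str                    # label assigned by the model
    corrected_label: Optional[str] = None   # label set by an analyst, if any

    @property
    def final_label(self) -> str:
        """The analyst's correction wins; otherwise keep the prediction."""
        if self.corrected_label is not None:
            return self.corrected_label
        return self.predicted_label


def apply_corrections(
    predictions: Iterable[Classification],
    corrections: Dict[str, str],
) -> List[Classification]:
    """Return new records with analyst corrections attached by complaint id."""
    return [
        replace(p, corrected_label=corrections.get(p.complaint_id, p.corrected_label))
        for p in predictions
    ]
```

Keeping the predicted and corrected labels side by side, rather than overwriting the prediction, would also preserve the information needed to measure how often analysts disagree with the model.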
Benefits
Dramatic classification accuracy improvement
increased classification accuracy from 52% to 94% (+38 percentage points) for failure mode and from 63% to 83% (+20 percentage points) for mortality, significantly reducing manual reclassification burden.
Simplified architecture
unified platform for data, features, and models eliminates cross-platform complexity, reduces data movement costs, and improves maintainability.
Scalable and reusable ML infrastructure
reusable framework enables rapid deployment of future ML projects across the organization without rebuilding infrastructure, accelerating time-to-value for new ML initiatives.
Human-in-the-loop
thanks to a dedicated Streamlit application in Snowflake, quality analysts have full control over complaint classification and can quickly make manual corrections if required.