• What does Feature Store do? Provides governed, reusable features

  • How does it make data discoverable? Exposes machine-learning-ready data structures

  • How does it enforce data governance? Through entity definitions and lineage

  • What about scalability? Through reusable Feature Views

  • A feature store is a centralized repository designed to manage and share features efficiently across teams (see Databricks, “What is a Feature Store?”). Benefits: reduced redundancy, better collaboration, and efficient data management, plus versioning, monitoring, and governance.

  • Feature engineering: scaling, encoding categorical variables, creating interaction terms. The goal is to identify the most relevant variables and transform the data to improve model accuracy.
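
The three feature-engineering steps above can be sketched in plain Python (the column names and values here are made up for illustration):

```python
def min_max_scale(values):
    """Scale a numeric column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Encode a categorical column as one dict of 0/1 flags per row."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def interaction(a, b):
    """Create an interaction term by multiplying two numeric columns."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical columns
age = [25, 35, 45]
income = [40_000, 60_000, 80_000]
plan = ["basic", "pro", "basic"]

scaled_age = min_max_scale(age)        # [0.0, 0.5, 1.0]
plan_flags = one_hot(plan)             # [{'is_basic': 1, 'is_pro': 0}, ...]
age_income = interaction(age, income)  # [1000000, 2100000, 3600000]
```

In practice you would use a library like scikit-learn for this; the point is that these transformed columns are exactly the kind of artifact a feature store lets you compute once and reuse.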

  • As projects scale, feature stores become invaluable for managing and reusing features.

  • dbt is an example of a tool for automating feature engineering pipelines

  • The Snowflake Feature Store is organized around entities and feature views. Machine-learning-ready datasets are built from these feature views. Good link: https://www.snowflake.com/en/developers/guides/end-to-end-ml-workflow/

Feature Views: traditional views vs Snowflake dynamic table

But that link only describes the general differences. What about Feature Store specifically? Here is the documentation you want: the “Developer > Snowflake ML > Manage and serve features > Working with feature views” section of the Snowflake documentation. By definition there are “two different kinds of feature views. Snowflake-managed feature view and External feature view.” Snowflake-managed feature views are refreshed on a schedule specified at feature view creation time; external feature views have no refresh schedule.

Therefore it appears that a Snowflake-managed feature view is like a dynamic table (i.e., Snowflake materializes it and keeps it up to date), while an external feature view is more like a traditional view (i.e., you are pointing at existing data). As mentioned, one way to create an external feature view is to pass a DataFrame to the FeatureView class with no refresh schedule. The SQL query defined by that DataFrame is run whenever the feature view is needed; nothing is precomputed in this setup. That is not ideal for real-time inference, but it works well for small-scale workloads or when low latency is not important. If you have a large dataset, many transformations, or real-time inference requirements, you want your data precomputed and available: use a Snowflake-managed feature view, or an external feature view maintained by another tool such as dbt. Ideally you want precomputed features, whether precomputed by Snowflake or by another tool.
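
A rough sketch of the two styles, based on my reading of the docs above (not runnable without a live Snowflake session; the entity, table, and view names are hypothetical):

```python
def register_feature_views(session, fs):
    """`session` is a Snowpark Session; `fs` is a
    snowflake.ml.feature_store.FeatureStore instance."""
    from snowflake.ml.feature_store import Entity, FeatureView

    # An entity defines the join key(s) that features are organized around.
    customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"])
    fs.register_entity(customer)

    # Hypothetical feature query over an existing ORDERS table.
    feature_df = session.sql(
        "SELECT customer_id, COUNT(*) AS order_count "
        "FROM orders GROUP BY customer_id"
    )

    # Snowflake-managed: refresh_freq schedules materialization
    # (dynamic-table-like). Omit refresh_freq and the query instead runs
    # on demand when the view is used (traditional-view-like, external).
    managed_fv = FeatureView(
        name="customer_features",
        entities=[customer],
        feature_df=feature_df,
        refresh_freq="1 hour",
    )
    fs.register_feature_view(feature_view=managed_fv, version="1")
```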

Datasets,
Point-in-Time Lookups,
ASOF Joins
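
An ASOF join is how a point-in-time lookup is implemented: for each row in the spine (label) table, take the latest feature value with a timestamp at or before the spine timestamp, so no future data leaks into training. A toy pure-Python sketch of the logic (not the Snowflake implementation):

```python
from bisect import bisect_right

def asof_join(spine, features):
    """For each (key, ts) spine row, return (key, ts, value) where value is
    the latest feature value with timestamp <= ts, or None if no feature
    value existed yet. `features` maps key -> list of (ts, value) sorted
    by timestamp."""
    out = []
    for key, ts in spine:
        history = features.get(key, [])
        # Index of the last feature row with timestamp <= ts.
        i = bisect_right([t for t, _ in history], ts) - 1
        out.append((key, ts, history[i][1] if i >= 0 else None))
    return out

features = {"u1": [(1, 10), (5, 20)], "u2": [(3, 7)]}
spine = [("u1", 4), ("u1", 6), ("u2", 2)]
# u1@4 -> 10 (feature from ts=1), u1@6 -> 20 (ts=5), u2@2 -> None (no history yet)
```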

The Dataset object is versioned and immutable

generate_dataset() - use with TensorFlow or PyTorch
generate_training_set() - use with classical machine learning libraries
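
A hedged sketch of how the two calls differ (requires a live Snowflake session; the dataset and column names are hypothetical):

```python
def build_training_data(fs, spine_df, feature_views):
    """`fs` is a snowflake.ml.feature_store.FeatureStore instance;
    `spine_df` is a Snowpark DataFrame of entity keys + timestamps."""
    # Versioned, immutable Dataset object -- for TensorFlow / PyTorch.
    ds = fs.generate_dataset(
        name="churn_training_data",       # hypothetical name
        spine_df=spine_df,
        features=feature_views,
        spine_timestamp_col="EVENT_TS",   # drives the point-in-time (ASOF) join
    )
    torch_ds = ds.read.to_torch_dataset()  # or ds.read.to_tf_dataset()

    # Plain Snowpark DataFrame -- for classical ML (e.g. scikit-learn,
    # typically after .to_pandas()).
    train_df = fs.generate_training_set(
        spine_df=spine_df,
        features=feature_views,
        spine_timestamp_col="EVENT_TS",
    )
    return torch_ds, train_df
```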

Note: Sublime Text has a built-in spell check, under View > Spell Check