SnowPro® Advanced: Data Scientist training

    Training Mode: Online

    Description

    SnowPro® Advanced: Data Scientist Overview

    The SnowPro® Advanced: Data Scientist Certification will test advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. This certification will test the ability to:

    1. Outline data science concepts
    2. Implement Snowflake data science best practices
    3. Prepare data and feature engineering in Snowflake
    4. Train and use machine learning models
    5. Use data visualization to present a business case (e.g., model explainability)
    6. Implement model lifecycle management

    SnowPro® Advanced: Data Scientist Candidate

    2+ years of practical data science experience with Snowflake in an enterprise environment. In addition, successful candidates may have:

    1. A statistical, mathematical, or science education (or equivalent work experience)
    2. Background working with one or more programming languages (e.g., Python, R, SQL, PySpark)
    3. Experience modeling and using machine learning platforms (e.g., SageMaker, Azure Machine Learning, GCP AI Platform, AutoML tools)
    4. An understanding of various open source and commercial frameworks and libraries (e.g., scikit-learn, TensorFlow)
    5. Experience preparing, cleaning, and transforming data sets from multiple sources
    6. Experience creating features for machine learning training
    7. Experience validating and interpreting models
    8. Experience putting a model into production and monitoring the model in production
    9. Experience presenting data using visualization tools

    Target Audience:

    1. Data Scientists
    2. AI/ML Engineers

    Exam Format:

    1. Exam Version: DSA-C02
    2. Total Number of Questions: 65
    3. Question Types: Multiple Select, Multiple Choice
    4. Time Limit: 115 minutes
    5. Language: English
    6. Registration fee: $375 USD
    7. Passing Score: 750+ (scaled scoring from 0 – 1000)

    Unscored Content:

    Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score; additional time is factored in to account for this content.

    Prerequisites:

    SnowPro Core Certified

    Delivery Options:

    1. Online Proctoring
    2. Onsite Testing Centers

    Exam Domain Breakdown:

    This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weightings.

    Domain                                          Weighting on Exam
    1.0 Data Science Concepts                       15-20%
    2.0 Data Pipelining                             15-20%
    3.0 Data Preparation and Feature Engineering    30-35%
    4.0 Model Development                           15-20%
    5.0 Model Deployment                            15-20%

    Exam Topics:

    Outlined below are the Domains & Objectives measured on the exam. To view subtopics, download the exam study guide.

    Data Science Concepts

    1. Define machine learning concepts for data science workloads.
    2. Outline machine learning problem types.
    3. Summarize the machine learning lifecycle.
    4. Define statistical concepts for data science.

    Data Pipelining

    1. Enrich data by consuming data sharing sources.
    2. Build a data science pipeline.

    Data Preparation and Feature Engineering

    1. Prepare and clean data in Snowflake.
    2. Perform exploratory data analysis in Snowflake.
    3. Perform feature engineering on Snowflake data.
    4. Visualize and interpret the data to present a business case.

    Model Development

    1. Connect data science tools directly to data in Snowflake.
    2. Train a data science model.
    3. Validate a data science model.
    4. Interpret a model.

    Model Deployment

    1. Move a data science model into production.
    2. Determine the effectiveness of a model and retrain if necessary.
    3. Outline model lifecycle and validation tools.

    TABLE OF CONTENTS

    Data Science Concepts

    – Define machine learning concepts for data science workloads.

    Machine Learning

    1. Supervised learning
    2. Unsupervised learning

    – Outline machine learning problem types.

    Supervised Learning

    1. Structured Data
      1. Linear regression
      2. Binary classification
      3. Multi-class classification
      4. Time-series forecasting
    2. Unstructured Data
      1. Image classification
      2. Segmentation

    Unsupervised Learning

    1. Clustering
    2. Association models

    – Summarize the machine learning lifecycle.

    1. Data collection
    2. Data visualization and exploration
    3. Feature engineering
    4. Training models
    5. Model deployment
    6. Model monitoring and evaluation (e.g., model explainability, precision, recall, accuracy, confusion matrix)
    7. Model versioning

    – Define statistical concepts for data science.

    1. Normal versus skewed distributions (e.g., mean, outliers)
    2. Central limit theorem
    3. Z and T tests
    4. Bootstrapping
    5. Confidence intervals
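
    A minimal sketch of the last two items, the bootstrap and a percentile confidence interval, using made-up sample data:

        import numpy as np

        rng = np.random.default_rng(42)
        sample = rng.exponential(scale=2.0, size=500)   # a skewed, non-normal sample

        # Bootstrap: resample with replacement, recompute the statistic each time
        boot_means = np.array([
            rng.choice(sample, size=sample.size, replace=True).mean()
            for _ in range(10_000)
        ])

        # 95% confidence interval for the mean via the percentile method
        low, high = np.percentile(boot_means, [2.5, 97.5])
        print(f"mean={sample.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")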

    Data Pipelining

    – Enrich data by consuming data sharing sources.

    1. Snowflake Marketplace
    2. Direct Sharing
    3. Shared database considerations

    – Build a data science pipeline.

    1. Automation of data transformation with streams and tasks
    2. Python User-Defined Functions (UDFs)
    3. Python User-Defined Table Functions (UDTFs)
    4. Python stored procedures
    5. Integration with machine learning platforms (e.g., connectors, ML partners, etc.)
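
    A minimal sketch of such a pipeline through a Snowpark session; the raw_events/clean_events tables, stream, and task names are hypothetical. The stream captures row-level changes and the task applies the transformation on a schedule:

        from snowflake.snowpark import Session

        # Connection parameters are placeholders
        session = Session.builder.configs({
            "account": "<account>", "user": "<user>", "password": "<password>",
            "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
        }).create()

        # Stream: row-level change capture on the source table
        session.sql("CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events").collect()

        # Task: runs the transformation only when the stream has new rows
        session.sql("""
            CREATE OR REPLACE TASK transform_events
              WAREHOUSE = my_wh
              SCHEDULE = '5 MINUTE'
              WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')
            AS
              INSERT INTO clean_events
              SELECT event_id, TRY_CAST(amount AS FLOAT) AS amount
              FROM raw_events_stream
              WHERE METADATA$ACTION = 'INSERT'
        """).collect()
        session.sql("ALTER TASK transform_events RESUME").collect()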

    Data Preparation and Feature Engineering

    – Prepare and clean data in Snowflake.

    Use Snowpark for Python and SQL

    1. Aggregate
    2. Joins
    3. Identify critical data
    4. Remove duplicates
    5. Remove irrelevant fields
    6. Handle missing values
    7. Data type casting
    8. Sampling data
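
    Most of these steps map directly onto Snowpark DataFrame methods. A sketch, reusing the session from the pipeline example; the table and column names are hypothetical:

        from snowflake.snowpark.functions import col
        from snowflake.snowpark.types import FloatType

        df = session.table("customer_orders")

        clean = (
            df.drop("internal_notes")                   # remove irrelevant fields
              .drop_duplicates()                        # remove duplicates
              .filter(col("order_id").is_not_null())    # keep rows with critical data
              .fillna({"discount": 0.0})                # handle missing values
              .with_column("amount", col("amount").cast(FloatType()))  # type casting
        )
        sampled = clean.sample(frac=0.1)                # sample 10% for exploration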

    – Perform exploratory data analysis in Snowflake.

    Snowpark and SQL

    1. Identify initial patterns (i.e., data profiling)
    2. Connect external machine learning platforms and/or notebooks (e.g., Jupyter)

    Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.

    1. Window functions
    2. MIN/MAX/AVG/STDDEV
    3. VARIANCE
    4. TOP n
    5. Approximation/high-performance functions
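
    These can be invoked from SQL or as Snowpark aggregate expressions. A sketch against the sampled DataFrame from the cleaning example:

        from snowflake.snowpark.functions import avg, stddev, variance, min as min_, max as max_

        sampled.agg(
            min_("amount").alias("min_amt"),
            max_("amount").alias("max_amt"),
            avg("amount").alias("avg_amt"),
            stddev("amount").alias("std_amt"),
            variance("amount").alias("var_amt"),
        ).show()

        # Approximate functions trade exactness for speed on large tables
        session.sql("SELECT APPROX_TOP_K(customer_id, 10) FROM customer_orders").collect()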

    Linear Regression

    1. Find the slope and intercept
    2. Verify the relationship between the dependent and independent variables
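
    Snowflake exposes least-squares regression aggregates directly in SQL, so the slope and intercept can be computed without leaving the platform (table and column names are hypothetical):

        session.sql("""
            SELECT REGR_SLOPE(sales, ad_spend)     AS slope,
                   REGR_INTERCEPT(sales, ad_spend) AS intercept,
                   REGR_R2(sales, ad_spend)        AS r_squared
            FROM marketing_data
        """).collect()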

    – Perform feature engineering on Snowflake data.

    Preprocessing

    1. Scaling data
    2. Encoding
    3. Normalization
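
    A small scikit-learn sketch of scaling and normalization, assuming the sampled data from the cleaning example is pulled client-side as a pandas frame:

        from sklearn.preprocessing import MinMaxScaler, StandardScaler

        pdf = sampled.to_pandas()

        # Scaling: zero mean, unit variance; normalization: rescale into [0, 1]
        pdf["AMOUNT_STD"] = StandardScaler().fit_transform(pdf[["AMOUNT"]])
        pdf["AMOUNT_NORM"] = MinMaxScaler().fit_transform(pdf[["AMOUNT"]])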

    Data Transformations

    1. Data frames (e.g., Pandas, Snowpark)
    2. Derived features (e.g., average spend)
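
    A derived feature such as average spend is a one-line aggregation in either frame; a Snowpark sketch (column names hypothetical):

        from snowflake.snowpark.functions import avg

        avg_spend = clean.group_by("customer_id").agg(avg("amount").alias("avg_spend"))
        features = clean.join(avg_spend, on="customer_id")   # derived feature per customer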

    Binarizing data

    1. Binning continuous data into intervals
    2. Label encoding
    3. One hot encoding
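
    A pandas sketch of all three, continuing from the derived-feature example (bin edges and column names are made up):

        import pandas as pd

        fdf = features.to_pandas()

        # Binning continuous data into intervals
        fdf["SPEND_BUCKET"] = pd.cut(fdf["AVG_SPEND"],
                                     bins=[0, 50, 200, float("inf")],
                                     labels=["low", "mid", "high"])

        # Label encoding: one integer per category
        fdf["SPEND_CODE"] = fdf["SPEND_BUCKET"].cat.codes

        # One hot encoding: one binary column per category
        fdf = pd.get_dummies(fdf, columns=["SPEND_BUCKET"])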

    – Visualize and interpret the data to present a business case.

    Statistical summaries

    1. Snowsight with SQL
    2. Streamlit
    3. Interpret open-source graph libraries
    4. Identify data outliers

    Common types of visualization formats

    1. Bar charts
    2. Scatterplots
    3. Heat maps
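
    A minimal Streamlit sketch that turns the feature frame from the earlier examples into a simple dashboard (the SEGMENT column is hypothetical):

        import streamlit as st

        st.title("Average spend by segment")        # the business case being presented
        st.dataframe(fdf.describe())                # statistical summary table
        st.bar_chart(fdf.groupby("SEGMENT")["AVG_SPEND"].mean())   # bar chart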

    Model Development

    – Connect data science tools directly to data in Snowflake.

    1. Connecting Python to Snowflake
    2. Snowpark
    3. Python connector with Pandas support
    4. Spark connector
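
    A sketch of the plain Python connector with pandas support (credentials are placeholders; the pandas extra of snowflake-connector-python is assumed to be installed):

        import snowflake.connector

        conn = snowflake.connector.connect(
            account="<account>", user="<user>", password="<password>",
            warehouse="<warehouse>", database="<db>", schema="<schema>",
        )
        cur = conn.cursor()
        cur.execute("SELECT * FROM customer_orders LIMIT 1000")
        pdf = cur.fetch_pandas_all()    # results straight into a pandas DataFrame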

    Snowflake Best Practices

    1. One platform, one copy of data, many workloads
    2. Enrich datasets using the Snowflake Marketplace
    3. External tables
    4. External functions
    5. Zero-copy cloning for training snapshots
    6. Data governance
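
    Zero-copy cloning makes training snapshots effectively free; a one-line sketch (names hypothetical):

        # An instant, storage-free snapshot to train and audit against
        session.sql("CREATE TABLE training_snapshot_q1 CLONE customer_orders").collect()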

    – Train a data science model.

    1. Hyperparameter tuning
    2. Optimization metric selection (e.g., log loss, AUC, RMSE)
    3. Partitioning
      1. Cross validation
      2. Train validation hold-out
    4. Down/Up-sampling
    5. Training with Python stored procedures
    6. Training outside Snowflake through external functions
    7. Training with Python User-Defined Table Functions (UDTFs)
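
    A scikit-learn sketch covering the first four items: a train/validation hold-out plus grid search with cross validation (the feature frame and its CHURNED label are hypothetical and assumed numeric):

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV, train_test_split

        X, y = fdf.drop(columns=["CHURNED"]), fdf["CHURNED"]
        X_train, X_hold, y_train, y_hold = train_test_split(
            X, y, test_size=0.2, random_state=0)       # train/validation hold-out

        search = GridSearchCV(
            RandomForestClassifier(random_state=0),
            param_grid={"n_estimators": [100, 300], "max_depth": [5, 10]},
            scoring="neg_log_loss",                    # optimization metric selection
            cv=5,                                      # cross validation
        )
        search.fit(X_train, y_train)
        model = search.best_estimator_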

    – Validate a data science model.

    1. ROC curve/confusion matrix
      1. Calculate the expected payout of the model
    2. Regression problems
    3. Residuals plot
      1. Interpret graphics with context
    4. Model metrics
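
    A validation sketch on the hold-out split from the training example:

        from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

        probs = model.predict_proba(X_hold)[:, 1]
        preds = (probs >= 0.5).astype(int)

        print(confusion_matrix(y_hold, preds))          # TN/FP/FN/TP counts
        print("AUC:", roc_auc_score(y_hold, probs))     # area under the ROC curve
        fpr, tpr, thresholds = roc_curve(y_hold, probs) # points for plotting the curve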

    – Interpret a model.

    1. Feature impact
    2. Partial dependence plots
    3. Confidence intervals
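
    A sketch of the first two items with scikit-learn's inspection tools (the AVG_SPEND feature name is hypothetical):

        from sklearn.inspection import PartialDependenceDisplay, permutation_importance

        # Feature impact: drop in score when each feature is shuffled
        imp = permutation_importance(model, X_hold, y_hold, n_repeats=10, random_state=0)

        # Partial dependence of the prediction on a single feature
        PartialDependenceDisplay.from_estimator(model, X_hold, features=["AVG_SPEND"])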

    Model Deployment

    – Move a data science model into production.

    1. Use an external hosted model
      1. External functions
      2. Pre-built models
    2. Deploy a model in Snowflake
      1. Vectorized/Scalar Python User Defined Functions (UDFs)
      2. Pre-built models
      3. Storing predictions
      4. Stage commands
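
    A deployment sketch, assuming the tuned model from the training example and a hypothetical @model_stage; a vectorized UDF would score batches faster, but the scalar variant keeps the sketch short:

        import joblib
        from snowflake.snowpark.types import FloatType

        # Serialize the model and upload it to a stage
        joblib.dump(model, "/tmp/model.joblib")
        session.file.put("file:///tmp/model.joblib", "@model_stage", auto_compress=False)

        def predict_churn(avg_spend: float, order_count: float) -> float:
            import sys, joblib
            import_dir = sys._xoptions["snowflake_import_directory"]  # where imports land
            clf = joblib.load(import_dir + "model.joblib")
            return float(clf.predict_proba([[avg_spend, order_count]])[0, 1])

        session.udf.register(
            predict_churn, name="predict_churn", replace=True, is_permanent=True,
            stage_location="@model_stage", imports=["@model_stage/model.joblib"],
            packages=["scikit-learn", "joblib"],
            return_type=FloatType(), input_types=[FloatType(), FloatType()],
        )

        # Store predictions back into a table
        session.sql("""
            CREATE OR REPLACE TABLE churn_predictions AS
            SELECT customer_id, predict_churn(avg_spend, order_count) AS churn_prob
            FROM features_table
        """).collect()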

    – Determine the effectiveness of a model and retrain if necessary.

    1. Metrics for model evaluation
      1. Data drift/model decay
      2. Data distribution comparisons
        1. Does the data used to make predictions look similar to the training data?
        2. Do the same data points give the same predictions once the model is deployed?
    2. Area under the curve
    3. Accuracy, precision, recall
    4. User-defined functions (UDFs)
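
    A drift-check sketch: compare a feature's training distribution with recent scoring data using a two-sample Kolmogorov-Smirnov test (table name hypothetical):

        from scipy.stats import ks_2samp

        recent = session.table("recent_scoring_data").select("AVG_SPEND").to_pandas()

        stat, p_value = ks_2samp(X_train["AVG_SPEND"], recent["AVG_SPEND"])
        if p_value < 0.05:
            print("Distribution shift detected; consider retraining")   # data drift signal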

    – Outline model lifecycle and validation tools.

    1. Streams and tasks
    2. Metadata tagging
    3. Model versioning with partner tools
    4. Automation of model retraining
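
    A small sketch of metadata tagging for lifecycle tracking (tag, value, and table names are hypothetical):

        session.sql("CREATE TAG IF NOT EXISTS model_version").collect()
        session.sql(
            "ALTER TABLE churn_predictions SET TAG model_version = 'churn_rf_v2'"
        ).collect()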

    For more information on SnowPro® Advanced: Data Scientist, please visit the official Snowflake certification page.

    Contact Locus IT support team for further training details.

     
