SnowPro® Advanced: Data Scientist training

Duration: Hours

    Training Mode: Online

    Description

    SnowPro® Advanced: Data Scientist Overview

    The SnowPro® Advanced: Data Scientist Certification tests advanced knowledge and skills used to apply comprehensive data science principles, tools, and methodologies using Snowflake. Candidates are tested on the ability to:

    1. Outline data science concepts
    2. Implement Snowflake data science best practices
    3. Prepare data and feature engineering in Snowflake
    4. Train and use machine learning models
    5. Use data visualization to present a business case (e.g., model explainability)
    6. Implement model lifecycle management

    SnowPro® Advanced: Data Scientist Candidate

    2+ years of practical data science experience with Snowflake in an enterprise environment. In addition, successful candidates may have:

    1. A statistical, mathematical, or science education (or equivalent work experience)
    2. Background working with one or more programming languages (e.g., Python, R, SQL, PySpark)
    3. Experience modeling and using machine learning platforms (e.g., SageMaker, Azure Machine Learning, GCP AI platform, AutoML tools, etc.)
    4. An understanding of various open source and commercial frameworks and libraries (e.g., scikit-learn, TensorFlow, etc.)
    5. Experience preparing, cleaning, and transforming data sets from multiple sources
    6. Experience creating features for machine learning training
    7. Experience validating and interpreting models
    8. Experience putting a model into production and monitoring the model in production
    9. Experience presenting data using visualization tools

    Target Audience:

    1. Data Scientists
    2. AI/ML Engineers

    Exam Format:

    1. Exam Version: DSA-C02
    2. Total Number of Questions: 65
    3. Question Types: Multiple Select, Multiple Choice
    4. Time Limit: 115 minutes
    5. Language: English
    6. Registration fee: $375 USD
    7. Passing Score: 750+ (scaled scoring from 0 – 1000)

    Unscored Content:

    Exams may include unscored items to gather statistical information for future use. These items are not identified on the exam form and do not affect your score; additional time is factored in to account for this content.

    Prerequisites:

    SnowPro Core Certified

    Delivery Options:

    1. Online Proctoring
    2. Onsite Testing Centers

    Exam Domain Breakdown:

    This exam guide includes test domains, weightings, and objectives. It is not a comprehensive listing of all the content that will be presented on this examination. The table below lists the main content domains and their weightings.

    Domain                                         Weighting on Exam
    1.0 Data Science Concepts                      15-20%
    2.0 Data Pipelining                            15-20%
    3.0 Data Preparation and Feature Engineering   30-35%
    4.0 Model Development                          15-20%
    5.0 Model Deployment                           15-20%

    Exam Topics:

    Outlined below are the Domains & Objectives measured on the exam. To view subtopics, download the exam study guide.

    Data Science Concepts

    1. Define machine learning concepts for data science workloads.
    2. Outline machine learning problem types.
    3. Summarize the machine learning lifecycle.
    4. Define statistical concepts for data science.

    Data Pipelining

    1. Enrich data by consuming data sharing sources.
    2. Build a data science pipeline.

    Data Preparation and Feature Engineering

    1. Prepare and clean data in Snowflake.
    2. Perform exploratory data analysis in Snowflake.
    3. Perform feature engineering on Snowflake data.
    4. Visualize and interpret the data to present a business case.

    Model Development

    1. Connect data science tools directly to data in Snowflake.
    2. Train a data science model.
    3. Validate a data science model.
    4. Interpret a model.

    Model Deployment

    1. Move a data science model into production.
    2. Determine the effectiveness of a model and retrain if necessary.
    3. Outline model lifecycle and validation tools.

    TABLE OF CONTENTS

    – Define machine learning concepts for data science workloads.

    Machine Learning

    1. Supervised learning
    2. Unsupervised learning

    – Outline machine learning problem types.

    Supervised Learning

    1. Structured Data
    2. Linear regression
    3. Binary classification
    4. Multi-class classification
    5. Time-series forecasting
    6. Unstructured Data
    7. Image classification
    8. Segmentation

    Unsupervised Learning

    1. Clustering
    2. Association models

    Data Science Concepts

    – Summarize the machine learning lifecycle.

    1. Data collection
    2. Data visualization and exploration
    3. Feature engineering
    4. Training models
    5. Model deployment
    6. Model monitoring and evaluation (e.g., model explainability, precision, recall, accuracy, confusion matrix)
    7. Model versioning

    – Define statistical concepts for data science.

    1. Normal versus skewed distributions (e.g., mean, outliers)
    2. Central limit theorem
    3. Z and T tests
    4. Bootstrapping
    5. Confidence intervals
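    The bootstrapping and confidence-interval topics above can be illustrated with a short, self-contained sketch (the sample data and function name are illustrative, not from the exam guide):

    ```python
    import random
    import statistics

    def bootstrap_ci(sample, n_resamples=1000, alpha=0.05, seed=42):
        """Estimate a confidence interval for the mean by bootstrapping:
        resample with replacement, compute each resample's mean, and take
        the empirical alpha/2 and 1 - alpha/2 quantiles of those means."""
        rng = random.Random(seed)
        means = sorted(
            statistics.fmean(rng.choices(sample, k=len(sample)))
            for _ in range(n_resamples)
        )
        lo = means[int((alpha / 2) * n_resamples)]
        hi = means[int((1 - alpha / 2) * n_resamples) - 1]
        return lo, hi

    # Example: a 95% confidence interval for the mean of a small skewed sample
    data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
    low, high = bootstrap_ci(data)
    ```

    Bootstrapping avoids distributional assumptions, which is why it pairs naturally with the skewed-distribution and central-limit-theorem topics listed above.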

    – Enrich data by consuming data sharing sources.

    1. Snowflake Marketplace
    2. Direct Sharing
    3. Shared database considerations

    – Build a data science pipeline.

    Data Pipelining

    1. Automation of data transformation with streams and tasks
    2. Python User-Defined Functions (UDFs)
    3. Python User-Defined Table Functions (UDTFs)
    4. Python stored procedures
    5. Integration with machine learning platforms (e.g., connectors, ML partners, etc.)
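    A Python UDF handler for a pipeline like the one outlined above is a plain function; Snowflake invokes it once per row. A minimal sketch follows, with the handler and registration names (`clean_email`, `session`) as illustrative assumptions:

    ```python
    # A Python UDF handler is just a function; Snowflake calls it per row.
    # The names here (clean_email, session) are illustrative, not from the guide.

    def clean_email(raw):
        """Normalize an email address: trim whitespace and lowercase."""
        if raw is None:
            return None
        return raw.strip().lower()

    # With a Snowpark session available, registration could look roughly like:
    #
    #   from snowflake.snowpark.types import StringType
    #   session.udf.register(
    #       clean_email, name="CLEAN_EMAIL",
    #       input_types=[StringType()], return_type=StringType(),
    #   )
    #
    # after which SQL can call it: SELECT CLEAN_EMAIL(email_col) FROM customers;
    ```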

    – Prepare and clean data in Snowflake.

    Use Snowpark for Python and SQL

    1. Aggregate
    2. Joins
    3. Identify critical data
    4. Remove duplicates
    5. Remove irrelevant fields
    6. Handle missing values
    7. Data type casting
    8. Sampling data
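    The cleaning steps above (deduplication, dropping irrelevant fields, handling missing values, type casting) can be sketched locally with pandas as a stand-in for the equivalent Snowpark DataFrame calls; column names are hypothetical:

    ```python
    import pandas as pd

    raw = pd.DataFrame({
        "id":     [1, 1, 2, 3, 4],
        "amount": ["10.5", "10.5", None, "7.0", "3.2"],
        "note":   ["a", "a", "b", "c", "d"],   # irrelevant field for the model
    })

    cleaned = (
        raw.drop_duplicates()                  # remove duplicate rows
           .drop(columns=["note"])             # remove irrelevant fields
           .assign(amount=lambda d: pd.to_numeric(d["amount"]))  # type casting
           .dropna(subset=["amount"])          # handle missing values
           .reset_index(drop=True)
    )
    ```

    Snowpark DataFrames expose similar chained operations (e.g., `drop_duplicates`, `drop`, `cast`), so the same pipeline shape pushes down to Snowflake.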

    – Perform exploratory data analysis in Snowflake.

    Snowpark and SQL

    1. Identify initial patterns (i.e., data profiling)
    2. Connect external machine learning platforms and/or notebooks (e.g., Jupyter)

    Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.

    1. Window Functions
    2. MIN/MAX/AVG/STDEV
    3. VARIANCE
    4. TOPn
    5. Approximation/high-performing functions
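    The descriptive statistics above (MIN/MAX/AVG/STDDEV per group) can be mirrored locally with a pandas groupby before being pushed down to Snowflake SQL; the table and column names are hypothetical:

    ```python
    import pandas as pd

    sales = pd.DataFrame({
        "region": ["east", "east", "west", "west", "west"],
        "amount": [10.0, 20.0, 5.0, 15.0, 25.0],
    })

    # Equivalent in spirit to:
    #   SELECT region, MIN(amount), MAX(amount), AVG(amount), STDDEV(amount)
    #   FROM sales GROUP BY region;
    stats = sales.groupby("region")["amount"].agg(["min", "max", "mean", "std"])
    ```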

    Linear Regression

    1. Find the slope and intercept
    2. Verify the dependencies on dependent and independent variables
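    Finding the slope and intercept has a closed form (slope = cov(x, y) / var(x)); a minimal sketch with illustrative data:

    ```python
    def fit_line(xs, ys):
        """Ordinary least squares for y = slope * x + intercept,
        using the closed-form solution slope = cov(x, y) / var(x)."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        slope = cov_xy / var_x
        intercept = mean_y - slope * mean_x
        return slope, intercept

    # Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    ```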

    Data Preparation and Feature Engineering

    – Perform feature engineering on Snowflake data.

    Preprocessing

    1. Scaling data
    2. Encoding
    3. Normalization
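    The scaling and normalization steps above amount to simple arithmetic; a self-contained sketch of z-score scaling and min-max normalization (sample values are illustrative):

    ```python
    def standardize(values):
        """Z-score scaling: subtract the mean, divide by the (population) std."""
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        return [(v - mean) / std for v in values]

    def min_max(values):
        """Min-max normalization to the [0, 1] range."""
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    scaled = standardize([2.0, 4.0, 6.0])   # centered at 0, unit variance
    normed = min_max([2.0, 4.0, 6.0])       # rescaled into [0, 1]
    ```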

    Data Transformations

    1. Data Frames (i.e., Pandas, Snowpark)
    2. Derived features (e.g., average spend)

    Binarizing data

    1. Binning continuous data into intervals
    2. Label encoding
    3. One hot encoding

    – Visualize and interpret the data to present a business case.

    Statistical summaries

    1. Snowsight with SQL
    2. Streamlit
    3. Interpret open-source graph libraries
    4. Identify data outliers

    Common types of visualization formats

    1. Bar charts
    2. Scatterplots
    3. Heat maps

    – Connect data science tools directly to data in Snowflake.

    1. Connecting Python to Snowflake
    2. Snowpark
    3. Python connector with Pandas support
    4. Spark connector

    Snowflake Best Practices

    1. One platform, one copy of data, many workloads
    2. Enrich datasets using the Snowflake Marketplace
    3. External tables
    4. External functions
    5. Zero-copy cloning for training snapshots
    6. Data governance

    – Train a data science model.

    1. Hyperparameter tuning
    2. Optimization metric selection (e.g., log loss, AUC, RMSE)
    3. Partitioning
      1. Cross validation
      2. Train validation hold-out
    4. Down/Up-sampling
    5. Training with Python stored procedures
    6. Training outside Snowflake through external functions
    7. Training with Python User-Defined Table Functions (UDTFs)
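    The partitioning item above (cross validation versus a train/validation hold-out) can be sketched as a plain index-splitting helper; the function name is illustrative:

    ```python
    def k_fold_indices(n_rows, k):
        """Partition row indices into k folds for cross validation.
        Each fold serves once as the validation set; the rest train."""
        folds = [list(range(i, n_rows, k)) for i in range(k)]
        splits = []
        for held_out in range(k):
            val = folds[held_out]
            train = [i for f in range(k) if f != held_out for i in folds[f]]
            splits.append((sorted(train), sorted(val)))
        return splits

    # 5-fold split over 10 rows: five (train, validation) index pairs
    splits = k_fold_indices(10, 5)
    ```

    A train/validation hold-out is the degenerate case of using just one of these splits.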

    Model Development

    – Validate a data science model.

    1. ROC curve/confusion matrix
      1. Calculate the expected payout of the model
    2. Regression problems
    3. Residuals plot
      1. Interpret graphics with context
    4. Model metrics
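    The confusion-matrix material above reduces to four counts, from which precision and recall follow; a minimal sketch with illustrative labels:

    ```python
    def confusion_counts(y_true, y_pred):
        """Binary confusion-matrix counts: (TP, FP, FN, TN)."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        return tp, fp, fn, tn

    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were found
    ```

    Weighting these counts by per-outcome costs gives the expected payout of the model mentioned above.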

    – Interpret a model.

    1. Feature impact
    2. Partial dependence plots
    3. Confidence intervals

    – Move a data science model into production.

    1. Use an external hosted model
      1. External functions
      2. Pre-built models
    2. Deploy a model in Snowflake
      1. Vectorized/Scalar Python User Defined Functions (UDFs)
      2. Pre-built models
      3. Storing predictions
      4. Stage commands
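    For the vectorized UDF deployment path above, the handler receives a whole batch as a pandas Series and returns a Series of the same length. A sketch, with the toy "model" and names as illustrative assumptions:

    ```python
    import pandas as pd

    # A vectorized UDF handler processes a batch at a time; with Snowpark it
    # would be registered (e.g., via snowflake.snowpark.functions.pandas_udf)
    # so Snowflake passes column batches as pandas Series. The scoring rule
    # here is a stand-in for a real trained model loaded from a stage.

    def score_batch(amounts):
        """Toy 'model': flag amounts more than 2x the batch median."""
        return (amounts > 2 * amounts.median()).astype(int)

    preds = score_batch(pd.Series([10.0, 12.0, 11.0, 50.0]))
    ```

    Batch-at-a-time handlers amortize per-call overhead, which is the motivation for preferring vectorized over scalar UDFs when scoring large tables.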

    Model Deployment

    – Determine the effectiveness of a model and retrain if necessary.

    1. Metrics for model evaluation
      1. Data drift/model decay
      2. Data distribution comparisons
        1. Does the data used to make predictions look similar to the training data?
        2. Do the same data points give the same predictions once a model is deployed?
    2. Area under the curve
    3. Accuracy, precision, recall
    4. User-defined functions (UDFs)
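    One common way to quantify the data-distribution comparison above is the Population Stability Index (PSI); a self-contained sketch, with the usual rule-of-thumb threshold noted in the docstring:

    ```python
    import math

    def psi(expected, actual, bins=4):
        """Population Stability Index: compare a scoring-time distribution
        against the training distribution over shared bins. Values above
        roughly 0.2 are a common rule-of-thumb trigger for retraining."""
        lo = min(min(expected), min(actual))
        hi = max(max(expected), max(actual))
        width = (hi - lo) / bins or 1.0

        def frac(values, b):
            left = lo + b * width
            right = lo + (b + 1) * width
            hits = sum(1 for v in values
                       if left <= v < right or (b == bins - 1 and v == hi))
            return max(hits / len(values), 1e-6)   # avoid log(0)

        return sum(
            (frac(actual, b) - frac(expected, b))
            * math.log(frac(actual, b) / frac(expected, b))
            for b in range(bins)
        )

    same = psi([1, 2, 3, 4] * 25, [1, 2, 3, 4] * 25)       # no drift
    shifted = psi([1, 2, 3, 4] * 25, [3, 4, 4, 4] * 25)    # heavy drift
    ```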

    – Outline model lifecycle and validation tools.

    1. Streams and tasks
    2. Metadata tagging
    3. Model versioning with partner tools
    4. Automation of model retraining

    For more information on SnowPro® Advanced: Data Scientist, please visit the Snowflake certification website.

    Contact Locus IT support team for further training details.

     

