Duration: Hours

Training Mode: Online

    Description

    PySpark is the Python API for Apache Spark. It enables you to perform real-time, large-scale data processing in a distributed environment using Python. PySpark lets you work with Spark's Resilient Distributed Datasets (RDDs) from the Python programming language.

    Course Content

    1-Operating System

    • Intro to Operating System
    • Important Unix Commands

    2-Python

    • Main constructs of any programming language: Sequence, Condition, Loop
    • Working with Python packages: types of packages
    • Importing and installing packages
    • Searching for Python packages
    • IDE Familiarity – Spyder/PyCharm/Jupyter Notebook
    • Python Operators including bitwise operators
    • Variables & Types
    • Conditional statements – if/else
    • Loops
    • Working with strings and arrays
    • Functions
    • Data Libraries (NumPy, pandas)

    3-RDBMS

    • Database Architecture
    • Data modelling
    • Relational Database concepts
    • Database design and schema
    • DDL – Create, Alter, Drop Databases
    • DML – Load and Query Data

    4-Data warehousing

    • Overview of Data Warehousing
    • Concepts and architecture of Data Warehouses

    5-Big Data Concepts

    • Introduction to Big Data
    • Distributed computing and Hadoop Architecture

    6-Storage

    • Storing data on Hadoop – HDFS

    7-PySpark

    • Spark Architecture
    • Spark Session
    • Spark Language APIs
    • Data Frame and Partitions
    • Transformations & Actions
    • Structured APIs (PySpark Python API)
    • Schemas
    • Spark Types
    • Structured API Execution
    • Operation on Data Frames
    • Working with Different Data Types
    • Aggregations in Spark
    • Joins in Spark
    • RDD and RDD Operations, DAG

    8-PySpark Streaming

    • PySpark Streaming
    • Structured Streaming
