Duration: Hours

Pandas is a game-changer for data science and analytics. It provides fast, flexible, and expressive data structures designed to make working with relational and labeled data easy and intuitive. Although it was not the first tool for data analysis, its popularity has grown steadily over the years.

Training Mode: Online

Description

Objectives :

1. Bring your Data Handling & Data Analysis skills to an outstanding level.
2. Learn and practice all relevant Pandas methods and workflows with Real-World Datasets
3. Learn Pandas based on NEW Version 1.x (the days of versions 0.x are over)
4. Import, clean, and merge messy Data and prepare Data for Machine Learning
5. Master a complete Machine Learning Project A-Z with Pandas, Scikit-Learn, and Seaborn
6. Analyze, visualize, and understand your Data with Pandas, Matplotlib, and Seaborn
7. Practice and master your Pandas skills with Quizzes, 150+ Exercises, and Comprehensive Projects
8. Import Financial/Stock Data from Web Sources and analyze them with Pandas
9. Learn and master the most important Pandas workflows for Finance
10. Learn how to best transition from Versions 0.x to new Version 1.x
11. Learn the Basics of Pandas and NumPy Coding (Appendix)
12. Learn and master important Statistical Concepts with SciPy

Overview

a). Installation of Anaconda

b). Opening a Jupyter Notebook

c). How to use Jupyter Notebooks

d). How to tackle Pandas Version 1.0

Part 1 : Pandas from Zero to Hero (Building blocks)

  1. Introduction to Tabular Data / Pandas.

a). Pandas Basics (DataFrame Basics 1)

  1. Create your very first Pandas DataFrame (from csv)
  2. Pandas Display Options and the methods head() & tail()
  3. First Data Inspection
  4. Built-in Functions, Attributes and Methods with Pandas
  5. Make it easy: TAB Completion and Tooltip
  6. Selecting Columns
  7. Selecting one Column with the “dot notation”
  8. Zero-based Indexing and Negative Indexing
  9. Selecting Rows with iloc (position-based indexing)
  10. Slicing Rows and Columns with iloc (position-based indexing)
  11. Selecting Rows with loc (label-based indexing)
  12. Slicing Rows and Columns with loc (label-based indexing)
  13. Indexing and Slicing with reindex()
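
The selection topics above can be sketched on a tiny made-up DataFrame. This is an illustrative example, not course material; the column names and row labels are assumptions. It contrasts position-based iloc (end excluded) with label-based loc (end included) and shows that reindex() inserts NaN for labels that do not exist.

```python
import pandas as pd

# Small made-up DataFrame (hypothetical data, not a course dataset)
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid", "Dee"],
     "age": [28, 35, 42, 19],
     "city": ["Oslo", "Rome", "Lima", "Kyiv"]},
    index=["a", "b", "c", "d"],
)

# Column selection: bracket notation and dot notation
ages = df["age"]
cities = df.city

# Position-based indexing with iloc (zero-based; -1 is the last row)
first_row = df.iloc[0]
last_row = df.iloc[-1]
block = df.iloc[1:3, 0:2]  # rows 1-2, columns 0-1 (end position excluded)

# Label-based indexing with loc (end label included)
row_b = df.loc["b"]
subset = df.loc["b":"c", ["name", "city"]]

# reindex() selects/reorders labels; labels not in the index become NaN
reordered = df.reindex(index=["d", "a", "z"])
```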

b). Pandas Series and Index Objects

  1. Intro
  2. First Steps with Pandas Series
  3. Analyzing Numerical Series with unique(), nunique() and value_counts()
  4. Analyzing non-numerical Series with unique(), nunique() and value_counts()
  5. Creating Pandas Series
  6. Indexing and Slicing Pandas Series
  7. Sorting Series and Introduction to the inplace Parameter
  8. nlargest() and nsmallest()
  9. idxmin() and idxmax()
  10. Manipulating Pandas Series
  11. First Steps with Pandas Index Objects
  12. Creating Index Objects from Scratch
  13. Changing Row Index with set_index() and reset_index()
  14. Changing Column Labels
  15. Renaming Index & Column Labels with rename()
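
As a rough illustration of the Series and Index topics, the sketch below uses a hypothetical Series of scores and a two-column DataFrame (all names and values are invented). Note that idxmax() returns the label of the first maximum when there are ties, and that sort_values() returns a new Series unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical Series of exam scores, labeled by name
scores = pd.Series([88, 92, 75, 92, 60], index=["Ann", "Bob", "Cid", "Dee", "Eve"])

n_distinct = scores.nunique()                  # number of distinct values
counts = scores.value_counts()                 # frequency of each value
top2 = scores.nlargest(2)                      # two largest values
best = scores.idxmax()                         # label of the first maximum
ranked = scores.sort_values(ascending=False)   # new Series; original unchanged

# Index objects: move a column into the row index and back
df = pd.DataFrame({"id": [101, 102], "score": [88, 92]})
indexed = df.set_index("id")
restored = indexed.reset_index()
renamed = df.rename(columns={"score": "points"})
```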

c). DataFrame Basics

  1. Intro
  2. Filtering DataFrames by one Condition
  3. Filtering DataFrames by many Conditions (AND)
  4. Filtering DataFrames by many Conditions (OR)
  5. Advanced Filtering with between(), isin() and ~
  6. any() and all()
  7. Removing Columns
  8. Removing Rows
  9. Adding new Columns to a DataFrame
  10. Creating Columns based on other Columns
  11. Adding Columns with insert()
  12. Adding new Rows (hands-on approach)
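
The filtering and column-creation topics can be summarized in one short sketch on an invented product table (column names are assumptions). Conditions are combined with & and |, each wrapped in parentheses, and ~ inverts a boolean mask.

```python
import pandas as pd

# Invented product table for illustration only
df = pd.DataFrame({
    "product": ["A", "B", "C", "D"],
    "price": [10.0, 25.0, 40.0, 5.0],
    "qty": [3, 0, 2, 10],
})

# One condition
expensive = df[df["price"] > 20]
# Several conditions: & (AND) and | (OR); parentheses are required
cheap_and_stocked = df[(df["price"] < 20) & (df["qty"] > 0)]
b_or_d = df[(df["product"] == "B") | (df["product"] == "D")]
# between() includes both bounds by default; ~ negates a mask
mid_range = df[df["price"].between(10, 25)]
not_a_or_c = df[~df["product"].isin(["A", "C"])]

# New column computed from other columns, then insert() at position 1
df["revenue"] = df["price"] * df["qty"]
df.insert(1, "in_stock", df["qty"] > 0)
```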

d). Manipulating Elements in DataFrames / Slices

  1. Intro
  2. View vs. Copy
  3. Simple Rules: What to Do When...
  4. Manipulating DataFrames / Slices

e). Visualization with Matplotlib

  1. Intro
  2. The plot() method
  3. Customization of Plots
  4. Histograms
  5. Bar Charts and Pie Charts
  6. Scatterplots

Part 2 : Full Data workflow A – Z

a). Importing Data

  1. Importing csv-files with pd.read_csv()
  2. Importing messy csv-files with pd.read_csv()
  3. Importing messy Data from Excel with pd.read_excel()
  4. Importing Data from the Web with pd.read_html()
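
What "messy" means varies by file; the sketch below assumes a hypothetical CSV with a metadata line on top, semicolons as separators, commas as decimal marks, and "-" for missing values. An in-memory io.StringIO stands in for a file path, and the parameters shown are standard pd.read_csv() options.

```python
import io

import pandas as pd

# Hypothetical messy CSV held in memory instead of on disk
messy = io.StringIO(
    "exported by some tool\n"
    "name;price;date\n"
    "A;1,5;2022-01-03\n"
    "B;-;2022-01-04\n"
)

df = pd.read_csv(
    messy,
    sep=";",               # non-default delimiter
    skiprows=1,            # skip the metadata line above the header
    decimal=",",           # comma used as the decimal separator
    na_values=["-"],       # treat "-" as a missing value
    parse_dates=["date"],  # convert the column to datetime64
)
```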

b). Cleaning Data

  1. String Operations
  2. Changing Datatype of Columns with astype()
  3. Intro NA values / missing values
  4. Detection of missing Values
  5. Removing missing values
  6. Replacing missing values
  7. Intro Duplicates
  8. Detection of Duplicates
  9. Handling / Removing Duplicates
  10. The ignore_index parameter (NEW in Pandas 1.0)
  11. Detection of Outliers
  12. Handling / Removing Outliers
  13. Categorical Data
  14. Pandas Version 1.0: New dtypes and pd.NA
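
Several cleaning steps from this module are combined below on a small invented table; the data and column names are assumptions. The example relies on drop_duplicates(ignore_index=True) (added in pandas 1.0) and the nullable "Int64" dtype. Filling with the column mean is just one possible strategy, chosen here for brevity.

```python
import numpy as np
import pandas as pd

# Invented raw data with stray whitespace, a duplicate row and missing values
raw = pd.DataFrame({
    "name": [" ann ", "BOB", "BOB", "cid"],
    "age": ["28", "35", "35", np.nan],
    "score": [88.0, np.nan, np.nan, 75.0],
})

clean = raw.copy()
# String operations via the .str accessor
clean["name"] = clean["name"].str.strip().str.title()
# Detect and remove duplicate rows; ignore_index=True renumbers rows 0..n-1
n_dupes = clean.duplicated().sum()
clean = clean.drop_duplicates(ignore_index=True)
# Detect missing values per column, then replace them (here: column mean)
missing_per_col = clean.isna().sum()
clean["score"] = clean["score"].fillna(clean["score"].mean())
# Convert text to numbers, then to the nullable Int64 dtype (keeps <NA>)
clean["age"] = pd.to_numeric(clean["age"]).astype("Int64")
# Categorical dtype for a column with few distinct values
clean["name"] = clean["name"].astype("category")
```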

c). Merging, Joining and Concatenating Data

  1. Intro
  2. Adding Rows with append() and pd.concat()
  3. Arithmetic with Pandas Objects / Data Alignment
  4. Inner Joins with merge()
  5. Outer Joins (without Intersection) with merge()
  6. Left Joins (without Intersection) with merge()
  7. Right Joins (without Intersection) with merge()
  8. Left Joins with merge()
  9. Right Joins with merge()
  10. Joining on different Column Names / Indexes
  11. Joining on more than one Column
  12. pd.merge() and join()
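
The join variants above map onto the how= parameter of merge(). For the "without intersection" cases, this sketch uses indicator=True and keeps only non-matching rows; that is one common approach, not necessarily the course's. The customer/order tables are invented.

```python
import pandas as pd

# Invented tables; the key columns deliberately have different names
customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cid"]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer": [1, 1, 4],
                       "amount": [50, 20, 70]})

# Inner join: only keys present in both tables
inner = customers.merge(orders, how="inner", left_on="cust_id", right_on="customer")
# Left join keeps all customers; indicator=True adds a _merge column
left = customers.merge(orders, how="left", left_on="cust_id",
                       right_on="customer", indicator=True)
# Left join "without intersection": customers with no orders
no_orders = left[left["_merge"] == "left_only"]
# Outer join "without intersection": unmatched rows from either side
outer = customers.merge(orders, how="outer", left_on="cust_id",
                        right_on="customer", indicator=True)
unmatched = outer[outer["_merge"] != "both"]

# Adding rows with pd.concat (DataFrame.append was removed in pandas 2.0)
more = pd.DataFrame({"cust_id": [4], "name": ["Dee"]})
all_customers = pd.concat([customers, more], ignore_index=True)
```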

d). GroupBy Operations

  1. Understanding the GroupBy Object
  2. Splitting with many Keys
  3. split-apply-combine explained
  4. split-apply-combine applied
  5. Advanced aggregation with agg()
  6. GroupBy Aggregation with Relabeling (NEW – Pandas Version 0.25)
  7. Transformation with transform()
  8. Replacing NA Values by group-specific Values
  9. Generalizing split-apply-combine with apply()
  10. Hierarchical Indexing with GroupBy
  11. stack() and unstack()
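
The split-apply-combine pattern can be sketched on an invented sales table (column names are assumptions). It shows a plain aggregation, named aggregation (relabeling, pandas 0.25+), transform(), a group-specific NA fill, and a two-key grouping that is unstacked into columns.

```python
import numpy as np
import pandas as pd

# Invented sales data with one missing value
sales = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "rep": ["a", "b", "c", "c", "d"],
    "units": [10, 20, np.nan, 40, 60],
})

g = sales.groupby("region")
# split-apply-combine: one aggregate per group (NaN is skipped)
totals = g["units"].sum()
# Named aggregation / relabeling: new_name=(column, function)
summary = g.agg(total=("units", "sum"), avg=("units", "mean"))
# transform() returns a result aligned with the original rows
sales["region_mean"] = g["units"].transform("mean")
# Replace NA values with a group-specific value
sales["units_filled"] = sales["units"].fillna(sales["region_mean"])
# Two keys give a MultiIndex; unstack() moves the inner level to columns
by_two = sales.groupby(["region", "rep"])["units"].sum()
wide = by_two.unstack()
```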

e). Reshaping and pivoting DataFrames

  1. Transposing Rows and Columns
  2. Pivoting DataFrames with pivot()
  3. Limits of pivot()
  4. pivot_table()
  5. pd.crosstab()
  6. melting DataFrames with melt()
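
One way to see the "limits of pivot()" is with a duplicate (index, column) pair. In that case pivot() raises a ValueError, while pivot_table() aggregates the duplicates. The long-format price table below is invented, and the mean aggregation is an arbitrary choice.

```python
import pandas as pd

# Invented long-format data; ("d2", "Y") appears twice
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2", "d2"],
    "ticker": ["X", "Y", "X", "Y", "Y"],
    "price": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# long.pivot(index="date", columns="ticker", values="price") would raise
# ValueError because of the duplicate pair; pivot_table() aggregates instead
wide = long.pivot_table(index="date", columns="ticker", values="price", aggfunc="mean")
# Transposing swaps rows and columns
flipped = wide.T
# crosstab() counts how often each combination occurs
counts = pd.crosstab(long["date"], long["ticker"])
# melt() turns the wide table back into long format
back = wide.reset_index().melt(id_vars="date", var_name="ticker", value_name="price")
```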

Part 3 : Comprehensive Project Challenges

a). Exploratory Data Analysis Challenges

  1. Merging and Concatenating
  2. Data Cleaning 1
  3. Impact of GDP, Population and Politics
  4. Statistical Analysis and Hypothesis Testing
  5. Aggregating and Ranking
  6. Summer Games vs. Winter Games – does Location matter?

Part 4 : Pandas for Finance, Investing & Time Series

a). Time Series Basics

  1. Converting strings to datetime objects with pd.to_datetime()
  2. Initial Analysis / Visualization of Time Series
  3. Indexing and Slicing Time Series
  4. Creating a customized DatetimeIndex with pd.date_range()
  5. More on pd.date_range()
  6. Downsampling Time Series with resample() (Part 1)
  7. Downsampling Time Series with resample() (Part 2)
  8. The PeriodIndex object
  9. Advanced Indexing with reindex()
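
A compact sketch of the time-series basics follows, using an invented daily series that starts on Saturday 2022-01-01. The "W" alias means weeks ending on Sunday, so the first weekly bin holds only two days. The frequency strings and the forward-fill choice in reindex() are illustrative.

```python
import numpy as np
import pandas as pd

# Convert strings to datetime objects
dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"])
# Custom DatetimeIndex: 10 consecutive calendar days
idx = pd.date_range(start="2022-01-01", periods=10, freq="D")
ts = pd.Series(np.arange(10, dtype=float), index=idx)

# Lookup by Timestamp and slicing by date strings (end date included)
jan_3 = ts.loc[pd.Timestamp("2022-01-03")]
first_week = ts.loc["2022-01-01":"2022-01-07"]
# Downsample to weekly sums ("W" = weeks ending on Sunday)
weekly = ts.resample("W").sum()
# PeriodIndex: represent each timestamp as a monthly period
monthly = ts.to_period("M")
# reindex() onto the full daily grid, forward-filling the gaps
sparse = ts.iloc[::3]
refilled = sparse.reindex(idx, method="ffill")
```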

b). Pandas for Finance and Investing

  1. Getting Ready (Installing required package)
  2. Importing Stock Price Data from Yahoo Finance (it still works!)
  3. Initial Inspection and Visualization
  4. Normalizing Time Series to a Base Value (100)
  5. The shift() method
  6. The methods diff() and pct_change()
  7. Financial Time Series – Return and Risk
  8. Financial Time Series – Covariance and Correlation
  9. Helpful DatetimeIndex Attributes and Methods
  10. Filling NA Values with bfill, ffill and interpolation
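
The finance workflow can be sketched on invented prices for two hypothetical tickers ("AAA", "BBB"). These are not real market data, and the example does not import anything from Yahoo Finance. Gaps are forward-filled before computing returns. Simple returns and their standard deviation are used as return and risk measures here, which is one common convention.

```python
import numpy as np
import pandas as pd

# Invented closing prices on business days (not real market data)
idx = pd.date_range("2022-01-03", periods=5, freq="B")
close = pd.DataFrame({
    "AAA": [100.0, 110.0, np.nan, 99.0, 108.9],
    "BBB": [50.0, 51.0, 52.0, 51.0, 53.0],
}, index=idx)

# Fill the gap: ffill() carries the last value forward; interpolate() is linear
filled = close.ffill()
interpolated = close.interpolate()

normalized = filled / filled.iloc[0] * 100  # both series start at base 100
prev_close = filled.shift(1)                 # previous business day's price
abs_change = filled.diff()                   # absolute price change
returns = filled.pct_change()                # simple daily returns (first row NaN)

# Mean return and risk (standard deviation) per ticker
stats = returns.agg(["mean", "std"])
# Covariance and correlation of daily returns
cov, corr = returns.cov(), returns.corr()
# A helpful DatetimeIndex method
weekdays = filled.index.day_name()
```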

Part 5 : Machine Learning with Pandas and Scikit-Learn

a). Introduction to Regression and Classification

  1. Machine Learning – an Overview
  2. Linear Regression with scikit-learn – a simple Introduction
  3. Making Predictions with Linear Regression
  4. Overfitting
  5. Underfitting

b). What’s New in Pandas Version 1.0?

  1. Intro and Overview
  2. How to update Pandas to Version 1.0
  3. Downloads for this Section
  4. Important Recap: Pandas Display Options (Changed in Version 0.25)
  5. The info() method – new and extended output
  6. NEW Extension dtypes (“nullable” dtypes): Why do we need them?
  7. Creating the NEW extension dtypes with convert_dtypes()
  8. NEW pd.NA value for missing values
  9. The NEW “nullable” Int64Dtype
  10. The NEW StringDtype
  11. The NEW “nullable” BooleanDtype
  12. Addition of the ignore_index parameter
  13. Removal of prior Version Deprecations
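
The new extension dtypes can be sketched with a small invented DataFrame. convert_dtypes() (pandas 1.0+) maps each column to a nullable dtype ("Int64", "string", "boolean"), and missing entries become the pd.NA scalar, which follows three-valued logic in boolean operations.

```python
import numpy as np
import pandas as pd

# Invented data; without conversion these columns are float64 / object / object
df = pd.DataFrame({
    "count": [1, 2, np.nan],
    "label": ["x", None, "z"],
    "flag": [True, False, None],
})

# convert_dtypes() picks the "nullable" extension dtypes where possible
converted = df.convert_dtypes()
# Missing entries are now the pd.NA scalar
missing = converted.loc[2, "count"]
# pd.NA propagates in arithmetic but follows three-valued logic in | and &
arith = pd.NA + 1
kleene = pd.NA | True
# info() lists the dtype of every column
converted.info()
```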

c). The NumPy Package

  1. Introduction to NumPy Arrays
  2. NumPy Arrays: Vectorization
  3. NumPy Arrays: Indexing and Slicing
  4. NumPy Arrays: Shape and Dimensions
  5. NumPy Arrays: Indexing and Slicing of multi-dimensional Arrays
  6. NumPy Arrays: Boolean Indexing
  7. Generating Random Numbers
  8. Performance Issues
  9. Case Study: NumPy vs. Python Standard Library
  10. Summary Statistics
  11. Visualization and (Linear) Regression
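
The NumPy appendix topics can be sketched in a few lines. The values are arbitrary. np.random.default_rng assumes NumPy 1.17 or newer. For the regression topic, np.polyfit with deg=1 is used as one simple least-squares line fit; it is not necessarily the method used in the course.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # reproducible random number generator
arr = np.arange(12).reshape(3, 4)      # 2-D array: 3 rows, 4 columns

doubled = arr * 2                      # vectorized: no explicit Python loop
row1 = arr[1]                          # second row
col_slice = arr[:, 1:3]                # all rows, columns 1 and 2
evens = arr[arr % 2 == 0]              # boolean indexing returns a 1-D array
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
summary = (samples.mean(), samples.std())  # summary statistics

# Least-squares straight-line fit (degree-1 polynomial)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
slope, intercept = np.polyfit(x, y, deg=1)
```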

Requirements :

a). A desktop computer (Windows, Mac, or Linux) capable of storing and running Anaconda. The course will walk you through installing the necessary free software.
b). An internet connection capable of streaming videos.
c). Ideally some Spreadsheet Basics/Programming Basics (not mandatory; the course guides you through the basics)

For more information on Complete Pandas Bootcamp 2022 : Data Science with Python, contact the L&D Specialist at Locus IT.
