Advanced Data Analytics with PySpark Training (DSC115)
Course Length: 2 days
Delivery Methods:
Available as private class only
Course Overview
This Advanced Data Analytics with PySpark Training training class is for business analysts who want a scalable platform for solving SQL-centric problems.
Course Benefits
- Quick Intro to Spark / PySpark
- Applying Spark SQL / DataFrames to problems that lend themselves to being solved using SQL and Pivot tables
- Exploratory Data Analysis (EDA)-visual analysis using graphs
Course Outline
- Introduction to Apache Spark
- What is Apache Spark
- The Spark Platform
- Spark vs Hadoop's MapReduce (MR)
- Common Spark Use Cases
- Languages Supported by Spark
- Running Spark on a Cluster
- The Spark Application Architecture
- The Driver Process
- The Executor and Worker Processes
- Spark Shell
- Jupyter Notebook Shell Environment
- Spark Applications
- The spark-submit Tool
- The spark-submit Tool Configuration
- Interfaces with Data Storage Systems
- Project Tungsten
- The Resilient Distributed Dataset (RDD)
- Datasets and DataFrames
- Spark SQL, DataFrames, and Catalyst Optimizer
- Spark Machine Learning Library
- GraphX
- Extending Spark Environment with Custom Modules and Files
- Summary
- The Spark Shell
- The Spark Shell
- The Spark v.2 + Command-Line Shells
- The Spark Shell UI
- Spark Shell Options
- Getting Help
- Jupyter Notebook Shell Environment
- Example of a Jupyter Notebook Web UI (Databricks Cloud)
- The Spark Context (sc) and Spark Session (spark)
- Creating a Spark Session Object in Spark Applications
- The Shell Spark Context Object (sc)
- The Shell Spark Session Object (spark)
- Loading Files
- Saving Files
- Summary
- Introduction to Spark SQL
- What is Spark SQL?
- Uniform Data Access with Spark SQL
- Hive Integration
- Hive Interface
- Integration with BI Tools
- What is a DataFrame?
- Creating a DataFrame in PySpark
- Commonly Used DataFrame Methods and Properties in PySpark
- Grouping and Aggregation in PySpark
- The "DataFrame to RDD" Bridge in PySpark
- The SQLContext Object
- Examples of Spark SQL / DataFrame (PySpark Example)
- Converting an RDD to a DataFrame Example
- Example of Reading / Writing a JSON File
- Using JDBC Sources
- JDBC Connection Example
- Performance, Scalability, and Fault-tolerance of Spark SQL
- Summary
- Practical Introduction to Pandas
- What is pandas?
- The Series Object
- Accessing Values and Indexes in Series
- Setting Up Your Own Index
- Using the Series Index as a Lookup Key
- Can I Pack a Python Dictionary into a Series?
- The DataFrame Object
- The DataFrame's Value Proposition
- Creating a pandas DataFrame
- Getting DataFrame Metrics
- Accessing DataFrame Columns
- Accessing DataFrame Rows
- Accessing DataFrame Cells
- Using iloc
- Using loc
- Examples of Using loc
- DataFrames are Mutable via Object Reference!
- Deleting Rows and Columns
- Adding a New Column to a DataFrame
- Appending / Concatenating DataFrame and Series Objects
- Example of Appending / Concatenating DataFrames
- Re-indexing Series and DataFrames
- Getting Descriptive Statistics of DataFrame Columns
- Getting Descriptive Statistics of DataFrames
- Applying a Function
- Sorting DataFrames
- Reading From CSV Files
- Writing to the System Clipboard
- Writing to a CSV File
- Fine-Tuning the Column Data Types
- Changing the Type of a Column
- What May Go Wrong with Type Conversion
- Summary
- Data Visualization with seaborn in Python
- Data Visualization
- Data Visualization in Python
- Matplotlib
- Getting Started with matplotlib
- Figures
- Saving Figures to a File
- Seaborn
- Getting Started with seaborn
- Histograms and KDE
- Plotting Bivariate Distributions
- Scatter plots in seaborn
- Pair plots in seaborn
- Heatmaps
- Summary
Class Materials
Each student will receive a comprehensive set of materials, including course notes and all the class examples.
Class Prerequisites
Experience in the following is required for this Python class:
- Knowledge of SQL.
- Familiarity with Python (or the ability to learn the basics of a new language).
Private Class Live Private Class
- Private Class for your Team
- Live training
- Online or On-location
- Customizable
- Expert Instructors