Spark and Delta Lake Workshop
Pricing: $45-60 for the 6-hour hands-on workshop. See Eventbrite for details.
Register using Eventbrite.
PDS (Professional Development Seminar)
A 2-day virtual workshop that teaches you how to architect reliable, scalable solutions with Apache Spark and Delta Lake.
Details
Sat, April 2, 2022, 10:00 AM – 1:00 PM
Sun, April 3, 2022, 10:00 AM – 1:00 PM
Abstract
This 2-day workshop teaches you what Apache Spark™ and Delta Lake are and how to use them in your data architectures to build reliable, large-scale distributed data pipelines. The course covers the Delta Lake features that, alongside Spark SQL and Spark Structured Streaming, bring ACID transactions and time travel (data versioning) to your ETL batch and streaming workloads. Slides, demos, exercises, and Q&A sessions together should help you understand the concepts of the modern data lakehouse architecture.
Motivation
Whether or not you are new to the field of data analytics and data science, you know that working with large amounts of data is a critical need for businesses today. For the first time, SF Bay ACM is partnering with Databricks to bring you this workshop on Apache Spark and Delta Lake. Together, these two technologies put the power of petabytes of data at your fingertips.
Sponsorship
Databricks is partially sponsoring this event, so we also have a rare opportunity to support our professional development activities with a significant price drop. Check below for details.
NOTE: While this is a virtual class, we will cap it at classroom size so that there is a strong focus on learning. There is a nominal charge for the 6 hours of lecture; please sign up early, as we will keep the attendee count low. This is NOT a MOOC. Registration also includes a 1-year SF Bay ACM membership ($20 value).
Content: You will have access to all the notebooks and training material for the hands-on workshop.
Who is the course for?
- Solution Architects
- Data Engineers
- Data Scientists
Structure
- Six 55-min modules (10-min break between modules)
- Each module: 15-min talk / 20-min labs / 15-min Q&A / 5-min buffer
Requirements
- Sign up for Databricks Community Edition
- Have experience with SQL and Python
Saturday – Day 1: 10am-1pm, Pacific Time
Module 1. The Fundamentals of Apache Spark
- Introduction to Databricks Community Edition
- Loading and saving datasets (/databricks-datasets) [SQL]
- Basic DataFrame Transformations [SQL]
- Working with Spark tables [SQL]
Module 2. Intermediate Spark SQL
- Aggregations [SQL]
- Joins [SQL]
- Basics of the Spark web UI
Module 3. Advanced Spark SQL
- Windowed Aggregation [SQL]
- Introduction to Spark Structured Streaming [Python, SQL]
Sunday – Day 2 (Delta Lake): 10am-1pm, Pacific Time
Module 4. Introduction to Delta Lake
- Bringing Reliability to Data Lakes (Concepts)
- Convert existing tables to Delta Lake [SQL]
- Unified Batch and Streaming [Python, SQL]
Module 5. DML and Schema
- Create, Insert, Update, Delete, Merge
- Schema Enforcement and Evolution
Module 6. SQL and the Transaction Log
- Delta Lake SQL
- Time Travel
- Transaction Log Fundamentals
Organizer & SFBay ACM Prof Dev Chair: Yashesh Shroff @yashroff
For more information about registration, please contact the SF Bay Chapter of the ACM: yshroff at g | m | a i l
We look forward to seeing you at the workshop!