Non-Certification Course

Analyzing Big Data with SQL

The course on the SQL SELECT statement teaches learners how to use this powerful tool to work with big data using Apache Hive and Apache Impala. The course covers filtering results, grouping and aggregating data, and sorting and limiting results. Differences in using SELECT with traditional RDBMs like MySQL and PostgreSQL are also addressed. This course is available in multiple formats, including face-to-face classroom training, live virtual classes, and online learning, making it accessible to a wide range of learners seeking to enhance their SQL skills.

RM 4,199.00

per person

Level

Intermediate

Duration

3 Days

Training Delivery Format

Face-to-face / Virtual Class

RM 4,199.00

per person

Level

Intermediate

Duration

3 Days

Training Delivery Format

Face-to-face (F2F) / Virtual Class

Class types

Public Class

Private Class

In-House Training

Bespoke

In this course, you’ll get an in-depth look at the SQL SELECT statement and its main clauses. The course focuses on big data SQL engines Apache Hive and Apache Impala, but most of the information is applicable to SQL with traditional RDBMs as well; the instructor explicitly addresses differences for MySQL and PostgreSQL.

What you will learn

  • Understand the basics of SELECT statements

  • Understand how and why to filter results

  • Explore grouping and aggregation to answer analytic questions

  • Work with sorting and limiting results

This course is available on face to face classroom training or live virtual class training.

It is suitable for data analysts, data scientists, data engineers, and database administrators who work with big data. The course provides a comprehensive understanding of SQL and its applications for working with big data. No prior experience with SQL is necessary, making it accessible to a broad range of learners. Overall, anyone interested in learning how to analyze big data with SQL will benefit from attending this training.

By the end of the course, you will be able to

  • explore and navigate databases and tables using different tools;
  • understand the basics of SELECT statements;
  • understand how and why to filter results;
  • explore grouping and aggregation to answer analytic questions;
  • work with sorting and limiting results;
  • combine multiple tables in different ways.

To use the hands-on environment for this course, you need to download and install a virtual machine and the software on which to run it. Before continuing, be sure that you have access to a computer that meets the following hardware and software requirements:

• Windows, macOS, or Linux operating system (iPads and Android tablets will not work)

• 64-bit operating system (32-bit operating systems will not work)

• 8 GB RAM or more

• 25GB free disk space or more

• Intel VT-x or AMD-V virtualization support enabled (on Mac computers with Intel processors, this is always enabled; on Windows and Linux computers, you might need to enable it in the BIOS)

• For Windows XP computers only: You must have an unzip utility such as 7-Zip or WinZip installed (Windows XP’s built-in unzip utility will not work)

Module 1: Orientation to SQL on Big Data

  • Review and Preparation
  • Using the Hue Query Editors
  • Running SQL Utility Statements
  • Running SQL SELECT Statements
  • Understanding Different SQL Interfaces
  • Overview of Beeline and Impala Shell
  • Using Beeline
  • Using Impala Shell

Module 2: SQL SELECT Essentials 

  • Introduction
  • SQL SELECT Building Blocks
  • Introduction to the SELECT List
  • Expressions and Operators
  • Data Types
  • Column Aliases
  • Built-In Functions
  • Data Type Conversion
  • The DISTINCT Keyword
  • Introduction to the FROM Clause
  • Identifiers
  • Formatting SELECT Statements
  • Using Beeline in Non-Interactive Mode
  • Using Impala Shell in Non-Interactive Mode
  • Formatting the Output of Beeline and Impala Shell
  • Saving Hive and Impala Query Results to a File

Module 3: Filtering Data 

  • Introduction
  • About the Datasets
  • Introduction to the WHERE Clause
  • Using Expressions in the WHERE Clause
  • Comparison Operators
  • Data Types and Precision
  • Logical Operators
  • Other Relational Operators
  • Understanding Missing Values
  • Handling Missing Values
  • Conditional Functions
  • Using Variables with Beeline and Impala Shell
  • Calling Beeline and Impala Shell from Scripts
  • Querying Hive and Impala in Scripts and Applications

Module 4: Grouping and Aggregating Data

  • Introduction
  • Introduction to Aggregation
  • Common Aggregate Functions
  • Using Aggregate Functions in the SELECT Statement
  • Introduction to the GROUP BY Clause
  • Choosing an Aggregate Function and Grouping Column
  • Grouping Expressions
  • Grouping and Aggregation, Together and Separately
  • NULL Values in Grouping and Aggregation
  • The COUNT Function
  • Tips for Applying Grouping and Aggregation
  • Filtering on Aggregates
  • The HAVING Clause
  • Understanding Hive and Impala Version Differences
  • Understanding Hue Version Differences

Module 5: Sorting and Limiting Data

  • Introduction
  • Introduction to the ORDER BY Clause
  • Controlling Sort Order
  • Ordering Expressions
  • Missing Values in Ordered Results
  • Using ORDER BY with Hive and Impala
  • Introduction to the LIMIT Clause
  • When to Use the LIMIT Clause
  • Using LIMIT with ORDER BY
  • Using LIMIT for Pagination
  • Review

Module 6: Combining Data

  • Introduction
  • Combining Query Results with the UNION Operator
  • Using ORDER BY and LIMIT with UNION
  • Introduction to Joins
  • Join Syntax
  • Inner Joins
  • Outer Joins
  • Conclusion
HRD Corp Claimable Course

At this time, this course is available for private class and in-house training only. Please contact us for any inquiries. 

Contact form

Get the Project Charter Guide now!

Just enter your email address to access the FREE Project Charter guide and template.