Data Analysis with Python
Learn how to analyze data with Python in this comprehensive course. Topics covered include data collection, preparation, cleaning, data frame manipulation, summarization, machine learning regression models, model refinement, and creating data pipelines. Apply what you learn by performing exploratory data analysis, creating data visualizations, and building regression models to predict future trends.
per person
Level
Duration
Training Delivery Format
Face-to-face / Virtual Class
Class types
Public Class
Private Class
In-House Training
Bespoke
About this course
Analyzing data with Python is an essential skill for Data Scientists and Data Analysts. This course will take you from the basics of data analysis with Python to building and evaluating data models.
Topics covered include:
- collecting and importing data
- cleaning, preparing & formatting data
- data frame manipulation
- summarizing data
- building machine learning regression models
- model refinement
- creating data pipelines
You will learn how to import data from multiple sources, clean and wrangle data, perform exploratory data analysis (EDA), and create meaningful data visualizations. You will then predict future trends from data by developing simple linear, multiple linear, and polynomial regression models and pipelines, and learn how to evaluate them.
Course Objective
Skills You Will Gain
- Predictive Modelling
- Python Programming
- Data Analysis
- Data Visualization (DataViz)
- Model Selection
Who should attend?
Data Scientists and Data Analysts
Learning Outcome
What you will learn
- Develop Python code for cleaning and preparing data for analysis – including handling missing values, formatting, normalizing, and binning data
- Perform exploratory data analysis and apply analytical techniques to real-world datasets using libraries such as pandas, NumPy and SciPy
- Manipulate data using data frames, summarize data, understand data distributions, perform correlation analysis, and create data pipelines
- Build and evaluate regression models using the scikit-learn machine learning library, and use them for prediction and decision making
Prerequisites
You should have a working knowledge of Python and Jupyter Notebooks.
Course Content
Module 1: Importing Datasets
In this module, you will learn how to understand your data and how to use Python libraries to import data from multiple sources. You will then perform some basic tasks to start exploring and analyzing the imported dataset.
- The Problem
- Understanding the Data
- Python Packages for Data Science
- Importing and Exporting Data in Python
- Getting Started Analyzing Data in Python
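As a rough illustration of the importing, inspecting and exporting steps listed above, here is a minimal pandas sketch. The CSV contents and column names are hypothetical, and an in-memory string (`io.StringIO`) stands in for a real file path or URL; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical raw CSV with no header row, standing in for a downloaded file.
raw_csv = io.StringIO(
    "3,alfa-romero,gas,13495\n"
    "1,audi,gas,?\n"
    "2,bmw,diesel,16430\n"
)

# header=None stops pandas from treating the first data row as column names;
# na_values="?" turns the "?" placeholder into NaN while reading.
df = pd.read_csv(raw_csv, header=None, na_values="?")
df.columns = ["symboling", "make", "fuel_type", "price"]  # hypothetical names

print(df.head())      # first rows of the frame
print(df.dtypes)      # inferred type of each column
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns

# Export: with no path, to_csv returns the CSV text; index=False drops the row index.
csv_text = df.to_csv(index=False)
```

Passing a file path to `to_csv` (for example `df.to_csv("cars.csv", index=False)`) writes the file instead of returning a string.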
Module 2: Data Wrangling
In this module, you will learn how to perform some fundamental data wrangling tasks that, together, form the pre-processing phase of data analysis. These tasks include handling missing values in data, formatting data to standardize it and make it consistent, normalizing data, grouping data values into bins, and converting categorical variables into numerical quantitative variables.
- Pre-processing Data in Python
- Dealing with Missing Values in Python
- Data Formatting in Python
- Data Normalization in Python
- Binning in Python
- Turning categorical variables into quantitative variables in Python
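The sketch below shows one possible way to carry out each pre-processing task with pandas and NumPy. The data, the column names and the specific techniques (mean imputation, an mpg-to-L/100 km unit conversion, simple feature scaling, equal-width bins, one-hot indicator columns) are illustrative assumptions and not necessarily the exact methods taught in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical car data with one missing price.
df = pd.DataFrame({
    "city_mpg": [21, 19, 24, 30],
    "length": [168.8, 171.2, 176.6, 192.7],
    "fuel_type": ["gas", "diesel", "gas", "gas"],
    "price": [13495.0, np.nan, 16500.0, 18920.0],
})

# Missing values: replace NaN in "price" with the mean of the known prices.
df["price"] = df["price"].fillna(df["price"].mean())

# Formatting: convert US miles per gallon to litres per 100 km (235.215 / mpg).
df["city_L_per_100km"] = 235.215 / df["city_mpg"]

# Normalization: simple feature scaling, dividing by the maximum so it becomes 1.
df["length_scaled"] = df["length"] / df["length"].max()

# Binning: three equal-width price bins between the minimum and maximum price.
edges = np.linspace(df["price"].min(), df["price"].max(), 4)
df["price_bin"] = pd.cut(
    df["price"], edges, labels=["low", "medium", "high"], include_lowest=True
)

# Categorical to quantitative: one 0/1 indicator column per fuel type.
dummies = pd.get_dummies(df["fuel_type"], prefix="fuel", dtype=int)
df = pd.concat([df.drop(columns="fuel_type"), dummies], axis=1)
```

Mean imputation is only one option; dropping rows, using the most frequent value, or leaving the gap may be more appropriate depending on the column and why the value is missing.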
Module 3: Exploratory Data Analysis
In this module, you will learn what is meant by exploratory data analysis and how to compute basic descriptive statistics, such as the mean, median, mode, and quartile values, and use them to better understand the distribution of the data. You will learn how to put your data into groups to help you visualize it better, how to use the Pearson correlation method to compare two continuous numerical variables, and how to use the Chi-square test to find the association between two categorical variables and interpret the results.
- Exploratory Data Analysis
- Descriptive Statistics
- GroupBy in Python
- Correlation
- Correlation – Statistics
- Analysis of Variance (ANOVA)
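Here is a minimal sketch of the module's statistics using pandas and `scipy.stats`. The six-car dataset is invented for illustration, and with so few rows the p-values carry no real meaning; the sketch only shows which function computes which quantity.

```python
import pandas as pd
from scipy import stats

# Hypothetical data for six cars (values invented for illustration).
df = pd.DataFrame({
    "engine_size": [130, 152, 109, 136, 181, 97],
    "price": [13495.0, 16500.0, 10295.0, 17450.0, 23875.0, 7975.0],
    "drive_wheels": ["rwd", "rwd", "fwd", "fwd", "rwd", "fwd"],
    "fuel_type": ["gas", "gas", "gas", "diesel", "gas", "diesel"],
})

# Descriptive statistics: count, mean, std, min, quartiles (25%, 50%, 75%), max.
summary = df.describe()
price_median = df["price"].median()

# GroupBy: mean price for each drive-wheel type.
mean_by_drive = df.groupby("drive_wheels")["price"].mean()

# Pearson correlation between two continuous variables, with its p-value.
r, p_value = stats.pearsonr(df["engine_size"], df["price"])

# Chi-square test of association between two categorical variables.
table = pd.crosstab(df["drive_wheels"], df["fuel_type"])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

# One-way ANOVA: do mean prices differ across the drive-wheel groups?
groups = [group["price"].to_numpy() for _, group in df.groupby("drive_wheels")]
f_stat, anova_p = stats.f_oneway(*groups)
```

A Pearson coefficient close to +1 indicates a strong positive linear relationship; a small p-value suggests the observed correlation, association or difference in group means is unlikely to be due to chance alone.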
Module 4: Model Development
In this module, you will learn how to define the explanatory variable and the response variable, and understand the differences between simple linear regression and multiple linear regression models. You will learn how to evaluate a model using visualization, and learn about polynomial regression and pipelines. You will also learn how to interpret and use R-squared and the mean squared error (MSE) to evaluate your model numerically in-sample. Lastly, you will learn about prediction and decision making when determining whether your model is correct.
- Model Development
- Linear Regression and Multiple Linear Regression
- Model Evaluation using Visualization
- Polynomial Regression and Pipelines
- Measures for In-Sample Evaluation
- Prediction and Decision Making
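The following sketch fits simple linear, multiple linear and polynomial regression models with scikit-learn and evaluates them in-sample with R-squared and MSE. The data is synthetic (a hypothetical price generated from engine size and horsepower plus random noise), so the numbers only illustrate the workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, hypothetical data: columns are engine size and horsepower.
rng = np.random.default_rng(0)
X = rng.uniform(low=[80, 50], high=[250, 200], size=(100, 2))
y = 50 * X[:, 0] + 30 * X[:, 1] + rng.normal(0, 500, size=100)

# Simple linear regression: one explanatory variable (engine size).
slr = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: both explanatory variables.
mlr = LinearRegression().fit(X, y)

# Polynomial regression in a pipeline: standardize, expand to degree 2, fit.
poly_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("model", LinearRegression()),
])
poly_pipe.fit(X, y)

# In-sample evaluation: R-squared and mean squared error on the training data.
scores = {}
for name, model, features in [("SLR", slr, X[:, [0]]), ("MLR", mlr, X), ("Poly", poly_pipe, X)]:
    yhat = model.predict(features)
    scores[name] = (r2_score(y, yhat), mean_squared_error(y, yhat))
    print(f"{name}: R^2={scores[name][0]:.3f}  MSE={scores[name][1]:.1f}")

# Prediction for a new hypothetical car: engine size 150, horsepower 110.
predicted_price = mlr.predict(np.array([[150.0, 110.0]]))[0]
```

Because these scores are computed on the same data used for fitting, adding terms can only keep R-squared the same or raise it, which is why out-of-sample evaluation and model selection matter.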
Module 5: Model Evaluation
- Model Evaluation and Refinement
- Overfitting, Underfitting and Model Selection
- Ridge Regression
- Grid Search
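One way to put overfitting, underfitting, Ridge regression and grid search together is sketched below with scikit-learn. The synthetic quadratic data, the candidate polynomial degrees and the `alpha` values are assumptions chosen for illustration. `GridSearchCV` picks the combination with the best mean cross-validated R-squared on the training split, and the held-out test split gives a final out-of-sample score.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic, hypothetical data with a quadratic trend plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(120, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(0, 1.0, size=120)

# Hold out a test set so the final evaluation is out-of-sample.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Polynomial features, then standardization, then Ridge (L2-penalized) regression.
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Cross-validated R^2 by degree: degree 1 underfits this curved trend,
# while a very high degree risks fitting the noise.
for degree in (1, 2, 10):
    pipe.set_params(poly__degree=degree)
    cv_r2 = cross_val_score(pipe, X_train, y_train, cv=5, scoring="r2").mean()
    print(f"degree={degree}: mean CV R^2={cv_r2:.3f}")

# Grid search over polynomial degree and the Ridge penalty strength alpha.
param_grid = {"poly__degree": [1, 2, 3, 5, 10], "ridge__alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R^2 of the refitted best model on held-out data
```

Larger `alpha` values shrink the coefficients more strongly, which can counteract the overfitting that high polynomial degrees tend to cause.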
Certification
This is a non-certification course.
If you are looking for a certification, the Python Institute offers several options to choose from.
You can start with PCEP, Certified Entry-Level Python Programmer.
Then you can pursue PCAP, Certified Associate in Python Programming, and then PCPP1, Certified Professional in Python Programming 1.
FAQs
Q: What does the course cover?
A: The course covers essential topics in data analysis and modeling using Python. It includes modules on importing datasets, data wrangling (pre-processing), exploratory data analysis, model development, and model evaluation and refinement.
Q: What will I learn about data wrangling in this course?
A: In the data wrangling module, you will learn fundamental tasks like handling missing values, formatting data, normalization, binning, and converting categorical variables into numerical quantitative variables using Python.
Q: How does exploratory data analysis contribute to a better understanding of the data?
A: Exploratory Data Analysis (EDA) involves computing basic descriptive statistics, visualizing data, and using correlation methods to compare numerical variables. It helps you gain insight into the distribution of the data and the relationships between variables, and identify patterns or associations.
Q: What will I gain from the model development and evaluation modules?
A: In the model development module, you will learn about linear regression, multiple linear regression, polynomial regression, and model evaluation using visualization techniques. The model evaluation module will cover overfitting, underfitting, model selection, Ridge Regression, and Grid Search to refine and optimize your models for better performance.
Q: Is it worth acquiring Python skills to progress in my career?
A: Acquiring Python skills can significantly enhance your career opportunities and growth prospects. Python is a versatile and widely used programming language with diverse applications across various industries. By mastering Python, you can pursue roles such as Python Developer, Software Engineer, Data Scientist, and more.
At this time, this course is available for private class and in-house training only. Please contact us for any inquiries.