Exploratory Data Analysis with Python (6 ECTS)
Implementation code: TT00EU30-3001
Basic information about the implementation
Timing
22.03.2021 - 31.12.2021
Credits
6 ECTS
Virtual portion
6 ECTS
Mode of delivery
Distance learning
Unit
School of ICT
Campus
Karaportti 2
Teaching languages
- English
Seats
0 - 500
Degree programme
- Information and Communication Technology
Teacher
- Virve Prami
Groups
- ATX21TVNonStop virtual Studies year 2021
Objectives
Exploratory Data Analysis (EDA) combines multiple techniques for extracting insights and meaningful information from data. Its main aim is to investigate datasets and reveal their underlying structures, challenges, and opportunities without applying any machine learning model. This course introduces the student to the practical knowledge and main pillars of EDA, including data exploration, data preparation, data visualization, data relationships, and data clustering, using the Python programming language. Beyond the intuitions, the student will become familiar with how EDA steps are performed with Python libraries such as NumPy, Pandas, and Matplotlib. After passing this course, the student will be prepared to enter the field of data analysis and pursue related positions in industry.
This course is 100% virtual, made possible by the figures and content prepared for it.
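As a rough illustration of the exploration, preparation, and relationship steps named in the objectives, the sketch below uses Pandas and NumPy on a small made-up table. The dataset, column names, and variable names are hypothetical and are not taken from the course material. The missing value is filled with the column mean, which is only one simple imputation choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["Espoo", "Espoo", "Helsinki", "Helsinki", "Vantaa"],
    "temperature": [3.0, 5.0, np.nan, 4.0, 2.0],
    "visitors": [120, 150, 300, 280, 90],
})

# Data exploration: descriptive statistics and missing values per column.
summary = df.describe()
missing_per_column = df.isna().sum()

# Data preparation: impute the missing temperature with the column mean.
clean = df.copy()
clean["temperature"] = clean["temperature"].fillna(clean["temperature"].mean())

# Groupby: average number of visitors per city.
visitors_by_city = clean.groupby("city")["visitors"].mean()

# Data relationships: Pearson correlation between the numeric columns.
correlation = clean[["temperature", "visitors"]].corr()

print(summary)
print(missing_per_column)
print(visitors_by_city)
print(correlation)
```

Working on a copy (`clean`) keeps the raw table available for comparison, so the effect of the imputation stays easy to inspect.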
Content
1. Introduction:
Introduction to Data Science – Data Science Workflow – Data – Sources of Data – What is Exploratory Data Analysis? – Python Libraries for EDA
2. Describing Data:
Introduction – Observations and Variables – Categorical Variables – Continuous Variables – Central Tendency – Data Variability – Data Distributions
3. Importing Data:
Introduction – Vector and Matrix – NumPy Arrays – Working with NumPy Arrays – Loading Data with NumPy – Pandas Series – Working with Series – Pandas DataFrame – Working with DataFrame – Loading Data with Pandas
4. Data Exploration:
Extracting Descriptive Statistics – Extracting Descriptive Statistics: Preliminaries – Extracting Descriptive Statistics: Implementation – Mathematical Operations on DataFrame – Applying Functions to DataFrame – Querying a DataFrame – Filtering Data – Groupby – Identifying Unique and Missing Values – Cross Tabulation
5. Data Visualization:
Univariate Analysis – Histogram – Frequency Polygons – Boxplot – Bar Chart – Pie Chart – Multivariate Analysis – Plot – Subplot – Scatter Plot – Bubble Chart
6. Data Preparation:
Introduction – Incorrect Values and Categories – Feature Engineering: Creating New Features – Outlier Detection: Univariate – Outlier Detection: Multivariate – Removing Missing Values – Imputing Missing Values: Constant Imputation – Imputing Missing Values: K-NN Imputation – Feature Encoding: Label Encoding – Feature Encoding: One-Hot Encoding – Feature Scaling: Normalization – Feature Scaling: Standardization
7. Data Relationships:
Introduction – Covariance Matrix – Heatmap of Covariance Matrix – Correlation – Non-linear Relationship – Hypothesis Testing
8. Identifying and Understanding Groups:
Introduction – Clustering – Association Rules – Hierarchical Clustering – K-Means Clustering
9. Next Steps:
What’s More? – EDA for Text Data – Model Development and Evaluation
10. Final Tasks:
Self-study Essay – Project
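Some of the data preparation and data relationship topics in the outline above (normalization, standardization, covariance, correlation) reduce to a few array operations. The following is a minimal sketch with NumPy, assuming a small invented matrix `X` whose rows are observations and whose columns are features. It is not the course's own implementation.

```python
import numpy as np

# Hypothetical measurements: rows are observations, columns are two features.
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

# Normalization (min-max scaling): map each column to the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_norm = (X - col_min) / (col_max - col_min)

# Standardization (z-score): zero mean and unit standard deviation per column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance and correlation matrices; rowvar=False treats columns as variables.
cov = np.cov(X, rowvar=False)
corr = np.corrcoef(X, rowvar=False)

print(X_norm)
print(X_std)
print(cov)
print(corr)
```

Because the second column is an exact multiple of the first, every entry of the correlation matrix is 1 (up to floating-point rounding). Real datasets will show weaker, more informative relationships.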
Time and location
Online TechClass portal.
Learning materials
Lecture slides, quizzes, exercises
Teaching methods
- Exercise
- Quiz
- Project
- Self-study
Internship and working life cooperation
N/A
Exam dates and retake opportunities
N/A
Internationality
N/A
Alternative completion methods
N/A
Student time use and workload
Lectures = 70h
Exercises = 20h
Self-study = 40h
Quizzes = 15h
Project = 45h
Total = 190h
Assessment scale
Pass/Fail
Assessment criteria, pass/fail
The student will pass this course after submitting the required quizzes, assignments, and the final project.
Assessment methods and basis of assessment
Exercise 20%
Quiz 30%
Project 30%
Essay 20%
Prerequisites
Introduction to Python for Data Science