
Exploratory Data Analysis with Python (6 ECTS credits)

Implementation code: TT00EV98-3001

Basic information about the implementation


Timing

01.08.2021 - 31.07.2022

Credits

6 ECTS credits

Online portion

6 ECTS credits

Mode of delivery

Distance learning

Unit

ICT and Industrial Management

Campus

Karaportti 2

Languages of instruction

  • English

Degree programme

  • Degree Programme in Information and Communication Technology

Teacher

  • Virve Prami

Teacher in charge

Janne Salonen

Groups

  • DiplomaDA
    Diploma in Data Analytics
  • DiplomaMD
    Diploma in Machine and Deep Learning

Objectives

Exploratory Data Analysis (EDA) is a combination of techniques for extracting insights and meaningful information from data. The main aim of EDA is to investigate a dataset and reveal its underlying structure, challenges, and opportunities before any machine learning model is applied. This course introduces students to the practical knowledge and main pillars of EDA using the Python programming language: data exploration, data preparation, data visualization, data relationships, and data clustering. Beyond the intuition behind each step, students learn how to carry out EDA with Python libraries such as NumPy, Pandas, and Matplotlib. After passing this course, students will be prepared to move on to further work in data analysis, including job positions in industry.

Content

1. Introduction:
Introduction to Data Science – Data Science Workflow – Data – Sources of Data – What is Exploratory Data Analysis? – Python Libraries for EDA

2. Describing Data:
Introduction – Observations and Variables – Categorical Variables – Continuous Variables – Central Tendency – Data Variability – Data Distributions

3. Importing Data:
Introduction – Vector and Matrix – NumPy Arrays – Working with NumPy Arrays – Loading Data with NumPy – Pandas Series – Working with Series – Pandas DataFrame – Working with DataFrame – Loading Data with Pandas

4. Data Exploration:
Extracting Descriptive Statistics – Extracting Descriptive Statistics: Preliminaries – Extracting Descriptive Statistics: Implementation – Mathematical Operations on DataFrame – Applying Functions to DataFrame – Querying a DataFrame – Filtering Data – Groupby – Identifying Unique and Missing Values – Cross Tabulation

5. Data Visualization:
Univariate Analysis – Histogram – Frequency Polygons – Boxplot – Bar Chart – Pie Chart – Multivariate Analysis – Plot – Subplot – Scatter Plot – Bubble Chart

6. Data Preparation:
Introduction – Incorrect Values and Categories – Feature Engineering: Creating New Features – Outlier Detection: Univariate – Outlier Detection: Multivariate – Removing Missing Values – Imputing Missing Values: Constant Imputation – Imputing Missing Values: K-NN Imputation – Feature Encoding: Label Encoding – Feature Encoding: One-Hot Encoding – Feature Scaling: Normalization – Feature Scaling: Standardization

7. Data Relationships:
Introduction – Covariance Matrix – Heatmap of Covariance Matrix – Correlation – Non-linear Relationship – Hypothesis Testing

8. Identifying and Understanding Groups:
Introduction – Clustering – Association Rules – Hierarchical Clustering – K-Means Clustering

9. Next Steps:
What’s More? – EDA for Text Data – Model Development and Evaluation

10. Final Tasks:
Self-study Essay – Project
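The course teaches these steps with NumPy, Pandas, and Matplotlib. Purely as an illustration (not course material), the sketch below walks through several of the listed topics in plain Python using only the standard-library `statistics` module, so the arithmetic behind each step is visible. The topics covered are identifying missing values, constant imputation, descriptive statistics, standardization, one-hot encoding, correlation, and grouping. The toy dataset, its column layout, and all helper function names are hypothetical. Using the median of the observed values as the imputation constant is also an assumption.

```python
from statistics import mean, median, pstdev

# Hypothetical toy dataset: each row is (city, age, monthly_income).
# None marks a missing value. The values are invented for illustration only.
ROWS = [
    ("Espoo", 25, 3000.0),
    ("Helsinki", 32, None),
    ("Helsinki", 47, 4200.0),
    ("Vantaa", None, 2800.0),
    ("Espoo", 51, 5100.0),
]


def column(rows, index):
    """Return one column of the dataset as a list."""
    return [row[index] for row in rows]


def count_missing(values):
    """Identify missing values: count the None entries in a column."""
    return sum(v is None for v in values)


def impute_constant(values, fill):
    """Constant imputation: replace every None with the same fill value."""
    return [fill if v is None else v for v in values]


def describe(values):
    """Descriptive statistics: central tendency and (population) variability."""
    return {"mean": mean(values), "median": median(values),
            "std": pstdev(values), "min": min(values), "max": max(values)}


def standardize(values):
    """Feature scaling by standardization: z = (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """One-hot encoding: one 0/1 indicator column per distinct category."""
    levels = sorted(set(categories))
    return levels, [[int(c == level) for level in levels] for c in categories]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of std devs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def group_mean(keys, values):
    """Groupby: mean of `values` for each distinct key."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    return {k: mean(vs) for k, vs in groups.items()}


cities = column(ROWS, 0)
ages_raw, incomes_raw = column(ROWS, 1), column(ROWS, 2)
print("missing (age, income):", count_missing(ages_raw), count_missing(incomes_raw))

# Assumption: fill each numeric column with the median of its observed values.
ages = impute_constant(ages_raw, median([a for a in ages_raw if a is not None]))
incomes = impute_constant(incomes_raw, median([i for i in incomes_raw if i is not None]))

print("age stats:", describe(ages))
print("income z-scores:", [round(z, 2) for z in standardize(incomes)])
print("city encoding:", one_hot(cities))
print("age-income correlation:", round(pearson(ages, incomes), 3))
print("mean income by city:", group_mean(cities, incomes))
```

In Pandas, the same steps correspond roughly to `isna().sum()`, `fillna()`, `describe()`, `pd.get_dummies()`, `corr()`, and `groupby(...).mean()`. Note that Pandas `describe()` reports the sample standard deviation, whereas this sketch uses the population version (`pstdev`).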

Time and location

The course is a 100% online self-study course, and the study environment is the TechClass portal.

Learning materials

Online in the TechClass portal.

Teaching methods

The course is a 100% online self-study course.

Practical training and working-life cooperation

N/A

Exam schedules and retake opportunities

Online.

Internationality

N/A

Alternative completion methods

N/A

Student workload

Up to the student.

Content scheduling

Up to the student.

Evaluation scale

Pass/Fail

Assessment criteria, pass/fail

Grading is pass/fail.

Further information

The course is only for Diploma in Machine and Deep Learning students.