Statistical Data Analysis with Python (10 ECTS)
Course unit code: TT00FA97
General information
- Credits
- 10 ECTS
- Teaching language
- English
Objective
Data analysis and statistical analysis are required in many in-demand data analytics roles, and they are used hand in hand to solve business problems: data analysis is a general approach to making data understandable for decision-makers, while statistical analysis brings a rigorous statistical approach to it. A rich data analysis skill set helps you understand data well enough to extract knowledge and insights. This course gives you the resources to build the Python skills you need to succeed as a Data Analyst. By the end of the course, you will understand how to use Python’s scientific computing libraries to import, clean, manipulate, and visualize data, and how to apply a wide range of statistical techniques to extract meaningful insights.
This course is 100% virtual: it is delivered through comprehensive tutorial videos and content prepared for it.
The student passes the course after submitting the required quizzes, assignments, and the final project.
Content
1. Introduction to Data Analysis:
What is Data Analysis? – Different Types of Data Analysis – What is Statistical Analysis? – Descriptive vs. Inferential Statistics – Methods of Sampling – Steps Involved in Data Analysis – Quiz
2. Data Ingestion:
Introduction – Importing Flat Files – Parsing Date and Time – Importing Excel Spreadsheets – Connecting to a Database – Retrieving Tables from MySQL Databases – Retrieving Tables from PostgreSQL Databases – Retrieving Data from Azure Blob Storage – Retrieving Data from AWS S3 Buckets – Importing JSON Files – Combining Multiple Datasets – Quiz
3. Descriptive Statistics:
Introduction – Histogram and Bar Chart – Central Tendency Measures – Data Variability Measures – Extracting Descriptive Statistics – Skewness – Kurtosis
4. Data Cleaning:
Introduction – Handling Incorrect Values – Handling Incorrect Data Types – Removing Missing Values – Handling Missing Values: Simple Imputation – Handling Missing Values: K-NN Imputation – Handling Missing Values: MICE – Binning – Outlier Detection: IQR Method – Outlier Detection: Isolation Forest – Data Sanitization – Quiz
5. Probability:
Introduction – Probabilistic Experiment – Probability of an Event – Random Variable – Discrete and Continuous Random Variables – Probability Mass Function – Probability Density Function – Cumulative Distribution Function – Empirical Cumulative Distribution Function – Expected Values
6. Statistical Data Modeling:
Introduction – Normal Distribution – Other Types of Distribution Functions – Kernel Density Estimation – Fitting Data to the Probability Distribution – Conditional Probabilistic Analysis
7. Relationship Analysis:
Introduction – Correlation vs. Causation – Covariance Matrix – Pearson Correlation – Kendall Rank Correlation – Spearman Rank Correlation – Heatmap of Correlation Matrix – Quiz
8. Hypothesis Testing:
Introduction – Essential Concepts – Chi-square Test of Independence – Chi-square Test of Independence: Implementation – Two-Sample t-Test – Paired t-Test – One-Way ANOVA – Post-Hoc Test – Non-Parametric Tests
9. A/B Testing:
Introduction – Designing the Experiment – Collecting and Preparing the Data – Visualizing the Results – Testing the Hypothesis – Drawing Conclusions
10. Final Tasks:
Project – Self-study Essay
Qualifications
Introduction to Python for Data Science
Assessment criteria, approved/failed
Exercises 50%
Quizzes 25%
Project 25%