Data mining (5 ECTS)

Course unit code: TT00BF05

General information


Credits
5 ECTS

Objective

The student knows the main approaches in data mining and their applicability to different data sets. The student can use data mining software to find previously unknown patterns in a data set. He/she is able to transform a data set into the format required by the chosen analysis method and to implement selected analysis methods programmatically.
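As a small illustration of converting a data set into the format an analysis method expects, the Python sketch below turns transaction records stored as (transaction ID, item) pairs into a binary matrix with one row per transaction and one column per item, a layout commonly used for association analysis. The function name `to_basket_matrix` and the sample data are hypothetical and are not taken from the course material.

```python
from typing import Dict, List, Tuple


def to_basket_matrix(records: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Convert (transaction_id, item) rows into a 0/1 matrix.

    Returns the sorted item names (the column order) and a dict mapping
    each transaction ID to its row of 0/1 indicators.
    """
    items = sorted({item for _, item in records})
    col = {item: i for i, item in enumerate(items)}  # item name -> column index
    matrix: Dict[str, List[int]] = {}
    for tid, item in records:
        # Create an all-zero row the first time a transaction ID is seen.
        row = matrix.setdefault(tid, [0] * len(items))
        row[col[item]] = 1
    return items, matrix


# Hypothetical example data in long (transaction, item) form.
rows = [("t1", "bread"), ("t1", "milk"), ("t2", "milk"), ("t2", "eggs"), ("t3", "bread")]
items, matrix = to_basket_matrix(rows)
print(items)   # ['bread', 'eggs', 'milk']
print(matrix)  # {'t1': [1, 0, 1], 't2': [0, 1, 1], 't3': [1, 0, 0]}
```

Sorting the item names keeps the column order deterministic, so the same input always produces the same matrix.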

Content

- Approaches to data mining (classification, association analysis, clustering, prediction of numerical values) and their fields of application and use
- Data mining software
- Estimating the statistical significance of observed outcomes and validating the results
- Data preprocessing
- Text mining
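Estimating statistical significance can be illustrated with a permutation test, one common resampling approach (chosen here only as an example; the course may use other methods). The Python sketch below estimates a two-sided p-value for the difference in means between two groups by repeatedly shuffling the group labels and counting how often the shuffled difference is at least as large as the observed one. The function name, its parameters (`n_permutations`, `seed`) and the sample data are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for the difference in means of groups a and b."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the test statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.2, 4.5, 4.0, 4.8, 4.3]
p = permutation_p_value(a, b)
print(f"p = {p:.4f}")
```

A small p-value suggests that a difference this large would rarely arise if the group labels were irrelevant; the estimate becomes more precise as `n_permutations` grows.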

Prerequisites

The Mathematical Statistics course provides useful background for this course.

Assessment criteria, satisfactory (1)

The student is familiar with the approaches and possibilities of data mining and its mathematical foundations.
The student can use data mining software to perform a given analysis on a given data set.
The student can manually edit a data set into the format required by the chosen analysis method.

Assessment criteria, good (3)

The student can choose a data mining approach that is suitable for the problem at hand.
The student understands distributions, the evaluation of statistical significance, and sources of error.
The student understands the underlying principles of the algorithms used in data mining software. He/she understands how modifying parameter values affects the analysis outcome.
The student can programmatically implement a mechanical data set preprocessing task.
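One example of a mechanical preprocessing task is filling in missing values and rescaling a numeric column. The Python sketch below assumes missing values are represented as `None`, replaces them with the mean of the observed values, and then standardizes the column to zero mean and unit (population) variance. Mean imputation is only one of several possible strategies, and the function name `impute_and_standardize` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def impute_and_standardize(column: List[Optional[float]]) -> List[float]:
    """Fill None with the column mean, then rescale to zero mean and unit variance."""
    present = [x for x in column if x is not None]
    if not present:
        raise ValueError("column has no observed values")
    m = mean(present)
    # Mean imputation: the filled column keeps the same mean m.
    filled = [m if x is None else x for x in column]
    sd = pstdev(filled)  # population standard deviation
    if sd == 0:
        # A constant column carries no variation; map every value to 0.
        return [0.0 for _ in filled]
    return [(x - m) / sd for x in filled]


print(impute_and_standardize([1.0, None, 3.0]))
```

Note that mean imputation slightly shrinks the column's variance, which is one reason to choose the imputation strategy with the intended analysis in mind.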

Assessment criteria, excellent (5)

The student can apply a variety of data mining methods, taking into account the special characteristics of the data set and the research question at hand.
The student can implement selected data mining algorithms programmatically.
The student can programmatically implement a wide range of transformations for data sets.
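As one example of implementing a data mining algorithm programmatically, the Python sketch below implements k-means clustering (Lloyd's algorithm) without external libraries. k-means is chosen here only for illustration; the course description does not specify which algorithms are covered. The function name and its parameters (`k`, `n_iter`, `seed`) are assumptions, and the sample points are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points chosen at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(pts, k=2)
print(labels)
```

Rerunning with a different `k` or `seed` can change the clusters found, since k-means only converges to a local optimum; this connects to understanding how parameter values affect the analysis outcome.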
