On completion of this course, students will be able to:
• Apply inferential statistical techniques (e.g., confidence intervals, hypothesis testing) to describe data sets and draw useful conclusions from them
• Apply data visualization techniques and key data mining techniques (e.g., classification analysis, association rule learning, anomaly/outlier detection, clustering analysis, regression analysis) to big data sets
• Implement the analytic algorithms for practical data sets
• Perform large-scale analytic projects in various industrial sectors
|
Module 1: Fundamental Data Analysis
I. Basic Statistical Concepts
1. Descriptive Statistics
2. Statistical Inference
3. Data Measurement
4. Measures of Central Tendency and Dispersion
5. Common Statistical Graphs
6. Determination of Outliers
II. Statistical Inference
1. Point Estimation and Required Properties of Point Estimators
2. Interval Estimation for the Population Mean, Proportion and Variance
3. Sample Size Determination
III. Hypothesis Testing
1. Hypothesis Testing for the Population Mean, Proportion and Variance: Single-Sample Tests
2. Hypothesis Testing for the Population Mean, Proportion and Variance: Two-Sample Tests
3. Type I and Type II Errors – Power of the Test
4. Observed Significance Level
Module 2: Data Visualization
IV. Data Visualization
1. Introduction to Data Visualization
2. Basic Charts for Numerical Data and Categorical Data
3. Distribution Plots
4. Multivariate Charts: Combo Chart, Combination Chart, Stacked Column Chart
V. Data Dashboard
1. What is a Data Dashboard?
2. Applications and Benefits of Data Dashboard
3. Design and Construct a Data Dashboard
Module 3: Key Data Mining Techniques
VI. Regression Analysis
1. Linear Regression and the Least Squares Method
2. Residual Analysis
3. Multiple Regression
4. Goodness of Fit Tests
VII. Data Classification
1. k-Nearest Neighbor Algorithm for Estimation and Prediction
2. Distance Functions (Euclidean, Manhattan, Minkowski) and Feature Scaling (Min-Max Normalization, Z-Score Standardization)
3. Logistic Regression
4. Bayesian Networks
5. Model Evaluation Measures for Classification Task
VIII. Data Clustering
1. Hierarchical Clustering Method
2. k-Means Clustering
3. Measuring Cluster Goodness: The Silhouette Method and The Pseudo-F Statistic
IX. Association Rules
1. Affinity Analysis
2. The Apriori Algorithm – Generating Frequent Itemsets
3. The Apriori Algorithm – Generating Association Rules
4. Measuring the Usefulness of Association Rules
|
1. Larose, D.T. and Larose, C.D., Data Mining and Predictive Analytics, 2nd edition, Wiley, 2015
2. Shmueli, G., Bruce, P.C., Yahav, I., Patel, N.R. and Lichtendahl Jr., K.C., Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, 1st edition, Wiley, 2018
3. Ankam, V., Big Data Analytics, Packt, 2016
4. Walkowiak, S., Big Data Analytics with R, 1st edition, Packt, 2016
5. Grolemund, G., Hands-on Programming with R, 1st edition, O’Reilly, 2014
6. Wickham, H. and Grolemund, G., R for Data Science, 1st edition, O’Reilly, 2017
7. Wexler, S., Shaffer, J. and Cotgreave, A., The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios, 1st edition, Wiley, 2017
8. O’Connor, E., Microsoft Power BI Dashboards Step by Step, 1st edition, Microsoft Press, 2019