This course is the first part of a two-semester data analytics sequence that covers the methods required to extract knowledge from business data. Topics covered include: review of programming software, getting started with data, introduction to data analytics, data-driven thinking, causality, supervised and unsupervised learning, data and models, linear models, logistic regression, regression and classification trees, entropy, information gain, neighbor models, and distance as similarity.
By the end of this course, students will be able to:
Identify the appropriate data analytics method for a given problem
Check data for errors and apply appropriate methods to clean data for analysis
Apply supervised learning methods (decision trees, KNN, regression) to classification problems
Estimate the probability of an event occurring (e.g., default risk)
Evaluate model performance
Tentative Course Schedule
| Week | Theme | Key Competencies |
|---|---|---|
| 1 | Introduction | Content of the course |
| 1, 2 | Why, Remember: Installing R, RStudio, Remember: RStudio 101 | Environment setup on Windows, CRAN |
| 3, 4 | Steps, Statistical Learning: An Intro, Data - Model - Analysis | Learning from data, Causality |
| 4, 5 | Data - Model - Analysis, Missing Value Treatment | Data Types, Exploratory Data Analysis, Summaries, Visualization, Missing Values and Treatment |
| 5, 6 | A Brief Intro to the Linear Regression Model | Linear Model, Linear Regression, Least Squares Estimation, Maximum Likelihood |
| 7, 8 | Regression, Dependent: Categorical | Linear Regression, Logistic Regression, Probit Regression |
| 9, 10 | Decision Trees | Classification and Regression Trees, Information Gain, Gini Gain |
| 11 | K-Nearest Neighbors | Neighbor Models, KNN |
| 12 | Similarity Measures | Similarity measures |
| 13 | Term Project Presentations | |
| 14 | Term Project Presentations | |
Textbook: Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R (Second Edition).
Software: R, RStudio (download links and setup instructions will be provided), and relevant R packages.
Attendance: Regular attendance is expected and will be rewarded.
Late Submissions: Assignments submitted late will incur a penalty unless prior approval is granted.
Academic Integrity:
Academic integrity is fundamental to the academic mission of the university. Acts of academic dishonesty, including but not limited to plagiarism, cheating, fabrication, or unauthorized collaboration, undermine the learning process and violate university policies.
Specific guidelines include:
Plagiarism: Using someone else’s work, ideas, or words without proper attribution is strictly prohibited. This includes copying and pasting from any source, paraphrasing without citation, or submitting another person’s work as your own.
Cheating: Unauthorized use of materials, devices, or information during exams or assignments, including sharing or receiving answers, is not allowed.
Fabrication: Falsifying or inventing data, citations, or research is a breach of academic integrity.
Collaboration: While collaboration on group assignments may be permitted, sharing answers or work on individual tasks is not acceptable unless explicitly authorized.
Consequences: Violations of academic integrity will be addressed following the university’s academic policies, potentially leading to penalties such as assignment failure, course failure, or further disciplinary actions.