This course introduces data analytics to management students. Topics covered in this course are: introduction to data analytics; data-driven thinking; business problems and data science solutions; introduction to R; learning from data (supervised versus unsupervised); entropy and information gain; regression and classification trees; linear models; logistic regression; nearest neighbor methods; model performance; and clustering.
By the end of the course, students will have acquired basic knowledge of widely used data analysis techniques. Because this is an applied course, students will be able to apply these techniques and assess their performance. In addition, students will become familiar with the advantages and limitations of each technique. The main software used in this course for statistical programming is R, and students will learn to use R and its related packages. The course content is designed to evolve as new developments emerge in the field.
The course content includes, but is not limited to, the following:
| Week | Theme | Key Competencies |
|---|---|---|
| 1 | Introduction | Content of the course |
| 1, 2 | Why, Steps | Why Data Analytics, Steps of Data Analytics |
| 3, 4 | Learning Resources, Installing R, RStudio, RStudio 101, Basic R Syntax/Commands, R Data, R Data Import/Export | Books, Software Installation, CRAN, Assignment, Vectors, R as a Calculator, Combining, Data Types, Import/Export |
| 5 | Statistical Learning: An Introduction | Learning from Data, Bias-Variance Tradeoff |
| 5, 6 | Data, Model, Analysis | Data Types, Exploratory Data Analysis, Summaries, Visualization |
| 7 | Decision Trees | Classification and Regression Trees, Information Gain, Gini Gain |
| 8 | K-Nearest Neighbors | Nearest Neighbor Model |
| 9 | Midterm Exam | |
| 10, 11 | Regression (Dependent Variable: Real-Valued); Logistic and Probit Regression (Dependent Variable: Categorical) | Regression Review: Real-Valued vs. Categorical Dependent Variables |
| 11, 12 | Introduction to Clustering, Similarity | Similarity Measures, Clustering Algorithms |
| 13 | Term Project Presentations | |
| 14 | Term Project Presentations | |
An Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2nd Ed. (PDF version available)
R for Data Science, Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, 2nd Ed. (free HTML version available)
R Markdown: The Definitive Guide (free HTML version available)
Practical Data Science with R, N. Zumel and J. Mount, 2nd Ed.
Articles (to be distributed) and lecture notes (from other universities)
Attendance: Regular attendance is expected and will be rewarded.
Late Submissions: Assignments submitted late will incur a penalty unless prior approval is granted.
Academic Integrity:
Academic integrity is fundamental to the academic mission of the university. Acts of academic dishonesty, including but not limited to plagiarism, cheating, fabrication, or unauthorized collaboration, undermine the learning process and violate university policies.
Specific guidelines include:
Plagiarism: Using someone else’s work, ideas, or words without proper attribution is strictly prohibited. This includes copying and pasting from any source, paraphrasing without citation, or submitting another person’s work as your own.
Cheating: Unauthorized use of materials, devices, or information during exams or assignments, including sharing or receiving answers, is not allowed.
Fabrication: Falsifying or inventing data, citations, or research is a breach of academic integrity.
Collaboration: While collaboration on group assignments may be permitted, sharing answers or work on individual tasks is not acceptable unless explicitly authorized.
Consequences: Violations of academic integrity will be addressed in accordance with the university's academic policies and may lead to penalties such as assignment failure, course failure, or further disciplinary action.