Aims and Objectives

This course introduces data analytics to management students. Topics covered are: introduction to data analytics; data-driven thinking; business problems and data science solutions; introduction to R; learning from data (supervised versus unsupervised); entropy and information gain; regression and classification trees; linear models; logistic regression; nearest neighbor methods; model performance; and clustering.

By the end of the course, students will have acquired basic knowledge of widely used data analysis techniques. Because this is an applied course, students will be able to apply these techniques and assess their performance. In addition, students will become familiar with the pros and cons of applying these techniques. The main software used in this course for statistical programming is R, and students will be able to use R and its related packages. The course content is designed to be dynamic and may change to follow new developments.

Course Content

The content of the course includes (but is not limited to):

Week | Theme | Key Competencies
1 | Introduction | Content of the course
1, 2 | Why, Steps | Why Data Analytics, Steps of Data Analytics
3, 4 | Learning Resources, Installing R, RStudio, RStudio 101, Basic R Syntax/Commands, R Data, R Data Import/Export | Books, SW installation, CRAN, Assignment, Vectors, R as Calculator, Combining, Data Type, Import Export
5 | Statistical Learning: an Intro | Learning from data, Variance-Bias Tradeoff
5, 6 | Data - Model - Analysis | Data Types, Exploratory Data Analysis, Summaries, Visualization
7 | Decision Trees | Classification and Regression Trees, Information Gain, Gini Gain
8 | K-Nearest Neighbors | Nearest Neighbor Model
9 | Midterm Exam |
10, 11 | Regression, Dependent: Real Valued, Logistic Regression, Probit Regression, Dependent: Categorical | Regression Review: Dependent Variable Real/Categorical
11, 12 | Introduction to Clustering, Similarity | Similarity measures, Clustering algorithms
13 | Term Project Presentations |
14 | Term Project Presentations |

References and Suggested Readings

An Introduction to Statistical Learning, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2nd Ed. (PDF version available online)


R for Data Science, Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, 2nd Ed. (HTML book available online)


R Markdown: The Definitive Guide (free HTML book available online)


Practical Data Science with R, N. Zumel and J. Mount, 2nd Ed.


Articles (to be distributed), Lecture Notes (from other universities)

Evaluation Criteria

Policies

Academic integrity is fundamental to the academic mission of the university. Acts of academic dishonesty, including but not limited to plagiarism, cheating, fabrication, or unauthorized collaboration, undermine the learning process and violate university policies.

Specific guidelines include:

  1. Plagiarism: Using someone else’s work, ideas, or words without proper attribution is strictly prohibited. This includes copying and pasting from any source, paraphrasing without citation, or submitting another person’s work as your own.

  2. Cheating: Unauthorized use of materials, devices, or information during exams or assignments, including sharing or receiving answers, is not allowed.

  3. Fabrication: Falsifying or inventing data, citations, or research is a breach of academic integrity.

  4. Collaboration: While collaboration on group assignments may be permitted, sharing answers or work on individual tasks is not acceptable unless explicitly authorized.

  5. Consequences: Violations of academic integrity will be addressed following the university’s academic policies, potentially leading to penalties such as assignment failure, course failure, or further disciplinary actions.