I. Ozkan, PhD
Professor, MIS, Cankaya University
iozkan@cankaya.edu.tr
Fall 2025
Additional
✅ Tidyverse style guide
✅ Frequently used [some] functions
✅ Data Types in R
❌ Data Creation in R
R has five basic or “atomic” classes of objects:
character (“a”, “1”, “TRUE” - with quotation marks)
numeric (real numbers, 1.234, 18, 1e25)
integer (1, 2, 100)
complex (1 - 2i, 3 + 5i): We do not discuss or use this class
logical (TRUE, FALSE, or the abbreviations T, F; R is case sensitive, so True/False are not valid)
The most basic type of R object is a vector
A vector can only contain objects of the same class
Before moving to creating different data types and using/manipulating them, let’s discuss some basic built-in functions we will use frequently
Learning from a sample of 5 observations
Functions used:
## [1]  1.0  2.2 -3.0  4.0  0.5
## [1] TRUE
## [1]  TRUE  TRUE FALSE  TRUE  TRUE
## [1] TRUE
## [1] 3
## [1] TRUE
## [1] 4
## [1] 1
x_vec > 0 returns a vector of size 5 (TRUE/FALSE for each element)
sum(x_vec > 0) returns a vector of size 1
dim(object_name)
class(object_name)
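The printed outputs above are consistent with a short session like the sketch below. The name x_vec and the values come from the notes; the exact calls used on the slide are not shown, so the particular functions chosen here are assumptions.

```r
# Sketch: a numeric vector of 5 observations (values taken from the printed output)
x_vec <- c(1, 2.2, -3, 4, 0.5)

is.vector(x_vec)     # TRUE: x_vec is an atomic vector
x_vec > 0            # element-wise comparison: a logical vector of length 5
any(x_vec < 0)       # TRUE: at least one element is negative
length(x_vec > 0)    # 5
sum(x_vec > 0)       # 4: TRUE counts as 1 and FALSE as 0
sum(x_vec < 0)       # 1
class(x_vec)         # "numeric"
dim(x_vec)           # NULL: an atomic vector has no dim attribute
```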
Source: The content and image are from Hadley Wickham’s
Advanced R: Chapter 3 on Vectors
## NULL
Atomic vectors have a dim of NULL, which distinguishes them from 1-D arrays 😲!!!
dbl_var <- c(1, 2.5, 4.5)
int_var <- c(1L, 6L, 10L)
lgl_var <- c(TRUE, FALSE)
chr_var <- c("these are", "some strings")
- You can determine the type of a vector with typeof() and
its length with length()
typeof(dbl_var)
#> [1] "double"
typeof(int_var)
#> [1] "integer"
typeof(lgl_var)
#> [1] "logical"
typeof(chr_var)
#> [1] "character"
When the inputs are atomic vectors, c() always creates another atomic vector
## [1] 1 2 3 4
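A minimal sketch of this flattening behaviour, together with the coercion that follows from the rule that a vector holds a single class. The variable names nested and mixed are illustrative only.

```r
# c() flattens nested atomic vectors instead of nesting them
nested <- c(c(1, 2), c(3, 4))
nested            # 1 2 3 4
length(nested)    # 4

# Mixing types coerces everything to the most flexible type involved
mixed <- c(1L, 2.5, TRUE)   # the integer and the logical become doubles
typeof(mixed)               # "double"
```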
Missing, or unknown, values are represented with a special value:
NA (short for "not available")
Most computations involving a missing value will return another missing value
## [1] 1
## [1] TRUE
## [1] FALSE
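The three outputs above look like the usual exceptions to NA propagation, where the result is the same for every possible value of the missing element. The sketch below assumes those were the expressions used on the slide and adds a small vector x (hypothetical) to show why comparing with == cannot detect missing values.

```r
# Most operations propagate the missing value
NA > 5                # NA
10 * NA               # NA

# Exceptions: the answer does not depend on the unknown value
NA ^ 0                # 1
NA | TRUE             # TRUE
NA & FALSE            # FALSE

# Testing for missingness
x <- c(NA, 5, NA, 10)
x == NA               # NA NA NA NA: comparison with NA is itself NA
is.na(x)              # TRUE FALSE TRUE FALSE
sum(x, na.rm = TRUE)  # 15: drop missing values before summing
```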
Use is.na() to test for the presence of missingness
# When creating it: 
x <- c(a = 1, b = 2, c = 3)
x
#> a b c 
#> 1 2 3
# By assigning a character vector to names()
x <- 1:3
names(x) <- c("a", "b", "c")
x
#> a b c 
#> 1 2 3
# Inline, with setNames():
x <- setNames(1:3, c("a", "b", "c"))
x
#> a b c 
#> 1 2 3
NULL
Source: The content and image are from isa401:
An Undergrad Course on Business Intelligence & Data Visualization,
Introduction to R as of July 22, 2025
Logicals: written in full (TRUE or FALSE) or abbreviated (T or F).
Doubles: written in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) form. Three special values are unique to doubles: Inf, -Inf, and NaN (not a number).
Integers: written like doubles but must be followed by L (1234L, 1e4L, or 0xcafeL), and cannot contain fractional values.
Strings: surrounded by " (e.g., "hi") or ' (e.g., 'bye'). Special characters are escaped with \; see ?Quotes for full details.
Dates/times: more complicated than they seem. The number of days differs in leap years. Daylight saving time (DST) can matter, since on a changeover day the number of hours may be 23 or 25. Date formats vary across regions and countries. The MIS 306 Data Analysis: Forecasting course covers this object class in detail.
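A short sketch that checks these literal forms with typeof(), plus one leap-year illustration using base R dates. The variable names feb_2024 and feb_2025 are illustrative only.

```r
typeof(TRUE)       # "logical"
typeof(1.23e4)     # "double"
typeof(0xcafe)     # "double"
0xcafe             # 51966: hexadecimal literals are ordinary numbers
typeof(1234L)      # "integer"
typeof(1e4L)       # "integer"
typeof("hi")       # "character"
identical('bye', "bye")   # TRUE: single and double quotes are equivalent

1 / 0              # Inf
-1 / 0             # -Inf
0 / 0              # NaN

# Leap years change the number of days between two dates
feb_2024 <- as.Date("2024-03-01") - as.Date("2024-02-01")
feb_2025 <- as.Date("2025-03-01") - as.Date("2025-02-01")
feb_2024           # Time difference of 29 days
feb_2025           # Time difference of 28 days
```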
A matrix is a 2D data structure whose elements all share one (homogeneous) data type.
The numbers of rows and columns are its two dimension attributes
The functions that may be used for matrix objects:
matrix(): creates a matrix from the given set of values
dim(): retrieve or set the dimension of an object
rownames(), colnames(): row or column names of a matrix-like object
nrow(), ncol(): number of rows or columns
rbind(), cbind(): combine vectors or matrices by rows or by columns, respectively
t(): transpose of a matrix
is.matrix(): as the name suggests, tests whether an object is a matrix
##  int [1:2, 1:2] 1 2 3 4
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## [1] "-----------------"
## [1] 3
##      [,1] [,2] [,3] [,4]
## [1,] "a"  "d"  "g"  "j" 
## [2,] "b"  "e"  "h"  "k" 
## [3,] "c"  "f"  "i"  "l"
##      [,1] [,2]
## [1,] "d"  "g" 
## [2,] "e"  "h"
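The outputs above can be reproduced with calls along the following lines. The object names m and m_chr are assumptions, since the original chunk is not shown.

```r
# A 2 x 2 integer matrix; values are filled column by column
m <- matrix(1:4, nrow = 2)
str(m)                 # int [1:2, 1:2] 1 2 3 4
m
is.matrix(m)           # TRUE

# A 3 x 4 character matrix built from the first 12 letters
m_chr <- matrix(letters[1:12], nrow = 3)
nrow(m_chr)            # 3
ncol(m_chr)            # 4
m_chr

# Subsetting uses [rows, columns]
m_chr[1:2, 2:3]        # rows 1-2 of columns 2-3: "d" "g" / "e" "h"
```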
matrix() and dim()
The dim() function can create a matrix directly from a vector by adding a dimension attribute
##  [1]  1  2  3  4  5  6  7  8  9 10
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
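A minimal sketch of the dim() assignment that matches the output above. The name x is an assumption.

```r
x <- 1:10
x                  # a plain integer vector
dim(x) <- c(2, 5)  # add a dimension attribute: 2 rows, 5 columns
x                  # now printed as a 2 x 5 matrix, filled column by column
is.matrix(x)       # TRUE
x[2, 3]            # 6
```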
rbind() and cbind()
rbind(), cbind(): create a matrix by row-binding or column-binding vectors, respectively
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
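The two outputs above correspond to binding the same pair of vectors in both directions. A sketch, with by_col and by_row as illustrative names:

```r
x <- 1:3
y <- 10:12

by_col <- cbind(x, y)   # each vector becomes a column: 3 x 2
by_row <- rbind(x, y)   # each vector becomes a row: 2 x 3

by_col
by_row
dim(by_col)             # 3 2
dim(by_row)             # 2 3
```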
t()
t(): transpose of a matrix
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
## [4,]    4
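A sketch consistent with the outputs above, assuming a 4 x 2 matrix m and a plain vector v. Note that t() turns a plain vector into a one-row matrix, and transposing again gives a one-column matrix.

```r
m <- matrix(1:8, nrow = 4)
m            # 4 x 2
t(m)         # 2 x 4
t(t(m))      # transposing twice returns the original 4 x 2 matrix

v <- 1:4
t(v)         # a 1 x 4 row matrix
t(t(v))      # a 4 x 1 column matrix
```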
rownames(), colnames(): row names and column names of a matrix
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
## NULL
## NULL
## [1] "row_1" "row_2" "row_3" "row_4"
## [1] "col_1" "col_2"
##       row_1 row_2 row_3 row_4
## col_1     1     2     3     4
## col_2     5     6     7     8
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
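A sketch of how such names might be set and removed. The labels follow the printed output; the use of paste0() and unname() is an assumption about how the slide produced them.

```r
m <- matrix(1:8, nrow = 4)
rownames(m)                          # NULL: no names yet
colnames(m)                          # NULL

rownames(m) <- paste0("row_", 1:4)   # "row_1" ... "row_4"
colnames(m) <- paste0("col_", 1:2)   # "col_1" "col_2"
rownames(m)
colnames(m)

m["row_2", "col_2"]                  # 6: names can be used for indexing
t(m)                                 # names are carried along by the transpose
unname(m)                            # a copy without row and column names
```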
Lists are a special type of vector that can contain elements of different classes
Lists can be explicitly created using the list()
function, which takes an arbitrary number of arguments
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 1+4i
## $a
## [1] 1
## 
## $b
## [1] "a"
## 
## $c
## [1] TRUE
## 
## $d
## [1] 1+4i
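The two printouts above correspond to an unnamed and a named list holding the same four elements. A sketch, with unnamed and named as illustrative object names:

```r
# Elements of different classes in one list
unnamed <- list(1, "a", TRUE, 1 + 4i)
unnamed

# The same elements with names; named elements print with $name
named <- list(a = 1, b = "a", c = TRUE, d = 1 + 4i)
named

names(unnamed)         # NULL
names(named)           # "a" "b" "c" "d"
sapply(named, class)   # numeric, character, logical, complex
```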
vector() function
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
## 
## [[5]]
## NULL
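The output above is what an empty list of length 5 looks like. A minimal sketch using vector(), with empty as an illustrative name:

```r
# Pre-allocate a list with 5 empty (NULL) slots
empty <- vector("list", length = 5)
empty
length(empty)                # 5

# Slots can be filled later, for example inside a loop
empty[[2]] <- c("x", "y")
empty[[2]]                   # "x" "y"

# vector() also creates atomic vectors filled with a default value
vector("numeric", length = 3)   # 0 0 0
```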
lst <- list( # list constructor/creator
  1:3, # atomic integer vector of length = 3
  "a", # atomic character vector of length = 1 (aka scalar)
  c(TRUE, FALSE, TRUE), # atomic logical vector of length = 3
  c(2.3, 5.9) # atomic double/numeric vector of length = 2
)
lst # printing the list
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE
## 
## [[4]]
## [1] 2.3 5.9
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:3] TRUE FALSE TRUE
##  $ : num [1:2] 2.3 5.9
## [[1]]
## [1] 1 2 3
## [1] 1 2 3
## [1] "1:3"                  "a"                    "c(TRUE, FALSE, TRUE)"
## [4] "c(2.3, 5.9)"
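The last few outputs show the difference between single and double brackets on a list. The sketch below assumes the calls were str(), lst[1], lst[[1]], and as.character(), which would produce the printouts above.

```r
lst <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))

str(lst)            # compact overview: int, chr, logi, num

lst[1]              # single brackets: a list of length 1
lst[[1]]            # double brackets: the element itself (an integer vector)
class(lst[1])       # "list"
class(lst[[1]])     # "integer"
lst[[1]][2]         # 2: index into the extracted vector

as.character(lst)   # each element deparsed to a string, e.g. "1:3"
```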
Factors are used to represent categorical and ordinal data and can be unordered or ordered
One can think of a factor as an integer vector where each integer has a label
The factor() function is used to create a factor object
## [1] yes yes no  yes no 
## Levels: no yes
## [1] "yes" "yes" "no"  "yes" "no"
## x
##  no yes 
##   2   3
## [1] "no"  "yes"
## [1] yes yes no  yes no 
## Levels: yes no
## [1] "yes" "yes" "no"  "yes" "no"
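A sketch that reproduces outputs of this kind. The values follow the printed factor; the names x and x2 and the exact calls are assumptions. The as.integer() lines illustrate the "integer vector with labels" view.

```r
# Default: levels are sorted alphabetically ("no" before "yes")
x <- factor(c("yes", "yes", "no", "yes", "no"))
x
as.character(x)     # back to plain strings
table(x)            # counts per level: no = 2, yes = 3
levels(x)           # "no" "yes"
as.integer(x)       # 2 2 1 2 1: the underlying integer codes

# Setting the level order explicitly
x2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
x2
levels(x2)          # "yes" "no"
as.integer(x2)      # 1 1 2 1 2
```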
Data frames are used to store tabular data in R
They are an extremely important type of object in R for our program. Many courses will require working with tabular data
Data frames are represented as a special type of list where every element of the list has to have the same length
Each element of the list can be thought of as a column and the length of each element as the number of rows
Unlike matrices, data frames can store different classes of objects in each column
Functions that may be used for data frame objects:
data.frame(): creates data frames
rownames(), colnames()
dim()
nrow(), ncol()
str()
##   x     z
## 1 1  TRUE
## 2 2  TRUE
## 3 3 FALSE
## 4 4 FALSE
## 'data.frame':    4 obs. of  2 variables:
##  $ x: int  1 2 3 4
##  $ z: logi  TRUE TRUE FALSE FALSE
## [1] 4
## [1] 2
##   x     z
## 1 1  TRUE
## 2 2  TRUE
## 3 3 FALSE
## 4 4 FALSE
## [1] 1 2 3 4
## [1] 1 2 3 4
## [1]  TRUE  TRUE FALSE FALSE
## [1]  TRUE  TRUE FALSE FALSE
##   x    z
## 1 1 TRUE
## [1] 1 2
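The outputs above are consistent with a small two-column data frame. The column names x and z come from the printout; the object name df and the exact sequence of calls are assumptions.

```r
df <- data.frame(x = 1:4, z = c(TRUE, TRUE, FALSE, FALSE))
df
str(df)        # 4 obs. of 2 variables: int and logi
nrow(df)       # 4
ncol(df)       # 2
dim(df)        # 4 2

# Columns can be extracted as vectors in several ways
df$x           # 1 2 3 4
df[["x"]]      # 1 2 3 4
df$z
df[, "z"]

# A single row is still a data frame
df[1, ]
dim(df[1, ])   # 1 2

is.list(df)    # TRUE: a data frame is a list of equal-length columns
```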
Example: Edgar Anderson’s Iris Data
iris gives the measurements in
centimeters of the variables sepal length and width and petal length and
width, respectively, for 50 flowers from each of 3 species of iris
| Species | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| setosa | 5.1 | 3.5 | 1.4 | 0.2 |
| setosa | 4.9 | 3.0 | 1.4 | 0.2 |
| setosa | 4.7 | 3.2 | 1.3 | 0.2 |
| versicolor | 7.0 | 3.2 | 4.7 | 1.4 |
| versicolor | 6.4 | 3.2 | 4.5 | 1.5 |
| versicolor | 6.9 | 3.1 | 4.9 | 1.5 |
| virginica | 6.3 | 3.3 | 6.0 | 2.5 |
| virginica | 5.8 | 2.7 | 5.1 | 1.9 |
| virginica | 7.1 | 3.0 | 5.9 | 2.1 |
Example: Edgar Anderson’s Iris Data: first 6 rows
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Example: Edgar Anderson’s Iris Data: last 6 rows
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
Example: rows 111 through 116 (all columns)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 111          6.5         3.2          5.1         2.0 virginica
## 112          6.4         2.7          5.3         1.9 virginica
## 113          6.8         3.0          5.5         2.1 virginica
## 114          5.7         2.5          5.0         2.0 virginica
## 115          5.8         2.8          5.1         2.4 virginica
## 116          6.4         3.2          5.3         2.3 virginica
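The three printouts above can be obtained with head(), tail(), and row indexing on the built-in iris data set. A minimal sketch:

```r
head(iris)         # first 6 rows (the default n is 6)
tail(iris)         # last 6 rows
iris[111:116, ]    # rows 111 through 116, all columns (note the trailing comma)
dim(iris)          # 150 rows, 5 columns
```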
Example: accessing columns
Using three methods:
Single brackets: ([]) using index or name
Double brackets ([[]]) using index or name
The dollar sign ($) using name
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
## [1] TRUE
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
## [1] TRUE
##   Sepal.Length
## 1          5.1
## 2          4.9
## 3          4.7
## 4          4.6
## 5          5.0
## 6          5.4
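A sketch of the three access methods on iris. The intermediate names (by_dollar and so on) are illustrative. The key difference is that [[ ]] and $ return the column as a vector, while single brackets with only a column name return a one-column data frame.

```r
by_dollar <- iris$Sepal.Length          # dollar sign, by name
by_double <- iris[["Sepal.Length"]]     # double brackets, by name
by_index  <- iris[[1]]                  # double brackets, by position
by_comma  <- iris[, "Sepal.Length"]     # single brackets with [rows, columns]

head(by_dollar)                         # 5.1 4.9 4.7 4.6 5.0 5.4
colnames(iris)
identical(by_dollar, by_double)         # TRUE
identical(by_dollar, by_index)          # TRUE

as_df <- iris["Sepal.Length"]           # single brackets, name only
class(as_df)                            # "data.frame"
head(as_df)
```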
Example: column values > some value
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
## [1] 6
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 106          7.6         3.0          6.6         2.1 virginica
## 118          7.7         3.8          6.7         2.2 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 132          7.9         3.8          6.4         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
##     Sepal.Width   Species
## 106         3.0 virginica
## 118         3.8 virginica
## 119         2.6 virginica
## 123         2.8 virginica
## 132         3.8 virginica
## 136         3.0 virginica
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 106          7.6         3.0          6.6         2.1 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
## [1] 5.7 5.2 5.0 5.2 5.4 5.1
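The row filters above can be sketched with logical indexing. The threshold 7.5 and the second condition on Sepal.Width are inferred from the printed rows, so treat them as assumptions rather than the values used on the slide.

```r
# A logical vector, one TRUE/FALSE per row
big <- iris$Sepal.Length > 7.5
head(big)                                  # FALSE for the first six rows
sum(big)                                   # 6 rows satisfy the condition

iris[big, ]                                # keep matching rows, all columns
iris[big, c("Sepal.Width", "Species")]     # matching rows, selected columns

# Combine conditions with & (and) or | (or)
iris[big & iris$Sepal.Width <= 3, ]        # rows 106, 119, 123, 136
```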
What really happens here?
## [1]  1.0  3.0   NA  6.0 -5.0 -0.5
## [1]  1.0  3.0   NA -0.5
## [1]  TRUE  TRUE    NA FALSE  TRUE  TRUE
## [1]  1.0  3.0   NA -5.0 -0.5
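What happens is that a comparison involving NA yields NA, and an NA inside a logical index returns NA rather than dropping the element. The sketch below uses the vector from the first output line; the condition x < 5 matches the third and fourth outputs, and the two alternatives at the end are additions.

```r
x <- c(1, 3, NA, 6, -5, -0.5)

x < 5                   # TRUE TRUE NA FALSE TRUE TRUE
x[x < 5]                # 1 3 NA -5 -0.5: the NA index produces an NA element

# Two common ways to drop the missing value while filtering
x[which(x < 5)]         # which() keeps only the positions that are TRUE
x[!is.na(x) & x < 5]    # or test for missingness explicitly
```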
Example: column select
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
order() function
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 14          4.3         3.0          1.1         0.1  setosa
## 9           4.4         2.9          1.4         0.2  setosa
## 39          4.4         3.0          1.3         0.2  setosa
## 43          4.4         3.2          1.3         0.2  setosa
## 42          4.5         2.3          1.3         0.3  setosa
## 4           4.6         3.1          1.5         0.2  setosa
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4         2.0 virginica
## 118          7.7         3.8          6.7         2.2 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
## 106          7.6         3.0          6.6         2.1 virginica
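order() returns the row positions that would sort a vector, and those positions are then used as a row index. A sketch that matches the ascending and descending printouts above, with idx_asc and idx_desc as illustrative names:

```r
# Positions that sort Sepal.Length from smallest to largest
idx_asc <- order(iris$Sepal.Length)
head(idx_asc)                    # 14 9 39 43 42 4
iris[idx_asc, ][1:6, ]           # six smallest sepal lengths

# Largest first
idx_desc <- order(iris$Sepal.Length, decreasing = TRUE)
iris[idx_desc, ][1:6, ]          # six largest sepal lengths
```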
# by Sepal.Length (descending) and Petal.Length (ascending)
iris[order(-iris$Sepal.Length, iris$Petal.Length), ][1:6, ]
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 132          7.9         3.8          6.4         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
## 118          7.7         3.8          6.7         2.2 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 106          7.6         3.0          6.6         2.1 virginica
colnames() and rownames() functions
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
## [1] "1" "2" "3" "4" "5"
## [1] "L.Sepal" "W.Sepal" "L.Petal" "W.Petal" "Species"
# if one column name is not specified, the missing name becomes NA
colnames(iris) <- c("L.Sepal", "W.Sepal", "L.Petal", "W.Petal")
colnames(iris)
## [1] "L.Sepal" "W.Sepal" "L.Petal" "W.Petal" NA
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
## [1] "L.Sepal"      "W.Sepal"      "Petal.Length" "Petal.Width"  "Species"
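The last output, where only the first two names changed, suggests renaming a subset of columns by indexing into colnames(). A sketch on a copy of the data (df is a hypothetical name, used so the built-in iris stays untouched):

```r
df <- iris                                     # work on a copy

# Rename only the first two columns
colnames(df)[1:2] <- c("L.Sepal", "W.Sepal")
colnames(df)

# Rename one column by its current name instead of its position
colnames(df)[colnames(df) == "Species"] <- "species"
colnames(df)
```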
# rownames(iris) <- iris$Species
# Error in `.rowNamesDF<-`(x, value = value) :
#  duplicate 'row.names' are not allowed
rownames(iris) <- paste0(iris$Species, "_", rownames(iris))
head(rownames(iris))
## [1] "setosa_1" "setosa_2" "setosa_3" "setosa_4" "setosa_5" "setosa_6"
## [1] "virginica_145" "virginica_146" "virginica_147" "virginica_148"
## [5] "virginica_149" "virginica_150"
data(iris)
# adding a column, new.column with values Sepal.Length * Sepal.Width
iris$new.column <- iris$Sepal.Length * iris$Sepal.Width
head(iris[, c("Sepal.Length", "Sepal.Width", "new.column")])
##   Sepal.Length Sepal.Width new.column
## 1          5.1         3.5      17.85
## 2          4.9         3.0      14.70
## 3          4.7         3.2      15.04
## 4          4.6         3.1      14.26
## 5          5.0         3.6      18.00
## 6          5.4         3.9      21.06
# assume sepal size is defined as "Large" if new.column > median
# ifelse function - teaser
cat("Median of Sepal.Size: ", median(iris$new.column))
## Median of Sepal.Size:  17.66
iris$Sepal.Size <- ifelse(iris$new.column > median(iris$new.column), "Large", "Small")
head(iris[, c("Sepal.Length", "Sepal.Width", "new.column", "Sepal.Size")])
##   Sepal.Length Sepal.Width new.column Sepal.Size
## 1          5.1         3.5      17.85      Large
## 2          4.9         3.0      14.70      Small
## 3          4.7         3.2      15.04      Small
## 4          4.6         3.1      14.26      Small
## 5          5.0         3.6      18.00      Large
## 6          5.4         3.9      21.06      Large
# manually
iris$Sepal.Size2 <- "Small"  # initialization: every row starts as "Small"
iris$Sepal.Size2[iris$new.column > median(iris$new.column)] <- "Large"  # overwrite rows above the median
head(iris[, c("new.column", "Sepal.Size", "Sepal.Size2")])
##   new.column Sepal.Size Sepal.Size2
## 1      17.85      Large       Large
## 2      14.70      Small       Small
## 3      15.04      Small       Small
## 4      14.26      Small       Small
## 5      18.00      Large       Large
## 6      21.06      Large       Large
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
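# adding a row (here species is important, it is a factor)
# note: c() mixes numbers with a string, so every value is coerced to character
# and the numeric columns become character (see "3" instead of "3.0" below)
iris <- rbind(iris, c(5, 4, 7, 6, "virginica"))
tail(iris, 3)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species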
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9           3          5.1         1.8 virginica
## 151            5           4            7           6 virginica
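In the output above the numeric columns are printed without a common number of decimals, which is a sign that they have been turned into character columns by the character vector passed to rbind(). One way to keep the column types, sketched here on a copy (df, bad, and new_row are hypothetical names), is to bind a one-row data frame instead.

```r
# Pitfall: a character vector coerces the numeric columns
bad <- rbind(iris, c(5, 4, 7, 6, "virginica"))
class(bad$Sepal.Length)      # "character"

# Type-preserving alternative: bind a one-row data frame
df <- iris
new_row <- data.frame(
  Sepal.Length = 5,
  Sepal.Width  = 4,
  Petal.Length = 7,
  Petal.Width  = 6,
  Species      = factor("virginica", levels = levels(iris$Species))
)
df <- rbind(df, new_row)
tail(df, 3)
sapply(df, class)            # four numeric columns and one factor
```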
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 149          6.2         3.4          5.4         2.3
## 150          5.9         3.0          5.1         1.8
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
##     Sepal.Width Petal.Width
## 149         3.4         2.3
## 150         3.0         1.8
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 2          4.9         3.0          1.4         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 118          7.7         3.8          6.7         2.2 virginica
## 119          7.7         2.6          6.9         2.3 virginica
## 123          7.7         2.8          6.7         2.0 virginica
## 132          7.9         3.8          6.4         2.0 virginica
## 136          7.7         3.0          6.1         2.3 virginica
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 106          7.6           3          6.6         2.1 virginica
Information on pipe operators is largely taken from Base vs Magrittr Pipe. The information below is considered to be true for R version 4.2 and later
The pipe operators |> and %>% are used to pipe an object forward to a function or call expression
Some complex data manipulations can be written with the pipe operator as a sequence of operations
|> is the native pipe operator and %>% is the pipe operator that comes with the magrittr package
A pipe passes the object on its left-hand side to the first argument of the function on the right-hand side.
|> and %>% perform the same operations in simple cases, but there are differences as well.
x %>% f(1)                # same as f(x, 1)
x %>% f(1, .)             # same as f(1, x)
df %>% split(.$var)       # same as split(df, df$var)
df %>% {split(.$x, .$y)}  # same as split(df$x, df$y)
df %>% .$var              # same as df$var
# no parentheses required
x %>% mean
df %>%
  .$var %>%
  mean
x |> f(1)                 # same as f(x, 1)
x |> f(1, y = _)          # same as f(1, y = x)
# parentheses always necessary
x |> mean()
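A runnable sketch of the native pipe using only base R and built-in data sets, so no package is needed. It assumes R 4.2 or later because of the _ placeholder; the object names piped, nested, n_big, and fit are illustrative.

```r
x <- c(1, 4, 9)

# The pipe reads left to right; the nested call reads inside out
piped  <- x |> sqrt() |> sum()
nested <- sum(sqrt(x))
piped                     # 6
identical(piped, nested)  # TRUE

# A sequence of data frame operations
n_big <- iris |>
  subset(Sepal.Length > 7.5) |>
  nrow()
n_big                     # 6

# The _ placeholder sends the object to a named argument other than the first
fit <- mtcars |> lm(mpg ~ wt, data = _)
coef(fit)
```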
# Good 
flights |>  
  filter(!is.na(arr_delay), !is.na(tailnum)) |> 
  count(dest)
# Avoid
flights|>filter(!is.na(arr_delay), !is.na(tailnum))|>count(dest)
library(tibble)
# see ?tibble::tibble
dept <- c('MIS', 'ECON', 'SENG', 'CENG', 'MAN')
some_numbers <- c(18L, 19L, 14L, 25L, 22L)
fsb_tbl <- tibble(
  department = dept, 
  count = some_numbers, 
  percentage = count / sum(count))
## # A tibble: 5 × 3
##   department count percentage
##   <chr>      <int>      <dbl>
## 1 MIS           18      0.184
## 2 ECON          19      0.194
## 3 SENG          14      0.143
## 4 CENG          25      0.255
## 5 MAN           22      0.224
##   department count percentage
## 1        MIS    18  0.1836735
## 2       ECON    19  0.1938776
## 3       SENG    14  0.1428571
## 4       CENG    25  0.2551020
## 5        MAN    22  0.2244898
Tibble is a modern reimagining of the data frame. Tibbles are designed to be (as much as possible) drop-in replacements for data frames that fix those frustrations. A concise, and fun, way to summarise the main differences is that tibbles are lazy and surly: they do less and complain more. – Hadley Wickham
To learn more about the basics of tibble, please consult the reference below: