MIS 205

Data Import to and Export from R

I. Ozkan, PhD

Professor
MIS
Cankaya University

iozkan@cankaya.edu.tr

Fall 2025

Learning Objectives for Today’s Class

Read text files, binary files (e.g., Excel, SAS, SPSS, Stata), JSON files, etc.
Export data from R.

Data import

There are a few principal functions for reading data into R:
- read.table, read.csv, for reading tabular data
- readLines, for reading lines of a text file
- source, for reading in R code files (inverse of dump)
- dget, for reading in R code files (inverse of dput)
- load, for reading in saved workspaces
- unserialize, for reading single R objects in binary form
Many R packages are also available for importing and exporting other kinds of data files
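
A minimal sketch of three of the base R readers listed above, using base R only. The scratch files and the object names (scores, raw_lines, scores_copy) are made up for illustration; dput() appears only to create a file for dget() to read.

```r
# Create a small CSV in a temporary file (illustrative data, not from the course)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ada,90.5", "2,Linus,85"), tmp_csv)

# read.csv(): tabular data into a data.frame
scores <- read.csv(tmp_csv)
str(scores)

# readLines(): the same file as a character vector, one element per line
raw_lines <- readLines(tmp_csv)
raw_lines

# dget(): read back the text representation that dput() wrote
tmp_r <- tempfile(fileext = ".R")
dput(scores, file = tmp_r)
scores_copy <- dget(tmp_r)
all.equal(scores, scores_copy)
```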

Data import: `readr` package

The readr package, part of the core tidyverse, reads:
- delimited text files with read_delim()
- .csv: comma-separated values with read_csv(), or semicolon-separated values with read_csv2()
- .tsv: tab-separated values with read_tsv()
- .fwf: fixed-width files with read_fwf()
- .txt: whitespace-separated columns with read_table()
Files ending in .gz, .bz2, .xz, or .zip are decompressed automatically
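
A short sketch of the readr readers above, assuming readr 2.0 or later is installed (show_col_types was added in that version). The three scratch files and their contents are hypothetical; they hold the same two records with different separators.

```r
library(readr)

# Hypothetical scratch files: same records, different delimiters
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
psv_path <- tempfile(fileext = ".txt")
writeLines(c("id,city", "1,Ankara", "2,Izmir"), csv_path)
writeLines(c("id\tcity", "1\tAnkara", "2\tIzmir"), tsv_path)
writeLines(c("id|city", "1|Ankara", "2|Izmir"), psv_path)

from_csv   <- read_csv(csv_path, show_col_types = FALSE)
from_tsv   <- read_tsv(tsv_path, show_col_types = FALSE)
from_delim <- read_delim(psv_path, delim = "|", show_col_types = FALSE)

# A .gz extension is compressed on write and decompressed on read
gz_path <- tempfile(fileext = ".csv.gz")
write_csv(from_csv, gz_path)
from_gz <- read_csv(gz_path, show_col_types = FALSE)
```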

Data Export

There are analogous functions for writing data to files:
- write.table, for writing tabular data to text files (e.g., CSV) or connections
- writeLines, for writing character data line-by-line to a file or connection
- dump, for dumping a textual representation of multiple R objects
- dput, for outputting a textual representation of an R object
- save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file
- serialize, for converting an R object into a binary format for outputting to a connection (or file)
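
A minimal sketch of four of the base R writers listed above, each paired with the function that reads its output back. The data frame df, the value threshold, and the temporary paths are invented for illustration.

```r
df <- data.frame(id = 1:3, city = c("Ankara", "Izmir", "Bursa"))
threshold <- 0.05

# write.table(): sep = "," and row.names = FALSE give a plain CSV file
csv_path <- tempfile(fileext = ".csv")
write.table(df, file = csv_path, sep = ",", row.names = FALSE)

# writeLines(): a character vector, one element per line
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)

# dump(): R code that recreates the named objects; source() runs that code
dump_path <- tempfile(fileext = ".R")
dump(c("df", "threshold"), file = dump_path)
rm(df, threshold)
source(dump_path, local = TRUE)   # df and threshold exist again

# save(): several objects in one binary file; load() restores them by name
rda_path <- tempfile(fileext = ".RData")
save(df, threshold, file = rda_path)
rm(df, threshold)
restored <- load(rda_path)        # load() returns the names it restored
restored
```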

Data Export: `readr` package

The readr package, part of the core tidyverse, writes:
- delimited text files with write_delim()
- .csv: comma-separated values with write_csv() or write_excel_csv(); semicolon-separated values with write_csv2() or write_excel_csv2()
- .tsv: tab-separated values with write_tsv()
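
A short sketch of the readr writers above, assuming readr 2.0 or later is installed. The data frame and the temporary paths are hypothetical. Printing the first lines of each file shows how the separator changes.

```r
library(readr)

df <- data.frame(id = 1:2, price = c(1.5, 2.25), note = c("ok", NA))

csv_path  <- tempfile(fileext = ".csv")
csv2_path <- tempfile(fileext = ".csv")
tsv_path  <- tempfile(fileext = ".tsv")
psv_path  <- tempfile(fileext = ".txt")

write_csv(df, csv_path)                 # "," between fields
write_csv2(df, csv2_path)               # ";" between fields
write_tsv(df, tsv_path)                 # tab between fields
write_delim(df, psv_path, delim = "|")  # any single-character delimiter

# Inspect the raw text that was written
readLines(csv_path)
readLines(csv2_path)

# Round trip: read the CSV back in
back <- read_csv(csv_path, show_col_types = FALSE)
```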

Data Import Using RStudio

Use the Import Dataset option in the RStudio Environment pane

Some Details: Reading CSV Data Files

See all read_csv() arguments with ?read_csv
Without extra arguments, readr guesses the column types from the data, which takes a little longer
Being more specific (e.g., supplying col_types) speeds up the reading

read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  skip_empty_rows = TRUE
)
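
A small sketch of how a few of these arguments change the result, assuming readr is installed. The file contents, the "missing" marker, and the column names are invented for illustration.

```r
library(readr)

# Hypothetical file: a comment line, a custom missing-value marker, an empty cell
path <- tempfile(fileext = ".csv")
writeLines(c(
  "# illustrative survey export",
  "id,score,grade",
  "1,90.5,A",
  "2,missing,B",
  "3,77,"
), path)

survey <- read_csv(
  path,
  comment   = "#",                       # drop everything after "#"
  na        = c("", "NA", "missing"),    # strings to treat as NA
  col_types = cols(                      # explicit types, so no guessing is needed
    id    = col_integer(),
    score = col_double(),
    grade = col_character()
  )
)
survey
```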

An Example: Reading CSV Data

Import the CSV data shown in R for Data Science (2e), chapter 7:

students <- readr::read_csv(file = "https://akademik.cankaya.edu.tr/~iozkan/mis207/students.csv")

dim(students)

## [1] 6 5

students

## # A tibble: 6 × 5
##   `Student ID` `Full Name`      favourite.food     mealPlan            AGE  
##          <dbl> <chr>            <chr>              <chr>               <chr>
## 1            1 Sunil Huffmann   Strawberry yoghurt Lunch only          4    
## 2            2 Barclay Lynn     French fries       Lunch only          5    
## 3            3 Jayendra Lyne    N/A                Breakfast and lunch 7    
## 4            4 Leon Rossini     Anchovies          Lunch only          <NA> 
## 5            5 Chidiegwu Dunkel Pizza              Breakfast and lunch five 
## 6            6 Güvenç Attila    Ice cream          Lunch only          6
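
The output shows three things to fix: "N/A" is read as text, AGE is a character column because of "five", and two column names need backticks. One possible cleanup is sketched below, assuming readr is installed. The CSV text is reconstructed from the printed tibble so that the sketch runs offline; students_clean and the new column names are illustrative choices.

```r
library(readr)

# CSV text reconstructed from the printed output above
csv_text <- c(
  "Student ID,Full Name,favourite.food,mealPlan,AGE",
  "1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4",
  "2,Barclay Lynn,French fries,Lunch only,5",
  "3,Jayendra Lyne,N/A,Breakfast and lunch,7",
  "4,Leon Rossini,Anchovies,Lunch only,",
  "5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,five",
  "6,Güvenç Attila,Ice cream,Lunch only,6"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path, useBytes = TRUE)

# Treat "N/A" as missing, in addition to the defaults "" and "NA"
students_clean <- read_csv(path, na = c("", "NA", "N/A"), show_col_types = FALSE)

# Syntactic names, so no backticks are needed
names(students_clean) <- c("student_id", "full_name", "favourite_food", "meal_plan", "age")

# Replace the spelled-out number, then convert the column to numeric
students_clean$age[students_clean$age %in% "five"] <- "5"
students_clean$age <- as.numeric(students_clean$age)
students_clean
```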

Reading Excel Files with `readxl` Package

Microsoft Excel files (extension .xls for Excel 2003 and earlier, .xlsx for Excel 2007 and later)

read_excel(
  path,
  sheet = NULL,
  range = NULL,
  col_names = TRUE,
  col_types = NULL,
  na = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = readxl_progress(),
  .name_repair = "unique"
)
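
A brief sketch of the sheet and range arguments, assuming readxl is installed. It uses the example workbook datasets.xlsx that ships with readxl, so no external file is needed; mtcars_part is an illustrative name.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
xlsx_path <- readxl_example("datasets.xlsx")

# List the worksheet names before choosing one
excel_sheets(xlsx_path)

# Read only a rectangle of one sheet: header row plus five data rows, three columns
mtcars_part <- read_excel(xlsx_path, sheet = "mtcars", range = "A1:C6")
mtcars_part
```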

Reading CSVs with `vroom` Package

Faster delimited reader at 1.4GB/sec
vroom is a relatively new tidyverse package that can read and write delimited files very efficiently
It is recommended for large CSV files, see tidyverse blog for a detailed introduction on the package

# Install vroom only if it is not already available
if (!requireNamespace("vroom", quietly = TRUE)) install.packages("vroom")
fast_df <- vroom::vroom("your_file.csv")
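
A runnable variant of the call above, assuming vroom is installed. It writes the built-in mtcars data to a temporary CSV and reads back only two columns; cars_subset is an illustrative name.

```r
# Write a built-in dataset to a temporary comma-delimited file
path <- tempfile(fileext = ".csv")
vroom::vroom_write(mtcars, path, delim = ",")

# Read back only the columns that are needed
cars_subset <- vroom::vroom(
  path,
  delim = ",",
  col_select = c(mpg, cyl),
  show_col_types = FALSE
)
dim(cars_subset)
```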

Reading Proprietary Binary Files with `haven` Package

Several functions from the haven package can be used to read and write formats used by other statistical packages. Example functions include:

SAS
- .sas7bdat with read_sas()
Stata
- .dta with read_dta()
SPSS
- .sav with read_sav()

Please refer to the help file of each function for more details.
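
A minimal round trip for the Stata and SPSS formats, assuming haven is installed. The data frame and temporary paths are invented for illustration. SAS is left out because this sketch has no .sas7bdat file to read.

```r
library(haven)

df <- data.frame(id = 1:3, score = c(90.5, 85, 77))

dta_path <- tempfile(fileext = ".dta")   # Stata
sav_path <- tempfile(fileext = ".sav")   # SPSS

write_dta(df, dta_path)
write_sav(df, sav_path)

from_stata <- read_dta(dta_path)
from_spss  <- read_sav(sav_path)
from_stata
```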

JavaScript Object Notation: JSON Files

JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays… It is a common data format with diverse uses … including that of web applications with servers. — Wikipedia’s Definition of JSON

object: {}
array: []
value: string/character, number, object, array, logical, null

JSON Files

There are several packages to handle JSON format in R

JSON

{
  "firstName": "Mickey",
  "lastName": "Mouse",
  "address": {
    "city": "Mousetown",
    "postalCode": 10000
  },
  "logical": [true, false]
}

R list

list(
  firstName = "Mickey",
  lastName = "Mouse",
  address = list(
    city = "Mousetown",
    postalCode = 10000
  ),
  logical = c(TRUE, FALSE)
)
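
The mapping between the two forms can be checked with jsonlite, one of several JSON packages for R; this sketch assumes it is installed. The JSON text is the example above written as valid JSON, with a comma between every pair of members.

```r
library(jsonlite)

json_text <- '{
  "firstName": "Mickey",
  "lastName": "Mouse",
  "address": {
    "city": "Mousetown",
    "postalCode": 10000
  },
  "logical": [true, false]
}'

# JSON object -> named R list; JSON array of booleans -> logical vector
mickey <- fromJSON(json_text)
str(mickey)
mickey$address$city

# R list -> JSON text; auto_unbox writes length-one vectors as scalars
back <- toJSON(mickey, auto_unbox = TRUE, pretty = TRUE)
back
```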

Data export

From Read to Write

read_*() to write_*()

Here are some ideas: do they come from the same package?

write_csv(students, file = "example.csv")
write_sas(students, path = "example.sas7bdat")
write_json(students, path = "example.json")

Summary of Main Points

By now, you should be able to do the following:

Subset data in R
Read/import text files, binary files (e.g., Excel, SAS, SPSS, Stata), JSON files, etc. using R
Export data from R