Data Types

Data Types

A data type is an attribute of data that gives a programmer an indicator of how the data will be used. In R there are a wide assortment of data types that include scalars(character, numeric, logical, integer, complex), vectors, matrices, and data frames. It is important to know the data type of each variable in a dataset before analyses can be done. Data types can make calculations faster and more efficient depending on the analyses.

Scalars

Scalars are data types that hold only one type at a time, whether it be a character, numeric, logical, integer, or complex. We are going to download a multitude of datasets through the ‘datasets’ package in R to look at some examples of data types.

Character variable data types are usually alphabetic but always require "" when used in a dataset. Ex: “Apple”, “Porsche”, “Tom”… etc.

Numeric is self exaplanatory. This datatype is the most common data type that I encounter and include anything, well numeric. Ex: 9494, 950300, 00008… etc. The two most common types of numeric data types are integer and double.

1)Integer is a numeric data type that stores whole numbers, anything from negative to positive values. For SQL servers,integers range from -2,147,483,647 to 2,147,483,647.

2)Double is a numeric data type that stores numbers with up to 17 decimals. Ex. 59903.99595959

Logical is a special data type with only two possible values. Ex: 0/1, true/false, yes/no, etc. I typically use logical data types as a flag variable if I want to identify if a condition exists or not.

Complex data type is anything that is not included in the previous data types. Example is 1 - 2i, see example below

# numeric is probably the most common data type you will encounter in data science

# downloading 'datasets' package
library(datasets)

# viewing all datasets in the package
data()

# picking 'mtcars' dataset to view. Head function gives us a quick glance at all of the columns
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# 'str' function like the head function lets us view all variables in the dataset but lists the data types in the dataset as well
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
#Here we can see the UC Berkley Admissions dataset that contains character variables
str(UCBAdmissions)
##  'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Admit : chr [1:2] "Admitted" "Rejected"
##   ..$ Gender: chr [1:2] "Male" "Female"
##   ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...
#Example of complex data type.  The 'class' function is used to determine the data type.
z = 1 + 2i

z
## [1] 1+2i
class(z)
## [1] "complex"

Vectors

A vector is a basic data structure that always contain elements of the same type

# the '<-' is used to store values in a variable, and 'c()' is the combine function
 x <- c(9, 4, 5, 6, 7, 10)

#all values in this data structure are double
typeof(x)
## [1] "double"
#Because vectors contain elements of the same type, this function will try to convert all values in the vector to one data type.  This conversion occurs from lower to higher types, from logical to integer to double to character, with character data type being the most influential
x <- c(1, 5.4, TRUE, "hello")

x
## [1] "1"     "5.4"   "TRUE"  "hello"
# Here we can see that all values were converted to a character data type
class(x)
## [1] "character"
#There are many caveats to vectors but for the sake of staying on topic, we will discuss in another post

Matrices

Matrices are data structures that are two dimensional, and like vectors will contain values of all the same data type.

Personally I do not use matrices all that much, except maybe correlation tables related to stock analyses, but otherwise they are important to know about.

#You can create a matrix using the 'matrix' function matrix(data, nrow, ncol, byrow, dimnames)
M <- matrix(c(1:45), nrow=5, byrow = TRUE)

#new function 'print' shows the input value or matrix
print(M)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]    1    2    3    4    5    6    7    8    9
## [2,]   10   11   12   13   14   15   16   17   18
## [3,]   19   20   21   22   23   24   25   26   27
## [4,]   28   29   30   31   32   33   34   35   36
## [5,]   37   38   39   40   41   42   43   44   45

Data Frames

Data Frames are the data type that most people are familiar with. Like matrices, data frames are two dimensional structures but contain variables that can be composed of different data types such as numeric, character or logical. Data frames have rows and columns, usually with rows of unique observations

data()

#Looking at Freeny's Revenue Data.  
head(freeny)
##               y lag.quarterly.revenue price.index income.level market.potential
## 1962.25 8.79236               8.79636     4.70997      5.82110          12.9699
## 1962.5  8.79137               8.79236     4.70217      5.82558          12.9733
## 1962.75 8.81486               8.79137     4.68944      5.83112          12.9774
## 1963    8.81301               8.81486     4.68558      5.84046          12.9806
## 1963.25 8.90751               8.81301     4.64019      5.85036          12.9831
## 1963.5  8.93673               8.90751     4.62553      5.86464          12.9854
class(freeny)
## [1] "data.frame"

Hope this information is helpful.

Colin Benusa
Data Analyst

Experienced data analyst with a thorough understanding of complex statistical methods, experience with a variety of statistical tools, and knowledge in finance and financial analyses. I am a leader and dependable colleague with a unique array of past experiences that bring a new perspective to the workplace. I work well independently, but thrive in team-based environments that are fast paced and dynamic.