Introduction & Setup
This post will explain how to restructure data without relying on a Graphical User Interface (GUI) program such as the ones on David Kenny’s website. Instead, I will show you how to manipulate data, both in a general sense and specifically for dyadic data analysis, in a more flexible way by writing my own code and sharing it with you. Be forewarned: writing R-scripts can be frustrating, but the payoffs are huge. I highly encourage you to move away from GUIs and start writing scripts for many, many reasons that I won’t discuss here.
To manipulate and restructure data, I will be using some really useful R-packages that all have the same underlying philosophy (mostly because they were written by the same guy: Hadley Wickham). I would encourage you to reference his book R for Data Science, which explains some very useful theory on tidy data, research workflows, and tools that help execute a data analysis project in a clean, reproducible way.
To set up my session, I will first load the packages I will need.
library(tidyverse)
library(haven)
library(pixiedust)
The tidyverse
Notice that I am loading a package called `tidyverse`. This package will load the most commonly used R-packages for importing, tidying, and transforming data. These are:
- `ggplot2` for plotting
- `tibble` for working with `data.frames` in a more efficient way
- `tidyr` for “tidying” data (more later)
- `readr` for reading tabular data into R
- `purrr` for performing iteration over data structures
- `dplyr` for manipulating and joining data
For our purposes, we will mostly be using `tidyr`, `dplyr`, and `purrr`. We will also be using a package that comes with the `tidyverse` package but is not loaded explicitly by loading the `tidyverse` package: `haven`, which is very useful for loading SPSS, SAS, and Stata files into R. Note that when you install the `tidyverse` package, you will also install many other very useful packages (see below):
tidyverse_packages() # list all packages included in the tidyverse
## [1] "broom" "cli" "crayon" "dbplyr"
## [5] "dplyr" "dtplyr" "forcats" "googledrive"
## [9] "googlesheets4" "ggplot2" "haven" "hms"
## [13] "httr" "jsonlite" "lubridate" "magrittr"
## [17] "modelr" "pillar" "purrr" "readr"
## [21] "readxl" "reprex" "rlang" "rstudioapi"
## [25] "rvest" "stringr" "tibble" "tidyr"
## [29] "xml2" "tidyverse"
Practice data
To begin our data restructuring walk-through, I first downloaded practice data from David Kenny’s data restructuring link. Reading through this webpage, it is clear that there are three different ways you might need to restructure data for dyadic data analysis:
- Converting individual data to dyadic
- Converting from individual data to pairwise
- Converting dyadic data to pairwise data
David Kenny’s website provides some input data for his data restructuring GUI programs. This is nice because I can download the input data and the output data and check the “correct” output data file against the one that I will produce later.
First, I downloaded all the data and put them in the same folder where I am conducting my analysis. The starting individual level data file is an SPSS file whereas the pairwise and dyadic data files are comma separated value files (.csv). Below I download the files and get them into R. Note how nice it is to work with many and potentially diverse files in one place.
# Individual level data:
indv <- read_sav("http://davidakenny.net/kkc/c1/indiv.sav") %>% # Use read_sav for SPSS files
rename(dyad_id = dyad) %>%
rename_all(tolower)
# Dyadic level data
dyad <- read_csv("http://davidakenny.net/progs/dyad.csv",col_types = "dddddddddd") %>%
rename_all(tolower)
# Pairwise level data:
pair <- read_csv("http://davidakenny.net/progs/pairwise.csv", col_types = "ddddddddddddd") %>%
rename_all(tolower)
1. Individual to Dyadic
Going from individual level data to dyadic level data is probably the most straightforward task of the three outlined above. With individual level data, each row represents a “case” or individual. Importantly, each individual is nested within a dyad, as indicated by the `dyad_id` column below.
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
3 | 1 | 4 | 4 | 3 | 4 | 5 |
3 | -1 | 5 | 5 | 5 | 5 | 5 |
10 | 1 | 3 | 4 | 4 | 5 | 6 |
10 | -1 | 4 | 5 | 4 | 5 | 6 |
11 | 1 | 4 | 3 | 5 | 5 | 8 |
11 | -1 | 5 | 5 | 5 | 5 | 8 |
17 | 1 | 3 | 3 | 4 | 5 | 11 |
17 | -1 | 4 | 4 | 4 | 4 | 11 |
21 | 1 | 4 | 4 | 5 | 5 | 22 |
21 | -1 | 3 | 5 | 5 | 4 | 22 |
Note that our task will be to take the cells highlighted in dark gray and make new columns that will become our “partner” data. When we do this, we will be cutting out these rows entirely, so our total N will be cut in half. This is possible because our new columns will be named in a way that differentiates partners from actors, which eliminates our need for the `gender` column. Since `dyad_id` is repeated within each dyad, we only need one row per dyad to distinguish our cases. Thus our resulting data should look like the table below.
Note that the dark gray cells hold the same values as the dark gray rows of the individual level data above.
dyad_id | self1_w | self2_w | self3_w | self4_w | self1_h | self2_h | self3_h | self4_h | betw |
---|---|---|---|---|---|---|---|---|---|
3 | 5 | 5 | 5 | 5 | 4 | 4 | 3 | 4 | 5 |
10 | 4 | 5 | 4 | 5 | 3 | 4 | 4 | 5 | 6 |
11 | 5 | 5 | 5 | 5 | 4 | 3 | 5 | 5 | 8 |
17 | 4 | 4 | 4 | 4 | 3 | 3 | 4 | 5 | 11 |
21 | 3 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 22 |
So let’s actually perform this operation using R functions from the `tidyverse` package. Below is the code that I used to convert individual data to dyad level data:
indv_dyad <- indv %>% # To be explained soon ;)
arrange(dyad_id) %>% #
gather(key,value,-dyad_id,-betw,-gender) %>% #
mutate(gender = ifelse(gender == 1,"h","w")) %>% #
unite(new_key,key,gender,sep = "_",remove=T) %>% #
spread(new_key,value) #
Let’s walk through the code in steps.
First, I’m taking the starting data `indv` and telling `dplyr` to use the function `arrange()`. I pass the variable `dyad_id` to `arrange()` to tell `dplyr` to sort the rows from the lowest to the highest `dyad_id`. This step is unnecessary, but I like to arrange data in ways that make sense so I can better reason about the data and the functions I will need to call in order to complete a given data manipulation task.
Next, I use the function `gather()` from the `tidyr` package. This function is powerful; it takes your data set and rearranges it into “key-value pairs”. The most basic action `gather()` performs is taking your entire data set and creating two columns: one for the `key` name and the other for the `values`. The first two arguments for `gather()` are `key` and `value`. These are simply arbitrary names that will label the two columns I described above. Next, you can indicate columns that you don’t want to gather by typing the column name with a `-` in front of it.
For our case, I simply named our `key` and `value` columns “key” and “value” (remember these are arbitrary). Then I told `gather()` not to gather our `dyad_id`, `betw`, and `gender` columns. This simply means that their values will be repeated as many times as necessary. For our data, we are gathering all the “self” variables (4 variables * 2 people within each dyad = 8 rows per dyad), thus the values in all of our columns with a `-` sign will be repeated 8 times per dyad.
indv %>% # <------------ Original data frame
arrange(dyad_id) %>% # Sort by the dyad_id column
gather(key,value, # Gather data into key and value columns
-dyad_id,# |
-betw, # | Do NOT gather these columns,
-gender) # | repeat them instead
dyad_id | gender | key | value |
---|---|---|---|
3 | 1 | self1 | 4 |
3 | -1 | self1 | 5 |
3 | 1 | self2 | 4 |
3 | -1 | self2 | 5 |
3 | 1 | self3 | 3 |
3 | -1 | self3 | 5 |
3 | 1 | self4 | 4 |
3 | -1 | self4 | 5 |
10 | 1 | self1 | 3 |
10 | -1 | self1 | 4 |
10 | 1 | self2 | 4 |
10 | -1 | self2 | 5 |
10 | 1 | self3 | 4 |
10 | -1 | self3 | 4 |
10 | 1 | self4 | 5 |
10 | -1 | self4 | 5 |
11 | 1 | self1 | 4 |
11 | -1 | self1 | 5 |
11 | 1 | self2 | 3 |
11 | -1 | self2 | 5 |
11 | 1 | self3 | 5 |
11 | -1 | self3 | 5 |
11 | 1 | self4 | 5 |
11 | -1 | self4 | 5 |
17 | 1 | self1 | 3 |
17 | -1 | self1 | 4 |
17 | 1 | self2 | 3 |
17 | -1 | self2 | 4 |
17 | 1 | self3 | 4 |
17 | -1 | self3 | 4 |
17 | 1 | self4 | 5 |
17 | -1 | self4 | 4 |
21 | 1 | self1 | 4 |
21 | -1 | self1 | 3 |
21 | 1 | self2 | 4 |
21 | -1 | self2 | 5 |
21 | 1 | self3 | 5 |
21 | -1 | self3 | 5 |
21 | 1 | self4 | 5 |
21 | -1 | self4 | 4 |
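As a quick sanity check on the "8 rows per dyad" arithmetic, here is a minimal sketch on a hand-typed toy data set. The object names `indv_toy`, `long_toy`, and `rows_per_dyad` are made up for this illustration, and the toy data only contain the first two dyads from the individual level table.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data: the first two dyads from the individual level table (typed in by hand)
indv_toy <- tribble(
  ~dyad_id, ~gender, ~self1, ~self2, ~self3, ~self4, ~betw,
         3,       1,      4,      4,      3,      4,     5,
         3,      -1,      5,      5,      5,      5,     5,
        10,       1,      3,      4,      4,      5,     6,
        10,      -1,      4,      5,      4,      5,     6
)

# Gather the four "self" columns; dyad_id, betw, and gender are repeated
long_toy <- indv_toy %>%
  gather(key, value, -dyad_id, -betw, -gender)

# Each dyad contributes 4 "self" variables x 2 people = 8 rows
rows_per_dyad <- count(long_toy, dyad_id)
print(rows_per_dyad)
```

Counting rows like this after a `gather()` is a cheap way to confirm that the reshaping did what you intended before moving on to the next step.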
The next step is to make our `key` column more specific. As it currently stands, the rows of the `key` column only differentiate between each self variable (e.g. self1, self2, etc.). We need this column to differentiate between which self question AND which partner answered that particular self question. Luckily, we have that information in our `gender` variable. Our next task is to add gender information to our key column. Note that for some reason Dr. Kenny uses the suffixes "_h" and "_w" to differentiate actor and partner self scores so we’ll stick with that.
On the line after our call to `gather()`, I use a function called `mutate()`, which creates a new column (i.e. variable) as a function of other columns (variables). Here I just want to match Dr. Kenny’s example so I’m going to change `gender` to be coded as `1 = "h"` and `-1 = "w"`. This is achieved by using the expression `gender = ifelse(gender==1, "h","w")` inside `mutate()`. This tells R to replace `gender` with the result of our `ifelse()` call. `ifelse()` is useful because it takes a logical condition as its first argument, in our case `gender==1`, and checks for each row whether that condition is true. If it is, it replaces the value for that row with the second argument of `ifelse()`, in our case `"h"`. The third argument of `ifelse()` controls what happens if the condition is not met (i.e. `FALSE`), for example if gender is not equal to 1. Here we told `ifelse()` to replace `gender` with `"w"` when `gender==1` is false (i.e. when `gender==-1`).
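To make the row-by-row behavior of `ifelse()` concrete, here is a small stand-alone sketch. The vector names `gender` and `recoded` are only for illustration and are not tied to the real data.

```r
# A small stand-in for the gender column
gender <- c(1, -1, 1, -1)

# ifelse() checks the condition for every element and picks
# the second argument where it is TRUE and the third where it is FALSE
recoded <- ifelse(gender == 1, "h", "w")
print(recoded)

# A missing value stays missing, because NA == 1 is neither TRUE nor FALSE
with_missing <- ifelse(c(1, NA) == 1, "h", "w")
print(with_missing)
```

The second call is worth keeping in mind: any code other than 1 is turned into "w", but a missing gender stays `NA` rather than being silently recoded.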
Now that we have our newly recoded `gender` column, we can `unite()` `gender` with our `key` column. When we call `unite()`, we are asking R to do exactly what it sounds like: “unite” the values of `gender` with `key` into a new column called `new_key`. Note that we can specify a separator string, in our case `"_"`, which will separate the values in `key` from `gender`. See below:
Note that the default behavior for `unite()` is to remove the original columns that were used to make the newly united column. This is usually a good idea. Here I’ve kept them for visualization purposes.
indv %>% # <------------ Original data frame
arrange(dyad_id) %>% # Sort by the dyad_id column
gather(key,value, # Gather data into key and value columns
-dyad_id, # <
-betw, # | Do NOT gather these columns, repeat them instead
-gender) %>% # <
mutate(gender = ifelse(gender == 1,"h","w")) %>% # Recode gender as "h" or "w"
unite(new_key, # <
key, # | Create a new variable, "new_key",
gender, # | by combining the values of
sep = "_", # | "key" and "gender"
remove=F) # <----- This is normally set to TRUE
dyad_id | key | gender | new_key | value |
---|---|---|---|---|
3 | self1 | h | self1_h | 4 |
3 | self1 | w | self1_w | 5 |
3 | self2 | h | self2_h | 4 |
3 | self2 | w | self2_w | 5 |
3 | self3 | h | self3_h | 3 |
3 | self3 | w | self3_w | 5 |
3 | self4 | h | self4_h | 4 |
3 | self4 | w | self4_w | 5 |
10 | self1 | h | self1_h | 3 |
10 | self1 | w | self1_w | 4 |
10 | self2 | h | self2_h | 4 |
10 | self2 | w | self2_w | 5 |
10 | self3 | h | self3_h | 4 |
10 | self3 | w | self3_w | 4 |
10 | self4 | h | self4_h | 5 |
10 | self4 | w | self4_w | 5 |
11 | self1 | h | self1_h | 4 |
11 | self1 | w | self1_w | 5 |
11 | self2 | h | self2_h | 3 |
11 | self2 | w | self2_w | 5 |
11 | self3 | h | self3_h | 5 |
11 | self3 | w | self3_w | 5 |
11 | self4 | h | self4_h | 5 |
11 | self4 | w | self4_w | 5 |
17 | self1 | h | self1_h | 3 |
17 | self1 | w | self1_w | 4 |
17 | self2 | h | self2_h | 3 |
17 | self2 | w | self2_w | 4 |
17 | self3 | h | self3_h | 4 |
17 | self3 | w | self3_w | 4 |
17 | self4 | h | self4_h | 5 |
17 | self4 | w | self4_w | 4 |
21 | self1 | h | self1_h | 4 |
21 | self1 | w | self1_w | 3 |
21 | self2 | h | self2_h | 4 |
21 | self2 | w | self2_w | 5 |
21 | self3 | h | self3_h | 5 |
21 | self3 | w | self3_w | 5 |
21 | self4 | h | self4_h | 5 |
21 | self4 | w | self4_w | 4 |
The last step in our process is to `spread()` our `new_key` column into new columns and use the `value` column to fill up the cells in these new columns. Remember that our `new_key` column now contains information about actors and partners as well as each of the 4 self variables. Each of these names, for example `self1_h`, will become a new column after we use `spread()`. See below:
indv %>% # <------------ Original data frame
arrange(dyad_id) %>% # Sort by the dyad_id column
gather(key,value, # Gather data into key and value columns
-dyad_id, # <
-betw, # | Do NOT gather these columns, repeat them instead
-gender) %>% # <
mutate(gender = ifelse(gender == 1,"h","w")) %>% # Recode gender as "h" or "w"
unite(new_key, # <
key, # | Create a new variable, "new_key",
gender, # | by combining the values of
sep = "_", # | "key" and "gender"
remove=T) %>% # <----- Drop "key" and "gender" so spread() returns one row per dyad
spread(new_key, # Spread the values of "new_key" into new columns and
value) # fill the cells of these columns with the values of "value"
dyad_id | self1_w | self2_w | self3_w | self4_w | self1_h | self2_h | self3_h | self4_h |
---|---|---|---|---|---|---|---|---|
3 | 5 | 5 | 5 | 5 | 4 | 4 | 3 | 4 |
10 | 4 | 5 | 4 | 5 | 3 | 4 | 4 | 5 |
11 | 5 | 5 | 5 | 5 | 4 | 3 | 5 | 5 |
17 | 4 | 4 | 4 | 4 | 3 | 3 | 4 | 5 |
21 | 3 | 5 | 5 | 4 | 4 | 4 | 5 | 5 |
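For readers on a recent version of `tidyr`, note that `gather()` and `spread()` have been superseded by `pivot_longer()` and `pivot_wider()`. The sketch below is not the code used in this post; it is one possible way to get the same individual-to-dyadic result with `pivot_wider()`, shown on a hand-typed toy data set. The names `indv_toy` and `dyad_toy` are made up for the example.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data: the first two dyads from the individual level table (typed in by hand)
indv_toy <- tribble(
  ~dyad_id, ~gender, ~self1, ~self2, ~self3, ~self4, ~betw,
         3,       1,      4,      4,      3,      4,     5,
         3,      -1,      5,      5,      5,      5,     5,
        10,       1,      3,      4,      4,      5,     6,
        10,      -1,      4,      5,      4,      5,     6
)

dyad_toy <- indv_toy %>%
  mutate(gender = ifelse(gender == 1, "h", "w")) %>%   # same recode as in the post
  pivot_wider(
    id_cols     = c(dyad_id, betw),       # one row per dyad
    names_from  = gender,                 # "h" / "w" become the suffix
    values_from = starts_with("self"),    # self1..self4 become the prefix
    names_sep   = "_"
  )

print(dyad_toy)
```

Here the recode, unite, and spread steps collapse into a single call, because `pivot_wider()` builds the `self1_h`, `self1_w`, ... names itself when several value columns are supplied.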
Notice how our newly created data set looks identical to the original `dyad` data downloaded from Dr. Kenny’s website. Let’s do a formal check. Using the `dplyr` function `setequal()`, we can check to see if there are exactly the same number of columns (with the same names) and exactly the same number of rows (with the same exact values):
setequal(dyad,indv_dyad) # Match the two data sets, are they equal?
## [1] FALSE
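The values line up with the tables above. Note, however, that the check as printed returned FALSE rather than TRUE, so it is worth rerunning the comparison yourself before relying on it.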
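When a check like `setequal()` does not come back the way you expect, one option is to compare the values directly while ignoring row order, column order, and attributes. The helper below, `same_values()`, is a hypothetical function written for this illustration (it is not part of `dplyr` or any other package), and it assumes every column can be treated as numeric, which holds for the data in this post.

```r
# Hypothetical helper: TRUE when two data frames hold the same numeric values,
# regardless of row order, column order, or column attributes
same_values <- function(a, b) {
  a <- as.data.frame(a)
  b <- as.data.frame(b)
  if (!base::setequal(names(a), names(b)) || nrow(a) != nrow(b)) {
    return(FALSE)
  }
  cols <- sort(names(a))
  # Put rows and columns of both data frames into the same order
  a <- a[do.call(order, unname(as.list(a[cols]))), cols, drop = FALSE]
  b <- b[do.call(order, unname(as.list(b[cols]))), cols, drop = FALSE]
  # Compare column by column after stripping attributes with as.numeric()
  all(vapply(cols, function(col) {
    isTRUE(all.equal(as.numeric(a[[col]]), as.numeric(b[[col]])))
  }, logical(1)))
}

# Toy check: same values, different row and column order
x <- data.frame(id = c(1, 2), v = c(3, 4))
y <- data.frame(v = c(4, 3), id = c(2, 1))
print(same_values(x, y))
```

If this kind of value-only comparison passes while a stricter check fails, the difference is likely in column types or attributes rather than in the numbers themselves.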
2. Individual to Pairwise
Now that we know how to transform data from the individual level format to a dyadic one, let’s go over how to go from an individual level format to a pairwise format. Recall that in individual level data sets, we have one row per individual that is nested within a dyad. In pairwise data structures, we will keep this same general structure. Specifically, our input data file and output data file will have the same number of rows (i.e. the same N). The critical difference is that each row will represent BOTH the actor and partner data. That is, each individual’s data will be reflected as actor variables for that specific individual’s original row but will be reflected as partner data in that specific person’s partner row.
To illustrate, let’s look at the original individual data set again:
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
3 | 1 | 4 | 4 | 3 | 4 | 5 |
3 | -1 | 5 | 5 | 5 | 5 | 5 |
10 | 1 | 3 | 4 | 4 | 5 | 6 |
10 | -1 | 4 | 5 | 4 | 5 | 6 |
11 | 1 | 4 | 3 | 5 | 5 | 8 |
11 | -1 | 5 | 5 | 5 | 5 | 8 |
17 | 1 | 3 | 3 | 4 | 5 | 11 |
17 | -1 | 4 | 4 | 4 | 4 | 11 |
21 | 1 | 4 | 4 | 5 | 5 | 22 |
21 | -1 | 3 | 5 | 5 | 4 | 22 |
Note that we only have one set of “self” variables but each person has unique scores on these variables in their respective rows. What we need to do is add new columns reflecting partner data, maintain the same number of rows, and insert data from each dyad member’s partner into their row. Here is the output file from Dr. Kenny’s website:
dyad_id | partnum | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 5 | 1 | 4 | 4 | 3 | 4 | -1 | 5 | 5 | 5 | 5 |
3 | 2 | 5 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 4 | 3 | 4 |
10 | 1 | 6 | 1 | 3 | 4 | 4 | 5 | -1 | 4 | 5 | 4 | 5 |
10 | 2 | 6 | -1 | 4 | 5 | 4 | 5 | 1 | 3 | 4 | 4 | 5 |
11 | 1 | 8 | 1 | 4 | 3 | 5 | 5 | -1 | 5 | 5 | 5 | 5 |
11 | 2 | 8 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 3 | 5 | 5 |
17 | 1 | 11 | 1 | 3 | 3 | 4 | 5 | -1 | 4 | 4 | 4 | 4 |
17 | 2 | 11 | -1 | 4 | 4 | 4 | 4 | 1 | 3 | 3 | 4 | 5 |
21 | 1 | 22 | 1 | 4 | 4 | 5 | 5 | -1 | 3 | 5 | 5 | 4 |
21 | 2 | 22 | -1 | 3 | 5 | 5 | 4 | 1 | 4 | 4 | 5 | 5 |
Notice how the data values in rows shaded in either dark gray or light gray are flipped across variables with the suffix "_a" and "_p". This is how the data look when every person’s data is reflected as both actor data and partner data. We have the same N as before, however, we have 5 new variables that reflect each partner’s data.
To see how to do this in R, we need to touch on some of the same concepts as before with Individual to Dyadic transformations. However, because systematically flipping certain pairs of rows and using them to create new columns is a relatively rare thing, I had to write some special code, but I think it still fits within the `tidyverse` framework discussed thus far.
Below is the code I used to transform the individual data from Dr. Kenny’s website to a pairwise format. Let’s walk through it:
indv_pair <- indv %>%
split(.$dyad_id) %>%
map_df(function(x){
person1 <- x %>%
mutate(act.par = ifelse(gender == 1,"a","p")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(new_key,key,act.par) %>%
spread(new_key,value)
person2 <- x %>%
mutate(act.par = ifelse(gender == 1,"p","a")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(new_key,key,act.par) %>%
spread(new_key,value)
bind_rows(person1,person2)
}) %>%
mutate(partnum = ifelse(gender_a == 1,1,2)) %>%
select(dyad_id,partnum,betw,ends_with("_a"),ends_with("_p"))
First, I want to take the original `indv` data set we worked with in the last walk-through. Here, however, I’m going to use a function called `split()`, which will take my data and create mini-data sets based on a grouping variable. Here I want `split()` to split my data according to `dyad_id`. Note that, because `split()` is not a `tidyverse` function and because I am using the pipe operator (i.e. `%>%`), I needed to supply `split()` with `.` and the index operator `$` to find the variable `dyad_id` within the `indv` data set. This tells `split()` which variable within `indv` it should split the data by, in this case `dyad_id`. Here is the result:
indv %>%
split(.$dyad_id)
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
3 | 1 | 4 | 4 | 3 | 4 | 5 |
3 | -1 | 5 | 5 | 5 | 5 | 5 |
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
10 | 1 | 3 | 4 | 4 | 5 | 6 |
10 | -1 | 4 | 5 | 4 | 5 | 6 |
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
11 | 1 | 4 | 3 | 5 | 5 | 8 |
11 | -1 | 5 | 5 | 5 | 5 | 8 |
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
17 | 1 | 3 | 3 | 4 | 5 | 11 |
17 | -1 | 4 | 4 | 4 | 4 | 11 |
dyad_id | gender | self1 | self2 | self3 | self4 | betw |
---|---|---|---|---|---|---|
21 | 1 | 4 | 4 | 5 | 5 | 22 |
21 | -1 | 3 | 5 | 5 | 4 | 22 |
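To see what `split()` hands back, here is a minimal sketch on a hand-typed toy data set. The names `indv_toy` and `pieces` are made up for the example.

```r
library(tibble)
library(dplyr)

# Toy data: the first two dyads from the individual level table (typed in by hand)
indv_toy <- tribble(
  ~dyad_id, ~gender, ~self1, ~self2, ~self3, ~self4, ~betw,
         3,       1,      4,      4,      3,      4,     5,
         3,      -1,      5,      5,      5,      5,     5,
        10,       1,      3,      4,      4,      5,     6,
        10,      -1,      4,      5,      4,      5,     6
)

# Inside the pipe, `.` stands for the data on the left-hand side,
# so this is the same as split(indv_toy, indv_toy$dyad_id)
pieces <- indv_toy %>%
  split(.$dyad_id)

class(pieces)         # a plain list
names(pieces)         # one element per dyad, named after the dyad_id value
sapply(pieces, nrow)  # each mini-dataset holds the two members of one dyad
```

Each element of `pieces` is itself a small data frame, which is exactly what the next step iterates over.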
Next, we will use the `map()` function from the `purrr` package, which is part of the `tidyverse`. `map()` is another very powerful and flexible function that applies a function to each element of a `list` or `data.frame`. Here, `map()` will be applying a function to each of those mini-datasets `split()` created. That is, the result of `split()` is a `list` of `data.frames` (a `list` can hold anything inside it). Since there is no explicit function for performing pairwise data restructuring, we are going to make our own function. This is where `map()` becomes very flexible. It can apply a ready-made function to every element of a list or data frame, or you can define your own within the call to `map()`, which is what I’m going to do.
Note that I have the suffix `_df` at the end of `map`. This simply means that I want `map()` to make sure that the result is a `data.frame` and nothing else. If the results of my function cannot be combined into a `data.frame`, `map_df()` will throw an error telling me so. Normally, the first argument to `map()` is a `list` or `data.frame`, but remember we are piping in the `list` of `data.frames` that `split()` produced by cutting up `indv` by `dyad_id`.
Next, we tell `map()` what function to perform on each of our mini-datasets. I use `function(x)` to say that I want to define a new function and it will take the argument `x`. The first thing I want to do is take `x`, do some stuff to it, and call the result `person1`. Note that `map()` is going to iterate over our `list` of `data.frames`, and this means that inside our function `x` represents each individual mini-dataset we created.
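Before looking at the full pairwise function, it may help to see the split-then-map pattern with a much simpler function. The sketch below uses a hand-typed toy data set and a made-up summary (the names `indv_toy`, `dyad_summary`, `n_people`, and `mean_self1` are all just for illustration). Note that in recent versions of `purrr`, `map_df()` is superseded in favor of `map()` followed by `list_rbind()`, but it still works.

```r
library(tibble)
library(dplyr)
library(purrr)

# Toy data: the first two dyads from the individual level table (typed in by hand)
indv_toy <- tribble(
  ~dyad_id, ~gender, ~self1, ~self2, ~self3, ~self4, ~betw,
         3,       1,      4,      4,      3,      4,     5,
         3,      -1,      5,      5,      5,      5,     5,
        10,       1,      3,      4,      4,      5,     6,
        10,      -1,      4,      5,      4,      5,     6
)

dyad_summary <- indv_toy %>%
  split(.$dyad_id) %>%            # list of one mini-dataset per dyad
  map_df(function(x) {            # x is one mini-dataset at a time
    tibble(
      dyad_id    = x$dyad_id[1],
      n_people   = nrow(x),
      mean_self1 = mean(x$self1)
    )
  })                              # map_df() row-binds the per-dyad results

print(dyad_summary)
```

The anonymous function returns a one-row data frame per dyad, and `map_df()` stacks those rows into a single data frame, which is the same mechanism the pairwise code relies on.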
Because we are flipping data around, I’m first going to have `map()` take each mini-dataset and create a new variable using `mutate()` called `act.par`. I’m going to use `ifelse()` to create `act.par` based on `gender`. If `gender==1`, I want `act.par=="a"` and if not I want `act.par=="p"`. I’m using “a” and “p” to refer to actor and partner, respectively. Then I’m going to do some familiar things with `gather()`, `unite()`, and `spread()`. Essentially, I’m gathering up all variables except `dyad_id`, `betw`, and `act.par` (which will get repeated), then combining `key` and `act.par` and spreading those columns back out. This will result in our mini-dataset having 1 row.
Then I repeat this process for a new object called `person2`. This time, however, `ifelse()` is flipping its conditions such that `gender==1` gets replaced with “p” and `gender==-1` gets replaced with “a”. For each mini-dataset, I now have two objects, `person1` and `person2`. All I need to do now is put `person1` and `person2` together and I have a pairwise mini-dataset.
Here is what the result would look like for one dyad mini-dataset:
indv %>%
split(.$dyad_id) %>%
map_df(function(x){
person1 <- x %>%
mutate(act.par = ifelse(gender == 1,"a","p")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(new_key,key,act.par) %>%
spread(new_key,value)
person2 <- x %>%
mutate(act.par = ifelse(gender == 1,"p","a")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(new_key,key,act.par) %>%
spread(new_key,value)
bind_rows(person1,person2)
})
Person 1 (gathered)
dyad_id | betw | new_key | value |
---|---|---|---|
3 | 5 | gender_a | 1 |
3 | 5 | gender_p | -1 |
3 | 5 | self1_a | 4 |
3 | 5 | self1_p | 5 |
3 | 5 | self2_a | 4 |
3 | 5 | self2_p | 5 |
3 | 5 | self3_a | 3 |
3 | 5 | self3_p | 5 |
3 | 5 | self4_a | 4 |
3 | 5 | self4_p | 5 |
Person 1 (spread out)
dyad_id | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 5 | 1 | 4 | 4 | 3 | 4 | -1 | 5 | 5 | 5 | 5 |
Person 2 (gathered)
dyad_id | betw | new_key | value |
---|---|---|---|
3 | 5 | gender_p | 1 |
3 | 5 | gender_a | -1 |
3 | 5 | self1_p | 4 |
3 | 5 | self1_a | 5 |
3 | 5 | self2_p | 4 |
3 | 5 | self2_a | 5 |
3 | 5 | self3_p | 3 |
3 | 5 | self3_a | 5 |
3 | 5 | self4_p | 4 |
3 | 5 | self4_a | 5 |
Person 2 (spread out)
dyad_id | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 5 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 4 | 3 | 4 |
Combined mini-dataset
dyad_id | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|
3 | 5 | 1 | 4 | 4 | 3 | 4 | -1 | 5 | 5 | 5 | 5 |
3 | 5 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 4 | 3 | 4 |
Recall that a convenient quality of the `purrr` package’s `map()` functions is that you can supply a suffix to `map()`, such as `map_df()`, and that particular `map()` function will be sure to give you a `data.frame` as a result. This means that, although `split()` gave us a `list`, this `list` was composed of `data.frames`, so `map_df()` will automatically combine all of our mini-datasets back into one larger dataset. The result will be our final pairwise-transformed data set. The last two lines just add a new `partnum` variable to help us remember who is who and then order the variables the same way Dr. Kenny has them ordered:
dyad_id | partnum | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 5 | 1 | 4 | 4 | 3 | 4 | -1 | 5 | 5 | 5 | 5 |
3 | 2 | 5 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 4 | 3 | 4 |
10 | 1 | 6 | 1 | 3 | 4 | 4 | 5 | -1 | 4 | 5 | 4 | 5 |
10 | 2 | 6 | -1 | 4 | 5 | 4 | 5 | 1 | 3 | 4 | 4 | 5 |
11 | 1 | 8 | 1 | 4 | 3 | 5 | 5 | -1 | 5 | 5 | 5 | 5 |
11 | 2 | 8 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 3 | 5 | 5 |
17 | 1 | 11 | 1 | 3 | 3 | 4 | 5 | -1 | 4 | 4 | 4 | 4 |
17 | 2 | 11 | -1 | 4 | 4 | 4 | 4 | 1 | 3 | 3 | 4 | 5 |
21 | 1 | 22 | 1 | 4 | 4 | 5 | 5 | -1 | 3 | 5 | 5 | 4 |
21 | 2 | 22 | -1 | 3 | 5 | 5 | 4 | 1 | 4 | 4 | 5 | 5 |
Is our new pairwise dataset identical to Dr. Kenny’s?
setequal(pair,indv_pair)
## [1] TRUE
It is. It is indeed. ;)
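For comparison, here is a different route to the same pairwise layout: joining the individual data to itself. This is not the approach used in this post, just a sketch of an alternative. It assumes distinguishable dyads coded exactly 1 and -1 on `gender` (as in Dr. Kenny’s example), and the names `indv_toy`, `actor`, `partner`, and `pair_toy` are made up for the illustration.

```r
library(tibble)
library(dplyr)

# Toy data: the first two dyads from the individual level table (typed in by hand)
indv_toy <- tribble(
  ~dyad_id, ~gender, ~self1, ~self2, ~self3, ~self4, ~betw,
         3,       1,      4,      4,      3,      4,     5,
         3,      -1,      5,      5,      5,      5,     5,
        10,       1,      3,      4,      4,      5,     6,
        10,      -1,      4,      5,      4,      5,     6
)

# Actor copy: every person's own scores get the "_a" suffix
actor <- indv_toy %>%
  rename_with(~ paste0(.x, "_a"), c(gender, starts_with("self")))

# Partner copy: the same scores with the "_p" suffix, plus the gender code
# of the person they should be attached to (assumes 1 / -1 coding)
partner <- indv_toy %>%
  select(-betw) %>%
  rename_with(~ paste0(.x, "_p"), c(gender, starts_with("self"))) %>%
  mutate(gender_a = -gender_p)

pair_toy <- actor %>%
  inner_join(partner, by = c("dyad_id", "gender_a")) %>%
  mutate(partnum = ifelse(gender_a == 1, 1, 2)) %>%
  select(dyad_id, partnum, betw, ends_with("_a"), ends_with("_p"))

print(pair_toy)
```

The join does the "flipping" in one step: each person’s row is matched with the row of the other member of the same dyad. The trade-off is that it leans on the 1/-1 gender coding, whereas the `split()` and `map_df()` approach spells out each person explicitly.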
3. Dyad to Pairwise
The final case is where you might need to restructure your data from a dyadic structure to a pairwise structure. To do this transformation, we will simply do some reverse engineering of the transformations we’ve already performed. Note that at this point in the tutorial, you’ve already learned a lot about how to do different transformations using `tidyverse` packages and functions. Now we just need to apply the same skills we’ve used already to a new situation.
Recall that our dyad data structure has half as many rows as our individual level data. Each row represents a dyad and we have two sets of self ratings - one for each member of the dyad - denoted with a suffix "_w" or "_h". See below:
dyad_id | self1_w | self2_w | self3_w | self4_w | self1_h | self2_h | self3_h | self4_h | betw |
---|---|---|---|---|---|---|---|---|---|
3 | 5 | 5 | 5 | 5 | 4 | 4 | 3 | 4 | 5 |
10 | 4 | 5 | 4 | 5 | 3 | 4 | 4 | 5 | 6 |
11 | 5 | 5 | 5 | 5 | 4 | 3 | 5 | 5 | 8 |
17 | 4 | 4 | 4 | 4 | 3 | 3 | 4 | 5 | 11 |
21 | 3 | 5 | 5 | 4 | 4 | 4 | 5 | 5 | 22 |
To move from this dyadic data structure to a pairwise data structure, we need to expand this data again so that we have one row per person (i.e. double the N) but we need to keep both sets of “self” variables for actors and partners. Below is the code I use to move from a dyadic data structure to a pairwise structure:
dyad_pair <- dyad %>%
gather(key,value,-dyad_id,-betw) %>%
mutate(gender = ifelse(str_detect(key,"_h"),1,-1),
key = str_replace(key,"_w|_h","")) %>%
spread(key,value) %>%
split(.$dyad_id) %>%
map_df(function(x){
person1 <- x %>%
mutate(act.par = ifelse(gender == 1,"a","p")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(key,key,act.par) %>%
spread(key,value)
person2 <- x %>%
mutate(act.par = ifelse(gender == 1,"p","a")) %>%
gather(key,value,-dyad_id,-betw,-act.par) %>%
unite(key,key,act.par) %>%
spread(key,value)
bind_rows(person1,person2)
}) %>%
mutate(partnum = ifelse(gender_a == 1,1,2)) %>%
select(dyad_id,partnum,betw,ends_with("_a"),ends_with("_p"))
All of this code should look very familiar. In fact, most of it is copied and pasted from our individual to pairwise data restructuring code. This is because the only real difference between dyadic to pairwise and individual to pairwise data transformation is turning dyadic data back into individual level data. After that is complete, we follow the same steps we took when we converted individual data to pairwise data.
Note that going from dyadic to individual level data is an easy task because all you need to do is perform the reverse of the actions you used to get to the dyadic data in the first place. The functions we have been using from the `tidyr` package (one of the foundational packages in the `tidyverse`) are all reversible. For example, the functions `gather()` and `spread()` undo each other. The same is true for `unite()` and a function we have not used yet, `separate()`; `unite()` puts the values of two columns together whereas `separate()` breaks them apart, undoing the work of `unite()`.
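Here is a small round trip showing `unite()` and `separate()` undoing each other. The toy tibble and the names `long_toy`, `united`, and `restored` are made up for this sketch.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Two gathered rows for one dyad (typed in by hand)
long_toy <- tibble(
  dyad_id = c(3, 3),
  key     = c("self1", "self1"),
  gender  = c("h", "w"),
  value   = c(4, 5)
)

# unite() glues key and gender together ...
united <- long_toy %>%
  unite(new_key, key, gender, sep = "_")

# ... and separate() splits them apart again at the same separator
restored <- united %>%
  separate(new_key, c("key", "gender"), sep = "_")

print(united)
print(restored)
```

One caveat: the round trip only works cleanly when the separator does not already appear inside the values being united. A variable named `self_1`, for instance, would be split in the wrong place by `sep = "_"`.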
To illustrate, let’s look at our code from the individual to dyadic data restructuring walk-through. Note steps 1-4:
indv_dyad <- indv %>% #
arrange(dyad_id) %>% #
gather(key,value,-dyad_id,-betw,-gender) %>% # <- 1) gather
mutate(gender = ifelse(gender == 1,"h","w")) %>% # <- 2) Recode gender
unite(new_key,key,gender,sep = "_",remove=T) %>% # <- 3) unite gender and key
spread(new_key,value) # <---------------------------- 4) spread your columns out
And take a look at the code that we will use to go back to individual level data. We will now perform the reverse operations (the opposite functions of the above code) in reverse order (performing steps 1-4 in reverse order):
dyad %>%
gather(key,value,-dyad_id,-betw) %>% # <----------- Undo step 4): use 'gather()'
separate(key,c("key","gender"),sep = "_") %>% # <- Undo step 3): undo 'unite()'
mutate(gender = ifelse(gender=="h",1,-1)) %>% # <- Undo step 2): recode gender
spread(key,value) # <------------------------------ Undo step 1): undo gather
- Here is the result of the first step, undoing `spread()`:
dyad %>%
gather(key,value,-dyad_id,-betw)
dyad_id | betw | key | value |
---|---|---|---|
3 | 5 | self1_w | 5 |
3 | 5 | self2_w | 5 |
3 | 5 | self3_w | 5 |
3 | 5 | self4_w | 5 |
3 | 5 | self1_h | 4 |
3 | 5 | self2_h | 4 |
3 | 5 | self3_h | 3 |
3 | 5 | self4_h | 4 |
10 | 6 | self1_w | 4 |
10 | 6 | self2_w | 5 |
10 | 6 | self3_w | 4 |
10 | 6 | self4_w | 5 |
10 | 6 | self1_h | 3 |
10 | 6 | self2_h | 4 |
10 | 6 | self3_h | 4 |
10 | 6 | self4_h | 5 |
11 | 8 | self1_w | 5 |
11 | 8 | self2_w | 5 |
11 | 8 | self3_w | 5 |
11 | 8 | self4_w | 5 |
11 | 8 | self1_h | 4 |
11 | 8 | self2_h | 3 |
11 | 8 | self3_h | 5 |
11 | 8 | self4_h | 5 |
17 | 11 | self1_w | 4 |
17 | 11 | self2_w | 4 |
17 | 11 | self3_w | 4 |
17 | 11 | self4_w | 4 |
17 | 11 | self1_h | 3 |
17 | 11 | self2_h | 3 |
17 | 11 | self3_h | 4 |
17 | 11 | self4_h | 5 |
21 | 22 | self1_w | 3 |
21 | 22 | self2_w | 5 |
21 | 22 | self3_w | 5 |
21 | 22 | self4_w | 4 |
21 | 22 | self1_h | 4 |
21 | 22 | self2_h | 4 |
21 | 22 | self3_h | 5 |
21 | 22 | self4_h | 5 |
- Now the second step, undoing `unite()`:
dyad %>%
gather(key,value,-dyad_id,-betw) %>%
separate(key,c("key","gender"),sep = "_")
dyad_id | betw | key | gender | value |
---|---|---|---|---|
3 | 5 | self1 | w | 5 |
3 | 5 | self2 | w | 5 |
3 | 5 | self3 | w | 5 |
3 | 5 | self4 | w | 5 |
3 | 5 | self1 | h | 4 |
3 | 5 | self2 | h | 4 |
3 | 5 | self3 | h | 3 |
3 | 5 | self4 | h | 4 |
10 | 6 | self1 | w | 4 |
10 | 6 | self2 | w | 5 |
10 | 6 | self3 | w | 4 |
10 | 6 | self4 | w | 5 |
10 | 6 | self1 | h | 3 |
10 | 6 | self2 | h | 4 |
10 | 6 | self3 | h | 4 |
10 | 6 | self4 | h | 5 |
11 | 8 | self1 | w | 5 |
11 | 8 | self2 | w | 5 |
11 | 8 | self3 | w | 5 |
11 | 8 | self4 | w | 5 |
11 | 8 | self1 | h | 4 |
11 | 8 | self2 | h | 3 |
11 | 8 | self3 | h | 5 |
11 | 8 | self4 | h | 5 |
17 | 11 | self1 | w | 4 |
17 | 11 | self2 | w | 4 |
17 | 11 | self3 | w | 4 |
17 | 11 | self4 | w | 4 |
17 | 11 | self1 | h | 3 |
17 | 11 | self2 | h | 3 |
17 | 11 | self3 | h | 4 |
17 | 11 | self4 | h | 5 |
21 | 22 | self1 | w | 3 |
21 | 22 | self2 | w | 5 |
21 | 22 | self3 | w | 5 |
21 | 22 | self4 | w | 4 |
21 | 22 | self1 | h | 4 |
21 | 22 | self2 | h | 4 |
21 | 22 | self3 | h | 5 |
21 | 22 | self4 | h | 5 |
- Next, we recode gender back to -1 and 1:
dyad %>%
gather(key,value,-dyad_id,-betw) %>%
separate(key,c("key","gender"),sep = "_") %>%
mutate(gender = ifelse(gender=="h",1,-1))
dyad_id | betw | key | gender | value |
---|---|---|---|---|
3 | 5 | self1 | -1 | 5 |
3 | 5 | self2 | -1 | 5 |
3 | 5 | self3 | -1 | 5 |
3 | 5 | self4 | -1 | 5 |
3 | 5 | self1 | 1 | 4 |
3 | 5 | self2 | 1 | 4 |
3 | 5 | self3 | 1 | 3 |
3 | 5 | self4 | 1 | 4 |
10 | 6 | self1 | -1 | 4 |
10 | 6 | self2 | -1 | 5 |
10 | 6 | self3 | -1 | 4 |
10 | 6 | self4 | -1 | 5 |
10 | 6 | self1 | 1 | 3 |
10 | 6 | self2 | 1 | 4 |
10 | 6 | self3 | 1 | 4 |
10 | 6 | self4 | 1 | 5 |
11 | 8 | self1 | -1 | 5 |
11 | 8 | self2 | -1 | 5 |
11 | 8 | self3 | -1 | 5 |
11 | 8 | self4 | -1 | 5 |
11 | 8 | self1 | 1 | 4 |
11 | 8 | self2 | 1 | 3 |
11 | 8 | self3 | 1 | 5 |
11 | 8 | self4 | 1 | 5 |
17 | 11 | self1 | -1 | 4 |
17 | 11 | self2 | -1 | 4 |
17 | 11 | self3 | -1 | 4 |
17 | 11 | self4 | -1 | 4 |
17 | 11 | self1 | 1 | 3 |
17 | 11 | self2 | 1 | 3 |
17 | 11 | self3 | 1 | 4 |
17 | 11 | self4 | 1 | 5 |
21 | 22 | self1 | -1 | 3 |
21 | 22 | self2 | -1 | 5 |
21 | 22 | self3 | -1 | 5 |
21 | 22 | self4 | -1 | 4 |
21 | 22 | self1 | 1 | 4 |
21 | 22 | self2 | 1 | 4 |
21 | 22 | self3 | 1 | 5 |
21 | 22 | self4 | 1 | 5 |
- Finally, we `spread()` the columns back out, undoing `gather()`:
dyad %>%
gather(key,value,-dyad_id,-betw) %>%
separate(key,c("key","gender"),sep = "_") %>%
mutate(gender = ifelse(gender=="h",1,-1)) %>%
spread(key,value)
dyad_id | betw | gender | self1 | self2 | self3 | self4 |
---|---|---|---|---|---|---|
3 | 5 | -1 | 5 | 5 | 5 | 5 |
3 | 5 | 1 | 4 | 4 | 3 | 4 |
10 | 6 | -1 | 4 | 5 | 4 | 5 |
10 | 6 | 1 | 3 | 4 | 4 | 5 |
11 | 8 | -1 | 5 | 5 | 5 | 5 |
11 | 8 | 1 | 4 | 3 | 5 | 5 |
17 | 11 | -1 | 4 | 4 | 4 | 4 |
17 | 11 | 1 | 3 | 3 | 4 | 5 |
21 | 22 | -1 | 3 | 5 | 5 | 4 |
21 | 22 | 1 | 4 | 4 | 5 | 5 |
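On recent versions of `tidyr`, the four undo steps above can also be collapsed into a single `pivot_longer()` call by using the special `".value"` name. This is a sketch of an alternative, not the code used in this post; `dyad_toy` and `indv_again` are made-up names and the data are typed in by hand from the first two rows of the dyad table.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data: the first two dyads from the dyad level table (typed in by hand)
dyad_toy <- tribble(
  ~dyad_id, ~self1_w, ~self2_w, ~self3_w, ~self4_w, ~self1_h, ~self2_h, ~self3_h, ~self4_h, ~betw,
         3,        5,        5,        5,        5,        4,        4,        3,        4,     5,
        10,        4,        5,        4,        5,        3,        4,        4,        5,     6
)

indv_again <- dyad_toy %>%
  pivot_longer(
    cols      = -c(dyad_id, betw),
    names_to  = c(".value", "gender"),  # text before "_" names the column,
    names_sep = "_"                     # text after "_" goes into gender
  ) %>%
  mutate(gender = ifelse(gender == "h", 1, -1))  # back to the 1 / -1 coding

print(indv_again)
```

The `".value"` entry tells `pivot_longer()` that the first part of each column name (`self1`, `self2`, ...) should stay a column name, so the gather, separate, and spread steps happen in one go.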
Now our dataset is back to its individual level form. To get to a pairwise data structure, we simply follow exactly the same steps we performed in the Individual to Pairwise tutorial. Here is the full code again:
dyad_pair <- dyad %>% #
gather(key,value,-dyad_id,-betw) %>% #
mutate(gender = ifelse(str_detect(key,"_h"),1,-1), # Going back to individual level
key = str_replace(key,"_w|_h","")) %>% #
spread(key,value) %>% #
split(.$dyad_id) %>% #
map_df(function(x){ #
#
person1 <- x %>% #
mutate(act.par = ifelse(gender == 1,"a","p")) %>% #
gather(key,value,-dyad_id,-betw,-act.par) %>% # These are the same
unite(key,key,act.par) %>% # steps we took when
spread(key,value) # we transformed individual
# to pairwise data structures
person2 <- x %>% #
mutate(act.par = ifelse(gender == 1,"p","a")) %>% #
gather(key,value,-dyad_id,-betw,-act.par) %>% #
unite(key,key,act.par) %>% #
spread(key,value) #
#
bind_rows(person1,person2) #
}) %>%
mutate(partnum = ifelse(gender_a == 1,1,2)) %>%
select(dyad_id,partnum,betw,ends_with("_a"),ends_with("_p"))
dyad_id | partnum | betw | gender_a | self1_a | self2_a | self3_a | self4_a | gender_p | self1_p | self2_p | self3_p | self4_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 5 | 1 | 4 | 4 | 3 | 4 | -1 | 5 | 5 | 5 | 5 |
3 | 2 | 5 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 4 | 3 | 4 |
10 | 1 | 6 | 1 | 3 | 4 | 4 | 5 | -1 | 4 | 5 | 4 | 5 |
10 | 2 | 6 | -1 | 4 | 5 | 4 | 5 | 1 | 3 | 4 | 4 | 5 |
11 | 1 | 8 | 1 | 4 | 3 | 5 | 5 | -1 | 5 | 5 | 5 | 5 |
11 | 2 | 8 | -1 | 5 | 5 | 5 | 5 | 1 | 4 | 3 | 5 | 5 |
17 | 1 | 11 | 1 | 3 | 3 | 4 | 5 | -1 | 4 | 4 | 4 | 4 |
17 | 2 | 11 | -1 | 4 | 4 | 4 | 4 | 1 | 3 | 3 | 4 | 5 |
21 | 1 | 22 | 1 | 4 | 4 | 5 | 5 | -1 | 3 | 5 | 5 | 4 |
21 | 2 | 22 | -1 | 3 | 5 | 5 | 4 | 1 | 4 | 4 | 5 | 5 |
Finally, is our `dyad_pair` dataset the same as Dr. Kenny’s pairwise dataset?
setequal(pair,dyad_pair)
## [1] TRUE
Success!!