This vignette explains the aim, input, output, and methods of the
function remify::remify().
The objective of remify::remify() is to process raw relational event sequences supplied by the user along with other inputs that characterize the data (actors’ names, event types, starting time point of the event sequence, manual risk set specification, etc.). The internal routines process the structure of the input event sequence into a standardized format, providing objects that are used by the other packages in ‘remverse’.
As an example, we will use the data randomREH
(documentation available via ?randomREH).
library(remify) # loading library
data(randomREH) # loading data
names(randomREH) # objects inside the list 'randomREH'
## [1] "edgelist" "actors" "types" "origin" "omit_dyad"
Input arguments that can be supplied to remify()
are: edgelist, directed, ordinal,
model, thin, actors,
riskset, manual.riskset,
event_type, origin, time.units,
attach_riskset, riskset_decode,
riskset_max_decode, event_covariates, and
ncores.
The edgelist must be a data.frame with three mandatory
columns: the time of the interaction in the first column, and the two
actors forming the dyad in the second and third column. The naming of
the first three columns is not required but the order must be
[time, actor1, actor2]. For directed networks, the second
column is the sender and the third is the receiver. For undirected
networks, the order of the second and third columns is ignored (dyads
are sorted alphanumerically). Optional columns are weight
(event weights affecting endogenous statistics) and event type columns
(see event_type below).
## time actor1 actor2 type
## 1 2020-03-05 02:47:08 Kayla Kiffani competition
## 2 2020-03-05 02:50:18 Colton Justin conflict
## 3 2020-03-05 03:30:26 Kelsey Maya cooperation
## 4 2020-03-05 03:38:50 Alexander Colton competition
## 5 2020-03-05 03:56:16 Wyatt Kelsey conflict
## 6 2020-03-05 04:06:45 Derek Breanna competition
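A minimal sketch of an edgelist in the required column order, using base R only. The object name toy_edgelist and its two rows are illustrative (the values are copied from the first rows shown above); only the [time, actor1, actor2] order of the first three columns matters.

```r
# Toy edgelist: time first, then the two actors, then an optional type column.
toy_edgelist <- data.frame(
  time   = as.POSIXct(c("2020-03-05 02:50:18", "2020-03-05 02:47:08"), tz = "UTC"),
  actor1 = c("Colton", "Kayla"),
  actor2 = c("Justin", "Kiffani"),
  type   = c("conflict", "competition"),
  stringsAsFactors = FALSE
)

# Sort the events by time before inspecting them.
toy_edgelist <- toy_edgelist[order(toy_edgelist$time), ]
str(toy_edgelist)
```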
The directed argument is a logical TRUE/FALSE value indicating
whether events are directed (TRUE) or undirected
(FALSE). If FALSE, the two actors of each dyad are sorted
alphanumerically
(e.g. [actor1, actor2] = ["Colton", "Alexander"] becomes
["Alexander", "Colton"]). Note that undirected networks are
only supported for tie-oriented modeling.
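The sorting of undirected dyads can be sketched in base R with pmin() and pmax(), which work element-wise on character vectors. This is an illustration of the idea only, not the internal routine of remify(); the object names are hypothetical.

```r
# Two undirected events; the first has its actors in non-alphanumeric order.
dyads <- data.frame(
  actor1 = c("Colton", "Kelsey"),
  actor2 = c("Alexander", "Maya"),
  stringsAsFactors = FALSE
)

# Put the alphanumerically smaller name first within each dyad.
sorted_dyads <- data.frame(
  actor1 = pmin(dyads$actor1, dyads$actor2),
  actor2 = pmax(dyads$actor1, dyads$actor2),
  stringsAsFactors = FALSE
)
sorted_dyads
```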
The ordinal argument is a logical TRUE/FALSE value indicating
whether only the order of events matters in the model
(TRUE) or whether waiting times must also be considered
(FALSE). The time variable is processed differently depending on this
argument, and remstimate will use
either the ordinal (if ordinal = TRUE) or the
interval (if ordinal = FALSE) time likelihood.
When ordinal = TRUE, the intereventTime field
of the returned object is NULL.
The model argument is either "tie" (default) or "actor". For
"tie", the risk set is at the dyad level. For
"actor", the model has two sub-processes: a sender rate
model and a receiver choice model. Actor-oriented modeling requires
directed = TRUE. When model = "actor", the
returned object additionally contains sender_riskset,
receiver_riskset, and activeN.
The thin argument is an integer >= 1 controlling event-time thinning based on unique
time points. It keeps every thin-th unique event time and maps
each event time to the next kept time point. This reduces the number of
unique time points and thus memory and computation in downstream steps.
The default is 1 (no thinning).
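One possible reading of this thinning rule is sketched below in base R. The helper thin_times() is hypothetical and is not the remify implementation; in particular, always keeping the last unique time point is an assumption made here so that every event has a kept time point at or after it.

```r
# Illustrative helper (not part of remify): thin unique time points.
thin_times <- function(time, thin = 1L) {
  stopifnot(thin >= 1)
  u <- sort(unique(time))
  # Keep every thin-th unique time; assumption: the last unique time is
  # always kept so that no event is left without a later kept time point.
  idx <- which(seq_along(u) %% thin == 0L)
  kept <- u[unique(c(idx, length(u)))]
  # Map each event time to the smallest kept time point >= that time.
  kept[findInterval(time, kept, left.open = TRUE) + 1L]
}

time <- c(1, 2, 2, 3, 5, 8, 9)
thinned <- thin_times(time, thin = 2L)
thinned
length(unique(time))     # unique time points before thinning
length(unique(thinned))  # unique time points after thinning
```

With thin = 1L the helper returns the input times unchanged, which matches the "no thinning" default described above.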
The actors argument is an optional character vector of actor names. If NULL
(default), actor names are taken from the input edgelist.
Specifying actors explicitly is useful when actors that
could have interacted during the study did not appear in any observed event but
should nonetheless be included in the risk set.
## [1] "Crystal" "Colton" "Lexy" "Kelsey" "Michaela" "Zackary"
## [7] "Richard" "Maya" "Wyatt" "Kiffani" "Alexander" "Kayla"
## [13] "Derek" "Justin" "Andrey" "Francesca" "Megan" "Mckenna"
## [19] "Charles" "Breanna"
The riskset argument specifies the type of risk set.
Four options are available:
"full" (default): all possible dyadic events given the
number of actors (and event types) are at risk throughout the entire
event history.
"active": only the dyadic events observed at least once
in the event history are at risk. This can substantially reduce
computation time for sparse networks.
"active_saturated": extends the active risk set by
adding the reverse direction for each observed dyad (if A→B is observed,
B→A is also at risk) and includes all event types for each observed
actor pair. This reflects the assumption that observing any interaction
between two actors implies both directions and all types are
possible.
"manual": a user-defined static risk set specified via
manual.riskset. Observed dyads absent from
manual.riskset are automatically added.
More details about risk set definitions are provided in
vignette(topic = "riskset", package = "remify").
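The difference between the full, active, and active-saturated risk sets can be illustrated on a toy directed edgelist with base R. This is a conceptual sketch that ignores event types; the object names are hypothetical and the code does not reproduce how remify() stores risk sets internally.

```r
# Toy directed edgelist with three actors; the dyad A->B is observed twice.
el <- data.frame(
  actor1 = c("A", "B", "A"),
  actor2 = c("B", "C", "B"),
  stringsAsFactors = FALSE
)
actors <- sort(unique(c(el$actor1, el$actor2)))

# Full: every ordered pair of distinct actors.
full <- expand.grid(actor1 = actors, actor2 = actors, stringsAsFactors = FALSE)
full <- full[full$actor1 != full$actor2, ]

# Active: only dyads observed at least once.
active <- unique(el[, c("actor1", "actor2")])

# Active-saturated (types ignored here): add the reverse of each observed dyad.
reversed <- data.frame(actor1 = active$actor2, actor2 = active$actor1,
                       stringsAsFactors = FALSE)
saturated <- unique(rbind(active, reversed))

c(full = nrow(full), active = nrow(active), saturated = nrow(saturated))
```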
The manual.riskset argument is required when riskset = "manual". It is a
data.frame with columns actor1 and
actor2 (and optionally type) specifying the
complete set of dyads that are at risk throughout the event history.
This defines a static risk set: unlike the deprecated
omit_dyad argument, the risk set does not vary over time.
Any observed dyads from the edgelist that are missing from
manual.riskset are added automatically.
# Example: restrict the risk set to a specific set of dyads
my_riskset <- data.frame(
actor1 = c("Alexander", "Colton", "Lexy"),
actor2 = c("Kayla", "Lexy", "Alexander")
)
reh_manual <- remify(
edgelist = randomREH$edgelist,
directed = TRUE,
model = "tie",
riskset = "manual",
manual.riskset = my_riskset
)
The event_type argument is an optional character string giving the name of the column in
edgelist that contains event types (marks). If
NULL (default), remify() uses
edgelist$type if it exists; otherwise events are treated as
untyped. If a column name is supplied, that column is used as the
event-type mark. When event types are present, the dyadic risk set is
extended over types: each dyad is duplicated for each event type.
The origin argument is the initial time (\(t_0\)) of the
observation period. If known, it can be specified here; it must have the
same class as the time column in the input
edgelist. If left unspecified (NULL), it is
set by default to one average waiting time before the first observed
event. The randomREH data provide a \(t_0\):
## [1] "2020-03-05 02:32:53 CET"
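The default origin rule can be sketched for numeric times as follows. The sketch assumes that "average waiting time" means the mean difference between consecutive sorted event times; the exact computation used inside remify() is not shown in this vignette, so treat the code as an approximation of the idea. The object names are hypothetical.

```r
# Toy numeric event times.
time <- c(10, 12, 18, 20)

# Assumption: the average waiting time is the mean of consecutive differences.
avg_wait <- mean(diff(sort(time)))

# Default origin: one average waiting time before the first observed event.
t0_default <- min(time) - avg_wait
t0_default
```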
The time.units argument is a character string specifying the time unit for converting time
values when edgelist$time is of class Date or
POSIXct. It is ignored for numeric or integer time. The default is
"auto", which selects seconds.
The attach_riskset argument is a logical (default TRUE). When TRUE, a list
riskset_info is attached to the returned
remify object. This list contains the effective risk set
representation (e.g., riskset_idx, decoded dyad tables, and
basic risk set metadata) and is intended to make the object
self-describing and easier to inspect.
The riskset_decode argument controls how the included risk set dyads are decoded and attached in
riskset_info$included:
"labels" (default): attach a decoded dyad table with
actor (and type) name labels.
"ids": attach a decoded dyad table with integer IDs
only.
"none": do not attach a decoded dyad table.
The riskset_max_decode argument is an integer (default 200000L) giving the maximum number of included
dyads for which riskset_decode = "labels" is performed. If
the risk set exceeds this threshold, decoding falls back to
"ids" with a warning, to avoid excessive memory usage.
The event_covariates argument is an optional character vector of column names in edgelist
to retain as additional event-level variables in the returned
remify object. These are stored in
reh$event_covariates together with time,
actor1, actor2, and an internal
.event_id. This is useful when downstream functions need
access to event-level marks that are not part of the core
reh$edgelist.
The ncores argument is an optional integer specifying the number of cores used to
parallelize internal processing functions (default is
1).
The output of remify() is an S3 object of class
remify. Its top-level elements are:
## [1] "M" "N" "C" "D"
## [5] "intereventTime" "edgelist" "edgelist_id" "meta"
## [9] "ids" "index" "riskset_info"
M is the number of observed time points. If two or more
events occur at the same time point, M counts unique time
points and the total number of events is returned by E (see
below). If all events occur at different time points, M
equals the number of events.
## [1] 9915
E is the total number of observed events, returned only
when simultaneous events exist (i.e., when M < E).
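The relation between M and E can be checked on any vector of event times with base R. The toy vector below is hypothetical and contains simultaneous events, so M < E.

```r
# Toy event times with ties at time 2 and time 3.
time <- c(1, 2, 2, 3, 3, 3, 5)

E <- length(time)          # total number of events
M <- length(unique(time))  # number of unique time points

c(M = M, E = E)
```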
D is the number of dyads in the full risk set, i.e., the
largest possible risk set size:
## [1] 380
When riskset is "active" or
"manual", activeD gives the number of dyads in
the reduced risk set.
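The value 380 is consistent with counting ordered pairs of distinct actors for the 20 actors in randomREH (20 × 19). A minimal sketch of this count follows; the helper n_dyads() is hypothetical, self-loops are assumed to be excluded, and the multiplication by the number of event types C applies only when the risk set is extended over types.

```r
# Illustrative helper (not part of remify): size of the full dyadic risk set.
n_dyads <- function(N, directed = TRUE, C = 1L) {
  # Ordered pairs for directed networks, unordered pairs otherwise.
  pairs <- if (directed) N * (N - 1) else N * (N - 1) / 2
  # Assumption: each dyad is duplicated once per event type when C > 1.
  pairs * C
}

n_dyads(20)                    # directed, no extension over types
n_dyads(20, directed = FALSE)  # undirected
n_dyads(20, C = 3L)            # directed, extended over three types
```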
intereventTime is a numeric vector of waiting times between subsequent events: \[\begin{bmatrix} t_1 - t_0 \\ t_2 - t_1 \\ \vdots \\ t_M - t_{M-1} \end{bmatrix}\]
## [1] 14.244936 3.166164 40.139102 8.401134 17.434267 10.467975
intereventTime is NULL when
ordinal = TRUE.
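The waiting times can be reproduced from event times measured since the origin with diff(). The sketch below uses the six processed times printed in this vignette (rounded to five decimals) and assumes the origin corresponds to time 0 on that scale; the object names are hypothetical.

```r
# Event times since the origin, copied from the processed edgelist output.
time_since_origin <- c(14.24494, 17.41110, 57.55020, 65.95134, 83.38560, 93.85358)

# Prepend t_0 = 0 and take consecutive differences: t_1 - t_0, t_2 - t_1, ...
waiting_times <- diff(c(0, time_since_origin))
round(waiting_times, 5)
```

Up to rounding, these values agree with the first six entries of intereventTime shown above.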
edgelist is the processed input edgelist, a data.frame with
columns [time, actor1, actor2] (plus type
and/or weight if supplied). Events are re-ordered by time
if necessary.
## time actor1 actor2 type
## 1 14.24494 Kayla Kiffani competition
## 2 17.41110 Colton Justin conflict
## 3 57.55020 Kelsey Maya cooperation
## 4 65.95134 Alexander Colton competition
## 5 83.38560 Wyatt Kelsey conflict
## 6 93.85358 Derek Breanna competition
edgelist_id is a per-event integer ID summary (for internal use by downstream packages).
meta is a list of metadata describing the processed event history. It
replaces the old attr()-based interface. Key fields:
## [1] "model" "directed" "ordinal"
## [4] "weighted" "with_type" "with_type_riskset"
## [7] "C_riskset" "riskset" "riskset_source"
## [10] "origin" "ncores" "dictionary"
## [13] "event_covariates"
meta$model: tie-oriented or actor-oriented
modeling
meta$directed: whether events are directed
meta$ordinal: whether ordinal likelihood is used
meta$riskset: the type of risk set
meta$dictionary: a list of two data.frames,
actors (columns actorName,
actorID) and types (columns
typeName, typeID), sorted
alphanumerically
meta$origin: the starting time \(t_0\)
meta$ncores: number of cores used
## [1] TRUE
## [1] "tie"
## $actors
## actorName actorID
## 1 Alexander 1
## 2 Andrey 2
## 3 Breanna 3
## 4 Charles 4
## 5 Colton 5
## 6 Crystal 6
## 7 Derek 7
## 8 Francesca 8
## 9 Justin 9
## 10 Kayla 10
## 11 Kelsey 11
## 12 Kiffani 12
## 13 Lexy 13
## 14 Maya 14
## 15 Mckenna 15
## 16 Megan 16
## 17 Michaela 17
## 18 Richard 18
## 19 Wyatt 19
## 20 Zackary 20
##
## $types
## typeName typeID
## 1 competition 1
## 2 conflict 2
## 3 cooperation 3
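A dictionary of this shape can be sketched in base R by sorting the unique names and numbering them. The column names follow the output above; the construction itself is an illustration rather than the remify internals, and sort(method = "radix") is used here only to make the ordering independent of the session locale.

```r
# A few actor names in arbitrary order.
actor_names <- c("Kayla", "Kiffani", "Colton", "Alexander", "Colton")

# Sort unique names alphanumerically and assign consecutive integer IDs.
sorted_names <- sort(unique(actor_names), method = "radix")
actor_dictionary <- data.frame(
  actorName = sorted_names,
  actorID   = seq_along(sorted_names),
  stringsAsFactors = FALSE
)

# Translate names to IDs with match().
kayla_id <- actor_dictionary$actorID[match("Kayla", actor_dictionary$actorName)]
actor_dictionary
```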
ids is a list of per-event integer IDs for actor1,
actor2, dyad, and type. These
replace the old attr(reh, "actor1ID") etc. interface.
## [1] 182
## [1] 10
index is a list of decoded risk set tables. For tie-oriented modeling, it
contains dyad_map (full risk set) or
dyad_map_active (active/manual risk set). For
actor-oriented modeling, it contains sender_map.
For tie-oriented modeling, a dynamic risk set modification list
(empty list when riskset is not "manual"). For
actor-oriented modeling, always an empty list.
When attach_riskset = TRUE (the default), this list
contains the effective risk set representation used for estimation. The
field riskset_info$included contains the decoded risk set
dyad table (format controlled by riskset_decode):
## actor1 actor2 dyadID actor1ID actor2ID
## 1 Alexander Andrey 1 1 2
## 2 Alexander Breanna 2 1 3
## 3 Alexander Charles 3 1 4
## 4 Alexander Colton 4 1 5
## 5 Alexander Crystal 5 1 6
## 6 Alexander Derek 6 1 7
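The dyadID values in the rows above are consistent with a row-major enumeration of ordered actor pairs that skips self-loops. The helper below is a hypothetical sketch of such an enumeration for a directed full risk set without event types; it is inferred from the printed rows and is not taken from the remify source.

```r
# Illustrative helper (not part of remify): row-major dyad ID without self-loops.
dyad_id <- function(sender, receiver, N) {
  stopifnot(sender != receiver)
  # Each sender owns a block of N - 1 receivers; receivers after the sender
  # shift down by one position because the self-loop is skipped.
  (sender - 1L) * (N - 1L) + receiver - as.integer(receiver > sender)
}

dyad_id(1L, 2L, N = 20L)    # Alexander -> Andrey
dyad_id(1L, 7L, N = 20L)    # Alexander -> Derek
dyad_id(20L, 19L, N = 20L)  # last dyad in the enumeration
```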
When model = "actor", the following additional elements
are returned:
sender_riskset: integer vector of actor IDs allowed to
send.
receiver_riskset: named list of integer vectors of
allowed receiver IDs per sender.
activeN: number of active senders.
index$sender_map: a data.frame with
columns senderID and actorName for active
senders.
reh_actor <- remify(
edgelist = randomREH$edgelist,
directed = TRUE,
ordinal = FALSE,
model = "actor",
actors = randomREH$actors,
riskset = "full",
origin = randomREH$origin
)
reh_actor$activeN
## [1] 20
## senderID actorName
## 1 1 Alexander
## 2 2 Andrey
## 3 3 Breanna
## 4 4 Charles
## 5 5 Colton
## 6 6 Crystal
The available methods for a remify object are:
print, summary, dim, and
plot.
Both print() and summary() print a brief
summary of the relational network data.
## Relational Event Network
## (processed for tie-oriented modeling):
## > events = 9915
## > actors = 20
## > (event) types = 3
## > riskset = full
## >> included dyads = 380
## >> extend_riskset_by_type = FALSE
## > directed = TRUE
## > ordinal = FALSE
## > weighted = FALSE
## > time length ~ 114974
## > interevent time
## >> minimum ~ 0
## >> maximum ~ 96.8567
dim() returns key dimensions of the network: number of events, number of
actors, number of event types (if more than one), number of possible
dyads (\(D\)), and number of active
dyads (activeD, shown only if
riskset = "active" or "manual").
## events actors types dyads
## 9915 20 3 380
plot() returns a set of descriptive plots (selected via
the which argument):
n_intervals evenly-spaced time intervals (values in \([0,1]\); opacity and size of points are
proportional to the normalized measure).
igraph: undirected and
directed edge graphs (edges’ opacity is proportional to event counts;
vertices’ opacity is proportional to degree).
op <- par(no.readonly = TRUE)
par(mai = rep(0.8, 4), cex.main = 0.9, cex.axis = 0.75)
plot(x = edgelist_reh, which = 1, n_intervals = 13)
plot(x = edgelist_reh, which = 5, n_intervals = 13,
     igraph.edge.color = "#cfcece", igraph.vertex.color = "#7bbfef")
The plots vary for undirected networks: