This vignette explains the aim, input, output, and methods of the function remify::remify().


Aim

The objective of remify::remify() is to process raw relational event sequences supplied by the user along with other inputs that characterize the data (actors’ names, event types, starting time point of the event sequence, manual risk set specification, etc.). The internal routines process the structure of the input event sequence into a standardized format, providing objects that are used by the other packages in ‘remverse’.

As an example, we use the randomREH data (documentation available via ?randomREH).

library(remify) # loading library
data(randomREH) # loading data
names(randomREH) # objects inside the list 'randomREH'
## [1] "edgelist"  "actors"    "types"     "origin"    "omit_dyad"

Input

Input arguments that can be supplied to remify() are: edgelist, directed, ordinal, model, thin, actors, riskset, manual.riskset, event_type, origin, time.units, attach_riskset, riskset_decode, riskset_max_decode, event_covariates, and ncores.


edgelist

The edgelist must be a data.frame with three mandatory columns: the time of the interaction in the first column, and the two actors forming the dyad in the second and third columns. The columns need not have specific names, but their order must be [time, actor1, actor2]. For directed networks, the second column is the sender and the third is the receiver. For undirected networks, the order of the second and third columns is ignored (dyads are sorted alphanumerically). Optional columns are weight (event weights affecting endogenous statistics) and event type columns (see event_type below).

head(randomREH$edgelist)
##                  time    actor1  actor2        type
## 1 2020-03-05 02:47:08     Kayla Kiffani competition
## 2 2020-03-05 02:50:18    Colton  Justin    conflict
## 3 2020-03-05 03:30:26    Kelsey    Maya cooperation
## 4 2020-03-05 03:38:50 Alexander  Colton competition
## 5 2020-03-05 03:56:16     Wyatt  Kelsey    conflict
## 6 2020-03-05 04:06:45     Derek Breanna competition
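To make the required layout concrete, the sketch below builds a small edgelist by hand. The actor names, times, types and weights are invented for illustration; only the column order [time, actor1, actor2] and the optional type and weight columns come from the description above.

```r
# Toy edgelist (hypothetical data). The first three columns must be
# time, actor1 (sender), actor2 (receiver), in this order.
toy_edgelist <- data.frame(
  time   = c(1.5, 2.0, 4.25, 7.0),
  actor1 = c("Ann", "Bob", "Cid", "Ann"),
  actor2 = c("Bob", "Cid", "Ann", "Cid"),
  type   = c("talk", "email", "talk", "talk"),  # optional event types
  weight = c(1, 2, 1, 0.5)                      # optional event weights
)
str(toy_edgelist)
```

Such a data.frame would then be supplied through the edgelist argument, e.g. remify(edgelist = toy_edgelist, directed = TRUE).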

directed

A logical TRUE/FALSE value indicating whether events are directed (TRUE) or undirected (FALSE). If FALSE, dyads are sorted according to their alphanumeric order (e.g. [actor1, actor2] = ["Colton", "Alexander"] becomes ["Alexander", "Colton"]). Note that undirected networks are only supported for tie-oriented modeling.
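The alphanumeric sorting of undirected dyads can be pictured with element-wise minima and maxima of the two name columns. This is a sketch of the rule, not remify's internal code; the helper name sort_undirected is hypothetical, and character comparisons follow the current locale's collation.

```r
# Hypothetical helper: put each undirected dyad in alphanumeric order so
# that [Colton, Alexander] and [Alexander, Colton] denote the same pair.
sort_undirected <- function(actor1, actor2) {
  data.frame(
    actor1 = pmin(actor1, actor2),  # element-wise "smaller" name
    actor2 = pmax(actor1, actor2)   # element-wise "larger" name
  )
}

sort_undirected(c("Colton", "Kayla"), c("Alexander", "Maya"))
```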


ordinal

A logical TRUE/FALSE value indicating whether only the order of events matters in the model (TRUE) or whether waiting times must also be considered (FALSE). Based on this argument, the processing of the time variable is carried out differently and remstimate will use either the ordinal (if ordinal = TRUE) or the interval (if ordinal = FALSE) time likelihood. When ordinal = TRUE, the intereventTime field of the returned object is NULL.


model

Either "tie" (default) or "actor". For "tie", the risk set is at the dyad level. For "actor", the model has two sub-processes: a sender rate model and a receiver choice model. Actor-oriented modeling requires directed = TRUE. When model = "actor", the returned object additionally contains sender_riskset, receiver_riskset, and activeN.


thin

An integer >= 1 controlling event-time thinning based on unique time points. It keeps every thin-th unique event time and maps each event time to the next kept time point. This reduces the number of unique time points and thus the memory and computation required in downstream steps. The default is 1 (no thinning).
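The thinning rule can be sketched in base R as follows. The helper thin_times is hypothetical, and two details are assumptions not stated above: the first kept time is the thin-th unique time, and the last unique time is always kept so that every event has a "next kept" time point.

```r
# Hypothetical sketch of event-time thinning; remify's implementation may
# differ in which unique times are kept.
thin_times <- function(time, thin = 1L) {
  stopifnot(thin >= 1)
  u <- sort(unique(time))                      # unique event times
  idx <- if (length(u) >= thin) seq(thin, length(u), by = thin) else integer(0)
  kept <- unique(c(u[idx], u[length(u)]))      # every thin-th time + last
  # map each event time to the first kept time >= it
  kept[findInterval(time, kept, left.open = TRUE) + 1L]
}

thin_times(c(1, 2, 2, 3, 4, 5, 6, 7), thin = 3)
```

With thin = 3 the kept times are 3, 6 and 7, so the eight events collapse onto three time points; with thin = 1 every time maps to itself.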


actors

An optional character vector of actor names. If NULL (default), actor names are taken from the input edgelist. Specifying actors explicitly is useful when actors that could interact during the study did not appear in any observed event and should nonetheless be included in the risk set.

randomREH$actors
##  [1] "Crystal"   "Colton"    "Lexy"      "Kelsey"    "Michaela"  "Zackary"  
##  [7] "Richard"   "Maya"      "Wyatt"     "Kiffani"   "Alexander" "Kayla"    
## [13] "Derek"     "Justin"    "Andrey"    "Francesca" "Megan"     "Mckenna"  
## [19] "Charles"   "Breanna"
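A small illustration of why supplying actors can matter: an actor who never appears in an event is invisible if names are taken from the edgelist alone. The names below are hypothetical.

```r
# Toy example: "Dee" could have interacted but never appears in an event.
observed   <- data.frame(actor1 = c("Ann", "Bob"), actor2 = c("Bob", "Cid"))
all_actors <- c("Ann", "Bob", "Cid", "Dee")

from_edgelist <- sort(unique(c(observed$actor1, observed$actor2)))
missing_actors <- setdiff(all_actors, from_edgelist)  # actors never observed
missing_actors
length(from_edgelist)  # N when actors = NULL
length(all_actors)     # N when actors = all_actors
```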

riskset

The riskset argument specifies the type of risk set. Four options are available:

  • "full" (default): all possible dyadic events given the number of actors (and event types) are at risk throughout the entire event history.
  • "active": only the dyadic events observed at least once in the event history are at risk. This can substantially reduce computation time for sparse networks.
  • "active_saturated": extends the active risk set by adding the reverse direction for each observed dyad (if A→B is observed, B→A is also at risk) and includes all event types for each observed actor pair. This reflects the assumption that observing any interaction between two actors implies both directions and all types are possible.
  • "manual": a user-defined static risk set specified via manual.riskset. Observed dyads absent from manual.riskset are automatically added.

More details about risk set definitions are provided in vignette(topic = "riskset", package = "remify").
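The difference between "active" and "active_saturated" can be sketched on a toy edgelist. This follows the definitions in the list above using hypothetical data and is not remify's internal code.

```r
# Toy typed edgelist (hypothetical data)
el <- data.frame(
  actor1 = c("A", "A", "B"),
  actor2 = c("B", "B", "C"),
  type   = c("x", "y", "x")
)
types <- c("x", "y")

# "active": each observed (actor1, actor2, type) combination once
active <- unique(el[, c("actor1", "actor2", "type")])

# "active_saturated": both directions of every observed pair ...
pairs <- unique(rbind(
  el[, c("actor1", "actor2")],
  data.frame(actor1 = el$actor2, actor2 = el$actor1)
))
# ... crossed with all event types (merge() without common columns
# returns the Cartesian product)
saturated <- merge(pairs, data.frame(type = types))

nrow(active)     # 3: A->B x, A->B y, B->C x
nrow(saturated)  # 8: {A->B, B->A, B->C, C->B} x {x, y}
```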


manual.riskset

Required when riskset = "manual". A data.frame with columns actor1 and actor2 (and optionally type) specifying the complete set of dyads that are at risk throughout the event history. This defines a static risk set: unlike the deprecated omit_dyad argument, the risk set does not vary over time. Any observed dyads from the edgelist that are missing from manual.riskset are added automatically.

# Example: restrict the risk set to a specific set of dyads
my_riskset <- data.frame(
  actor1 = c("Alexander", "Colton", "Lexy"),
  actor2 = c("Kayla",     "Lexy",   "Alexander")
)

reh_manual <- remify(
  edgelist       = randomREH$edgelist,
  directed       = TRUE,
  model          = "tie",
  riskset        = "manual",
  manual.riskset = my_riskset
)
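The rule that observed dyads missing from manual.riskset are added automatically amounts to taking the union of the manual and observed dyads. A toy sketch with hypothetical data (the helper key is also hypothetical):

```r
# Toy data: C->A is observed but not listed in the manual risk set.
manual   <- data.frame(actor1 = c("A", "B"), actor2 = c("B", "C"))
observed <- data.frame(actor1 = c("A", "C"), actor2 = c("B", "A"))

key <- function(d) paste(d$actor1, d$actor2, sep = "->")
added     <- observed[!key(observed) %in% key(manual), ]  # dyads to add
effective <- unique(rbind(manual, observed))              # union

key(added)       # "C->A"
nrow(effective)  # 3
```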

event_type

An optional character string giving the name of the column in edgelist that contains event types (marks). If NULL (default), remify() uses edgelist$type if it exists; otherwise events are treated as untyped. If a column name is supplied, that column is used as the event-type mark. When event types are present, the dyadic risk set is extended over types: each dyad is duplicated for each event type.
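The column-selection rule above can be written as a small decision function. resolve_type_column is a hypothetical name used only to illustrate the described behaviour; remify's own argument handling may differ.

```r
# Hypothetical helper mirroring the event_type lookup rule.
resolve_type_column <- function(edgelist, event_type = NULL) {
  if (!is.null(event_type)) return(event_type)     # user-supplied column
  if ("type" %in% names(edgelist)) return("type")  # fall back to $type
  NULL                                             # untyped events
}

el <- data.frame(time = 1:2, actor1 = c("A", "B"), actor2 = c("B", "A"),
                 mark = c("x", "y"))
resolve_type_column(el)          # NULL: no "type" column present
resolve_type_column(el, "mark")  # "mark"
```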


origin

The initial time (\(t_0\)) of the observation period. If known, it can be specified here; it must have the same class as the time column in the input edgelist. If left unspecified (NULL), it is set by default to one average waiting time before the first observed event. In the randomREH data a \(t_0\) is provided:

randomREH$origin
## [1] "2020-03-05 02:32:53 CET"
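The default origin can be sketched as follows, assuming the "average waiting time" is the mean gap between consecutive unique event times; remify's exact computation may differ. default_origin is a hypothetical helper.

```r
# Hypothetical sketch: t0 = first event time minus the mean waiting time.
default_origin <- function(time) {
  u <- sort(unique(time))
  min(u) - mean(diff(u))
}

default_origin(c(10, 13, 16, 22))  # gaps 3, 3, 6 -> mean 4 -> t0 = 6
```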

time.units

A character string specifying the time unit for converting time values when edgelist$time is of class Date or POSIXct. Ignored for numeric or integer time. Default is "auto", which selects seconds.
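For Date or POSIXct input, time values become numeric offsets from t0 in some unit. The base-R sketch below uses difftime() with the origin and first event time of randomREH shown in this vignette, fixing the time zone to UTC only for reproducibility. Note that the first processed time printed in the edgelist output further below (14.24494) is close to the 14.25 minutes computed here, which suggests that output is expressed in minutes; the small gap presumably comes from sub-second precision in the stored origin.

```r
# Offset of the first randomREH event from t0 in two units (base R).
t0    <- as.POSIXct("2020-03-05 02:32:53", tz = "UTC")
first <- as.POSIXct("2020-03-05 02:47:08", tz = "UTC")

as.numeric(difftime(first, t0, units = "secs"))  # 855
as.numeric(difftime(first, t0, units = "mins"))  # 14.25
```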


attach_riskset

Logical (default TRUE). When TRUE, a list riskset_info is attached to the returned remify object. This list contains the effective risk set representation (e.g., riskset_idx, decoded dyad tables, and basic risk set metadata) and is intended to make the object self-describing and easier to inspect.


riskset_decode

Controls how the included risk set dyads are decoded and attached in riskset_info$included:

  • "labels" (default): attach a decoded dyad table with actor (and type) name labels.
  • "ids": attach a decoded dyad table with integer IDs only.
  • "none": do not attach a decoded dyad table.

riskset_max_decode

Integer (default 200000L). Maximum number of included dyads for which riskset_decode = "labels" is performed. If the risk set exceeds this threshold, decoding falls back to "ids" with a warning, to avoid excessive memory usage.
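The interplay of riskset_decode and riskset_max_decode can be sketched as a small decision function. choose_decode is hypothetical and only mirrors the fallback described above.

```r
# Hypothetical helper: label decoding falls back to "ids" (with a warning)
# when the risk set exceeds riskset_max_decode dyads.
choose_decode <- function(n_dyads, riskset_decode = c("labels", "ids", "none"),
                          riskset_max_decode = 200000L) {
  riskset_decode <- match.arg(riskset_decode)
  if (riskset_decode == "labels" && n_dyads > riskset_max_decode) {
    warning("risk set too large for label decoding; using \"ids\"")
    return("ids")
  }
  riskset_decode
}

choose_decode(380)                    # "labels"
suppressWarnings(choose_decode(1e6))  # "ids"
choose_decode(1e6, "none")            # "none"
```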


event_covariates

An optional character vector of column names in edgelist to retain as additional event-level variables in the returned remify object. These are stored in reh$event_covariates together with time, actor1, actor2, and an internal .event_id. This is useful when downstream functions need access to event-level marks that are not part of the core reh$edgelist.
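A rough sketch of the kind of table described above: the core columns, the requested extra columns, and a running event id. The data and column names channel and urgency are hypothetical, and remify's stored layout may differ.

```r
# Hypothetical edgelist with two extra event-level variables
el <- data.frame(
  time = c(1, 2, 3), actor1 = c("A", "B", "A"), actor2 = c("B", "C", "C"),
  channel = c("mail", "phone", "mail"), urgency = c(2, 1, 3)
)
event_covariates <- c("channel", "urgency")

cov_table <- el[, c("time", "actor1", "actor2", event_covariates)]
cov_table$.event_id <- seq_len(nrow(el))  # one id per event
cov_table
```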


ncores

An optional integer specifying the number of cores used in the parallelization of internal processing functions (default is 1).


Running the example

edgelist_reh <- remify(
  edgelist  = randomREH$edgelist,
  directed  = TRUE,   # events are directed
  ordinal   = FALSE,  # model with waiting times
  model     = "tie",  # tie-oriented modeling
  actors    = randomREH$actors,
  riskset   = "full",
  origin    = randomREH$origin
)

Output

The output of remify() is an S3 object of class remify. Its top-level elements are:

names(edgelist_reh)
##  [1] "M"              "N"              "C"              "D"             
##  [5] "intereventTime" "edgelist"       "edgelist_id"    "meta"          
##  [9] "ids"            "index"          "riskset_info"

M

M is the number of observed time points. If two or more events occur at the same time point, M counts unique time points and the total number of events is returned by E (see below). If all events occur at different time points, M equals the number of events.

edgelist_reh$M
## [1] 9915

E

E is the total number of observed events, returned only when simultaneous events exist (i.e., when M < E).
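The distinction between M and E can be seen on a toy vector of event times containing two simultaneous events.

```r
# Toy event times: two events share t = 2
time <- c(1, 2, 2, 3.5)
M <- length(unique(time))  # unique time points
E <- length(time)          # events
c(M = M, E = E)            # M = 3 < E = 4
```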


N

N is the total number of actors that could interact in the network.

edgelist_reh$N
## [1] 20

C

C is the number of event types. If no event types are present, C is 1.

edgelist_reh$C
## [1] 3

D and activeD

D is the number of dyads in the full risk set, i.e., the largest possible risk set size:

  • directed: \(D = N \times (N-1) \times C\)
  • undirected: \(D = \frac{N \times (N-1)}{2} \times C\)

When riskset is "active" or "manual", activeD gives the number of dyads in the reduced risk set.

edgelist_reh$D
## [1] 380
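The two formulas for D translate directly into code. For the 20 actors of randomREH, N(N-1) = 380, which is the value printed above; with C = 3 type layers the directed formula gives 1140. full_riskset_size is a hypothetical helper.

```r
# Size of the full risk set from the formulas above
full_riskset_size <- function(N, C = 1, directed = TRUE) {
  if (directed) N * (N - 1) * C else N * (N - 1) / 2 * C
}

full_riskset_size(20)                    # 380
full_riskset_size(20, C = 3)             # 1140
full_riskset_size(20, directed = FALSE)  # 190
```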

intereventTime

A numeric vector of waiting times between subsequent events: \[\begin{bmatrix} t_1 - t_0 \\ t_2 - t_1 \\ \vdots \\ t_M - t_{M-1} \end{bmatrix}\]

head(edgelist_reh$intereventTime)
## [1] 14.244936  3.166164 40.139102  8.401134 17.434267 10.467975

intereventTime is NULL when ordinal = TRUE.
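The waiting-time vector is the sequence of differences between consecutive time points, starting from t0. The sketch uses the first three processed randomREH times shown in the edgelist output below (where t0 corresponds to 0) and reproduces the first three values of intereventTime printed above, up to rounding. Taking unique() first is an assumption about how simultaneous events are handled, consistent with the vector having length M.

```r
# First three processed event times (t0 = 0)
t0   <- 0
time <- c(14.24494, 17.41110, 57.55020)
interevent <- diff(c(t0, unique(time)))  # t1 - t0, t2 - t1, t3 - t2
round(interevent, 3)                     # 14.245  3.166 40.139
```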


edgelist

The processed input edgelist as a data.frame with columns [time, actor1, actor2] (plus type and/or weight if supplied). Events are re-ordered by time if necessary.

head(edgelist_reh$edgelist)
##       time    actor1  actor2        type
## 1 14.24494     Kayla Kiffani competition
## 2 17.41110    Colton  Justin    conflict
## 3 57.55020    Kelsey    Maya cooperation
## 4 65.95134 Alexander  Colton competition
## 5 83.38560     Wyatt  Kelsey    conflict
## 6 93.85358     Derek Breanna competition

edgelist_id

A per-event integer ID summary (internal use by downstream packages).


meta

A list of metadata describing the processed event history. This replaces the old attr()-based interface. Key fields:

names(edgelist_reh$meta)
##  [1] "model"             "directed"          "ordinal"          
##  [4] "weighted"          "with_type"         "with_type_riskset"
##  [7] "C_riskset"         "riskset"           "riskset_source"   
## [10] "origin"            "ncores"            "dictionary"       
## [13] "event_covariates"

  • meta$model: tie-oriented or actor-oriented modeling
  • meta$directed: whether events are directed
  • meta$ordinal: whether the ordinal likelihood is used
  • meta$riskset: the type of risk set
  • meta$dictionary: a list of two data.frames, actors (columns actorName, actorID) and types (columns typeName, typeID), each sorted alphanumerically
  • meta$origin: the starting time \(t_0\)
  • meta$ncores: number of cores used

edgelist_reh$meta$directed
## [1] TRUE
edgelist_reh$meta$model
## [1] "tie"
edgelist_reh$meta$dictionary
## $actors
##    actorName actorID
## 1  Alexander       1
## 2     Andrey       2
## 3    Breanna       3
## 4    Charles       4
## 5     Colton       5
## 6    Crystal       6
## 7      Derek       7
## 8  Francesca       8
## 9     Justin       9
## 10     Kayla      10
## 11    Kelsey      11
## 12   Kiffani      12
## 13      Lexy      13
## 14      Maya      14
## 15   Mckenna      15
## 16     Megan      16
## 17  Michaela      17
## 18   Richard      18
## 19     Wyatt      19
## 20   Zackary      20
## 
## $types
##      typeName typeID
## 1 competition      1
## 2    conflict      2
## 3 cooperation      3
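The actor dictionary can be reproduced in spirit by sorting the actor names and numbering them 1..N. This is a sketch of the documented layout, not remify's code; it recovers the IDs 10 (Kayla) and 12 (Kiffani) listed above.

```r
# randomREH actor names (as printed earlier in this vignette)
actors <- c("Crystal", "Colton", "Lexy", "Kelsey", "Michaela", "Zackary",
            "Richard", "Maya", "Wyatt", "Kiffani", "Alexander", "Kayla",
            "Derek", "Justin", "Andrey", "Francesca", "Megan", "Mckenna",
            "Charles", "Breanna")
dictionary <- data.frame(actorName = sort(actors), actorID = seq_along(actors))
dictionary$actorID[match(c("Kayla", "Kiffani"), dictionary$actorName)]  # 10 12
```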

ids

A list of per-event integer IDs for actor1, actor2, dyad, and type. These replace the former attr(reh, "actor1ID")-style interface.

# dyad ID of the first event
edgelist_reh$ids$dyad[1]
## [1] 182
# sender ID of the first event
edgelist_reh$ids$actor1[1]
## [1] 10
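The printed IDs are consistent with directed dyads (no self-loops) being numbered by sender first and then by receiver, with N - 1 receivers per sender. This layout is inferred from the output in this vignette (dyad 182 for Kayla → Kiffani, dyad 6 for Alexander → Derek in riskset_info$included), not taken from remify's documentation, and it ignores event types. dyad_id is a hypothetical helper.

```r
# Assumed layout: dyad = (sender - 1) * (N - 1) + receiver position,
# where receivers skip the sender's own ID.
dyad_id <- function(sender, receiver, N) {
  stopifnot(sender != receiver)
  (sender - 1) * (N - 1) + ifelse(receiver < sender, receiver, receiver - 1)
}

dyad_id(sender = 10, receiver = 12, N = 20)  # 182 (Kayla -> Kiffani)
dyad_id(sender = 1,  receiver = 7,  N = 20)  # 6   (Alexander -> Derek)
```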

index

A list of decoded risk set tables. For tie-oriented modeling, contains dyad_map (full risk set) or dyad_map_active (active/manual risk set). For actor-oriented modeling, contains sender_map.


omit_dyad

For tie-oriented modeling, a dynamic risk set modification list (empty list when riskset is not "manual"). For actor-oriented modeling, always an empty list.


riskset_info

When attach_riskset = TRUE (the default), this list contains the effective risk set representation used for estimation. The field riskset_info$included contains the decoded risk set dyad table (format controlled by riskset_decode):

head(edgelist_reh$riskset_info$included)
##      actor1  actor2 dyadID actor1ID actor2ID
## 1 Alexander  Andrey      1        1        2
## 2 Alexander Breanna      2        1        3
## 3 Alexander Charles      3        1        4
## 4 Alexander  Colton      4        1        5
## 5 Alexander Crystal      5        1        6
## 6 Alexander   Derek      6        1        7

Actor-oriented model output

When model = "actor", the following additional elements are returned:

  • sender_riskset: integer vector of actor IDs allowed to send.
  • receiver_riskset: named list of integer vectors of allowed receiver IDs per sender.
  • activeN: number of active senders.
  • index$sender_map: a data.frame with columns senderID and actorName for active senders.

reh_actor <- remify(
  edgelist = randomREH$edgelist,
  directed = TRUE,
  ordinal  = FALSE,
  model    = "actor",
  actors   = randomREH$actors,
  riskset  = "full",
  origin   = randomREH$origin
)
reh_actor$activeN
## [1] 20
head(reh_actor$index$sender_map)
##   senderID actorName
## 1        1 Alexander
## 2        2    Andrey
## 3        3   Breanna
## 4        4   Charles
## 5        5    Colton
## 6        6   Crystal
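For a full actor-oriented risk set, sender_riskset and receiver_riskset can be pictured as below, assuming every actor may send to every other actor and that the list is named by sender ID. The exact naming and representation used by remify are assumptions here; a small N keeps the printout short.

```r
# Sketch of a full actor-oriented risk set for N = 4 actors
N <- 4
sender_riskset   <- seq_len(N)
receiver_riskset <- setNames(
  lapply(sender_riskset, function(s) setdiff(seq_len(N), s)),  # no self-loops
  sender_riskset
)
receiver_riskset[["2"]]  # 1 3 4
length(sender_riskset)   # number of active senders: 4
```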

Methods

The available methods for a remify object are: print, summary, dim, and plot.


dim()

Returns key dimensions of the network: number of events, number of actors, number of event types (if more than one), number of possible dyads (\(D\)), and number of active dyads (activeD, shown only if riskset = "active" or "manual").

dim(edgelist_reh)
## events actors  types  dyads 
##   9915     20      3    380

plot()

plot() returns a set of descriptive plots (selected via the which argument):

  1. Histogram of the inter-event times.
  2. Activity tile plot: dyad frequency heatmap with in-degree and out-degree (or total-degree for undirected networks) bar plots on the margins.
  3. Normalized in-/out-/total-degree of actors over n_intervals evenly spaced time intervals (values in \([0,1]\); opacity and size of points are proportional to the normalized measure).
  4. Per time interval: number of events, proportion of observed dyads, proportion of active senders and active receivers (directed networks only).
  5. Network visualization via igraph: undirected and directed edge graphs (edges’ opacity is proportional to event counts; vertices’ opacity is proportional to degree).

op <- par(no.readonly = TRUE)
par(mai=rep(0.8,4), cex.main=0.9, cex.axis=0.75)
plot(x = edgelist_reh, which = 1, n_intervals = 13)


plot(x = edgelist_reh, which = 2, n_intervals = 13)


plot(x = edgelist_reh, which = 3, n_intervals = 13)


plot(x = edgelist_reh, which = 4, n_intervals = 13)


plot(x = edgelist_reh, which = 5, n_intervals = 13,
     igraph.edge.color = "#cfcece", igraph.vertex.color = "#7bbfef")


par(op)
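To give a feel for the per-interval quantities in plot 4, the sketch below bins simulated event times into n_intervals equal-width intervals and computes the number of events and the proportion of active senders in each. The data are simulated, and remify's plot() may bin and normalize differently.

```r
# Simulated toy data (hypothetical): 40 events among N = 5 actors
set.seed(1)
N      <- 5
time   <- sort(runif(40, 0, 100))
sender <- sample(seq_len(N), 40, replace = TRUE)

n_intervals <- 4
interval <- cut(time, breaks = n_intervals)  # equal-width time bins
events_per_interval <- as.vector(table(interval))
active_senders <- tapply(sender, interval, function(s) length(unique(s))) / N
events_per_interval
active_senders
```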

The plots vary for undirected networks:

edgelist_reh_undir <- remify(
  edgelist = randomREH$edgelist,
  directed = FALSE,
  model    = "tie"
)