This notebook was developed to accompany the tutorial of a short course offered at the 2017 Annual Meeting of the American Political Science Association. The instructors for the course are Karsten Donnay (University of Zurich), Eric Dunford (Georgetown University), Andrew Linke (University of Utah), Erin McGrath (University of Maryland), David Backer (University of Maryland), and David Cunningham (University of Maryland). The short course focuses on newly developed software tools, designed by the instructors, that enable more effective work with multiple datasets that have geospatial properties. Such datasets are increasingly employed in research throughout the social sciences. The aims of the course are to familiarize participants with the use of these tools and associated best practices. At the end of the course, participants should understand why and how they could use these tools to support research that requires integrating datasets with particular geospatial properties.
The first part of the notebook walks through the functionality, applications and best practices of the geomerge
package, which was just released. This package has been designed primarily to facilitate addressing challenges related to the integration of datasets with different geospatial properties. The package is illustrated using example data for Nigeria 2011. The illustration covers integration of Polygon
, Raster
and Point
data, including how to generate spatial panel data.
The second part of the notebook walks through the functionality, applications and best practices of the meltt
package, which was released earlier this year. This package has been designed to facilitate the integration of event data from multiple sources with differing properties. The package is illustrated by drawing on conflict event data from four prominent event datasets covering conflict observed in Nigeria during 2011.
The tutorial is designed to be hands-on, with participants working through the illustrative examples, accessing and processing datasets using the commands available in the geomerge
and meltt
packages. Doing so requires, at a minimum, an installation of the R programming software. Some knowledge of R is useful, though not mandatory. During this short course and tutorial, participants should learn about the utility, logic, and functionality of the two packages even without any significant expertise in R.
Before we get started, please set your working directory to the directory into which you unpacked the tutorial files (including the “data” directory).
setwd("YOUR DIRECTORY")
The use case: In various social science settings, empirical research relies on event data, which seek to capture information on individual occurrences of phenomena, in a manner that is spatially and temporally disaggregated. Common examples of events for which data are available include incidents of armed conflict, which were discussed earlier (e.g., ACLED
), as well as neighborhood crime, terrorist attacks, car accidents, and marathon running times. Event data provide a granular picture of the distributions of locations and timings of a specific phenomenon.
In an increasing number of instances, more than one available event dataset captures the same or related topics. For example, multiple datasets capture similar information on battles between organized armed actors engaged in civil wars. Different event datasets may have information that is valuable, which is not offered by one dataset alone. Integrating event datasets could be useful to bolster spatial and temporal coverage, to encompass a broader spectrum of a phenomenon (i.e., more types), to collate existing information about events (i.e., compile more characteristics and/or more details), and/or to cross-validate the coding of these datasets (i.e., check whether different datasets yield the same measures of a given phenomenon).
The main challenges: Integrating multiple event datasets requires comparison of the entries in those datasets. Comparison is essential to avoid duplicate entries of the same event. Mere pooling of datasets could yield such duplicates, if different datasets happen to record some or all of the same events. Comparison is also necessary to establish matches. Knowing what entries match across multiple event datasets is useful when collating information about events, or when engaging in cross-validation.
Comparing event data, with the goal of establishing which entries do and do not match across multiple datasets, is notoriously difficult for the following reasons:
Spatial and Temporal Fuzziness. Information about events can differ depending on the original sources from which event data are derived. For example, news media sources can vary in their reporting of the location and/or timing of an event—especially if precise on-the-ground information is hard to come by. This variation can result in both spatial and temporal fuzziness, where the same event is “measured” with distinct locations and days across different datasets, due to their reliance on different original sources. The fuzziness can be large or small—and is not always consistent even within the same event dataset, again depending on the original sources.
Jittering Locations. Different geo-referencing software can produce slightly different longitude and latitude coordinates for the same named place. Those differences result in an artificial geo-spatial “jitter” around the same location, depending on which gazetteer is used in the software.
Conceptual Differences. Different event datasets are designed for different reasons. Each dataset will likely reflect a distinctive coding scheme—even for the same specific category of events, let alone for the same general category of events. For example, a dataset recording local muggings and burglaries might have a schema that records these types of events categorically (i.e. “mugging”, “break in”, etc.), whereas another crime dataset might record violent crimes and do so ordinally (1, 2, 3, etc.). Both datasets might be capturing the same event (e.g., a violent mugging), but each has a distinct way of coding that event.
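To make the spatial side of these challenges concrete, the following minimal sketch computes how far apart two sets of coordinates for the same named place are. The coordinates are hypothetical values chosen for illustration, and `haversine_km` is a helper defined here, not a function from meltt.

```r
# Great-circle (haversine) distance in kilometers between lon/lat points.
# Assumes coordinates in decimal degrees and a spherical Earth.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical coordinates for one named place from two gazetteers.
place_a <- c(lon = 7.4898, lat = 9.0579)
place_b <- c(lon = 7.4951, lat = 9.0765)

jitter_km <- unname(haversine_km(place_a["lon"], place_a["lat"],
                                 place_b["lon"], place_b["lat"]))
print(jitter_km)
```

With these made-up values the two points lie roughly two kilometers apart, which is the kind of offset a matching procedure has to tolerate.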
In the past, researchers seeking to overcome these hurdles have typically relied on hand-coding processes to match data. This approach is extremely time consuming and costly, especially to do systematically and on a large scale. At the maximum, each entry in one dataset may need to be compared carefully to each entry in every other dataset of interest. This sort of process requires a lot of meticulous work to yield high-quality, reliable results. In practice, the results can be prone to mistakes, because of differences in what coders see as well as inattentiveness, sloppiness, and other forms of human error. The results of hand-coding are typically hard to reproduce and replicate exactly. For one, doing so requires performing the same comparisons all over again. If done by hand once more, this is time-consuming and costly. Even if automated, the correspondence is unlikely to be perfect. Human coding simply does not ensure 100% consistent output. The performance can be excellent, with clear rules of coding that are strictly observed, though rarely at the level of an automated algorithm.
meltt
provides a tool for integrating multiple event datasets in an automated, fast, inexpensive, flexible, transparent, and reproducible fashion. More information about the specifics of the method can be found in the package documentation, which will soon be accompanied by an article that describes and applies the methodology.
meltt
provides a means of integrating multiple event datasets on the same or similar topics. The output of the package can expand the spatial and temporal scope of coverage, extending analyses in ways that may improve both internal and external validity. The package can be used to integrate data on different types of events while guarding against duplicate records. The output can therefore be more valuable and reliable for studying relationships among types of events. Further, users can rely on the package to collate information on events as recorded in multiple datasets, to enrich the available details. A final benefit is cross-validation: checking how different datasets measure the same phenomenon.
The package can also be installed through the CRAN repository.
# install.packages("meltt")
Again, we recommend that users install the latest development version of the package from Github for the purposes of this tutorial.
devtools::install_github("kdonnay/meltt")
Important: Currently, the package requires that users have both Python (>= 2.7) and a version of the numpy
module installed on their computers. To quickly obtain both, install an Anaconda platform. meltt
will use these programs in the background.
library(meltt)
As an illustration, we use several well-established sources of conflict event data, including ACLED, the Uppsala Conflict Data Program’s Georeferenced Event Dataset (UCDP-GED), the Social Conflict Analysis Database (SCAD), and the Global Terrorism Database (GTD). Each of these datasets records information about the spatio-temporal occurrence of conflict activity within a country. You downloaded Nigeria_2011.Rdata
together with this tutorial file. To create Nigeria_2011.Rdata
, we subset entries from the UCDP-GED, ACLED, SCAD, and GTD datasets for Nigeria 2011.
# Load Data
load("data/Nigeria_2011.Rdata")
library(raster)
## Loading required package: sp
# Quick visual overview of ACLED data
plot(states)
plot(SpatialPoints(cbind(acled$LONGITUDE,acled$LATITUDE)),new=TRUE,add=TRUE)
# Quick visual overview of UCDP-GED data
plot(states)
plot(SpatialPoints(cbind(ged$longitude,ged$latitude)),new=TRUE,add=TRUE)
# Quick visual overview of GTD data
plot(states)
plot(SpatialPoints(cbind(gtd$longitude[!is.na(gtd$longitude)],gtd$latitude[!is.na(gtd$longitude)])),new=TRUE,add=TRUE)
# Quick visual overview of SCAD data
plot(states)
plot(SpatialPoints(cbind(scad$longitude,scad$latitude)),new=TRUE,add=TRUE)
Each dataset contains information on the following:
date: when the event occurred;
enddate: if the event occurred across more than one day (i.e., an “episode”);
longitude & latitude: geo-location information;
event type: the kind of activity for that entry;
actor: who initiated the activity.
We will rely on this information to place entries into “bins” for purposes of appropriately and efficiently comparing entries across datasets, ultimately allowing the identification of potential matching entries (i.e., entries that appear to concern the same event). To reiterate, matching can be useful for several reasons. Perhaps most important is to ensure that integration does not lead to duplicate entries within the integrated data. The user may also be interested to collate information on events as recorded in different datasets, or to cross-validate the measurement of events based on the information available in different datasets.
meltt
formalizes all input assumptions the user needs to make in order to compare event datasets and identify entries that may match (i.e., concern the same event). First, the user must specify a spatial and temporal window within which any potential match could plausibly fall. That is, how close in space and time do entries need to be to qualify as potentially recording the same event?
Second, to articulate how different coding schemas overlap, the user needs to input an event taxonomy. A taxonomy is a formalization of how variables overlap, moving from as granular as possible to as general as possible. In this case, we are going to explore two taxonomies to help integrate the data: an event taxonomy that generalizes across event types, and an actor taxonomy that generalizes across the various actors located in the data.
A taxonomy can only be built on a variable that exists in every dataset being integrated. For example, there must be some form of event type variable in each dataset in order to compare events by type. A dataset that lacks the comparable variable cannot be compared to the other datasets.
For the datasets of interest, we see that each contains information on an event’s type, but that information differs significantly across each dataset, given that each was created for different purposes and that each seeks to capture different types of activities (some of which overlap across data, and some that do not).
For example, observe how the information regarding event type is presented differently across the four datasets.
cat("\n GED \n",
unique(ged$type_of_violence),
"\n\n ACLED \n",
unique(acled$EVENT_TYPE),
"\n\n GTD \n",
unique(gtd$attacktype1),
"\n\n SCAD \n",
unique(scad$etype)
)
##
## GED
## 3 2 1
##
## ACLED
## Strategic development Battle-No change of territory Violence against civilians Remote violence Riots/Protests
##
## GTD
## 7 2 3 1 6
##
## SCAD
## 7 4 8 9 3 1 2
The corresponding variable from each dataset records information on the type of event a little differently. The idea of introducing a taxonomy is then, as mentioned before, to generalize across each category by clarifying how each coding scheme maps onto the other.
From the data folder, let’s load an event and actor taxonomy that we already put together for the Nigeria 2011 data.
load("data/taxonomies.Rdata")
A taxonomy allows a researcher to make explicit all assumptions regarding how variables map onto each other. Zooming in on the event taxonomy for the Nigeria 2011 data, we can see that the bins become more general as we move up the taxonomy levels. That is, we attempt to be as granular as possible when locating the overlap on the first level, and then we become more general, ending in just two categories (violent or nonviolent).
# View(event_tax)
event_tax
## data.source base.categories
## 1 acled Non-violent transfer of territory
## 2 acled Headquarters or base established
## 3 acled Protests
## 4 acled Non-violent activity by a conflict actor
## 5 acled Riots
## 6 acled Battle-Non-state actor overtakes territory
## 7 acled Battle-Government regains territory
## 8 acled Battle-No change of territory
## 9 acled Remote violence
## 10 acled Violence against civilians
## 11 acled Strategic development
## 12 ged 1
## 13 ged 2
## 14 ged 3
## 15 gtd 4
## 16 gtd 5
## 17 gtd 6
## 18 gtd 3
## 19 gtd 1
## 20 gtd 2
## 21 gtd 8
## 22 gtd 9
## 23 gtd 7
## 24 scad 1
## 25 scad 2
## 26 scad 3
## 27 scad 4
## 28 scad 5
## 29 scad 6
## 30 scad 7
## 31 scad 8
## 32 scad 9
## 33 scad 10
## 34 scad -9
## Level_1_text Level_2_text
## 1 Territorial Dispute Nonviolent Possession
## 2 Territorial Dispute Nonviolent Possession
## 3 Protest/Demonstration Nonviolent Displays
## 4 Protest/Demonstration Nonviolent Displays
## 5 Violent Protest/Demonstration Violent Displays
## 6 Territorial Dispute Violent Possession
## 7 Territorial Dispute Violent Possession
## 8 Territorial Dispute Violent Attack
## 9 Strategic Destruction Violent Attack (Bombing)
## 10 Atrocity Violent Attack (Against Civilians)
## 11 Protest/Demonstration Nonviolent Displays
## 12 Opposition-led Violence Violent Attack
## 13 Opposition-led Violence Violent Attack (No State)
## 14 Atrocity Violent Attack (Against Civilians)
## 15 Coercion Violent Possession
## 16 Coercion Violent Possession
## 17 Coercion Violent Possession
## 18 Strategic Destruction Violent Attack (Bombing)
## 19 Strategic Assault Violent Attack
## 20 Strategic Assault Violent Attack
## 21 Strategic Assault Violent Attack
## 22 Strategic Assault Violent Attack
## 23 Strategic Destruction Violent Attack
## 24 Protest/Demonstration Nonviolent Displays
## 25 Protest/Demonstration Nonviolent Displays
## 26 Protest/Demonstration Nonviolent Displays
## 27 Protest/Demonstration Nonviolent Displays
## 28 Violent Protest/Demonstration Violent Displays
## 29 Violent Protest/Demonstration Violent Displays
## 30 State-led Violence Violent Attack
## 31 Opposition-led Violence Violent Attack
## 32 Within-Regime Violence Violent Attack
## 33 Opposition-led Violence Violent Attack (No State)
## 34 State-led Violence Violent Attack
## Level_3_text Level_4_text
## 1 Nonviolent Action Nonviolent Event
## 2 Nonviolent Action Nonviolent Event
## 3 Nonviolent Action Nonviolent Event
## 4 Nonviolent Action Nonviolent Event
## 5 Violent Action Violent Event
## 6 Violent Attack Violent Event
## 7 Violent Attack Violent Event
## 8 Violent Attack Violent Event
## 9 Violent Attack Violent Event
## 10 Violent Attack Violent Event
## 11 Nonviolent Action Nonviolent Event
## 12 Violent Attack Violent Event
## 13 Violent Attack Violent Event
## 14 Violent Attack Violent Event
## 15 Violent Action Violent Event
## 16 Violent Action Violent Event
## 17 Violent Action Violent Event
## 18 Violent Attack Violent Event
## 19 Violent Attack Violent Event
## 20 Violent Attack Violent Event
## 21 Violent Attack Violent Event
## 22 Violent Attack Violent Event
## 23 Violent Attack Violent Event
## 24 Nonviolent Action Nonviolent Event
## 25 Nonviolent Action Nonviolent Event
## 26 Nonviolent Action Nonviolent Event
## 27 Nonviolent Action Nonviolent Event
## 28 Violent Action Violent Event
## 29 Violent Action Violent Event
## 30 Violent Attack Violent Event
## 31 Violent Attack Violent Event
## 32 Violent Attack Violent Event
## 33 Violent Attack Violent Event
## 34 Violent Attack Violent Event
We formalized the actor taxonomy in a similar manner.
actor_tax
## data.source base.categories
## 1 scad Soldiers
## 2 scad Protesters
## 3 scad Police
## 4 scad Gunmen
## 5 scad Islamists
## 6 scad Women
## 7 scad Christians
## 8 scad Youths
## 9 scad Unknown
## 10 scad Pirates
## 11 scad Civilians
## 12 scad Muslims
## 13 scad Fulani
## 14 scad Unknown gunmen
## 15 scad Boko Haram
## 16 scad Unknown attackers
## 17 scad Unknown bombers
## 18 scad Muslim youths
## 19 scad Nigeria Labour Congress
## 20 scad Militants
## 21 scad Muslim sect
## 22 scad Chistian youths
## 23 scad Cattle rustlers
## 24 scad Muslim gangs
## 25 scad Fulani Muslims
## 26 scad Unknown assailants
## 27 scad Supporters of Muhammadu Buhari
## 28 scad Muslim attackers
## 29 scad Copy cat killers
## 30 scad Suspected militant
## 31 scad Suspected militants
## 32 scad Gang of robbers
## 33 scad Umar Quality
## 34 scad National Union of Electrity Workers
## 35 scad Muslim Fulani tribesmen
## 36 scad Oodua People's Congress
## 37 scad Ezza community
## 38 scad Boko Haram
## 39 scad Boko Haram
## 40 scad Boko Haram
## 41 scad Boko Haram
## 42 scad Boko Haram
## 43 scad Boko Haram
## 44 scad Boko Haram
## 45 scad Boko Haram
## 46 acled Military Forces of Nigeria (1999-2015)
## 47 acled Boko Haram
## 48 acled Boko Haram
## 49 acled Unidentified Armed Group (Nigeria)
## 50 acled Rioters (Nigeria)
## 51 acled Protesters (Nigeria)
## 52 acled PDP: Peoples Democratic Party
## 53 acled Military Forces of Nigeria (1999-2015) Joint Task Force
## 54 acled Police Forces of Nigeria (1999-2015)
## 55 acled Muslim Militia (Nigeria)
## 56 acled Fulani Ethnic Militia (Nigeria)
## 57 acled Christian Militia (Nigeria)
## 58 acled Ezza Ethnic Militia (Nigeria)
## 59 acled Muslim Youth Sect (Nigeria)
## 60 acled Christian Youth Sect (Nigeria)
## 61 acled DDM: Delta Democratic Militia
## 62 acled National Youth Council of Nigeria
## 63 acled Boko Haram
## 64 ged Supporters of ACN
## 65 ged Christians (Nigeria)
## 66 ged Supporters of ANPP
## 67 ged Government of Nigeria
## 68 ged Hausa
## 69 ged Black Axe
## 70 ged Government of Nigeria
## 71 ged Deebam
## 72 ged Fulani
## 73 ged Greenlanders
## 74 ged Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 75 ged Birom
## 76 ged Ezilo
## 77 ged Government of Cameroon
## 78 ged Government of Cameroon
## 79 ged Boko Haram
## 80 ged NURTW-Auxiliary
## 81 gtd Unknown
## 82 gtd Pirates
## 83 gtd Muslims
## 84 gtd Protesters
## 85 gtd Gunmen
## 86 gtd Youths
## 87 gtd Militants
## 88 gtd Boko Haram
## 89 gtd Delta Democratic Militia
## 90 gtd Ansaru (Jama'atu Ansarul Muslimina Fi Biladis Sudan)
## 91 acled Bajju Ethnic Militia (Nigeria)
## Level_1 Level_2 Level_3
## 1 government violent groups violent groups
## 2 movement groups political groups nonviolent groups
## 3 government violent groups violent groups
## 4 violent groups violent groups violent groups
## 5 religious groups civilian groups nonviolent groups
## 6 civilians civilian groups nonviolent groups
## 7 ethnic groups civilian groups nonviolent groups
## 8 civilians civilian groups nonviolent groups
## 9 unknown unknown violent groups
## 10 violent groups violent groups violent groups
## 11 civilians civilian groups nonviolent groups
## 12 ethnic groups civilian groups nonviolent groups
## 13 ethnic groups civilian groups nonviolent groups
## 14 violent groups violent groups violent groups
## 15 torg violent groups violent groups
## 16 violent groups violent groups violent groups
## 17 violent groups violent groups violent groups
## 18 civilians civilian groups nonviolent groups
## 19 civilian groups civilian groups nonviolent groups
## 20 violent groups violent groups violent groups
## 21 ethnic groups civilian groups nonviolent groups
## 22 civilians civilian groups nonviolent groups
## 23 civilian groups civilian groups nonviolent groups
## 24 violent groups violent groups violent groups
## 25 ethnic groups civilian groups nonviolent groups
## 26 violent groups violent groups violent groups
## 27 movement groups political groups nonviolent groups
## 28 violent groups violent groups violent groups
## 29 violent groups violent groups violent groups
## 30 violent groups violent groups violent groups
## 31 violent groups violent groups violent groups
## 32 violent groups violent groups violent groups
## 33 unknown unknown violent groups
## 34 nonviolent organizations civilian groups nonviolent groups
## 35 civilians civilian groups nonviolent groups
## 36 civilians civilian groups nonviolent groups
## 37 civilians civilian groups nonviolent groups
## 38 violent groups violent groups violent groups
## 39 violent groups violent groups violent groups
## 40 violent groups violent groups violent groups
## 41 violent groups violent groups violent groups
## 42 torg violent groups violent groups
## 43 violent groups violent groups violent groups
## 44 violent groups violent groups violent groups
## 45 violent groups violent groups violent groups
## 46 violent groups violent groups violent groups
## 47 violent groups violent groups violent groups
## 48 violent groups violent groups violent groups
## 49 violent groups violent groups violent groups
## 50 violent groups violent groups violent groups
## 51 movement groups political groups nonviolent groups
## 52 civilians civilian groups nonviolent groups
## 53 violent groups violent groups violent groups
## 54 government violent groups violent groups
## 55 violent groups violent groups violent groups
## 56 violent groups violent groups violent groups
## 57 violent groups violent groups violent groups
## 58 violent groups violent groups violent groups
## 59 civilians civilian groups nonviolent groups
## 60 civilians civilian groups nonviolent groups
## 61 violent groups violent groups violent groups
## 62 government violent groups violent groups
## 63 violent groups violent groups violent groups
## 64 violent groups violent groups violent groups
## 65 violent groups violent groups violent groups
## 66 violent groups violent groups violent groups
## 67 government violent groups violent groups
## 68 violent groups violent groups violent groups
## 69 violent groups violent groups violent groups
## 70 violent groups violent groups violent groups
## 71 violent groups violent groups violent groups
## 72 violent groups violent groups violent groups
## 73 violent groups violent groups violent groups
## 74 violent groups violent groups violent groups
## 75 violent groups violent groups violent groups
## 76 violent groups violent groups violent groups
## 77 government violent groups violent groups
## 78 violent groups violent groups violent groups
## 79 violent groups violent groups violent groups
## 80 torg violent groups violent groups
## 81 violent groups violent groups violent groups
## 82 violent groups violent groups violent groups
## 83 violent groups violent groups violent groups
## 84 violent groups violent groups violent groups
## 85 violent groups violent groups violent groups
## 86 violent groups violent groups violent groups
## 87 violent groups violent groups violent groups
## 88 torg violent groups violent groups
## 89 torg violent groups violent groups
## 90 torg violent groups violent groups
## 91 violent groups violent groups violent groups
Generally, specifications of taxonomy levels can be as granular or as broad as one chooses. The more fine-grained the levels one includes to describe the overlap, the more specific the match. At the same time, if categories are too narrow, it is difficult to conceptualize potential matches across datasets. As a rule, there is a tradeoff between specific categories that can better differentiate among possible duplicate entries and unspecific categories that more easily recognize potentially matching information across datasets.
As a general rule, we therefore recommend including, whenever it is conceptually warranted, both specific fine-grained categories and a few increasingly broader ones. In this case, meltt will have more information to work with when differentiating between sets of potential matches. In establishing which entries are most likely to correspond, meltt always favors the candidate that corresponds more precisely whenever a dataset contains multiple potential matches. A good taxonomy is the key to matching data, and is the primary vehicle by which a user’s assumptions regarding how the data fit together are made transparent.
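The tradeoff between fine-grained and broad levels can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not meltt's actual matching score: the taxonomy `toy_tax`, its categories, and the helper `finest_agreement` are all hypothetical. The helper reports the most granular level at which two base categories fall into the same bin.

```r
# Toy taxonomy with made-up categories; not the tutorial's event_tax.
toy_tax <- data.frame(
  data.source     = c("a", "a", "b", "b"),
  base.categories = c("Riots", "Protests", "1", "2"),
  Level_1 = c("Violent Protest", "Protest", "Violent Protest", "Armed Attack"),
  Level_2 = c("Violent Event", "Nonviolent Event", "Violent Event",
              "Violent Event"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name of the finest level on which two categories agree,
# or NA if they never fall into the same bin.
finest_agreement <- function(tax, source_x, cat_x, source_y, cat_y) {
  level_cols <- setdiff(names(tax), c("data.source", "base.categories"))
  row_x <- tax[tax$data.source == source_x & tax$base.categories == cat_x,
               level_cols]
  row_y <- tax[tax$data.source == source_y & tax$base.categories == cat_y,
               level_cols]
  agree <- unlist(row_x) == unlist(row_y)
  if (any(agree)) level_cols[which(agree)[1]] else NA_character_
}

finest_agreement(toy_tax, "a", "Riots", "b", "1")     # agree on Level_1
finest_agreement(toy_tax, "a", "Riots", "b", "2")     # agree only on Level_2
finest_agreement(toy_tax, "a", "Protests", "b", "2")  # no agreement
```

A pair that agrees on the first level is a more specific correspondence than a pair that agrees only on the broadest level, which is why including both kinds of levels gives a matching procedure more to work with.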
There are a few technical details regarding how the data must be organized before they are passed as arguments to meltt.
Each taxonomy data.frame is read into the taxonomies argument of meltt as part of a single list object.
taxonomies = list("event_tax" = event_tax,
"actor_tax" = actor_tax)
str(taxonomies)
## List of 2
## $ event_tax:'data.frame': 34 obs. of 6 variables:
## ..$ data.source : chr [1:34] "acled" "acled" "acled" "acled" ...
## ..$ base.categories: chr [1:34] "Non-violent transfer of territory" "Headquarters or base established" "Protests" "Non-violent activity by a conflict actor" ...
## ..$ Level_1_text : chr [1:34] "Territorial Dispute" "Territorial Dispute" "Protest/Demonstration" "Protest/Demonstration" ...
## ..$ Level_2_text : chr [1:34] "Nonviolent Possession" "Nonviolent Possession" "Nonviolent Displays" "Nonviolent Displays" ...
## ..$ Level_3_text : chr [1:34] "Nonviolent Action" "Nonviolent Action" "Nonviolent Action" "Nonviolent Action" ...
## ..$ Level_4_text : chr [1:34] "Nonviolent Event" "Nonviolent Event" "Nonviolent Event" "Nonviolent Event" ...
## $ actor_tax:'data.frame': 91 obs. of 5 variables:
## ..$ data.source : chr [1:91] "scad" "scad" "scad" "scad" ...
## ..$ base.categories: chr [1:91] "Soldiers" "Protesters" "Police" "Gunmen" ...
## ..$ Level_1 : chr [1:91] "government" "movement groups" "government" "violent groups" ...
## ..$ Level_2 : chr [1:91] "violent groups" "political groups" "violent groups" "violent groups" ...
## ..$ Level_3 : chr [1:91] "violent groups" "nonviolent groups" "violent groups" "violent groups" ...
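meltt relies on simple naming conventions to identify which variable is what when matching: the name of each taxonomy in the list must match the name of the corresponding variable in the input data.
names(taxonomies)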
## [1] "event_tax" "actor_tax"
In this case, let’s rename the variables in the data to correspond with the naming convention of the taxonomies that we designated.
# for events
names(ged)[names(ged)=='type_of_violence'] = 'event_tax'
names(acled)[names(acled)=='EVENT_TYPE'] = 'event_tax'
names(scad)[names(scad)=='etype'] = 'event_tax'
names(gtd)[names(gtd)=='attacktype1'] = 'event_tax'
# for actors
names(ged)[names(ged)=='side_a'] = 'actor_tax' # given UCDP dyad conventions
names(acled)[names(acled)=='ACTOR1'] = 'actor_tax'
names(scad)[names(scad)=='actor1'] = 'actor_tax'
names(gtd)[names(gtd)=='gname'] = 'actor_tax'
Each taxonomy must contain a data.source and a base.categories column: this last convention helps meltt identify which variable is contained in which data object. The data.source column should reflect the names of the data objects for the input data, and the base.categories column should reflect the original coding of the variable on which the taxonomy is built.
head( event_tax[, c( "data.source","base.categories" ) ] )
## data.source base.categories
## 1 acled Non-violent transfer of territory
## 2 acled Headquarters or base established
## 3 acled Protests
## 4 acled Non-violent activity by a conflict actor
## 5 acled Riots
## 6 acled Battle-Non-state actor overtakes territory
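Because base.categories has to reproduce the original coding exactly, it can help to check that every category observed in a dataset is covered by the taxonomy. The sketch below uses toy objects and a hypothetical helper, `missing_categories`, which is not part of meltt.

```r
# Hypothetical helper: categories present in the data but absent from the
# taxonomy rows that belong to the given data source.
missing_categories <- function(data, tax, source_name, variable) {
  known <- tax$base.categories[tax$data.source == source_name]
  setdiff(unique(as.character(data[[variable]])), as.character(known))
}

# Toy taxonomy and toy data (made up for illustration).
toy_tax <- data.frame(
  data.source     = c("acled", "acled", "ged"),
  base.categories = c("Protests", "Riots", "1"),
  Level_1 = c("Protest/Demonstration", "Violent Protest/Demonstration",
              "Opposition-led Violence"),
  stringsAsFactors = FALSE
)
toy_acled <- data.frame(
  event_tax = c("Protests", "Riots", "Riots/Protests"),
  stringsAsFactors = FALSE
)

uncovered <- missing_categories(toy_acled, toy_tax, "acled", "event_tax")
print(uncovered)
```

Any value returned here would need either a new row in the taxonomy or a recoding of the data before integration.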
Each input dataset must contain a date, enddate (if one exists), longitude, and latitude column: the variables must be named accordingly (no deviations in naming conventions are allowed). The dates should be in an R date format (as.Date()), and the geo-reference information must be numeric (as.numeric()).
As you might have already realized from looking at the data, they are not perfectly organized in this way, so we will need to do a little cleaning prior to processing.
# Cleaning UCDP-GED
ged$date_start = as.Date(ged$date_start)
names(ged)[names(ged)=='date_start'] = 'date'
ged$date_end = as.Date(ged$date_end)
names(ged)[names(ged)=='date_end'] = 'enddate'
# Cleaning ACLED
colnames(acled) = tolower(colnames(acled))
acled$event_date = as.Date(acled$event_date)
names(acled)[names(acled)=='event_date'] = 'date'
# Cleaning GTD
gtd$date = as.Date(paste0(gtd$iyear,"-",gtd$imonth,"-",gtd$iday))
# Time and location information must be complete; entries missing this
# information cannot be processed. Here GTD is missing lat/lon information
# for one entry; thus, we drop it.
gtd = gtd[!is.na(gtd$latitude),]
# Cleaning SCAD
scad$startdate = as.Date(scad$startdate)
names(scad)[names(scad)=='startdate'] = 'date'
scad$enddate = as.Date(scad$enddate)
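The requirements on column names and classes can be verified with a small check before integration. This is a minimal sketch under the conventions stated above; `check_input` is a hypothetical helper written for this illustration, not a meltt function, and the two data frames are toy data.

```r
# Hypothetical helper: returns a character vector describing problems found
# (empty if the data frame satisfies the conventions).
check_input <- function(data) {
  problems <- character(0)
  absent <- setdiff(c("date", "longitude", "latitude"), names(data))
  if (length(absent) > 0) {
    problems <- c(problems, paste("missing column:", absent))
  }
  for (v in intersect(c("date", "enddate"), names(data))) {
    if (!inherits(data[[v]], "Date")) {
      problems <- c(problems, paste(v, "is not of class Date"))
    }
  }
  for (v in intersect(c("longitude", "latitude"), names(data))) {
    if (!is.numeric(data[[v]])) {
      problems <- c(problems, paste(v, "is not numeric"))
    } else if (anyNA(data[[v]])) {
      problems <- c(problems, paste(v, "has missing values"))
    }
  }
  problems
}

good <- data.frame(date = as.Date("2011-01-10"),
                   longitude = 7.49, latitude = 9.06)
bad <- data.frame(date = "2011-01-10", longitude = 7.49, latitude = NA_real_,
                  stringsAsFactors = FALSE)

check_input(good)
check_input(bad)
```

Running such a check on each cleaned dataset makes it easier to catch a forgotten date conversion or an entry with missing coordinates.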
Lastly, the ACLED data code protests and riots into a single category. We opt to disaggregate this level further by breaking the event type up into “Protests” and “Riots” categories, using the number of reported fatalities as the delimiter.
acled$event_tax[acled$event_tax == "Riots/Protests" &
(acled$fatalities==0 |
is.na(acled$fatalities))] = "Protests"
acled$event_tax[acled$event_tax == "Riots/Protests" & acled$fatalities>0] = "Riots"
Once the taxonomy is formalized, matching several datasets is straightforward. The meltt() function takes four main arguments:
...: input data;
taxonomies: list object containing the user-input taxonomies;
spatwindow: the spatial window (in kilometers);
twindow: the temporal window (in days).
Below we assume that any two events among the four different datasets occurring within 3 kilometers and 1 day of each other could plausibly be the same event. This “fuzziness” sets the boundaries on how precisely we believe the spatial location and timing of events are coded. It is usually best practice to vary these specifications systematically to ensure that no one specific combination drives the outcomes of the integration task.
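The role of the two windows, and the advice to vary them, can be sketched on toy data. The example below is only an illustration of the windowing logic, not meltt's implementation: the data frames `a` and `b`, the helper `haversine_km`, and the helper `count_candidates` are all assumptions made for this sketch. It counts how many cross-dataset pairs fall inside each combination of windows.

```r
# Great-circle distance in kilometers (decimal degrees, spherical Earth).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) *
    sin((lon2 - lon1) * to_rad / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy entries from two hypothetical datasets (not the tutorial data).
a <- data.frame(date = as.Date(c("2011-01-10", "2011-03-05")),
                longitude = c(7.490, 3.380), latitude = c(9.058, 6.520))
b <- data.frame(date = as.Date(c("2011-01-11", "2011-03-09")),
                longitude = c(7.495, 3.400), latitude = c(9.077, 6.530))

# Count pairs (one entry from x, one from y) inside both windows.
count_candidates <- function(x, y, twindow, spatwindow) {
  pairs <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  days <- abs(as.numeric(difftime(x$date[pairs$i], y$date[pairs$j],
                                  units = "days")))
  km <- haversine_km(x$longitude[pairs$i], x$latitude[pairs$i],
                     y$longitude[pairs$j], y$latitude[pairs$j])
  sum(days <= twindow & km <= spatwindow)
}

# Vary both windows systematically and record the number of candidate pairs.
grid <- expand.grid(twindow = c(0, 1, 2, 5), spatwindow = c(1, 3, 5))
grid$candidates <- mapply(count_candidates,
                          twindow = grid$twindow,
                          spatwindow = grid$spatwindow,
                          MoreArgs = list(x = a, y = b))
print(grid)
```

In this toy setup the number of candidate pairs changes as the windows widen, which is the sensitivity one would want to inspect before settling on a specification.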
We then assume that event categories map onto each other according to the way that we formalized in the taxonomies outlined above. We fold all this information together using the meltt()
function and then store the results in an object named output
.
output <- meltt(acled,ged,scad,gtd,
taxonomies = taxonomies,
twindow = 1,spatwindow = 3)
## meltt: Matching Event Data by Location, Time and Type.
## Karsten Donnay and Eric Dunford, 2018
##
## NOTE: Depending on the size and number of datasets integration may take some time!
##
##
## meltt(acled, ged, scad, gtd, taxonomies = taxonomies, twindow = 1,
## spatwindow = 3)
##
## Checking meltt() arguments and inputs: Done.
##
## Please note the following:
##
## One or more of the input datasets contains episodal data but no 'enddate' varible was specified for dataset 'acled'. If an end date variable exists, please relabel as 'enddate'.
##
## One or more of the input datasets contains episodal data but no 'enddate' varible was specified for dataset 'gtd'. If an end date variable exists, please relabel as 'enddate'.
## Preparing data for integration: Done.
## Integrating dataset 1 with dataset 2: Done.
## Integrating merged data and dataset 3: Done.
## Integrating merged data and dataset 4: Done.
## Integration completed!
The message above notes that two of the datasets (ACLED and GTD) do not have an enddate variable. That is, they do not contain episodal data. meltt will create a placeholder for the enddate that mirrors the date.
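To make the spatial and temporal windows concrete, here is a rough base-R sketch of the proximity check they imply. It is not how meltt is implemented internally; the helpers haversine_km() and within_window() are hypothetical names introduced here, and the two events use illustrative coordinates and dates.

```r
# Great-circle (haversine) distance in kilometers between two lon/lat points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: TRUE when two events fall inside both windows.
within_window <- function(e1, e2, spatwindow, twindow) {
  dist_km  <- haversine_km(e1$longitude, e1$latitude,
                           e2$longitude, e2$latitude)
  gap_days <- abs(as.numeric(difftime(as.Date(e1$date), as.Date(e2$date),
                                      units = "days")))
  dist_km <= spatwindow && gap_days <= twindow
}

# Two illustrative reports located close together but two days apart.
event_a <- list(date = "2011-01-01", latitude = 11.83333, longitude = 13.1500)
event_b <- list(date = "2011-01-03", latitude = 11.84640, longitude = 13.1603)

# Vary both windows systematically and record whether the pair qualifies.
grid <- expand.grid(spatwindow = c(1, 3, 5), twindow = c(0, 1, 2))
grid$candidate <- mapply(
  function(sw, tw) within_window(event_a, event_b, sw, tw),
  grid$spatwindow, grid$twindow
)
print(grid)
```

In this sketch the pair only becomes a candidate once the temporal window is widened to two days, which is the kind of sensitivity worth checking before settling on a specification. In meltt, candidates must also agree on the taxonomies before they are treated as matches.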
meltt also contains a range of adjustments that offer the user additional control over how events are matched. These auxiliary arguments are:
smartmatch: when TRUE (default), all available taxonomy levels are used and meltt uses a matching score that ensures that fine-grained agreement is favored over broader agreement, if more than one taxonomy level exists. When FALSE, only specific taxonomy levels are considered.
certainty: specification of the exact taxonomy level to match on when smartmatch = FALSE.
partial: specifies whether matches along only some of the taxonomy dimensions are permitted.
averaging: implements averaging of all values on which events are matched when matching across multiple data.frames. That is, as events are matched dataset by dataset, the metadata is averaged. (Note that this can generate distortion in the output.)
weight: specifies weights for each taxonomy level to increase or decrease the importance of each taxonomy’s contribution to the matching score.
At times, one might want to know which taxonomy level is doing the heavy lifting. By turning off smartmatch and specifying certain taxonomy levels by which to compare events, or by weighting taxonomy levels differently, one can better assess which assumptions are driving the final integration results. This can help with fine-tuning the input assumptions for meltt to gain the most valid match possible.
When printed, the meltt object offers a brief summary of the output.
output
## MELTT Complete: 4 datasets successfully integrated.
## =========================================================
## Total No. of Input Observations: 915
## No. of Unique Obs (after deduplication): 691
## No. of Unique Matches: 150
## No. of Duplicates Removed: 224
## =========================================================
In matching the four conflict datasets, there are 915 total entries. Of those, 150 are events contained within two or more datasets based on their timestamp, location and event characteristics (as expressed by the taxonomies). As such, meltt removes 224 entries that are found to be duplicates, leaving us with 691 “unique” entries.
Likewise, the summary() function offers a more detailed summary of the output.
summary(output)
##
## MELTT output
## ============================================================
## No. of Input Datasets: 4
## Data Object Names: acled, ged, scad, gtd
## Spatial Window: 3km
## Temporal Window: 1 Day(s)
##
## No. of Taxonomies: 2
## Taxonomy Names: event_tax, actor_tax
## Taxonomy Depths: 4, 3
##
## Total No. of Input Observations: 915
## No. of Unique Matches: 150
## - No. of Event-to-Event Matches: 150
## - No. of Episode-to-Episode Matches: 0
## No. of Duplicates Removed: 224
## No. of Unique Obs (after deduplication): 691
## ------------------------------------------------------------
## Summary of Overlap
## acled ged scad gtd Freq
## X 224
## X 141
## X 90
## X 86
## X X 43
## X X 7
## X X 24
## X X 4
## X X 6
## X X 3
## X X X 8
## X X X 34
## X X X 6
## X X X 4
## X X X X 11
## ============================================================
## *Note: 40 episode(s) flagged as potentially matching to an event.
## Review flagged match with meltt.inspect()
Given that meltt objects can be saved and referenced later, the summary function offers a recap of the input parameters and assumptions that underpin the match (i.e. the datasets, the spatiotemporal window, the taxonomies, etc.). Again, it reports the total number of observations, the number of unique and duplicate entries, and the number of matches found, but this time it also reports how many of those matches were event-to-event (i.e. events that played out within one time unit, where the date is equal to the end date) and episode-to-episode (i.e. events that played out over multiple days).
A summary of overlap is also provided, articulating how the different input datasets overlap and where. For example, only 11 entries appear in all four datasets, while 4 entries are found to match across GED/SCAD/GTD, 6 across ACLED/SCAD/GTD, and 34 across ACLED/GED/GTD.
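The counts in the summary are internally consistent, which can be verified with a little arithmetic. The sketch below copies the frequencies from the overlap table, grouped by the number of datasets in which an entry appears; the variable names are ours and purely illustrative.

```r
# Frequencies copied from the "Summary of Overlap" table, grouped by the
# number of datasets in which an entry appears.
one_dataset    <- c(224, 141, 90, 86)
two_datasets   <- c(43, 7, 24, 4, 6, 3)
three_datasets <- c(8, 34, 6, 4)
four_datasets  <- 11

# Every row of the table is one unique entry after de-duplication.
unique_obs <- sum(one_dataset, two_datasets, three_datasets, four_datasets)

# Entries found in more than one dataset are the unique matches.
matches <- sum(two_datasets, three_datasets, four_datasets)

# An entry found in k datasets leaves k - 1 redundant copies to remove.
duplicates <- sum(two_datasets) * 1 + sum(three_datasets) * 2 +
  four_datasets * 3

c(unique_obs = unique_obs, matches = matches, duplicates = duplicates,
  total_input = unique_obs + duplicates)
```

The results reproduce the reported figures: 691 unique observations, 150 unique matches, 224 duplicates removed, and 915 input observations in total.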
Note: events that have been flagged as matching to episodes require manual review using the meltt.inspect() function. The summary output tells us that 40 episodes are flagged as potentially matching to some event. Technically speaking, episodes (events with different start and end dates) and events are at different units of analysis; thus, user discretion is required to help sort out these types of matches. The meltt.inspect() function eases this process of manual assessment. See below for more information.
For quick visualizations of the matched output, meltt contains three plotting functions.
plot() offers a bar plot that graphically displays the unique and overlapping entries. Note that the entries from the leading dataset (i.e. the dataset first entered into meltt) are all black. In this representation, all matching (or duplicate) entries are expressed in reference to the datasets that came before them. Any match found in GED is with respect to ACLED, any in SCAD with respect to ACLED and GED, and so forth. As such, the leading dataset is always in black.
plot(output)
## Warning: attributes are not identical across measure variables;
## they will be dropped
tplot() offers a time series plot of the meltt output. The plot works as a reflection: raw counts of the unique entries are plotted above the axis and raw counts of the removed duplicates are plotted below it. This offers a quick snapshot of when duplicates are found. Temporal clustering of duplicates may indicate an issue with the data and/or the input assumptions, or it may be an artifact of the data itself.
Users can specify the temporal unit by which the data should be binned (day, week, month, year). Given that the data covers a single year, we’ll look at the output by month.
tplot(output, time_unit = "months")
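The binning behind such a plot can be sketched in base R. This is not the tplot() implementation, only an illustration of counting unique and duplicate entries per month; the dates and the is_duplicate flags below are made-up values.

```r
# Toy event dates and duplicate flags (illustrative values only).
dates <- as.Date(c("2011-01-01", "2011-01-03", "2011-01-04",
                   "2011-03-02", "2011-11-09", "2011-11-27"))
is_duplicate <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)

# Bin each event by calendar month and cross-tabulate against its status.
month  <- format(dates, "%Y-%m")
status <- ifelse(is_duplicate, "duplicate", "unique")
counts <- table(month = month, status = status)
print(counts)
```

Swapping the format string (for example "%Y" for years) changes the bin width, which is the same choice the time_unit argument exposes.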
Similarly, mplot() presents a summary of the spatial distribution of the data by mapping the spatial points. Events where matches were detected are marked by blue circles. Again, the goal is to get a sense of the spatial distribution of the matches, both to identify any clustering or disproportionate coverage in where matches are located and to get a sense of the spread of the integrated output.
mplot(output)
The mplot() command, in fact, opens an interactive data browser in the viewer window, allowing a granular inspection of the spatial matches. Information regarding the input criteria by which each entry was assessed (e.g. the taxonomy inputs) is retained and can be referenced by hovering over the point with the mouse.
meltt provides two methods for extracting data from the output object.
meltt_data() returns the de-duplicated data along with any additional columns the user might need. This is the primary function for extracting matched data and moving on with subsequent analysis. The columns = argument takes any vector of variable names and returns those variables in the output. If no variables are specified, meltt returns the spatio-temporal and taxonomy variables that were employed during the match. In addition, the function returns a unique event and data ID for reference.
uevents <- meltt_data(output)
head(uevents)
## dataset event date latitude longitude event_tax
## 1 acled 1 2011-01-01 9.92850 8.892100 Strategic development
## 2 gtd 1 2011-01-01 11.83333 13.150000 7
## 3 acled 24 2011-01-03 11.84640 13.160300 Violence against civilians
## 4 acled 25 2011-01-03 11.84640 13.160300 Violence against civilians
## 5 gtd 3 2011-01-03 5.50000 5.983333 3
## 6 acled 30 2011-01-04 9.28330 12.466700 Protests
## actor_tax
## 1 Boko Haram
## 2 Boko Haram
## 3 Unidentified Armed Group (Nigeria)
## 4 Boko Haram
## 5 Delta Democratic Militia
## 6 Rioters (Nigeria)
The number of entries in this data frame corresponds with the number of de-duplicated entries in the data.
dim(uevents)
## [1] 691 7
In addition, we can extract specific columns of data by using the columns = argument. Below we extract all the event summary columns for the four datasets to retrieve the qualitative descriptions of the reported events.
uevents2 <- meltt_data(output,
columns = c("date","event_tax","longitude","latitude",
"notes","summary","source_headline","issuenote"))
head(uevents2)
## dataset event date latitude longitude event_tax
## 1 acled 1 2011-01-01 9.92850 8.892100 Strategic development
## 2 gtd 1 2011-01-01 11.83333 13.150000 7
## 3 acled 24 2011-01-03 11.84640 13.160300 Violence against civilians
## 4 acled 25 2011-01-03 11.84640 13.160300 Violence against civilians
## 5 gtd 3 2011-01-03 5.50000 5.983333 3
## 6 acled 30 2011-01-04 9.28330 12.466700 Protests
## notes
## 1 Suspected Boko Haram arsonists burnt a church in a northern Nigerian city. Arsonists Saturday night who set a fire on the church that gutted a section of it before the fire was put out by residents. No one was hurt in the attack as there were n
## 2 <NA>
## 3 Gunmen killed three people at a movie theatre in a northern city in an attack police believe is politically-motivated ahead of general elections. The assailants were believed to be thugs loyal to a local politician.
## 4 Suspected members of a radical Islamist sect blamed for a spate of recent attacks in northern Nigeria shot dead an off-duty policeman in Maiduguri. The victim was wearing civilian clothes and was about to enter his home when the attack took pla
## 5 <NA>
## 6 A riot broke out at Jimeta Prison complex when suspected Boko Haram inmates attempted a prison break by overpowering guards. The attempted break-out was unsuccessful.
## summary
## 1 <NA>
## 2 01/01/2011: On Saturday night, in Maiduguri, Borno, Nigeria, unidentified gunmen set fire to Victory Christ Church by unknown means. No casualties were reported and one section of the church was damaged. No group claimed responsibility, but the militant group Boko Haram was thought to be responsible for the attack
## 3 <NA>
## 4 <NA>
## 5 01/03/2011: On Monday night around 0100, in Ughelli, Delta, Nigeria, three people were injured when unidentified militants detonated an improvised explosive device targeting the Independent National Electoral Commission (INEC) office building on Post Officer Road. The building was burned and completely destroyed in the attack. The attack was motivated by the INEC's rigging of the recent election. The militant group Delta Democratic Militia claimed responsibility for the attack.
## 6 <NA>
## source_headline issuenote
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
Note that there is some overlap in the descriptions across datasets. These are events that have been matched up. The information from the original dataset can be retrieved, even if the entry itself has been flagged as a duplicate and removed. In this way, meltt operates as a sophisticated merge function.
meltt_duplicates(), on the other hand, returns a data frame of all events that matched up. This provides a quick way of examining and assessing the events that matched. Since the quality of any match is only as good as the assumptions we input, it’s key that the user qualitatively evaluates the meltt output to assess whether any assumptions should be adjusted. As with meltt_data(), the columns = argument can be customized to return variables of interest.
Note that the data is presented differently than in meltt_data(); here each dataset (and its corresponding variables) is presented in separate columns. This representation is chosen for ease of comparison. The requested columns are intended to assist with validation.
Below we do not specify any columns. As such, all columns (with unique IDs on each variable) are returned. This returns a wide dataset.
dups <- meltt_duplicates(output)
head(dups)
## acled_dataset acled_event ged_dataset ged_event scad_dataset scad_event
## 1 0 0 2 66 3 105
## 2 0 0 2 194 0 0
## 3 0 0 2 160 3 21
## 4 0 0 0 0 3 123
## 5 0 0 2 70 0 0
## 6 0 0 2 67 3 110
## gtd_dataset gtd_event gtd_eventid gtd_iyear gtd_imonth gtd_iday
## 1 0 0 NA NA NA NA
## 2 4 114 201110030005 2011 10 3
## 3 4 32 201103020016 2011 3 2
## 4 4 163 201112150028 2011 12 15
## 5 4 156 201111270022 2011 11 27
## 6 4 143 201111090012 2011 11 9
## gtd_approxdate gtd_extended gtd_resolution gtd_country gtd_country_txt
## 1 <NA> NA <NA> NA <NA>
## 2 <NA> 0 <NA> 147 Nigeria
## 3 <NA> 0 <NA> 147 Nigeria
## 4 <NA> 0 <NA> 147 Nigeria
## 5 <NA> 0 <NA> 147 Nigeria
## 6 <NA> 0 <NA> 147 Nigeria
## gtd_region gtd_region_txt gtd_provstate gtd_city gtd_latitude
## 1 NA <NA> <NA> <NA> NA
## 2 11 Sub-Saharan Africa Borno Maiduguri 11.83333
## 3 11 Sub-Saharan Africa Niger Suleja 9.18053
## 4 11 Sub-Saharan Africa Borno Maiduguri 11.83321
## 5 11 Sub-Saharan Africa Borno Maiduguri 11.83612
## 6 11 Sub-Saharan Africa Borno Mainok 11.82986
## gtd_longitude gtd_specificity gtd_vicinity
## 1 NA NA NA
## 2 13.15000 1 0
## 3 7.17934 1 0
## 4 13.15010 1 0
## 5 13.17764 1 0
## 6 12.63022 1 0
## gtd_location
## 1 <NA>
## 2 At a tea shop in Maiduguri, Borno, Nigeria.
## 3 At a secondary school in Suleja, Niger, Nigeria.
## 4 <NA>
## 5 Gwange ward
## 6 On the outskirts of Maiduguri, approximately 75 kilometers from Damaturu city
## gtd_summary
## 1 <NA>
## 2 10/03/2011: On Monday morning, in Maiduguri, Borno, Nigeria, two militants fired upon and killed a tea seller and a civilian outside a tea shop. No group has claimed responsibility, but the militant group Boko Haram was thought to be responsible for the attack.
## 3 03/02/2011: On Wednesday afternoon around 1330, in Suleja, Niger, Nigeria, 10 people were killed and 34 others were injured when one man threw an improvised explosive device at a Peoples Democratic Party campaign rally for Niger Governor Babangida Aliyu, at a Nigerian government secondary school. Babangida Aliyu was not injured, but one bus sustained an unknown amount of damage in the attack. No group has claimed responsibility for the attack.
## 4 12/15/2011: Suspected members of Boko Haram opened fire on a group of civilians standing outside of a shop in Maiduguri city, Borno state, Nigeria. Five civilians were killed in the shooting; however, there were no reported injuries. The assailants were traveling in a vehicle at the time of the attack and fled the scene following the shooting.
## 5 11/27/2011: Three unidentified gunmen shot and killed a government employee in Gwange ward of Maiduguri city, Borno state, Nigeria. The victim, Kala Boro, was a protocol officer for the Borno state Government House. The assailants followed him home from work and shot him while he was in his car. This was one of two multiple incidents; the assailants killed an herbalist in a separate incident after killing Boro. No group claimed responsibility for the incident; however, sources suspect the involvement of Boko Harm.
## 6 11/9/2011: Approximately 20 suspected members of Boko Haram attacked a police station in Mainok village, Borno state, Nigeria. The assailants threw explosives inside and burned the police station down; there were no reported injuries as the police station had been closed some time before. The attack on the police station happened in conjunction with an attack on a federal road safety office in the same village.
## gtd_crit1 gtd_crit2 gtd_crit3 gtd_doubtterr gtd_alternative
## 1 NA NA NA NA NA
## 2 1 1 1 0 NA
## 3 1 1 1 0 NA
## 4 1 1 1 0 NA
## 5 1 1 1 0 NA
## 6 1 1 1 0 NA
## gtd_alternative_txt gtd_multiple gtd_success gtd_suicide gtd_event_tax
## 1 <NA> NA NA NA NA
## 2 <NA> 0 1 0 2
## 3 <NA> 0 1 0 3
## 4 <NA> 0 1 0 2
## 5 <NA> 1 1 0 2
## 6 <NA> 1 1 0 3
## gtd_attacktype1_txt gtd_attacktype2 gtd_attacktype2_txt gtd_attacktype3
## 1 <NA> NA <NA> <NA>
## 2 Armed Assault NA <NA> <NA>
## 3 Bombing/Explosion NA <NA> <NA>
## 4 Armed Assault NA <NA> <NA>
## 5 Armed Assault NA <NA> <NA>
## 6 Bombing/Explosion NA <NA> <NA>
## gtd_attacktype3_txt gtd_targtype1 gtd_targtype1_txt
## 1 <NA> NA <NA>
## 2 <NA> 14 Private Citizens & Property
## 3 <NA> 14 Private Citizens & Property
## 4 <NA> 14 Private Citizens & Property
## 5 <NA> 2 Government (General)
## 6 <NA> 3 Police
## gtd_targsubtype1 gtd_targsubtype1_txt
## 1 NA <NA>
## 2 77 Laborer (General)/Occupation Identified
## 3 67 Unnamed Civilian/Unspecified
## 4 74 Marketplace/Plaza/Square
## 5 18 Government Personnel (excluding police, military)
## 6 22 Police Building (headquarters, station, school)
## gtd_corp1
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 Civilians
## 5 Borno state Government House
## 6 Police
## gtd_target1
## 1 <NA>
## 2 A tea seller was targeted in the attack.
## 3 Civilians
## 4 Civilians grouped outside of a shop in Maiduguri
## 5 Kala Boro, a protocol officer for the Borno state Government House
## 6 An abandoned police outpost in Mainok village
## gtd_natlty1 gtd_natlty1_txt gtd_targtype2 gtd_targtype2_txt
## 1 NA <NA> NA <NA>
## 2 147 Nigeria NA <NA>
## 3 147 Nigeria 2 Government (General)
## 4 147 Nigeria NA <NA>
## 5 147 Nigeria NA <NA>
## 6 147 Nigeria NA <NA>
## gtd_targsubtype2 gtd_targsubtype2_txt
## 1 NA <NA>
## 2 NA <NA>
## 3 15 Politician or Political Party Movement/Meeting/Rally
## 4 NA <NA>
## 5 NA <NA>
## 6 NA <NA>
## gtd_corp2 gtd_target2
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 Niger Government The Niger state governor was targeted in the attack.
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## gtd_natlty2 gtd_natlty2_txt gtd_targtype3 gtd_targtype3_txt gtd_targsubtype3
## 1 NA <NA> <NA> <NA> <NA>
## 2 NA <NA> <NA> <NA> <NA>
## 3 147 Nigeria <NA> <NA> <NA>
## 4 NA <NA> <NA> <NA> <NA>
## 5 NA <NA> <NA> <NA> <NA>
## 6 NA <NA> <NA> <NA> <NA>
## gtd_targsubtype3_txt gtd_corp3 gtd_target3 gtd_natlty3 gtd_natlty3_txt
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA>
## gtd_actor_tax gtd_gsubname gtd_gname2 gtd_gsubname2 gtd_gname3 gtd_gsubname3
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 Boko Haram <NA> <NA> <NA> <NA> <NA>
## 3 Unknown <NA> <NA> <NA> <NA> <NA>
## 4 Boko Haram <NA> <NA> <NA> <NA> <NA>
## 5 Boko Haram <NA> <NA> <NA> <NA> <NA>
## 6 Boko Haram <NA> <NA> <NA> <NA> <NA>
## gtd_motive
## 1 <NA>
## 2 The specific motive for the attack is unknown.
## 3 The specific motive for the attack is unknown.
## 4 Specific motive is unknown; however, Boko Haram is engaged in an active campaign to enforce Sharia law.
## 5 Unknown
## 6 Unknown
## gtd_guncertain1 gtd_guncertain2 gtd_guncertain3 gtd_individual gtd_nperps
## 1 NA NA <NA> NA NA
## 2 1 NA <NA> 0 2
## 3 0 NA <NA> 0 1
## 4 1 NA <NA> 0 -99
## 5 1 NA <NA> 0 3
## 6 1 NA <NA> 0 20
## gtd_nperpcap gtd_claimed gtd_claimmode gtd_claimmode_txt gtd_claim2
## 1 NA NA NA <NA> NA
## 2 0 0 NA <NA> NA
## 3 0 0 NA <NA> NA
## 4 0 0 NA <NA> NA
## 5 0 0 NA <NA> NA
## 6 0 0 NA <NA> NA
## gtd_claimmode2 gtd_claimmode2_txt gtd_claim3 gtd_claimmode3
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA>
## gtd_claimmode3_txt gtd_compclaim gtd_weaptype1 gtd_weaptype1_txt
## 1 <NA> <NA> NA <NA>
## 2 <NA> <NA> 5 Firearms
## 3 <NA> <NA> 6 Explosives/Bombs/Dynamite
## 4 <NA> <NA> 5 Firearms
## 5 <NA> <NA> 5 Firearms
## 6 <NA> <NA> 6 Explosives/Bombs/Dynamite
## gtd_weapsubtype1 gtd_weapsubtype1_txt gtd_weaptype2
## 1 NA <NA> NA
## 2 5 Unknown Gun Type NA
## 3 17 Other Explosive Type NA
## 4 4 Rifle/Shotgun (non-automatic) NA
## 5 2 Automatic Weapon NA
## 6 16 Unknown Explosive Type NA
## gtd_weaptype2_txt gtd_weapsubtype2 gtd_weapsubtype2_txt gtd_weaptype3
## 1 <NA> NA <NA> NA
## 2 <NA> NA <NA> NA
## 3 <NA> NA <NA> NA
## 4 <NA> NA <NA> NA
## 5 <NA> NA <NA> NA
## 6 <NA> NA <NA> NA
## gtd_weaptype3_txt gtd_weapsubtype3 gtd_weapsubtype3_txt gtd_weaptype4
## 1 <NA> NA <NA> <NA>
## 2 <NA> NA <NA> <NA>
## 3 <NA> NA <NA> <NA>
## 4 <NA> NA <NA> <NA>
## 5 <NA> NA <NA> <NA>
## 6 <NA> NA <NA> <NA>
## gtd_weaptype4_txt gtd_weapsubtype4 gtd_weapsubtype4_txt
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## gtd_weapdetail gtd_nkill gtd_nkillus
## 1 <NA> NA NA
## 2 Unknown firearms were used in the attack. 2 0
## 3 An improvised explosive device was used in the attack. 10 0
## 4 Kalashnikov rifles 5 0
## 5 Kalashnikov rifles 1 0
## 6 A building was bombed and burnt down 0 0
## gtd_nkillter gtd_nwound gtd_nwoundus gtd_nwoundte gtd_property gtd_propextent
## 1 NA NA NA NA NA NA
## 2 0 0 0 0 0 NA
## 3 0 34 0 0 -9 NA
## 4 0 0 0 0 -9 NA
## 5 0 0 0 0 -9 NA
## 6 0 0 0 0 1 4
## gtd_propextent_txt gtd_propvalue
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 Unknown NA
## gtd_propcomment
## 1 <NA>
## 2 <NA>
## 3 The attack caused an unknown amount of property damage to the school.
## 4 <NA>
## 5 <NA>
## 6 A police post was bombed and burned down
## gtd_ishostkid gtd_nhostkid gtd_nhostkidus gtd_nhours gtd_ndays gtd_divert
## 1 NA NA NA NA NA <NA>
## 2 0 NA NA NA NA <NA>
## 3 0 NA NA NA NA <NA>
## 4 0 NA NA NA NA <NA>
## 5 0 NA NA NA NA <NA>
## 6 0 NA NA NA NA <NA>
## gtd_kidhijcountry gtd_ransom gtd_ransomamt gtd_ransomamtus gtd_ransompaid
## 1 <NA> NA NA <NA> NA
## 2 <NA> NA NA <NA> NA
## 3 <NA> NA NA <NA> NA
## 4 <NA> NA NA <NA> NA
## 5 <NA> NA NA <NA> NA
## 6 <NA> NA NA <NA> NA
## gtd_ransompaidus gtd_ransomnote gtd_hostkidoutcome gtd_hostkidoutcome_txt
## 1 <NA> <NA> NA <NA>
## 2 <NA> <NA> NA <NA>
## 3 <NA> <NA> NA <NA>
## 4 <NA> <NA> NA <NA>
## 5 <NA> <NA> NA <NA>
## 6 <NA> <NA> NA <NA>
## gtd_nreleased
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## gtd_addnotes
## 1 <NA>
## 2 <NA>
## 3 The most recent available sources listed the fatalities for this attack from three to 10, and the injuries for this attack from 21 to 34, so the majority casualty figures have been used in order to preserve statistical accuracy in the database.
## 4 <NA>
## 5 <NA>
## 6 <NA>
## gtd_scite1
## 1 <NA>
## 2 Yahoo News, "Police: Radical Sect Kills Three in Northeast Nigeria," Associated Press, http://news.yahoo.com/police-radical-sect-kills-3-northeast-nigeria-163009693.html;_ylt=AsyNFAPfBy7lw5yKvmQ_G6696Q8F;_ylu=X3oDMTQ0Nmc3b2t0BG1pdANUb3BTdG9yeSBXb3JsZFNGIEFmcmljYVNTRgRwa2cDNzc5YjBjYTItMWU1Mi0zYTQzLThhMmItMTFhMmQ1NmQ3ODRjBHBvcwM5BHNlYwN0b3Bfc3RvcnkEdmVyAzE2NTZkNmEwLWVkZGQtMTFlMC1iOWJmLTdhNDJlZDhhODIyZg--;_ylg=X3oDMTFxaTJhMjZtBGludGwDdXMEbGFuZwNlbi11cwRwc3RhaWQDBHBzdGNhdAN3b3JsZHxhZnJpY2EEcHQDc2VjdGlvbnM-;_ylv=3 (October 3, 2011).
## 3 Chinwendu Nnadozi, "Explosion Kills 10 at PDP Rally in Niger," Daily Independent, March 03, 2011, http://www.independentngonline.com/DailyIndependent/Article.aspx?id=29876.
## 4 "Update:APNewsNow," Associated Press, December 17, 2011.
## 5 Nazifi Dawud Khalid, "Gunmen Kill Protocol Officer, Herbalist in Maiduguri," Daily Trust, November 28, 2011.
## 6 "Suspected Islamists attack Nigerian police post, govt office," Agence France Presse, November 10, 2011.
## gtd_scite2
## 1 <NA>
## 2 Yahoo News, "Gunmen Kill Three in Violence-torn Nigerian City: Police," Agence France Presse, http://news.yahoo.com/gunmen-kill-three-violence-torn-nigerian-city-police-181747975.html;_ylt=Argzz.1hYeHUn2SzxZ5zGEK96Q8F;_ylu=X3oDMTQ0aW8zb2JmBG1pdANUb3BTdG9yeSBXb3JsZFNGIEFmcmljYVNTRgRwa2cDMTJkNTg0ZTEtZTJiOC0zYjViLWFhNGMtYWViN2YzZmVmMGFhBHBvcwM0BHNlYwN0b3Bfc3RvcnkEdmVyAzYxN2FmZGYwLWVkZWMtMTFlMC1iNmYxLTExZmRmMGMzOTllOA--;_ylg=X3oDMTFxaTJhMjZtBGludGwDdXMEbGFuZwNlbi11cwRwc3RhaWQDBHBzdGNhdAN3b3JsZHxhZnJpY2EEcHQDc2VjdGlvbnM-;_ylv=3 (October 3, 2011).
## 3 Xinhua News Agency, "Ten People Feared Killed in Nigeria Rally Bomb Blast," Xinhua News Agency, March 04, 2011, http://news.xinhuanet.com/english2010/world/2011-03/04/c_13760030.htm.
## 4 <NA>
## 5 <NA>
## 6 Michael Olugbode and John Shiklam, "Gunmen Attack Police Station, FRSC Office," This Day, November 11, 2011.
## gtd_scite3
## 1 <NA>
## 2 Washington Post, "Police: Radical Muslim Sect Kills Tea Seller, Pharmacist and Bystander in Nigeria's Northeast," Associated Press, October 3, 2011, http://www.washingtonpost.com/world/africa/police-radical-muslim-sect-kills-tea-seller-pharmacist-and-bystander-in-nigerias-northeast/2011/10/03/gIQAPQoFIL_story.html.
## 3 Jane's Intelligence, "IED Attack Targets Political Rally near Nigerian Capital," Terrorism Watch Report, March 03, 2011.
## 4 <NA>
## 5 <NA>
## 6 <NA>
## gtd_dbsource gtd_INT_LOG gtd_INT_IDEO gtd_INT_MISC gtd_INT_ANY
## 1 <NA> NA NA NA NA
## 2 ISVG 0 0 0 0
## 3 ISVG -9 -9 0 -9
## 4 START Primary Collection 0 0 0 0
## 5 START Primary Collection 0 0 0 0
## 6 START Primary Collection 0 0 0 0
## gtd_related gtd_date gtd_data.source gtd_enddate
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> 2011-10-03 gtd 2011-10-03
## 3 <NA> 2011-03-02 gtd 2011-03-02
## 4 <NA> 2011-12-15 gtd 2011-12-15
## 5 201111270022, 201111270023 2011-11-27 gtd 2011-11-27
## 6 201111090012, 201111090013 2011-11-09 gtd 2011-11-09
## scad_eventid scad_id scad_ccode scad_countryname scad_date scad_enddate
## 1 4751061 1061 475 Nigeria 2011-11-03 2011-11-03
## 2 NA NA NA <NA> <NA> <NA>
## 3 4750991 991 475 Nigeria 2011-03-03 2011-03-03
## 4 4751078 1078 475 Nigeria 2011-12-15 2011-12-15
## 5 NA NA NA <NA> <NA> <NA>
## 6 4751065 1065 475 Nigeria 2011-11-10 2011-11-10
## scad_duration scad_stday scad_stmo scad_styr scad_eday scad_emo scad_eyr
## 1 1 3 11 2011 3 11 2011
## 2 NA NA NA NA NA NA NA
## 3 1 3 3 2011 3 3 2011
## 4 1 15 12 2011 15 12 2011
## 5 NA NA NA NA NA NA NA
## 6 1 10 11 2011 10 11 2011
## scad_event_tax scad_escalation scad_actor_tax scad_actor2 scad_actor3
## 1 8 0 Suspected militant <NA> <NA>
## 2 NA NA <NA> <NA> <NA>
## 3 9 0 Unknown assailants <NA> <NA>
## 4 9 0 Suspected militants <NA> <NA>
## 5 NA NA <NA> <NA> <NA>
## 6 8 0 Suspected militants <NA> <NA>
## scad_target1 scad_target2 scad_cgovtarget scad_rgovtarget
## 1 Soldiers <NA> 1 0
## 2 <NA> <NA> NA NA
## 3 Peoples Demoncratic Party <NA> 0 0
## 4 Civilians <NA> 0 0
## 5 <NA> <NA> NA NA
## 6 Police <NA> 1 0
## scad_npart scad_ndeath scad_repress scad_elocal scad_ilocal scad_sublocal
## 1 1 1 0 Maiduguri Maiduguri 1
## 2 NA NA NA <NA> <NA> NA
## 3 1 4 0 Suleja Suleja 1
## 4 -99 5 0 Maiduguri Maiduguri 1
## 5 NA NA NA <NA> <NA> NA
## 6 -99 2 0 Mainok Mainok 1
## scad_locnum scad_gislocnum scad_issue1 scad_issue2 scad_issue3
## 1 2 2 6 NA NA
## 2 NA NA NA NA NA
## 3 2 2 1 NA NA
## 4 2 2 6 NA NA
## 5 NA NA NA NA NA
## 6 3 3 6 NA NA
## scad_issuenote
## 1 A suspected militant opens fire, killing a soldier.
## 2 <NA>
## 3 Assailants toss a bomb at an election rally from a moving car. They miss their target, instead hitting a roadside vegetable market.
## 4 Suspected Boko Haram militants shoot dead 5 civilians in a drive-by shooting.
## 5 <NA>
## 6 Suspected militants bomb a police station.
## scad_nsource scad_notes scad_coder scad_acd_questionable scad_latitude
## 1 0 <NA> CL 1 11.83330
## 2 <NA> <NA> <NA> NA NA
## 3 1 <NA> CL 1 9.18052
## 4 1 <NA> CL 1 11.83330
## 5 <NA> <NA> <NA> NA NA
## 6 1 <NA> CL 1 11.82880
## scad_longitude scad_geo_comments scad_location_precision scad_year
## 1 13.15000 <NA> <NA> 2011
## 2 NA <NA> <NA> NA
## 3 7.17933 <NA> <NA> 2011
## 4 13.15000 <NA> <NA> 2011
## 5 NA <NA> <NA> NA
## 6 12.63450 <NA> <NA> 2011
## scad_data.source ged_id ged_year ged_active_year ged_event_tax
## 1 scad 42105 2011 1 1
## 2 <NA> 42306 2011 1 3
## 3 scad 42272 2011 1 3
## 4 scad NA NA NA NA
## 5 <NA> 42110 2011 1 1
## 6 scad 42106 2011 1 1
## ged_conflict_new_id ged_conflict_name
## 1 297 Nigeria:Government
## 2 1850 Jama'atu Ahlis Sunna Lidda'awati wal-Jihad - Civilians
## 3 1850 Jama'atu Ahlis Sunna Lidda'awati wal-Jihad - Civilians
## 4 NA <NA>
## 5 297 Nigeria:Government
## 6 297 Nigeria:Government
## ged_dyad_new_id
## 1 640
## 2 2332
## 3 2332
## 4 NA
## 5 640
## 6 640
## ged_dyad_name
## 1 Government of Nigeria - Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 2 Jama'atu Ahlis Sunna Lidda'awati wal-Jihad - Civilians
## 3 Jama'atu Ahlis Sunna Lidda'awati wal-Jihad - Civilians
## 4 <NA>
## 5 Government of Nigeria - Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 6 Government of Nigeria - Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## ged_side_a_new_id ged_gwnoa ged_actor_tax
## 1 84 475 Government of Nigeria
## 2 1051 NA Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 3 1051 NA Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 4 NA NA <NA>
## 5 84 475 Government of Nigeria
## 6 84 475 Government of Nigeria
## ged_side_b_new_id ged_gwnob ged_side_b
## 1 1051 <NA> Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 2 1 <NA> Civilians
## 3 1 <NA> Civilians
## 4 NA <NA> <NA>
## 5 1051 <NA> Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## 6 1051 <NA> Jama'atu Ahlis Sunna Lidda'awati wal-Jihad
## ged_number_of_sources
## 1 -1
## 2 -1
## 3 -1
## 4 NA
## 5 -1
## 6 -1
## ged_source_article
## 1 Agence France Presse 4 November 2011 "Soldier shot dead amid arms searches in Nigeria
## 2 Agence France Presse 3 October 2011 "Gunmen kill three in violence-torn Nigerian city: police
## 3 Reuters News 3 March 2011 "UPDATE 1-Three killed in explosion at Nigeria election rally"; Agence France Presse 8 March 2011 "Nigerian opposition politician charged with bombing"; The Guardian/BBC 14 September 2011 "Nigerian court remands suspected Islamic sect members to custody
## 4 <NA>
## 5 Agence France Presse 28 November 2011 "Nigerian Islamists kill state governor's aide"; Daily Trust/BBC 28 November 2011 "Nigeria: Suspected Boko Haram gunmen kill protocol officer in Borno State
## 6 Agence France Presse 10 November 2011 "Suspected Islamists attack Nigerian police post, govt office
## ged_source_office ged_source_date ged_source_headline
## 1 <NA> NA <NA>
## 2 <NA> NA <NA>
## 3 <NA> NA <NA>
## 4 <NA> NA <NA>
## 5 <NA> NA <NA>
## 6 <NA> NA <NA>
## ged_source_original ged_where_prec ged_where_coordinates
## 1 Borno state police commissioner 1 Maiduguri town
## 2 state police chief 1 Maiduguri town
## 3 police 1 Suleja town
## 4 <NA> NA <NA>
## 5 police 1 Maiduguri town
## 6 police, witnesses 1 Mainok town
## ged_adm_1 ged_adm_2 ged_latitude ged_longitude
## 1 Borno state Maiduguri lga 11.84644 13.16027
## 2 Borno state Maiduguri lga 11.84644 13.16027
## 3 Niger state Suleja lga 9.18052 7.17933
## 4 <NA> <NA> NA NA
## 5 Borno state Maiduguri lga 11.84644 13.16027
## 6 Borno state Kaga lga 11.83022 12.63067
## ged_geom_wkt ged_priogrid_gid ged_country ged_country_id
## 1 POINT (13.160274 11.846440) 146547 Nigeria 475
## 2 POINT (13.160274 11.846440) 146547 Nigeria 475
## 3 POINT (7.179330 9.180520) 142935 Nigeria 475
## 4 <NA> NA <NA> NA
## 5 POINT (13.160274 11.846440) 146547 Nigeria 475
## 6 POINT (12.630670 11.830220) 146546 Nigeria 475
## ged_region ged_event_clarity ged_date_prec ged_date ged_enddate
## 1 Africa 1 1 2011-11-02 2011-11-02
## 2 Africa 1 1 2011-10-03 2011-10-03
## 3 Africa 1 1 2011-03-03 2011-03-03
## 4 <NA> NA NA <NA> <NA>
## 5 Africa 1 1 2011-11-27 2011-11-27
## 6 Africa 1 1 2011-11-09 2011-11-09
## ged_deaths_a ged_deaths_b ged_deaths_civilians ged_deaths_unknown ged_best
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 NA NA NA NA NA
## 5 0 0 0 0 0
## 6 0 0 0 0 0
## ged_low ged_high ged_data.source acled_gwno acled_event_id_cnty
## 1 0 1 ged NA <NA>
## 2 0 3 ged NA <NA>
## 3 0 3 ged NA <NA>
## 4 NA NA <NA> NA <NA>
## 5 0 2 ged NA <NA>
## 6 0 4 ged NA <NA>
## acled_event_id_no_cnty acled_date acled_year acled_time_precision
## 1 NA <NA> NA NA
## 2 NA <NA> NA NA
## 3 NA <NA> NA NA
## 4 NA <NA> NA NA
## 5 NA <NA> NA NA
## 6 NA <NA> NA NA
## acled_event_tax acled_actor_tax acled_ally_actor_1 acled_inter1
## 1 <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> NA
## 6 <NA> <NA> <NA> NA
## acled_actor1_id acled_actor2 acled_ally_actor_2 acled_inter2 acled_actor2_id
## 1 NA <NA> <NA> NA NA
## 2 NA <NA> <NA> NA NA
## 3 NA <NA> <NA> NA NA
## 4 NA <NA> <NA> NA NA
## 5 NA <NA> <NA> NA NA
## 6 NA <NA> <NA> NA NA
## acled_interaction acled_actor_dyad_id acled_country acled_admin1 acled_admin2
## 1 NA <NA> <NA> <NA> <NA>
## 2 NA <NA> <NA> <NA> <NA>
## 3 NA <NA> <NA> <NA> <NA>
## 4 NA <NA> <NA> <NA> <NA>
## 5 NA <NA> <NA> <NA> <NA>
## 6 NA <NA> <NA> <NA> <NA>
## acled_admin3 acled_location acled_latitude acled_longitude
## 1 <NA> <NA> NA NA
## 2 <NA> <NA> NA NA
## 3 <NA> <NA> NA NA
## 4 <NA> <NA> NA NA
## 5 <NA> <NA> NA NA
## 6 <NA> <NA> NA NA
## acled_geo_precision acled_source acled_notes acled_fatalities
## 1 NA <NA> <NA> NA
## 2 NA <NA> <NA> NA
## 3 NA <NA> <NA> NA
## 4 NA <NA> <NA> NA
## 5 NA <NA> <NA> NA
## 6 NA <NA> <NA> NA
## acled_data.source acled_enddate
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
The number of entries corresponds to the number of located matches.
dim(dups)
## [1] 150 263
By examining the overlapping output, we can get a better understanding of which events matched and why. Depending on the research question, the information about overlapping events can be just as valuable as the de-duplicated frame. meltt_duplicates() allows the researcher to make these kinds of inquiries.
Again, let’s extract descriptive information to qualitatively compare events, given the available text descriptions of the events. This offers a quick way to see how well our input assumptions performed when merging events.
dups2 <- meltt_duplicates(output,
columns = c("notes","summary","issuenote",
"source_headline"))
head(dups2)
## acled_dataset acled_event ged_dataset ged_event scad_dataset scad_event
## 1 0 0 2 66 3 105
## 2 0 0 2 194 0 0
## 3 0 0 2 160 3 21
## 4 0 0 0 0 3 123
## 5 0 0 2 70 0 0
## 6 0 0 2 67 3 110
## gtd_dataset gtd_event
## 1 0 0
## 2 4 114
## 3 4 32
## 4 4 163
## 5 4 156
## 6 4 143
## gtd_summary
## 1 <NA>
## 2 10/03/2011: On Monday morning, in Maiduguri, Borno, Nigeria, two militants fired upon and killed a tea seller and a civilian outside a tea shop. No group has claimed responsibility, but the militant group Boko Haram was thought to be responsible for the attack.
## 3 03/02/2011: On Wednesday afternoon around 1330, in Suleja, Niger, Nigeria, 10 people were killed and 34 others were injured when one man threw an improvised explosive device at a Peoples Democratic Party campaign rally for Niger Governor Babangida Aliyu, at a Nigerian government secondary school. Babangida Aliyu was not injured, but one bus sustained an unknown amount of damage in the attack. No group has claimed responsibility for the attack.
## 4 12/15/2011: Suspected members of Boko Haram opened fire on a group of civilians standing outside of a shop in Maiduguri city, Borno state, Nigeria. Five civilians were killed in the shooting; however, there were no reported injuries. The assailants were traveling in a vehicle at the time of the attack and fled the scene following the shooting.
## 5 11/27/2011: Three unidentified gunmen shot and killed a government employee in Gwange ward of Maiduguri city, Borno state, Nigeria. The victim, Kala Boro, was a protocol officer for the Borno state Government House. The assailants followed him home from work and shot him while he was in his car. This was one of two multiple incidents; the assailants killed an herbalist in a separate incident after killing Boro. No group claimed responsibility for the incident; however, sources suspect the involvement of Boko Harm.
## 6 11/9/2011: Approximately 20 suspected members of Boko Haram attacked a police station in Mainok village, Borno state, Nigeria. The assailants threw explosives inside and burned the police station down; there were no reported injuries as the police station had been closed some time before. The attack on the police station happened in conjunction with an attack on a federal road safety office in the same village.
## scad_issuenote
## 1 A suspected militant opens fire, killing a soldier.
## 2 <NA>
## 3 Assailants toss a bomb at an election rally from a moving car. They miss their target, instead hitting a roadside vegetable market.
## 4 Suspected Boko Haram militants shoot dead 5 civilians in a drive-by shooting.
## 5 <NA>
## 6 Suspected militants bomb a police station.
## scad_notes ged_source_headline acled_notes
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
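Because most text columns are NA for any given match, it can help to collapse a row to the descriptions that are actually present before comparing them. The following is a minimal sketch in base R, run on a small toy data frame that only mimics the layout of dups2. The helper name show_descriptions and the toy values are assumptions made for illustration and are not part of meltt.

```r
# Toy stand-in for the frame returned by meltt_duplicates(); the values are
# shortened illustrations and the real frame has many more columns.
toy_dups <- data.frame(
  gtd_summary = c(NA, "Two militants fired upon and killed a tea seller."),
  scad_issuenote = c("A suspected militant opens fire, killing a soldier.", NA),
  acled_notes = c(NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of meltt): for one row, keep only the
# text columns that actually contain a description.
show_descriptions <- function(df, row) {
  entry <- unlist(df[row, , drop = FALSE])
  entry[!is.na(entry)]
}

# Descriptions available for the second toy match
show_descriptions(toy_dups, 2)
```

Applied to the real dups2 frame, the same helper would return one named entry per source that supplied a description for the chosen match.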
As noted above, event-to-episode matches are flagged but not automatically merged. To resolve them, the user needs to inspect the flagged entries and decide which are actual matches and which are not. We require this manual step because events and episodes are recorded at different units of analysis, so determining whether they are unique or duplicate entries calls for discretion. Note that we are developing a Shiny app to ease this assessment process.
The meltt_inspect() function streamlines this process. The function outputs a list that contains comparative information on each potential event and episode match.
assess = meltt_inspect(output)
##
## Note:
## 40 entries flagged as event-to-episode matches. List generated for user evaluation for all potential matches.
The user then manually reviews each entry by cycling through the returned list object.
# Information on the event
assess[[1]]$`Flagged Event Information`
## dataset obs.count data.source date enddate latitude longitude
## 61 1 60 acled 2011-02-07 2011-02-07 7.5887 8.2087
## event_tax actor_tax
## 61 Violence against civilians Fulani Ethnic Militia (Nigeria)
# Information on the episode
assess[[1]]$`Flagged Episode Information`
## dataset obs.count data.source date enddate latitude longitude
## 476 2 118 ged 2011-02-07 2011-02-11 7.5834 8.2055
## event_tax actor_tax
## 476 2 Fulani
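Cycling through 40 entries by hand can be tedious, so a compact overview of each flagged pair may be a useful starting point. The sketch below uses a toy list with the same two element names shown above and the values of the first flagged pair. The helper summarize_pair is hypothetical (not part of meltt), it assumes one row per event and per episode, and the coordinate gap is a rough difference in decimal degrees rather than a geodesic distance.

```r
# Toy stand-in for the list returned by meltt_inspect(); element names follow
# the output above, values are those of the first flagged pair.
toy_assess <- list(
  list(
    `Flagged Event Information` = data.frame(
      data.source = "acled", date = as.Date("2011-02-07"),
      enddate = as.Date("2011-02-07"), latitude = 7.5887, longitude = 8.2087
    ),
    `Flagged Episode Information` = data.frame(
      data.source = "ged", date = as.Date("2011-02-07"),
      enddate = as.Date("2011-02-11"), latitude = 7.5834, longitude = 8.2055
    )
  )
)

# Hypothetical helper (not part of meltt): one summary row per flagged pair.
summarize_pair <- function(pair) {
  ev <- pair[["Flagged Event Information"]]
  ep <- pair[["Flagged Episode Information"]]
  data.frame(
    event_source = ev$data.source,
    episode_source = ep$data.source,
    # Does the event date fall inside the episode's date window?
    within_window = ev$date >= ep$date & ev$date <= ep$enddate,
    # Rough coordinate gap in decimal degrees (not a geodesic distance)
    coord_gap_deg = sqrt((ev$latitude - ep$latitude)^2 +
                         (ev$longitude - ep$longitude)^2)
  )
}

# Stack the summaries into one review table
review <- do.call(rbind, lapply(toy_assess, summarize_pair))
review
```

A table like this does not replace reading the entries; it only helps decide which pairs deserve a closer look first.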
The function takes an object of class meltt and has two accompanying arguments: columns and confirmed_matches. When flagged events and episodes are found to match, the user can pass this information to the function so that those entries are treated as duplicates in the returned frame. To do so, the user must provide a logical vector whose length equals the total number of flagged entries. Flagged entries marked as TRUE are treated as matches, and those marked as FALSE are treated as unique. The returned frame then resembles the output of meltt_data().
By way of example:
length(assess)
## [1] 40
retain = rep(FALSE, length(assess))
retain[1:20] = TRUE # Let's say half are ID'ed as duplicates
retain
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE
uevents3 = meltt_inspect(output,columns="event_tax",confirmed_matches = retain)
## All confirmed event-to-episode duplicates have been removed.
dim(uevents3)
## [1] 687 6
Note that the total number of de-duplicated events has fallen, reflecting the newly identified (and now removed) duplicates of existing events: the 691 original de-duplicated entries are reduced to 687. The reduction is small relative to the 20 flagged entries marked TRUE because multiple events can map to the same episode. There are 40 events flagged as matching episodes, but only 18 unique episodes that can potentially be removed as duplicates. This underscores the need for user discretion.
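The gap between confirmed pairs and removed entries can be illustrated with a small toy example. The IDs below are made up and are not meltt output; the sketch only shows how several flagged events can point to the same episode.

```r
# Toy illustration (made-up IDs): five flagged pairs, where several
# events point to the same episode.
flagged <- data.frame(
  event_id   = c(101, 102, 103, 104, 105),
  episode_id = c(  1,   1,   1,   2,   2)
)

# Hypothetical user decisions, one per flagged pair
confirmed <- c(TRUE, TRUE, TRUE, TRUE, FALSE)

# Number of flagged pairs confirmed as matches
n_confirmed <- sum(confirmed)

# Number of distinct episodes those confirmations refer to
n_episodes <- length(unique(flagged$episode_id[confirmed]))

c(confirmed_pairs = n_confirmed, distinct_episodes = n_episodes)
```

In this toy case, four confirmed pairs correspond to only two distinct episodes, which is the same pattern that keeps the reduction above well below the number of confirmed matches.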
Like most S3 objects, the output from meltt is a nested list containing a range of useful information. It retains the original input data, the taxonomies, and the specification assumptions, as well as lists of contender events (i.e., events that were flagged as potential matches but did not match as closely as another event). Note that we are expanding meltt's functionality to include more posterior functions that ease extraction of this information, but for now it can be accessed using the usual $ operator.
names(output)
## [1] "processed" "inputData" "parameters" "inputDataNames"
## [5] "taxonomy"
head(output$processed$event_contenders)
## dataset event bestmatch_data bestmatch_event bestmatch_score runnerUp1_data
## 1 1 24 2 7 0.5833333 0
## 2 1 58 2 85 0.3333333 0
## 3 1 69 2 236 0.5000000 0
## 4 1 78 2 8 0.4166667 0
## 5 1 103 2 106 0.3333333 0
## 6 1 177 2 204 0.2500000 0
## runnerUp1_event runnerUp1_score runnerUp2_data runnerUp2_event
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 0
## 4 0 0 0 0
## 5 0 0 0 0
## 6 0 0 0 0
## runnerUp2_score events_matched
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 6 0 1
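Once extracted, the contender table is an ordinary data frame and can be filtered and sorted with base R. The sketch below rebuilds the first six rows printed above (a subset of the columns) as a toy frame. The cutoff value is an arbitrary choice for illustration and is not a meltt default; how bestmatch_score should be interpreted is best checked against the package documentation.

```r
# Toy stand-in for output$processed$event_contenders, using the rows and a
# subset of the columns printed above.
contenders <- data.frame(
  dataset = c(1, 1, 1, 1, 1, 1),
  event = c(24, 58, 69, 78, 103, 177),
  bestmatch_data = c(2, 2, 2, 2, 2, 2),
  bestmatch_event = c(7, 85, 236, 8, 106, 204),
  bestmatch_score = c(0.5833333, 0.3333333, 0.5000000,
                      0.4166667, 0.3333333, 0.2500000)
)

# Illustrative cutoff chosen for this sketch only (not a meltt default)
cutoff <- 0.5

# Keep contenders at or above the cutoff, highest score first
strong <- contenders[contenders$bestmatch_score >= cutoff, ]
strong <- strong[order(-strong$bestmatch_score), ]
strong
```

On the real object, the same subsetting would be applied directly to output$processed$event_contenders.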