---
author: "Joseph Larmarange"
title: "Introduction to prevR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to prevR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r include=FALSE}
Sys.setenv(LANG = "en")
```

This package performs a methodological approach for spatial estimation of regional trends of a prevalence using data from surveys using a stratified two-stage sample design (as Demographic and Health Surveys). In these kind of surveys, positive and control cases are spatially positioned at the centre of their corresponding surveyed cluster.

This package provides functions to estimate a prevalence surface using a kernel estimator with adaptative bandwidths of equal number of persons surveyed (a variant of the nearest neighbor technique) or with fixed bandwidths. The prevalence surface could also be calculated using a spatial interpolation (kriging or inverse distance weighting) after a moving average smoothing based on circles of equal number of observed persons or circles of equal radius.

With the kernel estimator approach, it's also possible to estimate a surface of relative risks.

The methodological approach has been described in: 

- Larmarange Joseph, Vallo Roselyne, Yaro Seydou, Msellati Philippe and Meda Nicolas (2011) "Methods for mapping regional trends of HIV prevalence from Demographic and Health Surveys (DHS)", *Cybergeo: European Journal of Geography*, no 558, <https://journals.openedition.org/cybergeo/24606>, DOI: 10.4000/cybergeo.24606

Application to generate HIV prevalence surfaces can be found at:

- Larmarange Joseph and Bendaud Victoria (2014) "HIV estimates at second subnational level from national population-based survey", AIDS, n° 28, p. S469-S476, DOI: 10.1097/QAD.0000000000000480

Other papers using **prevR** could be found on [Google Scholar](https://scholar.google.fr/scholar?oi=bibs&hl=fr&cites=1938663060375810997,6018660907435103505).

## Importing data

To create a **prevR** object, you need three elements:

- a data.frame with one row per survey cluster and containing the number of observations, the number of positive cases and coordinates of the cluster (you could optionally use weighted numbers)
- a vector identifying the columns of the data.frame containing the corresponding variables
- an optional `SpatialPolygons` defining the studied area

```{r}
library(prevR, quietly = TRUE)
col <- c(
  id = "cluster",
  x = "x",
  y = "y",
  n = "n",
  pos = "pos",
  c.type = "residence",
  wn = "weighted.n",
  wpos = "weighted.pos"
)
dhs <- as.prevR(fdhs.clusters, col, fdhs.boundary)
str(dhs)
print(dhs)
```

An interactive helper function `import.dhs()` could be used to compute statistics per cluster and to generate the **prevR** object for those who downloaded individual files (SPSS format) and location of clusters (dbf format) from DHS website (<https://dhsprogram.com/>).

```{r, eval = FALSE}
imported_data <- import.dhs("data.sav", "gps.dbf")
```

Boudaries of a specific country could be obtained with `create.boundary()`.

## Plotting a prevR object

```{r, fig.width=6, fig.height=6}
plot(dhs, main = "Clusters position")
plot(dhs, type = "c.type", main = "Clusters by residence")
plot(dhs, type = "count", main = "Observations by cluster")
plot(dhs, type = "flower", main = "Positive cases by cluster")
```

## Changing coordinates projection

```{r, fig.width=6, fig.height=6}
plot(dhs, axes = TRUE)
dhs <- changeproj(
  dhs,
  "+proj=utm +zone=30 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
)
print(dhs)
plot(dhs, axes = TRUE)
```

## Quick analysis

Function `quick.prevR()` allows to perform a quick analysis:

- an optimal value of **N** will be computed with `Noptim()`
- adaptative bandwidths will be calculated with `rings()`
- a prevalence surface will be computed with `kde()`
- the surface of rings radii will be generated with `krige()`
- a **ggplot2** of the prevalence surface will be generated and rings radii will be added as a contour plot

```{r, include=FALSE}
qa <- quick.prevR(
  fdhs,
  return.results = TRUE,
  return.plot = TRUE,
  plot.results = FALSE,
  progression = FALSE
)
```

```{r, eval=FALSE}
quick.prevR(fdhs)
```


```{r, echo=FALSE, fig.width=6, fig.height=6}
qa$plot
```

Several values of **N** could be specified, and several options allows you to return detailed results.

```{r, fig.width=8, fig.height=4}
res <- quick.prevR(
  fdhs,
  N = c(100, 200, 300),
  return.results = TRUE,
  return.plot = TRUE,
  plot.results = FALSE,
  progression = FALSE,
  nb.cells = 50
)
res$plot
```

## Step by step analysis

```{r, fig.width=6, fig.height=6}
# Calculating rings of the same number of observations for different values of N
dhs <- rings(fdhs, N = c(100, 200, 300, 400, 500), progression = FALSE)
print(dhs)
summary(dhs)

# Prevalence surface for N=300
prev.N300 <- kde(dhs, N = 300, nb.cells = 200, progression = FALSE)
plot(
  prev.N300["k.wprev.N300.RInf"],
  pal = prevR.colors.red,
  lty = 0,
  main = "Regional trends of prevalence (N=300)"
)

# with ggplot2
library(ggplot2)
ggplot(prev.N300) +
  aes(fill = k.wprev.N300.RInf) +
  geom_sf(colour = "transparent") +
  scale_fill_gradientn(colours = prevR.colors.red()) +
  labs(fill = "Prevalence (%)") +
  theme_prevR_light()

# Surface of rings' radius
radius.N300 <- krige("r.radius", dhs, N = 300, nb.cells = 200)
plot(
  radius.N300,
  pal = prevR.colors.blue,
  lty = 0,
  main = "Radius of circle (N=300)"
)
```

## Functions and methods provided by prevR

The content of **prevR** can be broken up as follows:

### Datasets

- `fdhs` is a fictive dataset used for testing the package.
- `TMWorldBorders` provides national borders of every countries in the World and could be used to define the limits of the studied area.

### Creating objects

prevR functions takes as input objects of class prevR.

- `import.dhs()` allows to import easily, through a step by step procedure, data from a DHS (Demographic and Health Surveys) downloaded from http://www.measuredhs.com.
- `as.prevR()` is a generic function to create an object of class prevR.
- `create.boundary()` could be used to select borders of a country and transfer them to as.prevR in order to define the studied area.

### Data visualization

- Methods `show()`, `print()` and `summary()` display a summary of a object of class prevR.
- The method `plot()` could be used on a object of class prevR for visualizing the studied area, spatial position of clusters, number of observations or number of positive cases by cluster.

### Data manipulation

- The method `changeproj()` changes the projection of the spatial coordinates.
- The method `as.data.frame()` converts an object of class prevR into a data frame.
- The method `export()` export data and/or the studied area in a text file, a dbf file or a shapefile.

### Data analysis

- `rings()` calculates rings of equal number of observations and/or equal radius.
- `kde()` calculates a prevalence surface or a relative risks surface using gaussian kernel density estimators (kde) with adaptative bandwidths.
- `krige()` executes a spatial interpolation using an ordinary kriging.
- `idw()` executes a spatial interpolation using an inverse distance weighting (idw) technique.

### Results visualization and export

- Outputs of `kde()`, `krige()` and `idw()` are objects of class `SpatialPixelsDataFrame` (**sp** package).
- Results could be plotted using the function `spplot()` from **sp**.
- prevR provides several continuous color palettes (see `prevR.colors`) compatible with `spplot()`.
- Calculated surfaces could be export using the function `writeRaster()`  from **terra** (see examples in the documentation of `kde()` and `krige()`.