
Wildlife conservation is essential, especially for countries like Kenya, which rely on tourism as a major earner of foreign exchange. Conserving species with minimal ecological information, such as Grevy’s zebra, is challenging but critical to the future survival of such species. Grevy’s zebra and Plains zebra have been classified as endangered and near-threatened, respectively, by the International Union for Conservation of Nature and Natural Resources (IUCN), with Grevy’s zebra found mostly in northern Kenya and Ethiopia. This status has been driven by habitat degradation from livestock grazing, local hunting and the development of resorts. Six predictor variables, *i.e.* rainfall, temperature, land use, population, NDVI and cattle occurrence, were used in the Maxent algorithm to produce a habitat prediction map for both species. Both prediction maps had an AUC > 0.75, which is adequate for conservation planning. Niche similarity based on Warren’s *I* index (I = 0.78) indicates that both zebra species are identical based on their occupied niche environments, suggesting that similar conservation strategies can be adopted for both species.

Global tourism has experienced continued growth and deepening diversification, becoming one of the fastest-growing economic sectors in the world and contributing 10.2% of global GDP in 2016. In Kenya, tourism contributed 9.8% of GDP in 2016 and accounted for 9.2% of total employment [

Grevy’s zebra is a zebra species found in Kenya and Ethiopia. It has been assessed as endangered under criterion A2acd [

Attempts to model the niches of both Plains and Grevy’s zebra have been few. Studies have mainly focused on enumerating zebra population counts [

To advance knowledge of the conservation of Grevy’s and Plains zebras and to provide insights for developing conservation efforts, this study has two objectives: 1) to model and evaluate the ecological niche of both Grevy’s and Plains zebras; and 2) to assess the level of niche similarity using niche similarity indices.

Laikipia County is one of the 47 counties in the Republic of Kenya (^{2}. The county consists mainly of a plateau bordered by the Great Rift Valley to the west, the Aberdare Ranges to the south and the Mt. Kenya massif to the southeast. Average annual rainfall varies between 400 mm and 750 mm, although higher totals are observed in areas bordering the slopes of Mt. Kenya and the Aberdare Ranges. The county has a gazetted forest area of 580 km^{2}, comprising both indigenous and plantation forests. The county population was estimated at 479,072 as of 2017 [

Data used in this study were grouped into three categories: environmental data (rainfall and temperature), satellite imagery (Landsat images used to derive NDVI and land use/land cover maps) and field survey data (zebra and cattle occurrence data). A summary of the data products used is shown in

Rainfall and temperature data were obtained from the WorldClim database, following the work of Kigen et al. (2003) [

Since the datasets had different spatial resolutions, all variables were standardized to a resolution of 1 km using different resampling strategies. For NDVI, an average window was used; for LULC, a majority window was used to depict the dominant land cover; and for population, a summation window was used to aggregate the total population within each 1 km grid cell.

Indicator | Data product | Source | Resolution / no. of points |
---|---|---|---|
Rainfall | WorldClim | WorldClim | 1 km |
Temperature | WorldClim | WorldClim | 1 km |
Vegetation health | NDVI | Landsat | 30 m |
Land use/land cover (LULC) | Classification | Landsat | 30 m |
Cattle occurrence | Point data | Kenya Wildlife Service (KWS) | 546 points |
Grevy’s zebra occurrence | Point data | KWS | 66 points |
Plains zebra occurrence | Point data | KWS | 594 points |
Population | Population raster | AfriPOP | 100 m |
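The three window strategies (average, majority, summation) can be illustrated with a minimal block-aggregation sketch. It assumes the coarse cell is an exact integer multiple of the fine cell (true for 100 m to 1 km, a factor of 10, but not for 30 m to 1 km, where a GIS resampling tool would be needed in practice). The function name `aggregate` is hypothetical and is not the tool used in the study.

```python
import numpy as np
from collections import Counter


def aggregate(grid, factor, how):
    """Aggregate a 2-D grid over non-overlapping factor x factor windows.

    how: "mean" (e.g. NDVI), "sum" (e.g. population counts) or
    "majority" (e.g. LULC class codes). Assumes both grid dimensions
    are exact multiples of factor.
    """
    grid = np.asarray(grid)
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    # Regroup so that each coarse cell holds its factor*factor fine cells.
    blocks = (grid.reshape(rows // factor, factor, cols // factor, factor)
                  .swapaxes(1, 2)
                  .reshape(rows // factor, cols // factor, factor * factor))
    if how == "mean":
        return blocks.mean(axis=-1)
    if how == "sum":
        return blocks.sum(axis=-1)
    if how == "majority":
        out = np.empty(blocks.shape[:2], dtype=grid.dtype)
        for i in range(blocks.shape[0]):
            for j in range(blocks.shape[1]):
                # Most frequent class in the window (first seen wins ties).
                out[i, j] = Counter(blocks[i, j].tolist()).most_common(1)[0][0]
        return out
    raise ValueError(f"unknown method: {how}")
```

For example, `aggregate(lulc, 2, "majority")` keeps the dominant class in each 2 × 2 window, while `"sum"` preserves total population across the coarser grid.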

A summarized workflow is shown in

Testing or validation is required to assess the predictive performance of the model. Ideally, an independent dataset should be used for testing. In many cases, however, such data are not available, a situation particularly common for threatened and endangered species. The most common approach is therefore to partition the data randomly into ‘training’ and ‘test’ sets, creating quasi-independent data for model testing [
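A random training/test partition of occurrence records can be sketched as below. The 25% test fraction and the fixed seed are illustrative assumptions, not values reported in this section, and `split_occurrences` is a hypothetical helper.

```python
import random


def split_occurrences(points, test_fraction=0.25, seed=42):
    """Randomly partition occurrence records into (train, test) lists.

    test_fraction and seed are illustrative defaults; the fixed seed
    makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # floor of the test share
    return shuffled[n_test:], shuffled[:n_test]
```

With the 66 Grevy’s zebra points, a 25% split would leave 50 points for training and 16 for testing, which helps explain why small samples give noisier test statistics.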

Variable selection was first carried out using multicollinearity analysis on the bioclimatic variables, followed by jackknife tests on the remaining variables, in order to reduce model over-parameterization. Habitat distribution modelling was then undertaken using the selected variables, followed by an accuracy assessment based on the Receiver Operating Characteristic (ROC). If the result was acceptable, niche similarity indices were computed. The analyses are explained in detail in the subsequent sections.

Cattle occurrence data were first interpolated using Inverse Distance Weighting (IDW) to produce a cattle occurrence raster map. IDW uses the measured values surrounding a prediction location to estimate the value at that unmeasured location. It assumes that each measured point has a local influence that diminishes with distance, so that the value at an unsampled location is the weighted average of the known values within the neighborhood [

$$\hat{R}_p = \sum_{i=1}^{N} w_i R_i, \qquad w_i = \frac{d_i^{-\alpha}}{\sum_{i=1}^{N} d_i^{-\alpha}} \qquad (1)$$

where $\hat{R}_p$ is the unknown cattle value at the prediction site; $R_i$ is the cattle value at known location $i$; $N$ is the number of known cattle locations; $w_i$ is the weight of each known location; $d_i$ is the distance from each known location to the prediction site; and $\alpha$ is the power, a control parameter generally set to 2 [
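Equation (1) translates directly into code. This sketch uses every known point as the neighborhood (a global IDW); the neighborhood size and software actually used in the study are not stated here, and the function name `idw` is hypothetical.

```python
import numpy as np


def idw(known_xy, known_values, target_xy, alpha=2.0):
    """IDW estimate at one target location, following Equation (1).

    known_xy: (N, 2) coordinates, known_values: (N,) values R_i,
    alpha: power parameter (2 by default, as in the text).
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    d = np.linalg.norm(known_xy - np.asarray(target_xy, dtype=float), axis=1)
    # A target that coincides with a known point takes that point's value
    # (its weight d**-alpha would otherwise be infinite).
    if np.any(d == 0):
        return float(known_values[np.argmin(d)])
    w = d ** (-alpha)
    w /= w.sum()  # normalise so the weights w_i sum to 1
    return float(np.sum(w * known_values))
```

For two points valued 10 and 20 at (0, 0) and (2, 0), the estimate at (0.5, 0) is 11: the nearer point receives weight 0.9 because $0.5^{-2} = 4$ dominates $1.5^{-2} \approx 0.44$.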

Multicollinearity refers to pairwise correlation among predictor variables, assessed here using Pearson correlation coefficients. It distorts the estimates of regression coefficients and can bias the modelled relationships between outputs and predictor variables [

Pearson correlation coefficients measure the strength and direction of the linear relationship between two variables. The coefficient assumes that each variable is well approximated by a normal distribution and that their joint distribution is bivariate normal [

$$r_{prs} = \frac{\sum_i \left(Y_{m-i} - \bar{Y}_m\right)\left(Y_{est-i} - \bar{Y}_{est}\right)}{\sqrt{\sum_i \left(Y_{m-i} - \bar{Y}_m\right)^2 \sum_i \left(Y_{est-i} - \bar{Y}_{est}\right)^2}} \qquad (2)$$

where $Y_{m-i}$ and $Y_{est-i}$ are the $i$-th paired values of the two variables being compared, and $\bar{Y}_m$ and $\bar{Y}_{est}$ are their respective means.

The Pearson correlation coefficient takes values from −1 to +1. A value of +1 indicates that the variables are perfectly linearly related by an increasing relationship; a value of −1 indicates that they are perfectly linearly related by a decreasing relationship; and a value of 0 indicates that they are not linearly related to each other. Correlation between variables is considered strong if the correlation coefficient is greater than 0.8 and weak if it is less than 0.5 [
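Equation (2) and the strong/weak thresholds above can be checked with a short sketch. Comparing the absolute value of r against the thresholds, and the "moderate" label for the band between 0.5 and 0.8, are assumptions added for illustration; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equation 2).

    Undefined (division by zero) if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| using the thresholds in the text (> 0.8 strong, < 0.5 weak)."""
    a = abs(r)
    if a > 0.8:
        return "strong"
    if a < 0.5:
        return "weak"
    return "moderate"  # assumed label for the band the text leaves unnamed
```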

When there are many variables, and therefore many pairwise relationships, a correlation coefficient matrix or a correlation ellipse plot can be developed. A correlation coefficient matrix arranges the pairwise correlation coefficients of all variables in a matrix. A correlation ellipse plot encodes each correlation coefficient as an ellipse, color-coded according to the strength of the correlation; the direction of the correlation (i.e. direct or inverse relationship) is shown by the orientation of the semi-major axis [
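A correlation coefficient matrix, and the list of pairs exceeding a threshold, can be computed with NumPy. The sketch assumes the predictor values are arranged as one row per sample (for example, per grid cell) and one column per variable; `correlated_pairs` is a hypothetical name, and the use of |r| so that strong negative correlations are also flagged is an assumption.

```python
import numpy as np


def correlated_pairs(data, names, threshold=0.8):
    """Return the Pearson correlation matrix and the pairs with |r| > threshold.

    data: 2-D array, rows = samples, columns = variables (same order as names).
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return r, pairs
```

The matrix `r` is exactly what a correlation ellipse plot visualizes; the pair list is the numerical counterpart of the highlighted cells.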

The jackknife is an approach that excludes one variable at a time when running the model. In doing so, it shows how important each variable is for explaining the species distribution and how much unique information each variable provides. It can also reveal highly correlated variables, allowing the user to judge whether percent contribution values are likely to be skewed by these correlations [

In turn, each variable is omitted and a model is created with the remaining variables. A model is also created using each variable alone, along with a model containing all the variables in the Species Distribution Model (SDM). The Area Under the Curve (AUC) of each model is recorded, and all the values are plotted together in the jackknife chart. The jackknife test thus reports the AUC of the model with all the variables, without each variable, and with each variable in isolation. Comparing these three values indicates the importance of each variable in predicting the species distribution [
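The three jackknife comparisons can be expressed as a small driver around any model-fitting routine. Here `score(subset)` is a hypothetical callable assumed to fit a model on the given variables and return its AUC (in practice, a wrapper around a Maxent run); the driver itself does no modelling.

```python
from typing import Callable, Dict, Sequence


def jackknife(variables: Sequence[str],
              score: Callable[[Sequence[str]], float]) -> Dict[str, Dict[str, float]]:
    """For each variable, record AUC with all variables, without it, and with it alone."""
    full = score(list(variables))  # model with every variable
    results = {}
    for v in variables:
        without = [u for u in variables if u != v]
        results[v] = {
            "all": full,
            "without": score(without),  # drop in AUC => unique information in v
            "only": score([v]),         # AUC of v in isolation
        }
    return results
```

A variable whose "only" score is high but whose removal barely lowers the "without" score is informative yet largely redundant with the others, which is the pattern that signals correlated predictors.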

Species Distribution Models (SDMs) have been widely employed to generate habitat distribution maps. Maxent is one such model and has been widely applied in habitat distribution modelling. It was chosen for this study because it is freely available and has been shown to be one of the best-performing methods, even with a relatively small number of samples [

Maxent is based on the principle of maximum entropy. Entropy is a measure of the uncertainty associated with a random variable: the greater the entropy, the greater the uncertainty. Building on these concepts, Maxent uses presence-only occurrence points, so it requires no absence data and avoids assumptions about the range of a given species [
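To make "greater entropy, greater uncertainty" concrete, the sketch below computes the Shannon entropy of a discrete distribution over grid cells. With no constraints, the uniform distribution has the maximum entropy; Maxent chooses the maximum-entropy distribution that still satisfies constraints derived from the presence data. The function name is hypothetical.

```python
import math


def shannon_entropy(p):
    """Entropy H = -sum(p_i * ln p_i) of a discrete distribution (zero terms skipped)."""
    if not math.isclose(sum(p), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

Spreading probability evenly over four cells gives $\ln 4 \approx 1.386$, the most uncertain case, whereas placing all probability in one cell gives 0.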

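$$\text{gain} = \frac{1}{m}\sum_{i=1}^{m} \lambda \cdot z(x_i) - \log \sum_{i=1}^{M} Q(x_i)\, e^{\lambda \cdot z(x_i)} - \sum_{j=1}^{J} |\lambda_j|\, \beta \sqrt{s^2[z_j]/m} \qquad (3)$$

where $m$ is the number of presence locations, $M$ the number of background locations, $z(x_i)$ the vector of features at location $x_i$, $\lambda$ the vector of feature weights and $Q$ the background (reference) distribution. The first term is the mean of the predicted values at presence locations, the second term is the log of the sum of predicted values at background locations, and the third term is an overfitting penalty, in which $\beta$ is a regularization coefficient and $s^2[z_j]$ is the variance of feature $j$ at presence locations.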
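Equation (3) can be evaluated for a given weight vector as in the sketch below. It assumes the features are precomputed at the presence and background points and that `q_background` sums to 1; the function and argument names are hypothetical. It only evaluates the gain and does not search for the weights, which Maxent does by maximizing this quantity.

```python
import numpy as np


def maxent_gain(lam, z_presence, z_background, q_background, beta=1.0):
    """Regularized gain as written in Equation (3).

    lam: (J,) feature weights; z_presence: (m, J) features at presences;
    z_background: (M, J) features at background points;
    q_background: (M,) reference distribution Q, summing to 1.
    """
    lam = np.asarray(lam, dtype=float)
    z_presence = np.asarray(z_presence, dtype=float)
    z_background = np.asarray(z_background, dtype=float)
    q_background = np.asarray(q_background, dtype=float)
    m = z_presence.shape[0]
    mean_presence = (z_presence @ lam).mean()                      # first term
    log_norm = np.log(np.sum(q_background * np.exp(z_background @ lam)))  # second
    s2 = z_presence.var(axis=0)                                    # s^2[z_j]
    penalty = np.sum(np.abs(lam) * beta * np.sqrt(s2 / m))         # third term
    return mean_presence - log_norm - penalty
```

With all weights at zero the gain is exactly 0, the gain of the uniform reference distribution; positive gain means the fitted distribution concentrates more probability on presence locations than $Q$ does.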

Maxent produces output in three formats: raw, cumulative and logistic. The logistic format is recommended because it provides estimates of the probability of occurrence as predicted by the included environmental variables [
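As an illustration of how the logistic format relates to the raw one, the sketch below applies the transform described in the Maxent literature (Phillips and Dudík, 2008), $c \cdot r / (1 + c \cdot r)$ with $c = e^{H}$, where $r$ is the raw value and $H$ the entropy of the raw distribution. It assumes the raw values sum to 1 over the background cells and is not a substitute for the software's own output.

```python
import numpy as np


def logistic_output(raw):
    """Convert raw Maxent values (summing to 1 over cells) to logistic values."""
    raw = np.asarray(raw, dtype=float)
    nz = raw[raw > 0]
    h = -np.sum(nz * np.log(nz))  # entropy H of the raw distribution
    c = np.exp(h)
    return c * raw / (1.0 + c * raw)
```

Under this transform, a cell with a "typical" raw value ($r = e^{-H}$) maps to 0.5; for a uniform raw distribution every cell is typical, so every logistic value is 0.5.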

The area under the Receiver Operating Characteristic (ROC) curve (AUC) is a widely used statistic for assessing the accuracy of predictive models. The ROC curve is obtained by plotting the fraction of correctly classified presences (sensitivity, or true positive rate) on the y-axis against the fraction of wrongly classified cases (1 − specificity, or false positive rate) on the x-axis, across all possible thresholds [

It is important to note that Maxent computes a variation of AUC, based on the problem of classifying presence vs. background points (which may not be true absences) [
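The presence-versus-background AUC equals the probability that a randomly chosen presence point receives a higher predicted value than a randomly chosen background point. The sketch below uses this rank (Mann-Whitney) formulation, counting ties as one half; the function name is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_presence_background(presence_scores, background_scores):
    """AUC = P(random presence score > random background score); ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))
```

For presence predictions [0.9, 0.8] and background predictions [0.1, 0.85], three of the four pairs are ranked correctly, giving an AUC of 0.75, exactly the acceptance threshold used later in this study.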

Species conservation, for both flora and fauna, has ecological and evolutionary implications, which can be assessed using SDMs. However, intra- and inter-species interactions in ecological and geographical space are not as straightforward to assess, and methods for measuring niche similarity are less developed than SDMs. The niche similarity test asks whether SDMs of sister species predict each other’s known occurrences better than expected under the null hypothesis (that they provide no information about each other’s range) [

Schoener’s (1968) D statistic for niche overlap ranges from 0 (niche models have no overlap) to 1 (niche models identical) [

$$D(p_X, p_Y) = 1 - \frac{1}{2}\sum_i \left|p_{X,i} - p_{Y,i}\right| \qquad (4)$$

This metric is simple, has a long history of use, and permits direct comparison to traditional measures of niche similarity that focus on microhabitat and/or diet [
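Equation (4) can be computed from two suitability maps once each is normalised to sum to 1, which is the convention assumed here for $p_{X,i}$ and $p_{Y,i}$ (the suitability of grid cell $i$ for species X and Y). The function name `schoener_d` is hypothetical.

```python
import numpy as np


def schoener_d(suit_x, suit_y):
    """Schoener's D (Equation 4) between two suitability maps over the same cells."""
    px = np.asarray(suit_x, dtype=float).ravel()
    py = np.asarray(suit_y, dtype=float).ravel()
    px, py = px / px.sum(), py / py.sum()  # each map normalised to sum to 1
    return 1.0 - 0.5 * np.abs(px - py).sum()
```

Identical maps give D = 1, maps with no shared suitable cells give D = 0, and a map split evenly over two cells compared with one concentrated in a single cell gives D = 0.5.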

Another statistic used is based on Hellinger distances [

$$H(p_X, p_Y) = \sum_i \left(\sqrt{p_{X,i}} - \sqrt{p_{Y,i}}\right)^2 \qquad (5)$$

Hellinger distances lie between 0 and 2. The similarity metrics have previously been used in ecological studies, primarily in comparing community composition across sites [

$$I(p_X, p_Y) = 1 - \frac{1}{2} H(p_X, p_Y) \qquad (6)$$

which ranges from 0 (no overlap) to 1 (niche models identical).
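Equations (5) and (6) can be sketched the same way, again assuming each suitability map is normalised to sum to 1. With $H$ defined as in Equation (5), it lies between 0 and 2, so $I$ lies between 0 and 1. The function name `warren_i` is hypothetical.

```python
import numpy as np


def warren_i(suit_x, suit_y):
    """Return (I, H) for two suitability maps, following Equations (5) and (6)."""
    px = np.asarray(suit_x, dtype=float).ravel()
    py = np.asarray(suit_y, dtype=float).ravel()
    px, py = px / px.sum(), py / py.sum()        # normalise each map to sum to 1
    h = np.sum((np.sqrt(px) - np.sqrt(py)) ** 2)  # Equation (5), range [0, 2]
    return 1.0 - 0.5 * h, h                       # Equation (6), range [0, 1]
```

For the same even-versus-concentrated example used for D, $I = \sqrt{0.5} \approx 0.707$, higher than D = 0.5, which illustrates that the two indices weight differences between maps differently and need not agree numerically.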

To reduce the potential for model over-parameterization, multicollinearity analysis was conducted on the 19 bioclimatic variables. For each pair of variables with an inter-correlation higher than 0.8 (r > 0.8), one of the two variables was eliminated. To identify bioclimatic variables with inter-correlations above this threshold, a correlation ellipse plot was developed in R for all the bioclimatic variables under consideration, with red dots marking pairs with a high level of correlation (r > 0.8). Ten of the 19 bioclimatic variables were found to be the least correlated, i.e. below the set threshold (See
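Which variable of each highly correlated pair was dropped is not stated here, so the sketch below assumes a simple greedy rule: variables are visited in a given priority order, and each one is kept only if its |r| with every already-kept variable is at most the threshold. The function name and the ordering rule are assumptions for illustration.

```python
def drop_correlated(names, r, threshold=0.8):
    """Greedy multicollinearity filter.

    names: variable names in priority order; r: square correlation matrix
    (nested lists or array) in the same order. Returns the retained names.
    """
    kept = []
    for i in range(len(names)):
        # Keep variable i only if it is not strongly correlated with any kept one.
        if all(abs(r[i][j]) <= threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]
```

Because the result depends on the visiting order, ordering variables by ecological relevance (or by jackknife importance) before filtering makes the choice of which variable survives explicit.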

Jackknife tests were conducted after the correlation analysis, using the 10 least-correlated bioclimatic variables together with the following datasets: land use/land cover, population, NDVI and the cattle occurrence raster. Variables contributing least to the model fit (<2%) were removed, and the remaining variables were used for habitat distribution modelling of both Grevy’s and Plains zebra.
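The 2% cut-off amounts to a simple filter on Maxent’s percent-contribution table. In this sketch, `drop_low_contribution` is a hypothetical helper and the variable names and percentages in the usage line are illustrative placeholders, not results from this study.

```python
def drop_low_contribution(contributions, min_percent=2.0):
    """Keep variables whose percent contribution is at least min_percent.

    contributions: mapping of variable name -> percent contribution.
    """
    return [name for name, pct in contributions.items() if pct >= min_percent]


# Illustrative values only.
example = {"var_a": 41.0, "var_b": 1.5, "var_c": 12.3, "var_d": 0.4}
retained = drop_low_contribution(example)
```

Here `retained` would be `["var_a", "var_c"]`. Since percent contribution can be skewed by correlated predictors, the jackknife results are a useful cross-check before dropping a variable.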

distribution modelling, while NDVI, Bio18, Bio14, Bio3, LULC, and Bio13 were eliminated for Grevy’s distribution modelling.

The map shows that the core range of both species lies mainly in the middle section of the county, stretching from north to south. It indicates that both species occur in open woodlands away from human settlements, though they face competition from cattle grazing. The concentration in the north is common to both species because these areas contain many wildlife conservancies and game reserves, where human activity is regulated.

The habitat prediction maps were assessed using the area under the curve (AUC) of a Receiver Operating Characteristic (ROC) plot. An AUC > 0.75 was set as the acceptance threshold, following Lobo et al. [

For Plains zebra, the AUC for training and test data was 0.82 and 0.79 respectively, while for Grevy’s zebra it was 0.90 and 0.77 respectively. The jagged curves for Grevy’s zebra result from the small number of occurrence points available (66, compared with 594 for Plains zebra).

Schoener’s D statistic and Warren’s I statistic were both computed. Both showed high levels of niche overlap, with $I, D \geq 0.7$ ($I = 0.782$, $D = 0.706$), indicating that the sister zebra species are identical based on their occupied niche environments. This suggests that similar conservation strategies can be adopted for both species.

This study conducted habitat distribution modelling and niche overlap analysis, which are key to identifying whether conservation efforts used for one species can largely be adopted for another, especially an endangered species. Using bioclimatic variables, cattle data, land use and population metrics, a habitat distribution map of both Plains and Grevy’s zebra was developed, showing that both species are concentrated around the centre of the county, along a north-south axis to varying extents. This was validated using AUC plots, which indicated a training AUC of 0.82 and 0.90 for Plains and Grevy’s zebra respectively. Niche similarity tests then showed high niche similarity between the two species on both metrics ($I = 0.782$, $D = 0.706$). This indicates that Grevy’s zebra largely share the same ecological environment as Plains zebra, which is relevant to efforts to boost the numbers of this endangered species.

The authors wish to thank the Kenya Wildlife Service for their assistance and cooperation while executing this study.

Mwangi, T.S., Waithaka, H. and Boitt, M. (2018) Ecological Niche Modeling of Zebra Species within Laikipia County, Kenya. Journal of Geoscience and Environment Protection, 6, 264-276. https://doi.org/10.4236/gep.2018.64016