Openwetware.org

GSMNP:Notebook/MaxEnt

2014-07-28

Motivation and Background:


Revision as of 01:46, 28 July 2014


===Motivation and Background===

*Modeling species distributions in parks is valuable to the scientific community in many ways, but for the NPS's stewardship of park resources it is critical. Species occurrences recorded only as points are of limited use to park managers, since they cannot infer what lies between the points.

*Knowing, with some probability, where species occur in large natural areas is essential for taking actions to protect them, including monitoring, stewardship of rare species, responding to a species suddenly found to be at risk, and modeling future scenarios that place species in jeopardy.

*Great Smoky Mountains National Park currently faces many threats to its natural systems and native species. Given the biological complexity, interacting stressors, and limited agency resources at the Smokies, knowing where actions will be most effective is imperative.

*Maxent is a method for generating predicted distributions from a set of occurrence records and the environmental variables known at those locations and across the study area, which serves as the background.

**The predicted distribution is constrained so that the expected value of each environmental variable under it is close to that variable's empirical average at the occurrence locations.

**Among all distributions that satisfy these constraints, maxent chooses the one with maximum entropy, i.e. the most spread-out (closest to uniform) distribution, which assumes nothing beyond what the constraints require. This avoids over-fitting by choosing the least constrained model consistent with the environmental conditions at presence locations.
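The constraint-and-entropy description above can be made concrete with a small sketch. By convex duality, the maximum-entropy distribution whose feature expectations match the averages at presence cells is a Gibbs (exponential-family) distribution p(x) ∝ exp(λ·f(x)) over the background cells, and its weights λ are the ones that maximize the likelihood of the presence cells. The sketch fits λ by plain gradient ascent. Everything here is an illustrative assumption: the toy landscape, the feature values, the hypothetical helpers `gibbs` and `fit_maxent`, and the choice of gradient ascent (the Maxent software uses its own optimizer and feature types). The last lines compute `n_cells * p(x)` as a relative suitability, where values above 1 mean the model rates a cell above a randomly chosen background cell.

```python
import numpy as np


def gibbs(features, lam):
    """Hypothetical helper: p(x) proportional to exp(lam . f(x)) over all cells."""
    scores = features @ lam
    w = np.exp(scores - scores.max())  # subtract the max to avoid overflow
    return w / w.sum()


def fit_maxent(features, presence_idx, lr=0.5, n_iter=2000):
    """Hypothetical helper: unregularized maxent fitted by gradient ascent.

    features: (n_cells, n_features) environmental values for every cell
        in the study area; these cells form the background.
    presence_idx: indices of the cells with recorded occurrences.
    """
    # Constraint targets: empirical average of each feature at occurrence cells.
    target = features[presence_idx].mean(axis=0)
    lam = np.zeros(features.shape[1])
    for _ in range(n_iter):
        expected = gibbs(features, lam) @ features  # model expectation E_p[f]
        # Gradient of the mean log-likelihood of the presences is target - E_p[f];
        # at the optimum the two agree, which is exactly the maxent constraint.
        lam += lr * (target - expected)
    return lam, gibbs(features, lam)


# Hypothetical toy landscape: 1000 cells, 2 standard-normal environmental variables.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 2))
# Hypothetical occurrences: the first 50 cells where variable 0 is above zero.
presence_idx = np.flatnonzero(features[:, 0] > 0)[:50]

lam, p = fit_maxent(features, presence_idx)
print("lambda:", lam)
print("E_p[f]:", p @ features)
print("target:", features[presence_idx].mean(axis=0))

# Relative suitability: n_cells * p(x) > 1 means the model rates the cell
# above a randomly chosen background cell.
relative = len(p) * p
print("cells rated above a random cell:", int((relative > 1).sum()))
```

Because the fitted weight on variable 0 is positive, cells with high values of that variable receive relative suitability above 1, while the output stays continuous rather than a yes/no map.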

*Maxent has been used extensively in physics and economics applications. It is just one of many options for generating predicted species distributions from environmental variables at species presence sites ([http://www.nhm.ku.edu/desktopgarp/ GARP], [http://data.princeton.edu/R/glms.html GLM], [http://cran.r-project.org/web/packages/gam/index.html GAM]), but it has several advantages. According to [http://www.cs.princeton.edu/~schapire/papers/ecolmod.pdf Phillips et al. (2006)], maxent:

#requires only presence data, not presence/absence data,

#can use both continuous and categorical variables,

#is fitted by an efficient optimization procedure,

#has a concise probabilistic definition,

#avoids over-fitting through regularization,

#can address sampling bias formally,

#produces continuous output (not just yes/no), and

#is generative rather than discriminative, which can be an advantage when sample sizes are small.
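Regularization, listed among the advantages above, can be sketched by adding an ℓ1 penalty β_j·|λ_j| to the likelihood fit, which is the kind of regularization Phillips et al. (2006) describe. With the penalty, each constraint is relaxed to |E_p[f_j] - (average of f_j at presences)| ≤ β_j, and weights on uninformative variables tend to be shrunk to exactly zero. This is a minimal sketch under stated assumptions: the data, the β values, and the helper names are hypothetical, and the proximal-gradient (ISTA) loop is chosen for brevity; it is not the optimizer used by the Maxent software.

```python
import numpy as np


def gibbs(features, lam):
    """Hypothetical helper: p(x) proportional to exp(lam . f(x)) over all cells."""
    scores = features @ lam
    w = np.exp(scores - scores.max())  # subtract the max to avoid overflow
    return w / w.sum()


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink toward zero, clipping small values to 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_maxent_l1(features, presence_idx, beta, lr=0.5, n_iter=3000):
    """Hypothetical helper: L1-regularized maxent fitted by proximal gradient.

    Minimizes -mean log p(presences) + sum_j beta_j * |lam_j|. At the optimum
    each constraint is relaxed to |E_p[f_j] - target_j| <= beta_j.
    """
    target = features[presence_idx].mean(axis=0)
    lam = np.zeros(features.shape[1])
    for _ in range(n_iter):
        grad = target - gibbs(features, lam) @ features  # ascent direction
        # Gradient step on the log-likelihood, then shrink each weight by lr * beta.
        lam = soft_threshold(lam + lr * grad, lr * beta)
    return lam, gibbs(features, lam)


# Hypothetical data: variable 0 drives the occurrences, variables 1-2 are noise.
rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 3))
presence_idx = np.flatnonzero(features[:, 0] > 0)[:30]
beta = np.full(3, 0.2)  # hypothetical regularization strength per feature

lam, p = fit_maxent_l1(features, presence_idx, beta)
print("lambda:", lam)  # noise-feature weights are often exactly 0
print("constraint gaps:", np.abs(p @ features - features[presence_idx].mean(axis=0)))
```

Larger β values relax the constraints further and give smoother, less over-fitted predictions; β = 0 recovers the unregularized fit.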

*There is some criticism of using Maxent for species distribution modelling. Specifically, Maxent uses only presence data rather than both presence and absence data. As a result, capture probabilities are not explicitly included in the model. This is nearly anathema in wildlife biology, where predictions based on mark-recapture studies have been the norm for years.

*There are at least three practical answers to this criticism:

#The first is to be explicit about the prediction probabilities that maxent produces. Rather than modelling the probability of occurrence itself, maxent models how much more (or less) likely a location is to hold an occurrence than a randomly selected background location. The difference from true occurrence prediction is subtle, and in many cases probably does not matter.


#Second, outside of animal studies, presence-only data, rather than presence/absence or multiple-observer data, are the norm. We know of no published data on plants where multiple observers were used to assess a species' observation probability. Longitudinal studies are common, but they are not used in the way that mark-recapture studies are used with animals.
