Heckmann, Tobias ; Gegg, Katharina ; Gegg, Andreas ; Becht, Michael:
Sample size matters: investigating the effect of sample size on a logistic regression debris flow susceptibility model.
In: Natural Hazards and Earth System Sciences. Vol. 14 (2014), No. 2, pp. 259-278.
ISSN 1561-8633 ; 1684-9981
Link to full text (external URL): http://www.nat-hazards-earth-syst-sci.net/14/259/2...
Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not reduce it further, and they approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial datasets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and the models' predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points.
In view of these results, we argue that researchers applying model selection should explore the behaviour of the model selection for different sample sizes, and that consensus models created from a number of random samples should be given preference over models relying on a single sample.
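To illustrate how the multiplicity of models selected across repeated random samples might be quantified, here is a minimal sketch. The abstract does not name the indices used, so the Shannon entropy and the Gini-Simpson index are assumptions chosen as common examples from information theory and biodiversity research; the function name `model_diversity` and the variable names in the sample data are hypothetical, and the stepwise selection itself is not reproduced.

```python
import math
from collections import Counter


def model_diversity(selected_models):
    """Summarise how varied a collection of selected models is.

    Each model is an iterable of the variable names it includes; the order
    of names within a model is ignored. Returns a tuple of
    (number of distinct models, Shannon entropy, Gini-Simpson index).
    These indices are assumed examples, not necessarily those of the paper.
    """
    # Count how often each distinct variable combination was selected.
    counts = Counter(frozenset(model) for model in selected_models)
    total = sum(counts.values())
    shares = [count / total for count in counts.values()]

    richness = len(counts)
    shannon = -sum(p * math.log(p) for p in shares)
    gini_simpson = 1.0 - sum(p * p for p in shares)
    return richness, shannon, gini_simpson


# Hypothetical selection outcomes for a small and a large sample size.
small_n = [("slope",), ("slope", "curvature"), ("geology",), ("slope", "geology")]
large_n = [("slope", "geology")] * 3 + [("geology", "slope", "curvature")]

for label, models in (("small n", small_n), ("large n", large_n)):
    richness, shannon, gini_simpson = model_diversity(models)
    print(f"{label}: {richness} distinct models, "
          f"H = {shannon:.3f}, Gini-Simpson = {gini_simpson:.3f}")
```

Applied to the models selected at each sample size, indices of this kind would be expected to fall as the selection stabilises, which is the pattern of decreasing model diversity described in the abstract.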
|Additional information:||This paper was submitted for publication in NHESS and first published in NHESSD (two-stage publication process). After peer review and any revisions, it appears in NHESS.|
|Institutions of the university:||Faculty of Mathematics and Geography > Mathematics > Chair of Mathematics - Statistics and Stochastics|
Faculty of Mathematics and Geography > Geography > Chair of Physical Geography
|Title originated at the KU:||Yes|
|Deposited on:||24 Jun 2013 09:28|
|Last modified:||10 Jun 2016 11:53|
|URL of this record:||http://edoc.ku-eichstaett.de/13333/|