Nonparametric predictive inference (NPI) has been developed for a range of data types, and for a variety of applications and problems in statistics. In this thesis, further theory is developed on NPI for multiple future observations, with attention to order statistics. The thesis consists of three main, related contributions. First, new probabilistic theory is presented on NPI for future order statistics; additionally, a range of novel statistical inferences using this new theory is discussed. Secondly, NPI for reproducibility is developed by considering two statistical tests based on order statistics. Thirdly, robustness of NPI is introduced, which involves a first adaptation of some concepts from robustness theory to the NPI setting, namely the sensitivity curve and the breakdown point. In this thesis, we present NPI for future order statistics. Given data consisting of n real-valued observations, m future observations are considered and predictive probabilities are presented for the r-th ordered future observation. In addition, joint and conditional probabilities for events involving multiple future order statistics are presented. We further present the use of such predictive probabilities for order statistics in statistical inference, in particular considering pairwise and multiple comparisons based on future order statistics of two or more independent groups of data. This new theory enables us to develop NPI for the reproducibility of statistical hypothesis tests based on order statistics. Reproducibility of statistical hypothesis tests is an important issue in applied statistics: if the test were repeated, would the same conclusion be reached, that is, rejection or non-rejection of the null hypothesis? NPI provides a natural framework for such inferences, as its explicitly predictive nature fits well with the core problem formulation of a repeat of the test in the future.
For inference on reproducibility of statistical tests, NPI provides lower and upper reproducibility probabilities (RP). The NPI-RP method is presented for two basic tests using order statistics, namely a test for a specific value for a population quantile and a precedence test for comparison of data from two populations, as typically used for experiments involving lifetime data if one wishes to conclude before all observations are available. As every statistical inference has underlying assumptions about models and specific methods used, one important field in statistics is the study of robustness of inferences. The concept of robust inference is usually aimed at development of inference methods which are not too sensitive to data contamination or to deviations from model assumptions. In this thesis we use it in a slightly narrower sense. For our aims, robustness indicates insensitivity to small changes in the data, as our predictive probabilities for order statistics and statistical inferences involving future observations depend upon the given observations. We introduce some concepts for assessing the robustness of statistical procedures in the NPI framework, namely sensitivity curve and breakdown point. The classical breakdown point does not apply to our context as the predictive inferences are bounded, so we change the definition to suit our context. Most of our nonparametric inferences have a reasonably good robustness with regard to small changes in the data. Traditionally, in the robustness literature there has been quite a lot of emphasis and discussion on robustness properties of estimators for the location parameters. Thus, in our investigation of NPI robustness we also focus on differences in robustness of the mean and the median of the m future observations, and see how they relate to the classical concepts of robustness of the median and mean.
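
As an illustration of the predictive probabilities for the r-th ordered future observation described above, the following minimal sketch counts orderings of n data values and m future values. It assumes, as is usual under Hill's assumption A(n) without ties, that all orderings of the n + m values are equally likely; the function name and the interval notation I_j = (x_(j-1), x_(j)), j = 1, ..., n + 1, are illustrative choices and are not taken from the thesis.

```python
from fractions import Fraction
from math import comb


def order_stat_interval_prob(n, m, r, j):
    """Probability that the r-th ordered of m future observations falls in
    the j-th interval I_j created by n ordered data values (j = 1..n+1).

    Assumption (not from the source text): all orderings of the n data and
    m future values are equally likely, and there are no ties.
    """
    if not (1 <= r <= m and 1 <= j <= n + 1):
        raise ValueError("need 1 <= r <= m and 1 <= j <= n + 1")
    # Orderings of the j-1 data values and r-1 future values below X_(r).
    before = comb(j - 1 + r - 1, j - 1)
    # Orderings of the n-j+1 data values and m-r future values above X_(r).
    after = comb(n - j + 1 + m - r, m - r)
    return Fraction(before * after, comb(n + m, n))


if __name__ == "__main__":
    n, m, r = 5, 3, 2
    for j in range(1, n + 2):
        print(j, order_stat_interval_prob(n, m, r, j))
```

For m = 1 this reduces to probability 1/(n + 1) for each interval, and for any r the probabilities over the n + 1 intervals sum to one.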

Nonparametric predictive inference (NPI) is a statistical approach with strong frequentist properties, with inferences explicitly in terms of one or more future observations. NPI is based on relatively few modelling assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty. While NPI has been developed for a range of data types, and for a variety of applications, thus far it has not been developed for multivariate data. This thesis presents the first study in this direction. Restricting attention to bivariate data, a novel approach is presented which combines NPI for the marginals with copulas for representing the dependence between the two variables. It turns out that, by using a discretization of the copula, this combined method leads to relatively easy computations. The new method is introduced with use of an assumed parametric copula. The main idea is that NPI on the marginals provides a level of robustness which, for small to medium-sized data sets, allows some level of misspecification of the copula. As parametric copulas have restrictions with regard to the kind of dependency they can model, we also consider the use of nonparametric copulas in combination with NPI for the marginals. As an example application of our new method, we consider accuracy of diagnostic tests with bivariate outcomes, where the weighted combination of both variables can lead to better diagnostic results than the use of either of the variables alone. The results of simulation studies are presented to provide initial insights into the performance of the new methods presented in this thesis, and examples using data from the literature are used to illustrate applications of the methods. As this is the first research into developing NPI-based methods for multivariate data, there are many related research opportunities and challenges, which we briefly discuss.

This thesis presents the use of signatures within nonparametric predictive inference (NPI) for the failure time of a coherent system with a single type of component, given failure times of tested components that are exchangeable with those in the system. NPI is based on few modelling assumptions and here leads to lower and upper survival functions. We also illustrate comparison of the reliability of two systems, by directly considering the random failure times of the systems. This includes explicit consideration of the difference between the failure times of two systems. In this method we assume that the signature is precisely known. In addition, we show how bounds for these lower and upper survival functions can be derived based on limited information about the system structure, which can reduce computational effort substantially for specific inferential questions. It is illustrated how one can base reliability inferences on a partially known signature, assuming that bounds for the probabilities in the signature are available. As a further step in the development of NPI, we present the use of survival signatures within NPI for the failure time of a coherent system which consists of different types of components. It is assumed that, for each type of component, additional components which are exchangeable with those in the system have been tested and their failure times are available. Throughout this thesis we assume that the system is coherent; we start with a system consisting of a single type of component and then extend the approach to a system consisting of different types of components.

This thesis investigates a new bootstrap method, called the Nonparametric Predictive Inference Bootstrap (NPI-B). Nonparametric predictive inference (NPI) is a frequentist statistical approach that makes few assumptions, enabled by using lower and upper probabilities to quantify uncertainty, and explicitly focuses on future observations. In the NPI-B method, we use a sample of n observations to create n + 1 intervals and draw one future value uniformly from one interval. Then this value is added to the data and the process is repeated, now with n + 1 observations. Repetition of this process leads to the NPI-B sample, which therefore is not taken from the actual sample, but consists of values in the whole range of possible observations, also going beyond the range of the actual sample. We explore NPI-B for data on finite intervals, on the real line and for non-negative observations, and compare it to other bootstrap methods via simulation studies, which show that the NPI-B method works well as a prediction method. The NPI method is presented for the reproducibility probability (RP) of some nonparametric tests. Recently, there has been substantial interest in the reproducibility probability, where not only its estimation but also its actual definition and interpretation are not uniquely determined in the classical frequentist statistics framework. The explicitly predictive nature of NPI provides a natural formulation of inferences on RP. It is used to derive lower and upper bounds of RP values (known as the NPI-RP method), but for large sample sizes the computation of these bounds is difficult. We explore the NPI-B method to predict the RP values (called NPI-B-RP values) of some nonparametric tests. Reproducibility of tests is an important characteristic of the practical relevance of test outcomes.
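
The sampling step described above can be sketched as follows for data on a finite interval [a, b]. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function name `npi_bootstrap` and its parameters are hypothetical, and the interval is chosen with equal probability 1/(n + 1), which is an assumption in the spirit of A(n) that the abstract does not spell out.

```python
import bisect
import random


def npi_bootstrap(data, m, a, b, rng=None):
    """Draw an NPI-B style sample of m values for data on [a, b].

    Assumptions (not from the source text): each of the current intervals
    is selected with equal probability, and the new value is drawn
    uniformly within the selected interval.
    """
    rng = rng or random.Random()
    current = sorted(data)
    if any(x < a or x > b for x in current):
        raise ValueError("all observations must lie in [a, b]")
    sample = []
    for _ in range(m):
        # The current values partition [a, b] into len(current) + 1 intervals.
        ends = [a] + current + [b]
        j = rng.randrange(len(ends) - 1)
        value = rng.uniform(ends[j], ends[j + 1])
        sample.append(value)
        # The drawn value joins the data before the next draw.
        bisect.insort(current, value)
    return sample


if __name__ == "__main__":
    observed = [0.12, 0.35, 0.41, 0.77]
    print(npi_bootstrap(observed, m=4, a=0.0, b=1.0, rng=random.Random(1)))
```

Because each drawn value is added to the data before the next draw, later values depend on earlier ones, and the resulting sample can contain values outside the range of the observed data but inside [a, b].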

This thesis provides a new method for statistical inference on system reliability on the basis of limited information resulting from component testing. This method is called Nonparametric Predictive Inference (NPI). We present NPI for system reliability, in particular NPI for k-out-of-m systems, and for systems that consist of multiple k_i-out-of-m_i subsystems in series configuration. An algorithm for optimal redundancy allocation, with additional components added to subsystems one at a time, is presented. We also illustrate redundancy allocation for the same system in case the costs of additional components differ per subsystem. Then NPI is presented for system reliability in a similar setting, but with all subsystems consisting of the same single type of component. As a further step in the development of NPI for system reliability, where more general system structures can be considered, nonparametric predictive inference for reliability of voting systems with multiple component types is presented. We start with a single voting system with multiple component types, and then extend this to a series configuration of voting subsystems with multiple component types. Throughout this thesis we assume information from tests of n_t components of type t.
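
To illustrate how lower and upper probabilities for the functioning of a k-out-of-m system could be computed from test data, the sketch below uses an ordering-count argument. It assumes a latent-variable representation in which a component functions if its latent value lies below an unknown threshold located between the s-th and (s + 1)-th ordered test values, with all orderings of the n tested and m future components equally likely. The function names and the parameters n (components tested), s (tested components that functioned), m and k are illustrative; the thesis may state its results in a different but equivalent form.

```python
from fractions import Fraction
from math import comb


def _prob_at_least_below(n, m, k, d):
    """P(at least k of m future values lie below the d-th ordered of n data
    values), assuming all orderings are equally likely (1 <= d <= n)."""
    count = sum(
        comb(d - 1 + l, l) * comb(n - d + m - l, m - l)
        for l in range(k, m + 1)
    )
    return Fraction(count, comb(n + m, n))


def npi_k_out_of_m(n, s, m, k):
    """Lower and upper probability that at least k of m future components
    function, given s functioning components among n tested ones."""
    if not (n >= 1 and 0 <= s <= n and 1 <= k <= m):
        raise ValueError("need n >= 1, 0 <= s <= n and 1 <= k <= m")
    # Lower: only future values below the s-th data value surely function.
    lower = _prob_at_least_below(n, m, k, s) if s >= 1 else Fraction(0)
    # Upper: future values below the (s+1)-th data value may all function.
    upper = _prob_at_least_below(n, m, k, s + 1) if s < n else Fraction(1)
    return lower, upper


if __name__ == "__main__":
    print(npi_k_out_of_m(n=4, s=2, m=1, k=1))
    print(npi_k_out_of_m(n=2, s=1, m=2, k=1))
```

For a single future component (m = k = 1) the sketch gives s/(n + 1) and (s + 1)/(n + 1) as the lower and upper probabilities.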

This thesis considers Nonparametric Predictive Inference (NPI) for ordinal data and accuracy of diagnostic tests. We introduce NPI for ordinal data, which are categorical data with an ordering of the categories. Such data occur in many application areas, for example medical and social studies. The method uses a latent variable representation of the observations and categories on the real line. Lower and upper probabilities for events involving the next observation are presented, with specific attention to comparison of multiple groups of ordinal data. We introduce NPI for accuracy of diagnostic tests with ordinal outcomes, with the inferences based on data for a disease group and a non-disease group. We introduce empirical and NPI lower and upper Receiver Operating Characteristic (ROC) curves and the corresponding areas under the curves. We discuss the use of the Youden index related to the NPI lower and upper ROC curves in order to determine the optimal cut-off point for the test. Finally, we present NPI for assessment of accuracy of diagnostic tests involving three groups of real-valued data. This is achieved by developing NPI lower and upper ROC surfaces and the corresponding volumes under these surfaces, and we also consider the choice of cut-off points for classifications based on such diagnostic tests.

This thesis presents new solutions for two acceptance decision problems. First, we present methods for basic acceptance sampling for attributes, based on the nonparametric predictive inferential approach for Bernoulli data, which is extended for this application. We consider acceptance sampling based on destructive tests and on non-destructive tests. Attention is mostly restricted to single-stage sampling, but extension to two-stage sampling is also considered and discussed. Secondly, sequential acceptance decision problems are considered with the aim to select one or more candidates from a group, with the candidates observed sequentially, either per individual or in subgroups, and with the ordering of an individual compared to previous candidates and those in the same subgroup available. While, for a given total group size, this problem can in principle be solved by dynamic programming, the computational effort required makes this infeasible once the number of candidates to be selected and the total group size are not small. We present a new heuristic approach to such problems, based on the principles of nonparametric predictive inference, and we study its performance via simulations. The approach is very flexible and computationally straightforward, and has advantages over alternative heuristic rules that have been suggested in the literature.

This thesis presents Nonparametric Predictive Inference (NPI) for several multiple comparisons problems. We introduce NPI for comparison of multiple groups of data including right-censored observations. Different right-censoring schemes discussed are early termination of an experiment, progressive censoring and competing risks. Several selection events of interest are considered including selecting the best group, the subset of best groups, and the subset including the best group. The proposed methods use lower and upper probabilities for some events of interest formulated in terms of the next future observation per group. For each of these problems the required assumptions are Hill’s assumption A(n) and the generalized assumption rc-A(n) for right-censored data. Attention is also given to the situation where only a part of the data range is considered relevant for the inference, where in addition the numbers of observations to the left and to the right of this range are known. Throughout this thesis, our methods are illustrated and discussed via examples with data from the literature.

In probability and statistics, uncertainty is usually quantified using single-valued probabilities satisfying Kolmogorov's axioms. Generalisation of classical probability theory leads to various less restrictive representations of uncertainty which are collectively referred to as imprecise probability. Several approaches to statistical inference using imprecise probability have been suggested, one of which is nonparametric predictive inference (NPI). The multinomial NPI model was recently proposed, which quantifies uncertainty in terms of lower and upper probabilities. It has several advantages, one being the facility to handle multinomial data sets with unknown numbers of possible outcomes. The model gives inferences about a single future observation. This thesis comprises new theoretical developments and applications of the multinomial NPI model. The model is applied to selection problems, for which multiple future observations are also considered. This is the first time inferences about multiple future observations have been presented for the multinomial NPI model. Applications of NPI to classification are also considered and a method is presented for building classification trees using the maximum entropy distribution consistent with the multinomial NPI model. Two algorithms, one approximate and one exact, are proposed for finding this distribution. Finally, a new NPI model is developed for the case of multinomial data with subcategories and several properties of this model are proven.

In recent years, we have seen a diverse range of crises and controversies concerning food safety, animal health and environmental risks including foot and mouth disease, dioxins in seafood, GM crops and more recently the safety of Irish pork. This has led to the recognition that the handling of uncertainty in risk assessments needs to be more rigorous and transparent. This would mean that decision makers and the public could be better informed on the limitations of scientific advice. The expression of the uncertainty may be qualitative or quantitative but it must be well documented. Various approaches to quantifying uncertainty exist, but none are yet generally accepted amongst mathematicians, statisticians, natural scientists and regulatory authorities. In this thesis we discuss the current risk assessment guidelines which describe the deterministic methods that are mainly used for risk assessments. However, probabilistic methods have many advantages, and we review some probabilistic methods that have been proposed for risk assessment. We then develop our own methods to overcome some problems with the current methods. We consider including various uncertainties and looking at robustness to the prior distribution for Bayesian methods. We compare nonparametric methods with parametric methods and we combine a nonparametric method with a Bayesian method to investigate the effect of using different assumptions for different random quantities in a model. These new methods provide alternatives for risk analysts to use in the future.

This thesis considers nonparametric predictive inference for lifetime data that include right-censored observations. The assumption A(n) proposed by Hill in 1968 provides a partially specified predictive distribution for a future observation given past observations, but it does not allow right-censored data among the observations. Although Berliner and Hill in 1988 presented a related nonparametric method for dealing with right-censored data based on A(n), they replaced ‘exact censoring information’ (ECI) by ‘partial censoring information’ (PCI), enabling inference on the basis of A(n). We address whether ECI can be used via a generalization of A(n). We solve this problem by presenting a new assumption, ‘right-censoring A(n)’ (rc-A(n)), which generalizes A(n). The assumption rc-A(n) provides a partially specified predictive distribution for a future observation, given the past observations including right-censored data, and allows the use of ECI. Based on rc-A(n), we derive nonparametric predictive inference (NPI) for a future observation, which can also be applied to a variety of predictive problems formulated in terms of the future observation. As applications of NPI, we discuss grouped data and comparison of two groups of lifetime data, which are problems occurring frequently in reliability and survival analysis.
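
As background to the generalization described above, the following minimal sketch shows how A(n) alone (without right-censoring, which rc-A(n) is designed to handle and which is not implemented here) leads to lower and upper probabilities for the event that a future observation falls in an open interval (a, b). It assumes no ties among the n observations and assigns probability 1/(n + 1) to each interval between consecutive ordered observations; the function name `an_bounds` is illustrative.

```python
from fractions import Fraction


def an_bounds(data, a, b):
    """Lower and upper probability for the next observation lying in (a, b),
    based on A(n) with no ties and no censored observations (assumptions)."""
    if not a < b:
        raise ValueError("need a < b")
    xs = sorted(data)
    ends = [float("-inf")] + xs + [float("inf")]
    inside = 0    # intervals entirely contained in (a, b)
    touching = 0  # intervals with non-empty intersection with (a, b)
    for left, right in zip(ends, ends[1:]):
        if a <= left and right <= b:
            inside += 1
        if left < b and right > a:
            touching += 1
    n = len(xs)
    return Fraction(inside, n + 1), Fraction(touching, n + 1)


if __name__ == "__main__":
    lifetimes = [1.0, 2.0, 3.0, 4.0]
    print(an_bounds(lifetimes, 1.5, 3.5))
    print(an_bounds(lifetimes, 2.5, float("inf")))
```

The lower probability counts only the probability mass that must lie in the event, and the upper probability counts all mass that can lie in it; with b set to infinity the two values give lower and upper survival probabilities at a.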

Copyright © npi-statistics.com 2017