Nowadays, internet-based surveys are increasingly used for data collection because they are simple and cheap to administer and give fast access to a large group of respondents. Many factors affect internet surveys, such as measurement, survey design, and sample selection bias. Sampling plays a central role in the selection bias of internet surveys. In terms of sample selection, the type of access to an internet survey imposes several limitations. There are internet surveys based on restricted access and on voluntary participation, each implemented according to the type of survey. Both probability and non-probability sampling can be used, and both may lead to biased estimates. There are different ways to correct for selection bias: poststratification or weighting class adjustments, raking or rim weighting, generalized regression modeling, and propensity score adjustments. This paper aims to describe the methodological problems surrounding selection bias in internet surveys and to review the relevant literature. A further objective of this study is to show the effect of various correction techniques in reducing selection bias.
In the last decades, the internet survey has become a popular tool for data collection, because internet surveys have several advantages over more traditional modes such as personal interviews, telephone interviews, or mail surveys:
1) Many people are now connected to the internet. Worldwide, the number of internet users is 3,366,261,156; the corresponding figure shows internet users in the world by region, with most users located in Asia and Europe. The number of internet users is increasing with each passing day, and the more people use the internet, the more people can respond to online surveys. An internet survey is a simple means of gaining access to a large group of potential respondents.
2) Questionnaires can be distributed at very low cost. No interviewers are needed, and there are no mailing or printing costs.
3) Surveys can be launched very quickly. Little time is lost between the moment the questionnaire is ready and the start of the fieldwork. Thus, internet surveys are a fast, cheap, and attractive means of collecting large amounts of data. A search of internet survey studies by year, shown in the corresponding figure, illustrates their growing use.
Internet questionnaires are administered through the interaction between the website and the participant.
Types of Internet Surveys: internet surveys based on restricted access and internet surveys based on voluntary participation are examined below.
1) Internet surveys based on restricted access:
E-Mail Surveys: E-mail surveys are conducted on the basis of probability samples obtained from a list frame of available e-mail addresses, assuming that this frame constitutes the frame population. There are also related coverage bias issues.
Internet Surveys by E-Mail Invitation: These are also based on a probability sample using the same list frame of available e-mail addresses, so the same coverage biases apply. For probability-based samples, in addition to the coverage error problems stated above, there will also be nonresponse issues and related adjustments.
2) Internet surveys based on voluntary participation:
Free Access to Internet Surveys: In this case, any respondent can access an internet questionnaire on the site, without any restriction.
In terms of population representation, the collected information will have several problems. In this case, the population frame will be undefined, ill-defined, or only partially defined.
Survey estimates will never be exactly equal to the population characteristics they are intended to estimate; there is always some error. Its possible causes have been described in the literature, e.g., by Kish (1967) and Bethlehem (1999). The general structure of survey error is shown in the corresponding figure: total survey error decomposes into sampling error and nonsampling error, and nonsampling error in turn comprises coverage error, nonresponse error, and measurement error. These errors should be examined under separate headings.
Internet surveys contain survey error. Although they are very popular nowadays and have several advantages, surveys of this type are also prone to many survey errors. It is useful to evaluate the types of internet surveys currently available in terms of the traditional measures of quality and sources of error in surveys. While internet surveys are generally significantly less expensive than other modes of data collection, and are quicker to conduct, serious concerns have been raised about errors of non-observation, or selection bias. Inference in internet surveys involves three key aspects: sampling, coverage, and nonresponse.
The sampling process underlying these selection issues is described in the following section.
The key challenge for sampling in internet surveys is that the mode does not have an associated sampling method. For example, telephone surveys are often based on random-digit dialling (RDD) sampling, which generates a sample of telephone numbers without the need for a complete frame. No similar strategies are available for web surveys.
While e-mail addresses are relatively fixed (like telephone numbers or street addresses), internet use is a behavior (rather than a status) and does not require an e-mail address. Thus, the population of “internet users” is dynamic and difficult to define. Furthermore, the goal is often to make inference to the full population, not just to internet users.
Internet surveys appear in many different forms, as shown in the following table.
What difference does it make if a sample consists of self-selected volunteers rather than a probability sample from the target population? The key statistical consequence is bias. Unadjusted means or proportions from non-probability samples are likely to be biased estimates of the corresponding population means or proportions. There are a number of different ways researchers attempt to correct for selection biases, both for probability-based and non-probability online surveys. Weighting adjustment techniques may help to reduce selection bias. Weighting adjustment is based on the use of auxiliary information. Auxiliary information is defined here as a set of variables that have been measured in the survey, and for which the distribution in the population is available. The bias will be large if:
1) The relationship between the target variable and the response behavior is strong;
2) The variation in the response probabilities is large;
3) The average response probability is low.
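These three conditions reflect the well-known approximation that the bias of the unadjusted response mean is roughly the covariance between the response propensities and the target variable, divided by the average response propensity. A minimal Python simulation can illustrate the effect; the population size, target variable, and propensity function below are invented for illustration:

```python
import random

random.seed(42)

# Hypothetical population: target variable y, e.g. hours spent online per week.
N = 100_000
population = [random.gauss(10, 3) for _ in range(N)]
true_mean = sum(population) / N

def respond_prob(y):
    # Assumed response propensity: increases with y, so heavy internet
    # users are more likely to complete a web survey.
    return min(1.0, max(0.0, 0.05 + 0.03 * y))

# Self-selected respondents: each unit responds with its own propensity.
respondents = [y for y in population if random.random() < respond_prob(y)]
resp_mean = sum(respondents) / len(respondents)

print(f"true mean:       {true_mean:.2f}")
print(f"respondent mean: {resp_mean:.2f}")  # biased upward by roughly 0.8
```

Because the propensity correlates strongly with the target variable and the average propensity is low, the unadjusted respondent mean overestimates the population mean, in line with conditions 1)-3) above.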
| Nonprobability Methods | Probability-Based Methods |
|---|---|
| Polls as entertainment | Intercept surveys |
| Unrestricted self-selected surveys | List-based samples |
| Volunteer opt-in panels | Web option in mixed mode |
| Surveys using “harvested” email lists | Pre-recruited panels |
| | Pre-recruited panels of full population |
There can be several reasons to carry out some kind of weighting adjustment on the response to a web survey: the sample is selected with unequal probabilities; nonresponse may cause estimators of population characteristics to be biased; and, if the target population is wider than the internet population, people without internet access can never be selected for the survey. Moreover, if the sample is selected by means of self-selection, the true selection probabilities are unknown, and assuming equal selection probabilities leads to biased estimates. Weighting adjustment techniques may help to reduce such bias.
The weighting techniques described in the following sections can reduce nonresponse bias provided that proper auxiliary information is available. The three reasons for weighting described above apply to any survey, whatever the mode of data collection. There are two more reasons for weighting that are particularly important for many web surveys: under-coverage and self-selection.
In general, there are four weighting methods for adjustment, which are listed in the following table.
Poststratification, or weighting class adjustment, is an estimation technique that attempts to make the sample representative after the data have been collected. It is the simplest and most commonly used methodology; it has been used to adjust for sampling and coverage problems in web surveys and is known variously as ratio adjustment, post-stratification, or cell weighting. Raking adjusts the sample weights so that sample totals line up with external population figures, but the adjustment aligns the sample with the marginal totals for the auxiliary variables, not with the cell totals.
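As a sketch of how a weighting class adjustment works in practice, the following Python fragment (with invented population counts and cell means) scales the weight of each respondent in cell c by the factor N_c/n_c:

```python
# Known population totals for the poststratification cells (illustrative).
population_counts = {"male": 490, "female": 510}

# An unbalanced set of respondents: men are overrepresented.
sample = ["male"] * 70 + ["female"] * 30
n_by_cell = {c: sample.count(c) for c in population_counts}

# Poststratification weight for a unit in cell c: N_c / n_c.
weight = {c: population_counts[c] / n_by_cell[c] for c in population_counts}

# Hypothetical target variable: cell means of an online-shopping indicator.
y = {"male": 0.4, "female": 0.8}

unweighted = sum(y[c] for c in sample) / len(sample)
weighted = sum(weight[c] * y[c] for c in sample) / sum(weight[c] for c in sample)
print(unweighted, weighted)  # 0.52 versus the population mean 0.604
```

The weighted estimate reproduces the population mean exactly here only because the target variable is constant within cells; in real data the adjustment removes just the part of the bias explained by the cell variable.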
GREG weighting is an alternative method of benchmarking sample estimates to the corresponding population figures. Another popular adjustment method is propensity score adjustment (PSA), or propensity weighting.
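The essence of GREG weighting is linear calibration: the design weights d_i are modified to w_i = d_i(1 + x_i'λ), with λ chosen so that the weighted sample totals of the auxiliary variables reproduce the known population totals. A small NumPy sketch, in which the sample size, design weights, and population totals are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sample: equal design weights and one auxiliary variable (age),
# plus an intercept column so the population size is calibrated as well.
n = 200
age = rng.uniform(18, 70, n)
x = np.column_stack([np.ones(n), age])
d = np.full(n, 50.0)                       # design weights (assumed)
t_x = np.array([10_000.0, 10_000 * 48.0])  # known totals: N and total age

# Linear calibration (GREG): solve for lambda, then adjust the weights.
A = (x * d[:, None]).T @ x
lam = np.linalg.solve(A, t_x - d @ x)
w = d * (1 + x @ lam)

print(np.allclose(w @ x, t_x))  # True: calibrated totals match exactly
```

With a single categorical auxiliary variable coded as cell indicators, the same computation reduces to poststratification, which is one way to see that poststratification is a special case of GREG.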
Poststratification, generalized regression estimation, and raking ratio estimation can be effective bias reduction techniques provided auxiliary variables are available that are strongly correlated with the target variables of the survey. If such variables cannot be used because their population distribution is not available, one might consider estimating these population distributions in a different survey, a so-called reference survey. This reference survey must be based on a probability sample, where data collection takes place with a mode different from the web, e.g., CAPI.
Another possible solution for correcting the bias from selection problems is using response propensities. The response propensity is the conditional probability that a person responds to the survey request, given the available background characteristics. To compute response propensities, auxiliary information for all sample elements is needed. In particular, response propensity weighting and stratification are proposed as correction techniques.
The response propensities can be used in a direct way, by estimating the target variables with the response propensities as weights. This is called response propensity weighting. The direct approach attempts to estimate the true selection probabilities by multiplying the first-order inclusion probabilities by the estimated response propensities. Bias reduction will only be successful if the available auxiliary variables are capable of explaining the response behavior. The response propensities can also be used indirectly, by forming strata of elements having similar response propensities. This is called response propensity stratification. Its final estimates rely less heavily on the accuracy of the model for the response propensities.
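Both uses of response propensities can be sketched on synthetic data: respondents are generated with a propensity that depends on an auxiliary variable x, the propensity model is fitted with a hand-rolled logistic regression (standing in for any standard routine), and the population mean of y is then estimated by propensity weighting and by propensity stratification into quintiles. All distributions and coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic full sample: auxiliary variable x known for everyone,
# target variable y observed only for web respondents r.
n = 5000
x = rng.normal(0.0, 1.0, n)
true_p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))  # response depends on x
r = rng.random(n) < true_p                    # response indicator
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, n)   # target variable

# Fit a logistic model for the response propensity (plain gradient ascent).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.1 * X.T @ (r - p) / n
phat = 1 / (1 + np.exp(-X @ beta))

# Direct use: response propensity weighting (weights 1 / propensity).
w = 1 / phat[r]
est_weighted = np.sum(w * y[r]) / np.sum(w)

# Indirect use: response propensity stratification into five strata.
edges = np.quantile(phat, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(phat, edges)
est_strat = np.mean([y[r & (stratum == s)].mean() for s in range(5)])

print(y[r].mean(), est_weighted, est_strat, y.mean())
```

The unadjusted respondent mean overshoots the population mean, while both propensity-based estimates move it back toward the target; the stratified version depends less on the exact fitted propensities, at the price of a coarser adjustment.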
In internet surveys, selecting a proper probability sample requires a sampling frame containing the e-mail addresses of all individuals in the population. Such sampling frames rarely exist. Moreover, general-population sampling frames do not contain information about which people have internet access and which do not. Thus, one should bear in mind that people without internet access will not respond to an internet questionnaire.
| Weighting Adjustment Methods |
|---|
| Poststratification or weighting class adjustments |
| Raking or rim weighting |
| Generalized regression (GREG) modeling |
| Propensity score adjustment (PSA) |
Moreover, people having internet access will also not always participate. Taking these facts into account, it is evident that the ultimate group of respondents is the result of a selection process (mostly self-selected) with unknown selection probabilities.
Some studies have shown that response propensity matching combined with response propensity stratification is a promising strategy for adjusting self-selection bias in web surveys. Research is ongoing to implement further improvements to response propensity weighting. PSA is a frequently adopted solution for improving the representativity of web panels. It should be noted, however, that there is no guarantee that correction techniques will be successful.
In propensity score adjustment one considers (a) a volunteer panel survey sample (s_w) with n_w units, each with a base weight, and (b) a probability-based reference survey sample. When the base weights are equal for all units or are not available, one may use an alternative adjustment factor instead.
Aşan & Ayhan (2013) give a raking formulation as an iterative sequence of adjustments to the cell weights $w_{ij}$, where the row adjustment is

$$w_{ij}^{(\text{row})} = w_{ij} \, \frac{N_{i+}}{\sum_{j} w_{ij}},$$

and the column adjustment is

$$w_{ij}^{(\text{col})} = w_{ij} \, \frac{N_{+j}}{\sum_{i} w_{ij}},$$

with $N_{i+}$ and $N_{+j}$ the known population totals for gender ($i = 1, 2$) and age groups ($j = 1, \ldots, J$). The sums and proportions of the gender and age-group margins serve as the calibration benchmarks.
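The row and column adjustments can be iterated until both margins are matched (iterative proportional fitting). A minimal Python implementation for a 2×3 gender-by-age table, with invented margins and starting weights:

```python
# Known population margins (illustrative): gender rows, age-group columns.
pop_row = [480.0, 520.0]
pop_col = [300.0, 400.0, 300.0]

# Initial (base-weighted) sample cell totals.
w = [[60.0, 50.0, 20.0],
     [40.0, 45.0, 35.0]]

for _ in range(100):
    # Row adjustment: scale each row to its population margin.
    for i in range(2):
        row_sum = sum(w[i])
        w[i] = [w[i][j] * pop_row[i] / row_sum for j in range(3)]
    # Column adjustment: scale each column to its population margin.
    for j in range(3):
        col_sum = w[0][j] + w[1][j]
        for i in range(2):
            w[i][j] *= pop_col[j] / col_sum

print([[round(v, 1) for v in row] for row in w])  # both margins now match
```

After convergence the weighted cell totals reproduce both the gender and the age-group margins, although, unlike poststratification, the individual cell totals are not forced to match any population figures.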
The application of this work consists of a first stage based on a web survey with an e-mail invitation and a second stage based on a voluntary-participation internet survey. A methodology is also proposed for the estimation and allocation of the population frame characteristics of adult internet users by gender and age group. The proposed alternative methodologies are a beneficial tool for internet survey users.
Several of these methods are closely related to one another; post-stratification, for example, is a special case of GREG weighting. All of the methods involve adjusting the weights assigned to the survey participants to make the sample line up more closely with population figures. A final consideration differentiating the four approaches is that propensity models can only incorporate variables that are available for both the internet survey sample and the calibration sample.
Examples of work examining the effectiveness of adjustment methods in internet surveys include Steinmetz, Tijdens & Pedraza (2009).
Internet surveys already offer enormous potential for survey researchers, and this is likely only to improve with time. In spite of their popularity, the quality of web surveys for scientific data collection remains open to discussion.
The general conclusion is that when the internet survey is based on a probability sample, nonresponse bias and, to a lesser extent, coverage bias, can be reduced through judicious use of post-survey adjustment using appropriate auxiliary variables.
The challenge for the survey industry is to conduct research on the coverage, sampling, nonresponse, and measurement error properties of the various approaches to web-based data collection. There are no sampling methods specific to internet surveys. As a result of these sampling difficulties, many internet surveys use self-selected samples of volunteers rather than probability samples. When nonprobability sampling is used for an internet survey, an adjustment procedure should be applied.
We need to learn when the restricted population of the web does not matter, under which conditions low response rates on the web may still yield useful information, and how to find ways to improve response rates to internet surveys.
Zerrin Asan Greenacre (2016) The Importance of Selection Bias in Internet Surveys. Open Journal of Statistics, 6, 397-404. doi: 10.4236/ojs.2016.63035