Background: Misinformation on interactive Knowledge Exchange Social Websites (KESWs) is concerning since it can influence Internet users’ health behaviors, especially during an infectious disease outbreak. Objective: The present study seeks to examine the accuracy and characteristics of health information posted to a Knowledge Exchange Social Website (KESW). Methods: A sample of 204 answers to Ebola questions was extracted and rated for accuracy. Multiple logistic regression modeling was used to examine whether answer characteristics (best answer, professional background, statistical information, source disclosed, link, and word count) predicted accuracy. Results: Overall, only 27.0% of the posted answers were rated as “accurate”. Accuracy varied across question topics, with between 11.8% and 45.5% of answers being rated as accurate. When Yahoo Answers’ “best answers” were examined, the overall accuracy was substantially higher, with 80.0% of “best answers” being rated as accurate compared to 16.0% of all other answers. Conclusion: There is a need for tools to help Internet users navigate health information posted on these dynamic user-generated knowledge exchange social websites.
The popularity of the Internet as a discreet, readily available source of health information is evidenced by data showing that up to 75% of adults in the U.S. report having used a search engine to look up health information [
Despite the popularity of online health resources, past research has shown that both the quality [
The presence of poor, incomplete, or misleading information is troubling given the low levels of electronic health literacy (eHealth Literacy) reported among Internet users [
The impact of misinformation online is at least twofold.
First, the mere presence of such information can influence Internet users’ search and browsing habits. For instance, confirmation biases, in which people tend to seek out information that confirms their preexisting beliefs [
Second, health information has also been shown to actively shape various users’ health attitudes and behaviors [
Previous research has examined the quality and accuracy of health information posted to professional, static websites [
Ascertaining the veracity of webpages’ health information becomes more challenging on interactive websites where Internet users play a role in generating the web pages’ content. Whereas the overall quality and accuracy of a static website’s content, which is developed by a single person or single team, can often be assessed in a single pass, such assessments become more challenging with user-generated content, where high-quality health information can be presented alongside incomplete information, emotionally powerful personal anecdotes, misinformation attributable to users’ misperceptions, and even intentionally misleading content.
While research has examined the overall quality of information on some interactive health-focused websites such as the message boards used in online support groups [
Accurate and timely online information is particularly important during an outbreak of a (re)emerging infectious disease. Slow dissemination of information through official channels and confusing or conflicting messages in the media generate high levels of panic in the general public and drive them to seek answers on the Internet [
The present study seeks to address some of the knowledge gap on the accuracy of health information posted to KESWs by examining the types of Ebola questions being posted on a popular KESW and rating the accuracy of the anonymous users’ answers to these questions. In addition, the relationship between answer characteristics, such as inclusion of links to references, and answers’ accuracy was examined in order to determine whether answer characteristics could be used to identify higher quality answers.
The decision was made to focus the study on a single KESW. Of the KESWs reviewed, Yahoo Answers was selected both for the ease with which its interface allows questions and answers to be searched and retrieved and for its reach; in 2016 Yahoo was ranked as the third most popular multi-platform web property in the United States with 206 million unique visitors in a single month (https://www.statista.com/statistics/271412/most-visited-us-web-properties-based-on-number-of-visitors/).
On March 25, 2015, a total of 23 posts with the keyword “ebola” were extracted from Yahoo Answers for analysis (see
resulting in a dataset of 18 posts. Upon further review, several of the 18 posts contained multiple questions. Each question within the posts was examined independently, yielding a total of 35 distinct questions about Ebola. A total of 204 answers were offered to these 35 questions. Each question had between 2 and 11 answers, with an average of 5.83 answers posted per question (SD = 3.24).
In addition to questions and answers, six accompanying data points were extracted from each answer:
1) Best Answer: Since March 2014, the person who posted their question(s) on Yahoo Answers gets to mark one of the answers provided as the Best Answer. All sets of answers had a Best Answer marked.
2) Professional Background: This variable captured whether or not each answerer indicated that their answer was based on their professional background in the health sciences (e.g., the answerer indicated that they were a nurse with 10 years of experience with infectious diseases).
3) Statistical Information: This variable captured whether or not each answer included the use of statistics.
4) Source Disclosure: This variable captured whether or not each answer contained a disclosure that the information presented came from an external source, as it was discovered that many answers contained unmodified copied and pasted content from other websites.
5) Link: This variable captured whether the answer contained a link to an external website for additional information.
6) Word Count: A count of the words used in each answer.
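In the study, these characteristics were coded by hand, but most of them could also be derived programmatically from an answer’s text and metadata. The sketch below is a hypothetical illustration of such automated coding; the regexes and the proxy for “statistical information” are assumptions of this sketch, not the authors’ procedure, and the manually coded Professional Background variable is omitted.

```python
import re

def extract_features(answer_text, is_best_answer):
    """Derive coded answer characteristics from raw answer text.

    The heuristics below (a percent/number regex as a crude proxy for
    statistical content, a phrase list for source disclosure) are
    illustrative assumptions, not the study's actual manual coding.
    """
    return {
        "best_answer": is_best_answer,
        # crude proxy: presence of a percentage or multi-digit figure
        "statistical_information": bool(
            re.search(r"\d+(\.\d+)?\s*%|\b\d{2,}\b", answer_text)),
        # phrases suggesting the content came from an external source
        "source_disclosed": bool(
            re.search(r"according to|copied from|\bsource\b",
                      answer_text, re.I)),
        "link": bool(re.search(r"https?://\S+", answer_text)),
        "word_count": len(answer_text.split()),
    }

features = extract_features(
    "According to the CDC, the fatality rate can exceed 50%. "
    "More at https://www.cdc.gov/vhf/ebola/",
    is_best_answer=True,
)
```

A hand-coding pass remains necessary for judgment-laden variables such as Professional Background, but simple surface features like links and word count are mechanical to extract.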
In order to evaluate the accuracy of each posted answer, answers were coded into one of five categories:
1) Accurate: Accurate answers contained no factual errors and addressed the question that was asked.
2) Inaccurate: Inaccurate answers contained one or more factual errors. Note that, given the severe consequences of misinformation about infectious diseases, answers containing any factual error were rated as inaccurate, even when accurate information was also present.
3) Subjective: Subjective answers included any response whose accuracy could not be rated, such as statements of opinion.
4) Unanswered: Unanswered responses did not address the question that was asked.
5) Trolling: Upon working with the data, it became clear that a fifth category was needed in order to capture responses that not only failed to answer the question asked but also took on the characteristics of online trolling, which Merriam-Webster defines as “to antagonize (others) online by deliberately posting inflammatory, irrelevant, or offensive comments or other disruptive content” (https://www.merriam-webster.com/dictionary/troll).
The accuracy of all answers was assessed independently by two of the authors. The authors then examined each other’s ratings and discussed the answers they disagreed upon. A physician was available as the tiebreaker in case the authors could not agree upon an answer’s accuracy rating after discussion, though all disagreements were resolved with discussion between the authors without need for the physician’s intervention.
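Because the two raters resolved all disagreements by discussion, no chance-corrected agreement statistic is reported here; for dual coding of this kind, inter-rater reliability before reconciliation is often summarized with Cohen’s kappa. The sketch below illustrates that calculation on hypothetical rating pairs (the ratings shown are invented, not the study’s data).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance, computed from each
    rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# hypothetical ratings over the five categories used in the study
a = ["accurate", "inaccurate", "trolling", "accurate", "subjective", "accurate"]
b = ["accurate", "inaccurate", "trolling", "inaccurate", "subjective", "accurate"]
kappa = cohens_kappa(a, b)  # here 5/6 raw agreement yields kappa = 10/13
```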
All data were analyzed with SPSS version 24.0 (IBM, 2016) [
A thematic analysis was conducted in order to establish a codebook of the types of questions being asked about Ebola on Yahoo Answers [
Simple descriptive statistics (frequency and valid percent) and histograms were employed to examine the types of Ebola questions being asked, the accuracy of answers to these questions, and the role of answers voted “best answer” by the KESW user who posted each question.
Multiple logistic regression modeling was used to examine whether answer characteristics (best answer, professional background, statistical information, source disclosed, link, and word count) predicted accuracy (re-coded as a dichotomous accurate vs. inaccurate outcome). Answers that fundamentally failed to address the question asked (i.e. subjective, trolling, or unanswered) were excluded from the logistic regression model, as readers looking for an answer to a health question could reasonably be expected to disregard these answers. As there were no a priori predictions regarding which variables would emerge as significant predictors of answers’ accuracy, five of the six predictors were force-entered into the final logistic regression model. The sixth predictor, professional background, was ultimately removed from the model, as only three answers came from respondents citing a professional background, which precluded meaningful analysis of this variable.
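The modeling approach can be sketched in plain Python with a Newton-Raphson fit of a logistic regression on a single binary predictor. The cell counts below are illustrative stand-ins chosen to reproduce the reported 80% vs. 16% accuracy rates for best vs. other answers; they are not the study’s raw data, and the study itself fit five predictors in SPSS.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, iters=20):
    """Fit accuracy ~ intercept + one binary predictor by Newton-Raphson.

    Returns (b0, b1) and the maximized log-likelihood. A sketch of the
    modeling approach only, not the study's five-predictor SPSS model.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the (negative) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # 2x2 Newton update
        b1 += (h00 * g1 - h01 * g0) / det
    ll = sum(y * math.log(sigmoid(b0 + b1 * x)) +
             (1 - y) * math.log(1 - sigmoid(b0 + b1 * x))
             for x, y in zip(xs, ys))
    return (b0, b1), ll

# Illustrative counts matching the reported rates: 80% of "best
# answers" accurate vs. 16% of other answers (not the actual dataset).
xs = [1] * 20 + [0] * 100                       # 1 = best answer
ys = [1] * 16 + [0] * 4 + [1] * 16 + [0] * 84   # 1 = accurate
(b0, b1), ll1 = fit_logit(xs, ys)

# Null (intercept-only) log-likelihood, computed analytically
n, pbar = len(ys), sum(ys) / len(ys)
ll0 = n * (pbar * math.log(pbar) + (1 - pbar) * math.log(1 - pbar))

odds_ratio = math.exp(b1)   # = 21 for these counts: best answers' odds
                            # of being accurate are 21x those of others
nagelkerke_r2 = (1 - math.exp(2 * (ll0 - ll1) / n)) / (1 - math.exp(2 * ll0 / n))
```

With these illustrative counts the odds ratio works out to exactly 21, consistent with the roughly 21-fold difference reported in the results; the Nagelkerke R² line shows the rescaling of the Cox-Snell R² that SPSS reports alongside the model χ².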
A total of seven themes were identified during the thematic analysis of types of Ebola questions posted to Yahoo Answers.
The topics of Yahoo Answers visitors’ questions showed significant heterogeneity, with each of the question categories capturing only between 4.9% and 27.5% of the question totals (see
Overall, only 27.0% of the posted answers were rated as “accurate” (i.e. answering the question asked and containing no factual errors; see
Logistic regression modeling found that the overall model with all five predictors together served as a statistically significant predictor of answers’ accuracy (χ2(5) = 25.08, p < 0.001; Nagelkerke R2 = 0.37). Examining the individual predictors revealed only a single statistically significant predictor of accurate answers (see
Overall, the accuracy of Ebola information posted to Yahoo Answers was quite low, with only 27.0% of all answers providing fully accurate information. More troubling, the questions that would be most relevant during an infectious disease outbreak, namely transmission, symptoms, and treatment, were each answered accurately less than a third of the time. In light of Internet users’ low electronic health literacy [
The finding that people who posted questions on the KESW later selected “best answers” that were 21 times more likely to be accurately answered helps to allay some of the concerns raised about visitors’ eHealth literacy. In aggregate, it appears that KESW users were able, to some degree, to discern accurate information from the various responses given. In fact, 80.0% of the answers voted “best answer” were accurate, while only 2.9% of these answers were categorically inaccurate. That said, there remain significant unknowns. For instance, while the users who posted the questions in this sample tended to select accurate “best answers”, it is unclear whether and how the demography of question posters might differ from users who only passively read through others’ questions and answers. In addition, it is unclear to what degree users rely on the best answers or instead read through multiple answers, possibly looking for the answer that most closely matches their preexisting beliefs or perceptions.
In addition, the observation that nearly a quarter of the responses represented attempts to troll the question poster speaks to the communities utilizing KESWs. Unlike health forums and support groups established to address a single health problem, KESWs appear to draw a more diverse population of Internet users, including Internet trolls. Anecdotally, several of the most egregious, inflammatory statements were attributable to a small number of repeat offenders whose inflammatory comments appeared under several questions. At the same time, only three of the answers provided came from users who indicated a relevant professional background.
Several limitations should be considered when examining these results. First, in the absence of further data, it is worth noting that the culture of KESW users may differ widely from website to website, limiting the generalizability of these findings. Further research is needed to explore not only how KESW users differ across sites such as Yahoo Answers versus Reddit, but also how the culture of users differs across health topics. For instance, the participation of vociferous groups like the anti-vaxxer community could radically change the distribution of accurate to inaccurate posted answers on topics like childhood vaccination recommendations. Likewise, it seems plausible that trolling may be more prevalent in posts related to topics being popularized by the media. Media coverage of infectious disease outbreaks may serve to draw trolls to posts related to those diseases.
Another limitation of this study is the treatment of accuracy as a dichotomous variable. The coding of websites’ accuracy has varied from study to study, with some evaluating the proportion of content that is accurate rather than treating the content as either accurate or inaccurate. In this study, content with any misinformation at all was coded as inaccurate, not only because of the potential harmful impact of any misinformation during an infectious disease outbreak, but also because misinformation surrounded by accurate information may be particularly insidious and difficult to detect. That said, examination of the ratio of accurate to inaccurate information within each KESW answer might be illuminating.
In addition, due to the high ratio of answers to questions, although 204 answers were available to code, only 23 posts with 35 total questions were examined. This raises the possibility that other types of questions are being asked about Ebola on KESWs, or that the ratio of question topics being addressed may differ from those presented here. These data nonetheless take the first steps towards filling the knowledge gap on KESW answers’ accuracy, and future replication research will help to verify the types of questions being asked.
Ultimately, these data highlight the risks posed by seeking health information related to emerging infectious diseases online through KESWs. Although those posting questions selected “best answers” that were often accurate, too little is known about the browsing habits of other KESW users. The frequent misinformation among the posted responses and the high volume of unhelpful information (unanswered, subjective, or trolling responses) suggest that these sites may pose special risks to users with low health literacy or medical misperceptions. In the context of Ebola, this misinformation could translate into challenges to outbreak containment, opposition to proper quarantine procedures, or social stigmatization of patients.
Further research is needed in order to explore the landscape of different KESWs and health topics, though these preliminary results raise concerns. If these patterns of inaccurate information hold true in other contexts, it may be necessary to provide users with tools to help them ascertain the veracity of user-generated claims, work directly with KESW providers to develop quality control mechanisms on their websites, and direct practitioners’ attention to these sites both to drive further research as well as to prepare practitioners to work with populations using these sites as a source of medical information.
Gorman, F., Yadegarians, D., Islam, T., Tongco, S., Johnston, E., Estrada, E. and Gorman, N. (2017) Accuracy of Ebola Information in a Knowledge Exchange Social Website (KESW). Open Journal of Preventive Medicine, 7, 210-223. https://doi.org/10.4236/ojpm.2017.710017