The tremendous growth of social media and consumer-generated content on the Internet has inspired the development of the so-called big data analytics to understand and solve real-life problems. However, while a handful of studies have employed new data sources to tackle important research problems in hospitality, there has not been a systematic application of big data analytic techniques in these studies. This study aims to explore and demonstrate the utility of big data analytics to better understand important hospitality issues, namely the relationship between hotel guest experience and satisfaction. Specifically, this study applies a text analytical approach to a large quantity of consumer reviews extracted from Expedia.com to deconstruct hotel guest experience and examine its association with satisfaction ratings. The findings reveal several dimensions of guest experience that carried varying weights and, more importantly, have novel, meaningful semantic compositions. The association between guest experience and satisfaction appears strong, suggesting that these two domains of consumer behavior are inherently connected. This study reveals that big data analytics can generate new insights into variables that have been extensively studied in existing hospitality literature. In addition, implications for theory and practice as well as directions for future research are discussed.
Social media and consumer-generated content on the Internet continue to grow and impact the hospitality industry [
The goal of this study is to explore and demonstrate the utility of big data analytics by using it to study core hospitality management variables that have been extensively studied in past decades. Specifically, hotel guest experience and satisfaction have long been a topic of interest because it is widely recognized that they contribute to customer loyalty, repeat purchases, favorable word-of-mouth, and ultimately higher profitability. In particular, the hotel industry is highly competitive because hotel firms offer essentially homogeneous products and services, which drives hotels to distinguish themselves from their competitors. As such, guest satisfaction has become one of the key measures of a hotel’s effectiveness in outperforming others. Since the 1970s a plethora of studies has been conducted with the aim of understanding the components and antecedents of guest satisfaction [
While this line of research offers a variety of perspectives on guest satisfaction, the vast majority of existing studies primarily relied upon conventional research techniques such as consumer surveys or focus group interviews to gauge what leads to guest satisfaction. As such, whether we can develop novel and meaningful insights into these building blocks of hospitality management using big data analytics becomes an intriguing research question.
This study employed one of the most important types of consumer-generated content, i.e., online customer reviews of hotel properties, to understand hotel guest experience and its relationship with guest satisfaction. Text analytics was applied to first deconstruct a large quantity of customer reviews collected from Expedia.com and then examine their association with hotel satisfaction ratings. Thus, the analytics approach aimed to gain insights into the nature and structure of guest experience expressed when a customer gave a specific satisfaction rating for the hotel in which he/she stayed. This paper is organized as follows: following the introduction, the subsequent section reviews literature on the big data analytics approach and hotel guest experience and satisfaction. Research questions are formulated with the focus on using online customer reviews to enrich our understanding of these constructs. The methodology section details data collection and the text analytical approach utilized to answer the research questions. Findings are then presented and discussed. Finally, the study’s contributions to literature and practice as well as directions for future research are discussed.
Big data is being generated through many sources including Internet traffic (e.g., clickstreams), mobile transactions, user-generated content, and social media as well as purposefully captured content through sensor networks, business transactions, and many other operational domains such as bioinformatics, healthcare, and finance [
One of the application areas of growing importance is the so-called business intelligence in that big data analytics can be used to understand customers, competitors, market characteristics, products, business environment, impact of technologies, and strategic stakeholders such as alliance and suppliers. Many examples and cases have been cited to illustrate the applications of big data analytics to discover and solve business problems [
Due to the volume and unstructured nature of social media and consumer-generated content, opinion mining and sentiment analysis, i.e., the so-called text analytics, play an important role in big data analytics. Indeed, opinion mining and sentiment analysis are considered well suited to various types of market intelligence applications [
Hotel guest satisfaction is a complex human experience within a hospitality service setting. The study of guest satisfaction was initiated as early as the 1970s. Different definitions of guest satisfaction have emerged. Hunt [
From the managerial point of view, it is, perhaps, more important to understand the components or antecedents of hotel guest satisfaction. For example, it has been conceptualized that the hotel product consists of several levels. That is, the core product, i.e., the hotel room, deals exactly with what the customer receives from the purchase. In addition, the hotel product includes facilitating, supporting, and augmenting elements, which concern, for example, how the customer receives it, the interactions with service providers and other customers, as well as necessary conditions (e.g., the front desk) which provide access to the core product and numerous value-added products and services. The hotel product can also be represented as a set of attributes as suggested by Dolnicar and Otter [
Given the complexity of the guest experience, measuring and managing hotel guest satisfaction is a challenging task. In the hospitality industry research has shown that there is a gap between what managers believe is important and what guests say is important in the selection and evaluation of accommodation [
As suggested by Oh [
Online customer reviews have been widely considered one of the most influential types of consumer-generated content for understanding consumer behavior and consequently firm performance in hospitality and tourism [
1) What is the nature and underlying structure of the hotel guest experience represented in customer reviews?
2) Can hotel guest experience represented in customer reviews be used to explain guest satisfaction?
A large-scale text analytics study was conducted to understand hotel guest experience represented in online customer reviews and its association with satisfaction ratings, based upon publicly available data on Expedia.com.
Expedia.com was chosen because it is the largest online travel agency in the world with more than 16.5 million monthly unique visitors (see www.advertising.expedia.com). Also, unlike other websites that host consumer reviews, Expedia requires reviewers to make at least one transaction through its website before being allowed to contribute a review to the website. This essentially prevents hospitality businesses or marketers from posting inauthentic reviews. Usually, after staying at a hotel property purchased through Expedia.com, the customer receives an email from the website soliciting feedback, including ratings as well as his/her experience at the hotel.
Data were collected during the period of December 18-29, 2017 using an automated Web crawler. In a nutshell, the Web crawler visited Expedia.com and extracted customer reviews for all hotels listed by Expedia in Taiwan. The crawler collected data on 106 hotels resulting in 6027 customer reviews, which means each hotel on average had approximately sixty customer reviews. Once the data were collected, the extraction process identified all unique words contained in the text comments resulting in 6642 words from all customer reviews. This word bank, with frequencies ranging from words such as “hotel” (33,549) and “room” (22,213) to many words with a frequency of one, serves as the basis for understanding the domain of guest experience. A relational database was created using Microsoft Access with unique identifiers assigned to every hotel property, every customer review, and every unique word so that associations could be easily established for analytical purposes. For example, each hotel could be associated with a number of customer reviews which, in turn, were associated with multiple uniquely identified words. In total, this database contains about 1.3 million word-review pairs, which suggests that on average one customer review contains about 22 unique words (counting each word only once regardless of how many times it occurred in a specific review).
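The hotel, review, and word associations described above can be sketched in a few lines of Python. This is only a minimal illustration, not the study's Microsoft Access schema: the sample reviews, the identifiers, and the function name `build_word_review_pairs` are hypothetical, and tokenization is reduced to lowercase runs of letters.

```python
import re
from collections import Counter


def build_word_review_pairs(reviews):
    """Return the set of (word, review_id) pairs.

    `reviews` maps a review id to a (hotel_id, text) tuple. Each word is
    counted only once per review, however often it occurs in that review.
    """
    pairs = set()
    for review_id, (_hotel_id, text) in reviews.items():
        # Crude tokenizer (an assumption): lowercase runs of letters.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            pairs.add((word, review_id))
    return pairs


# Made-up sample data, for illustration only.
sample_reviews = {
    1: ("H1", "Clean room, friendly staff. The room was quiet."),
    2: ("H1", "Great location and clean bathroom."),
    3: ("H2", "Staff were helpful; breakfast was free."),
}

pairs = build_word_review_pairs(sample_reviews)
# Number of reviews in which each word appears at least once.
review_freq = Counter(word for word, _review_id in pairs)
# Average number of unique words per review.
avg_unique_words = len(pairs) / len(sample_reviews)
```

Dividing the number of pairs by the number of reviews gives the "unique words per review" figure quoted in the text; a set is used so that repeated words within one review collapse to a single pair.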
Data analysis followed a text analytics process which typically involves several steps including data pre-processing, domain identification/classification, and statistical association analysis. While statistical analysis aims to examine the associations between the identified domain-related words and the dependent variable (i.e., hotel guest satisfaction in this case), the first two steps, i.e., data pre-processing and domain identification, are critical for establishing content validity with the focus on extracting conceptually relevant linguistic entities (words) from the corpus. Typical textual data pre-processing involves a series of operations such as stemming (i.e., coding several forms of a linguistic entity into a ‘rudimentary’ form which represents the same meaning), misspelling identification, and identification and removal of stop words such as certain pronouns, adverbs, and conjunctions. Domain identification aims to separate guest experience-related words from non-guest experience-related ones. Normally, data pre-processing and domain identification are conducted in separate steps because they serve distinct purposes. However, since to our knowledge there was no readily available “dictionary” that describes hotel guest experience, these operations were conducted manually and simultaneously through an iterative process. Considering the sheer size of the word bank, this was a tedious and labor-intensive process. For example, there were a large number of variations for a word like “restaurant” with its different forms (e.g., singular and plural) and many misspellings.
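As a rough sketch of the pre-processing and domain identification steps, the snippet below maps word variants to a canonical form, removes stop words, and keeps only domain words. The study did this work manually and iteratively, so the `VARIANTS`, `STOP_WORDS`, and `DOMAIN_WORDS` tables here are hypothetical stand-ins rather than the study's actual dictionary.

```python
import re

# Hypothetical hand-built mapping from observed variants (plurals and
# misspellings) to one canonical form.
VARIANTS = {
    "restaurants": "restaurant",
    "resturant": "restaurant",
    "restaraunt": "restaurant",
    "rooms": "room",
}
# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "it", "very", "we"}
# Hypothetical guest-experience dictionary used for domain identification.
DOMAIN_WORDS = {"restaurant", "room", "clean"}


def normalize(text):
    """Lowercase, tokenize, drop stop words, and canonicalize variants."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [VARIANTS.get(t, t) for t in tokens if t not in STOP_WORDS]


def extract_domain_words(text):
    """Keep only normalized tokens that belong to the domain dictionary."""
    return [t for t in normalize(text) if t in DOMAIN_WORDS]


review = "The resturant was great and the rooms were clean"
print(normalize(review))             # ['restaurant', 'great', 'room', 'clean']
print(extract_domain_words(review))  # ['restaurant', 'room', 'clean']
```

A lookup table is used instead of an automatic stemmer because the text describes a manual process; an off-the-shelf stemmer would not catch misspellings such as "resturant".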
Word | N | N/Hotel | Word | N | N/Hotel | Word | N | N/Hotel | Word | N | N/Hotel |
---|---|---|---|---|---|---|---|---|---|---|---|
Room | 5641 | 10.7 | Downtown | 676 | 1.3 | Lobby | 357 | 0.7 | Experience | 240 | 0.5 |
Clean | 3104 | 5.9 | Airport | 620 | 1.2 | Internet | 344 | 0.7 | Suite | 236 | 0.4 |
Staff | 2898 | 5.5 | Desk | 609 | 1.2 | Trip | 328 | 0.6 | Money | 233 | 0.4 |
Location | 2865 | 5.4 | View | 569 | 1.1 | Pay | 320 | 0.6 | Carpet | 233 | 0.4 |
Comfortable | 2168 | 4.1 | Recommend | 532 | 1.0 | Door | 317 | 0.6 | Courteous | 233 | 0.4 |
Service | 1707 | 3.2 | Noise | 493 | 0.9 | Shops | 316 | 0.6 | City | 231 | 0.4 |
Friendly | 1614 | 3.1 | Quiet | 486 | 0.9 | Sleep | 303 | 0.6 | Expensive | 223 | 0.4 |
Close | 1594 | 3.0 | Food | 468 | 0.9 | Business | 301 | 0.6 | Dirty | 221 | 0.4 |
Breakfast | 1524 | 2.9 | Distance | 464 | 0.9 | Complaint | 299 | 0.6 | Renovated | 219 | 0.4 |
Helpful | 1378 | 2.6 | Shuttle | 447 | 0.8 | Shower | 296 | 0.6 | Tub | 217 | 0.4 |
Bed | 1334 | 2.5 | Street | 429 | 0.8 | Family | 294 | 0.6 | Safe | 216 | 0.4 |
Price | 1321 | 2.5 | Shopping | 419 | 0.8 | Value | 290 | 0.5 | Far | 214 | 0.4 |
Restaurants | 1153 | 2.2 | Maintained | 417 | 0.8 | Cheap | 288 | 0.5 | Air | 213 | 0.4 |
Walking | 1011 | 1.9 | Beach | 398 | 0.8 | Smelled | 284 | 0.5 | Refrigerator | 205 | 0.4 |
Area | 863 | 1.6 | Access | 398 | 0.8 | Kids | 258 | 0.5 | Quality | 203 | 0.4 |
Parking | 802 | 1.5 | Park | 385 | 0.7 | Tv | 256 | 0.5 | Decor | 201 | 0.4 |
Bathroom | 764 | 1.4 | Floor | 373 | 0.7 | Attractions | 248 | 0.5 | Wait | 200 | 0.4 |
Pool | 716 | 1.4 | Check in | 369 | 0.7 | Water | 247 | 0.5 | Freeway | 198 | 0.4 |
Free | 712 | 1.3 | Spacious | 365 | 0.7 | Coffee | 244 | 0.5 | Elevator | 196 | 0.4 |
Convenient | 708 | 1.3 | Bar | 358 | 0.7 | Amenities | 244 | 0.5 | Accommodation | 114 | 0.2 |
“clean”, “comfortable”, “maintained”, “safe”, “smelled”, “value”, and “cheap”; 7) travel context such as “business” and travel party such as “family”, “kids”, and “husband”; and, 8) possible actions such as “recommend”. Compared with the coding schema, this list does not reflect certain aspects of guest experience such as stay at the hotel due to word-of-mouth (recommendations), the departure stage (checkout) of service encounters, affective evaluation of the experience, as well as other possible actions after the stay, etc.
The frequency distribution of these 80 words is highly skewed, in that the top 12 words constitute more than half, and the top 25 words nearly 70%, of the total frequency of all words. This distribution can be characterized as one with a “head”, i.e., words with relatively high frequencies, and a “long tail”, i.e., those with low frequencies (with an average frequency per hotel of less than 1 starting from the 26th word). The “head” words center around the core and basic products/services as well as important attributes such as the guest room, cleanliness, staff, location, comfort, service, friendliness and helpfulness of staff, breakfast, bed, and price. The “long tail” words reflect other important areas of guest experience. Generally speaking, most of these words are functional and objective, while a handful of them represent guests’ subjective evaluation of their hotel experience. It is interesting to note that words denoting travel party (“family” in this case), food-related aspects such as breakfast, restaurants, bar, and even coffee, and activities guests can do outside of the hotel property, such as shopping and visits to the beach, are also relevant to guest experience. Overall these 80 words reflect a diverse array of amenities, attributes, and service encounters shaped by hotel guests’ unique expectations and evaluations at the aggregate level.
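The skew can be checked directly from the N column of the word-frequency table above. The short script below is only an illustrative check; the helper name `top_share` is hypothetical, and the frequencies are copied from the table.

```python
# Frequencies (N) of the 80 words, copied from the word-frequency table.
FREQS = [
    5641, 3104, 2898, 2865, 2168, 1707, 1614, 1594, 1524, 1378,
    1334, 1321, 1153, 1011, 863, 802, 764, 716, 712, 708,
    676, 620, 609, 569, 532, 493, 486, 468, 464, 447,
    429, 419, 417, 398, 398, 385, 373, 369, 365, 358,
    357, 344, 328, 320, 317, 316, 303, 301, 299, 296,
    294, 290, 288, 284, 258, 256, 248, 247, 244, 244,
    240, 236, 233, 233, 233, 231, 223, 221, 219, 217,
    216, 214, 213, 205, 203, 201, 200, 198, 196, 114,
]


def top_share(freqs, k):
    """Share of the total frequency accounted for by the k most frequent words."""
    ranked = sorted(freqs, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


share_12 = top_share(FREQS, 12)
share_25 = top_share(FREQS, 25)
print(f"top 12 words: {share_12:.1%}")
print(f"top 25 words: {share_25:.1%}")
```

With the tabulated values, the top 12 words account for roughly 51% of all occurrences and the top 25 for roughly 69%, in line with the "more than half" and "nearly 70%" figures in the text.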
Factor analysis was employed in order to examine the underlying semantic structure and further reduce the number of words from the data matrix into meaningful groupings of words that would be easier to interpret. As can be seen in
Words (N = 34) | F1 | F2 | F3 | F4 | F5 | F6 |
---|---|---|---|---|---|---|
Hybrid | | | | | | |
Clean (5.9)a | 0.436 | | | | | |
Smelled (0.5) | 0.423 | | | | | |
Dirty (0.4) | 0.395 | | | | | |
Price (2.5) | 0.369 | | | | | |
Cheap (0.5) | 0.354 | | | | | |
Carpet (0.4) | 0.349 | | | | | |
Sleep (0.6) | 0.323 | | | | | |
Expensive (0.4) | −0.313 | | | | | |
Shopping (0.8) | −0.326 | | | | | |
View (1.1) | −0.377 | | | | | |
Restaurants (2.2) | −0.387 | | | | | |
Distance (0.9) | −0.459 | | | | | |
Location (5.4) | −0.492 | | | | | |
Walking (1.9) | −0.496 | | | | | |
Deals | | | | | | |
Breakfast (2.9) | | 0.517 | | | | |
Airport (1.2) | | 0.433 | | | | |
Free (1.3) | | 0.435 | | | | |
Comfortable (4.1) | | 0.409 | | | | |
Shuttle (0.8) | | 0.393 | | | | |
Amenities | | | | | | |
Close (3.0) | | | 0.390 | | | |
Beach (0.8) | | | −0.366 | | | |
Pool (1.4) | | | −0.533 | | | |
Family friendliness | | | | | | |
Family (0.6) | | | | 0.509 | | |
Kids (0.5) | | | | 0.483 | | |
Attractions (0.5) | | | | 0.338 | | |
Suite (0.4) | | | | 0.313 | | |
Service (3.2) | | | | −0.338 | | |
Core product | | | | | | |
Room (10.7) | | | | | 0.552 | |
Bathroom (1.4) | | | | | 0.420 | |
Bed (2.5) | | | | | 0.322 | |
Spacious (0.7) | | | | | 0.302 | |
Staff | | | | | | |
Helpful (2.6) | | | | | | −0.462 |
Friendly (3.1) | | | | | | −0.511 |
Staff (5.5) | | | | | | −0.517 |
Eigenvalue | 4.55 | 3.65 | 3.07 | 2.66 | 2.30 | 2.05 |
Cumulative variance | 5.69% | 10.26% | 14.09% | 17.41% | 20.28% | 22.84% |
aIndicating average number of times this word occurred in a hotel’s customer reviews (based upon
perception of product (“cheap”). The second group of words, including “expensive”, “shopping”, “view”, “restaurants”, “distance”, “location”, and “walking”, seems to represent the experiential aspects of the hotel stay, particularly in words such as “shopping”, “restaurant”, “location”, “walking”, and “view”. What is revealing is that these two groups of words have the opposite signs in their loadings: loadings in the first group are all positive while in the second group all negative. This suggests that, in the semantic space that represents hotel guest experience, these two groups of words belong to two very different contexts of meaning. That is, when a consumer mentions the words in the first group, he/she is unlikely to use words in the second group to describe the experience. Behaviorally speaking, it seems the maintenance-related aspects are “blocking” the experiential aspects of the hotel stay in the guest’s mental model. In other words, the presence of any maintenance factors associated with “smelled”, “dirty”, “price”, “cheap”, “carpet”, and “sleep” may not add much to satisfaction but their absence will certainly detract from satisfaction.
The other five factors are quite straightforward to interpret. Factor 2 was named “Deals” because the word “free” occurred with “breakfast”, “airport”, and “shuttle”. The third factor, “Amenities”, consists of only three words, with “beach” and “pool” having a negative sign, suggesting that when customers mention the word “close”, they are unlikely to be referring to “beach” or “pool”. This implies that these two words tend to have a negative connotation when customers talk about convenience and access to amenities. The fourth factor, i.e., “Family Friendliness”, seems to suggest that, when customers share their story about staying at a hotel with their family members, their experience is likely to be linked with the need for a large room (“suite”) or attractions they want to visit. They are unlikely to talk or care about the hotel service. The fifth factor reflects the core product of a hotel, i.e., the guest room, bed, and bathroom. It is interesting to note that the word “spacious” is used within this context. Lastly, the sixth factor represents customers’ perception of hotel staff with words such as “helpful” and “friendly”. All three words have negative loadings on this factor, suggesting that, in general, there is a negative connotation to the context wherein customers mentioned their experience with hotel staff.
Overall these factors captured the salient aspects of hotel guest experience in that most of the primary words with high frequencies in customer reviews generated relatively high loadings on these factors. Some long-tail words, such as “shopping”, “distance”, “beach”, “spacious”, “sleep”, “family”, “kids”, “smelled”, “attractions”, “suite”, and “expensive”, also contributed to these factors. While some factors such as travel party (i.e., family in this case) seemed to be highly relevant to guest experience, other factors traditionally considered important, such as front desk services, did not have a significant impact on the semantic space representing hotel guest experience based on the customer reviews.
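The eigenvalue and cumulative-variance rows of the factor table can be related by a simple calculation: when factors are extracted from a correlation matrix, each factor's share of variance is its eigenvalue divided by the number of variables. The check below assumes the analysis was run on all 80 words, which is an assumption on our part since the table lists only the 34 words retained; the function name is hypothetical.

```python
# Eigenvalues and cumulative variance (%) as reported in the factor table.
EIGENVALUES = [4.55, 3.65, 3.07, 2.66, 2.30, 2.05]
REPORTED_CUMULATIVE = [5.69, 10.26, 14.09, 17.41, 20.28, 22.84]
N_VARIABLES = 80  # assumption: one variable per word in the 80-word list


def cumulative_variance(eigenvalues, n_variables):
    """Cumulative percentage of variance explained, factor by factor."""
    running_total = 0.0
    result = []
    for eigenvalue in eigenvalues:
        running_total += eigenvalue
        result.append(100 * running_total / n_variables)
    return result


computed = cumulative_variance(EIGENVALUES, N_VARIABLES)
for reported, value in zip(REPORTED_CUMULATIVE, computed):
    print(f"reported {reported:.2f}%  computed {value:.2f}%")
```

Under that assumption the computed values agree with the reported ones to within rounding of the eigenvalues, which also makes clear that the six factors together account for less than a quarter of the total variance.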
Model | Unstandardized B | Std. error | Standardized Beta | t | Sig. |
---|---|---|---|---|---|
(Constant) | 4.023 | 0.013 | | 298.410 | 0.000 |
Hybrid | −0.293 | 0.013 | −0.576 | −21.714 | 0.000 |
Deals | 0.258 | 0.013 | 0.506 | 19.086 | 0.000 |
Amenities | −0.015 | 0.013 | −0.029 | −1.076 | 0.282 |
Family friendliness | 0.076 | 0.013 | 0.149 | 5.606 | 0.000 |
Core product | 0.063 | 0.013 | 0.123 | 4.641 | 0.000 |
Staff | 0.044 | 0.013 | 0.086 | 3.242 | 0.001 |
Dependent variable: average customer rating; Adjusted R square: 0.629.
NOT being mentioned in the context of those words. Staff-related words are negatively loaded on to the factor Staff suggesting a high satisfaction rating is not likely associated with the mentions of words such as “helpful” and “friendly”.
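One way to read the regression table is through the standardized coefficients. If the six factor scores are mutually uncorrelated (an assumption here, as would hold for scores from an orthogonal rotation, which the text does not state), the unadjusted R square equals the sum of the squared standardized betas. The sketch below applies that identity to the reported betas; the variable names are illustrative.

```python
# Standardized coefficients (Beta) as reported in the regression table.
BETAS = {
    "Hybrid": -0.576,
    "Deals": 0.506,
    "Amenities": -0.029,
    "Family friendliness": 0.149,
    "Core product": 0.123,
    "Staff": 0.086,
}

# With uncorrelated, standardized predictors, R^2 is the sum of squared betas.
r_squared = sum(beta ** 2 for beta in BETAS.values())

# Rank factors by the size of their contribution to R^2.
ranked = sorted(BETAS, key=lambda name: BETAS[name] ** 2, reverse=True)

print(f"implied R^2: {r_squared:.3f}")
for name in ranked:
    print(f"{name}: {BETAS[name] ** 2:.3f}")
```

The implied value of about 0.633 sits just above the reported adjusted R square of 0.629, and the breakdown shows that Hybrid and Deals account for most of the explained variance.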
Hotel guest experience and satisfaction have been extensively studied in the hospitality management literature. Guest experience is, undoubtedly, an extremely complex construct. Depending upon the research design and methods, researchers could get very different pictures of what constitutes guest experience and what actually leads to guest satisfaction. Because conventional methods usually rely on a set of predefined hypotheses justified by the existing body of knowledge, research efforts are directed toward either confirming or disconfirming such hypotheses. However, this is not the case with big data analytics. Through the analytical process we as researchers let the data reveal patterns reflective of consumers’ reliving and evaluation of their actual experiences with products (hotels in this case). Then, we attempted to make sense of and attach meaning to the inferences by bringing in appropriate theories to shed light on and explain the novel patterns revealed in large data. Different from conventional methods, this way of explaining the findings is part of the epistemology of generating and creating knowledge using big data [
Compared to other text analytics approaches such as sentiment analysis, which generally aim to capture the subjective opinions of online consumers about certain products [
The dictionary identified for hotel guest experience reflects what consumers think are relevant and important that contribute to their (dis)satisfaction with a specific hotel [
In a similar way, a guest who stays with family members seems not to be interested in the service aspect of the hotel, only in a spacious room and nearby attractions. This indicates that, within the consumer’s complex mental model about the hotel experience, there are structures of “domains” that are mutually exclusive, or that one serves as the necessary condition for another. This also suggests that, because of the tangible aspects of maintenance factors, hotels should provide and develop appropriate service amenities and features, and maintain them at the expected performance level. Another important insight is the identification of the Family Friendliness factor, which shows that what the guest brings into the experience, i.e., the travel party, can be an important contributing factor to their satisfaction. In addition, some of the “long tail” words in all of these dimensions show that the underlying semantic structures in customer reviews could be more conceptually relevant than simply words with high frequencies [
Although guest satisfaction is not measured in the traditional sense, the association between satisfaction rating and guest experience appears to be strong. According to Lewis [
The author declares no conflicts of interest regarding the publication of this paper.
Ko, C.-H. (2018) Exploring Big Data Applied in the Hotel Guest Experience. Open Access Library Journal, 5: e4877. https://doi.org/10.4236/oalib.1104877