Open Access Library Journal
Vol.06 No.04(2019), Article ID:91957,16 pages
10.4236/oalib.1105330

Rate of Agreement between Database Users’ and Authors’ Keywords in SID and Magiran Databases and Its Effect on Information Retrieval

Fatemeh Motamedi1, Narges Khanjani2, Ali Talebian1, Samaneh Behzadifard3, Fahimeh Bakhtyari1*

1Medical Information Faculty Kerman University of Medical Sciences Kerman, Iran

2Department of Epidemiology and Biostatistics, Kerman University of Medical Sciences, Kerman, Iran

3Nursing and Midwifery Faculty Kerman University of Medical Sciences, Khorramabad, Iran

Copyright © 2019 by author(s) and Open Access Library Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: March 12, 2019; Accepted: April 20, 2019; Published: April 23, 2019

ABSTRACT

Aim: This study examined the rate of correspondence between users’ keywords and the keywords of authors of Latin articles published in the SID and Magiran databases, and its effect on information retrieval. Method: This cross-sectional study was conducted on PhD students and clinical residents. Latin articles indexed in both SID and Magiran were selected by random sampling, and the abstracts of these articles were given to each student, who was asked to assign keywords to each of them. Data were collected with a checklist and analyzed with non-parametric and parametric statistical tests in SPSS 22. Results: The results showed that 29.9 percent of the keywords were non-matching, with a highest frequency of three keywords, and 39.9 percent were exact matches, with a highest frequency of one keyword. There was a significant difference between the two databases, SID and Magiran, in retrieving relevant results (P = 0.011). Conclusion: Because relevant articles are lost on the one hand and irrelevant articles are retrieved on the other, and because retrieving the right information is important, appropriate policies should be developed for indexing and for using controlled language in these databases.

Subject Areas:

Library, Intelligence and Philology

Keywords:

Information Retrieval, Compliance, Keywords, Searching, Relevance, SID, Magiran, Medical Users, Authors of Research Papers

1. Introduction

Academic users find content in bibliographic databases when their search keywords match the keywords that the retrieval system or database uses for indexing. A mismatch between searched keywords and indexed keywords may result in retrieval failure. This can hinder scientific production, particularly in databases that provide full-text scientific articles. The effectiveness and value of these databases depend on providing services and appropriate approaches that enable users to search and to access journals quickly and easily [1]. The searchability of a full-text database requires indexing, and implementing it is one of the main processes in developing databases capable of guiding the researcher to the content of any scientific document [2].

Indexing serves as a means for information storage and retrieval; it is the practical way to convey the content of texts [3]. The indexing language is of utmost importance in indexing. According to Zhang, an indexing language is a set of index terms used in an index to represent the subject or features of a document, together with the rules for combining or using these terms [4]. There are several languages for indexing texts, including controlled, free and natural language. In fact, the terms used in indexing are extracted and selected either from a controlled vocabulary, such as the words in a thesaurus, or from natural language and free language (the author’s language).

Searching in a retrieval system, whether using natural language, controlled vocabulary or semi-natural language, will yield relevant results only if all or part of the keyword searched by the user matches all or part of the indexed keyword in the information retrieval system [5]. However, searchers in databases take different approaches to choosing keywords [6]. According to previous research, the keywords given by authors match the descriptors provided by indexers directly or indirectly, and problems arise when author and user keywords do not match [3]. In fact, one of the problems in information retrieval results from inappropriate keyword selection by authors and users. Authors tend to express their ideas in their own language or in specific terms, while users also rely on their own vocabulary to seek information. This can create numerous issues for users, such as unsuccessful searches and information overload [7], and ultimately failure in information retrieval. One of the main aspects of assessing a retrieval system is therefore the success rate of information retrieval by users in bibliographic databases that index scientific publications in prestigious scientific disciplines.

Several studies have been conducted in the field of compliance and matching of keywords searched by users of databases in English, some of which focused on controlled indexing language and thesauri. For instance, Nowkarizi and Dehghani (2010) intended to examine the consistency of keywords extracted from abstracts with indexing descriptors in the database of Iran’s Theses Abstracts [8]. This study was conducted through a checklist and using content analysis on a total of 100 theses between the years 1989 and 2006 in the areas of humanities, engineering and agriculture. The results showed there was a significant relationship between the number of keyword groups in theses abstracts and their matching with descriptors, as well as between the number of extracted keywords and the number of descriptors.

Naghaneh Esfahani et al. (2011) assessed the keyword matching of titles and abstracts of Persian and English theses with the Persian medical thesaurus and Medical Subject Headings (MeSH). This research was analytical and the population comprised a total of 2942 master’s and doctoral theses, of which 340 were selected as the sample. Data were collected through a researcher-made checklist. The results of this study showed a significant relationship between the matching of English and Persian keywords with the Persian medical thesaurus and MeSH keywords [9].

Morphy et al. (2003) carried out research on the use of controlled vocabulary among writers, scholars and indexers in four medical databases in the field of alternative medicine, on the assumption that the keywords they used were not consistent. Research data included the frequency of MeSH terms, descriptors and keywords used by the authors in their article titles and abstracts, collected according to the available standard methods from the Medline, MANTIS (Manual, Alternative and Natural Therapy Index System), CINAHL (Cumulative Index to Nursing and Allied Health Literature) and Web of Science databases. After analysis of the data, the findings showed that writers and researchers never used many of the terms in the thesauri. Finally, this problem was addressed by suggesting a standard terminology for authors and researchers to use when writing keywords and abstracts [10].

A study by Gil-Leiva and Alonso-Arroyo (2007) focused on the matching between keywords of scientific authors and descriptors assigned by indexers. The study population consisted of 640 articles from the CAB Abstracts (Commonwealth Agricultural Bureaux), LISA (Library and Information Science Abstracts) and INSPEC databases. After data analysis, the results showed that 25 percent of the keywords assigned by authors fully matched the descriptors and 21 percent matched them partially. Taking partial and full matching together, there was a 46 percent concordance between the keywords and descriptors [3].

Kipp (2011) compared and evaluated the framework of online indexing from the point of view of three different groups: users, authors and professional indexers. Data required for this study were collected and analyzed through user tags, keywords from authors of scientific papers published in academic journals that were indexed in PubMed and descriptors selected by professional indexers for these papers. The results showed that although some tags from users and keywords from authors were matching with descriptors assigned by indexers, the other terms without matching had a broad impact on indexing vocabulary [11].

In this regard, the present study intended to examine the matching of keywords searched by medical students with keywords designated by authors in medical bibliographic databases, and to evaluate its impact on information retrieval using the relevant measures. SID (Scientific Information Database) and Magiran have been identified as the most important accessible Iranian databases, and each of them is a single database covering scientific-research publications in several areas, including medicine and related sciences. In addition, indexing in both of them is based on free language indexing (keywords allocated by the authors) [8]. For these reasons, the current comparative study was conducted on these two databases. What distinguishes this study from previous studies is that its results may be an effective step towards improving the accessibility of information resources related to users’ queries based on synchronization words. The main objective of this study was to determine the compliance of medical keywords searched by users with keywords assigned by authors in the SID and Magiran databases, and its impact on information retrieval.

2. Methodology

This was a cross-sectional descriptive-analytical study that aimed to compare the keywords assigned by authors to their papers with the keywords used by students (PhD students and clinical residents) in their searches, in terms of number, grammatical system and keyword matching. At the next stage, to examine the impact of exact matching, relative matching and lack of matching on information retrieval in the SID and Magiran databases, the level of relevance was also assessed.

Using a census approach, all PhD students and clinical residents who were studying at Kerman University of Medical Sciences at the time of the survey were considered for this study. They consisted of 138 PhD students and 287 clinical residents. Among them, 83 clinical residents and 51 PhD students agreed to participate in this study.

The article selection was from the common indexed journals in the databases of SID and Magiran, which included 107 Latin and 92 Persian journals out of which 26 Persian journals and 32 Latin journals were selected randomly. Then, from the selected journals, preferably from their latest issues, the abstracts were selected randomly.

The article abstracts were used to evaluate the keyword matching of the authors and the students under the study, because abstracts play an important role in information retrieval as well as in improving the index. Moreover, the words of an abstract in new subjects were an integral part of the search purpose [12].

To obtain the common journals between SID and Magiran databases, a list of indexed medical journal titles was extracted from these two databases. The publishing language of some journals had been changed from Persian into Latin, but the previous issues in Persian were still available. Accordingly, the common publications between these two databases were 107 Latin publications and 92 Persian publications. In order to eliminate the impact of article subject on the keywords presented, the articles were selected separately based on subjects related to the students’ fields of study. That is, the titles of journals belonging to each of the two groups of students, i.e. PhD students and clinical residents, were separated. Then the abstracts were extracted from articles that were selected randomly from those selected journals. These articles were preferably taken from the latest issue of each journal.

Since any pilot study generally examines a variable 3 to 5 times, four articles including two Persian and two Latin articles were supplied equally for each of the study fields of medicine and were presented to the study population. The reason for equality in the number of articles in Persian and Latin is the same value of these two languages regarding the research objectives.

In general, 22 Latin abstracts and 18 Persian abstracts were extracted for clinical residents in 11 fields of study, and a total of 18 Persian and Latin abstracts were extracted for PhD students in 9 fields.

After collecting the questionnaires containing article abstracts, the keywords specified by the participants were entered into Excel, in which the control checklist had already been entered. According to the criteria in this checklist, the keywords from authors were compared with the keywords from participants.
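The comparison step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the checklist the study used: the function names are hypothetical, an "exact" match is taken to mean identical text after lower-casing, and a "relative" match is assumed to mean that the user keyword shares at least one word with some author keyword. The checklist criteria in the study may have been defined differently.

```python
def normalize(keyword):
    """Lower-case a keyword and collapse runs of whitespace."""
    return " ".join(keyword.lower().split())


def classify_match(user_keyword, author_keywords):
    """Return 'exact', 'relative' or 'none' for one user keyword.

    Assumption: 'relative' means at least one shared word with some
    author keyword; this definition is illustrative only.
    """
    user = normalize(user_keyword)
    authors = [normalize(k) for k in author_keywords]
    if user in authors:
        return "exact"
    user_words = set(user.split())
    if any(user_words & set(a.split()) for a in authors):
        return "relative"
    return "none"


def tally_matches(user_keywords, author_keywords):
    """Count how many user keywords fall into each matching category."""
    counts = {"exact": 0, "relative": 0, "none": 0}
    for kw in user_keywords:
        counts[classify_match(kw, author_keywords)] += 1
    return counts


# Made-up example data, not taken from the study.
author_kw = ["Hepatitis C Virus", "Liver Cirrhosis", "Prevalence"]
user_kw = ["prevalence", "hepatitis", "HCV", "Iran"]
print(tally_matches(user_kw, author_kw))
```

In this made-up example the acronym "HCV" counts as a non-match even though it refers to the same concept as an author keyword, which is the kind of mismatch the Discussion returns to.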

Finally, through assessing relevance of retrieved information from keywords by exact agreement and non-agreement, the researcher evaluated and compared the impact of keyword matching on information retrieval in the two databases of SID and Magiran. For this purpose, a number of subject specialists in each of the study fields of PhD students and Clinical Residents were asked to classify the retrieved results based on the three categories of full relevance, relative relevance and irrelevance.

Data Analysis Method

In the first stage, all the keyword and population data were entered into SPSS 22. In the next step, the nonparametric Mann-Whitney U test was used to compare the keywords from authors and users in terms of number and grammatical system, as well as to compare the effect of keyword matching on information retrieval in the SID and Magiran databases. In addition, a t-test was used to compare the number of keywords and the grammatical system between the two user groups (PhD students and clinical residents), as well as to compare their keyword matching.
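For readers unfamiliar with the Mann-Whitney U test, the following is a minimal pure-Python sketch of how the statistic can be computed from two samples of keyword counts. It is not the SPSS procedure used in the study: the function names are hypothetical, the sample data are invented, and the p-value uses a normal approximation with tie correction and no continuity correction, so it may differ slightly from what a statistics package reports.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b, two-sided p) using a normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    combined = list(sample_a) + list(sample_b)
    n = n_a + n_b
    ranks = average_ranks(combined)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_a, u_b, 1.0  # all observations identical
    z = (u_a - n_a * n_b / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_a, u_b, p


# Invented keyword counts per abstract, for illustration only.
author_counts = [4, 4, 5, 3, 4]
user_counts = [3, 2, 3, 3, 2]
print(mann_whitney_u(author_counts, user_counts))
```

The test works on ranks rather than raw values, which is why it suits count data that are not normally distributed.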

3. Results

Given the non-normality of the variables under study, the nonparametric Mann-Whitney U test at a significance level of 0.05 was used to compare the keywords of authors and users in terms of number and grammatical structure. According to the data in Table 1, there was a significant difference between authors and users in terms of the number of keywords. In fact, at the 50th percentile, the authors of both Persian and Latin articles assigned 4 keywords to their papers, whereas users specified 3 keywords for each article abstract in either Persian or Latin.

The research findings also showed that there was a significant difference between authors and users in terms of using singular keywords in both Persian and Latin abstracts. Moreover, 50% of authors used 4 singular keywords in Persian articles and 3 singular keywords in Latin articles, whereas 50% of the users assigned 4 singular keywords to Persian article abstracts and 2 singular keywords to Latin ones. There was no significant difference between authors and users in using plural Latin keywords (p = 0.113). In contrast, a significant difference was found in using plural Persian keywords (p < 0.001). In addition, there was a significant difference between authors and users (p < 0.001) in the use of simple and compound keywords in both Persian and Latin articles (Table 1).

The frequency, mean and standard deviation of the number of Latin and Persian keywords assigned by the clinical residents and PhD students for the three kinds of matching (exact, relative and lack of matching) can be seen in Table 2. As shown there, the highest frequency of keywords with exact matching in Latin and Persian was 1 keyword, while for relative matching it was zero and for lack of matching it was 3 keywords in both Persian and Latin articles (Table 2).

Table 1. Keyword matching between authors and students under study in terms of the number and grammatical structure in SID and Magiran.

Table 2. Frequency of Latin keywords in terms of matching (exact, relative and lack of matching).

In order to compare the matching (exact, relative, total of exact and relative, and lack of matching) of the keywords assigned by PhD students and clinical residents with the keywords assigned by authors, a t-test was used (Table 3). As can be seen in this table, there was no significant difference between the two groups in the matching of keywords for Persian abstracts, whereas there was a significant difference between PhD students and clinical residents (p = 0.032) for the Latin article abstracts. The mean and standard deviation of Latin keywords with lack of matching was 1.94 ± 1.02 for PhD students and 3.14 ± 1.50 for clinical residents, while the corresponding values for the Persian keywords were 0.34 ± 0.91 and 3.14 ± 1.50, respectively (p < 0.001) (Table 3).

To compare the effects of the matching between author keywords and the keywords searched by users on information retrieval in SID and Magiran, the nonparametric Mann-Whitney U test was used.

The data in Table 4 indicated that there was no significant difference between the two databases in terms of full match for the Persian keywords; 50% of the articles were retrieved in both databases (p = 0.146). There was, however, a significant difference for Latin keywords between the two databases (p = 0.004). In retrieving highly relevant articles, there was a significant difference for Persian as well as Latin keywords. Thus, for 50 percent of Latin keywords nothing was retrieved in SID, while Magiran yielded at least one case (p = 0.011).

In retrieving Persian articles with relative relevance, there was no significant difference between the two databases (p = 0.48), while there was a significant difference (p = 0.001) in retrieval of Persian articles with relative relevance. There was no significant difference in the retrieval of irrelevant Latin documents, i.e. 9 irrelevant documents were retrieved in 25% of cases in both databases, while in 50 percent of cases the search for Persian articles yielded 2 irrelevant articles in SID and 4 irrelevant articles in Magiran (Table 4).

According to the data presented in Table 5, there was no significant difference between the two databases in the number of retrieved results with mismatched keywords by users. There was a significant difference in the retrieval of relevant articles, so that in 25 percent of Latin retrieved documents from SID, there were no relevant documents, while there was at least one relevant document retrieved from Magiran. As for retrieving documents with relative relevance and irrelevance, there was no significant difference between the two databases (Table 5).

Table 3. Comparing the level of matching (exact, relative, a total of exact and relative and lack of matching) keywords of PhD students and clinical residents with keywords assigned by authors.

Table 4. Comparing the effects of exact match of keywords used by authors with keywords searched by clinical residents and PhD students on information retrieval in SID and Magiran databases.

Table 5. Comparing the effects of mismatching for keywords from article users on information retrieval in each of the SID and Magiran databases.

4. Discussion

According to the findings, 50% of users assigned 3 keywords to articles in Persian and Latin, while 50 percent of authors of Latin and Persian articles assigned 4 keywords to their own articles. In this respect, the results were consistent with those obtained by Heckner, Mühlbacher et al. (2008), in that the number of keywords used by users was two-thirds of the keywords assigned by the authors, i.e. the average number of author-assigned keywords was 6 versus an average of 2 user keywords [13]. The findings were also consistent with the results of the study conducted by Nowkarizi and Dehghani (2011), in which there was a significant difference between the number of descriptors extracted by the authors and the number of keywords designated by the users [8].

This difference may be because the author of an article is fluent and expert in the subject, while the user, based on his/her own opinion, may assign fewer keywords to a given content. This topic requires further investigation in the field of “relevance” and is not within the scope of the present discussion.

As for the use of singular keywords in Persian and Latin articles, there was a significant difference between the authors and users. According to the findings, authors used singular keywords twice as much as users did. Given the previous findings, this difference is perhaps mainly because the number of keywords designated by users is lower than the number designated by authors. Moreover, the authors of the Persian articles used singular keywords more frequently than the authors of the Latin articles. The findings of this study confirm the results obtained by Nowkarizi and Dehghani, who argued that singular keywords tend to be used for indexing more in Persian than in English.

In contrast, in terms of the use of plural keywords there was no significant difference between users and authors. This indicated that both Persian and Latin authors used plural keywords in their articles less than singular keywords. Nowkarizi and Dehghani concluded in their research that the system descriptors (93.2% singular and 6.8% plural words) matched the user keywords (78.65% singular and 21.35% plural words). This was confirmed by the findings of this study, i.e. the use of singular keywords was more common than the use of plural ones. It seems that users and authors differ in how they understand the syntax and use singular and plural keywords.

The findings also showed that there was a significant difference between authors and users in terms of using simple keywords (p < 0.001): at the 50th percentile, the authors of both Persian and English articles used one simple keyword, whereas users applied only one simple keyword for English articles and none for Persian articles. However, the use of this type of keyword was less common. That is because a simple keyword, whether used by users or by authors, may carry less significance and conceptual meaning in a database that deals with a huge volume of articles.

The findings of this study indicated that the users of Persian articles were reluctant to use simple keywords for a search. Perhaps this is due to the difference between the semantic structures of the Persian and English languages. Given the significant difference between authors and users in applying compound keywords, this type of keyword was used more frequently than the other types. Furthermore, these findings indicated that the use of compound keywords by Persian authors was one and a half times more in English articles, and the use of compound keywords by users in Persian articles was twice as much in English articles.

According to the research conducted by Esfahani et al. (2010), the use of compound and simple or single-word combinations in Persian and Latin keywords was more popular, and the structure of the keywords tended more towards compound and simple or single-word combinations. Hence, the results of their study are consistent with those obtained in this study. In controlled vocabularies such as the Medical Subject Headings (MeSH), this type of grammatical structure is used more frequently. This may be because it carries a greater semantic load. Nowkarizi and Dehghani (2011) concluded that there was an almost identical ratio between descriptors of compound words (26.50) and single-words (74.49) [8]. These results were consistent with the results of the present study in terms of the same indicators.

According to the findings on matching the Latin keywords assigned by the authors and the two user groups, only one keyword was found as the highest frequency in the case of exact matching vs. three keywords as the highest frequency in the case of mismatching. Accordingly, most keywords determined by the two groups of users were mismatching. Based on these findings it seems that users failed to access the indexed documents in the databases, due to the incompatibility between their searched keywords and author keywords, as indexing descriptors. On the other hand, this finding implies that there is more exact matching in the keywords from Persian authors than those from English authors. Also, in the case of mismatching, the highest frequency belonged to keywords from English authors. It can be concluded that matching in Persian articles was more ideal than Latin articles.

The high rate of mismatch between the keywords searched by users and the keywords assigned by authors can have several causes. For example, the author may use a plural keyword while the user applies the same keyword in singular form, or the user may search with an acronym such as “HCV” instead of the full term “Hepatitis C Virus” used by the author. This was more common among the clinical residents.
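The two causes named above, singular versus plural forms and acronyms versus full terms, can be illustrated with a small normalization step applied before keywords are compared. The sketch below is hypothetical and is not a feature of SID or Magiran: the acronym table is invented for the example, and the plural folding is a deliberately crude English-only rule that will mishandle some words, where a real system would more likely rely on a controlled vocabulary or a proper stemmer.

```python
# Hypothetical acronym table, invented for this example only.
ACRONYMS = {
    "hcv": "hepatitis c virus",
    "hiv": "human immunodeficiency virus",
}


def singularize(word):
    """Crude English plural folding; an assumption, not a full stemmer."""
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def canonical(keyword):
    """Map a keyword to a canonical form so that variants compare equal."""
    text = " ".join(keyword.lower().split())
    text = ACRONYMS.get(text, text)  # expand a known acronym to its full term
    return " ".join(singularize(w) for w in text.split())


# Variants that would fail a literal comparison now agree.
print(canonical("HCV") == canonical("Hepatitis C Virus"))
print(canonical("Vaccines") == canonical("vaccine"))
```

With such a step in place, a search for the acronym and a search for the full term would resolve to the same index entry, which is one way the simultaneous use of natural language and controlled vocabulary could reduce mismatches.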

Heckner (2008) found that keywords used in plural form by authors had been used in singular form by users [13]. Because of this discrepancy, more appropriate approaches are recommended, including the simultaneous use of natural language and controlled vocabulary by the databases, in order to bring the searched keywords and the indexing language into closer agreement. Moreover, the average level of exact matching was more than one keyword for PhD students, while it was less than one keyword for clinical residents. As previously mentioned, this difference suggests that the PhD students were probably more proficient in the given subject, hence their searches could be more successful.

However, according to previous studies, familiarity with the subject matter does not necessarily result in a remarkable improvement in matching between the user keywords and the system indexing language [14]. That is because users in both groups may have mastered their own scientific fields, yet their technical vocabulary is limited, and hence they fail to perform a successful search to access their desired information in a retrieval system.

The results of Nowkarizi and Dehghani (2012) are consistent with the current study, because they found a significant difference between the mean number of keywords in the field of basic sciences and in the other fields, except the technical and engineering field. It seems that more specific and varied keywords are selected in basic sciences than in the other areas [8]. Similarly, in this study PhD students, due to the level and nature of their education, had a wider vocabulary and used more general terms than clinical residents. Clinical residents, in contrast, had a smaller vocabulary but used terms that were more specific (such as “HIV”) than the system language. Their use of abbreviations may be due to the limited time they have to search through databases. On the other hand, since such keywords are of little use in the retrieval system, the user fails to retrieve his/her intended document even though it has been indexed in the database.

Moreover, the findings of this study based upon the mismatching of users’ queries with the system language confirmed the results of a research conducted by Salaba (2005) and Carlisle (1989), i.e. a large proportion of users’ searched queries or keywords are non-compatible with technical terms [14] [15].

At the next stage, the two groups of users were compared in terms of matching level, in order to find out that the difference between the Persian and Latin keywords belongs to which group users in this study. The average number of mismatching keywords among PhD students was lower than clinical residents. This finding indicates that lower exact matching in clinical residents generally led to lower matching between the authors and users. As noted earlier, clinical residents tend to use more specific keywords in their searches that was hardly consistent with the system language (that can be keywords by authors). Hence, they become confused in searching the retrieval information system, and finally fail to retrieve the desired documents.

Since evaluating the first 10 results from searching the Latin keywords of the authors and the users showed the exact matching, there was also a significant difference in terms of number in SID and Magiran databases (p < 0.004), i.e. the number of retrieved results and also the number of relevant retrieved results in Magiran was more than SID. While there was no significant difference between the two databases in terms of the number of irrelevant retrieved results and in both databases, searches with 75% of keywords yielded 9 irrelevant documents.

Hence, to the researcher’s surprise, as mentioned earlier, despite the fact that indexing in the two databases is based on keywords by authors, search by exact keywords used by authors had little impact on the retrieval of relevant documents. In both databases in the 50th percentile, neither article were retrieved in Persian nor in Latin; even in some cases, the intended article failed to be retrieved by the author keywords. SID database has a keyword search field, i.e. if the user decides to search the database through any of the author’s keywords, there is a possibility to retrieve his/her desired document. According to the findings, it seems if the indexing methods in these two databases improve, more relevant information will be retrieved.

However, according to the results the indexing method that is now used in SID and Magiran has been challenged and it is recommended that a standard method be used in indexing. Despite the significant difference between the two databases for retrieving relevant results, the number of these types of results is too few and one cannot declare with certainty about the better performance of Magiran compared with SID. It is due to the fact that both these databases follow the same indexing method. This difference may be due to poor function of the SID database. There were no studies proving or rejecting this finding.

In our study retrieval of relevant results from searching with Latin keywords was in a more ideal situation than Persian keywords. This matter perhaps arises from the fact that if some authors decide to publish their papers in Latin publications that are indexed in databases such as PubMed or ISI, they are required to use a controlled vocabulary (like medical terminology/MeSH) to provide keywords for their manuscripts. This matter resulted in more relevant results if Latin keywords were used for searching. By the way, it should be noted that the results of some studies showed that such authors did not apply valid words (controlled vocabulary) for writing abstracts. Thus comparing controlled vocabulary and keywords by users would require another in-depth study [10].

There was no significant difference in retrieving relevant results through the use of mismatched keywords. The results also indicated that the number of retrieval in the two databases through the use of Latin keywords has been identical. As the results showed, searching with user and author keywords generally did not result in relevant results. While, Kipp (2011) argued that author keywords and user tags can be valuable in updating the indexing systems and construction of emerging vocabulary. Moreover, the keywords in title and abstract can be useful in successful searching and information retrieval. According to the findings, there was a significant difference between the Persian and Latin keywords on the rate of retrieval. Based on a study [3], 25% of author keywords were an exact match with descriptors and in total, 46% of keywords were of exact and relative matching.

Thus, author keywords are considered an important source of descriptors for indexers in databases. On the other hand, although tags have low value on their own, they could be significant access points alongside the terms applied by indexers [16]. On the basis of the results of this study and other research, the keywords used by authors and users can provide potential access points for retrieving the desired information in databases, so that together they can be an important source for retrieving relevant results in both SID and Magiran. Moreover, given the seemingly identical performance of users in choosing Latin and Persian keywords, it is recommended that the databases revise their indexing policy to solve such problems and to benefit from the natural language of users in addition to author keywords.
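
The exact, relative and mismatch categories discussed above can be illustrated with a short sketch. It is not the procedure used in this study: the function names are hypothetical, and the rule that a "relative" match means sharing at least one word with an author keyword is an assumption made only for illustration.

```python
from collections import Counter


def normalize(keyword: str) -> str:
    """Lower-case a keyword and collapse repeated whitespace."""
    return " ".join(keyword.lower().split())


def classify_match(user_keyword: str, author_keywords: list[str]) -> str:
    """Label one user keyword as 'exact', 'relative' or 'mismatch'.

    Assumption (not taken from the paper): a 'relative' match is a user
    keyword that shares at least one word with some author keyword.
    """
    user = normalize(user_keyword)
    authors = [normalize(k) for k in author_keywords]
    if user in authors:
        return "exact"
    user_words = set(user.split())
    if any(user_words & set(a.split()) for a in authors):
        return "relative"
    return "mismatch"


def matching_rates(user_keywords: list[str], author_keywords: list[str]) -> dict[str, float]:
    """Return the share of user keywords that falls in each category."""
    counts = Counter(classify_match(k, author_keywords) for k in user_keywords)
    total = len(user_keywords)
    return {
        label: (counts[label] / total if total else 0.0)
        for label in ("exact", "relative", "mismatch")
    }


# Hypothetical keywords for one article, invented for this example.
author_kws = ["information retrieval", "indexing", "controlled vocabulary"]
user_kws = ["Information Retrieval", "keyword indexing", "databases"]
print(matching_rates(user_kws, author_kws))
```

A stricter or looser definition of relative matching (for example, stemming or synonym lookup in a thesaurus) would change the rates, so any such rule should be stated alongside the reported percentages.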

5. Conclusions

The results showed a significant difference between the keywords chosen by users in the medical field and the author keywords assigned to articles in terms of matching rate, number and syntax. Furthermore, SID and Magiran differed significantly in the impact of the matching rate on information retrieval. Author and user keywords differed significantly in number and syntax, and there was also a significant difference between the keywords of PhD students and those of clinical residents.

The results revealed that both authors and user groups were reluctant to apply compound keywords, while the use of simple and compound keywords was more common. As for the matching rate, however, the number of mismatching keywords, whether Latin or Persian, was twice the number of exact and relative matches. Comparing the matching rate between the two user groups revealed a significant difference: according to the data analysis, PhD students performed better than clinical residents. Nevertheless, searching with author and user keywords in the SID and Magiran databases did not retrieve the desired results.

Although the two databases differed significantly in retrieving relevant results through author and user keywords, the number of relevant results retrieved was very low: only one or two relevant results were retrieved. Moreover, the findings revealed no notable difference between Latin and Persian keywords in retrieving relevant results. Given that, according to this study, 50% of users in medical fields rely on the SID and Magiran databases, it can be inferred that the use of a controlled vocabulary for both languages is unavoidable.

The indexing system in these two databases is based on the authors’ keywords, which are in turn determined by journal policies. It is therefore recommended that all journals ask authors to select their keywords from a controlled vocabulary and, as suggested in some studies [10], to write the abstracts of their articles using terms from that vocabulary.

Acknowledgements

The authors would like to express their gratitude to librarians of the Management and Medical Information Faculty as well as nurses of hospitals affiliated to Kerman University of Medical Sciences.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Motamedi, F., Khanjani, N., Talebian, A., Behzadifard, S. and Bakhtyari, F. (2019) Rate of Agreement between Database Users’ and Authors’ Keywords in SID and Magiran Databases and Its Effect on Information Retrieval. Open Access Library Journal, 6: e5330. https://doi.org/10.4236/oalib.1105330

References

  1. Entezarian, N. and Fatahi, R. (2009) A Study on Perception of Users about the Interface of Databases Based on Nilson’s Model (Comparing Electronic Article Databases at the Technology and Scientific Center of Iran). Journal of Library and Information Science, 12, 11-31.

  2. Banieghbal, N., Khosravi, F. and Mirhadi, E. (2010) Comparing Subject Keywords and Abstracts of Dissertations and the Descriptors Set in the National Library and Archives Organization Index. Iranian Studies Librarian Journal, 86, 135-146.

  3. Gil-Leiva, I. and Alonso-Arroyo, A. (2007) Keywords Given by Authors of Scientific Articles in Database Descriptors. Journal of the American Society for Information Science and Technology, 58, 1175-1187. https://doi.org/10.1002/asi.20595

  4. Zhang, X. (2006) Concept Integration of Document Databases Using Different Indexing Languages. Information Processing & Management, 42, 121-135.
    https://doi.org/10.1016/j.ipm.2004.09.003

  5. Smith, E.H. (1991) Enhancing Subject Accessibility to the Online Catalog. Library Resources and Technical Services, 35, 109-113.

  6. Gault, L.V., Shultz, M. and Davies, K.J. (2002) Variations in Medical Subject Headings (MeSH) Mapping: From the Natural Language of Patron Terms to the Controlled Vocabulary of Mapped Lists.

  7. Fatahi, R. and Nikzaman, A. (2011) Analysis of OPAC Searches in Terms of Type and How Consistent They Are with Persian Subject Headings (Data Registration and Retrieval). Journal of Iranian Research Institute for Information Science and Technology, 28, 271-251.

  8. Nowkarizi, M. and Dehghani, K. (2010) Matching Level of Keywords with Descriptors Extracted from Abstract Indexers. Research Information and Public Libraries, 3, 449-477.

  9. NaghnehEsfahani, M., CheshmehSohrabi, M. and Banieghbal, N. (2012) Comparative Study on Keywords Theses University of Medical Sciences, Isfahan Persian and English and Persian Medical Thesaurus Medical Subject Headings (MeSH). Health Information Management, 9, 803-813.

  10. Murphy, L.S., Reinsch, S., Najm, W.I., Dickerson, V.M., Seffinger, M.A., Adams, A., et al. (2003) Searching Biomedical Databases on Complementary Medicine: The Use of Controlled Vocabulary among Authors, Indexers and Investigators. BMC Complementary and Alternative Medicine, 3, 1. https://doi.org/10.1186/1472-6882-3-3

  11. Kipp, M.E. (2006) Complementary or Discrete Contexts in Online Indexing: A Comparison of User, Creator and Intermediary Keywords.

  12. Pao, M.L. (1989) Concepts of Information Retrieval. Libraries Unlimited, Englewood.

  13. Heckner, M., Mühlbacher, S. and Wolff, C. (2008) Tagging Tagging. Analysing User Keywords in Scientific Bibliography Management Systems. Journal of Digital Information, 9.

  14. Salaba, A. (2005) Term Selection Process in Subject Searching: End-User Interactions with Information Retrieval Systems and Indexing Languages. University of Wisconsin, Madison.

  15. Carlyle, A. (1989) Matching LCSH and User Vocabulary in the Library Catalog. Cataloging & Classification Quarterly, 10, 37-63.
    https://doi.org/10.1300/J104v10n01_04

  16. Kipp, M.E. and Campbell, D.G. (2006) Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices. Proceedings of the American Society for Information Science and Technology, 43, 1-18.
    https://doi.org/10.1002/meet.14504301178

NOTES

1. Semi-natural language is a term proposed by Ebrami in his book titled “Principles of Development of Subject Headings”, where it refers to a language that is neither controlled nor natural.

2. Keyword refers to a word or group of words extracted from the title or text to describe the content of the document and support its retrieval (Gil-Leiva and Alonso-Arroyo, 2007).