Abstract
Rigorous evidence identification is essential for systematic reviews and meta-analyses (evidence syntheses) because the sample selection of relevant studies
determines a review's outcome, validity, and explanatory power. Yet, the search systems allowing access to this evidence provide varying levels of precision,
recall, and reproducibility and also demand different levels of effort. To date, it
remains unclear which search systems are most appropriate for evidence synthesis and why. Advice on which search engines and bibliographic databases
to choose for systematic searches is limited and lacks systematic, empirical
performance assessments. This study investigates and compares the systematic
search qualities of 28 widely used academic search systems, including Google
Scholar, PubMed, and Web of Science. A novel, query-based method tests how
well users are able to interact with and retrieve records from each system. The study
is the first to show the extent to which search systems can effectively and efficiently perform (Boolean) searches with regard to precision, recall, and reproducibility. We found substantial differences in the performance of search
systems, meaning that their usability in systematic searches varies. Indeed,
only half of the search systems analyzed and only a few Open Access databases
can be recommended for evidence syntheses without adding substantial
caveats. In particular, our findings demonstrate why Google Scholar is inappropriate as a principal search system. We call for database owners to recognize the
requirements of evidence synthesis and for academic journals to reassess quality requirements for systematic reviews. Our findings aim to support
researchers in conducting better searches for better evidence synthesis.