File S1. Supplementary materials
Table of Contents
Methods - expanded
Results - supplementary data
Discussion – review of previously published bibliometric data on cardiovascular research
Discussion - Limitations of the study
References
Methods - expanded
Data sources
Data retrieval aimed to include all publications (articles, reviews, letters or notes) on the cardiovascular system (heart, blood vessels or pericardium; including all types of research: basic, clinical and epidemiological). The data were obtained from Thomson Reuters Web of Science Core Collection through custom data licensed to ECOOM - KU Leuven.
Different sources were considered at the initiation phase. Using other databases, such as PubMed, would add publications from national-level journals, potentially affecting the findings on country participation levels. However, PubMed does not have the data needed to measure impact (no citations), communication flow (no article references indexed) or international collaboration (country/address available only for the first author, not for all authors, in all publications indexed prior to 2012)1,2. Other databases have also been used for bibliometric studies, including Scopus and Google Scholar. Although Google Scholar is an open access database, it has limitations due to the lack of information on the algorithms used to identify citations and the inclusion of items such as course handouts in citation counts2,3. Scopus has become more widely used; however, like Web of Science, it requires a paid subscription. Research has also shown that country-level bibliometric statistics (outputs and citations) for Scopus and Web of Science are highly correlated4.
Using Thomson Reuters Web of Science Core Collection, the dataset was established using a hybrid information retrieval strategy5. The methods used and steps undertaken to establish the cardiovascular dataset are shown in supplementary Figure S1A.
Figure S1A. Establishment of the cardiovascular publication dataset (Data sourced from Thomson Reuters Web of Science Core Collection)
We started with a high-precision set of core cardiovascular journals and a low-precision, high-recall set of publications identified using relevant cardiovascular search terms. The large set of publications identified through the search terms was then limited to the most relevant publications by linking their references and citations with the publications in the core journals.
Data underwent systematic data cleaning and were stored in a local Oracle database. The total cardiovascular dataset contains the full reference, abstract, address, references and citation data for 766,509 unique publications.
Global output and country participation
The most active countries were selected, defined as the countries that contributed to at least 1% of all publications in 2006-2012. Publications were assigned to countries according to author addresses. Each unique country contribution per publication was counted in full, unless otherwise stated. This means that one publication with authors from the USA, Germany and the UK is counted as one publication for each of the three countries. Therefore, if the publication counts of all countries are added together, the sum exceeds 100% of the total number of publications in the dataset. The number and share of publications per country were compared. All publications by authors in the European Union (EU-27) region were also combined as a supra-national region to compare with larger countries such as the USA and China. Note that country classifications change over time in Web of Science; in particular, for this dataset, all publications from Hong Kong after 1999 are classified in Web of Science as being from China. Emerging countries were defined as countries that experienced a change of greater than 1% in their share of publications between 1992-1998 and 2006-2012.
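The full counting rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation (the calculations were done in Oracle SQL and R); the function name `count_full` and the toy publication list are hypothetical.

```python
from collections import Counter


def count_full(publications):
    """Full counting: every unique country on a publication receives one count."""
    counts = Counter()
    for countries in publications:
        # set() ensures several addresses from one country count only once
        counts.update(set(countries))
    return counts


# Hypothetical toy data: the countries found in each publication's author addresses
publications = [
    ["USA", "DEU", "UK"],  # counted once for each of the three countries
    ["USA", "USA"],        # two US addresses, still one US publication
    ["CHN"],
]

counts = count_full(publications)
shares = {country: n / len(publications) for country, n in counts.items()}
```

In this toy example the country shares sum to more than 1, which is the over-100% effect noted in the text.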
Impact
All publications in the Web of Science citing the cardiovascular dataset were used to calculate citations within a 3-year citation window (the publication year and the two following years, so citations received in 2013 and 2014 are included for the most recent publications). This window was used in all indicators of impact.
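As an illustration only, a 3-year citation window of this kind could be computed as follows. The function name `citations_in_window` and the example years are hypothetical.

```python
def citations_in_window(pub_year, citing_years, window=3):
    """Count citations received in the publication year and the following window - 1 years."""
    return sum(1 for year in citing_years if pub_year <= year < pub_year + window)


# Hypothetical example: a 2012 publication cited once in each of 2012, 2013, 2014 and 2015.
# Only the first three citations fall inside the window.
n_cites = citations_in_window(2012, [2012, 2013, 2014, 2015])
```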
Country-level citations were compared over time, using six indicators6:
· Mean Observed Citation Rate (MOCR): The average number of observed citations per publication per year;
· Mean Expected Citation Rate (MECR): The average number of citations per publication per year for all publications in the Web of Science database for each journal. This is an expected citation rate for the journals in which each paper was published;
· Field Expected Citation Rate (FECR): The average number of citations for the entire field. In this case the field is defined as the entire cardiovascular dataset;
· Normalized Mean Citation Rate (NMCR): The MOCR normalised to the citation rate for the entire field; i.e. MOCR divided by the FECR;
· Share of uncited publications: the share of publications that are not cited;
· Share of publications by top citation classes7: the share of publications cited above average across three top citation classes (fairly, remarkably and outstandingly cited). Citation classes are used instead of selecting only the top 1% or 10%, since the division between publications just inside and just outside such a cut-off is often not clear (e.g. publications ranked in the top 1.001% may have a very similar number of citations to those ranked in the top 0.99%, yet they would be placed in two different categories, top 10% vs. top 1%). The citation classes distribute the top publications into mathematically self-adjusting groups without arbitrarily separating publications around a set percentage. The division between the citation classes is therefore made at percentages where there is a clearer separation between the citations of the publications7.
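The relationship between the first five indicators can be sketched on toy data. This is a simplified illustration, not the authors' code: the records are hypothetical, each publication is assigned to a single country, and the journal averages are taken from the toy records rather than from the whole Web of Science database as in the study. The citation classes are not covered.

```python
from statistics import mean

# Hypothetical records: (country, journal, citations within the 3-year window)
records = [
    ("USA", "J1", 10),
    ("USA", "J2", 0),
    ("DEU", "J1", 4),
    ("DEU", "J2", 2),
]


def impact_indicators(records, country):
    """Return MOCR, MECR, FECR, NMCR and the uncited share for one country."""
    own = [r for r in records if r[0] == country]
    # Expected citation rate per journal (assumption: estimated from the toy records)
    journal_mean = {
        journal: mean(c for _, j, c in records if j == journal)
        for journal in {r[1] for r in records}
    }
    mocr = mean(c for _, _, c in own)                # observed citations per publication
    mecr = mean(journal_mean[j] for _, j, _ in own)  # expected, given the journals used
    fecr = mean(c for _, _, c in records)            # field = entire dataset
    return {
        "MOCR": mocr,
        "MECR": mecr,
        "FECR": fecr,
        "NMCR": mocr / fecr,
        "uncited_share": sum(1 for _, _, c in own if c == 0) / len(own),
    }


usa = impact_indicators(records, "USA")
```

An NMCR above 1 indicates that a country's publications are cited more than the field average.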
Collaboration
Publications were defined as domestic publications when author addresses included only one country and as internationally collaborative publications when more than one country was listed. The shares of domestic and international publications per country and time period were calculated.
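A minimal sketch of this classification, with hypothetical function names and toy data, might look like this:

```python
def classify(countries):
    """Label a publication by the number of unique countries in its author addresses."""
    return "international" if len(set(countries)) > 1 else "domestic"


def international_share(publications, country):
    """Share of a country's publications that are internationally collaborative."""
    own = [p for p in publications if country in p]
    if not own:
        return None  # the country has no publications in this period
    return sum(classify(p) == "international" for p in own) / len(own)


# Hypothetical toy data: the countries listed on each publication
publications = [["USA"], ["USA", "DEU"], ["USA", "USA"], ["DEU"]]
usa_share = international_share(publications, "USA")
```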
The strength of collaboration was measured as defined in Equation 1 (Salton’s cosine)8:
Equation 1. Strength of collaboration AB=CoPubsAB/√(PubsA*PubsB)
Where CoPubsAB=Number of co-publications between CountryA and CountryB; PubsA=Total number of publications by CountryA; PubsB=Total number of publications by CountryB. Collaborations with fewer than 20 co-publications, or where either country had fewer than 50 publications within the time period, were excluded to maintain statistical stability.
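Salton's cosine and the exclusion thresholds can be expressed in a few lines of Python. This is a sketch: the function and parameter names are hypothetical, and returning `None` for excluded pairs is an illustrative choice.

```python
from math import sqrt


def collaboration_strength(co_pubs, pubs_a, pubs_b, min_co_pubs=20, min_pubs=50):
    """Salton's cosine for a country pair, or None if the pair is excluded."""
    if co_pubs < min_co_pubs or pubs_a < min_pubs or pubs_b < min_pubs:
        return None  # too few observations to be statistically stable
    return co_pubs / sqrt(pubs_a * pubs_b)


# Hypothetical counts: 30 co-publications, 100 and 900 publications in total
strength = collaboration_strength(30, 100, 900)  # 30 / sqrt(90000) = 0.1
```

Because the denominator is the geometric mean of the two countries' outputs, the measure allows links between countries of very different sizes to be compared.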
Flow of communication
Citation domesticity and reference domesticity, defined in Equations 2a and 2b, were calculated for each country9.
Equation 2a. Citation Domesticity A=CitationsAtoA/CitesA
Where Citations AtoA= Citations from CountryA to CountryA (citations counted as a fraction of the number of unique countries cited in each publication; e.g. if the publication being cited has an author from Germany, the USA and the UK, each country will only receive 1/3 of the citation); CitesA=Total number of citations received by CountryA;
Equation 2b. Reference Domesticity A=ReferencesAtoA/RefsA
Where References AtoA= References from CountryA to CountryA (references counted as a fraction of the number of unique countries in the referenced publication; e.g. if the publication being referenced has an author from Germany, the USA and the UK, each country will only receive 1/3 of the reference); RefsA=Total number of references* in publications by CountryA;
*Only references up to two years before year of publication were included.
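The fractional counting behind Equation 2a can be sketched as follows. The names and the toy citation links are hypothetical. The sketch also assumes that the denominator (CitesA) uses the same fractional weights as the numerator, and that a citing publication counts as coming from CountryA whenever CountryA appears among its addresses; the text does not state either point explicitly.

```python
def citation_domesticity(links, country):
    """Fractionally counted share of a country's received citations that are domestic.

    links: iterable of (citing_countries, cited_countries) pairs, one per citation.
    """
    received = 0.0
    domestic = 0.0
    for citing, cited in links:
        cited_set = set(cited)
        if country not in cited_set:
            continue
        # Each unique country on the cited publication receives an equal fraction
        weight = 1.0 / len(cited_set)
        received += weight
        if country in set(citing):
            domestic += weight
    return domestic / received if received else None


# Hypothetical toy citation links: (citing countries, cited countries)
links = [
    (["USA"], ["USA", "DEU", "UK"]),  # USA receives 1/3, domestic
    (["DEU"], ["USA"]),               # USA receives 1, foreign
    (["USA", "UK"], ["USA"]),         # USA receives 1, domestic
]
usa_domesticity = citation_domesticity(links, "USA")  # (1/3 + 1) / (1/3 + 1 + 1) = 4/7
```

Reference domesticity (Equation 2b) follows the same pattern, with the roles of the citing and cited publications swapped.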
The standardised flow of citations between individual countries10 was defined as per Equation 3.
Equation 3. Strength of citation links AtoB=(Citations AtoB)/√(RefsA*CitesB)
Where Citations AtoB= Citations from CountryA to CountryB (citations counted as a fraction of the number of unique countries in the CountryA publication; e.g. if the CountryA publication has an author from Germany, the USA and the UK, each country will give 1/3 of a citation to Country B); RefsA=Total number of references* in publications by CountryA; CitesB=Total number of citations received by CountryB.
Observations with fewer than 20 citations, or where either country had fewer than 50 references within the time period, were excluded to maintain statistical stability.
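Equation 3 and its exclusion rule can be sketched in the same way, starting from already aggregated counts. The function and parameter names are hypothetical; `refs_b` is used only for the threshold check on the second country.

```python
from math import sqrt


def citation_link_strength(cit_a_to_b, refs_a, cites_b, refs_b,
                           min_citations=20, min_refs=50):
    """Standardised citation flow from country A to country B, or None if excluded."""
    if cit_a_to_b < min_citations or refs_a < min_refs or refs_b < min_refs:
        return None  # too few observations to be statistically stable
    return cit_a_to_b / sqrt(refs_a * cites_b)


# Hypothetical counts: 40 citations from A to B, 400 references in A's publications,
# 100 citations received by B, 60 references in B's publications
link = citation_link_strength(40, 400, 100, refs_b=60)  # 40 / sqrt(40000) = 0.2
```

Unlike the collaboration measure in Equation 1, this indicator is directional: the strength from A to B generally differs from the strength from B to A.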
Impact of international collaboration
The indicators of impact (above) were used to compare the impact of domestic and internationally collaborative publications by country in 2006-2012.
Time trend analysis
Apart from the yearly number and shares of publications per country presented, all indicators were calculated and compared using three 7-year time periods: 1992-1998, 1999-2005, 2006-2012.
Software and Visualizations
All calculations of indicators were undertaken in Oracle SQL Developer version 4.0.1 and RStudio11 version 0.99.489. The following R packages were used for calculation and data manipulation: igraph12, reshape213, tidyr14, plyr15 and dplyr16. Graphs and network diagrams were produced using ggplot217, gridExtra18, maps19 and scales20 R packages. Country names are depicted using the ISO 3-letter country code, except for the UK, which is used instead of GBR (see supplementary table S3 for the country names and ISO codes).
Results - supplementary data
Participation
Figure S1-B presents the number of publications per country. Only countries with at least a 0.1% share of publications in 2006-2012 are included in the figure.
Figure S1-B. Distribution of publications by country 2006-2012 (Data sourced from Thomson Reuters Web of Science Core Collection)
Impact of collaboration
Figure S1-C presents complementary indices for the impact of collaboration on citations.
Figure S1-C. Citation classes and uncited publications of international collaboration 2006-2012 (Data sourced from Thomson Reuters Web of Science Core Collection)
Topics
We further examined which topics are addressed in the collaborative publications, focusing on the 2010 to 2012 dataset. A topic map and cluster analysis based on the co-occurrence of terms in the titles and abstracts of the international and domestic publications from 2010 to 2012 were created using VOSviewer (Figure S1-D). The map shows distinctive clustering of the publications. Terms with low relevance (e.g. ‘Maximum likelihood’), terms that are often used (e.g. ‘Cardiovascular’) and terms composed of only one word (e.g. ‘outcomes’) were removed to increase the relevance of the included terms. The size of each circle indicates the number of times the term occurs across all publications, and the distance between circles/terms represents the number of times the terms co-occur in the titles and abstracts of the publications. This results in collections of related clusters, coded in different colours.
Figure S1-D. Topic network map of 29,598 Internationally Collaborative Cardiovascular Publications 2010-2012 (Data sourced from Thomson Reuters Web of Science Core Collection)
Based on the terms in these collections, we can assign these clusters to the large topics typically highlighted in international congresses and policies. The purple clusters seemingly contain population/public health/risk factor research; the red clusters contain mechanistic and exploratory (basic and clinical) research; the yellow and green cluster collections are more fragmented but seem to contain mostly clinical and applied/interventional research.
Discussion – review of previously published bibliometric data on cardiovascular research
Purpose of review: To inform current and future research by undertaking a comprehensive review of all bibliometric studies on cardiovascular disease research.
Methods: Web of Science and PubMed databases were searched up to April 2016. All titles and abstracts were reviewed by DG for inclusion or exclusion using pre-set criteria. The full-text articles obtained were then assessed again by DG, and those that did not fulfil the inclusion/exclusion criteria were excluded. A standardized form was used to extract data from the included publications for assessment of the scientometric methods and evidence synthesis. A narrative synthesis of the included studies was undertaken.
Findings
Studies included
As of April 2016, 122 eligible studies were included in the review database. The 122 publications identified are highly diverse in terms of methods used, topic and country of focus, and findings presented, with the publications categorised as either bibliometric methodological studies (n=11), topic focused studies using advanced or state-of-the-art bibliometric methods (n=19), or topic focused studies using basic bibliometric methods (n=91).
Publication evolution
Recently, there has been growing interest in undertaking bibliometric studies in cardiovascular disease, with two-thirds of the included articles published since 2010. One included article was published in 1972, 10 articles were published in the 1990s, 27 articles were published in the years 2000 to 2009, and 84 articles were published from the year 2010 onwards.
Topics of focus
More than half of the publications (n=69) investigated the broad topic of cardiovascular disease, whereas the remaining studies focused on more specific topics including: cardiology (n=9), ischemic heart disease/myocardial infarction/atherosclerosis (n=7), surgery (n=7), cardiovascular risk factors/hypertension (n=7), anaesthesiology (n=3), congenital heart disease (n=5), imaging (n=3), cardiovasology (n=2), vascular (n=5), cardiovascular biology (n=1), acute coronary syndrome (n=1), Brugada Syndrome (n=1), electrocardiogram (n=1), inflammatory disorders of the heart (n=1), rheumatic heart disease (n=1), cardiovascular devices (n=1), atrial fibrillation (n=1).
Countries of Focus
Two-thirds of the studies (n=80) collected and presented data from around the world, without focusing on specific countries or regions. The remaining 42 studies focused primarily on the following countries or regions: Spain (n=8), the United States (n=7), China (n=3), the Netherlands (n=2), Brazil (n=3), mix of selected countries (n=3), Latin America (n=3), Eastern Mediterranean (n=3), Europe (n=2), United Kingdom (n=2), Eastern Europe (n=1), Africa (n=1), Canada (n=1), Germany (n=1), Israel (n=1), Ireland (n=1), and South Africa (n=1).
References for methodological studies
Gregori Junior F, Godoy MF de, Gregori FF. Proposal of an individual scientometric index with emphasis on ponderation of the effective contribution of the first author: h-fac index. Rev Bras Cir Cardiovasc 2012;27:370–376.
Groneberg-Kloft B, Scutaru C, Fischer A, Welte T, Kreiter C, Quarcoo D. Analysis of research output parameters: density equalizing mapping and citation trend analysis. BMC Health Serv Res 2009;9:16.
Jarneving B. The cognitive structure of current cardiovascular research. Scientometrics 2001;50:365–389.
Jian D, Xiaoli T. Perceptions of author order versus contribution among researchers with different professional ranks and the potential of harmonic counts for encouraging ethical co-authorship practices. Scientometrics 2013;96:277–295.
Jonnalagadda SR, Moosavinasab S, Nath C, Li D, Chute CG, Liu H. An automated approach for ranking journals to help in clinician decision support. AMIA Annu Symp Proc 2014;2014:757–766.
Leydesdorff L, Opthof T. Citation analysis with Medical Subject Headings (MeSH) using the Web of Knowledge: A new routine. J Am Soc Inf Sci Technol 2013;64:1076–1080.