Published research can sound as though it’s written in a different language. If you can’t understand the title of a paper, you may think it won’t get any better if you read on. As a result, many people never read beyond the abstract, which summarizes the study’s methods and findings.
Experimental studies yield all kinds of interesting results that can help you and your clients improve performance, health, safety and other metrics. But if you’re relying on lay media for your information, you may not be getting the whole story.
Not that they get it wrong, per se, but certain aspects of studies may receive more emphasis than the researchers intended, or it may be unclear that the findings apply only to a small, specific group of people (say, five male Olympic bobsled athletes).
With a little help, you can learn what to look for so you can better answer questions from clients about the next sensationalized headline—or dig deeper into a lay media story that you find compelling.
The Life Cycle of a Research Paper
When breakthroughs occur, they are typically announced in a scientific journal by the team of researchers who conducted the groundbreaking study. To understand the inner workings of these articles, though, let’s rewind to how they all begin.
A Review of the Scientific Method
Every journal article begins with the scientific method—something you probably learned about in middle school. Though this step-by-step approach has been broken down in various ways over the years, the steps are similar across disciplines: Scientists make an observation, which leads to a research question. From this, they form a hypothesis, see what other research exists on the topic and then design an experiment.
They assemble the tools and participants, conduct the experiment, and collect data. Finally, they aggregate and analyze the results and draw conclusions, which they write up as a research paper.
The guiding principles that underlie quality research are also commonly held by the scientific community. These principles call for research to be ethically conducted, be applicable beyond the context of the study (i.e., it has external validity) and be reproducible by other researchers using the same methodology. All these points will be covered in some detail below.
A Peek at the Peer-Review Process
Scientists are people, too. They are subject to errors in judgment, can be led astray by biases and may be beguiled by data that is “too good to be true.” To avoid these pitfalls, articles undergo a vetting process, called “peer review,” after they are submitted to a journal for publication.
In a peer review, an article is evaluated by scientists in the same field as the study authors. (The process is blinded, meaning these experts do not know whose study it is.) The reviewers examine the study for internal validity, or how sound the research is (a marker of its quality).
They check to see whether the researchers’ data are accurate and reliable, whether appropriate controls and methods of analysis were used, and so on. They then pose questions to the authors, who submit replies and modifications until the reviewers are satisfied. Finally, the reviewers advise the journal editor on whether the paper should be published.
Interestingly, studies that find a null result—that is, the intervention or treatment showed no effect—are far less likely to be published. This is referred to as publication bias, and it means we may never hear about these studies, since their authors may not bother submitting their paper for peer review.
How Journal Articles Are Shared With the Public
Sometimes the results of a research study first appear in an online publication. This enables the researchers to announce their study results as soon as the paper is accepted, rather than waiting for the print edition to be published.
If the journal in question is open access, the full article on the study will be available to everyone for free. In subscription-based journals, the full article will be visible only to paid subscribers—at least at first. Sometimes these journals make the full article available for free after a certain amount of time has passed.
Until then, the research results will be available in the form of an abstract. This is a synopsis of the main purpose and findings of the study (about 200–300 words in length). The context and subtleties of the research, the funding behind it, its strengths and limitations, and how it stacks up against previous work can be assessed only by reading the full article.
Many times, research is also presented in the form of a review paper. This type of article is written by experts, too, but it is more “user-friendly” for nonscientists because the language is easier to understand. Review papers tend to give a good explanation of the issue at hand and then summarize the various findings of other papers on the topic.
A standard review article, however, is subject to inclusion bias by the authors (meaning they might only include studies they deem interesting).
To avoid this, researchers typically do what is called a systematic review. This type of review avoids personal bias by using objective standards for inclusion and exclusion, defining the search terms used to find studies on databases, and explaining which studies were excluded and why. (All of this attention to objectivity makes a systematic review the most authoritative type of review study.)
The goal is to analyze and synthesize “everything” that is already known about a particular topic. Systematic reviews often include a meta-analysis, in which researchers use a statistical model to “translate” their collective data into quantitative results that are easier to understand.
What to Look for in a Journal Article
Reviewers of journal articles can be very exacting, requiring the authors to justify everything from study design to execution and analysis. The system isn’t perfect, but it is a bulwark against misinterpreting or “cherry-picking” data or jumping to conclusions from evidence that isn’t present. (See “A Few Words About Statistics,” page TK, for more about errors.)
Even if a study passes muster, though, that does not mean it will matter to you and your clients. Its relevance will depend on a number of factors.
What were the questions being asked in the study?
A scientific study can consist of one question or multiple questions. When deciding whether to ask multiple questions, the researchers make sure this will not affect the internal validity (accuracy or trustworthiness) of the research.
If the paper is about swimming and you never hit the pool, it may not be of interest to you.
What population was being studied?
A population simply refers to all members of a specified group. This may be highly specific (e.g., lifelong recreational athletes over age 65) or more general (e.g., American men and women). It is very important to consider this when attempting to generalize research results.
Yes, effects seen in younger men may apply to older women (and vice versa), but we can’t be sure. How the findings of a particular study may be applied to others or generalized to the overall population is known as external validity.
To confidently say that a study’s results apply broadly to “everyone,” the people studied need to vary in age, sex and ethnicity. Large studies often break out results by cohort, with each cohort referring to a group of subjects with a defining characteristic (e.g., age, sex, ethnicity, type of illness).
To draw conclusions about a cohort, it must be large enough that the population it represents would likely respond similarly under the same circumstances. Sometimes a subset of a population included in a study is also referred to as a sample. (Of course, the people in the study are often called participants, though they may be referred to as patients or subjects, as well.)
It is important to note that recruiting people to participate in research introduces selection bias, since those who step forward may differ in important ways from those who won’t set foot in a lab or answer a survey.
Since we don’t test the latter group, we’ll never know. (See “Why Are There So Many Studies on Young Men?,” page TK, for another look at bias.)
What was the size of the sample being studied?
Some studies are very large and look at a diverse group (varying in age, sex, ethnicity). Others involve a small sample of a specific population of interest (e.g., a particular age, sex and/or ethnicity, or with a specific health condition).
Crossover studies provide a way to increase statistical power while using a smaller sample. In these, researchers compare the same participants with themselves under multiple conditions, administered in random order, with a “washout” period separating the experiments. This cuts the required number of participants by half (or more), while yielding results on several types of interventions.
In any case, to answer a research question, a study must produce results that are statistically significant. This affects the number of participants needed to test the hypothesis. Having too many participants makes for an unwieldy and expensive study (though it yields more data), but with too few, the results are unlikely to show a statistically significant difference.
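To make the participant-count trade-off concrete, here is a minimal sketch of the standard normal-approximation formula for the number of volunteers needed per group in a two-group comparison. This is a hypothetical illustration, not a calculation from any study in this article; the z-values are hard-coded for the conventional targets of a two-sided α = 0.05 and 80% statistical power.

```python
import math

def sample_size_per_group(effect_size, z_alpha=1.96, z_power=0.8416):
    """Normal-approximation sample size for comparing two group means.

    effect_size: expected difference between groups, in standard-deviation
                 units (Cohen's d) -- an assumption the researchers must make.
    z_alpha:     z-value for a two-sided alpha of 0.05.
    z_power:     z-value for 80% statistical power.
    """
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

# A hypothetical "medium" effect (d = 0.5) calls for about 63 volunteers
# per group; halving the expected effect roughly quadruples that number.
medium = sample_size_per_group(0.5)   # 63 per group
small = sample_size_per_group(0.25)   # 252 per group
```

The point of the sketch is the shape of the formula: the smaller the effect a study hopes to detect, the more participants it needs, which is one reason small studies so often fail to reach significance.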
How long did the study last?
In some cases, a study will look at interventions over a long period of time. In others, the whole study lasts a few weeks or incorporates only a few lab visits. It’s not that one or the other is better, but the time period could affect how it applies to your clients (or not).
Typically, though, large studies that follow many people over a long time, called prospective research, give us a lot of very strong data.
How was the study funded?
All studies have budget restrictions. The cost of supplies, equipment, testing, staff and volunteer payments add up. Typically, funding comes from private or government grants or research foundations, but sometimes it comes from businesses or corporations that produce a particular product.
These funding sources must be listed at the end of the article as potential conflicts of interest (more on that later).
How were the participants divided into groups?
You may have seen the words controlled, randomized, single-blind and double-blind in study abstracts. Here are some quick ways to remember what this jargon means.
In a controlled study, only some of the participants are receiving the treatment or intervention—for example, in a medical trial, half might receive a placebo, or fake pill. The group not receiving treatment is the control group.
In a randomized study, the participants are assigned at random to the experimental or control group. Using randomized controls helps ensure that the results of an experiment are not biased.
Also important is “blinding,” previously mentioned in relation to peer reviews. Within a study, blinding refers to the concealment of information from researchers or participants in an attempt to prevent their personal biases from affecting the results.
In a single-blind study, only one group of people (the researchers) knows who is getting which treatment; the participants do not. In a double-blind study, neither the researchers nor the participants know who is getting what—until after the data are collected.
(At that point, the researchers need to know who was in which group so they can analyze the data.) Since the researchers’ expectations about what works could influence their behavior (and thus that of the volunteers), double-blind research is considered the gold standard.
Where was the study performed?
Repeatability, which refers to the variation in measurements taken by one research team and/or instrument, underlies internal validity, or the inherent accuracy of the test itself.
Laboratory research is often considered more reliable than field research because one can exercise greater control over samples and sampling in the lab than in the field. In the lab, temperature, atmospheric pressure and humidity are maintained so that instruments remain calibrated throughout each testing bout. Samples, be they tissue or bodily fluids, can be tested immediately or processed for long-term storage without delay.
Are the results significant and/or important?
Statistical significance means that the differences found upon analysis are probably real, with a small probability that the effect is merely due to chance. Statistical significance or, in some cases, a trend toward significance in the data, is typically necessary for publication.
However, just because something is statistically significant does not mean the information is meaningful, and vice versa. Whether or not something is “important” is highly subjective.
For instance, the training differences among the top three medalists in an Olympic event are typically not statistically significant, but those margins may be very meaningful to people who want to compete with them.
Breaking Down an Abstract
Now that you can cut through the jargon—and figure out what is important to you—here is a breakdown of where you’ll find key information in a journal abstract and/or article.
As noted earlier, the first thing you’ll see when you look up published research is typically the abstract, which summarizes the paper. The abstract is a terrific way for you to quickly evaluate whether a study could be of interest to you or your clients. It includes the following sections (though in some cases, the section names may be missing or may be slightly different):
Objectives: This part of the abstract explains the research question or questions and, often, what prompted the study.
Methods: This part gives information on the size and specifics of the group studied, the testing methods used, and how outcomes were measured. It often introduces abbreviations that are then used in the rest of the article (e.g., CRP for C-reactive protein). The number of people who were studied is referred to here as n. So, if there were 23 people in the study, it will say (n = 23).
Results: This section gives the outcomes of the statistical analyses of the data collected during the study. It also notes which (if any) showed significance and what that significance was.
Discussion: Here’s what most of us are really looking for: How can the findings be applied in the real world?
Unfortunately, it is this part that many people use, on its own, as evidence for or against whatever case they are making. However, reading through the paper in its entirety is vital for those who want to truly understand the nuances of the research.
Anatomy of a Journal Article
In a “complete” published research paper, the sections are similar to those in the abstract, but each goes into much greater depth.
Introduction/Background: This is typically an overview of the current literature and the rationale for the study—for example, this section might state what information is missing from the existing research or what existing research the study wishes to try to reproduce.
Reproducibility is a foundational principle of scientific research: If findings from one study cannot be reproduced under similar conditions, it calls into question the original results.
Materials and Methods: Here, you’ll find descriptions of how volunteers were selected and allocated to treatments or to a control group (if used), as well as specifics about the interventions (doses, timing, washout period, etc.). This level of detail may seem excessive, but it allows other laboratories to replicate the techniques in a future study.
Results/Outcomes: This section characterizes the statistical analysis techniques used, including the computer program and version; the sampling techniques, calibration and manufacturer of the machines used in the lab; how the samples were prepared and stored (including test tube type and how long centrifuged); and, sometimes, charts and tables of the main outcomes.
You will also find details and explanations for any adverse events, such as how many volunteers dropped out of which group(s) and an explanation for any data that was excluded from analysis.
(Note: Excluding data is legitimate if it was problematic or if there were errors in its collection, but it is not okay if the results simply didn’t fit in with what researchers hoped to find.)
Discussion/Conclusions: Here, the study authors share their explanation and synthesis of the findings and make a case for why they are important (and to whom).
They also offer suggestions on what, if anything, to do about the findings. This section may also include (or be followed by) a “strengths and limitations” section, also written by the researchers, along with suggestions on what further research should be done.
Strengths may include how the findings can be generalized and how consistent the data was; weaknesses may be related to data being incomplete, inaccessible, difficult to understand or difficult to reconcile with existing research.
Usually, there will also be a description of where funding for the study originated and any possible conflicts of interest (for instance, if a study suggests you should eat 12 pounds of cheese daily—and was funded by a dairy lobby group).
This doesn’t mean that the study is invalid, but it may make you want to compare it with other research done by scientists with “neutral” funding sources to see if those researchers drew the same conclusions.
Case Studies: Comparing the Abstract to the Article
Now that you’ve reviewed the terminology and framework of a journal article, let’s explore some actual abstracts from Frontiers in Physiology, along with some of the additional things you can learn if you read beyond the abstract.
Case Study #1
As a systematic review, this study from Cerqueira et al. (2020) combines findings from a wide array of existing studies. If you were to read only this abstract, you would assume that longer bouts of high-intensity exercise contribute to increased injury risk and chronic inflammation.
However, if you read the whole article, your conclusions would be more nuanced and less definitive, as noted below.
[The abstract of Cerqueira et al. (2020) is reproduced here. Copyright © 2020 Cerqueira, Marinho, Neiva and Lourenço. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/.]
The full article describes the methods for inclusion, the data analysis and the key inflammatory markers of interest. The researchers conclude that exercise has “considerable effects on inflammation markers,” and yet the article notes that “the strong variability in study designs, type, duration, and intensity of exercise remain obstacles in the assessment of the measurable effects of exercise on inflammatory markers.”
Additionally, the authors say that dehydration may have affected the quantification of markers (making them appear more plentiful due to a decrease in plasma volume).
Regarding variability, the studies reviewed were based on individual sports, such as cycling, resistance training and running, limiting the findings’ application to other types of exercise. Also, the number of bouts of exercise and time spent at it varied, and authors were not able to conduct a meta-analysis—again, because of lack of consistency among the studies.
Moreover, most of the studies had only a small number of participants, who exercised at a single level of intensity. And, since all of the studies used healthy, nonsedentary participants, the results could not be easily extrapolated to those with chronic illnesses or completely sedentary lifestyles.
There are a number of other limitations, too. None are so concerning that we should simply ignore the conclusions, but they are concerning enough that we should not reject high-intensity exercise because this study seems to indicate that it causes more problems than it solves.
This study is careful to point out that, along with intensity, the specific exercise performed and the muscle contraction type (eccentric versus concentric) were critical components. Improving safety might be as simple as increasing the recovery period between bouts, but until further research is conducted, we don’t know.
Case Study #2
This research from Bertschinger, Giboin & Gruber (2020) is of interest because it appears to undermine existing findings that short bouts of high-intensity training enhance endurance performance and overall fitness.
This type of exercise—also called sprint interval training (SIT) and high-intensity interval training (HIIT)—has become immensely popular in group fitness classes, given its apparent outsized benefits for a relatively small investment of training time.
What is significant about this research article is that it calls into serious question whether at least some of the previous studies were affected by a repeatability flaw, where the instruments or the researchers—not the intervention—were causing the variations. Although the abstract alludes to this, it is explained in greater detail in the main article, which warrants reading.
An Example of an Abstract
[The abstract of Bertschinger, Giboin and Gruber (2020) is reproduced here. Copyright © 2020 Bertschinger, Giboin and Gruber. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/.]
How to Assess Fitness Research Articles
Experimental control can be undermined by very basic failures, such as failing to account for the duration of a learning effect. Whenever research volunteers are tested—be it to see how long they can exercise to exhaustion or how hard they can voluntarily contract a muscle—they will naturally improve somewhat in their ability to perform the test once they become familiar with what is expected.
There are certainly ways to avoid this error. In this case, the study used a control group that took the same tests as the intervention group. The control group showed the same improvement in neuromuscular function, as one would expect (many early improvements in exercise performance are attributable to this).
But the training group improved no more than the controls did, nor did its “exercise time to exhaustion” (a marker of endurance improvement) increase over the 2 weeks of training.
In the full article, Bertschinger, Giboin & Gruber hypothesize that the two exhaustive cycling tests (taken by both the control group and the training group before testing began) may have induced neuromuscular adaptations that persisted as long as 2 weeks.
In other words, their first step in the process may have caused the control group to reap benefits, making them, in effect, part of the intervention group! If this study had not used a control group, the researchers might not have realized the problems underlying the previous experiments.
They wholeheartedly expected to find improvements in the training group that did not appear in the control group. When they found no differences, they realized that even minor differences in methodology can lead to very different outcomes—and that no matter how strong results appear to be, replication studies are essential.
They submit that this is a recognized problem in the sport and exercise science field, particularly in evaluating short-term interventions.
Like Exercise, It Gets Easier With Practice
As you can see, even for a nonscientist it’s useful to peruse the current literature—in its entirety—to see whether the people interpreting it have done a good job. As with anything new, you may feel when you first wade in that you’re in over your head.
(Your new clients probably feel that way, too!) But the more you read scientific journals and learn their jargon, the easier it will be to understand them (at least to a point).
While most of us aren’t going to be qualified to evaluate the validity of scientific studies (or dismiss them out of hand), we can at least develop a better understanding of their underpinnings, and from there we’ll understand more clearly how they might apply to our clients.
What Qualifies as Ethical Research?
Some of science’s most egregious offenses have occurred in research with vulnerable populations who did not know they were being studied or did not have a choice. That is one reason why, today, a study is not published in a reputable, peer-reviewed journal unless the experiments were approved by an institutional review board, ethical review committee or animal use committee before the work began.
These groups review proposed research to make sure that it does not pose undue risks to the participants; that it does not rely on populations who might feel pressured to participate (such as a research professor’s students or an organization’s employees); that the compensation offered is commensurate with the risks and inconveniences posed by the study; and that those risks and inconveniences are clearly explained to the volunteers in language they can easily understand (i.e., there is informed consent).
Ethical researchers also commit to maintain the anonymity of participants, not to invade their privacy, to give them safe and competent treatment, and to share the results of the study with them (Berg & Latin 2004). Further, all volunteers must be reminded that they can drop out of a study at any time, without penalty; it is unethical for a researcher to suggest otherwise.
Attempts to skirt any of these principles—by conducting research “offshore,” for example—are condemned by the scientific community.
A Few Words About Statistics
Statistics is the science of synthesizing and analyzing data collected under specific conditions. It allows us to account for the uncertainties inherent in a data set and to quantify the relative import of new findings (Hinkle, Wiersma & Jurs 2003).
When research is done correctly, use of statistics allows for careful extrapolation of the results—so we can draw general conclusions for the larger population, while only studying a sample of it. For example, we can assert that exercise is good for us without having to test every human being on the planet.
There are two important types of statistical error that every researcher goes to great pains to avoid: type I, or alpha (α), errors; and type II, or beta (β), errors. A type I (α) error occurs when you think there is an effect or difference, but there is not. A type II (β) error occurs when there is a real effect or difference and the researchers dismiss it (perhaps because they were not looking for it).
How can you keep the two straight? In the story of “The Boy Who Cried Wolf,” when the boy falsely cries wolf and everyone comes running, that’s type I. When the wolf is really there and no one believes the boy, it’s type II.
For researchers to draw conclusions from a study, the results must be statistically significant. The threshold for statistical significance in exercise research is typically set at α = 0.05, meaning that if there were no real effect, differences as large as those found in the analysis would turn up by chance only 5% of the time.
Why don’t researchers insist on a higher standard—like 99%? In medical and pharmaceutical trials, they do, because the stakes and costs are higher. However, in exercise research, achieving an α = 0.01 result would be prohibitively expensive, decreasing the number of studies laboratories could perform.
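The 5% threshold can be made concrete with a small simulation. This is a hypothetical illustration (it uses a simple z-test with known variance, rather than the t-tests most studies actually use): when two groups are drawn from the very same population, “significant” differences still appear in roughly 5% of experiments, and every one of them is a type I error.

```python
import random

random.seed(42)  # fixed seed so the simulation is repeatable

def false_positive_rate(n_experiments=2000, n_per_group=30, critical_z=1.96):
    """Simulate experiments in which the null hypothesis is TRUE.

    Both groups are drawn from the same normal distribution, so any
    'significant' difference is a type I (alpha) error by construction.
    """
    false_positives = 0
    for _ in range(n_experiments):
        a = [random.gauss(0, 1) for _ in range(n_per_group)]
        b = [random.gauss(0, 1) for _ in range(n_per_group)]
        mean_diff = sum(a) / n_per_group - sum(b) / n_per_group
        # Standard error of the difference of means, with known sigma = 1
        se = (2 / n_per_group) ** 0.5
        if abs(mean_diff / se) > critical_z:
            false_positives += 1
    return false_positives / n_experiments

rate = false_positive_rate()  # lands near 0.05, as alpha = 0.05 predicts
```

This is also why a single significant result is never conclusive: run enough null experiments and some will “succeed” by chance, which is exactly the replication problem the Bertschinger, Giboin and Gruber article wrestles with.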
As imperfect as research may be, the answer is to do more of it and allow the highest-quality information to flow to the top. This naturally happens when educational and certifying organizations seek out the best science—and adapt when the consensus changes.
Types of Studies: Looking Forward or Looking Back
Some studies apply an intervention now and watch what happens, while others look at data or behaviors that have already occurred. Here is a quick guide to the differences.
Most research projects that we think of as “experiments” are prospective studies. That is, they start with a group of subjects (e.g., human volunteers or research animals) and impose some sort of intervention (e.g., treatment or behavior), then observe and record the results.
Longer-term prospective studies you may be familiar with include the Framingham Heart Study, the Nurses’ Health Study and NHANES (the National Health and Nutrition Examination Survey). These studies have gathered high-quality data on a number of factors that affect overall health—among them, diet, exercise, genetics, smoking and oral contraceptive use.
Retrospective studies start by looking at an outcome—such as contracting a specific disease—and work backward. For example, researchers may try to assess why certain people developed a health condition, while others did not.
Retrospective studies often include an odds ratio (OR) or a relative risk (RR) that suggests whether certain factors influence the likelihood of something happening to specific groups (e.g., whether daily exercise reduces the RR of early mortality). It is important to note that ORs and RRs show correlation, not causation. That is, they do not show that A causes B, only that there is some sort of mutual relationship.
When a series of studies repeatedly show a specific correlation, future research may be done to try to tease out causation. Another thing to keep in mind with retrospective studies: They are more likely than prospective research to involve recall bias, as they ask participants to remember what they did (what they ate or how much they exercised, for example) in the past.
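Both measures come straight from a 2 × 2 table of counts. The arithmetic below is a worked illustration with invented numbers—they are not taken from any real study:

```python
# Hypothetical 2x2 table (invented counts for illustration only):
#                     early mortality   no early mortality   total
# daily exercise            20                 480             500
# no daily exercise         60                 440             500
exposed_events, exposed_n = 20, 500
unexposed_events, unexposed_n = 60, 500

# Relative risk: the ratio of the two event *rates*.
rr = (exposed_events / exposed_n) / (unexposed_events / unexposed_n)

# Odds ratio: the ratio of the two event *odds* (events vs. non-events).
odds_ratio = (exposed_events / (exposed_n - exposed_events)) / (
    unexposed_events / (unexposed_n - unexposed_events)
)

# Here RR is about 0.33 (the exercisers' risk is roughly a third of the
# non-exercisers'), and the OR is close but not identical (about 0.31);
# the two diverge as the outcome becomes more common. Neither number,
# by itself, shows that exercise *caused* the difference.
```

Reading the two as interchangeable is a common lay-media mistake; an OR can exaggerate an RR considerably when the outcome being studied is frequent.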
Why Are There So Many Studies on Young Men?
It may seem that there is an inordinate amount of exercise research done on young men. That is because there is, for a number of reasons. Many university labs use male undergrads in their research because there is a large, willing supply on hand; they typically have flexible schedules; and young men do not have the hormonal fluctuations that menstruating and pregnant women do.
Further, many universities require study participants to be no older than 35; older adults (over the age of 50) are more likely to have pre-existing conditions—and thus would require more medical oversight and evaluation during the study.
Until the late 20th century, clinical trials (often led by a medical doctor) also tended to study only males, to avoid complications introduced by the menstrual cycle or the dangers of testing drugs on pregnant women.
It is now recognized that women respond to medications differently from men, and scientists agree that differences in sex, age and ethnicity need to be adequately represented among participants in pharmaceutical research (Liu & Dipietro Mager 2016).
Individual Differences and Personalized Training
An important principle of exercise physiology, individual differences, is in direct opposition to the way most research is done. Generally, research studies report the mean (average) of various measures.
But each human being responds differently to things, including exercise (Bouchard & Rankinen 2001; Pickering & Kiely 2019). This means that even if a particular exercise protocol works well for most people, it may affect a particular individual in quite another way.
As fitness professionals, we already work hard to personalize our approach to clients. Hopefully, as medicine moves toward becoming more personalized and specific to a person’s genome, study results will reflect this and better support individualized training plans.
It would be remarkable if we were able to know what mode, duration and intensity of exercise each person would respond to without engaging in trial and error, but we’re not there yet.
Commonly Mixed-Up Terms
Population: all members of a specified group, such as all residents of a particular city or state, those in a particular age group, etc.
Cohort: study subjects with a common defining characteristic, such as age, sex or illness
Sample: a subset of a population included in a study
Causation: a relationship between two variables in which one produces a result when applied to the other
Correlation: a statistical relationship that exists between any two variables and that may or may not imply causation
Internal validity: how sound the research, methods and analysis of data are; an indicator of quality
External validity: how a study’s findings might be applied or generalized to others
Repeatability: agreement between tests within the same study; e.g., ensuring that instruments are appropriately calibrated so that differences are due to changes and not errors
Reproducibility: ability for other researchers to come up with the same or similar findings when using the same methods and approach
Berg, K.E., & Latin, R.W. 2004. Essentials of Research Methods in Health, Physical Education, Exercise Science, and Recreation. Philadelphia: Lippincott Williams & Wilkins.
Bertschinger, R., Giboin, L.S., & Gruber, M. 2020. Six sessions of sprint-interval training did not improve endurance and neuromuscular performance in untrained men. Frontiers in Physiology, 10, 1578.
Bouchard, C., & Rankinen, T. 2001. Individual differences in response to regular physical activity. Medicine & Science in Sports & Exercise, 33 (6, Suppl.), S446–51.
Cerqueira, É., et al. 2020. Inflammatory effects of high and moderate intensity exercise—a systematic review. Frontiers in Physiology, 10, 1550.
Hinkle, D.E., Wiersma, W., & Jurs, S.G. 2003. Applied Statistics for the Behavioral Sciences (5th ed., pp. 1–13). Boston: Houghton Mifflin Harcourt.
Liu K.A., & Dipietro Mager, N.A. 2016. Women’s involvement in clinical trials: Historical perspective and future implications. Pharmacy Practice, 14 (1), 708.
Pickering, C., & Kiely, J. 2019. Do non-responders to exercise exist—and if so, what should we do about them? Sports Medicine, 49, 1–7.