Selection of the Research Question 

Keiko Ueda, Lotfi B Merabet, Andre Brunoni, and Felipe Fregni

PRINTED FROM OXFORD MEDICINE ONLINE ( © Oxford University Press, 2021. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Medicine Online for personal use (for details see Privacy Policy and Legal Notice).

date: 09 December 2021

The difficulty in most scientific work lies in framing the questions rather than in finding the answers.

—Arthur E. Boycott (Pathologist, 1877–1938)

The grand aim of all science is to cover the greatest number of empirical facts by logical deduction from the smallest number of hypotheses or axioms.

—Albert Einstein (Physicist, 1879–1955)


The previous chapter provided the reader with an overview of the history of clinical research, followed by an introduction to fundamental concepts of clinical research and clinical trials. It is important to be aware of and to learn lessons from the mistakes of past and current research in order to be prepared to conduct your own research. As you will soon learn, developing your research project is an evolutionary process, and research itself is a continuously changing and evolving field.

Careful conceptual design and planning are crucial for conducting a reproducible, compelling, and ethically responsible research study. In this chapter, we will discuss what should be the first step of any research project, that is, how to develop your own research question. The basic process is to select a topic of interest, identify a research problem within this area of interest, formulate a research question, and finally state the overall research objectives (i.e., the specific aims that define what you want to accomplish).

You will learn how to define your research question, starting from broad interests and then narrowing these down to your primary research question. We will address the key elements you will need to define for your research question: the study population, the intervention (x, independent variable[s]), and the outcome(s) (y, dependent variable[s]). Later chapters in this volume will discuss popular study designs and elements such as covariates, confounders, and effect modifiers (interaction) that will help you to further delineate your research question and your data analysis plan.

Although this chapter is not a grant-writing tutorial, most of what you will learn here has very important implications for writing a grant proposal. In fact, the most important part of a grant proposal is the “specific aims” page, where you state your research question, hypotheses, and objectives.

How to Select a Research Question

What Is a Research Question?

A research question is an inquiry about an unanswered scientific problem. The purpose of your research project is to find the answer to this particular research question. Defining a research question can be the most difficult task during the design of your study. Nevertheless, it is fundamental to start with the research question, as it is strongly associated with the study design and predetermines all the subsequent steps in the planning and analysis of the research study.

What Is the Importance of a Research Question?

Defining the research question is instrumental for the success of your study. It determines the study population, outcome, intervention, and statistical analysis of the research study, and therefore the scope of the entire project.

A novice researcher will often jump to methodology and design, eager to collect data and analyze them. It is always tempting to try out a new or “fancy” method (e.g., “Let’s test this new proteomic biomarker in a pilot study!” or “With this Luminex assay we can test 20 cytokines simultaneously in our patient serum!”), but this mistake all too often turns the research project into a “fishing expedition.” The unfortunate outcome is that a researcher invests hours of work and obtains reams of data, only to find herself at an impasse, never figuring out what to do with all the information collected. Although it is not wrong to plan an exploratory (hypothesis-generating) study, such a study carries a high risk of yielding no useful information, in which case all the effort invested in it is lost. When planning an exploratory or pilot study (with no defined research question), the investigator must understand the goals and the risks (for additional discussion on pilot studies, see Lancaster et al. 2002).

It is important to first establish a concept for your research. You must have a preset idea or a working hypothesis in order to be able to understand the data you will generate. Otherwise, you will not be able to differentiate whether your data were obtained by chance, by mistake, or if they actually reflect a true finding. Also, have in mind ahead of time how you would like to present your study at a conference, in a manuscript, or in a grant proposal. You should be able to present your research to your audience in a well-designed manner that reflects a logical approach and appropriate reasoning.

A good research question leads to useful findings that may have a significant impact on clinical practice and health care, regardless of whether the results are positive or negative. It also gives rise to the next generation of research questions. Therefore, taking enough time to develop the research question is essential.

Where Do Research Questions Come From?

How do we find research questions? As a clinical research scientist, your motivation to conduct a study might be driven by a perceived knowledge gap, the urge to deepen your understanding of a certain phenomenon, or perhaps the wish to clarify contradictory existing findings. Maybe your bench research suggests that your findings warrant translation into a study involving patients in a clinical setting. Maybe your clinical work experience gives you the impression that a new intervention would be more effective for your patients than standard treatment. For example, your results could lead you to ask, “Does this drug really prolong life in patients with breast cancer?” or, “Does this procedure really decrease pain in patients with chronic arthritis?”

Once you have identified a problem in the area you want to study, you can refine your idea into a research question by gaining a firm grasp of “what is known” and “what is unknown.” To better understand the research problem, you should learn as much as you can about the background pertaining to the topic of interest and specify the gap between current understanding and unsolved problems. As an early step, you should consult the literature, using tools such as MEDLINE or EMBASE, to gauge the current level of knowledge relevant to your potential research question. This is essential in order to avoid spending unnecessary time and effort on questions that have already been solved by other investigators. Meta-analyses and systematic reviews are especially useful to understand the combined level of evidence from a large number of studies and to obtain an overview of clinical trials associated with your questions. You should also pay attention to unpublished results and the progress of important studies whose results are not yet published. It is important to realize that there likely are negative results produced but never published. You can inspect funnel plots obtained from meta-analyses or generated from your own research (see Chapter 14 in this volume for more details) to estimate if there has been publication bias toward positive studies. Also, be aware that clinical trials with aims similar to those of your study might still be ongoing. To find this information, you can check the public registration of trials using sites such as

How to Develop the Research Question: Narrow Down the Question

Once you have selected your research topic, you need to develop it into a more specific question. The first step in refining a research question is to narrow down a broad research topic into a specific description (narrow research question) that covers the four points of importance, feasibility, answerability, and ethicality.

Importance: Interesting, Novel, and Relevant

Your research can be descriptive, exploratory, or experimental. The purpose of your research can be for diagnostic or treatment purposes, or to discover or elucidate a certain mechanism. The point you will always have to consider when making a plan for your study, however, is how to justify your research proposal. Does your research question have scientific relevance? Can you answer the “so what” question? You need to describe the importance of your research question with careful consideration of the following elements:

  • The disease (condition or problem): Novelty, unmet need, or urgency are important. What is the prevalence of this disease/condition? Is there a pressing need for further discoveries regarding this topic because of well-established negative prognoses (e.g., HIV, pancreatic cancer, or Alzheimer’s disease)? Are existing treatment options limited, too complex or costly, or otherwise not satisfactory (e.g., limb replacement, face transplantation)? Does the research topic reflect a major problem in terms of health policy, medical, social, and/or economic aspects (e.g., smoking, hypertension, or obesity)?

  • The intervention: Is it a new drug, procedure, technology, or medical device (e.g., stem-cell derived pacemaker or artificial heart)? Does it concern an existing drug approved by the Food and Drug Administration (FDA) for a different indication (e.g., is Rituximab, a drug normally indicated for malignant lymphoma, effective for systemic lupus erythematosus or rheumatoid arthritis)? Is there new evidence for application of an existing intervention in a different population (e.g., is Palivizumab also effective in preventing respiratory syncytial virus in immunodeficient infants, not only in premature infants)? Have recent findings supported the testing of a new intervention in a particular condition (e.g., is a β-blocker effective in preventing cardiovascular events in patients with chronic renal failure)? Even a research question regarding a standard of care intervention can be valuable if in the end it can improve the effectiveness of clinical practice.


Feasibility

In short, be realistic: novice researchers tend to jump right away into very ambitious projects. You should carefully establish the feasibility of your research idea to avoid wasting precious resources such as time and money:

  • Patients: Can you recruit the required number of subjects? Do you think your recruitment goal is realistic? Rare diseases such as Pompe or Fabry’s disease will pose a challenge in obtaining a sufficient sample size. Even common diseases, depending on your inclusion criteria and regimen of intervention, may be difficult to recruit. Does your hospital have enough patients? If not, you may have to consider a multi-center study. What about protocol adherence and dropouts? Do you expect significant deviations from the protocol? Do you need to adjust your sample size accordingly?

  • Technical expertise: Are there any established measurements or diagnostic tools for your study? Can the outcome be measured? Do we have any standard techniques for using the device (e.g., guidelines for echocardiographic diagnosis of congenital heart disease)? Is there a defined optimal dose? Can you operate the device, or can the skill be learned appropriately (e.g., training manual for transcatheter aortic valve replacement)? A pilot study or small preliminary study can help answer these preliminary questions at this stage.

  • Time: Do you have the required time to recruit your patients? Is it possible to follow up with patients for the entire time of the proposed study period (e.g., can you follow preterm infant development at 3, 6, and 9 years of age)? When do you need to have your results in order to apply for your next grant?

  • Funding: Does your budget allow for the scope of your study? Are there any research grants you can apply for? Do the funding groups’ interests align with those of your study? How realistic are your chances of obtaining the required funding? If there are available funds, how do you apply for the grant?

  • Team: How about your research environment? Do your mentors and colleagues share your interests? What kind of specialists do you need to invite for your research? Do you have the staff to support your project (technicians, nurses, administrators, etc.)?


Answerability

New knowledge can only originate from questions that are answerable. A broad research problem is still a theoretical idea, and even if it is important and feasible, it still needs to be further specified. You should carefully investigate your research idea and consider the following:

  • Precisely define what is known or not known and identify what area your research will address. The research question should demonstrate an understanding of the biology, physiology, and epidemiology relevant to your research topic. For example, you may want to investigate the prevalence and incidence of stroke after catheterization and its prognosis before you begin research on the efficacy of a new anticoagulant for patients who received catheter procedures. Again, you may need to conduct a literature review in order to clarify what is already known. Conducting surveys (interviews or questionnaires) initially could also be useful to understand the current status of your issues (e.g., How many patients a year are diagnosed with stroke after catheterization in your hospital? What kind of anticoagulant is already being used for the patients? How old are the patients? How long do the catheterization procedures last?).

  • The standard treatment should be well known before testing a new treatment. Are there any established treatments in your research field? Could your new treatment potentially replace the standard treatment or be complementary to the current treatment of choice? Guidelines can be helpful for discussion (e.g., American College of Cardiology/American Heart Association guidelines for anticoagulant therapy). Without knowing the current practice, your new treatment may never find its clinical relevance.

  • We also need information about clinical issues for diagnostic tests and interventions. Are you familiar with the diagnosis and treatment of this disease (e.g., computerized tomography or magnetic resonance imaging to rule out stroke after catheterization)? Do you know the current guidelines?

Ethical Aspects

Ethical issues should be discussed before conducting research. Is the subject of your research a controversial topic? The possible ethical issues will often depend largely on whether the study population is considered vulnerable (e.g., children, pregnant women, etc.; see Chapter 1) [1]. You must always determine the possible risks and benefits of your study intervention [1].

Finally, you may want to ask for expert opinions about whether your research question is answerable and relevant (no matter how strong your personal feelings may be about the relevance). To this end, a presentation of your idea or preliminary results at a study meeting early on in the project development can help refine your question.

How to Build the Research Question

The next step of formulating a narrow research question is to focus on the primary interest (primary question): What is the most critical question for your research problem? You will define this primary question by addressing the key elements using the useful acronym PICOT (population, intervention, control, outcome, and time), while keeping in mind the importance, feasibility, answerability, and ethicality. Although PICOT is a useful framework, it does not cover all types of studies, especially some observational studies, for instance those investigating predictors of response (E [exposure] instead of I [intervention] is used for observational studies). But for an experimental study (e.g., a clinical trial), the PICOT framework is extremely useful to guide formulation of the research question.

Building the Research Question: PICOT

P (Population or Patient)

What is the target population of your research? The target population is the population of interest from which you want to draw conclusions and inferences. Do you want to study mice or rabbits? Adults or children? Nurses or doctors? What are the characteristics of the study subjects, and what are the given problems that should be considered? You may want to consider the pathophysiology (acute or chronic?) and the severity of the disease (severe end stage or early stage?), as well as factors such as geographical background and socioeconomic status.

Once you decide on the target population, you may select a sample as the study population for your study. The study population is a subset of the target population under investigation. However, it is important to remember that the study population is not always a perfect representation of the target population, even when sampled at random. Thus, defining the study population by the inclusion and exclusion criteria is a critical step (see Chapter 3).

Since only in rare cases will you be able to study every patient of interest, you will have to identify and select whom from the target population you want to study. This is referred to as the study sample. To do this requires choosing a method of selection or recruitment (see Chapter 7).

A specific study sample defined by restricted criteria will have a reduced number of covariates and will be more homogeneous, therefore increasing the chance of higher internal validity for your study. This also typically allows for the study to be smaller and potentially less expensive. However, a restricted population might make it more difficult to recruit a sufficient number of subjects. Recruitment can be easier if you define a broad population, which also increases the generalizability of your study results, but a broad population can make the study larger and more expensive [2].

I (Intervention)

The I of the acronym usually refers to “intervention.” However, a more general and therefore preferable term would be “independent variable.” The independent variable is the explanatory variable of primary interest, also declared as x in the statistical analysis. The independent variable can be an intervention (e.g., a drug or a specific drug dose), a prognostic factor, or a diagnostic test. I can also be the exposure in an observational study. In an experimental study, I is referred to as the fixed variable (controlled by the investigator), whereas in an observational study, I refers to an exposure that occurs outside of the experimenter’s control.

The independent variable precedes the outcome in time or in its causal path, and thus it “drives” the outcome in a cause-effect relationship.

C (Control)

What comparison or control is being considered? This is an important component when comparing the efficacy of two interventions. When no standard treatment is available, the new treatment should be shown to be superior to placebo. Placebo is a simulated treatment that has no pharmaceutical effects and is used to mask the recipients to potential expectation biases associated with participating in clinical trials. On the other hand, active controls could be used when an established treatment exists and the efficacy of the new intervention should be examined at least within the context of non-inferiority to the standard treatment. The control can also be the baseline measurement in a one-group study.

O (Outcomes)

O is the dependent variable, or the outcome variable of primary interest; in the statistical analysis, it is also referred to as y. The outcome of interest is a random variable and can be a clinical (e.g., death) or a surrogate endpoint (e.g., hormone level, bone density, antibody titer). Selection of the primary outcome depends on several considerations: What can you measure in a timely and efficient manner? Which measurement will be relevant to understand the effectiveness of the new intervention? What is routinely accepted and established within the clinical community? We will discuss the outcome variable in more detail later in the chapter.

T (Time)

Time is sometimes added as another criterion and often refers to the follow-up time necessary to assess the outcome or the time necessary to recruit the study sample. Rather than viewing time as a separate aspect, it is usually best to consider time in context with the other PICOT criteria.

What Is the Primary Interest in Your Research?

Once you have selected your study population, as well as the dependent and independent variables, you are ready to formulate your primary research question, the major specific aim, and a hypothesis. Even if you have several different ideas regarding your research problem, you still need to clearly define what the most important question of your research is. This is called your primary question. A research project may also contain additional secondary questions.

The primary question is the most relevant question of your research that should be driven by the hypothesis. Usually only one primary question should be defined at the beginning of the study, and it must be stated explicitly upfront [3]. This question is relevant for your sample size calculation (and in turn, for the power of your study—see Chapter 11).
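
To make the link between the primary question and the sample size concrete, here is a minimal sketch; it is not the method of Chapter 11. It assumes a two-arm parallel trial with a continuous primary outcome, equal group sizes, a two-sided test, and the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})²σ²/δ² per group. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for comparing two means (normal approximation).

    delta: smallest difference in means worth detecting (assumed by the investigator)
    sd:    common standard deviation of the outcome (assumed)
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    n = 2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2
    return math.ceil(n)


# Hypothetical primary question: detect a 5-point difference when the SD is 10.
n_primary = sample_size_per_group(delta=5, sd=10)
```

Because the calculation takes the effect size and variability of one outcome as inputs, a study sized this way is powered for the primary question only. Halving the detectable difference roughly quadruples the required sample.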

The specific aim is a statement of what you are proposing to do in your research project.

The primary hypothesis states your anticipated results by describing how the independent variable will affect the dependent variable. Your hypothesis cannot be mere speculation; it must be grounded in the research you have performed and must have a reasonable chance of being proven true.

We can define more than one question for a study, but aside from the primary question, all others associated with your research are treated as secondary questions. Secondary questions may help to clarify the primary question and may add some information to the research study. What potential problems do we encounter with secondary questions? Usually, they are not sufficiently powered to be answered because the sample size is determined based on the primary question. Also, type I errors (i.e., false positives) may occur due to multiple comparisons if not adjusted for by the proper statistical analysis. Therefore, findings from secondary questions should be considered exploratory and hypothesis generating in nature, with new confirmatory studies needed to further support the results.
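
The inflation of type I error with multiple comparisons can be illustrated with a short sketch. It assumes the tests are independent, which is rarely exactly true in practice, and it uses the Bonferroni correction as one simple example of an adjustment; the chapter does not prescribe a specific method. The function names and the count of ten questions are hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical trial: one primary and nine secondary questions, each tested at 0.05.
fwer_unadjusted = familywise_error_rate(0.05, 10)    # roughly 0.40
per_test_alpha = bonferroni_alpha(0.05, 10)          # 0.005 per test
fwer_adjusted = familywise_error_rate(per_test_alpha, 10)
```

Under these assumptions, ten unadjusted tests carry about a 40% chance of at least one false positive. The adjustment restores the overall rate to below 5%, at the cost of a stricter threshold for each individual question.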

An ancillary study is a sub-study built into the main study design. Previous evidence may convince you of the need to test a hypothesis within a sub-group ancillary to the main population of interest (e.g., females, smokers). While this kind of study enables you to perform a detailed analysis of the subpopulation, there are limitations on the generalizability of an ancillary study since the population is usually more restricted (see Further Readings, Examples of Ancillary Studies).


It is important to understand thoroughly the study variables when formulating the study question. Here we will discuss some of the important concepts regarding the variables, which will be discussed in more detail in Chapter 8.

We have already learned that the dependent variable is the outcome, and the independent variable is the intervention. For study design purposes, it is important to also discuss how the outcome variables are measured. A good measurement requires reliability (precision), validity (“trueness”), and responsiveness to change. Reliability refers to how consistent the measurement is if it is repeated. Validity of a measurement refers to the degree to which it measures what it is supposed to measure. Responsiveness of a measurement means that it can detect differences that are proportional to the change of what is being measured with clinical meaningfulness and statistical significance.

Covariates are independent variables of secondary interest that may influence the relationship between the independent and dependent variables. Age, race, and gender are well-known examples. Since covariates can affect the study results, it is critical to control or adjust for them. Covariates can be controlled for by both planning (inclusion and exclusion criteria, placebo and blinding, sampling and randomization, etc.) and analytical methods (e.g., covariate adjustment [see Chapter 13], and propensity scores [see Chapter 17]).

  • Continuous (ratio and interval scale), discrete, ordinal, and nominal (categorical, binary) variables: Continuous data can take any numerical value (including fractions; i.e., floating-point data) and are the most common type of raw data. Discrete data are whole numbers (i.e., integer data; e.g., number of hospitalizations). Ordinal data are ordered categories (e.g., mild, moderate, severe). Nominal data can be either categorical (e.g., race) or dichotomous/binary (e.g., gender). Compared to other variables, continuous variables have more power, which is the ability of the study to detect an effect (e.g., differences between study groups) when it is truly present, but they do not always reflect clinical meaningfulness, which can make interpretation more difficult. Ordinal and nominal data may better reflect clinical significance (e.g., dead or alive, relapse or no relapse, stage 1 = localized carcinoma, etc.). However, ordinal and categorical data typically have less power, and important information may be lost. For example, if an IQ of less than 70 is categorized as developmental delay in infants, IQs of 50, 58, and 69 all fall into the same category, while an IQ of 70 or more is considered normal development, although the difference between 69 and 70 is just 1 point. This approach is called categorization of continuous data: a clinically meaningful threshold is set to make it easier to assess study results quickly. It is important to note that some authors differentiate between continuous and discrete variables by defining the former as having a quantitative characteristic and the latter as having a qualitative characteristic. This classification is somewhat problematic, especially when it comes to ordinal data.

  • Single and multiple variables: Having a single variable is simpler, as it is easier to interpret clinically. Multiple variables are efficient because we can evaluate many variables within a single trial, but these can be difficult to disentangle and interpret. Composite endpoints, which combine multiple variables, are also sometimes used. Because each clinical outcome may separately require a long duration and a large sample size, combining many possible outcomes increases overall efficiency and enables one to reduce sample size requirements and to capture the overall impact of therapeutic interventions. Common examples include MACE (major adverse cardiac events) and TVF (target vessel failure: myocardial infarction in target vessel, target vessel revascularization, cardiac death, etc.). Interpretation of the results has to proceed with caution, however (see section on case-specific questions) [9].

  • Surrogate variables (endpoints) and clinical variables (endpoints): Clinical variables directly assess the effect of therapeutic interventions on patient function and survival, which is the ultimate goal of a clinical trial. Clinical variables may include mortality, events (e.g., myocardial infarction, stroke), and occurrence of disease (e.g., HIV). A clinical endpoint is the most definitive outcome to assess the efficacy of an intervention. Thus, clinical endpoints are preferably used in clinical research. However, it is not always feasible to use clinical outcomes in trials. The evaluation of clinical outcomes presents some methodological problems since they require long-term follow-up (with problems of adherence, dropouts, competing risks, requiring larger sample sizes) and can make a trial more costly. At the same time, the clinical endpoint may be difficult to observe. For this reason, clinical scientists often use alternative outcomes to substitute for the clinical outcomes. So-called surrogate endpoints are a more practical measure to reflect the benefit of a new treatment. Surrogate endpoints (e.g., cholesterol levels, blood sugar, blood pressure, viral load) are defined based on the understanding of the mechanism of a disease that suggests a clear relationship between a marker and a clinical outcome [8]. Also, a biological rationale provided by epidemiological data, other clinical trials, or animal data should be previously demonstrated. A surrogate is frequently a continuous variable that can be measured early and repeatedly and therefore requires shorter follow-up time, smaller sample size, and reduced costs for conducting a trial. Surrogate endpoints are often used to accelerate the process of new drug development and early stages of development, such as in phase 2 [10]. 
As a word of caution, too much reliance on surrogate endpoints alone can be misleading if the results are not interpreted with regard to validation, measurability, and reproducibility (see Further Reading) [4].
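
The information lost by categorizing a continuous variable, described above with the IQ example, can be shown in a few lines. This is only an illustrative sketch: the function name is hypothetical, and the scores and the cutoff of 70 are the ones used in the text.

```python
def categorize_iq(iq, threshold=70):
    """Dichotomize a continuous IQ score at a threshold (the text's cutoff of 70)."""
    return "developmental delay" if iq < threshold else "normal development"


iq_scores = [50, 58, 69, 70]  # values used in the text's example
categories = [categorize_iq(iq) for iq in iq_scores]

# Within the "delay" category a 19-point spread disappears,
# while a 1-point difference (69 vs. 70) changes the category.
spread_hidden = max(iq for iq in iq_scores if iq < 70) - min(iq_scores)
```

After categorization, scores of 50 and 69 are indistinguishable, whereas 69 and 70 end up in different groups. This is why a dichotomized outcome typically carries less statistical power than the underlying continuous measure.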

How to Express a Research Question


Once a narrow research question is defined, you should clearly specify a hypothesis in the study protocol. A hypothesis is a statement about the expected results that predicts the effect of the independent on the dependent variable. A research hypothesis is essential to frame the experimental and statistical plan (statistics will be discussed in Unit II of this volume) and is also important to support the aim of the study in a scientific manuscript.

Types of Research Questions

To refine the research question and form the research hypothesis, we will discuss three types of research questions that investigate group differences, correlations, or descriptive measures. This classification is particularly important in discussing which statistical analysis is appropriate for your research question [5].

  • Basic/complex difference (group comparison) questions: Samples split into groups by levels associated with the independent variable are compared by considering whether there is a difference in the dependent variable. If you have only one independent variable, the question is classified as a basic difference question (e.g., drug A will reduce time to primary closure in a 5-mm punch biopsy vs. placebo) and you would rely on a t-test or one-way analysis of variance (ANOVA) for the analysis. If you have two or more independent variables (e.g., drugs A and B led to a 15-mg/dl reduction in LDL cholesterol versus placebo, but there was no reduction with only drug A), this then becomes a complex difference question and is analyzed by other statistical methods, such as a factorial ANOVA.

  • Basic/complex associational (relational/correlation) questions: The independent variable is correlated with the dependent variable. If there is only one dependent variable and one independent variable (e.g., is there a relationship between weight and natriuretic peptide levels?), it is called a basic associational question, and in this situation, a correlation analysis is used. If there is more than one independent variable associated with one dependent variable (e.g., smoking and drinking alcohol are associated with lung cancer), it is called a complex associational question, and multiple regression is used for statistical analysis.

  • Basic/complex descriptive questions: The data are described and summarized using measures of central tendency (mean, median, and mode), variability, and percentage (prevalence, frequency). If there is only one variable, it is called a basic descriptive question (e.g., how many MRSA isolates occur after the 15th day of hospitalization?); if there is more than one variable, it is called a complex descriptive question.
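
The classification above can be summarized as a simple lookup from question type to the analysis named in the text. This is a hypothetical helper for illustration, not a substitute for a statistical analysis plan. The entry for complex descriptive questions is an assumption, since the text does not name a specific analysis for that case.

```python
def suggest_analysis(question_type, n_variables):
    """Map the chapter's question types to the analyses it names.

    question_type: "difference", "associational", or "descriptive"
    n_variables:   number of independent variables (for descriptive
                   questions, the number of variables described)
    """
    is_complex = n_variables > 1
    table = {
        ("difference", False): "t-test or one-way ANOVA",
        ("difference", True): "factorial ANOVA",
        ("associational", False): "correlation analysis",
        ("associational", True): "multiple regression",
        ("descriptive", False): "central tendency, variability, percentage",
        # Assumption: the text names no analysis for this case.
        ("descriptive", True): "central tendency, variability, percentage per variable",
    }
    if (question_type, is_complex) not in table:
        raise ValueError(f"unknown question type: {question_type!r}")
    return table[(question_type, is_complex)]


# Example from the text: smoking and alcohol (two independent variables) and lung cancer.
analysis = suggest_analysis("associational", 2)
```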

Where Should You State Your Research Question?

Finally, where should you state your hypothesis? You may be writing for a research grant, research protocol, or manuscript. Usually, research questions should be stated in the introduction, immediately following the justification (“so what”) section. Research questions should be clearly stated in the form of a hypothesis, such as “We hypothesize that in this particular population (P), the new intervention (I) will improve the outcome (O) more than the standard of care (C).”
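
The hypothesis template can be treated as a fill-in-the-blanks structure. The sketch below is a hypothetical illustration: the `PICO` class and the example values (including the 30-day survival outcome) are made up for demonstration and do not come from any study.

```python
from dataclasses import dataclass


@dataclass
class PICO:
    population: str    # P: who is studied
    intervention: str  # I: what is tested
    comparison: str    # C: what it is tested against
    outcome: str       # O: what is measured

    def hypothesis(self) -> str:
        """Render the components in the sentence form quoted above."""
        return (
            f"We hypothesize that in {self.population} (P), "
            f"{self.intervention} (I) will improve {self.outcome} (O) "
            f"more than {self.comparison} (C)."
        )


# Illustrative values only.
example = PICO(
    population="adults presenting with acute myocardial infarction",
    intervention="the new anti-thrombotic drug",
    comparison="the standard of care",
    outcome="30-day survival",
)
print(example.hypothesis())
```

Writing the four components out separately makes it easy to spot a missing element before the hypothesis goes into a grant or protocol.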

A Research Question Should Be Developed over Time

It is important that the investigator spend a good amount of time developing his or her study question. During this process, everything discussed in this chapter needs to be reviewed, and the research question needs to be refined accordingly. Good planning, starting with the research question, is one of the key components of a study’s success.

Related Topics for Choosing the Research Question

Selecting the Appropriate Control in Surgical Studies or Other Challenging Situations

Let’s think about various situations. Can we use placebo (sham) or another procedure as a control in a surgical trial? What exactly can be considered a placebo in surgical studies? How do we control for a placebo effect in surgical procedures?

Placebos can be used for the control group in clinical studies in comparison to a new agent if no standard of care is available. In order to fully assess the placebo effect in the control arm, participants have to be blinded. The control group could either have no surgery at all or undergo a “sham” procedure, but both options might be unethical depending on the given patient population [6]. In surgical studies, the control group usually receives the “traditional” procedure. In all cases, blinding might be very challenging and even impossible on certain levels (e.g., the surgeon performing the procedure). What about acupuncture? What would you consider a good control? What about cosmetic procedures?

Using Adverse Events as the Primary Research Question

Important questions concerning adverse effects can be answered in a clinical trial. However, as the typical clinical trial is performed in a controlled setting, the information regarding adverse effects is not always generalizable to the real-world setting. Thus, the clinical translation of the results needs careful consideration when carrying out a safety-focused study. Adverse event reports from phase IV (post-marketing surveillance) are considered more generalizable information in drug development, although minimum safety data from phase I are required to proceed to subsequent study phases.

Also, it might not be easy to formulate a specific research question regarding adverse effects, as they might not be fully known in the early stages of drug development. This will also make it difficult to power the study properly (e.g., how many patients do we need to examine to show a statistically meaningful difference?).
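
A rough calculation, which is not part of the original text, shows why sample size is such a hurdle for safety questions. If an adverse event occurs independently in each patient with probability p, the chance of seeing at least one event among n patients is 1 - (1 - p)^n. The sketch below uses hypothetical function names and illustrative event rates to solve for the smallest n that gives a chosen probability of observing at least one event.

```python
import math


def prob_at_least_one(event_rate: float, n_patients: int) -> float:
    """Probability of observing one or more events among n independent patients."""
    return 1.0 - (1.0 - event_rate) ** n_patients


def patients_needed(event_rate: float, confidence: float = 0.95) -> int:
    """Smallest n for which prob_at_least_one(event_rate, n) >= confidence."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_rate))


# Illustrative event rates: 1 in 100 and 1 in 1,000 patients.
for rate in (0.01, 0.001):
    print(rate, patients_needed(rate))
```

This only addresses whether an event is observed at all; showing a difference in event rates between two arms generally requires considerably more patients.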

When the Research Question Leads to Other Research Questions

Medical history is filled with interesting stories about research questions, and sometimes it is not the intended hypothesis that yields a big discovery. For example, Sildenafil (Viagra) was initially developed by Pfizer for the treatment of cardiovascular conditions. Although clinical trials showed Sildenafil to have only a small effect on the primary outcomes, it was quickly realized that an unexpected but marked “side effect” occurred in men. Careful investigation of clinical and pharmacological data generated the new research question, “Can Sildenafil improve erectile dysfunction?” This question was then answered in clinical trials with nearly 5,000 patients, which led to Sildenafil’s FDA approval in 1998 as the first oral treatment for male erectile dysfunction [7]. The investigator must be attentive to novel hypotheses that can be learned from a negative study.

Case Study: Finding the Research Question

Dr. L. Heart is a scientist working on cardiovascular diseases in a large, busy emergency room of a tertiary hospital specialized in acute coronary syndromes. While searching PubMed, she found an interesting article on a new drug, which animal studies have demonstrated to be a powerful anti-thrombotic agent, showing its safety in healthy volunteers. She feels that it is the right time to perform a phase II trial testing this new drug in patients presenting with myocardial infarction (MI). She sees this as her big career breakthrough. However, when Dr. Heart starts writing a study proposal for the institutional review board (ethics committee), she asks herself, “What is my research question?”1


Defining the research question is, perhaps, the most important part of the planning of a research study. That is because the wrong question will eventually lead to a poor study design and therefore all the results will be useless; on the other hand, choosing an elegant, simple question will probably lead to a good study that will be meaningful to the scientific community, even if the results are negative. In fact, the best research question is one that, regardless of the results (negative or positive), produces interesting findings. In addition, a study should be designed with only one main question in mind.

However, choosing the most appropriate question is not always easy, as such a question might not be feasible to answer. For instance, when researching acute MI, the most important question would be whether or not a new drug decreases mortality. However, for economic and ethical reasons, such an approach can only be considered when previous studies have already suggested that the new drug is a potential candidate. Therefore, the investigator needs to deal with the important issue of feasibility versus clinical relevance. Dr. Heart soon realized that her task would not be an easy one, and also that it might take some time; she kept thinking about a quotation from an article she had recently read: “One-third of a trial’s time between the germ of your idea and its publication in the New England Journal of Medicine should be spent fighting about the research question.”2

“So What?” Test for the Research Question

Dr. Heart knows that an important test for the research question is to ask, “So what?” In other words, does the research question address an important issue? She knows, for example, that the main funding agency in the United States, the National Institutes of Health (NIH), considers significance and innovation important factors in funding grant applications. Dr. Heart also remembers something that her mentor used to tell her at the beginning of her career: “A house built on a weak foundation will not stand.” She knows that even if she has the most refined design and uses the optimal statistical tests, her research will be of very little interest or utility if it does not advance the field. But regarding this point, she is confident that her research will have a significant impact in the field.

Next Step for the Research Question: How to Measure the Efficacy of the Intervention

Dr. L. Heart is in a privileged position. She works in a busy hospital that receives a large number of patients with acute cardiovascular conditions. She has also received substantial departmental support for her research, meaning that she can run a wide range of blood tests to measure specific biological markers related to death in myocardial infarction. Finally, she has a PhD student who is a psychologist working on quality of life post-MI. Therefore, she asks herself whether she should rely on biological markers, on the assessment of quality of life, or on a more robust outcome to prove the efficacy of the new drug. She knows that this is one of the most critical decisions she has to make. It was a Friday afternoon. She had just packed up her laptop and the articles she was reading, knowing that she would have to make a decision by the end of the weekend.

Dr. Heart is facing a common problem: What outcome should be used in a research study? This needs to be defined for the research question. She knows that there are several options. For instance, the outcome might be mortality, new MI, days admitted to the emergency room, quality of life, a specific effect of the disease such as angina, a laboratory measure (e.g., cholesterol levels), or the cost of the intervention. Also, she might use continuous or categorical outcomes. For instance, if she is measuring angina, she might measure the number of days with angina (continuous outcome) or dichotomize the number of angina days into two categories (fewer than 100 days with angina vs. 100 or more days with angina). She then lays out her options:

  • Use of clinical outcomes (such as mortality or new myocardial infarction): She knows that by using this outcome, her results would be easily accepted by her colleagues; however, using these outcomes will increase the trial duration and costs.

  • Use of surrogates (for instance, laboratory measurements): One attractive alternative for her is to use biomarkers or radiological exams (such as a catheterization). She knows a colleague in the infectious disease field who uses only CD4 counts as the main outcome in HIV trials. This would increase the trial feasibility. However, she is concerned that her biomarkers might not really represent disease progression.

  • Use of quality of life scales: This might be an intermediate solution for her. However, she is still concerned with the interpretation of the results if she decides to use quality of life scales.

More on the Response Variable: Categorical or Continuous?

Even before making the final decision, Dr. Heart needs to decide whether she will use a continuous or categorical variable. She wishes now that she knew the basic concepts of statistics. However, she calls a colleague, who explains to her the main issue of categorical versus continuous outcomes—in summary, the issue is the trade-off of power versus clinical significance.

A categorical outcome usually has two categories (e.g., a yes/no answer), while a continuous outcome can take any value within a range. A categorical approach might be more robust than a continuous one and might also have more clinical significance, but it decreases the power of the study because it uses less information.3 She is now at the crossroads of feasibility versus clinical significance.
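
The power cost of a categorical outcome can be illustrated with standard normal-approximation sample size formulas. The sketch below is not from the case; it assumes an outcome that is normally distributed with unit standard deviation in both groups, a treated group whose mean is shifted by `effect_size`, and a dichotomization cutoff placed midway between the two group means. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def _z_sum_squared(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) ** 2


def n_continuous(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n to compare two means; effect_size is in standard deviations."""
    return math.ceil(2.0 * _z_sum_squared(alpha, power) / effect_size ** 2)


def n_dichotomized(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n after splitting the same outcome at the midpoint cutoff."""
    # Proportion above the cutoff in each group under the stated assumptions.
    p_treated = STD_NORMAL.cdf(effect_size / 2.0)
    p_control = STD_NORMAL.cdf(-effect_size / 2.0)
    variance_sum = p_treated * (1.0 - p_treated) + p_control * (1.0 - p_control)
    return math.ceil(
        _z_sum_squared(alpha, power) * variance_sum / (p_treated - p_control) ** 2
    )


# Illustrative effect size of half a standard deviation.
print(n_continuous(0.5), n_dichotomized(0.5))
```

Under these assumptions the dichotomized analysis needs noticeably more patients per group for the same power, which is the trade-off Dr. Heart's colleague describes.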

Choosing the Study Population

Now that Dr. Heart has gone through the difficult decision of finding the best outcome measure, she needs to define the target population: in which patients is she going to test the new drug? Her first idea is to select only patients who have a high probability of dying, for instance, men who smoke, are older than 75 years, and have insulin-dependent diabetes and hypercholesterolemia. “Then,” she thinks, “it will be easier to prove that the new drug is useful regardless of the population I study. But does that really sound like a good idea?”

The next step is to define the target population. Dr. Heart is inclined to restrict the study population, as she knows that this drug might be effective in a particular population of patients, which would increase her chances of getting a good result. In addition, she remembers from her statistics courses that this would imply a smaller variability and therefore she would gain power (power is an important currency in research, as it makes the study more efficient, decreasing the costs and time needed to complete the study). On the other hand, she is concerned that she might put all her eggs in one basket. This is a risky approach, as this specific population might not respond, and she knows that broadening the population also has some advantages: for instance, the results would be more generalizable and it would be easier to recruit patients. But this would also increase the costs of the study.
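
The link between variability and power can be made concrete with the usual normal-approximation formula for comparing two means, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2. The sketch below is an illustration with made-up numbers, not a calculation from the case; `n_per_group`, `delta`, and `sigma` are hypothetical names, and it assumes the expected treatment effect is the same in the narrow and the broad population.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect a mean difference delta (two-sided test).

    sigma is the standard deviation of the outcome within each group.
    """
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1.0 - alpha / 2.0) + std_normal.inv_cdf(power)
    return math.ceil(2.0 * (z_total * sigma / delta) ** 2)


# Same expected effect, but the restricted population is assumed to be
# half as variable as the broad one (illustrative values).
broad = n_per_group(delta=5.0, sigma=10.0)
narrow = n_per_group(delta=5.0, sigma=5.0)
print(broad, narrow)
```

Because the required sample size scales with the square of the standard deviation, halving the variability cuts the sample size to roughly a quarter, at the price of less generalizable results.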

But How about Other Ideas?

After a weekend of reflection, Dr. Heart called the staff for a team meeting and proudly explained the scenario and stated her initial thoughts. The staff was very eager to start a new study, and they made several suggestions: “We should also use echocardiography to assess the outcome!”; “Why don’t we perform a genotypic analysis on these patients?”; “We need to follow them until one year after discharge.” She started to become anxious again. What should she do with these additional suggestions? They all seem to be good ideas.

When designing a clinical trial, researchers expose a number of subjects to a new intervention. Therefore, they want to extract as much data as possible from studies. On the other hand, it might not be possible to ask all of the questions, since this will increase the study’s duration, costs, and personnel. Also, researchers should be aware that all the other outcomes assessed will be exploratory (i.e., their usefulness lies in suggesting possible associations and future studies) because studies are designed to answer a primary question only. Moreover, as a principle of statistics, at the conventional 5% significance level there is a 5% probability of observing a positive result just by chance when no true effect exists (if you perform 20 independent tests, for instance, you should expect about one of them to be positive just by chance). But Dr. Heart knows that she can test additional hypotheses as secondary questions. She knows that there is another issue to go through: the issue of primary versus secondary questions.
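
The arithmetic behind the 20-test example can be made explicit. The sketch below assumes that the tests are independent and that no true effect exists for any of them; the function names are hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of chance-positive results among n_tests null tests."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent null tests is positive."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(expected_false_positives(20))           # expected count of chance findings
print(round(prob_any_false_positive(20), 3))  # chance of at least one
```

With 20 tests, one chance finding is expected on average, and the probability of at least one is well above one half. This is why outcomes other than the primary one are best treated as exploratory.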

Defining Her Hypothesis

After going through this long process, Dr. Heart is getting close to her research question. But now she needs to define the study hypotheses. In other words, what is her educated guess regarding the study outcome?

An important step when formulating a research question is to define the hypothesis of the study. This is important in terms of designing the analysis plan, as well as estimating the study sample size. Usually, researchers come up with study hypotheses after reviewing the literature and preliminary data. Dr. Heart can choose between a simple and a complex hypothesis. In the first case, her hypothesis would only have one dependent variable (i.e., the response variable) and one independent variable (e.g., the intervention). Complex hypotheses have more than one independent and/or dependent variable and might not be easy to use in planning the data analysis.

By the end of the day, Dr. Heart was overwhelmed with the first steps to put this study together. Although she is confident that this study might be her breakthrough and she needs to get her tenure track position at the institution where she works, she also knows she has only one chance and must be very careful at this stage. After wrestling with her thoughts, she finished her espresso and walked back to her office, confident that she knew what to do.

Case Discussion

Dr. Heart is a busy and ambitious clinical scientist and wants to establish herself within the academic ranks of her hospital. She has some background in statistics but seems to be quite inexperienced in conducting clinical research. She is looking for an idea to write up a research proposal and rightly conducts a literature search in her field of expertise, cardiovascular diseases. She finds an interesting article about a compound that has been demonstrated to be effective in an animal model and safe in healthy volunteers (results of a phase I trial). She now plans to conduct a phase II trial, but struggles to come up with a study design. The most vexing problem for her is formulating the research question.

Dr. Heart then reviews and debates aspects that have to be considered when delineating a research question. The main points she ponders include the following: determining the outcome with regard to feasibility (mainly concerning the time of follow-up when using a clinical outcome) versus clinical relevance (when using a surrogate outcome) and with regard to the data type to be used for the outcome (categorical vs. continuous); the importance of the research proposal (the need for a new anticoagulant drug); whether to use a narrow versus broad study population; whether to include only a primary or also secondary questions; and whether to use a basic versus complex hypothesis. Important aspects that Dr. Heart has not considered include the following: whether to test versus a control (although not mandatory in a phase II trial, it deserves consideration since she is investigating the effects of an anticoagulant and therefore adverse events should be expected, thus justifying the inclusion of a control arm) or to test several dosages (to observe a dose-response effect); logistics; the budget; and the overall scope of her project.

All these aspects are important and need careful consideration, but you have to wonder how this will help Dr. Heart come up with a compelling research question. Rather than assessing each aspect separately and making decisions based on advantages and disadvantages, it is recommended to start from a broad research interest and then develop and further specify the idea into a specific research question.

While Dr. Heart should be applauded for her ambition, she should also try to balance the level of risk of her research given her level of experience.

Finally, we should also question Dr. Heart’s motives for conducting this study. What is her agenda?

Case Questions for Reflection

  1. What are the main challenges faced by Dr. Heart?

  2. Should she really be concerned with the study question?

  3. Which response variable should she choose? Justify your choice to your colleagues (remember, there is no right or wrong).

  4. How should she select the study population?

  5. Should her study have secondary questions?

  6. Finally, try to create a study question for Dr. Heart in the PICOT format, depending on your previous selections.

Further Reading


Haynes B, et al. Forming research questions. In: Haynes B, Sackett DL, Guyatt GH, Tugwell P, eds. Clinical epidemiology: how to do clinical practice research. Part 1: Performing clinical research. 3rd ed. 2006: 3–14.

Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 3rd ed. Pearson; 2008: 121–139.

Surrogate Outcomes

D’Agostino RB. Debate: The slippery slope of surrogate outcomes. Curr Contr Trials Cardiovasc Med. 2000; 1: 76–78.

Echt DS, Liebson PR, Mitchell LB, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo: The Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991; 324: 781–788.

Feng M, Balter JM, Normolle D, et al. Characterization of pancreatic tumor motion using Cine-MRI: surrogates for tumor position should be used with caution. Int J Radiat Oncol Biol Phys. 2009 July 1; 74(3): 884–891.

Katz R. Biomarkers and surrogate markers: an FDA perspective. NeuroRx. 2004 April; 1(2): 189–195.

Lonn E. The use of surrogate endpoints in clinical trials: focus on clinical trials in cardiovascular diseases. Pharmacoepidemiol Drug Safety. 2001; 10: 497–508.

Composite Endpoint

Cordoba G, Schwartz L, Woloshin S, et al. Definition, reporting, and interpretation of composite outcomes in clinical trials: systematic review. BMJ. 2010; 341: c3920.

Kip KE, Hollabaugh K, Marroquin OC, et al. The problem with composite end points in cardiovascular studies: the story of major adverse cardiac events and percutaneous coronary intervention. JACC. 2008; 51(7): 701–707.

Examples of Ancillary Studies

Krishnan JA, Bender BG, Wamboldt FS, et al. Adherence to inhaled corticosteroids: an ancillary study of the Childhood Asthma Management Program clinical trial. J Allergy Clin Immunol. 2012; 129(1): 112–118.

Udelson JE, Pearte CA, Kimmelstiel CD, et al. The Occluded Artery Trial (OAT) Viability Ancillary Study (OAT-NUC): influence of infarct zone viability on left ventricular remodeling after percutaneous coronary intervention versus optimal medical therapy alone. Am Heart J. 2011 Mar; 161(3): 611–621.

Controls, Sham/Placebo

Finniss DG, Kaptchuk TJ, Miller F, Benedetti F. Placebo effects: biological, clinical and ethical advances. Lancet. 2010 February 20; 375(9715): 686–695.

Macklin R. The ethical problems with sham surgery in clinical research. N Engl J Med. 1999 Sep 23; 341(13): 992–996.

Pilot Studies

Lancaster GA, Dodd S, Williamson PR. Design and analysis of pilot studies: recommendations for good practice. J Eval Clin Pract. 2004; 10(2): 307–312.


1. The Belmont Report. Office of the Secretary. Ethical principles and guidelines for the protection of human subjects of research. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. Washington, DC: U.S. Government Printing Office, 1979.

2. Ferguson L. External validity, generalizability, and knowledge utilization. J Nurs Scholarsh. 2004; 36(1): 16–22.

3. CONSORT statement.

4. Echt DS, Liebson PR, Mitchell LB, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo: The Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991; 324: 781–788.

5. Morgan GA, Harmon RJ. Clinician’s guide to research methods and statistics: research question and hypotheses. J Am Acad Child Adolesc Psychiatry. 2000; 39(2): 261–263.

6. Macklin R. The ethical problems with sham surgery in clinical research. N Engl J Med. 1999 Sep 23; 341(13): 992–996.

7. Campbell SF. Science, art and drug discovery: a personal perspective. Clin Sci (Lond). 2000 Oct; 99(4): 255–260.

8. Lonn E. The use of surrogate endpoints in clinical trials: focus on clinical trials in cardiovascular diseases. Pharmacoepidemiol Drug Safety. 2001; 10: 497–508.

9. Kip KE, Hollabaugh K, Marroquin OC, et al. The problem with composite end points in cardiovascular studies: the story of major adverse cardiac events and percutaneous coronary intervention. JACC. 2008; 51(7): 701–707.

10. Katz R. Biomarkers and surrogate markers: an FDA perspective. NeuroRx. 2004 April; 1(2): 189–195.


1 Dr. André Brunoni and Professor Felipe Fregni prepared this case. Course cases are developed solely as the basis for class discussion. The situation in this case is fictional. Cases are not intended to serve as endorsements or sources of primary data. All rights reserved to the authors of this case.

2 Riva JJ, Malik KM, Burnie SJ, Endicott AR, Busse JW. What is your research question? An introduction to the PICOT format for clinicians. J Can Chiropr Assoc. 2012 Sep; 56(3):167–71.

3 These concepts will be discussed in detail in Unit II of this volume.