Learn Before
Test-Retest Reliability
Test-retest reliability evaluates the extent to which a psychological measure yields consistent scores when it is administered to the same individuals at different times. This form of reliability is crucial for constructs that are theoretically expected to remain stable, such as intelligence or self-esteem. It is not an appropriate standard, however, for constructs that naturally vary over time, such as mood or immediate stress levels.
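A common way to quantify test-retest reliability is to give the same measure to the same people on two occasions and compute the Pearson correlation (r) between the two sets of scores. The Python sketch below illustrates that calculation under stated assumptions: the function names `pearson_r` and `assess_test_retest`, the five participants' self-esteem scores, the two-week interval, and the +.80 cutoff are all hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of each person's deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores at one time point have no variance")
    return cov / sqrt(ss_x * ss_y)


def assess_test_retest(
    time1: list[float], time2: list[float], threshold: float = 0.80
) -> tuple[float, bool]:
    """Correlate the same participants' scores from two administrations.

    time1[i] and time2[i] must belong to the same person.
    threshold is a hypothetical cutoff (+.80 is a commonly cited rule of
    thumb, treated here as an assumption, not a fixed standard).
    """
    r = pearson_r(time1, time2)
    return r, r >= threshold


if __name__ == "__main__":
    # Hypothetical self-esteem scores for five people, tested twice
    # (assumed two weeks apart); these numbers are illustrative only.
    week0 = [22, 25, 18, 30, 27]
    week2 = [23, 24, 19, 29, 28]
    r, acceptable = assess_test_retest(week0, week2)
    print(f"r = {r:.2f}, acceptable = {acceptable}")  # prints r = 0.98, acceptable = True
```

A high r is only meaningful evidence for a construct expected to be stable, such as self-esteem. Applied to a construct that fluctuates, such as mood, a low r would not by itself show that the measure is flawed.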
Tags
Ch.2 Psychological Research - Psychology @ OpenStax
Psychology @ OpenStax
Introduction to Psychology @ OpenStax Course
OpenStax
OpenStax Psychology (2nd ed.) Textbook
Psychology
Social Science
Empirical Science
Science
KPU
Research Methods in Psychology - 4th American Edition @ KPU
Related
Inter-rater Reliability
Test-Retest Reliability
Internal Consistency
Match each type of measurement reliability with the aspect of consistency it evaluates.
A researcher is developing a new 15-item scale to measure 'subjective well-being.' To evaluate the measure, the researcher checks whether participants who agree with one item (e.g., 'I am happy with my life') also tend to agree with other items on the same scale (e.g., 'My life is close to my ideal'). Which type of reliability is the researcher primarily assessing?
A researcher is developing a new coding system to measure 'prosocial behavior' in toddlers by watching video recordings of their play. To ensure the measure is consistent, she has two different research assistants code the same set of videos. If their observations are highly similar, the researcher has established high test-retest reliability.
A psychologist developing a new behavioral observation tool for 'classroom aggression' finds that her three research assistants all report the same number of aggressive acts for each child. However, when the same children are observed again under identical conditions one week later, their aggression scores have changed dramatically. This pattern suggests the tool has high inter-rater reliability but low ________ reliability.
A psychologist is evaluating the overall reliability of a new behavioral observation scale designed to measure a stable personality trait. Rank the following reliability profiles from the one that provides the 'strongest' evidence of a scientifically sound measure to the one that provides the 'weakest' evidence.
You are tasked with creating a validation protocol for a new psychological instrument that measures 'Academic Persistence' using a combination of a -item questionnaire and a timed behavioral task. To ensure your design accounts for consistency over time, consistency within the questionnaire items, and consistency between different researchers, which of the following sets of procedures must you synthesize into your research plan?
In psychological research, the consistency of a measure's results across different researchers or observers is referred to as test-retest reliability.
Criterion Validity
Assessing Test-Retest Reliability
Evaluating Measurement Failure
Even if a psychological measurement tool has been shown to be reliable and valid in previous studies, researchers must still evaluate its reliability and validity when used with a new sample of participants.
A researcher uses a well-established personality scale that has demonstrated high reliability in dozens of previous studies. Which of the following best explains why the researcher must still evaluate the scale's reliability using the scores from their own current participants?
A researcher is investigating the relationship between social media usage and self-esteem in high school students. After selecting a validated self-esteem scale, in what order should the researcher perform the following steps to evaluate their measure according to the standard research process?
A researcher is using an established personality inventory () to study a unique group of deep-sea explorers. Match each step of the measurement evaluation process to its primary analytical purpose based on the principles of psychological research.
Regardless of a researcher's expectations or the previous track record of a tool, the process of evaluating a measure in a new study generates new evidence regarding which of the following?
Match each aspect of evaluating a psychological measure to the statement that best explains its role in a new research study.
A researcher's decision to skip reliability and validity testing based on a tool's 'strong track record' is considered a failure of scientific rigor because researchers are required to generate and document new _____ regarding the tool's psychometric properties for every new sample and set of conditions.
Dr. Reyes has published five studies using a validated social anxiety scale exclusively with college student samples. Her colleague, Dr. Park, is now administering the identical scale to a sample of military veterans and plans to skip the psychometric evaluation step because the scale 'already has a proven track record.' Dr. Park's decision to omit the reliability and validity evaluation for this new sample is scientifically justified.
After collecting scores from a new administration of a standardized depression measure, a researcher systematically examines both the consistency of scores across scale items and the degree to which those scores correspond with an independent clinical diagnosis. This two-part evaluation addresses _____ and validity as the core psychometric properties that must be confirmed for each new sample and set of testing conditions.
A graduate researcher has just finished administering a psychological measure of academic motivation to a new sample of first-generation college students. She must now evaluate the measure's psychometric properties. Arrange the following activities in the most defensible scientific order, from the most foundational step (what must be done first) to the most dependent step (what can only be completed meaningfully after all prior steps).
According to the principles of evaluating a psychological measure, what two psychometric properties must a researcher thoroughly evaluate after administering a tool and collecting scores? What should be done with the resulting evidence regardless of prior expectations?
Explain why Dr. Alvarez's decision to skip evaluating the measure is incorrect. What must she verify about the scale, and what is the broader scientific value of conducting this evaluation?
A research team administers an established anxiety scale to a group of elderly residents in a care facility. Even though the scale has been validated in previous studies, apply the principles of measurement evaluation to explain what the team should do with their collected scores before conducting further analysis, and why this is necessary.
Learn After
Test-Retest Correlation
Assessing Test-Retest Reliability
Example of Reliability Without Validity
Test-retest reliability is considered an appropriate standard of consistency for which type of psychological construct?
A psychological measure designed to assess immediate stress levels must demonstrate high test-retest reliability to be considered a useful instrument.
A researcher is deciding whether test-retest reliability is an appropriate metric to evaluate the consistency of several different psychological measures. Match each construct with the correct rationale for using (or not using) this form of reliability.
A research team is reviewing the quality of several new psychological instruments. Rank the following scientific evaluations from the least appropriate application of test-retest reliability to the most appropriate application based on the nature of the constructs and the evidence provided.
Which procedure is used to assess the test-retest reliability of a psychological measure?
Match each research scenario with the correct interpretation of its test-retest reliability based on the nature of the construct being measured.
A researcher finds that a measure of 'General Intelligence' and a measure of 'Immediate Mood' both yield a test-retest correlation of . Upon analysis, the researcher concludes that the 'Immediate Mood' measure may be functioning correctly, but the 'General Intelligence' measure is severely flawed because intelligence is theoretically a(n) _____ construct.