SAS Cohort Analysis Overview - Locus IT Services

SAS Cohort Analysis Overview

SAS Cohort Analysis involves pairs of cases in which outcomes of interest (e.g., mortality, morbidity, or injury) among persons exposed to a risk factor are compared with those among persons not exposed.

SAS Cohort Analysis has been used in traffic-crash research, for example, to compare the injury risk of drivers and passengers in the same crash or vehicle.

SAS Cohort Analysis helps reduce the bias and confounding that can arise when estimating risk or hazard ratios. Only pairs in which at least one member has the outcome of interest are used, which is useful when little or no information is available about pairs in which neither member had the outcome.
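To make the pair-based estimate concrete, here is a minimal sketch, written in Python rather than SAS purely for illustration. It uses one common matched-pair estimator, the Mantel-Haenszel-type risk ratio (a + b) / (a + c), where a counts pairs in which both members had the outcome, b pairs in which only the exposed member did, and c pairs in which only the unexposed member did. The choice of estimator, the function name, and the sample counts are assumptions made for this example and are not taken from the text above.

```python
from collections import Counter

def matched_pair_risk_ratio(pairs):
    """Estimate a risk ratio from (exposed_outcome, unexposed_outcome) pairs.

    Each pair is a tuple of booleans: whether the exposed member and the
    unexposed member of the pair had the outcome (e.g., death).
    """
    counts = Counter(pairs)
    both = counts[(True, True)]             # a: both members had the outcome
    exposed_only = counts[(True, False)]    # b: only the exposed member
    unexposed_only = counts[(False, True)]  # c: only the unexposed member
    # (False, False) pairs add nothing to the numerator or the denominator.
    denominator = both + unexposed_only
    if denominator == 0:
        raise ValueError("no pairs in which the unexposed member had the outcome")
    return (both + exposed_only) / denominator

# Hypothetical counts: 10 pairs where both died, 20 where only the exposed
# member died, 40 where only the unexposed member died, 500 where neither died.
example_pairs = (
    [(True, True)] * 10
    + [(True, False)] * 20
    + [(False, True)] * 40
    + [(False, False)] * 500
)
risk_ratio = matched_pair_risk_ratio(example_pairs)
print(risk_ratio)  # 0.6
```

Dropping the 500 pairs in which neither member died leaves the estimate unchanged, which is why such pairs do not need to be observed.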

SAS Cohort Analysis Features

  • Matching has been used in epidemiological studies as a means of reducing confounding and increasing efficiency in cohort studies. In traffic crash research, matching is used to estimate the association between an exposure, such as seat belt use, and an outcome, such as death.
  • SAS Cohort Analysis differs from matched-pair case-control studies of fatal crashes in that the data come from pairs experiencing the same outcome of interest (e.g., death). Case-control studies tend to overstate associations when the outcome is common.
  • Matching persons in the same vehicle (or vehicles involved in the same crash) gives research investigators the ability to control for variables that may be too costly or impossible to measure.
  • Matching on persons in the same vehicle can also control for the potential confounding effects of other vehicle-related factors in a crash, such as speed, vehicle make, and whether the vehicle rolled over.
  • Matching even controls for variables that may have been overlooked but are specific to a vehicle or crash and common to individuals in the same vehicle.
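As an illustration of matching within a vehicle, the sketch below (in Python rather than SAS) groups occupant-level records by vehicle and keeps only vehicles that contain exactly one belted and one unbelted occupant. The record layout, the field names, and the restriction to one occupant of each kind are assumptions made for this example, not a description of any particular crash data set.

```python
from collections import defaultdict

# Hypothetical occupant-level records: (vehicle_id, belted, died).
occupants = [
    ("V1", True, False),
    ("V1", False, True),
    ("V2", True, True),
    ("V2", False, True),
    ("V3", True, False),
    ("V3", True, False),   # both belted: no exposure contrast within the vehicle
    ("V4", False, False),  # single occupant: cannot form a pair
]

def build_matched_pairs(records):
    """Return {vehicle_id: (belted_died, unbelted_died)} for usable vehicles."""
    by_vehicle = defaultdict(list)
    for vehicle_id, belted, died in records:
        by_vehicle[vehicle_id].append((belted, died))

    pairs = {}
    for vehicle_id, people in by_vehicle.items():
        belted = [died for is_belted, died in people if is_belted]
        unbelted = [died for is_belted, died in people if not is_belted]
        # Keep vehicles with exactly one belted and one unbelted occupant, so
        # both members of a pair share speed, make, rollover, and so forth.
        if len(belted) == 1 and len(unbelted) == 1:
            pairs[vehicle_id] = (belted[0], unbelted[0])
    return pairs

pairs = build_matched_pairs(occupants)
print(pairs)  # {'V1': (False, True), 'V2': (True, True)}
```

Because both members of each retained pair were in the same vehicle, any crash-level factor, measured or not, is identical within the pair.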

Creating Multiple Cohorts Using SAS

  • The challenge of creating multiple cohorts of people within a data set, based on one or more common characteristics, is driven by the nature of the data that is collected and how it is structured.
  • A major benefit of establishing such a data set is the ease and flexibility with which you can statistically examine differences among respondents using various indicator variables, such as test dates, particularly if there is consistency to the data layout over time. If not, the data analyst may need to spend additional time to create such a layout.
  • A standardized test may have several administrations during the course of a calendar year. The SAS frequency procedure, PROC FREQ, could easily identify patterns in the data. However, while some candidates will take a test once, others may take the same test multiple times.
  • Each time a candidate takes a test, a record is appended to a database with the score, along with an indicator of whether or not the respondent is a repeat test taker.
  • An in-depth cohort analysis would focus on the patterns of testing dates, including test-retest behavior, as the time between testing dates may impact performance.
  • There are two important issues to address at the outset. First, multiple years of data may be needed to ensure adequate sample sizes for certain analyses.
  • Second, the combination of administration date and repeat test-taking behavior can produce a large series of frequency distributions that would need to be organized and interpreted, which may not be the most efficient way to start the cohort analysis.
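The kind of tabulation described above can be sketched as follows, in Python rather than with PROC FREQ. The record layout, the candidate identifiers, the administration labels, and the reading of 'N' as a first attempt and 'Y' as a repeat attempt are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical test records: (candidate_id, administration, repeater_flag).
# 'N' is assumed to mark a first attempt and 'Y' a repeat attempt.
records = [
    ("C1", "2019-03", "N"),
    ("C2", "2019-03", "N"),
    ("C1", "2019-06", "Y"),
    ("C3", "2019-06", "N"),
    ("C1", "2019-09", "Y"),
    ("C2", "2019-09", "Y"),
]

# Cross-tabulate administration by repeater status (one cell per combination).
crosstab = Counter((admin, flag) for _, admin, flag in records)

# Count how often each candidate tested, then summarize those counts.
attempts_per_candidate = Counter(candidate for candidate, _, _ in records)
attempt_distribution = Counter(attempts_per_candidate.values())

for (admin, flag), n in sorted(crosstab.items()):
    print(admin, flag, n)
print(dict(attempt_distribution))
```

Even with three administrations and two status values, the cross-tabulation already has up to six cells, which hints at how quickly the number of frequency tables grows across several years of data.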

Identification of Repeat Test-Taking Behavior Profiles

  • The data are arranged so that, for each unique respondent in the original file, a set of key variables grouped by administration date is displayed. The key fields will be blank (for repeat test-taking status) or missing for any administration date on which the respondent did not take the test.
  • The creation of the repeater profile is done in two steps. First, given that the repeater status variable is text and not numeric, concatenation of these letters by administration is required. The width of the resulting variable, referred to as “string”, is equal to the number of administrations in the original data file.
  • The second step is to use the FIND function to identify the column positions of four key test-taking dates, expressed as numeric values from zero to the number of administrations and stored as new variables: the position of the ‘N’ (if it exists), the position of the first ‘Y’, which marks the first repeat, and the positions of the second and third ‘Y’, if any or all exist in the data file.
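The two steps above can be sketched as follows, in Python rather than SAS. The helper names, the use of a blank for an administration not taken, and the sample respondent are assumptions made for this example. Positions are 1-based and 0 means "not found", which matches the zero-to-number-of-administrations range described above and the way the SAS FIND function reports a missing substring.

```python
def build_profile(flags_by_administration):
    """Concatenate one flag per administration into a fixed-width string.

    Anything other than 'N' or 'Y' becomes a blank, standing for an
    administration the respondent did not take.
    """
    return "".join(
        flag if flag in ("N", "Y") else " " for flag in flags_by_administration
    )

def find_nth(string, letter, n):
    """Return the 1-based position of the n-th `letter`, or 0 if absent."""
    position = 0
    for _ in range(n):
        index = string.find(letter, position)  # str.find returns -1 if absent
        if index == -1:
            return 0
        position = index + 1
    return position

def profile_positions(string):
    """Locate the first 'N' and the first three 'Y' flags in the profile."""
    return {
        "first_n": find_nth(string, "N", 1),
        "first_y": find_nth(string, "Y", 1),
        "second_y": find_nth(string, "Y", 2),
        "third_y": find_nth(string, "Y", 3),
    }

# Hypothetical respondent across five administrations: first attempt at the
# second administration, repeats at the fourth and fifth.
string = build_profile(["", "N", "", "Y", "Y"])
positions = profile_positions(string)
print(repr(string))  # ' N YY'
print(positions)     # {'first_n': 2, 'first_y': 4, 'second_y': 5, 'third_y': 0}
```

Keeping the blanks in place is what makes the width of the string equal to the number of administrations, so each position can be read directly as an administration number.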

SAS Cohort Analysis is a simple yet effective way to understand the performance of your acquisition efforts and marketing retention. We at Locus IT provide SAS Cohort Analysis training, support, services, and staffing. For more information, please contact us.
