An Introduction to SDTM Datasets

September 30, 2024

Clinical trials generate large volumes of data, from patient demographics to lab results and adverse events. This data often comes from different sources, making it hard to organize consistently.

That’s where SDTM datasets come in. These are specially formatted data files that follow the Study Data Tabulation Model (SDTM), a set of guidelines designed to standardize clinical trial data.

SDTM Datasets 101

SDTM datasets act as the “universal format” for clinical data, ensuring that all the trial information is structured in a way that regulatory agencies, like the FDA or EMA, can easily understand and review. Each SDTM dataset falls into a specific category called a domain, which groups similar data together.

For example:

DM (Demographics): Basic information about study participants, like age and sex
AE (Adverse Events): Records any adverse effects experienced during the trial
LB (Laboratory Test Results): Contains results from lab tests, like blood work or urine tests

Each domain has specific variables that need to be included and formatted in a particular way. For example, in the DM domain, you’ll always find variables like USUBJID (Unique Subject Identifier) and AGE (Participant Age).
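To make the idea of domains concrete, here is a minimal Python sketch of what a few DM and AE records might look like. The study ID, subject IDs, and values are invented for illustration, and only a small subset of each domain's variables is shown. It also shows how USUBJID ties records in different domains back to the same participant.

```python
# Illustrative records only: identifiers and values are made up,
# and real DM/AE datasets carry many more variables than shown here.
dm = [
    {"STUDYID": "ABC-001", "DOMAIN": "DM", "USUBJID": "ABC-001-0001",
     "AGE": 54, "AGEU": "YEARS", "SEX": "F"},
    {"STUDYID": "ABC-001", "DOMAIN": "DM", "USUBJID": "ABC-001-0002",
     "AGE": 61, "AGEU": "YEARS", "SEX": "M"},
]

ae = [
    {"STUDYID": "ABC-001", "DOMAIN": "AE", "USUBJID": "ABC-001-0001",
     "AESEQ": 1, "AETERM": "Headache"},
]


def events_for(usubjid, ae_records):
    """Return the reported adverse event terms for one subject."""
    return [rec["AETERM"] for rec in ae_records if rec["USUBJID"] == usubjid]


print(events_for("ABC-001-0001", ae))
print(events_for("ABC-001-0002", ae))
```

DM holds one record per subject, while AE holds one record per reported event, so a subject can appear in AE many times or not at all.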

To ensure you’re on the right track when dealing with domains and datasets, consider reviewing a detailed overview of SDTM datasets or other relevant guides. This will give you deeper insights into how these datasets are structured, how to handle different types of clinical data, and the best practices for creating them.

It’s a great first step toward ensuring your datasets are consistent, clear, and ready for regulatory review.

Importance of SDTM Datasets in Clinical Trials

Here’s why these guidelines are important in clinical trials:

Facilitate Regulatory Review

Regulatory agencies have to sift through mountains of data for each clinical trial. When data is presented in a standardized SDTM format, it’s much easier for them to review and interpret.

Ensure Data Consistency and Quality

SDTM provides a clear structure and set of rules for organizing data, which helps maintain high-quality standards. This reduces errors, misinterpretations, and inconsistencies that could potentially compromise the study’s integrity.

Support Data Sharing and Reusability

SDTM datasets aren’t just for regulatory submissions. They’re also great for data sharing within the scientific community. Researchers can reuse SDTM-formatted data for meta-analyses, secondary research, or to develop new hypotheses. Because the format and standards are shared, data from different studies can be combined and compared with far less rework.

How to Create SDTM Datasets

Creating these datasets may feel a bit overwhelming if you’re new to the process. But it doesn’t have to be. Here are the steps you can put into practice to turn your raw clinical trial data into SDTM-compliant datasets:

Understand the Study Protocol and CRFs

Start by reviewing the study protocol, which outlines the trial’s objectives and data collection methods. Next, go through the Case Report Forms (CRFs) to understand the data points gathered during the trial, such as demographics, lab results, and adverse events.

For example, a diabetes study might collect blood glucose levels, patient age, and side effects. These pieces of information will eventually be mapped to SDTM domains like LB (Laboratory Test Results), DM (Demographics), and AE (Adverse Events).

Map Raw Data to SDTM Domains

“Mapping” means assigning each variable from your raw data to the appropriate SDTM domain and variable. For example, data in a column labeled “Gender” would map to the SDTM variable SEX in the DM domain.

Ensure the data uses CDISC controlled terminology. For SEX, that means converting entries like “Male” and “Female” to the submission values “M” and “F.” Use a mapping tool or spreadsheet to keep track of these assignments.
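As a rough illustration, the mapping step can be sketched in a few lines of Python. The raw column names ("Gender", "Age") and the lookup tables below are assumptions made up for this example. The SEX codes follow the CDISC codelist, which uses short submission values such as M, F, and U, but only a subset is shown.

```python
# Hypothetical mapping table: raw column name -> (SDTM domain, SDTM variable).
COLUMN_MAP = {
    "Gender": ("DM", "SEX"),
    "Age": ("DM", "AGE"),
}

# Raw entries -> submission values for SEX (subset, for illustration).
SEX_TERMS = {"male": "M", "m": "M", "female": "F", "f": "F", "unknown": "U"}


def map_sex(raw_value):
    """Convert a raw sex/gender entry to its controlled-terminology code."""
    key = raw_value.strip().lower()
    if key not in SEX_TERMS:
        # Fail loudly so unmapped entries get reviewed rather than guessed.
        raise ValueError(f"No controlled term for SEX value {raw_value!r}")
    return SEX_TERMS[key]


def map_record(raw_record):
    """Group one raw record's values under their SDTM domain and variable."""
    mapped = {}
    for column, value in raw_record.items():
        domain, variable = COLUMN_MAP[column]
        if variable == "SEX":
            value = map_sex(value)
        mapped.setdefault(domain, {})[variable] = value
    return mapped


print(map_record({"Gender": "Female", "Age": 54}))
```

Raising an error on an unrecognized entry is a deliberate choice here: a silent default would hide data problems that someone should look at.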

Create SDTM Specifications

Your SDTM specs act as the blueprint for building your datasets. They outline the dataset names (e.g., DM for Demographics), variable names, data types (text, number), controlled terminology, and any transformations.
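Specs usually live in a spreadsheet, but the same information can be captured in code. The sketch below is one possible shape, not a standard format: the VariableSpec class, its field names, and the source notes are all hypothetical, and only a few DM variables are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    """One row of a (hypothetical) SDTM specification."""
    name: str             # SDTM variable name
    label: str            # descriptive label
    data_type: str        # "text" or "number"
    codelist: tuple = ()  # allowed values; empty means no controlled terminology
    source: str = ""      # raw column or derivation note


# Partial spec for the DM dataset; source notes are invented for illustration.
DM_SPEC = (
    VariableSpec("STUDYID", "Study Identifier", "text",
                 source="constant taken from the protocol"),
    VariableSpec("USUBJID", "Unique Subject Identifier", "text",
                 source="STUDYID + '-' + raw subject_id"),
    VariableSpec("AGE", "Age", "number", source="raw Age"),
    VariableSpec("SEX", "Sex", "text", codelist=("M", "F", "U"),
                 source="raw Gender, recoded"),
)


def variables_with_codelists(spec):
    """Names of the variables that are restricted to a list of allowed values."""
    return [var.name for var in spec if var.codelist]


print(variables_with_codelists(DM_SPEC))
```

Keeping the spec as data means the programs that build and check the datasets can read from a single source instead of repeating variable names in several places.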

Generate SDTM Datasets

Using a programming language like SAS, Python, or R, convert your raw data into SDTM datasets. If coding isn’t your forte, don’t worry. This is often handled by clinical programmers. Verify variable names, data types, and any derived values to ensure they match the SDTM specifications.
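Here is a minimal sketch of what that conversion might look like in Python, using only the standard library. The raw CSV layout, the study ID, and the way USUBJID is built are assumptions for this example. A real study would follow its own specifications and include many more DM variables.

```python
import csv
import io

# Made-up raw export; column names are assumptions for this example.
RAW_CSV = """subject_id,Gender,Age
0001,Female,54
0002,Male,61
"""

STUDYID = "ABC-001"  # hypothetical study identifier
SEX_TERMS = {"female": "F", "male": "M"}


def build_dm(raw_text, studyid):
    """Turn raw demographic rows into (partial) DM records."""
    records = []
    for raw in csv.DictReader(io.StringIO(raw_text)):
        records.append({
            "STUDYID": studyid,
            "DOMAIN": "DM",
            # Assumed derivation: study ID plus the raw subject number.
            "USUBJID": f"{studyid}-{raw['subject_id']}",
            "AGE": int(raw["Age"]),
            "AGEU": "YEARS",
            "SEX": SEX_TERMS[raw["Gender"].strip().lower()],
        })
    return records


dm = build_dm(RAW_CSV, STUDYID)
for record in dm:
    print(record)
```

Each output record carries the identifier variables (STUDYID, DOMAIN, USUBJID) alongside the mapped values, which is what lets it be checked against the specs afterward.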

Validate and QC the Datasets

Validation is key to ensuring datasets comply with CDISC SDTM standards. Use tools like Pinnacle 21 to catch missing variables, incorrect formats, or non-standard terminology. Manual quality checks are equally important. Spot-check records and review derived variables for accuracy.
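Dedicated validators apply a large, maintained rule set, so the Python sketch below is not a substitute for them. It only illustrates the kinds of checks involved: missing identifier values, values outside a codelist, and duplicate subjects in DM. The rule lists are deliberately tiny and are assumptions for this example.

```python
# Small, illustrative rule set; real validation covers far more.
REQUIRED = ("STUDYID", "DOMAIN", "USUBJID")
CODELISTS = {"SEX": {"M", "F", "U"}}


def check_dm_records(records):
    """Return a list of human-readable findings for a DM-like dataset."""
    findings = []
    seen_subjects = set()
    for row_number, record in enumerate(records, start=1):
        # Required identifiers must be present and non-empty.
        for variable in REQUIRED:
            if record.get(variable) in (None, ""):
                findings.append(f"row {row_number}: missing {variable}")
        # Coded variables must use an allowed value.
        for variable, allowed in CODELISTS.items():
            if variable in record and record[variable] not in allowed:
                findings.append(
                    f"row {row_number}: {variable}={record[variable]!r} not in codelist"
                )
        # DM holds one record per subject, so USUBJID should not repeat.
        usubjid = record.get("USUBJID")
        if usubjid:
            if usubjid in seen_subjects:
                findings.append(f"row {row_number}: duplicate USUBJID {usubjid}")
            seen_subjects.add(usubjid)
    return findings


sample = [
    {"STUDYID": "ABC-001", "DOMAIN": "DM", "USUBJID": "ABC-001-0001", "SEX": "F"},
    {"STUDYID": "ABC-001", "DOMAIN": "DM", "USUBJID": "", "SEX": "Female"},
]
for finding in check_dm_records(sample):
    print(finding)
```

A script like this can be handy for quick spot checks during development, with the full validator run reserved for the finished datasets.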

Prepare for Submission

Finally, organize your datasets for submission to regulatory authorities like the FDA or EMA. Include SDTM datasets, the define.xml file, and annotated CRFs. Make sure everything aligns with eCTD (Electronic Common Technical Document) guidelines to meet regulatory requirements.
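A simple pre-submission checklist can also be scripted. The Python sketch below checks a list of file names for the three items mentioned above. The specific names it looks for (datasets ending in .xpt, define.xml, acrf.pdf) reflect common FDA submission conventions but are treated as assumptions here, so confirm them against the current guidance of the agency you are submitting to.

```python
def missing_submission_items(filenames):
    """Return the expected submission items not found in a list of file names."""
    names = {name.lower() for name in filenames}
    missing = []
    # Assumed convention: datasets are delivered as SAS transport (.xpt) files.
    if not any(name.endswith(".xpt") for name in names):
        missing.append("SDTM datasets (.xpt)")
    if "define.xml" not in names:
        missing.append("define.xml")
    # Assumed convention: the annotated CRF is named acrf.pdf.
    if "acrf.pdf" not in names:
        missing.append("annotated CRF (acrf.pdf)")
    return missing


print(missing_submission_items(["dm.xpt", "ae.xpt", "define.xml"]))
```

This only confirms that the files exist. It says nothing about whether their contents are correct or whether the folder layout meets eCTD requirements.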

Closing Thoughts

This wraps up our introduction to SDTM datasets. While creating them might seem complex at first, careful planning, attention to detail, and using the right tools can make it much more manageable.

By following these basic steps, you’ll set a solid foundation for creating submission-ready datasets that meet industry standards and facilitate the clinical trial review process.