Two Common Programming Issues When Pooling Data from Multiple Studies

In the clinical trial industry, we often need to pool data across several trials, combining data from multiple studies into a single dataset on which analyses can then be run. More samples, across a broader range of patients, can increase the power of models that characterize the relationship between drug exposure and efficacy or, more often, safety effects.
If the ADaM structures are the same across the studies to be pooled, we can simply combine the ADaM datasets for analysis. In reality, however, we sometimes have to create new ADaM datasets from combined SDTM datasets. Recently, I had the opportunity to pool SDTM data from 4 studies for a safety report. In this post, I’d like to introduce the two issues I encountered and how to fix them.

How to compile SDTM datasets from different studies together?

Suppose that the librefs for the SDTM datasets from the 4 studies are defined as sdtm1, sdtm2, sdtm3 and sdtm4 respectively. The macro below reads a specific domain, such as DM, from each library and then sets the datasets together. Variables from SUPP domains like SUPPDS or SUPPCM may also be needed to derive ADaM variables, so it is better to transpose each SUPP domain and merge it with its parent domain. The variable names in the transposed SUPP domain come from the values of QNAM in the original SUPP domain. Two example calls follow the macro. Smart as you are, I think you can use a similar approach to write a macro that compiles ADaM datasets together.


%macro getalldata(liblist = sdtm1*sdtm2*sdtm3*sdtm4, domain= , supp= );
  %local i libnam vlist;

  /* Import domain: stack it from every library in which it exists */
  data &domain;
    set
    %do i = 1 %to %sysfunc(countw(&liblist,*));
      %let libnam = %scan(&liblist,&i,*);
      %if %sysfunc(exist(&libnam..&domain)) %then %do; &libnam..&domain %end;
    %end;;
  run;

  /* Import supp-domain, if one was requested */
  %if %length(&supp) %then %do;
    data &supp;
      set
      %do i = 1 %to %sysfunc(countw(&liblist,*));
        %let libnam = %scan(&liblist,&i,*);
        %if %sysfunc(exist(&libnam..&supp)) %then %do; &libnam..&supp %end;
      %end;;
    run;

    proc sort data=&supp;
      by usubjid idvarval;
    run;

    /* One row per parent record, one column per QNAM value */
    proc transpose data=&supp out=&supp._(drop= _name_ _label_);
      by usubjid idvarval;
      id qnam;
      var qval;
    run;

    /* List the variables of the transposed dataset */
    ods output Variables = var;
    proc contents data=&supp._;
    run;
    ods output close;

    /* Build the select list, leaving out the join keys */
    proc sql noprint;
      select "b."||strip(variable) into: vlist separated by ", "
        from var
        where upcase(variable) not in ("USUBJID", "IDVARVAL");
    quit;

    /* Merge back to the parent domain; assumes IDVAR is the --SEQ variable */
    proc sql undo_policy=none;
      create table &domain as select a.*, &vlist
        from &domain as a left join &supp._ as b
        on a.usubjid = b.usubjid and a.&domain.seq = input(b.idvarval,best.);
    quit;
  %end;
%mend;

%getalldata(domain=dm);
%getalldata(domain=dd, supp = suppdd);

After the merge, values of QNAM such as DDAUYP and DDCERTO from the original SUPPDD become new variables in the DD dataset.
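To make the transpose step concrete, here is a minimal sketch on a toy SUPPDD. The subject IDs and QVAL values are made up for illustration, and the output name suppdd_t is hypothetical; only the QNAM names come from the example above.

```sas
/* Toy SUPPDD with made-up values, for illustration only */
data suppdd;
  length usubjid $20 idvarval $8 qnam $8 qval $40;
  infile datalines dlm='|';
  input usubjid idvarval qnam qval;
  datalines;
STUDY1-001|1|DDAUYP|Y
STUDY1-001|1|DDCERTO|PHYSICIAN
STUDY2-014|1|DDAUYP|N
;
run;

proc sort data=suppdd;
  by usubjid idvarval;
run;

/* Each distinct QNAM value becomes a column holding the matching QVAL */
proc transpose data=suppdd out=suppdd_t(drop=_name_);
  by usubjid idvarval;
  id qnam;
  var qval;
run;
```

The result should have one row per USUBJID and IDVARVAL, with DDAUYP and DDCERTO as character columns. The second subject has no DDCERTO record, so that column is blank for that row.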

What should we do when lengths of a variable are different across different studies?

When the lengths of a variable are not the same across the studies being pooled, SAS usually issues a warning such as “Multiple lengths were specified for the variable XXX by input data set(s). This can cause truncation of data.” This is a common issue when doing integration. Since SAS takes the length of a variable from the first dataset in which it appears, we can put the dataset with the longest length first. Alternatively, we can specify the length before any data is read in. Whichever approach you choose, you need to know which variables have this issue and their lengths in each of the pooled studies.


%macro chk_len(liblist = );
  %local i libnam n;
  %let n = %sysfunc(countw(&liblist,*));

  /* Collect the length of every variable in every dataset of each library */
  %do i = 1 %to &n;
    %let libnam = %scan(&liblist,&i,*);
    proc contents data=&libnam.._all_ noprint
      out = dat&libnam.(keep=memname name length rename=(length=len&i.));
    run;
    proc sort data=dat&libnam.; by memname name; run;
  %end;

  /* One row per dataset and variable, with the lengths from all studies side by side */
  data final;
    merge
    %do i = 1 %to &n;
      %let libnam = %scan(&liblist,&i,*);
      dat&libnam.
    %end;;
    by memname name;
  run;

  /* Keep rows whose lengths are not all equal. A variable absent from a study */
  /* has a missing length there, so it is flagged as well.                     */
  data chk_len;
    set final;
    if not (
    %do i = 1 %to &n;
      len&i.
      %if &i ne &n %then = ;
    %end;);
  run;
%mend;

%chk_len(liblist = sdtm1*sdtm2*sdtm3*sdtm4);

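The final chk_len dataset contains one row for each dataset and variable (memname, name) whose lengths are not identical across the studies, with the length from each study held in len1 through len4.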
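Once chk_len has identified an affected variable, the second approach (specifying the length before any data is read) could be sketched as follows. This is an illustration under assumed values, not output from the real studies: it supposes that AETERM was reported with length 100 in sdtm1 and 200 in sdtm2.

```sas
/* Hypothetical case: AETERM is $100 in sdtm1.ae and $200 in sdtm2.ae */
data ae;
  /* Declare the longest length before SET so no value is truncated */
  length aeterm $200;
  set sdtm1.ae sdtm2.ae;
run;
```

One side effect to be aware of: a variable declared in a LENGTH statement ahead of SET moves to the front of the variable order in the output dataset.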