Most of the time, we can do our job without having to worry about language. Occasionally, however, a client asks us to produce clinical reports in languages such as Chinese, Japanese or Korean. What should we do? The SAS Unicode server helps a lot. In this post, I will walk through several common issues in working with multilingual SAS.

Change LOCALE option to switch language

Messages written to the SAS log, such as notes, warnings and errors, are always displayed in the language that was in effect at SAS start-up. To change the language used for procedure output and user interface elements, set the LOCALE= option. Note that the LOCALELANGCHG option must be turned on for the language switch to take effect. This option is turned on by default; to switch it off, add “-NOLOCALELANGCHG” to the SAS configuration file. The SAS documentation lists the valid values for the LOCALE= option. Also remember that the ENCODING option is set to UTF-8 for the SAS Unicode server. This option has to be specified at start-up every time you run a SAS session; adding it to the autoexec file does not work.
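As a rough illustration of the switch described above, the following sketch changes the locale inside a running session and then prints the relevant settings to the log. The locale value ja_JP is only an example chosen here, and the OPTION=(list) form of PROC OPTIONS assumes SAS 9.3 or later.

```sas
/* Switch the locale for procedure output within a running session. */
/* ja_JP is an example value; zh_CN or ko_KR are set the same way.  */
options locale=ja_JP;

/* ENCODING cannot be changed here: it is a start-up option, set    */
/* for example with -ENCODING UTF-8 at invocation or in the         */
/* configuration file.                                              */

/* Write the current values of both options to the log */
proc options option=(locale encoding);
run;
```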

Import multilingual Excel file into SAS dataset

An Excel file containing double-byte characters (Chinese, Japanese and Korean) cannot be imported into SAS directly with PROC IMPORT; if you try, many characters are replaced with question marks. There are two ways to handle this issue. The first is to reset the regional language of the operating system and reboot the computer. The second is to save the multilingual Excel file as a Unicode text file and then import that text file into SAS running in a Unicode session. Restarting the computer is annoying, so only the second method is discussed in this post.

Of course, we can open the Excel file and select File > Save As to bring up the Save As dialog box. There, select “Unicode Text (*.txt)” for “Save as type” and click Save to convert the Excel file into a text file. However, there is a smarter way: use a VBA macro.

The following figure shows what the worksheet looks like. Suppose an Excel file containing double-byte characters is stored on the D:\ drive and is named sample. If you enter the full path name, “D:\sample.xlsx”, into cell B1 (the cell the macro reads) and then click the xls2txt button, a text file with the same name as the Excel file is created in the same folder (the D:\ drive in our case). The text file is encoded as Unicode.

Here is the code for the xls2txt macro.

Sub xls2txt()
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False

    Dim SrcRg As Range
    Dim CurrRow As Range
    Dim CurrCell As Range
    Dim CurrTextStr As String
    Dim ListSep As String
    Dim OutFile As String
    Dim fnm As String

    'Define workbook variable
    Dim wbk As Workbook

    'Open the workbook whose full path is stored in cell B1 of the first sheet
    fnm = ThisWorkbook.Sheets(1).Cells(1, 2).Value
    Set wbk = Workbooks.Open(fnm)
    wbk.Worksheets(1).Activate

    'Set the pipe separator
    ListSep = "|"
    Set SrcRg = ActiveSheet.UsedRange

    'Build the text file name from the workbook name
    OutFile = Replace(fnm, ".xlsx", ".txt")

    'Create the output file: arguments are file name, overwrite, Unicode
    Dim fs As Object, txtf As Object
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set txtf = fs.CreateTextFile(OutFile, True, True)

    For Each CurrRow In SrcRg.Rows
        CurrTextStr = ""
        For Each CurrCell In CurrRow.Cells
            CurrTextStr = CurrTextStr & CurrCell.Value & ListSep
        Next
        'Strip the trailing separators left by empty cells
        While Right(CurrTextStr, 1) = ListSep
            CurrTextStr = Left(CurrTextStr, Len(CurrTextStr) - 1)
        Wend
        'Put a single | at the end of each line
        CurrTextStr = CurrTextStr & ListSep
        txtf.WriteLine CurrTextStr
    Next
    txtf.Close

    'Close the workbook without saving and restore the application settings
    wbk.Close SaveChanges:=False
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub

If you are familiar with VBA, you can see that a delimiter character “|” is placed between the cells within each row. That is why we need to explicitly specify the delimiter in the INFILE statement when reading the text file into SAS.

/* The TITLES libref must already be assigned */
filename intxt "D:\sample.txt" encoding="utf-16";
data titles.titles;
    infile intxt delimiter='|' missover dsd lrecl=32767 firstobs=2;
    length var1-var16 $600;
    input var1-var16 $;

    /* Remove leading and trailing blanks with the DBCS-aware K functions */
    array arr(*) _character_;
    do kk = 1 to dim(arr);
        arr(kk) = ktrim(kleft(arr(kk)));
    end;
run;
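
Because the import only works in a Unicode session, it may be worth checking the session encoding before running the data step and looking at a few rows afterwards. The lines below are a minimal sketch of such a check, not part of the original program; the &= form of %PUT assumes SAS 9.3 or later.

```sas
/* Write the session encoding to the log; a Unicode session is */
/* expected to report a UTF-8 encoding here.                   */
%put &=sysencoding;

/* Inspect the first rows to confirm that the double-byte text */
/* was read correctly and not replaced with question marks.    */
proc print data=titles.titles(obs=5);
run;
```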

Using DBCS characters (Chinese, Japanese, Korean) in GTL

Graph Template Language (GTL) is increasingly popular for developing figures in the pharmaceutical industry. It is well known that Unicode values can be used in GTL to display special characters such as the degree symbol. If we use the same approach to display DBCS characters, however, they may show up as blank white squares, even when the LOCALE option is set correctly. Why? The likely reason is that the default font of the style does not support double-byte characters. We have to change the font defined in the style template used for ODS RTF output. For example, to draw a Chinese character such as “生” in a figure using GTL, use the Unicode value “751F” when defining the graph template and apply the “SimSun” font in the RTF style template. The Unicode value of a DBCS character can be looked up in a Unicode code chart. The table below summarizes DBCS fonts.

Language              Font Name
Japanese              Msgothic, Msmincho
Simplified Chinese    Simhei, Simsun
Traditional Chinese   Heit, Mingliu
Korean                Gulim, Batang
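
To make the font change concrete, here is a minimal sketch of a style template and a graph template for the “生” example. The names styles.cjk_rtf and cjk_demo, the output file name, the font sizes and the use of sashelp.class are all assumptions made for illustration, and the SimSun font must be installed and visible to SAS for the character to render.

```sas
proc template;
    /* Hypothetical RTF style that swaps the graph fonts for SimSun */
    define style styles.cjk_rtf;
        parent = styles.rtf;
        class GraphFonts /
            'GraphTitleFont' = ("SimSun", 11pt, bold)
            'GraphLabelFont' = ("SimSun", 10pt)
            'GraphValueFont' = ("SimSun", 9pt);
    end;

    /* Hypothetical graph template whose title includes U+751F */
    define statgraph cjk_demo;
        begingraph;
            entrytitle "Unicode test: " {unicode '751F'x};
            layout overlay;
                scatterplot x=height y=weight;
            endlayout;
        endgraph;
    end;
run;

/* Render the graph to RTF with the modified style */
ods graphics on;
ods rtf file="cjk_demo.rtf" style=styles.cjk_rtf;
proc sgrender data=sashelp.class template=cjk_demo;
run;
ods rtf close;
```

Only the title uses a DBCS character here, so changing GraphTitleFont alone would be enough; the label and value fonts are changed as well in case axis labels or tick values also carry double-byte text.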