SAS ACT Users Group l SAUSAG
  Chair's Message  |  Presentations  |  Registration | Tips

Technical Tips
Here is a collection of miscellaneous tips.  Most were included in email notices advertising SAUSAG, but there are others too.

If you'd like to offer a tip to share with other users, send it to your favourite committee member on the home page.

Displaying Formats from the SAS Explorer?!
(details)

Display the details of a format by just clicking on it's catalog entry in the SAS Explorer window.

 

What Program is Running?!
(details)

It can be useful for your SAS program to know its own name and location.

 

The LENGTHs Lazy Programmers Go To
(details)

SAS is great for lazy programmers. You can write minimal code but do you really know what SAS is doing for you? 

 

Squeezing more from the COMPRESS function
(details)

The COMPRESS function in SAS9 has been enhanced to squeeze much more than the blanks out of a character string. 

 

Resetting the Date in a SAS Job or Session
(details)

Getting the real date and time in your report titles - instead of when the SAS session/job started.

 

Trap: Searching Strings with the INDEX function (details)

The familiar INDEX function can trip you up if the sub-string you're after is in a variable.

 

Nested Formats  (details)

Nested formats can eliminate duplicate look-up codes and provide easy flexiblity.

 

A Financial Year ‘Format’ – using the INTNX function (details)

There's a lot more to the INTNX function than most people realise - specify your own date/time periods and intervals, financial years and future SAUSAG dates.

PUT it _ALL_ out to the Log - in style  (details)

Here's an easy way to make that PUT _ALL_ a lot more readable.

 

Using variable lists (details)

Short cuts for referencing lots of variables. 

 

Tricks with the colon ':' operator modifier  (details)

Using a colon (:) after a comparison operator can make for simpler coding.

 

Creating macro variables using Proc SQL (details)

Its not only possible to assign simple values to macro variables from within an SQL select statement, but you can do some neat tricks on the way, and very easily.

 

LENGTHy Differences (details)

How long is a piece of string?  The length functions in SAS may not do what you expect.

 

Aligning PUT function results (details)

One of the many small enhancements with V8, introduced the ability to align the results of the PUT function with the  -l, -c, -r options.

 

Using the TYPES statement in Proc Summary (details)

The introduction of the TYPES statement to Proc Summary (and Means) is more than a mere nicety, it can be a life saver with big data sets. 

 

No more macro variable litter (details)

Using macro variables extensively can get messy, and start to chew up memory.  In 8.2 CALL SYMDEL enables macro variables to be deleted.

 

Clearing Variable LABELs and FORMATs (details)

Quickly clear variable labels and formats. 

 

Using TAGSORT with PROC SORT (details)
 

Sorting large datasets can be faster using the TAGSORT. 

 

   
 
 

Displaying Formats from the SAS Explorer

Displaying the contents of a user-defined format hasn’t been particularly easy.  You have to submit this to get the details printed in the Output window:

proc format lib=library.formats fmtlib ;

  select $fctr1l;

run;

However, if you're using Windows, its possible to define your own Actions (i.e. the available actions when you right-mouse click on something) for SAS Explorer objects.  One obvious application of this is to allow us to click on a format Catalog entry type to display the format’s details. 

The hard way is to do this manually but it's rather long-winded and impractical if you want a lot of SAS users to make the same change: make the Explorer window active; click on Tools  -->  Options… --> Explorer…  Then select Catalog Entries from the drop-down list at the top of the dialog box.  A list of catalog entry types is displayed, from where the current actions can be altered.

Here is the simple way: 

1. Copy and paste the following SAS Registry updates to a text file, e.g. c:\temp\regUpdate.txt. 

#--- Update registry with extra Explorer actions 

[CORE\EXPLORER\MENUS\ENTRIES\FORMAT]

"1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select %b;run;"""

"@"="gsubmit ""proc format lib=%b.%b fmtlib; select %b;run;"""

[CORE\EXPLORER\MENUS\ENTRIES\FORMATC]

"1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select $%b;run;"""

"@"="gsubmit ""proc format lib=%b.%b fmtlib; select $%b;run;"""

[CORE\EXPLORER\MENUS\ENTRIES\INFMT]

"1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select @%b;run;"""

"@"="gsubmit ""proc format lib=%b.%b fmtlib; select @%b;run;"""

[CORE\EXPLORER\MENUS\ENTRIES\INFMTC]

"1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select @$%b;run;"""

"@"="gsubmit ""proc format lib=%b.%b fmtlib; select @$%b;run;"""

 

2. Then submit this:

/* Change the file if necessary */

filename regin 'c:\temp\regUpdate.txt';

proc registry import=regin;

run;

 

After this, you can just double click on a format entry type (or right-click and select List)  to run the associated code, printing the details in the Output window. 

Now isn’t that easier?!

.
 

 
 

What Program is Running?!

You may think you know but, it's sometimes useful for a program to know what it's name is.

This can be handy if you have some generic code that issues messages or emails from a batch program. Or perhaps you want to check the full path of the program to check the version that's running.  Or maybe you want the program to behave differently if its running from a Test environment instead of Prod.

In batch, the following statement will retrieve the current program:

%put This program is %sysfunc(getoption(sysin));

In an interactive session, running SAS9, the environment variables SAS_EXECFILENAME and SAS_EXECFILEPATH can be used in a similar way to return the current program in the Enhanced Editor:

%put %sysget(SAS_EXECFILENAME);

The SAS_EXECFILEPATH variable returns the full path including the file name itself.

 

 

   
 
 

The LENGTHs Lazy Programmers Go To

SAS is great for lazy programmers. You can write minimal code and SAS will fill in the gaps and make all kinds of decisions for you. But it pays to know what SAS is doing for you.

For example, if you can't be bothered with LENGTH statements, you should know how long SAS will make the newText variable below:

data strings;
text = 'MICKEY MOUSE';
newText = 'MINI ' || scan(text,2);
run;


What is the length of newText:

a) 10   b) 17   c) 200   d) 205   e) 400 ?

Lazy programming can cause unexpected results. It pays to be explicit.

This example is based on a question from a SAS Institute Training Assessment.
 

 

 
 

Resetting the Date in a SAS Job or Session

 

By default, the date and time SAS associates with a job, and uses in the titles of the SAS log and the listing file, is the date and time when the job, or your SAS session, was initiated.

 

For example, if you start a SAS session at 9.00am, by default, all output sent to the Output window will have a time of 9.00 in the title, however long your SAS session.

A new SAS system option, DTRESET, is available in V9. With this option in force, the time displayed in SAS titles is the time at which they were actually written. This can be useful as an easy way to spot which steps in a job are taking the time.

Whether you are using DTRESET or the default of NODTRESET, the current system date and time is always available to your programs using the functions TODAY(), TIME() and DATETIME().

 

 
 

Squeezing more from the COMPRESS function

 

 The COMPRESS function is used to remove specified characters from a string.  The first argument is searched for any characters specified in the second argument.  For example:

 

    random_chars = '0a1b2c3d4';

    no_digits = compress(random_chars, '0123456789');

 

will result in no_digits being equal to: abcd.

 

If no second argument is used, then blanks are removed from the string.

 

In SAS 9, a third argument has been added to specify a character class (such as all characters, or all digits) along with modifiers that further alter the behaviour of the function.

 

For example, the example above could be re-written as:

 

    no_digits = compress(random_chars, , 'd');

 

The second argument is not specified, but marked by an extra comma.  The third argument, a 'd', represents all digits so the result is just the same.  All alphabetic characters could be removed by simply using the 'a' character class.

 

The 'k' modifier, in conjunction with the 'd', reverses the effect by compressing all characters except digits e.g.

 

    no_digits = compress(random_chars, , 'kd');

 

See the SAS Samples for more on this new feature.

 

 

 
 

Trap: Searching Strings with the INDEX function

 

Most SAS users would be familiar with the INDEX function which allows you to search a character variable for a given sub-string.

 

It's not without its dangers though, if the sub-string you're after is in a variable.

 

The INDEX function in the code below should return the starting position of the string 'and' within theQuestion variable. 

 

data _null_;

   length theQuestion theSearch $100;

   theQuestion = 'What is the answer to Life, the Universe and Everything';

   theSearch = 'and';

   theAnswer = index(theQuestion, theSearch);

   put theAnswer=;

run;

 

But it doesn't.

   theAnswer=0

 

The trap is theSearch variable is 100 long and so has lots of trailing blanks, which the INDEX function faithfully includes in the search.  So use the TRIM function to remove the trailing spaces and you'll get what you're really looking for.

 

data _null_;

   length theQuestion theSearch $100;

   theQuestion = 'What is the answer to Life, the Universe and Everything';

   theSearch = 'and';

   theAnswer = index(theQuestion, trim(theSearch));

   put theAnswer=;

run;

   theAnswer=42

As is so often the case, the problem lies in knowing what you're looking for!

 

 
 

Nested Formats

 

A feature introduced in V8 was the ability to nest formats.  Its not something you'd want to do every day but it can be really useful.

 

Here is an example where the use of nested formats avoids having multiple copies of codes or look up values.

 

In this instance a personnel data set has one record for each employee.  The variable empgrp identifies the type of employment group they belong to - the values and their meaning can be seen in the empgrp. format.

 

Personnel with an empgrp of 1 - 8 are counted as a RESOURCE but only 1 - 5 are counted as STAFF.

 

proc format;

   value $empgrp

   '1'='Perm FT'

   '2'='Temp FT'

   '3'='Temp PT'

   '4'='Perm PT'

   '5'='Casual'

   '6'='Consultant'

   '7'='Contractor'

   '8'='External'

   other = 'Unknown'

   ;

 

   /* Informat to return a 1 to identify employees as STAFF */

   invalue staff

   '1','2','3','4','5' = 1

   other = 0

   ;

 

   /* Informat to return a 1 to identify employees as a RESOURCE */

   invalue resrce

   '6','7','8' = 1

   other = [staff.] /* THE NESTED FORMAT - if not 6,7,8 then use STAFF. */

   ;

 

data personnel_counts;

   set hr.personnel;

   /* Create indicator/count variables for every personnel record

      to show if they are counted as STAFF and/or as a RESOURCE. */

   staff = input(empgrp, staff.);

   resource = input(empgrp, resrce.);

run;

 

Where a record has a value of '6' for empgrp RESOURCE will be set to 1 but STAFF to 0. 

Where a record has a value of '1' for empgrp RESOURCE will be set to 1 and STAFF to 1. 

Where a record has a value of '9' for empgrp RESOURCE will be set to 0 and STAFF to 0. 

 

The beauty of nested formats in this case, is that it enables code values to be defined once rather than be repeated in multiple formats. 

 

Note that nesting more than 2 or 3 deep can impact performance - even if you can keep track of the logic.

 

 
 

A Financial Year ‘Format’ – using the INTNX function

 

Well, it’s not quite a format, but achieves the same result…

 

Using a largely unknown facility in the INTNX function, finding the financial year of a particular date is easy.  There is no need to write our own formats to do it.

 

The INTNX function enables a given SAS date to be moved forwards or backwards a given number of periods.  For example,  to return the start of next month:

 

     intnx('month', today(), 1)

 

The ‘month’ period is used and the date, returned from the today() function, is advanced 1 period – to the start of the specified period.  If the period was ‘year’ then the 1st of January next year would be returned.

 

There are more subtleties to the INTNX function than most people realise though.

 

The period can be modified by specifying a multiple of the period, and a starting point for the period.  So, to get the start of the current financial year:

 

     finYearStart = intnx('year1.7', today(), 0);

 

The 'year1.7'  means periods of 1 year (the default), but starting from the 7th month (the unit for shifting the starting point depends on the specified period).  The 0 indicates the start of the current period.

 

If you want the financial year in a more descriptive form…

 

     date = '20may04'd;

     finYear = put(intnx('year1.7',date,0), year4.) || '/' ||

               put(intnx('year1.7',date,0,'E'),year2.);

 

results in:  2003/04.

 

 

The ‘E’ parameter (optional) indicates the end of the period, instead of the beginning (‘B’ – the default, or ‘M’ the middle).

 

So you can make sure you don’t miss any SAUSAG meetings, here is another example.  SAUSAG meetings are on the 3rd Thursday of the month, every 3 months, starting in February.

 

     /* Find the SAUSAG meeting dates for 2004 */

 

     data _null_;

       yr = '01Jan2004'd;

       do i=1 to 4;

          meeting = intnx('week1.5', intnx('month3.2',yr, i), 3);

          put   meeting weekdatx25.;

       end;

     run;

 

    Thursday, 19 Feb 2004

    Thursday, 20 May 2004

    Thursday, 19 Aug 2004

    Thursday, 18 Nov 2004

 

 
  Aligning PUT function results

Using the PUT function to convert numeric values to character leaves the result right aligned. For example:

   string = ‘{‘ || put(someNumber, 6.) || ‘}’; 

results, if someNumber is 12, in the variable string having a value of: {    12}.

The PUT function in Version 8 allows us to use the alignment specifications (-L, -C, -R) previously available only in the PUT statement.

To centre the number we can now write:

   string = ‘{‘ || put(someNumber, 6.-C) || ‘}’; 

and gives a result of: {  12  }.

 

 
  Deleting Macro Variables

Good house keeping demands that we throw away things when we no longer need them, such as data sets and variables.  They clutter up our applications and can impact performance.  Macro variables are no exception.

In 8.2 CALL SYMDEL allows us to delete macro variables from the macro global symbol table:

   %let mvar998 = Another macro variable;

   %let mvar999 = Yet another macro variable;

 

   data _null_;

      mvarName = 'mvar998';

      call symdel(mvarName, 'nowarn');

      call symdel('mvar999');

   run;

The nowarn option suppresses a warning message if the macro variable doesn't exist.

There are no excuses for littering your applications with old macro variables now.

 

 
 

PUT it _ALL_ out to the Log - in style

 

It's often useful to print to the Log all the values of variables in the current Data step.  This is usually done with:


         put _all_

 
The problem with this is that the variables and their values are compressed onto as few lines as possible, making them hard to read.  The PUT statement also prints out automatic SAS variables too - often useful but not always wanted.
 

So try:

 

         put (_all_)  (= /);
 
The parentheses turn _all_ from a keyword into a variable list, and a variable list can have an associated format list.  In this case, the = format modifier is used to make the value appear with its variable name, and the  /  throws a new line for each variable.
 
So for example:
 

data _null_ ;

  input a b c $;

  put _all_ ;

  put (_all_) (= /);

  put (_all_) (+0);

cards;

50 99 string

;

run;

 
results in: 

a=50 b=99 c=string _ERROR_=0 _N_=1

 

a=50

b=99

c=string

 

50 99 string

 
Note the third PUT statement has a "do nothing" pointer control, and prints just the variable values.
 
There are many useful possibilities here especially if you also want to treat the _character_ and _numeric_ keywords as variable lists in the same way as _all_.  For example, you could give all the character variables a common format.
 
 

Using Variable Lists

 

If you are handling SAS datasets with many variables, it can be very tedious, and error prone, to have to list them all individually.  There are a number of short cuts available in SAS though to make your coding a little easier.

 

A variable list is a way to abbreviate references to SAS variables.  There are a few different flavours.

 

A numbered range list is familiar to most SAS programmers.  In this case a series of variables have the same name except for a numeric suffix, which must be consecutive numbers.  For example:

 

   length qtr1 qtr2 qtr3 qtr4  8;

   length qtr1-qtr4  8;

 

are both equivalent.

 

A name range list relies on the position of the variables in the program data vector  (the order in which they defined when the Data step was compiled).

 

For example:

 

   input idnum  region $  qtr1-qtr4;

   keep region--qtr4; /* Keeps vars. from region to qtr4 */

   keep idnum-numeric-qtr4;

               /* Keeps numeric vars. from idnum to qtr4 */

 

This one may be useful sometimes but there is a definite risk of mistakes using a name range list.

 

 

A name prefix list allows you to keep all variables which begin with a common prefix:

 

   keep idnum  qtr: ;

   tot = sum(of qtr:);

 

 

Finally, there are special SAS name lists:  _numeric_  (all numeric variables),  _character_ (all character variables), and _all_ (all variables defined in the Data step).  For example:

 

   keep _numeric_; /* Keeps all numeric vars. */

 

 Using variable lists can make coding a lot easier but there is a flip side: they can also hide what’s happening and lead to unintentioned events that are hard to track down.  Handle with care.

 

 
 

Clearing Variable LABELs and FORMATs

 

Sometimes there is a need to remove all variable labels and formats from a SAS dataset - perhaps when making a copy.

 

A quick way to do this is to use the automatic variable lists _all_, _character_ and _numeric_.


 

data new;

  set master;

  format _all_ ;     /* Clears formats for all variables */

  format _numeric_ ; /* Clears formats for numeric variables */

/* label _all_ = ''; /* SAS does NOT accept this as valid syntax */

  attrib _all_ label=''; /* Use ATTRIB statement to clear labels */

run;

 

 
  Tricks with the colon (:) operator modifier

When making character comparisons, you can use the colon (:) after the operator to compare only the first character(s) of the two operands.

For example:     if name =: 'S'

compares just the first character of name  with 'S' to be true for all names which begin with 'S'.

But the fun doesn't stop there.  Have a look at these examples...

name='SMITH';

 

if name <= 'S'

This will be false as 'SMITH' is compared with 'S    '.

 

if name <=: 'S'

While this will be true as only the first character of each value is compared.

 

if name in: ('A','SM')

The length of each value in the list does not have to be the same.  So this finds names starting with A or SM (but not just S).

 

Although sometimes known as the 'Starts With' operator, this can be misleading because the comparison is based on the length of the shortest string, whichever side of the operator it is.

 

if name in:('A','SMITHERS') So this will match with 'SMITH' too.

 

 
 

The TYPES statement in Proc Summary (and Means)

When you only need certain combinations of CLASS variables in a Proc Summary, you might do it like this:

   proc summary data=basedata;

      class vara varb varc;

      output out=summdata(where=(_type_ in (1,3,5))

             sum=;

      /* i.e. combinations 0 0 1, 0 1 1, 1 0 1 */

   run;

 

In Version 8 you can specify the required combinations with the TYPES statement:

   proc summary data=basedata;

      class vara varb varc;

      types varc         /* _type_ = 1 */

            varb * varc  /* _type_ = 3 */

            vara * varc  /* _type_ = 5 */ ;

      output out=summdata

             sum=;

   run;

Apart from being easier to understand what's going on, it can save significant processing and memory on big data sets because SAS doesn't waste resources calculating unwanted _type_s, as with the first example.

 

 
 

LENGTHy Differences

 

The LENGTH function is often used to establish the number of characters in a string, without trailing spaces:


         length string  $10;

         string = 'four';

         len = length(string);

         put len=;

 

Results in :

         len=4

 

Many people assume therefore, that where string is null, all blanks, the length function will return a value of 0.  This is not the case!  The length of a null string is 1.

This is because the "LENGTH function returns an integer that represents the position of the right-most nonblank character in the argument. If the value of the argument is missing, LENGTH returns a value of 1."  Of course, if we all read the doco carefully, we'd know this.

The real problem is that SAS is not consistent.

In SCL, the length function will return a value of 0 for a null string.  And the macro function %length will also return a 0 for a null argument.

So remember, if you're using the length function in a Data step, you can get something for nothing!

 

 
 

Using TAGSORT with PROC SORT

If you struggle sorting large datasets, consider using the TAGSORT option with PROC SORT.

 

It works best where you have long records but a relatively short sort key – the BY variables(s).  The data is read once to create “tags”, consisting of the BY variable(s) and the observation numbers, which are sorted before rereading the data again to generate the sorted output.

 

The saving comes from reduced intermediate disk space and associated I/O.  Whether this really saves elapsed time will depend on your environment and the actual shape of your data.  It will probably use more CPU time.

 

Here is a simple test case, with the timings with and without the TAGSORT option.

 

1    data table;

2    length text $80;

3    retain text 'Record padding';

4    do i=1 to 5000000;

5    j= mod(i,3);

6    output;

7    end;

8    run;

 

NOTE: The data set WORK.TABLE has 5000000 observations and 3 variables.

NOTE: DATA statement used:

      real time           53.89 seconds

      cpu time            8.37 seconds

 

 

9    proc sort data=table out=t1;

10   by j;

11   run;

 

NOTE: There were 5000000 observations read from the data set WORK.TABLE.

NOTE: The data set WORK.T1 has 5000000 observations and 3 variables.

NOTE: PROCEDURE SORT used:

      real time           7:26.04

      cpu time            32.32 seconds

 

 

12   proc sort data=table out=t2 tagsort;

13   by j;

14   run;

 

NOTE: Tagsort reads each observation of the input data set twice.

NOTE: The data set WORK.T2 has 5000000 observations and 3 variables.

NOTE: PROCEDURE SORT used:

      real time           5:57.04

      cpu time            38.06 seconds

 

This test was run on a 1.6Ghz P4 with 512MB RAM. 

 

Before using the TAGSORT on regular jobs, make sure you test its effectiveness first.

 


Back to SAUSAG