![]() ![]() |
|||||||||||||||||||||||||||||||||||||
| SAS ACT Users Group l SAUSAG | |||||||||||||||||||||||||||||||||||||
| Chair's Message | Presentations | Registration | Tips | |||||||||||||||||||||||||||||||||||||
|
Technical Tips |
|||||||||||||||||||||||||||||||||||||
| Here is a collection of miscellaneous tips.
Most were included in email notices advertising SAUSAG, but there are
others too. If you'd like to offer a tip to share with other users, send it to your favourite committee member on the home page.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Displaying Formats from the SAS Explorer Displaying the contents of a user-defined format hasn’t been particularly easy. You have to submit this to get the details printed in the Output window: proc format lib=library.formats fmtlib ; select $fctr1l; run; However, if you're using Windows, its possible to define your own Actions (i.e. the available actions when you right-mouse click on something) for SAS Explorer objects. One obvious application of this is to allow us to click on a format Catalog entry type to display the format’s details. The hard way is to do this manually but it's rather long-winded and impractical if you want a lot of SAS users to make the same change: make the Explorer window active; click on Tools --> Options… --> Explorer… Then select Catalog Entries from the drop-down list at the top of the dialog box. A list of catalog entry types is displayed, from where the current actions can be altered. Here is the simple way: 1. Copy and paste the following SAS Registry updates to a text file, e.g. c:\temp\regUpdate.txt. #--- Update registry with extra Explorer actions [CORE\EXPLORER\MENUS\ENTRIES\FORMAT] "1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select %b;run;""" "@"="gsubmit ""proc format lib=%b.%b fmtlib; select %b;run;""" [CORE\EXPLORER\MENUS\ENTRIES\FORMATC] "1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select $%b;run;""" "@"="gsubmit ""proc format lib=%b.%b fmtlib; select $%b;run;""" [CORE\EXPLORER\MENUS\ENTRIES\INFMT] "1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select @%b;run;""" "@"="gsubmit ""proc format lib=%b.%b fmtlib; select @%b;run;""" [CORE\EXPLORER\MENUS\ENTRIES\INFMTC] "1;&List"="gsubmit ""proc format lib=%b.%b fmtlib; select @$%b;run;""" "@"="gsubmit ""proc format lib=%b.%b fmtlib; select @$%b;run;"""
2. Then submit this: /* Change the file if necessary */ filename regin 'c:\temp\regUpdate.txt'; proc registry import=regin; run;
After this, you can just double click on a format entry type (or right-click and select List) to run the associated code, printing the details in the Output window. Now isn’t that easier?!
. |
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
You may think you know but, it's sometimes useful
for a program to know what it's name is.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
The LENGTHs Lazy Programmers Go To SAS is great for lazy programmers. You can
write minimal code and SAS will fill in the gaps and make all kinds of
decisions for you. But it pays to know what SAS is doing for you.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Resetting the Date in a SAS Job or Session
By default, the date and time SAS associates with a job, and uses in the titles of the SAS log and the listing file, is the date and time when the job, or your SAS session, was initiated.
For example, if you start a SAS session at 9.00am, by default, all output sent to the Output window will have a time of 9.00 in the title, however long your SAS session. A new SAS system option, DTRESET, is available in V9. With this option in force, the time displayed in SAS titles is the time at which they were actually written. This can be useful as an easy way to spot which steps in a job are taking the time. Whether you are using DTRESET or the default of NODTRESET, the current system date and time is always available to your programs using the functions TODAY(), TIME() and DATETIME().
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Squeezing more from the COMPRESS function
The COMPRESS function is used to remove specified characters from a string. The first argument is searched for any characters specified in the second argument. For example:
random_chars = '0a1b2c3d4'; no_digits = compress(random_chars, '0123456789');
will result in no_digits being equal to: abcd.
If no second argument is used, then blanks are removed from the string.
In SAS 9, a third argument has been added to specify a character class (such as all characters, or all digits) along with modifiers that further alter the behaviour of the function.
For example, the example above could be re-written as:
no_digits = compress(random_chars, , 'd');
The second argument is not specified, but marked by an extra comma. The third argument, a 'd', represents all digits so the result is just the same. All alphabetic characters could be removed by simply using the 'a' character class.
The 'k' modifier, in conjunction with the 'd', reverses the effect by compressing all characters except digits e.g.
no_digits = compress(random_chars, , 'kd');
See the SAS Samples for more on this new feature.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Trap: Searching Strings with the INDEX function
Most SAS users would be familiar with the INDEX function which allows you to search a character variable for a given sub-string.
It's not without its dangers though, if the sub-string you're after is in a variable.
The INDEX function in the code below should return the starting position of the string 'and' within theQuestion variable.
data _null_;length theQuestion theSearch $100; theQuestion = 'What is the answer to Life, the Universe and Everything'; theSearch = 'and'; theAnswer = index(theQuestion, theSearch); put theAnswer=; run ;
But it doesn't. theAnswer=0
The trap is theSearch variable is 100 long and so has lots of trailing blanks, which the INDEX function faithfully includes in the search. So use the TRIM function to remove the trailing spaces and you'll get what you're really looking for.
data _null_; length theQuestion theSearch $100; theQuestion = 'What is the answer to Life, the Universe and Everything'; theSearch = 'and'; theAnswer = index(theQuestion, trim(theSearch)); put theAnswer=; run ;theAnswer=42 As is so often the case, the problem lies in knowing what you're looking for!
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
A feature introduced in V8 was the ability to nest formats. Its not something you'd want to do every day but it can be really useful.
Here is an example where the use of nested formats avoids having multiple copies of codes or look up values.
In this instance a personnel data set has one record for each employee. The variable empgrp identifies the type of employment group they belong to - the values and their meaning can be seen in the empgrp. format.
Personnel with an empgrp of 1 - 8 are counted as a RESOURCE but only 1 - 5 are counted as STAFF.
proc format;value $empgrp '1'='Perm FT' '2'='Temp FT' '3'='Temp PT' '4'='Perm PT' '5'='Casual' '6'='Consultant' '7'='Contractor' '8'='External'other = 'Unknown';
/* Informat to return a 1 to identify employees as STAFF */ invalue staff'1','2','3','4','5' = 1 other = 0;
/* Informat to return a 1 to identify employees as a RESOURCE */ invalue resrce'6','7','8' = 1 other = [ staff.] /* THE NESTED FORMAT - if not 6,7,8 then use STAFF. */;
data personnel_counts;set hr.personnel; /* Create indicator/count variables for every personnel record to show if they are counted as STAFF and/or as a RESOURCE. */ staff = input(empgrp, staff.);resource = input(empgrp, resrce.);run ;
Where a record has a value of '6' for empgrp RESOURCE will be set to 1 but STAFF to 0. Where a record has a value of '1' for empgrp RESOURCE will be set to 1 and STAFF to 1. Where a record has a value of '9' for empgrp RESOURCE will be set to 0 and STAFF to 0.
The beauty of nested formats in this case, is that it enables code values to be defined once rather than be repeated in multiple formats.
Note that nesting more than 2 or 3 deep can impact performance - even if you can keep track of the logic.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
A Financial Year ‘Format’ – using the INTNX function
Well, it’s not quite a format, but achieves the same result…
Using a largely unknown facility in the INTNX function, finding the financial year of a particular date is easy. There is no need to write our own formats to do it.
The INTNX function enables a given SAS date to be moved forwards or backwards a given number of periods. For example, to return the start of next month:
intnx('month', today(), 1)
The ‘month’ period is used and the date, returned from the today() function, is advanced 1 period – to the start of the specified period. If the period was ‘year’ then the 1st of January next year would be returned.
There are more subtleties to the INTNX function than most people realise though.
The period can be modified by specifying a multiple of the period, and a starting point for the period. So, to get the start of the current financial year:
finYearStart = intnx('year1.7', today(), 0);
The 'year1.7' means periods of 1 year (the default), but starting from the 7th month (the unit for shifting the starting point depends on the specified period). The 0 indicates the start of the current period.
If you want the financial year in a more descriptive form…
date = '20may04'd; finYear = put(intnx('year1.7',date,0), year4.) || '/' || put(intnx('year1.7',date,0,'E'),year2.);
results in: 2003/04.
The ‘E’ parameter (optional) indicates the end of the period, instead of the beginning (‘B’ – the default, or ‘M’ the middle).
So you can make sure you don’t miss any SAUSAG meetings, here is another example. SAUSAG meetings are on the 3rd Thursday of the month, every 3 months, starting in February.
/* Find the SAUSAG meeting dates for 2004 */
data _null_; yr = '01Jan2004'd; do i=1 to 4; meeting = intnx('week1.5', intnx('month3.2',yr, i), 3); put meeting weekdatx25.; end; run;
Thursday, 19 Feb 2004 Thursday, 20 May 2004 Thursday, 19 Aug 2004 Thursday, 18 Nov 2004
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Aligning PUT function results Using the PUT function to convert numeric values to character leaves the result right aligned. For example: string = ‘{‘ || put(someNumber, 6.) || ‘}’; results, if someNumber is 12, in the variable string having a value of: { 12}.The PUT function in Version 8 allows us to use the alignment specifications (-L, -C, -R) previously available only in the PUT statement. To centre the number we can now write: string = ‘{‘ || put(someNumber, 6.-C) || ‘}’; and gives a result of: { 12 }.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Deleting
Macro Variables Good house keeping demands that we throw away things when we no longer need them, such as data sets and variables. They clutter up our applications and can impact performance. Macro variables are no exception. In 8.2 CALL SYMDEL allows us to delete macro variables from the macro global symbol table: %let mvar998 = Another macro variable; %let mvar999 = Yet another macro variable;
data _null_; mvarName = 'mvar998'; call symdel(mvarName, 'nowarn'); call symdel('mvar999'); run; The nowarn option suppresses a warning message if the macro variable doesn't exist. There are no excuses for littering your applications with old macro variables now.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
PUT it _ALL_ out to the Log - in style
It's often useful to print to the Log all the values of variables in the current Data step. This is usually done with:
The problem
with this is that the variables and their values are compressed onto as
few lines as possible, making them hard to read. The PUT statement also
prints out automatic SAS variables too - often useful but not always
wanted.
So try:
put (_all_) (= /);
The
parentheses turn _all_ from a keyword into a variable
list, and a variable list can have an associated format
list. In this case, the = format modifier is used to make the
value appear with its variable name, and the / throws a new line
for each variable.
So for
example:
data _null_ ; input a b c $; put _all_ ; put (_all_) (= /); put (_all_) (+0); cards; 50 99 string ; run;
results in:
a=50 b=99 c=string _ERROR_=0 _N_=1 a=50 b=99 c=string
50 99 string
Note the third PUT statement has a "do
nothing" pointer control, and prints just the variable values.
There are many useful possibilities here
especially if you also want to treat the _character_ and
_numeric_ keywords as variable lists in the same way as
_all_. For example, you could give all the character variables a
common format.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
If you are handling SAS datasets with many variables, it can be very tedious, and error prone, to have to list them all individually. There are a number of short cuts available in SAS though to make your coding a little easier.
A variable list is a way to abbreviate references to SAS variables. There are a few different flavours.
A numbered range list is familiar to most SAS programmers. In this case a series of variables have the same name except for a numeric suffix, which must be consecutive numbers. For example:
length qtr1 qtr2 qtr3 qtr4 8; length qtr1-qtr4 8;
are both equivalent.
A name range list relies on the position of the variables in the program data vector (the order in which they defined when the Data step was compiled).
For example:
input idnum region $ qtr1-qtr4; keep region--qtr4; /* Keeps vars. from region to qtr4 */ keep idnum-numeric-qtr4; /* Keeps numeric vars. from idnum to qtr4 */
This one may be useful sometimes but there is a definite risk of mistakes using a name range list.
A name prefix list allows you to keep all variables which begin with a common prefix:
keep idnum qtr: ; tot = sum(of qtr:);
Finally, there are special SAS name lists: _numeric_ (all numeric variables), _character_ (all character variables), and _all_ (all variables defined in the Data step). For example:
keep _numeric_; /* Keeps all numeric vars. */
Using variable lists can make coding a lot easier but there is a flip side: they can also hide what’s happening and lead to unintentioned events that are hard to track down. Handle with care.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Clearing Variable LABELs and FORMATs
Sometimes there is a need to remove all variable labels and formats from a SAS dataset - perhaps when making a copy.
A quick way to do this is to use the automatic variable lists _all_, _character_ and _numeric_.
data new; set master; format _all_ ; /* Clears formats for all variables */ format _numeric_ ; /* Clears formats for numeric variables */ /* label _all_ = ''; /* SAS does NOT accept this as valid syntax */ attrib _all_ label=''; /* Use ATTRIB statement to clear labels */ run;
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Tricks with
the colon (:) operator modifier When making character comparisons, you can use the colon (:) after the operator to compare only the first character(s) of the two operands. For example: if name =: 'S' compares just the first character of name with 'S' to be true for all names which begin with 'S'. But the fun doesn't stop there. Have a look at these examples...
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
The TYPES statement in Proc Summary (and Means) When you only need certain combinations of CLASS variables in a Proc Summary, you might do it like this: proc summary data=basedata; class vara varb varc; output out=summdata(where=(_type_ in (1,3,5)) sum=; /* i.e. combinations 0 0 1, 0 1 1, 1 0 1 */ run;
In Version 8 you can specify the required combinations with the TYPES statement: proc summary data=basedata; class vara varb varc; types varc /* _type_ = 1 */ varb * varc /* _type_ = 3 */ vara * varc /* _type_ = 5 */ ; output out=summdata sum=; run; Apart from being easier to understand what's going on, it can save significant processing and memory on big data sets because SAS doesn't waste resources calculating unwanted _type_s, as with the first example.
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
The LENGTH function is often used to establish the number of characters in a string, without trailing spaces:
string = 'four'; len = length(string); put len=;
Results in : len=4
Many people
assume therefore, that where string is null, all blanks, the
length function will return a value of 0. This is not the case! The
length of a null string is 1.
This is because the "LENGTH function returns an integer that represents the position of the right-most nonblank character in the argument. If the value of the argument is missing, LENGTH returns a value of 1." Of course, if we all read the doco carefully, we'd know this. The real problem is that SAS is not consistent. In SCL, the length function will return a value of 0 for a null string. And the macro function %length will also return a 0 for a null argument. So remember, if you're using the length function in a Data step, you can get something for nothing!
|
|||||||||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||||||||||
|
Using TAGSORT with PROC SORT If you struggle sorting large datasets, consider using the TAGSORT option with PROC SORT.
It works best where you have long records but a relatively short sort key – the BY variables(s). The data is read once to create “tags”, consisting of the BY variable(s) and the observation numbers, which are sorted before rereading the data again to generate the sorted output.
The saving comes from reduced intermediate disk space and associated I/O. Whether this really saves elapsed time will depend on your environment and the actual shape of your data. It will probably use more CPU time.
Here is a simple test case, with the timings with and without the TAGSORT option.
1 data table; 2 length text $80; 3 retain text 'Record padding'; 4 do i=1 to 5000000; 5 j= mod(i,3); 6 output; 7 end; 8 run;
NOTE: The data set WORK.TABLE has 5000000 observations and 3 variables. NOTE: DATA statement used: real time 53.89 seconds cpu time 8.37 seconds
9 proc sort data=table out=t1; 10 by j; 11 run;
NOTE: There were 5000000 observations read from the data set WORK.TABLE. NOTE: The data set WORK.T1 has 5000000 observations and 3 variables. NOTE: PROCEDURE SORT used: real time 7:26.04 cpu time 32.32 seconds
12 proc sort data=table out=t2 tagsort; 13 by j; 14 run;
NOTE: Tagsort reads each observation of the input data set twice. NOTE: The data set WORK.T2 has 5000000 observations and 3 variables. NOTE: PROCEDURE SORT used: real time 5:57.04 cpu time 38.06 seconds
This test was run on a 1.6Ghz P4 with 512MB RAM.
Before using the TAGSORT on regular jobs, make sure you test its effectiveness first.
|
|||||||||||||||||||||||||||||||||||||
|
|