
xport's Introduction

Xport

Python reader for SAS XPORT data transport files.

What's it for?

XPORT is the binary file format used by a bunch of United States government agencies for publishing data sets. It made a lot of sense if you were trying to read data files on your IBM mainframe back in 1988.

How do I use it?

Let's make this short and sweet:

import xport
with xport.XportReader(xport_file) as reader:
    for row in reader:
        print(row)

Each row will be a dict with a key for each field in the dataset. Values will be either a unicode string, a float or an int, depending on the type specified in the file for that field.

Getting file info

Once you have an XportReader object, there are a few properties and methods that will give you details about the file:

  • reader.file: the underlying Python file object (see next section).
  • reader.record_start: the position (in bytes) in the file where records start (see next section).
  • reader.record_length: the length (in bytes) of each record (see next section).
  • reader.record_count(): number of records in file. (Warning: this will seek to the end of the file to determine file length.)
  • reader.file_info and reader.member_info: dicts containing information about when and how the dataset was created.
  • reader.fields: list of fields in the dataset. Each field is a dict containing the following keys, copied from the spec:

    struct NAMESTR {
        short   ntype;              /* VARIABLE TYPE: 1=NUMERIC, 2=CHAR    */
        short   nhfun;              /* HASH OF NNAME (always 0)            */
    *   short   field_length;       /* LENGTH OF VARIABLE IN OBSERVATION   */
        short   nvar0;              /* VARNUM                              */
    *   char8   name;              /* NAME OF VARIABLE                    */
    *   char40  label;             /* LABEL OF VARIABLE                   */
    
        char8   nform;              /* NAME OF FORMAT                      */
        short   nfl;                /* FORMAT FIELD LENGTH OR 0            */
    *   short   num_decimals;       /* FORMAT NUMBER OF DECIMALS           */
        short   nfj;                /* 0=LEFT JUSTIFICATION, 1=RIGHT JUST  */
        char    nfill[2];           /* (UNUSED, FOR ALIGNMENT AND FUTURE)  */
        char8   niform;             /* NAME OF INPUT FORMAT                */
        short   nifl;               /* INFORMAT LENGTH ATTRIBUTE           */
        short   nifd;               /* INFORMAT NUMBER OF DECIMALS         */
        long    npos;               /* POSITION OF VALUE IN OBSERVATION    */
        char    rest[52];           /* remaining fields are irrelevant     */
        };

NOTE: items with stars have been renamed from the short names given in the spec. Since this is an alpha release, other items may be renamed in the future, if someone tells me what they're for.
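
As a rough illustration of how the layout above maps onto Python, the sketch below unpacks one 140-byte NAMESTR record with the standard struct module. It assumes big-endian integers and ASCII text. The name parse_namestr and the shape of the returned dict are hypothetical, chosen for this example; this is not the library's actual implementation.

```python
import struct

# One NAMESTR record, big-endian, following the struct listing above:
# 4 shorts, char8, char40, char8, 3 shorts, 2 fill bytes, char8,
# 2 shorts, one 4-byte long, 52 unused bytes. Total: 140 bytes.
NAMESTR = struct.Struct(">hhhh8s40s8shhh2s8shhl52s")


def parse_namestr(raw):
    """Unpack a 140-byte NAMESTR record into a dict (illustrative only)."""
    if len(raw) != NAMESTR.size:
        raise ValueError("expected %d bytes, got %d" % (NAMESTR.size, len(raw)))
    (ntype, nhfun, field_length, nvar0, name, label, nform, nfl,
     num_decimals, nfj, _nfill, niform, nifl, nifd, npos, _rest) = NAMESTR.unpack(raw)

    def text(value):
        # Character fields are blank-padded; strip padding and NULs (assumes ASCII).
        return value.decode("ascii").rstrip(" \x00")

    return {
        "ntype": ntype, "nhfun": nhfun, "field_length": field_length,
        "nvar0": nvar0, "name": text(name), "label": text(label),
        "nform": text(nform), "nfl": nfl, "num_decimals": num_decimals,
        "nfj": nfj, "niform": text(niform), "nifl": nifl, "nifd": nifd,
        "npos": npos,
    }
```

A record for a one-byte character variable named ALPHA would come back with ntype 2, field_length 1 and name "ALPHA".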

Random access to records

If you want to access specific records, instead of iterating, you can use Python's standard file access functions and a little math.

Get the 1000th record (offsets are zero-based, so it starts 999 records in):

reader.file.seek(reader.record_start + reader.record_length * 999, 0)
next(reader)

Get the record before the most recent one fetched:

reader.file.seek(-reader.record_length * 2, 1)
next(reader)

Get the last record:

reader.file.seek(reader.record_start + reader.record_length * (reader.record_count() - 1), 0)
next(reader)

(In this last example, note that we can't seek from the end of the file, because there may be padding bytes. Good old fixed-width binary file formats.)
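
The arithmetic behind these seeks can be sketched on its own. The helpers below are hypothetical (record_offset and read_record are not part of the library) and work on any seekable binary file object. The toy file uses an 80-byte stand-in header rather than a real XPORT header.

```python
import io


def record_offset(record_start, record_length, index):
    """Byte offset of the record at zero-based position `index`."""
    if index < 0:
        raise IndexError("record index must be non-negative")
    return record_start + record_length * index


def read_record(f, record_start, record_length, index):
    """Seek from the start of the file and return one raw record."""
    f.seek(record_offset(record_start, record_length, index), 0)
    return f.read(record_length)


# Toy file: an 80-byte stand-in header, three 2-byte records, then blank padding.
fake = io.BytesIO(b"H" * 80 + b"AxByCz" + b" " * 74)
last = read_record(fake, 80, 2, 2)  # third record, counted from the start
```

Seeking two bytes back from the end of this toy file would land in the padding, which is why the offset is computed from record_start instead.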

Please fix/steal this code!

I wrote this up because it seemed ridiculous that there was no easy way to read a standard government data format in most programming languages. I may have gotten things wrong. If you find a file that doesn't decode properly, send a pull request. The official spec is here. It's surprisingly straightforward for a binary file format from the 80s.

Please also feel free to use this code as a base to write your own library for your favorite programming language. Government data should be accessible, man.

xport's People

Contributors

jcushman

xport's Issues

Possible issues when creating sas xport files from python


Unmangled I Hope: Possible issues when creating sas xport files from python

%let pgm=utl-possible-issues-when-creating-sas-xport-files-from-python;

Possible issues when creating sas xport files from python

GitHub
https://tinyurl.com/2p8dzn7n
https://github.com/rogerjdeangelis/utl-possible-issues-when-creating-sas-xport-files-from-python

/*
 ___ _   _ _ __ ___  _ __ ___   __ _ _ __ _   _
/ __| | | | `_ ` _ \| `_ ` _ \ / _` | `__| | | |
\__ \ |_| | | | | | | | | | | | (_| | |  | |_| |
|___/\__,_|_| |_| |_|_| |_| |_|\__,_|_|   \__, |
                                          |___/
*/

There appears to be an issue with the V56 xport files created
by the Python xport package (and pyreadstat).

Platform: SAS 9.4M7, Python 3.9 and Windows 10 64-bit

Gregory Warnes' R package SASxport is rock solid and more flexible.

I suspect the same issue exists with pyreadstat. I could not import using either package.

Here is a comparison of the differences between a Python xport file
and a SAS-created xport file with the same data:


  From _    Record    80 byte Card
                      ........................XXX.....XXXXXXXX...................................XX..X
  SAS          2      SAS     SAS     SASLIB  9.4     X64_10PR                        30DEC21:09:48:27
  PYTHON       2      SAS     SAS     SASLIB                                          30DEC21:09:36:25


                      ........XXXXXXX.........XXX.....XXXXXXXX...................................XX..X
  SAS          6      SAS     SASDATA SASDATA 9.4     X64_10PR                        30DEC21:09:48:27
  PYTHON       6      SAS             SASDATA                                         30DEC21:09:36:25
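
A ruler like the one above can be produced outside SAS as well. The following is a minimal Python sketch, assuming both files are read as fixed 80-byte cards. The helper names read_cards and diff_ruler are hypothetical and are not part of the xport package.

```python
def read_cards(path, size=80):
    """Split a file into fixed-size records (the last one may be short)."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + size] for i in range(0, len(data), size)]


def diff_ruler(card_a, card_b):
    """Return '.' where two equal-length cards agree and 'X' where they differ."""
    if len(card_a) != len(card_b):
        raise ValueError("cards must be the same length")
    return "".join("." if a == b else "X" for a, b in zip(card_a, card_b))


def report(sas_cards, py_cards):
    """Yield (record number, ruler) for every pair of cards that differs."""
    for recno, (sas, py) in enumerate(zip(sas_cards, py_cards), start=1):
        if sas != py:
            yield recno, diff_ruler(sas, py)
```

Calling report(read_cards("c:/temp/sasxpt.xpt"), read_cards("c:/temp/example.xpt")) would list the records that differ, using the two paths from this write-up. Short trailing cards raise ValueError rather than being compared.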

The Python xport file can be fixed by overwriting the first 40 bytes of Python records 2 and 6 with the same bytes from the SAS cards (substituting the SAS card for the Python card).

Here is some code that corrects the Python xport file:

filename pyx "c:/temp/example.xpt" lrecl=80 recfm=f;
filename pyfix "c:/temp/examplefix.xpt" lrecl=80 recfm=f;
data _null_;
  infile pyx;
  input lyn $char80.;
  select(_n_);
     when(2) substr(lyn,1,40)='SAS     SAS     SASLIB  9.4     X64_10PR';
     when(6) substr(lyn,1,40)='SAS     SASDATA SASDATA 9.4     X64_10PR';
     otherwise;
  end;
  file pyfix;
  put lyn $char80.;
run;quit;

proc fslist file=pyfix;
run;quit;
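
The same patch can be sketched in Python, for cases where the file should be corrected before it reaches SAS. This mirrors the data step above and assumes 80-byte records with the two 40-byte replacement strings copied from the SAS-written file. The name patch_cards is hypothetical, and the sketch has not been checked against other SAS releases or host strings.

```python
CARD = 80  # XPORT files are sequences of 80-byte records

# Replacement text for the first 40 bytes of records 2 and 6 (1-based),
# copied from the SAS-created file shown in this issue.
PATCHES = {
    2: b"SAS     SAS     SASLIB  9.4     X64_10PR",
    6: b"SAS     SASDATA SASDATA 9.4     X64_10PR",
}


def patch_cards(data, patches=PATCHES, width=40):
    """Return a copy of `data` with the leading bytes of selected cards replaced."""
    out = bytearray(data)
    for recno, text in patches.items():
        start = (recno - 1) * CARD
        if start + width > len(out):
            raise ValueError("file too short for record %d" % recno)
        # Blank-pad to `width`, like substr(lyn,1,40)= in the SAS step.
        out[start:start + width] = text.ljust(width)
    return bytes(out)
```

To apply it, read c:/temp/example.xpt in binary mode, pass the bytes through patch_cards, and write the result to c:/temp/examplefix.xpt. The timestamps in the last 16 bytes of each card are left untouched, as in the SAS version.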

/*                 _           _
  __ _ _ __   __ _| |_   _ ___(_)___
 / _` | `_ \ / _` | | | | / __| / __|
| (_| | | | | (_| | | |_| \__ \ \__ \
 \__,_|_| |_|\__,_|_|\__, |___/_|___/
     _               |___/ _                                   _   _
  __| |_ __ ___  _ __   __| | _____      ___ __    _ __  _   _| |_| |__   ___  _ __
 / _` | `__/ _ \| `_ \ / _` |/ _ \ \ /\ / / `_ \  | `_ \| | | | __| `_ \ / _ \| `_ \
| (_| | | | (_) | |_) | (_| | (_) \ V  V /| | | | | |_) | |_| | |_| | | | (_) | | | |
 \__,_|_|  \___/| .__/ \__,_|\___/ \_/\_/ |_| |_| | .__/ \__, |\__|_| |_|\___/|_| |_|
                |_|                               |_|    |___/
*/

* create python v5 transport file;

proc datasets lib=work kill;
run;quit;

%utlfkil(c:/temp/py_pgm.py);
%utlfkil(c:/temp/py_pgm.log);
%utlfkil(c:/temp/example.xpt);

filename ft15f001 "c:/temp/py_pgm.py";
parmcards4;
import xport.v56
import pandas as pd;
df = pd.DataFrame({
    'ALPHA': ['A','B' , 'C'],
    'BETA': ['x', 'y', 'z'],
})
ds = xport.Dataset(df)
with open('c:/temp/example.xpt', 'wb') as f:
    xport.v56.dump(ds, f)
print(df)
;;;;
run;quit;

* EXECUTE THE PYTHON PROGRAM;
options noxwait noxsync;
filename rut pipe  "c:\Python39\python.exe c:/temp/py_pgm.py 2> c:/temp/py_pgm.log";
run;quit;

data _null_;
  file print;
  infile rut;
  input;
  put _infile_;
  putlog _infile_;
run;quit;

libname pyxpt xport "c:/temp/example.xpt";

proc contents data=xpt._all_;
run;quit;

/*
            Directory

Libref         XPT
Engine         XPORT
Physical Name  c:\temp\example.xpt


   Member  Obs, Entries
#  Type     or Indexes   Vars  Label

1  DATA         0         0
*/

proc datasets lib=work kill;
run;quit;

data pytest;
  set pyxpt.sasdata;
run;quit;

proc datasets lib=work kill;
run;quit;

*ERROR: File PYXPT.SASDATA.DATA does not exist.;

filename pyfsl "c:/temp/sasxpt.xpt" lrecl=80 recfm=f;
proc fslist file=e9;
run;quit;
filename pyfsl clear;

/*
HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
SAS     SAS     SASLIB  9.4     X64_10PR                        30DEC21:10:21:15
30DEC21:10:21:15
HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140
HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
SAS     SASDATA SASDATA 9.4     X64_10PR                        30DEC21:10:21:15
30DEC21:10:21:15
HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000200000000000000000000
 �   � �ALPHA
                                                             �   � �BETA
                                                                   �

HEADER RECORD*******OBS     HEADER RECORD!!!!!!!000000000000000000000000000000
AxByCz
*/

/*     _       _           _
  __ _| | __ _| |_   _ ___(_)___   ___  __ _ ___
 / _` | |/ _` | | | | / __| / __| / __|/ _` / __|
| (_| | | (_| | | |_| \__ \ \__ \ \__ \ (_| \__ \
 \__,_|_|\__,_|_|\__, |___/_|___/ |___/\__,_|___/
                 |___/
*/

proc datasets lib=work nodetails nolist kill;
run;quit;

* create an equivalent xport file using sas;

libname sasxpt xport "c:/temp/sasxpt.xpt";

data xpt.sasdata;
 ALPHA='A';BETA='x';output;
 ALPHA='B';BETA='y';output;
 ALPHA='C';BETA='z';output;
run;quit;

proc contents data=xpt._all_;
run;quit;

/*
Data Set Name        XPT.SASDATA                        Observations          .
Member Type          DATA                               Variables             2
Engine               XPORT                              Indexes               0
Created              12/30/2021 10:21:15                Observation Length    2
Last Modified        12/30/2021 10:21:15                Deleted Observations  0
Protection                                              Compressed            NO
Data Set Type                                           Sorted                NO
Label
Data Representation  Default
Encoding             Default


Alphabetic List of Variables and Attributes

#    Variable    Type    Len

1    ALPHA       Char      1
2    BETA        Char      1
*/

proc print data=sasxpt.sasdata;
run;quit;

data sastst;
  set sasxpt.sasdata;
run;quit;

/*
Up to 40 obs WORK.SASTST total obs=3 30DEC2021:10:22:59

Obs    ALPHA    BETA

 1       A       x
 2       B       y
 3       C       z
*/

libname sasxpt clear;

filename e9 "c:/temp/sasxpt.xpt" lrecl=80 recfm=f;
proc fslist file=e9;
run;quit;

/* SAS Xport file
HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
SAS     SAS     SASLIB  9.4     X64_10PR                        30DEC21:10:21:15
30DEC21:10:21:15
HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140
HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
SAS     SASDATA SASDATA 9.4     X64_10PR                        30DEC21:10:21:15
30DEC21:10:21:15
HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000200000000000000000000
 �   � �ALPHA
                                                             �   � �BETA
                                                                   �

HEADER RECORD*******OBS     HEADER RECORD!!!!!!!000000000000000000000000000000
AxByCz
*/

/*__ _                    _   _                                         _
 / _(_)_  __  _ __  _   _| |_| |__   ___  _ __   __  ___ __   ___  _ __| |_
| |_| \ \/ / | `_ \| | | | __| `_ \ / _ \| `_ \  \ \/ / `_ \ / _ \| `__| __|
|  _| |>  <  | |_) | |_| | |_| | | | (_) | | | |  >  <| |_) | (_) | |  | |_
|_| |_/_/\_\ | .__/ \__, |\__|_| |_|\___/|_| |_| /_/\_\ .__/ \___/|_|   \__|
             |_|    |___/                             |_|
*/

filename sasx "c:/temp/sasxpt.xpt" lrecl=80 recfm=f;
data sasfyl;
  infile sasx ;
  input rec $char80.;
run;quit;

filename pyx "c:/temp/example.xpt" lrecl=80 recfm=f;
data pyfyl;
  infile pyx;
  input rec $char80.;
run;quit;

proc compare data=sasfyl compare=pyfyl outnoequal out=long  outbase outcompare ;
run;quit;



  _TYPE_     _OBS_    rec
                      ........................XXX.....XXXXXXXX...................................XX..X
  SAS          2      SAS     SAS     SASLIB  9.4     X64_10PR                        30DEC21:09:48:27
  PYTHON       2      SAS     SAS     SASLIB                                          30DEC21:09:36:25


                      ........XXXXXXX.........XXX.....XXXXXXXX...................................XX..X
  SAS          6      SAS     SASDATA SASDATA 9.4     X64_10PR                        30DEC21:09:48:27
  PYTHON       6      SAS             SASDATA                                         30DEC21:09:36:25


filename pyx "c:/temp/example.xpt" lrecl=80 recfm=f;
filename pyfix "c:/temp/examplefix.xpt" lrecl=80 recfm=f;
data _null_;
  infile pyx;
  input lyn $char80.;
  select(_n_);
     when(2) substr(lyn,1,40)='SAS     SAS     SASLIB  9.4     X64_10PR';
     when(6) substr(lyn,1,40)='SAS     SASDATA SASDATA 9.4     X64_10PR';
     otherwise;
  end;
  file pyfix;
  put lyn $char80.;
run;quit;

proc fslist file=pyfix;
run;quit;

libname pyok xport "c:/temp/examplefix.xpt" ;

proc contents data=pyok._all_;
run;quit;

data finTst;
  set pyok.sasdata;
run;quit;

/*__ _              _              _                                             
 / _(_)_  _____  __| | __  ___ __ | |_                                           
| |_| \ \/ / _ \/ _` | \ \/ / `_ \| __|                                          
|  _| |>  <  __/ (_| |  >  <| |_) | |_                                           
|_| |_/_/\_\___|\__,_| /_/\_\ .__/ \__|                                          
                            |_|                                                  
*/                                                                               
                                                                                 
ASCII Flatfile Ruler & Hex                                                       
utlrulr                                                                          
c:/temp/examplefix.xpt                                                           
c:\temp\delete.txt                                                               
                                                                                 
                                                                                 
 --- Record Number ---  1   ---  Record Length ---- 80                           
                                                                                 
HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000   
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
44444525444542222222444545524444452544454222222233333333333333333333333333333322 
8514520253F24AAAAAAAC92212908514520253F24111111100000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  2   ---  Record Length ---- 80                           
                                                                                 
SAS     SAS     SASLIB  9.4     X64_10PR                        30DEC21:09:36:25 
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
54522222545222225454442232322222533533552222222222222222222222223344433333333333 
3130000031300000313C92009E400000864F10020000000000000000000000003045321A09A36A25 
                                                                                 
                                                                                 
 --- Record Number ---  3   ---  Record Length ---- 80                           
                                                                                 
30DEC21:09:36:25                                                                 
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
33444333333333332222222222222222222222222222222222222222222222222222222222222222 
3045321A09A36A250000000000000000000000000000000000000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  4   ---  Record Length ---- 80                           
                                                                                 
HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140   
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
44444525444542222222444445224444452544454222222233333333333333333333333333333322 
8514520253F24AAAAAAAD5D252008514520253F24111111100000000000000000160000000014000 
                                                                                 
                                                                                 
 --- Record Number ---  5   ---  Record Length ---- 80                           
                                                                                 
HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000   
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
44444525444542222222454555524444452544454222222233333333333333333333333333333322 
8514520253F24AAAAAAA433204208514520253F24111111100000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  6   ---  Record Length ---- 80                           
                                                                                 
SAS     SASDATA SASDATA 9.4     X64_10PR                        30DEC21:09:36:25 
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
54522222545445425454454232322222533533552222222222222222222222223344433333333333 
3130000031341410313414109E400000864F10020000000000000000000000003045321A09A36A25 
                                                                                 
                                                                                 
 --- Record Number ---  7   ---  Record Length ---- 80                           
                                                                                 
30DEC21:09:36:25                                                                 
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
33444333333333332222222222222222222222222222222222222222222222222222222222222222 
3045321A09A36A250000000000000000000000000000000000000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  8   ---  Record Length ---- 80                           
                                                                                 
HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000200000000000000000000   
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
44444525444542222222444455524444452544454222222233333333333333333333333333333322 
8514520253F24AAAAAAAE1D534208514520253F24111111100000000020000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  9   ---  Record Length ---- 80                           
                                                                                 
........ALPHA                                                   ........         
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
00000000445442222222222222222222222222222222222222222222222222220000000022222222 
020001011C0810000000000000000000000000000000000000000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  10   ---  Record Length ---- 80                          
                                                                                 
....................................................................BETA         
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
00000000000000000000000000000000000000000000000000000000000000000000445422222222 
00000000000000000000000000000000000000000000000000000000000002000102254100000000 
                                                                                 
                                                                                 
 --- Record Number ---  11   ---  Record Length ---- 80                          
                                                                                 
                                            ........        .................... 
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
22222222222222222222222222222222222222222222000000002222222200000000000000000000 
00000000000000000000000000000000000000000000000000000000000000000001000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  12   ---  Record Length ---- 80                          
                                                                                 
........................................                                         
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
00000000000000000000000000000000000000002222222222222222222222222222222222222222 
00000000000000000000000000000000000000000000000000000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  13   ---  Record Length ---- 80                          
                                                                                 
HEADER RECORD*******OBS     HEADER RECORD!!!!!!!000000000000000000000000000000   
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
44444525444542222222445222224444452544454222222233333333333333333333333333333322 
8514520253F24AAAAAAAF23000008514520253F24111111100000000000000000000000000000000 
                                                                                 
                                                                                 
 --- Record Number ---  14   ---  Record Length ---- 80                          
                                                                                 
AxByCz                                                                           
1...5....10...15...20...25...30...35...40...45...50...55...60...65...70...75...8 
47474722222222222222222222222222222222222222222222222222222222222222222222222222 
18293A00000000000000000000000000000000000000000000000000000000000000000000000000 

num_decimals zero for non-integer data

I've run into an issue analyzing some NHANES data that might be a problem in other datasets as well. xport finds columns with num_decimals == 0 when the data is not an integer. I can't tell from the specification file if the .xpt file is in error here, but I've created a minimal example of the problem so that you can investigate further if you have time.

http://nbviewer.ipython.org/5004964/
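
A rough way to spot affected columns is sketched below. It is only a sketch: it assumes each entry of `reader.fields` is a dict with `name` and `num_decimals` keys and that the rows have been collected into a list of dicts, and the helper name `fields_with_hidden_decimals` is made up for illustration.

```python
import math


def fields_with_hidden_decimals(fields, rows):
    """Return names of fields declared with num_decimals == 0
    that nevertheless hold at least one non-whole float value.

    `fields` is assumed to be a list of dicts with 'name' and
    'num_decimals' keys; `rows` is a list of dicts keyed by field name.
    """
    suspects = []
    for field in fields:
        if field.get("num_decimals") != 0:
            continue
        name = field["name"]
        for row in rows:
            value = row.get(name)
            # Skip strings, ints, None, and missing values read as NaN/inf.
            if not isinstance(value, float) or not math.isfinite(value):
                continue
            if not value.is_integer():
                suspects.append(name)
                break
    return suspects
```

With a real file, something like `fields_with_hidden_decimals(reader.fields, list(reader))` would list the columns worth checking by hand, assuming the field dicts look as described.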

Correction: it now works with a slight change in the R script

proc datasets lib=work kill;                                                               
run;quit;                                                                                  
                                                                                           
%utlfkil(c:/temp/py_pgm.py);                                                               
%utlfkil(c:/temp/py_pgm.log);                                                              
%utlfkil(c:/temp/example.xpt);                                                             
                                                                                           
filename ft15f001 "c:/temp/py_pgm.py";                                                     
parmcards4;                                                                                
import xport.v56                                                                           
import pandas as pd;                                                                       
df = pd.DataFrame({                                                                        
    'ALPHA': ['A','B' , 'C'],                                                              
    'BETA': ['x', 'y', 'z'],                                                               
})                                                                                         
ds = xport.Dataset(df)                                                                     
with open('c:/temp/example.xpt', 'wb') as f:                                               
    xport.from_columns(ds, f)
print(df)                                                                                  
;;;;                                                                                       
run;quit;                                                                                  
                                                                                           
* EXECUTE THE PYTHON PROGRAM;                                                              
options noxwait noxsync;                                                                   
filename rut pipe  "c:\Python39\python.exe c:/temp/py_pgm.py 2> c:/temp/py_pgm.log";       
run;quit;                                                                                  
                                                                                           
data _null_;                                                                               
  file print;                                                                              
  infile rut;                                                                              
  input;                                                                                   
  put _infile_;                                                                            
  putlog _infile_;                                                                         
run;quit;                                                                                  
                                                                                           
libname pyxpt xport "c:/temp/examplefix.xpt";                                              
                                                                                           
proc contents data=pyxpt._all_;                                                            
run;quit;                                                                                  
                                                                                           
proc print data=pyxpt.sasdata;                                                             
run;quit;                                                                                  
                                                                                           
                                                                                           
    Obs    NAME       SEX    AGE    HEIGHT    WEIGHT                                       
                                                                                           
      1    Alfred      M      14     69.0      112.5                                       
      2    Alice       F      13     56.5       84.0                                       
      3    Barbara     F      13     65.3       98.0                                       
      4    Carol       F      14     62.8      102.5                                       
      5    Henry       M      14     63.5      102.5                                       
      6    James       M      12     57.3       83.0                                       
      7    Jane        F      12     59.8       84.5                                       
      8    Janet       F      15     62.5      112.5                                       
      9    Jeffrey     M      13     62.5       84.0                                       
     10    John        M      12     59.0       99.5                                       
     11    Joyce       F      11     51.3       50.5                                       
     12    Judy        F      14     64.3       90.0                                       
     13    Louise      F      12     56.3       77.0                                       
     14    Mary        F      15     66.5      112.0                                       
     15    Philip      M      16     72.0      150.0                                       
     16    Robert      M      12     64.8      128.0                                       
     17    Ronald      M      15     67.0      133.0                                       
     18    Thomas      M      11     57.5       85.0                                       
     19    William     M      15     66.5      112.0                                       

Needs graceful handling of SIGPIPE

http://en.wikipedia.org/wiki/SIGPIPE

If I try to check a small amount of output by piping to head

$ python ~/src/xport/xport/xport.py ~/data/file.xpt | head -n 5

It'll parse the file properly, print the first n lines and then give an error

Traceback (most recent call last):
  File "/Users/mike/src/xport/xport/xport.py", line 283, in <module>
    print obj
IOError: [Errno 32] Broken pipe
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

I tried catching the IOError, checking for the errno that indicates a broken pipe (32, EPIPE), and then exiting. That avoids the traceback, but the with block still tries to close the file even though it is already closed, so the second error, "close failed ...", still gets printed.
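
For what it's worth, here is a minimal sketch of one way to handle this on Python 3, where the error surfaces as `BrokenPipeError` rather than `IOError`. It follows the approach described in the Python `signal` documentation: catch the exception, then point stdout at `os.devnull` so the interpreter's flush at exit cannot fail a second time. The names `print_rows`, `silence_stdout` and `main` are made up for illustration and are not part of xport.

```python
import os
import sys


def print_rows(rows, out=None):
    """Write one row per line to `out` (stdout by default).

    Returns True if every row was written, False if the reader on the
    other end of the pipe went away (for example `head` exiting early).
    """
    out = sys.stdout if out is None else out
    try:
        for row in rows:
            out.write(f"{row}\n")
        out.flush()
    except BrokenPipeError:
        return False
    return True


def silence_stdout():
    """Redirect stdout to os.devnull so the exit-time flush cannot raise again."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())


def main(rows):
    """Print rows and return an exit status: 0 on success, 1 on a broken pipe."""
    if print_rows(rows):
        return 0
    silence_stdout()
    return 1
```

A script would end with something like `sys.exit(main(reader))`, where `reader` is whatever iterable of rows is being printed. Piping that to `head -n 5` should then exit quietly instead of printing a traceback.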
