demiangomez / parallel.gamit

Python wrapper to parallelize GAMIT executions

License: GNU General Public License v3.0

Python 52.52% Shell 0.90% PLpgSQL 0.79% HTML 33.62% CSS 1.25% JavaScript 5.04% MATLAB 1.19% Makefile 0.10% TeX 4.57% Perl 0.02%

gps gamit parallel python-wrapper postgres rinex

parallel.gamit's Issues

Add switch to pyScanArchive to perform the PPP calculations on a subset of stations

Also allow selecting a date interval. The frame should be specified by hand, given that there are orbit files with the wrong frame in their headers.
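
A minimal sketch of what such switches could look like with `argparse` (Python 3). The option names `--ppp-stations`, `--date-range` and `--frame` are hypothetical and are not existing pyScanArchive arguments; the frame is made mandatory here only to reflect the request that it be given by hand.

```python
import argparse
from datetime import date


def parse_date(text):
    """Parse a yyyy/mm/dd string into a date (ValueError if malformed)."""
    year, month, day = (int(part) for part in text.split("/"))
    return date(year, month, day)


def build_parser():
    parser = argparse.ArgumentParser(description="PPP on a subset of stations (sketch)")
    # Hypothetical switch: stations given as net.stnm, e.g. rms.tuc1
    parser.add_argument("--ppp-stations", nargs="+", default=[], metavar="net.stnm")
    # Hypothetical switch: inclusive start and end dates
    parser.add_argument("--date-range", nargs=2, type=parse_date,
                        metavar=("START", "END"))
    # Frame supplied by hand, because sp3 headers cannot always be trusted
    parser.add_argument("--frame", required=True)
    return parser
```

For example, `build_parser().parse_args(["--ppp-stations", "rms.tuc1", "--date-range", "2010/08/01", "2010/08/31", "--frame", "IGS08"])` yields a station list, two `date` objects and the frame name (the frame value is only an illustration).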

Add a "Check node" object to verify a node before sending jobs

When invoking a job server for parallel python, a "check node" object should handle the job creation and verification that each node in the cluster has all the necessary dependencies to run. If the node doesn't have all the necessary programs/dependencies, remove the node from the cluster and continue execution without it.
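
One possible shape for the check, as a sketch only: a function that reports which required programs are missing from `PATH`, and a filter that drops nodes failing the check. In a real cluster `missing_programs` would have to run on each node (for instance as a job submitted to it); the function names and the idea of passing a `check` callable are assumptions, not part of the existing code.

```python
import shutil


def missing_programs(required):
    """Return the programs from `required` that are not found on PATH."""
    return [name for name in required if shutil.which(name) is None]


def usable_nodes(nodes, check):
    """Keep only the nodes for which `check(node)` reports nothing missing.

    `check` is a hypothetical callable that runs the dependency test on a
    node and returns the list of missing programs (empty when all is well).
    """
    return [node for node in nodes if not check(node)]
```

Execution can then continue with the list returned by `usable_nodes`, and the names reported by `check` can be logged for each node that was dropped.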

Insert an info event when a station information record is modified

Whenever a modification is performed to a station information record, it would be good to create an event that shows network, station, old and new station info record.

Update the antenna mechanical offsets

Mechanical offsets between measured height-of-instrument (HI) and the antenna reference point (ARP) for SEPPOLANT_X_MF

error handling

When it gets a new station that has a "problem" (too close to more than one station, etc.), and ends up in the data_rejected directory, it would be nice if the error message had the psql command to add the station to the database, or at least printed out the info one needed (xyz, lat/lon/ht) to add it. (the problem may also be a rename, so that command might also be useful in the error message).

It would be nice if the rejected folder had some subfolders based on why it was rejected - at least one subfolder for failed to find location after 6 tries, and one subfolder for confused with another station(s), and possibly one for "other" (for now, could break down farther as other specific problems arise).

Put a newline (CR or LF) at the end of the log files so the prompt shows up on a new line.

ppp reporting no station info

ppp reports no station info found when the database says there is station info. The problem arises when the station info start/end is given to the second, or is within a second of the rinex start/end.

convert using the pg module to using the updated pgdb module for SQL queries

database organizational change

Change in the "NetworkCode" to make it more useful for multiple applications of database.

Make "NetworkCode" meaningless to the end user. There could be a base network code, say n00, where all new sites with unique names go. Duplicate names go into networks n01, n02, etc. as such sites come into the database.

Second network code where user can organize groups of sites based on need or preference.

Example - IGN and CAP use same underlying database but can organize the interface how they want/need (IGN can have IGS, SIR, RMS, etc. networks, while CAP can have ARG, ARS, CHI, CHS, PIF, MAU, PIS, ...)

So any given site could be in multiple "networks". Need to be careful when deleting sites - if in multiple application networks, does not get touched at the nXX level. If a unique site sends a message. Or always send message about sites affinities.
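
The base-network rule described above (unique names go to n00, duplicates to n01, n02, and so on) can be sketched as follows. The function name and the dictionary layout are illustrative assumptions; in practice the lookup would be a database query.

```python
def assign_base_network(station_code, networks):
    """Return the first nXX network that does not yet hold `station_code`.

    `networks` maps a base network code such as "n00" to the set of station
    codes already stored in it. A new network is opened when every existing
    one already contains a station with this name.
    """
    index = 0
    while True:
        code = "n%02d" % index
        if station_code not in networks.get(code, set()):
            return code
        index += 1
```

With `{"n00": {"corr"}}` as the current state, a second site named `corr` would land in `n01`, while any new unique name would still go to `n00`.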

new lines at end of errors

This is an expansion of the missing new line at the end of log files issue (labeled as bug) from Dec.

Newlines are missing at the end of many of the error messages from IntegrityCheck (and probably ScanArchive and ArchiveService). It makes it hard to read and process automatically.
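
A small helper along these lines could be applied wherever a message is written to a log; it is a sketch, and the name `with_newline` is made up for illustration.

```python
def with_newline(message):
    """Return `message` terminated by exactly one trailing newline."""
    return message.rstrip("\r\n") + "\n"
```

Writing `log.write(with_newline(text))` instead of `log.write(text)` keeps consecutive messages on separate lines without doubling newlines that are already present.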

convert to python 3

Format and style

Run pep8 on everything.

Add RINEX count to pyIntegrityCheck

When checking for gaps, report the total number of RINEX files and the total number of missing files (between the reported start and stop dates).
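
As a rough sketch of the requested counts, assuming the days that have a RINEX file are available as `datetime.date` objects (the function name is hypothetical):

```python
from datetime import date


def rinex_gap_summary(days_with_rinex):
    """Return (files_present, files_missing) between the first and last day."""
    days = set(days_with_rinex)
    if not days:
        return 0, 0
    start, stop = min(days), max(days)
    expected = (stop - start).days + 1  # inclusive day count
    return len(days), expected - len(days)
```

For a station with files on 2018-05-10, 2018-05-11 and 2018-05-14, this reports 3 files present and 2 missing.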

pyArchiveService.py behavior

Started with ~1100 files missing from an igs continuous station (got our attention because entry in one of ? directories, but no rinex in the repository tree, no locks. Got the 1100 files by looking at data in osu archive and moving the ones not in the PG archive into data_in).

Ran pyArchiveService.py on these 1100 files. About half were moved to the archive (or at least disappeared from the repository directory tree). A handful ended up in data_retry_in and data_rejected, and the remaining ones are still in data_in (the number of files in the archive grew by 500).

There were no locked files and probably a single error in the error message file (errors_pyArchiveService.log in /Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/run_dir on capybara). I think each run of pyArchiveService.py generates one error.

Re-running pyArchiveService.sh started reporting 120 files (ls | wc shows ~575, and the number of files in the archive is constant), and each iteration it drops the number of files reported by 5-7 files (a few times it dropped by more). I think it added one line to the error file each run (see screen output).

So - files in data_in are not going anywhere (from ls), but are "disappearing" from processing by pyArchiveService.py, with no locks and no errors.

Make the pyPPPETM error "No PPP solutions" more verbose

When the problem is related to no PPP solutions due to outliers.

Add fields to event table

Instead of writing the relevant information about network-station, year, doy, etc in the EventDescription field, add independent fields (that can be set to NULL) to facilitate searching events. Also, add another event description (besides type = warn, info and error).

Create an install script

-Fetches and installs dependencies
-Add a dev flag which also installs a DB and sets the program up for testing.

Stop PG when no nodes are found

When starting a program, stop execution if there are no nodes found in dispy
Also, check what happens with multihomed computers when creating a cluster. Seems like PG is not finding nodes connected to secondary ethernet port.

Add orbit frame to gnss_data.cfg rather than reading it from the sp3 files

Some sp3 files have the wrong frame in their headers. This produces coordinates in the ppp_soln with a frame that does not correspond to the actual frame of the orbits. The frame should be declared in the gnss_data.cfg rather than being read from the sp3 files.
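
A sketch of reading the frame from the configuration file with `configparser` (Python 3). The section name `[orbits]`, the key `frame` and the value `IGS14` are placeholders chosen for illustration; the actual layout of gnss_data.cfg may differ.

```python
import configparser

# Hypothetical fragment of gnss_data.cfg
SAMPLE_CFG = """
[orbits]
frame = IGS14
"""


def orbit_frame(cfg_text):
    """Return the orbit frame declared in the configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("orbits", "frame")
```

The value returned by `orbit_frame` would then be stored with the PPP solution in place of whatever the sp3 header claims.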

error message reporting in database

When looking at error statistics it would be handy if the errors had a number. Each error is "unique" in the sense that the file name, and other details are unique, so it is hard to find out the kinds of errors and how many of each kind there are. If the errors had numbers and there was an error number table one could quickly find specific types of errors.

Add a date to the PPP solution

Either in the events table or maybe add a new field to the ppp_soln table. We should save when the PPP coordinate was generated to make sure that the coordinate is updated after a metadata change.

local switch

add switch to only run on local machine, would still be parallel, but not try to go over network to other machines.

dup name handling

There are 3 sites named corr in Argentina (1 cap and 1 saga from 1993, and 1 from a LISN project [it's possible it's closed]).
I put rinex files that I thought were all from the cap corr site (they were in the cap folder of the osu database) into data_in and ran ArchiveService.

Two of the files were actually from the saga site. They stayed in data_in and got associated with the ??? network, but there were no messages anywhere saying there was a duplicate name for a new station. They show up with a very different lat/lon in the database, but otherwise no information.

example - bad rinex our processing, nrcan gets good soln

error message from ArchiveService run, there should be a pdf attached with NRCAN solution.

says problem is with sampling interval.

RINEX sampling interval could not be determined. The output from RinSum was:

RinSum, part of the GPS Toolkit, Ver 2.2 10/31/13, Run 2017/12/30 00:39:50

+++++++++++++ RinSum summary of Rinex obs file production/rinex/6e271835-0489-4a0f-9e9a-dde6b5be02d3/lo101310.00o +++++++++++++
Warning : Failed to read header: text 0:Unidentified label: >ANTENNA: DEL<
text 1:In record 0
text 2:In file production/rinex/6e271835-0489-4a0f-9e9a-dde6b5be02d3/lo101310.00o
text 3:Near file line 12
location 0:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/RINEX3/Rinex3ObsHeader.cpp:1425
location 1:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/RINEX3/Rinex3ObsHeader.cpp:1471
location 2:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.cpp:150
location 3:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.hpp:184
location 4:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.hpp:184

Header dump follows.
---------------------------------- REQUIRED ----------------------------------
Rinex Version 2.00, File type O BSERVATION DATA, System G (GPS).
Prgm: ASHTORIN, Run: 31 - MAY - 00 16:20, By:
Marker type: .
Observer : AO_, Agency:
Rec#: GN-1331, Type: TOPCON GP-R1DY, Vers:
Antenna # : BX-3191, Type :
Position (XYZ,m) : (1911998.2600, -4237323.8800, -4352388.4900).
Antenna Delta (HEN,m) : (0.0000, 0.0000, 0.0000).
Time of first obs -002/12/01 00:00:00.000 UNK
(This header is VALID)
---------------------------------- OPTIONAL ----------------------------------
Marker number :
Comments (3) :

Concatenated from 2 rinex files by RNXCAT on 20 Jul 00

-------------------------------- END OF HEADER --------------------------------
RinSum timing: processing 0.007 sec, wallclock: 0 sec.

/lo101310.00d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/2000/131)
lo101310.pdf

Multiday RINEX file handling during PPP

When a multiday RINEX file was already in the database (entered through an old version of pyScanArchive) PPP fails to process it because pyRinex by default bins it into the multiple days. Therefore, an IOError: [Errno 2] No such file or directory exception is thrown when normalizing the header.

See example:
IOError: [Errno 2] No such file or directory: 'production/rinex/8f77b22c-f0ee-4bcc-ab6d-3a4e8593c7d5/tuc12242.10o' processing: rms tuc1 2010 224 using node elvira END OF ERROR ===================

Should deal with these files correctly

divide by zero in etm

This error does not seem to prevent getting an ETM solution or a plot. It occurs once (there are 506 stations in the database, and 500 stations [not counting duplicate names] in the ppp_soln, I've not yet figured out the postgresql command to do the "distinct" using both StationCode and NetworkCode)

/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1487: RuntimeWarning: divide by zero encountered in true_divide
return s[..., 0]/s[..., -1]
Successfully plotted bra.bomj

There are a number of sites similar to bomj (a sirgas campaign measurement followed years later by continuous operations) but the other ones do not report any errors.
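
On the side question above about counting stations that are distinct in both StationCode and NetworkCode: `SELECT DISTINCT` accepts several columns, so one option is to count the rows of that subquery. The sketch below uses an in-memory SQLite table as a stand-in for PostgreSQL so it can run anywhere; the toy rows and the `Year`/`DOY` columns are invented, but the query itself is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE ppp_soln '
    '("NetworkCode" TEXT, "StationCode" TEXT, "Year" INTEGER, "DOY" INTEGER)'
)
# Toy rows: two solutions for one station, plus a duplicated station name
conn.executemany(
    "INSERT INTO ppp_soln VALUES (?, ?, ?, ?)",
    [("bra", "bomj", 2010, 1), ("bra", "bomj", 2010, 2),
     ("cap", "corr", 1993, 50), ("sag", "corr", 1993, 60)],
)

DISTINCT_STATIONS = (
    'SELECT COUNT(*) FROM '
    '(SELECT DISTINCT "NetworkCode", "StationCode" FROM ppp_soln) AS stations'
)
station_count = conn.execute(DISTINCT_STATIONS).fetchone()[0]
```

Here the two `corr` sites are counted separately because their network codes differ, while the two `bomj` solutions collapse into one station.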

There are a number of stations with less than 4 occupations and they are captured by an if statement and not processed. It would be more informative if they produced a message saying this and did not receive further processing. Here is the message all but one of them produce.

Traceback (most recent call last):
  File "../classes/pyPlotETM.py", line 58, in main
    json.dump(etm.todictionary(True), f, indent=4, sort_keys=False)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 752, in todictionary
    etm['Linear'] = {'tref': self.Linear.tref, 'params': self.Linear.values.tolist()}
AttributeError: ETM instance has no attribute 'Linear'

One station with 3 "occupations"/5 days of data: 98[2],03[2],10[1], produces this error message

Error during processing of cer.ccrn
Traceback (most recent call last):
  File "../classes/pyPlotETM.py", line 52, in main
    etm = pyPPPETM.ETM(cnn, stn['NetworkCode'], stn['StationCode'], False)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 624, in __init__
    self.Jumps = JumpsTable(cnn, NetworkCode, StationCode, ppp.t, add_antenna_jumps=self.Periodic.params)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 294, in __init__
    DOP = np.diag(np.linalg.inv(np.dot(self.A.transpose(), self.A)))
  File "/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 513, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix

Campaign data does not have equipment jumps, but may have earthquake jumps with only one occupation/file after the earthquake, and that could make it fail.

But it seems that once the test for the number occupations fails, it should exit cleanly.
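
A clean exit could look roughly like the sketch below: the count test raises a dedicated exception, and the caller turns it into a one-line message instead of a traceback. The threshold of 4 comes from the description above; the exception class, function names and message wording are assumptions and do not reflect how pyPPPETM is actually structured.

```python
MIN_SOLUTIONS = 4  # threshold taken from the report above; treat as an assumption


class TooFewSolutions(Exception):
    """Raised when a station has too few PPP solutions to fit an ETM."""


def check_solution_count(station, n_solutions, minimum=MIN_SOLUTIONS):
    if n_solutions < minimum:
        raise TooFewSolutions(
            "%s: %d PPP solution(s), at least %d needed to fit an ETM"
            % (station, n_solutions, minimum)
        )


def plot_station(station, n_solutions):
    """Return a status line instead of a traceback when no ETM can be fit."""
    try:
        check_solution_count(station, n_solutions)
    except TooFewSolutions as error:
        return "Skipped " + str(error)
    return "Successfully plotted " + station
```

The same pattern could wrap the matrix inversion, catching `numpy.linalg.LinAlgError` and reporting that the design matrix was singular for that station.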

Start integrating some of the simpler dependencies

Add message when pyETM does not generate a model

When pyETM does not generate a model, print out a message (in the console or plot) saying why a model could not be computed.

rinex sample time problem

ppp is having trouble with rinex files in which the sample time is not lined up on whole seconds (for 1 sec and slower sampling)
ex. rinex file

 2.11           OBSERVATION DATA    G (GPS)             RINEX VERSION / TYPE

....
UBAT - RBMC Ubatuba MARKER NAME
...
15.0000 INTERVAL
SNR is mapped to RINEX snr flag value [1-9] COMMENT
L1: 3 -> 1; 8 -> 5; 40 -> 9 COMMENT
L2: 1 -> 1; 5 -> 5; 60 -> 9 COMMENT
2006 1 2 0 0 15.1880000 GPS TIME OF FIRST OBS
END OF HEADER
06 1 2 0 0 15.1880000 0 9G02G04G26G08G29G24G09G17G07
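
To flag files like the one above before they reach ppp, one could measure how far an epoch sits from the sampling grid. This is only a sketch: the function names and the 1 ms tolerance are assumptions, and it presumes the grid is anchored at the start of the day.

```python
def seconds_off_grid(hour, minute, second, interval):
    """Offset in seconds of an epoch from the previous sampling tick."""
    seconds_of_day = hour * 3600 + minute * 60 + second
    return seconds_of_day % interval


def is_on_grid(hour, minute, second, interval, tolerance=1e-3):
    """True when the epoch lies within `tolerance` seconds of a tick."""
    offset = seconds_off_grid(hour, minute, second, interval)
    # An epoch just before a tick has an offset close to `interval`
    return min(offset, interval - offset) <= tolerance
```

For the first observation in the example (00:00:15.188 with a 15 s interval) the offset is about 0.188 s, so `is_on_grid` returns `False`.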

pyArchiveService does not use cpus configuration item

When pyArchiveService starts it doesn't pull the max cpus info from gnss_data.cfg.

Create an ETMs table auto-purge

Whenever there's a version change, the ETMs table should be auto-purged before running the etms generation script.

Check disk space in nodes during JobServer start

Found an error due to low disk space. Check disk space before starting PG.
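
A sketch of such a check using `shutil.disk_usage` (Python 3.3+). The function name and the idea of a single byte threshold are assumptions; on a cluster the check would need to run on each node rather than only on the machine that starts the job server.

```python
import shutil


def free_space_ok(path, min_free_bytes):
    """Return True when the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

For example, `free_space_ok("/tmp", 5 * 1024 ** 3)` would require at least 5 GiB free before any job is sent.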

Report "No data for site" after finishing GAMIT process

Report to screen (and monitor.log) if a site ended up not having data in the H file. Read log from monitor.log:

----------------------------------------------------------------
 Processing file   1 h-file ../133/hsirga.18133
 Atm models:  DryZen UFL   WetZen GP25  DryMap VMF1  WetMap VMF1  IonSrc NONE  MagFld
 No data for site MGV1    
 No data for site UYRO    
 No data for site UYSO    
 There are  41 sites in ../133/hsirga.18133
     Name      Full name
   1 BATF      BATF_GPS          60664  TRM  0.0         0.0100
   2 BAVC      BAVC_GPS          29198  TRM  0.0         0.0080
   3 BOGT      BOGT_GPS          21936  JAV  0.0         0.0610
   4 CEFE      CEFE_GPS          20514  TRM  0.0         0.0000
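
Pulling those lines out of the log can be sketched with a regular expression. The pattern relies only on the `No data for site` wording shown in the excerpt above; the function name is hypothetical.

```python
import re

NO_DATA = re.compile(r"^\s*No data for site\s+(\S+)", re.MULTILINE)


def sites_without_data(log_text):
    """Return the site names reported as having no data, in log order."""
    return NO_DATA.findall(log_text)
```

Applied to the excerpt above, this returns `MGV1`, `UYRO` and `UYSO`, which could then be echoed to the screen and to monitor.log.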

Get it to run on the OSC HPC

Add a bash command line to rename a RINEX file sent to the retry folder

Whenever a station match is found during pyArchiveService but the name of the RINEX files does not agree with the StationCode, add an optional command line to rename the RINEX in the log file created in the retry folder.

Sphinx documentation

Format docstrings for sphinx

Add option to plot time window of ETMs

Add a switch to plot a portion of the time series rather than the whole thing. This helps to view the last part of the TS to identify missing jumps, metadata problems, etc.

station info updating

Had ~80 newly added stations that needed station info added. Did about 30, but it complains about the other 50 and does not load their information. Similar to pyArchiveService.py, it is stuck: the number of files does not change when it is run again.
See screen and files in run_dir.

pyArchiveStruct.scan_archive_struct needs a better filename check

The check
if file.endswith("d.Z"):
needs to be improved to avoid problems with files named old.Z, since condition now lets this type of file pass. A regular expression should be used to guarantee that the filename has the form stnmddd.yyd.Z
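
A possible regular expression for this check is sketched below. It assumes the usual RINEX 2 short name with a one-character session field after the day of year (as in `lo101310.00d.Z`), and it does not validate that the day of year lies between 001 and 366; the function name is hypothetical.

```python
import re

# ssssdddf.yyd.Z: 4-char station, 3-digit day of year, 1-char session,
# 2-digit year, "d" for Hatanaka-compressed observations, then ".Z"
RINEX_NAME = re.compile(r"[a-z0-9_]{4}\d{3}[a-z0-9]\.\d{2}d\.Z", re.IGNORECASE)


def is_compressed_rinex(filename):
    """True when `filename` has the form stnmdddf.yyd.Z."""
    return RINEX_NAME.fullmatch(filename) is not None
```

Unlike `file.endswith("d.Z")`, this rejects names such as `old.Z` while still accepting `morr0800.94d.Z`.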

Write unit tests

etm and ts/etm plotting

Peter and I were looking at the ETM and ts/etm plotting today.

We found out why most of the ones not plotting were "failing" - they had less than 4 observations (some had between 3 and 1, and at least one had 0 observations, no rinex file in the archive, and no return from postgresql in the station list).

Here is our request. Separate the ETM calculations and plotting. The ETM class should calculate the ETM and provide an object to pass around in the program or write it to disk (with time series optional as it is now). Put all zeros in the etm parameters result to signify there was no fit. This will allow the object/file to be used to pass just the time series.

In the PlotETM class, plot the time series (from an object within the program or from a file) as the basic result, and plot the etm if the parameters indicate an ETM was found (at least one amplitude not zero), or no ETM if it was not found. Should be able to plot just the time series if ETM exists. Should also be able to plot the residuals after removing the ETM (to see if there is any structure in the residuals).

There may be some complications if one wants to do the jumps (know where they are), but probably best for just time series to plot raw time series.

(some/most?) log files need new line at end

It is very hard to read the log files when printing out a series of them, as the first line of the (N+1)th log file continues without a newline as a continuation of the last line of the Nth log file.

capybara:bad_rinex smalley$ cat ????/???/*log
During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPE

.......
SAN JOSE DE MORRO MARKER NAME
MORR MARKER NUMBER
..........
Forced Modulo Decimation to 30 seconds COMMENT

/morr0800.94d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/1994/080)During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPE
.........
PALO MARKER NAME
0200 MARKER NUMBER
..........
Forced Modulo Decimation to 30 seconds COMMENT

/palo0800.94d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/1994/080)During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPE
teqc 2017Jul3 20171230 06:45:19UTCPGM / RUN BY / DATE