
parallel.gamit's People

Contributors

daf111, demiangomez, nahuel

parallel.gamit's Issues

Add a "Check node" object to verify a node before sending jobs

When invoking a job server for parallel python, a "check node" object should handle the job creation and verification that each node in the cluster has all the necessary dependencies to run. If the node doesn't have all the necessary programs/dependencies, remove the node from the cluster and continue execution without it.
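A minimal sketch of such a check, assuming each node can report which required programs are missing from its PATH. The names `missing_programs` and `usable_nodes` and the example program list are hypothetical and not part of Parallel.GAMIT; how the check is dispatched to each node is left out.

```python
import shutil


def missing_programs(required, which=shutil.which):
    """Return the programs from `required` that are not found on this node's PATH.

    `which` is injectable so the check can be exercised without the real tools.
    """
    return [prog for prog in required if which(prog) is None]


def usable_nodes(reports):
    """Split {node: [missing programs]} into usable nodes and rejected nodes.

    A node is usable only if its list of missing programs is empty.
    """
    usable = [node for node, missing in reports.items() if not missing]
    rejected = {node: missing for node, missing in reports.items() if missing}
    return usable, rejected


if __name__ == "__main__":
    # Hypothetical report gathered from two nodes.
    reports = {"node1": [], "node2": ["teqc"]}
    usable, rejected = usable_nodes(reports)
    print("running on:", usable)
    for node, missing in rejected.items():
        print("dropping %s, missing: %s" % (node, ", ".join(missing)))
```

Returning the list of missing programs, rather than a plain pass/fail flag, lets the job server say why a node was removed before continuing without it.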

error handling

When a new station has a "problem" (too close to more than one station, etc.) and ends up in the data_rejected directory, it would be nice if the error message included the psql command to add the station to the database, or at least printed the information needed (XYZ, lat/lon/height) to add it. (The problem may also call for a rename, so that command might also be useful in the error message.)

It would be nice if the rejected folder had subfolders based on why a file was rejected: at least one subfolder for "failed to find location after 6 tries", one for "confused with another station(s)", and possibly one for "other" (for now; this could be broken down further as other specific problems arise).

Put a newline (line feed or carriage return) at the end of the log files so the prompt shows up on a new line.

ppp reporting no station info

PPP reports that no station info was found when the database says there is station info. The problem arises when the station info start/end is at the same second as, or within a second of, the RINEX start/end.
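One way to make the comparison robust is to allow a small tolerance at both ends. This is only a sketch: the function name `covers`, its arguments, and the one-second default are assumptions, not the code PPP currently uses.

```python
from datetime import datetime, timedelta


def covers(info_start, info_end, obs_start, obs_end,
           tolerance=timedelta(seconds=1)):
    """True if a station info record covers the RINEX span within `tolerance`.

    `info_end` may be None for an open-ended station info record.
    """
    starts_ok = info_start <= obs_start + tolerance
    ends_ok = info_end is None or info_end >= obs_end - tolerance
    return starts_ok and ends_ok


if __name__ == "__main__":
    # Hypothetical case: station info starts half a second after the RINEX file.
    info_start = datetime(2010, 1, 1, 0, 0, 0, 500000)
    obs_start = datetime(2010, 1, 1, 0, 0, 0)
    obs_end = datetime(2010, 1, 1, 23, 59, 30)
    print(covers(info_start, None, obs_start, obs_end))
```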

database organizational change

Change the "NetworkCode" to make it more useful for multiple applications of the database.

Make "NetworkCode" meaningless to the end user. There could be a base network code, say n00, where all new sites with unique names go. Duplicate names go into networks n01, n02, etc. as such sites come into the database.

Add a second network code that the user can use to organize groups of sites based on need or preference.

Example: IGN and CAP use the same underlying database but can organize the interface how they want/need (IGN can have IGS, SIR, RMS, etc. networks, while CAP can have ARG, ARS, CHI, CHS, PIF, MAU, PIS, ...).

So any given site could be in multiple "networks". We need to be careful when deleting sites: if a site is in multiple application networks, it does not get touched at the nXX level. If it is a unique site, send a message; or always send a message about the site's affinities.

new lines at end of errors

This is an expansion of the missing new line at the end of log files issue (labeled as bug) from Dec.

Newlines are missing at the end of many of the error messages from IntegrityCheck (and probably ScanArchive and ArchiveService). It makes it hard to read and process automatically.
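A small helper along these lines could be applied wherever a message is written. The name `with_newline` is hypothetical, and where it would be called inside IntegrityCheck, ScanArchive or ArchiveService is not specified here.

```python
def with_newline(message):
    """Return `message` unchanged if it already ends with a newline, else append one."""
    return message if message.endswith("\n") else message + "\n"


if __name__ == "__main__":
    import sys
    sys.stdout.write(with_newline("file moved to data_rejected"))
    sys.stdout.write(with_newline("already terminated\n"))
```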

Add RINEX count to pyIntegrityCheck

When checking for gaps, report the total number of RINEX files and the total number of missing files (between the reported start and stop dates).
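The counts could be derived as in the sketch below, which assumes one RINEX file per day and that the days with data are already available as dates. The function name and the returned keys are illustrative only.

```python
from datetime import date, timedelta


def rinex_gap_summary(days_present, start, stop):
    """Summarize daily RINEX coverage between `start` and `stop` (inclusive)."""
    expected = (stop - start).days + 1
    present = {d for d in days_present if start <= d <= stop}
    all_days = [start + timedelta(days=i) for i in range(expected)]
    missing_days = [d for d in all_days if d not in present]
    return {
        "expected": expected,
        "found": len(present),
        "missing": len(missing_days),
        "missing_days": missing_days,
    }


if __name__ == "__main__":
    # Hypothetical station with files on 3 of 5 days.
    have = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 5)]
    summary = rinex_gap_summary(have, date(2018, 1, 1), date(2018, 1, 5))
    print("%(found)d RINEX files, %(missing)d missing" % summary)
```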

pyArchiveService.py behavior

Started with ~1100 files missing from an IGS continuous station. (It got our attention because there was an entry in one of ? directories, but no RINEX in the repository tree and no locks. We got the 1100 files by looking at the data in the OSU archive and moving the ones not in the PG archive into data_in.)

Ran pyArchiveService.py on these 1100 files. About half were moved to the archive, or at least disappeared from the repository directory tree (the number of files in the archive grew by 500). A handful ended up in data_retry_in and data_rejected, and the remaining ones are still in data_in.

There were no locked files and probably a single error in the error message file (errors_pyArchiveService.log in /Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/run_dir on capybara). I think each run of pyArchiveService.py generates one error.

Re-running pyArchiveService.sh started by reporting 120 files (ls | wc shows ~575, and the number of files in the archive is constant), and on each iteration the number of files reported drops by 5-7 (a few times it dropped by more). I think it added one line to the error file on each run (see screen output).

So: files in data_in are not going anywhere (according to ls), but are "disappearing" from processing by pyArchiveService.py, with no locks and no errors.

Add fields to event table

Instead of writing the relevant information about network-station, year, doy, etc in the EventDescription field, add independent fields (that can be set to NULL) to facilitate searching events. Also, add another event description (besides type = warn, info and error).

Create an install script

- Fetch and install dependencies.
- Add a dev flag that also installs a DB and sets the program up for testing.

Stop PG when no nodes are found

When starting a program, stop execution if no nodes are found in dispy.
Also, check what happens with multihomed computers when creating a cluster. It seems that PG is not finding nodes connected to a secondary Ethernet port.

error message reporting in database

When looking at error statistics it would be handy if the errors had a number. Each error is "unique" in the sense that the file name, and other details are unique, so it is hard to find out the kinds of errors and how many of each kind there are. If the errors had numbers and there was an error number table one could quickly find specific types of errors.

Add a date to the PPP solution

Either in the events table or maybe add a new field to the ppp_soln table. We should save when the PPP coordinate was generated to make sure that the coordinate is updated after a metadata change.

local switch

Add a switch to run only on the local machine. It would still be parallel, but would not try to go over the network to other machines.

dup name handling

There are 3 sites named corr in Argentina (1 CAP and 1 SAGA from 1993, and 1 from a LISN project [it's possible it's closed]).
I put RINEX files that I thought were all from the CAP corr site (they were in the cap folder of the OSU database) into data_in and ran ArchiveService.

Two of the files were actually from the saga site. They stayed in data_in and got associated with the ??? network, but there were no messages anywhere saying there was a duplicate name for a new station. They show up with a very different lat/lon in the database, but otherwise no information.

example - bad rinex our processing, nrcan gets good soln

error message from ArchiveService run, there should be a pdf attached with NRCAN solution.

says problem is with sampling interval.

RINEX sampling interval could not be determined. The output from RinSum was:

RinSum, part of the GPS Toolkit, Ver 2.2 10/31/13, Run 2017/12/30 00:39:50

+++++++++++++ RinSum summary of Rinex obs file production/rinex/6e271835-0489-4a0f-9e9a-dde6b5be02d3/lo101310.00o +++++++++++++
Warning : Failed to read header: text 0:Unidentified label: >ANTENNA: DEL<
text 1:In record 0
text 2:In file production/rinex/6e271835-0489-4a0f-9e9a-dde6b5be02d3/lo101310.00o
text 3:Near file line 12
location 0:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/RINEX3/Rinex3ObsHeader.cpp:1425
location 1:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/RINEX3/Rinex3ObsHeader.cpp:1471
location 2:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.cpp:150
location 3:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.hpp:184
location 4:/Volumes/UsersDrive/moved_user/smalley/Downloads/gpstk-2.5.src/dev/ext/lib/FileHandling/FFStream.hpp:184

Header dump follows.
---------------------------------- REQUIRED ----------------------------------
Rinex Version 2.00, File type O BSERVATION DATA, System G (GPS).
Prgm: ASHTORIN, Run: 31 - MAY - 00 16:20, By:
Marker type: .
Observer : AO_, Agency:
Rec#: GN-1331, Type: TOPCON GP-R1DY, Vers:
Antenna # : BX-3191, Type :
Position (XYZ,m) : (1911998.2600, -4237323.8800, -4352388.4900).
Antenna Delta (HEN,m) : (0.0000, 0.0000, 0.0000).
Time of first obs -002/12/01 00:00:00.000 UNK
(This header is VALID)
---------------------------------- OPTIONAL ----------------------------------
Marker number :
Comments (3) :

Concatenated from 2 rinex files by RNXCAT on 20 Jul 00

-------------------------------- END OF HEADER --------------------------------
RinSum timing: processing 0.007 sec, wallclock: 0 sec.

/lo101310.00d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/2000/131)
lo101310.pdf

Multiday RINEX file handling during PPP

When a multiday RINEX file was already in the database (entered through an old version of pyScanArchive) PPP fails to process it because pyRinex by default bins it into the multiple days. Therefore, an IOError: [Errno 2] No such file or directory exception is thrown when normalizing the header.

See example:
IOError: [Errno 2] No such file or directory: 'production/rinex/8f77b22c-f0ee-4bcc-ab6d-3a4e8593c7d5/tuc12242.10o' processing: rms tuc1 2010 224 using node elvira END OF ERROR ===================

Should deal with these files correctly

divide by zero in etm

This error does not seem to prevent getting an ETM solution or a plot. It occurs once (there are 506 stations in the database, and 500 stations [not counting duplicate names] in the ppp_soln, I've not yet figured out the postgresql command to do the "distinct" using both StationCode and NetworkCode)

/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py:1487: RuntimeWarning: divide by zero encountered in true_divide
return s[..., 0]/s[..., -1]
Successfully plotted bra.bomj

bra bomj

There are a number of sites similar to bomj (a sirgas campaign measurement followed years later by continuous operations) but the other ones do not report any errors.
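For the "distinct" count on both StationCode and NetworkCode mentioned above, a subquery with SELECT DISTINCT on the two columns is one option. The sketch below runs the query against an in-memory SQLite table as a stand-in for the real PostgreSQL database; the two-column layout and the sample rows are made up for illustration, and only the table and column names come from the issue.

```python
import sqlite3

# Stand-in for the real database: an in-memory table with only the two
# columns named in the issue (assumed layout, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE ppp_soln ("NetworkCode" TEXT, "StationCode" TEXT)')
conn.executemany(
    "INSERT INTO ppp_soln VALUES (?, ?)",
    [("bra", "bomj"), ("bra", "bomj"), ("cer", "bomj"), ("cer", "ccrn")],
)

# Count distinct (NetworkCode, StationCode) pairs. The subquery alias is
# required by PostgreSQL and accepted by SQLite.
QUERY = (
    "SELECT COUNT(*) FROM "
    '(SELECT DISTINCT "NetworkCode", "StationCode" FROM ppp_soln) AS pairs'
)
n_stations = conn.execute(QUERY).fetchone()[0]
print(n_stations)
```

With the four sample rows, two of which are duplicates, the count is 3: stations sharing a name but belonging to different networks are counted separately.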

There are a number of stations with fewer than 4 occupations; they are caught by an if statement and not processed. It would be more informative if they produced a message saying so and received no further processing. Here is the message all but one of them produce.

Traceback (most recent call last):
  File "../classes/pyPlotETM.py", line 58, in main
    json.dump(etm.todictionary(True), f, indent=4, sort_keys=False)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 752, in todictionary
    etm['Linear'] = {'tref': self.Linear.tref, 'params': self.Linear.values.tolist()}
AttributeError: ETM instance has no attribute 'Linear'

One station with 3 "occupations"/5 days of data: 98[2],03[2],10[1], produces this error message

Error during processing of cer.ccrn
Traceback (most recent call last):
  File "../classes/pyPlotETM.py", line 52, in main
    etm = pyPPPETM.ETM(cnn, stn['NetworkCode'], stn['StationCode'], False)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 624, in __init__
    self.Jumps = JumpsTable(cnn, NetworkCode, StationCode, ppp.t, add_antenna_jumps=self.Periodic.params)
  File "/Volumes/UsersDrive/Users/smalley/Working.Parallel.GAMIT/classes/pyPPPETM.py", line 294, in __init__
    DOP = np.diag(np.linalg.inv(np.dot(self.A.transpose(), self.A)))
  File "/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 513, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/Volumes/Sierra750GB/usr/local/pyconda/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix

Campaign data does not have equipment jumps, but may have earthquake jumps with only one occupation/file after the earthquake, and that could make it fail.

But it seems that once the test for the number occupations fails, it should exit cleanly.
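A clean exit could look like the sketch below: the check raises a dedicated exception and the caller turns it into a one-line message. All names (`TooFewObservations`, `require_observations`, `process`) and the station names are hypothetical; this is not how pyPPPETM is currently structured.

```python
MIN_OBSERVATIONS = 4  # threshold quoted in the issue


class TooFewObservations(Exception):
    """Raised when a station does not have enough data for an ETM fit."""


def require_observations(station, epochs, minimum=MIN_OBSERVATIONS):
    """Raise TooFewObservations if `epochs` is shorter than `minimum`."""
    if len(epochs) < minimum:
        raise TooFewObservations(
            "%s: %d observations, need at least %d; skipping ETM"
            % (station, len(epochs), minimum)
        )


def process(stations):
    """`stations` maps a station name to its list of epochs; returns log messages."""
    messages = []
    for name, epochs in stations.items():
        try:
            require_observations(name, epochs)
        except TooFewObservations as exc:
            messages.append(str(exc))  # informative message, no traceback
            continue
        messages.append("%s: ok" % name)
    return messages


if __name__ == "__main__":
    for line in process({"net.aaaa": [1, 2, 3], "net.bbbb": [1, 2, 3, 4]}):
        print(line)
```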

rinex sample time problem

PPP is having trouble with RINEX files in which the sample times are not aligned on whole seconds (for 1 s and slower sampling).
Example RINEX file:

 2.11           OBSERVATION DATA    G (GPS)             RINEX VERSION / TYPE

....
UBAT - RBMC Ubatuba MARKER NAME
...
15.0000 INTERVAL
SNR is mapped to RINEX snr flag value [1-9] COMMENT
L1: 3 -> 1; 8 -> 5; 40 -> 9 COMMENT
L2: 1 -> 1; 5 -> 5; 60 -> 9 COMMENT
2006 1 2 0 0 15.1880000 GPS TIME OF FIRST OBS
END OF HEADER
06 1 2 0 0 15.1880000 0 9G02G04G26G08G29G24G09G17G07
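Such files can be flagged from the header alone. The sketch below extracts the fractional part of the seconds field from a TIME OF FIRST OBS line like the one above; it assumes whitespace-separated fields and is not a full RINEX parser. The function name is hypothetical.

```python
def fractional_second(first_obs_line):
    """Return the fractional part of the seconds field of a TIME OF FIRST OBS line.

    Expects whitespace-separated fields: year month day hour minute seconds ...
    """
    fields = first_obs_line.split()
    seconds = float(fields[5])
    # Round to the 7 decimals RINEX carries to hide floating-point noise.
    return round(seconds - int(seconds), 7)


if __name__ == "__main__":
    line = "2006 1 2 0 0 15.1880000 GPS TIME OF FIRST OBS"
    offset = fractional_second(line)
    if offset != 0.0:
        print("epochs are offset from whole seconds by %.7f s" % offset)
```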

Report "No data for site" after finishing GAMIT process

Report to screen (and monitor.log) if a site ended up not having data in the H file. Read log from monitor.log:

----------------------------------------------------------------
 Processing file   1 h-file ../133/hsirga.18133
 Atm models:  DryZen UFL   WetZen GP25  DryMap VMF1  WetMap VMF1  IonSrc NONE  MagFld
 No data for site MGV1    
 No data for site UYRO    
 No data for site UYSO    
 There are  41 sites in ../133/hsirga.18133
     Name      Full name
   1 BATF      BATF_GPS          60664  TRM  0.0         0.0100
   2 BAVC      BAVC_GPS          29198  TRM  0.0         0.0080
   3 BOGT      BOGT_GPS          21936  JAV  0.0         0.0610
   4 CEFE      CEFE_GPS          20514  TRM  0.0         0.0000

Add option to plot time window of ETMs

Add a switch to plot a portion of the time series rather than the whole thing. This helps to view the last part of the TS to identify missing jumps, metadata problems, etc.
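The selection itself is simple once the switch supplies the window limits. In this sketch the function name, the argument names, and the use of decimal years for epochs are all assumptions.

```python
def select_window(epochs, values, start=None, stop=None):
    """Keep only the samples whose epoch lies in [start, stop].

    Either bound may be None, meaning the window is open on that side.
    """
    kept = [
        (t, v)
        for t, v in zip(epochs, values)
        if (start is None or t >= start) and (stop is None or t <= stop)
    ]
    return [t for t, _ in kept], [v for _, v in kept]


if __name__ == "__main__":
    # Hypothetical series; keep only the most recent part.
    t, y = select_window([2010.0, 2015.0, 2018.0], [1.0, 2.0, 3.0], start=2014.0)
    print(t, y)
```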

station info updating

Had ~80 newly added stations that needed station info added. It did about 30, then complained about 50 and did not load their information. Similar to pyArchiveService.py, it is stuck: the number of files does not change when it is run again.
See screen and files in run_dir.

etm and ts/etm plotting

Peter and I were looking at the ETM and ts/etm plotting today.

We found out why most of the ones not plotting were "failing": they had fewer than 4 observations. Some had between 1 and 3, and at least one had 0 observations (no RINEX file in the archive, and no return from PostgreSQL in the station list).

Here is our request. Separate the ETM calculations and plotting. The ETM class should calculate the ETM and provide an object to pass around in the program or write to disk (with the time series optional, as it is now). Put all zeros in the ETM parameters result to signify there was no fit. This will allow the object/file to be used to pass just the time series.

In the PlotETM class, plot the time series (from an object within the program or from a file) as the basic result, and plot the ETM if the parameters indicate an ETM was found (at least one amplitude not zero), or no ETM if it was not found. It should be possible to plot just the time series even if an ETM exists, and also to plot the residuals after removing the ETM (to see if there is any structure in the residuals).

There may be some complications if one wants to show the jumps (one has to know where they are), but for the time-series-only case it is probably best to plot the raw time series.

(some/most?) log files need new line at end

It is very hard to read the log files when printing out a series of them, because the first line of the (N+1)th log file continues, without a newline, from the last line of the Nth log file.

capybara:bad_rinex smalley$ cat ????/???/*log
During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPE

.......
SAN JOSE DE MORRO MARKER NAME
MORR MARKER NUMBER
..........
Forced Modulo Decimation to 30 seconds COMMENT

/morr0800.94d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/1994/080)During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPE
.........
PALO MARKER NAME
0200 MARKER NUMBER
..........
Forced Modulo Decimation to 30 seconds COMMENT

/palo0800.94d.Z: (file moved to /Volumes/UsersDrive/repository/data_rejected/bad_rinex/1994/080)During decimation or remove_systems (to run auto_coord), teqc returned: %sCould not find a first observation in RINEX file. Truncated file? Header follows:
2.11 OBSERVATION DATA G (GPS) RINEX VERSION / TYPEteqc 2017Jul3 20171230 06:45:19UTCPGM / RUN BY / DATE

read o.Z files natively

I'm bringing in the CAP campaign data. The 1993 data is in RINEX v1, and rnx2crx only works with v2 and above.
The current solution is to use teqc to convert it all to RINEX v2 (I'm testing the results of both now).

net.allx problem

If there are stations with 4-letter codes of the form "allx", where x is the 4th letter of the code, these are interpreted as "all" by ScanArchive, so cer.allo behaves as cer.all.
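Comparing the whole station token against "all", rather than testing a prefix, avoids the problem. The parser below is a hypothetical sketch, not ScanArchive's actual argument handling.

```python
def parse_station_spec(spec):
    """Parse 'all', 'net.all' or 'net.stnm' into (network, station).

    None in either position means "all". Only the exact token 'all' is
    treated as the wildcard, so 'cer.allo' names the station 'allo'.
    """
    if spec == "all":
        return None, None
    network, sep, station = spec.partition(".")
    if not sep:
        raise ValueError("expected 'all' or 'net.stnm', got %r" % spec)
    return network, (None if station == "all" else station)


if __name__ == "__main__":
    for spec in ("all", "cer.all", "cer.allo"):
        print(spec, "->", parse_station_spec(spec))
```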

wildcards in station and network codes

It could be helpful to allow regular-expression wildcards in the Python commands for station and network codes, e.g. cer.at0[1-9] instead of having to write cer.at01 cer.at02 ... cer.at09.
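One possible shape for this, assuming the list of known (network, station) pairs is already available from the database: split the argument on the first dot and match each half as a full regular expression. The function name and the sample station list are hypothetical.

```python
import re


def match_stations(pattern, known):
    """Return the (network, station) pairs in `known` that match `pattern`.

    `pattern` has the form 'net.stnm'; each half is treated as a regular
    expression that must match the whole code. The dot is a separator here,
    not a regex wildcard.
    """
    net_pat, _, stn_pat = pattern.partition(".")
    return [
        (net, stn)
        for net, stn in known
        if re.fullmatch(net_pat, net) and re.fullmatch(stn_pat, stn)
    ]


if __name__ == "__main__":
    # Hypothetical station list.
    known = [("cer", "at01"), ("cer", "at09"), ("cer", "at10"), ("bra", "at01")]
    print(match_stations("cer.at0[1-9]", known))
```

Using `re.fullmatch` rather than `re.match` keeps a pattern such as `at0` from silently matching every station whose code merely starts with those characters.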
