arpa-simc / dballe
Fast on-disk database for meteorological observed and forecast data.
License: Other
At the moment, for a quantesono/elencamele the number of resulting stations varies depending on the database:
mem: there is one station for each combination of (report, lat, lon, ident).
The ordering of the stations in the result is currently undefined: the SQL-based databases order by ana_id, while for the mem: database it is not clear to me what it currently orders by; it seems to depend on the query being made.
In future SQL-based database implementations I would like to move towards one station for each combination of (report, lat, lon, ident), as in the mem: database.
For ordering, I would like to document that the results are not ordered.
Can you confirm that this matches how DB-All.e is actually used at the moment?
(rpmbuild defaults include -Werror)
/bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -DTABLE_DIR=\"/usr/share/wreport\" -I.. -I.. -I/usr/include/mysql -Werror -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -std=gnu++11 -c -o libdballe_la-types.lo `test -f 'types.cc' || echo './'`types.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -DTABLE_DIR=\"/usr/share/wreport\" -I.. -I.. -I/usr/include/mysql -Werror -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -std=gnu++11 -c types.cc -fPIC -DPIC -o .libs/libdballe_la-types.o
types.cc: In function 'std::string dballe::{anonymous}::fmtf(const char*, ...)':
types.cc:812:27: error: ignoring return value of 'int vasprintf(char**, const char*, __va_list_tag*)', declared with attribute warn_unused_result [-Werror=unused-result]
vasprintf( &c, f, ap );
^
cc1plus: all warnings being treated as errors
make[3]: *** [libdballe_la-types.lo] Error 1
make[3]: Leaving directory `/home_local/makerpm/rpmbuild/BUILD/dballe-7.9-1/dballe'
make[2]: *** [all] Error 2
With bufr2json --format=dballe, the "[" at the beginning, the "]" at the end and the "," separators in the middle are missing.
dbamsg dump --json synop.bufr
{"version":"0.1","network":"synop","ident":null,"lon":1070000,"lat":4420000,"date":"2013-06-01T00:00:00Z","data":[{"timerange":[1,0,21600],"level":[1,null,null,null],"vars":{"B13011":{"v":0.0,"a":{"B07032":2.00}}}},{"timerange":[1,0,86400],"level":[1,null,null,null],"vars":{"B13011":{"v":0.0,"a":{"B07032":2.00}}}},{"timerange":[4,0,10800],"level":[1,null,null,null],"vars":{"B10060":{"v":-20,"a":{"B07031":2173.0}}}},{"timerange":[4,0,86400],"level":[1,null,null,null],"vars":{"B10060":{"v":100,"a":{"B07031":2173.0}}}},{"timerange":[205,0,10800],"level":[1,null,null,null],"vars":{"B10063":{"v":8,"a":{"B07031":2173.0}}}},{"timerange":[254,0,0],"level":[1,null,null,null],"vars":{"B10004":{"v":77230,"a":{"B07031":2173.0}},"B20001":{"v":30000,"a":{"B07032":2.00}}}},{"timerange":[254,0,0],"level":[100,85000,null,null],"vars":{"B10008":{"v":13651,"a":{}}}},{"timerange":[254,0,0],"level":[103,2000,null,null],"vars":{"B12101":{"v":276.15,"a":{}},"B12103":{"v":271.02,"a":{}},"B13003":{"v":69,"a":{}}}},{"timerange":[254,0,0],"level":[103,10000,null,null],"vars":{"B11001":{"v":350,"a":{"B07032":2.00}},"B11002":{"v":9.3,"a":{"B07032":2.00}}}},{"timerange":[254,0,0],"level":[256,null,258,0],"vars":{"B08002":{"v":8,"a":{}},"B20011":{"v":6,"a":{}},"B20013":{"v":1000,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,1],"vars":{"B20012":{"v":30,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,2],"vars":{"B20012":{"v":21,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,3],"vars":{"B20012":{"v":10,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,259,1],"vars":{"B08002":{"v":1,"a":{}},"B20011":{"v":6,"a":{}},"B20012":{"v":4,"a":{}},"B20013":{"v":1000,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,null,null],"vars":{"B20010":{"v":75,"a":{}}}},{"vars":{"B01001":{"v":16,"a":{}},"B01002":{"v":134,"a":{}},"B01019":{"v":"MONTE_CIMONE","a":{}},"B02001":{"v":2,"a":{}},"B02002":{"v":12,"a":{}},"B05001":{"v":44.20000,"a":{}},"B06001":{"v":10.70000,"a":{}},"B07030":{"v":2165.0,"a":{}},"B07031":{"v":2173.0,"a":{}}}}]}
{"version":"0.1","network":"synop","ident":null,"lon":1185000,"lat":4346667,"date":"2013-06-01T00:00:00Z","data":[{"timerange":[1,0,21600],"level":[1,null,null,null],"vars":{"B13011":{"v":0.0,"a":{"B07032":2.00}}}},{"timerange":[1,0,86400],"level":[1,null,null,null],"vars":{"B13011":{"v":9.0,"a":{"B07032":2.00}}}},{"timerange":[4,0,10800],"level":[1,null,null,null],"vars":{"B10060":{"v":0,"a":{"B07031":249.0}}}},{"timerange":[4,0,86400],"level":[1,null,null,null],"vars":{"B10060":{"v":-10,"a":{"B07031":249.0}}}},{"timerange":[205,0,10800],"level":[1,null,null,null],"vars":{"B10063":{"v":4,"a":{"B07031":249.0}}}},{"timerange":[254,0,0],"level":[1,null,null,null],"vars":{"B10004":{"v":97600,"a":{"B07031":249.0}},"B20001":{"v":14000,"a":{"B07032":2.00}}}},{"timerange":[254,0,0],"level":[101,null,null,null],"vars":{"B10051":{"v":100540,"a":{"B07031":249.0}}}},{"timerange":[254,0,0],"level":[103,2000,null,null],"vars":{"B12101":{"v":285.35,"a":{}},"B12103":{"v":282.74,"a":{}},"B13003":{"v":84,"a":{}}}},{"timerange":[254,0,0],"level":[103,10000,null,null],"vars":{"B11001":{"v":0,"a":{"B07032":2.00}},"B11002":{"v":0.0,"a":{"B07032":2.00}}}},{"timerange":[254,0,0],"level":[256,null,258,0],"vars":{"B08002":{"v":7,"a":{}},"B20011":{"v":7,"a":{}},"B20013":{"v":840,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,1],"vars":{"B20012":{"v":35,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,2],"vars":{"B20012":{"v":20,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,258,3],"vars":{"B20012":{"v":10,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,259,1],"vars":{"B08002":{"v":1,"a":{}},"B20011":{"v":7,"a":{}},"B20012":{"v":6,"a":{}},"B20013":{"v":840,"a":{}}}},{"timerange":[254,0,0],"level":[256,null,null,null],"vars":{"B20010":{"v":88,"a":{}}}},{"vars":{"B01001":{"v":16,"a":{}},"B01002":{"v":172,"a":{}},"B01019":{"v":"AREZZO","a":{}},"B02001":{"v":2,"a":{}},"B02002":{"v":12,"a":{}},"B05001":{"v":43.46667,"a":{}},"B06001":{"v":11.85000,"a":{}},"B07030":{"v":248.0,"a":{}},"B07031":{"v":249.0,"a":{}}}}]}
One of the initial requirements was that the DB-All.e documentation be in LaTeX. However, this means that building DB-All.e requires having all of LaTeX installed (and the LaTeX distribution needed to build what doxygen produces is rather large), and at the moment we are not using formulas or anything else LaTeX excels at. On top of that, for HTML generation we are currently using latex2html, which is not free software.
Is it possible to relax this requirement and move the documentation to a lighter format such as Markdown or reStructuredText?
With commit 9fd88f8 I finished implementing logging to stderr of the analysis of the queries made to the database.
If you run a program that uses DB-All.e with the environment variable DBA_EXPLAIN=1 set, stderr will show all the queries made to the database, the corresponding DB-All.e query parameters, and the database's own analysis of each query.
Would it be possible to run a few significant procedures actually used in production with DBA_EXPLAIN=1, and send me their standard error?
I would like to use this information to check whether the database (and its indexes) have been structured in a way that actually matches real usage needs.
dbamsg dump --interpreted
#0[0] generic message with 1 contexts:
Level -,-,-,- tr -,-,- 5 vars:
001011 SHIP OR MOBILE LAND STATION IDENTIFIER(CCITTIA5): dancast78
001194 [SIM] Report mnemonic(CCITTIA5): rmap
001213 AIRBASE AIR QUALITY OBSERVING STATION CODE(CCITTIA5): conn
005001 LATITUDE (HIGH ACCURACY)(DEGREE): 44.69950
006001 LONGITUDE (HIGH ACCURACY)(DEGREE): 10.64555
mqtt2bufr -t rmap/dancast78/#|dbamsg dump
#0 BUFR message: 112 bytes, origin 200:0, category 0 255:255:0, bufr edition 4, tables 14:1, subsets 1, values: 11/11:
Subset 0:
001194 [SIM] Report mnemonic(CCITTIA5): rmap
004001 YEAR(YEAR): 2016
004002 MONTH(MONTH): 1
004003 DAY(DAY): 29
004004 HOUR(HOUR): 12
004005 MINUTE(MINUTE): 14
004006 SECOND(SECOND): 55
001011 SHIP OR MOBILE LAND STATION IDENTIFIER(CCITTIA5): dancast78
001213 AIRBASE AIR QUALITY OBSERVING STATION CODE(CCITTIA5): conn
005001 LATITUDE (HIGH ACCURACY)(DEGREE): 44.69950
006001 LONGITUDE (HIGH ACCURACY)(DEGREE): 10.64555
Starting a web server from top src dir:
$ python -m SimpleHTTPServer 8000
The code sometimes raises KeyError: 'could not detect the encoding of ' or OSError: reading a 5505024-bytes record from : Illegal seek.
# test.py
import dballe
from urllib2 import urlopen
from glob import glob

db = dballe.DB.connect_from_file("/tmp/buttami.db")
db.reset()
for f in glob("extra/bufr/*.bufr"):
    r = urlopen("http://localhost:8000/{}".format(f))
    db.load(r)
Inspecting with gdb, it seems that sometimes the encoding is not detected: the line int c = getc(stream); returns 255 (encoding not detected) or 0 (which creates an AOF file):
$ gdb python
(gdb) l dballe/file.cc:98
93 if (c == EOF)
94 return create(BUFR, st.release(), close_on_exit, name);
95
96 if (ungetc(c, stream) == EOF)
97 error_system::throwf("cannot put the first byte of %s back into the input stream", name.c_str());
98
99 switch (c)
100 {
101 case 'B': return create(BUFR, st.release(), close_on_exit, name);
102 case 'C': return create(CREX, st.release(), close_on_exit, name);
(gdb) b dballe/file.cc:98
(gdb) r test.py
Starting program: /usr/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, dballe::File::create (stream=0x7fbc60, close_on_exit=close_on_exit@entry=true, name="") at file.cc:99
99 switch (c)
(gdb) p c
$1 = 0
How to reproduce:
dbadb wipe --dsn=sqlite:/tmp/test.sqlite
[pat1@asus-pat1 ~]$ dbadb export rep_memo=mnw --dsn=sqlite:/tmp/test.sqlite
looking for repinfo corresponding to 'mnw'
[pat1@asus-pat1 ~]$ echo $?
1
In applications like Borinud this stops processing, whereas the expected behaviour is simply that no data is returned.
In dballe.txt we have two entries for sea temperature, B22042 and B22043, with different precision, historically used in different templates. Since B22043 is the reference variable internally used by dballe when interpreting BUFR, can we drop B22042 from dballe.txt to avoid confusion and leave it only in the wreport tables?
Since I opened a branch for modifying dballe.txt, I can add this modification to the branch together with the others before making a PR.
I attach 2 ECMWF BUFR files using variable B22042, if needed for testing.
ship_bufr.zip
Importing and re-exporting extra/bufr/synop-rad2.bufr does not currently seem to give the same file.
When queries are made (voglioquesto, dimenticami), is context_id ever used?
If it is never used, I would like to declare the feature officially unsupported.
If I try to delete with "dbadb delete", I get this error:
dbadb delete attr_filter="B33007<50"
--dsn='mysql://localhost:3306/soglie?user=vpavan&password=80qwfwq'
cannot execute 'DELETE FROM data WHERE id IN ()':You have an error in
your SQL syntax; check the manual that corresponds to your MariaDB
server version for the right syntax to use near ')' at line 1
What can I say?
Thanks and bye.
Andrea
In branch devel:
dbadb import -t json --dsn=$MYDSN < in.json
doesn't import the data.
following ARPA-SIMC/libsim#15
I suggest removing username and password from all APIs.
I can start doing this in libsim; when ready, it should be done in dballe too.
While reviewing the code I noticed inconsistent behaviour in the ordering of voglioquesto/dammelo results, and I added tests to verify it. The current state is:
memdb database:
SQL databases:
Ordering by ana_id in the SQL databases only means that the data are grouped by coordinates and ident, but the order of the groups is not defined.
In theory, the ordering in the SQL database case should be fixed by swapping report and varcode.
In practice, though, since maintaining these orderings is costly in terms of performance, and everything is currently working with sloppier orderings, I would like to redefine the ordering requirements as something that lets the software keep working while leaving DB-All.e implementations as much freedom as possible.
For example, we could distinguish grouping from ordering, and say that the data are grouped by (coordinates, ident) rather than ordered by (coordinates, ident).
The same goes for level and timerange: is an ordering needed, or is grouping enough? If grouping, do we need to group by level and then, within the same level, group by timerange, or is it enough to group by unique combinations of level and timerange?
As for ordering/grouping by report, how is it used? The original idea was to have, for each variable at a given point and instant, all its values report by report; but since the SQL implementation currently breaks this constraint, I wonder: is the constraint needed, or could I instead, for example, group everything by (report, coordinates, ident) rather than by (coordinates, ident)?
How far can all this be relaxed? Can we decide that the data are returned in no particular order? What is the minimum grouping/ordering requirement needed for the programs that use DB-All.e to work?
Remove core::Query::data_id and, in cascade, everything that uses it, plus the related tests: they are not used (#17)
From libsim examples I get this trace:
// ** Execution begins **
auto_ptr<DB> db0(DB::connect_from_url("sqlite:/tmp/dballe.sqlite"));
DbAPI dbapi0(*db0, "write", "write", "write");
dbapi0.scopa();
MsgAPI msgapi1("/dev/null", "w", BUFR);
// msgapi1 not used anymore
dbapi0.unsetall();
dbapi0.seti("lat", 4500000);
dbapi0.seti("lon", 1000000);
dbapi0.unset("ident");
dbapi0.unset("mobile");
dbapi0.setc("rep_memo", "generic");
dbapi0.setdate(2014, 1, 6, 18, 0, 0);
dbapi0.setlevel(105, 2000, 2147483647, 2147483647);
dbapi0.settimerange(4, 3600, 7200);
dbapi0.seti("B13003", 85);
dbapi0.prendilo();
dbapi0.setd("*B33192", 30.000000);
dbapi0.seti("*B33193", 50);
dbapi0.setd("*B33194", 70.000000);
dbapi0.critica();
dbapi0.seti("B12101", 27315);
dbapi0.prendilo();
dbapi0.setd("*B33192", 30.000000);
dbapi0.seti("*B33193", 50);
dbapi0.critica();
// error: cannot insert attributes for variable 000000: no data id given or found from last prendilo()
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff74d89c1 in dballe::fortran::Handler<HSimple, 50>::get (this=0x7ffff76dc860 <hsimp>, id=2147483647) at handles.h:109
109 assert(records[id].used);
(gdb) where
#0 0x00007ffff74d89c1 in dballe::fortran::Handler<HSimple, 50>::get (this=0x7ffff76dc860 <hsimp>, id=2147483647) at handles.h:109
#1 0x00007ffff74d7aa7 in idba_messages_open_input_ (handle=0x7fffffffaba0, filename=0x426c77 "example_dballe.bufr",
mode=0x7fffffffa640 "r", ' ' <repeats 39 times>, "\330\344p\367\377\177",
format=0x7fffffffa670 "BUFR", ' ' <repeats 36 times>, "\220\256\377\377\377\177", simplified=0x7fffffffa63c, filename_length=19, mode_length=40,
format_length=40) at binding.cc:1778
#2 0x00007ffff7a19c6e in dballe_class::dbasession_messages_open_input (session=..., filename='example_dballe.bufr', mode='r',
format='BUFR', ' ' <repeats 36 times>, simplified=.TRUE., _filename=19, _mode=1, _format=40) at dballe_class.F03:1132
#3 0x00007ffff79f7aa5 in dballe_class::dbasession_init (connection=..., anaflag="", dataflag="", attrflag="", filename='example_dballe.bufr',
mode='r', format='BUFR', template="", write=<error reading variable: Cannot access memory at address 0x0>, wipe=.TRUE., repinfo="",
simplified=<error reading variable: Cannot access memory at address 0x0>, memdb=.TRUE., loadfile=.FALSE., categoryappend="", _anaflag=0,
_dataflag=0, _attrflag=0, _filename=19, _mode=1, _format=4, _template=0, _repinfo=0, _categoryappend=0) at dballe_class.F03:3893
#4 0x000000000040ea2e in readmem () at example_dballe.F03:824
#5 0x00000000004069ab in example_dballe () at example_dballe.F03:177
#6 0x000000000042657c in main (argc=1, argv=0x7fffffffdc7c) at example_dballe.F03:3
#7 0x0000003859a21d65 in __libc_start_main (main=0x426548 <main>, argc=1, argv=0x7fffffffd838, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffd828) at libc-start.c:285
#8 0x0000000000403659 in _start ()
(gdb)
export DBALLE_TRACE_FORTRAN=tmp.log
produces an empty file
In branch devel, when dbadb import
or dbadb export
are executed, the report name is forced to an empty string.
The bug was introduced in commit 9aa1e53 (see removed function dballe::cmdline::dbadb::parse_op_report).
I changed the structure of the SQL transactions in DB-All.e, and the performance of dbadb import should now be significantly better; please let me know if you notice anything and, if it has improved, by how much.
If it has improved significantly, I would like to make everything between a preparati and a fatto run inside a single transaction. This would have a couple of effects worth discussing, but I'd rather do that only after measuring how much dbadb changes.
Importing and re-exporting extra/bufr/gts-amdar2.bufr does not currently seem to give the same file.
In msg.cc, lon and lat are written to CSV files (and possibly other text formats) using setprecision(5), thus with 5 significant digits overall, e.g. 11.572, while 5 significant decimal digits are needed, e.g. 11.57293, the equivalent of the %.5f C format; I don't know how to do that with C++ iostream.
Reference:
http://www.cplusplus.com/reference/iomanip/setprecision/
I can't find any documentation
dbamsg convert -t csv -d bufr
does not take the CSV header into account when reading positional CSV fields
In the devel branch, the labels (stdin) and (stdout) were removed, but from the Fortran bindings it is not possible to read from stdin or write to stdout (see idba_messages_open_input and idba_messages_open_output).
// ** Execution begins **
MsgAPI msgapi0("SMRERSOUNDLMM.2010012712.bufr", "w", BUFR);
msgapi0.unsetall();
msgapi0.setcontextana();
msgapi0.setc("rep_memo", "temp");
msgapi0.setd("lat", 45.027700);
msgapi0.setd("lon", 9.666700);
msgapi0.seti("mobile", 0);
msgapi0.seti("block", 0);
msgapi0.seti("station", 101);
msgapi0.prendilo();
// error: no year information found in message to import
Importing and re-exporting extra/bufr/gts-amdar1.bufr does not currently seem to give the same file.
Importing and re-exporting extra/bufr/pilot-gts1.bufr does not currently seem to give the same file.
Importing and re-exporting temp-timesig18.bufr does not currently seem to give the same file.
$ ls
archive2013.csv
$ cat archive2013.csv |dbamsg convert -t csv -d bufr
$ ls -lrt
-rw-rw-r-- 1 ppatruno ppatruno 116630106 7 feb 10.25 archive2013.csv
-rw-rw-r-- 1 ppatruno ppatruno 74155868 9 feb 12.06 (stdout)
so it writes to a file that it names "(stdout)"
rpm -q dballe
dballe-7.7-2.x86_64
dbadb import -t json --wipe-first --dsn=sqlite:/dev/shm/tmp.sqlite tmp.json
Invalid JSON value
cat tmp.json
[{"ident": null, "network": "arpav", "lon": 1187637, "lat": 4649926, "date": "2016-01-01T01:00:00Z", "data": [{"vars": {"B01019": {"v": "3 Arabba" },"B07030": {"v": 1645.0},"B07031": {"v": 1645.0} } }, {"timerange": [1,0,3600],"vars": {"B13011": { "a": { }, "v": 0.0 } },"level": [ 1,null,null,null]} ] },
{"ident": null, "network": "arpav", "lon": 1187637, "lat": 4649926, "date": "2016-01-01T02:00:00Z", "data": [{"vars": {"B01019": {"v": "3 Arabba" },"B07030": {"v": 1645.0},"B07031": {"v": 1645.0} } }, {"timerange": [1,0,3600],"vars": {"B13011": { "a": { }, "v": 0.0 } },"level": [ 1,null,null,null]} ] }]
dbadb import -t json --wipe-first --dsn=sqlite:/dev/shm/tmp.sqlite tmp.json
unexpected character
cat tmp.json
{"ident": null, "network": "arpav", "lon": 1187637, "lat": 4649926, "date": "2016-01-01T01:00:00Z", "data": [{"vars": {"B01019": {"v": "3 Arabba" },"B07030": {"v": 1645.0},"B07031": {"v": 1645.0} } }, {"timerange": [1,0,3600],"vars": {"B13011": { "a": { }, "v": 0.0 } },"level": [ 1,null,null,null]} ] },
{"ident": null, "network": "arpav", "lon": 1187637, "lat": 4649926, "date": "2016-01-01T02:00:00Z", "data": [{"vars": {"B01019": {"v": "3 Arabba" },"B07030": {"v": 1645.0},"B07031": {"v": 1645.0} } }, {"timerange": [1,0,3600],"vars": {"B13011": { "a": { }, "v": 0.0 } },"level": [ 1,null,null,null]} ] }
I created a new experimental database format, so far implemented only for sqlite and postgresql, which should speed things up a bit at least for PostgreSQL, since it tries to do everything with fewer queries.
By default dballe always uses the stable format. To try the new one, export DBA_DB_FORMAT=V7 before creating a new database.
The V7 database behaves like memdb, so one ana_id corresponds to a single combination of (lat, lon, ident, rep_memo) instead of (lat, lon, ident) as in the current format.
I'm opening this issue to track the tests that get done.
dbadb export --dsn=rmap lat=44.65305 lon=11.62301 month=04 year=2016 >richardson_aprile.bufr
write to temporary file was interrupted
dbadb import -t json --wipe-first --dsn=sqlite:/dev/shm/tmp.sqlite tmp.json
looking for repinfo corresponding to 'ARPAV'
with tmp.json:
{
"ident": null,
"network": "ARPAV",
"lon": 1187637,
"lat": 4649926,
"date": "2016-01-01T01:00:00Z",
"data": [
{
"vars": {
"B01019": {
"v": "3 Arabba"
},
"B07030": {
"v": 1645.0
},
"B07031": {
"v": 1645.0
}
}
},
{
"timerange": [
1,
0,
3600
],
"vars": {
"B13011": {
"a": {
},
"v": 0.0
}
},
"level": [
1,
null,
null,
null
]
}
]
}
the problem is the same with a DB initialized with this repinfo.csv:
1,synop,synop , 1,oss, 255
50,ARPAV,ARPAV , 50,oss, 255
255,generic,export generici da db meteo,1000,?,255
with "network": "arpav" everything works well
Would it be hard work to implement full interpretation of a BUFR message like extra/bufr/test-soil1.bufr? I mean interpreting context B07061 as depth below land surface (leveltype 106 according to fapi_ltypes.md) and placing the temperature at that level. At the moment interpretation is limited to station data:
$ dbamsg dump --interpreted ../../../extra/bufr/test-soil1.bufr
#0[0] synop message with 1 contexts:
Level -,-,-,- tr -,-,- 6 vars:
001001 WMO BLOCK NUMBER(NUMERIC): 11
001002 WMO STATION NUMBER(NUMERIC): 406
002001 TYPE OF STATION(CODE TABLE): 0
005001 LATITUDE (HIGH ACCURACY)(DEGREE): 50.06972
006001 LONGITUDE (HIGH ACCURACY)(DEGREE): 12.39306
007030 HEIGHT OF STATION GROUND ABOVE MEAN SEA LEVEL (SEE NOTE 3)(M): 483.0
I would like to remove ODBC support. It is slow and buggier than direct connection methods, and currently only useful to use Oracle as a SQL database, which is untested and, as far as I understand, unneeded.
I'm opening this issue to track what is still using ODBC support in DB-All.e. If by the 18th of April 2016 there is no notice of anything using it, I will remove the relevant code.
Data can be imported (dbadb import) and inserted (prendilo).
I recall there was talk of what should happen to already-existing attributes in those two cases, but I don't recall anything ever being defined.
The current behaviour is:
I vaguely remember a request along these lines:
Before moving on to study more efficient attribute-management strategies, I would like to make this specification of their behaviour final.
It's already broken and the namespace conflicts with the working https://github.com/ARPA-SIMC/provami
The definition of PM1 in btable was made for Chimere output (as implemented at SIMC to date):
015203 [SIM] PM1 Concentration (tot. aerosol < 1.25 ug)
Beware, it is not exact. If it is to be used for observations (which already exist for some monitoring stations), a different one will have to be introduced:
xxxxxx [SIM] PM1 Concentration (tot. aerosol < 1 ug)
and perhaps the current one redefined more accurately as:
015203 [SIM] PM1.25 Concentration (tot. aerosol < 1.25 ug)
We noticed that the import performance of memdb, used in libsim to import BUFR files, degrades severely and non-linearly as the BUFR file size grows: it gets to 17 minutes for a file of ~1.5MB and ~7K messages. Importing into a "real" database, e.g. sqlite, does not show the problem. Example to reproduce:
wget ftp://ftp.smr.arpa.emr.it/incoming/dav/arpapiemonte/common20160229
for n in 10 50 100 500 1000 5000; do dbamsg cat --index=1-$n common20160229 > testbufr_$n; done
for file in testbufr_* ; do echo $file; time dbadb import --dsn=mem: $file; done
A similar result in terms of performance is obtained when importing with libsim's v7d_transform; a rough on-the-fly profiling with perf top shows:
Samples: 136K of event 'cycles', Event count (approx.): 57059249544
Overhead Shared Object Symbol
61,01% libstdc++.so.6.0.19 [.] std::_Rb_tree_increment
7,93% libdballe.so.7.0.3 [.] dballe::stl::stlutils::Itersection<unsigned long>::sync_iters
3,80% libdballe.so.7.0.3 [.] dballe::stl::stlutils::SequenceIters<std::_Rb_tree_const_iterator<unsigned long> >::next
2,43% libdballe.so.7.0.3 [.] dballe::stl::stlutils::SequenceIters<std::_Rb_tree_const_iterator<unsigned long> >::valid
0,99% [vdso] [.] __vdso_gettimeofday
0,90% libdballe.so.7.0.3 [.] dballe::stl::stlutils::SequenceIters<std::_Rb_tree_const_iterator<unsigned long> >::get
while export DBA_PROFILE=Y adds nothing. Are we making some procedural mistake, or is this just how things are?
If I run: ./run-check -C dballe TEST_WHITELIST="core_json*"
then several tests have now started to fail:
core_json: .xxxx..
core_json.bool: value 'truefalse' is different than the expected 'false'
core/json-test.cc:34:actual(out.str()) == "false"
core_json.int: value '1-1234567' is different than the expected '-1234567'
core/json-test.cc:46:actual(out.str()) == "-1234567"
core_json.double: value '1.100000-1.100000' is different than the expected '-1.100000'
core/json-test.cc:57:actual(out.str()) == "-1.100000"
core_json.string: value '"""antani"' is different than the expected '"antani"'
core/json-test.cc:76:actual(out.str()) == "\"antani\""
4/7 tests failed
$ python3 --version
Python 3.3.2
$ python3 -c "import dballe; print(dir(dballe))"
['__doc__', '__initializing__', '__loader__', '__name__', '__package__', '__path__']
$ python3 --version
Python 3.4.3+
$ python3 -c "import dballe; print(dir(dballe))"
['Cursor', 'DB', 'Record', 'Var', 'Varinfo', 'Vartable', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_dballe', 'absolute_import', 'describe_level', 'describe_trange', 'var', 'varinfo', 'wreport']
Reading BUFRs from the attached file
bufr.zip
returns:
messages_read_next();
// error: date/time informations not found (or incomplete) in message to insert
with this trace:
dbapi5.messages_open_input("example_dballe.bufr", "r", BUFR, true);
ires = dbapi5.messages_read_next();
wassert(actual(ires) == 1);
dbapi5.unsetall();
dbapi5.unset("lat");
dbapi5.unset("lon");
dbapi5.unset("ident");
dbapi5.unset("mobile");
dbapi5.unset("rep_memo");
dbapi5.setc("var", "B12101");
dbapi5.unset("limit");
dbapi5.unset("priority");
dbapi5.unset("priomin");
dbapi5.unset("priomax");
dbapi5.unset("latmin");
dbapi5.unset("lonmin");
dbapi5.unset("latmax");
dbapi5.unset("lonmax");
dbapi5.unset("ana_filter");
dbapi5.unset("data_filter");
dbapi5.unset("attr_filter");
dbapi5.unset("query");
dbapi5.unset("yearmin");
dbapi5.unset("monthmin");
dbapi5.unset("daymin");
dbapi5.unset("hourmin");
dbapi5.unset("minumin");
dbapi5.unset("secmin");
dbapi5.unset("yearmax");
dbapi5.unset("monthmax");
dbapi5.unset("daymax");
dbapi5.unset("hourmax");
dbapi5.unset("minumax");
dbapi5.unset("secmax");
dbapi5.unset("varlist");
dbapi5.unset("*varlist");
ires = dbapi5.voglioquesto();
wassert(actual(ires) == 1);
sres = dbapi5.dammelo();
wassert(actual(sres) == "B12101");
ires = dbapi5.voglioquesto();
wassert(actual(ires) == 1);
sres = dbapi5.dammelo();
wassert(actual(sres) == "B12101");
MsgAPI msgapi6("/dev/null", "w", BUFR);
// msgapi6 not used anymore
dbapi5.remove_all();
ires = dbapi5.messages_read_next();
// error: date/time informations not found (or incomplete) in message to insert
https://github.com/ARPA-SIMC/dballe/blob/master/dballe/core/record-test.cc#L187-L189
core_record.get_set_level: value '1,0,-,-' is different than the expected '1,-,-,-'
core/record-test.cc:189:actual(rec.get_level()) == Level(1)
start with this command:
dbadb import -t json --wipe-first --dsn=sqlite:/dev/shm/arpav.sqlite Dati_ARPAV_json_rmap_20151201-20151231.txt
executing COMMIT:database is locked
on another terminal:
provami-qt sqlite:///dev/shm/arpav.sqlite
setNativeLocks failed: Risorsa temporaneamente non disponibile
setNativeLocks failed: Risorsa temporaneamente non disponibile
progress "data" : new task: "Loading data..."
progress "summary" : new task: "Loading summary..."
progress "summary" : task update: "Loading summary from db..."
progress "data" : task update: "Processing data..."
progress "data" : task ends
progress "summary" : task update: "Processing summary..."
Refresh summary results arrived
"undefined:0: TypeError: undefined is not a function"
Summary collation started
process_summary
update stations
"undefined:1: ReferenceError: Can't find variable: set_stations"
"undefined:0: TypeError: undefined is not a function"
progress "summary" : task ends
Is this expected with sqlite?
Add an input/output JSON format, e.g.:
{
"ident": null,
"network": "rer",
"lon": 915454,
"lat": 4451485,
"date": "2015-07-30T15:30:00Z",
"data": [
{
"vars": {
"B01019": {
"v": "Torriglia"
},
"B07030": {
"v": 769.0
},
"B07031": {
"v": 769.0
}
}
},
{
"timerange": [
1,
0,
3600
],
"vars": {
"B13011": {
"a": {
},
"v": 0.0
}
},
"level": [
1,
null,
null,
null
]
},
{
"timerange": [
254,
0,
0
],
"vars": {
"B12101": {
"a": {
},
"v": 297.15
},
"B13003": {
"a": {
},
"v": 50
}
},
"level": [
103,
2000,
null,
null
]
}
]
}
Issue reported by Massimo Bider.
The output of dbamsg dump --csv changed in 7.2:
edition is now master_table_number
"representative_time","2015-5-27 9:0:0" was date,2015-05-27 09:00:00