yhat / db.py
db.py is an easier way to interact with your databases
License: BSD 2-Clause "Simplified" License
As an enhancement, consider silencing the logging messages, e.g., "Indexing schema. This will take a second..."
I tried installing db.py and was alarmed to find it trying to upgrade my pandas version. Given how much of a pain pandas can be to install and how surprising things can break between versions, are you sure you need 0.15.0?
Having had a quick prod, it seems to work fine with 0.14.1
In query you call through to pd.io.sql.read_sql() but don't provide a way of passing in the index_col you want to use. Can this be added to the method signature?
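A hedged sketch of the wished-for signature: `query` below is a standalone helper (assuming pandas is installed), not db.py's actual method, and the `limit` wrapping is illustrative only. It shows `index_col` being passed straight through to pandas.

```python
import sqlite3
import pandas as pd

def query(con, q, index_col=None, limit=None):
    # Illustrative limit handling: wrap the query in a subselect.
    if limit is not None:
        q = "SELECT * FROM (%s) sub LIMIT %d" % (q.rstrip().rstrip(";"), limit)
    # Pass index_col straight through to pandas.
    return pd.read_sql(q, con, index_col=index_col)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, x TEXT)")
con.execute("INSERT INTO t VALUES (1, 'a')")
df = query(con, "SELECT * FROM t", index_col="id")
```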
Hello,
First of all this is an amazing project. Will try to contribute as I want to contribute to database related projects in Python :)
I wanted to know why in tests.py you declared db = DemoDB() as a global object. You could have done the following in setUp:
def setUp(self):
    self.db = DemoDB()
setUp() is called to prepare the test fixture and provides initialization for the tests. Since the tests.py code is still not that big, you can replace all db variables with self.db. Let me know what you think; I can do this too if you think this is a viable contribution. Also adding:
def tearDown(self):
    pass
since that is for cleaning up after each test is done.
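The suggested fixture pattern might look like this minimal sketch (the stand-in DemoDB class and test names are illustrative, not db.py's actual test suite):

```python
import unittest

class DemoDB(object):
    """Stand-in for db.DemoDB, used only to make the sketch runnable."""
    def tables(self):
        return ["Artist", "Album"]

class TestDB(unittest.TestCase):
    def setUp(self):
        # A fresh fixture per test instead of one module-level db object.
        self.db = DemoDB()

    def tearDown(self):
        # Nothing to clean up yet; placeholder for future teardown logic.
        pass

    def test_tables(self):
        self.assertIn("Artist", self.db.tables())
```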
Is there a way to specify SSL over the connection using the guide here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL
Help on method query in module db.db:
query(self, q, limit=None) method of db.db.DB instance
Query your database with a raw string.
Parameters
----------
q: str
Query string to execute
limit: int
Number of records to return
Have you considered allowing a params tuple to be passed in that would use the SQL library's escaping capabilities? (db.query('SELECT * FROM table WHERE foo="foo"') does work, but isn't very safe, even for a data exploration library.)
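For comparison, a minimal sketch of DB-API parameter binding with the stdlib's sqlite3. A params argument on db.query is the wish here; this only shows the escaping the underlying drivers already provide.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (foo TEXT)")
con.execute("INSERT INTO t VALUES ('foo')")

# The driver escapes the bound value, so no manual quoting is needed.
rows = con.execute("SELECT * FROM t WHERE foo = ?", ("foo",)).fetchall()
```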
There is an error in file __init__.py, at line 69: print "var", var. Is it necessary? I have changed it to print("var", var).
I am using Python 3.5.2.
Python 2.7.5 (default, Oct 14 2013, 11:50:38)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from db import DB
>>> db = DB(username='my id', password='my password', hostname='internal ip', dbname='db name', dbtype="mysql")
Refreshing schema. Please wait...Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 794, in __init__
self.refresh_schema(exclude_system_tables)
File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 1216, in refresh_schema
self.tables = TableSet([Table(self.con, self._query_templates, t, tables[t]) for t in sorted(tables.keys())])
File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 261, in __init__
foreign_key = Column(con, queries_templates, foreign_table, foreign_column, col.type)
AttributeError: 'str' object has no attribute 'type'
Any ideas, please?
I get a UnicodeEncodeError with Unicode query parameter with the present 'pip' version.
>>> db.query(u"select * from words where form == '{{ word }}' limit 10", data={'word': u'læge'})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/db/db.py", line 1307, in query
q = self._apply_handlebars(q, data, union)
File "/usr/local/lib/python2.7/dist-packages/db/db.py", line 1144, in _apply_handlebars
query = str(query)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 36: ordinal not in range(128)
In the present GitHub version the error occurs in line 1232.
I have no problem if I interpolate before calling the query method, i.e., avoid self._apply_handlebars.
>>> db.query(u"select * from words where form == 'læge' limit 10")
index id form pos
0 9275 11031550 læge Noun
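One hedged sketch of a fix: coerce the query to text without ever calling str() on unicode input. to_text is a hypothetical helper, not db.py's code; on Python 2 this avoids the implicit ASCII encode that str(query) performs.

```python
def to_text(q):
    # Decode byte strings; pass text through untouched.
    if isinstance(q, bytes):
        return q.decode("utf-8")
    return q
```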
Example / wish:
from db import DB
db = DB(dbtype="sqlite", filename=":memory:")
db.query("""DROP TABLE IF EXISTS writer;
CREATE TABLE writer (first_name, last_name, year_of_death);
INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
INSERT INTO writer VALUES ('Bertold', 'Brecht', 1956);
SELECT *, sqlite_version() as sqlite_version from writer order by year_of_death;""")
It would be great to add Oracle DB connectivity, given their massive presence in business.
boto is heavy, so having it optional would be kind
Currently TableSets and ColumnSets are represented as "tablified" PrettyTables. However, when working with a database with lots of tables, or a table with lots of columns (>100), it would be nice to be able to return only a slice of the table.
An easy way to do this would be to have the option to return the TableSets or ColumnSets as pandas DataFrames.
Obviously the pandas DataFrame representation in the IPython terminal is not as pretty, but at least in IPython notebook the HTML representation of DataFrames is quite nice.
Something like:
rows = []
for col in self._columns:
    rows.append({"Column": col.name, "Type": col.type,
                 "Foreign Keys": col._str_foreign_keys(),
                 "Reference Keys": col._str_ref_keys()})
df = pd.DataFrame(rows)
Perhaps in a _repr_df_ function.
db.py will identify all tables across all schemas, but can't query them because it only tries to access the unqualified table name, without the schema prepended.
For example, if I have a schema s and a table t, calling db.all() tries to execute SELECT * FROM t instead of SELECT * FROM s.t. Unless users have configured a postgres search_path (and have unambiguous table names), this query will fail.
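A minimal sketch of the fix: build schema-qualified names before querying. The quoting below is simplified and assumes plain identifiers; qualified_name is a hypothetical helper, not db.py's code.

```python
def qualified_name(schema, table):
    # Double-quote each part so mixed-case names survive in postgres.
    return '"%s"."%s"' % (schema, table)

query = "SELECT * FROM %s" % qualified_name("s", "t")
```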
Hello,
First, thanks for this amazing library.
I am trying to template complex SQL queries and would benefit from being able to register partials for use in the handlebar queries.
Is there any way to accomplish that in db.py without any string manipulation shenanigans?
I'm getting a strange set of errors when trying to establish a database connection. Initially, it gave me:
273 attr = "_" + col.name
274 #print col
--> 275 setattr(self, attr, col)
276
277 self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))
AttributeError: can't set attribute
Since this is a pretty complicated database, I thought that it might help to exclude system tables, so I set that parameter in the DB() call to True. Then, when I ran that same call successive times, it began to alternate between the above error and the following error (I never seemed to get the same one two times in a row, so it does appear to be strict alternation):
275 setattr(self, attr, col)
276
--> 277 self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))
278 for (column_name, foreign_table, foreign_column) in self._cur:
279 col = getattr(self, column_name)
DatabaseError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Request in queries/mssql.py is missing table_schema
One issue I run into is having databases with large numbers of tables that are user-specific yet necessary architecturally. A good solution for me has been using PyCharm/JetBrains products, as their database connector allows you to set a regex on which tables to include in the schema. This lets me filter out all of those needless tables and only access the core that I need.
Hi,
I'm just curious. Why do you use the template engine pybars instead of a more mainstream package such as jinja2? Are there features you use in pybars which are not implemented elsewhere? Or did the author(s) already know pybars and simply use it?
Thanks. Great job by the way, very (very) useful.
Damien
Are you planning on supporting Presto in the future?
That would be awesome.
http://prestodb.io/docs/current/
Whenever I initialize a DB object with my database (Postgres 9.3), and then try to conduct an operation on it (say, show_tables), I get
DatabaseError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I know that it's some setting particular to my database configuration interacting with db.py, because I can use db.py on other databases, and I can use vanilla psycopg2 on this database without trouble, but I haven't been able to track down what the issue is.
Error appears to be due to the "_" prepended to the column name if the column name is either "name" or "con". In this specific case dim_cell.name gets prepended to form "_name".
+------------------+------+---------+--------------+----------------+
| Table            | Name | Type    | Foreign Keys | Reference Keys |
+------------------+------+---------+--------------+----------------+
| dim_cell         | name | varchar |              |                |
+------------------+------+---------+--------------+----------------+
----> db = DB(username="user", password="pw", hostname="hostname",port="5432",dbname="dbname", dbtype="postgres")
C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit, keys_per_column)
833
834 self.tables = TableSet([])
--> 835 self.refresh_schema(exclude_system_tables)
836 self.handlebars = pybars.Compiler()
837
C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in refresh_schema(self, exclude_system_tables)
1339 tables[table_name].append(Column(self.con, self._query_templates, table_name, column_name, data_type, self.keys_per_column))
1340
-> 1341 self.tables = TableSet([Table(self.con, self._query_templates, t, tables[t], keys_per_column=self.keys_per_column) for t in sorted(tables.keys())])
1342 sys.stderr.write("done!\n")
1343
C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in __init__(self, con, query_templates, name, cols, keys_per_column)
272 if attr in ("name", "con"):
273 attr = "_" + col.name
--> 274 setattr(self, attr, col)
275
276 self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))
AttributeError: can't set attribute
Since we're getting a few requests of the form "Can you support database X?", it'd be really helpful to have a developer checklist of features to implement when adding support for a new database.
I can't seem to get the SQL Server driver working on OSX. I've got pyodbc working (which seems to be the most accepted way to query SQL Server from Python).
Any suggestions?
Hi there, first of all thanks for writing this awesome project.
What are you thoughts on adding a way to get db/table size to this library? I was thinking something along the lines of:
db = DB(profile='default')
db.size # Returns db size
db.tables.my_table.size # Returns table size
db.tables.my_table.rows # Returns number of rows
The ability to run EXPLAIN statements would also be really useful.
Hi,
I would like to get the number of rows in a specific table. Is it possible to add a count or rows method/property to the Table class? You can also add the __len__ special method.
Moreover, this number of rows could then be added to the TableSet repr.
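The requested API could look like this minimal sketch (the Table class below is a toy stand-in for db.py's Table, with a sqlite3 connection for demonstration; none of this is db.py's actual implementation):

```python
import sqlite3

class Table(object):
    """Toy stand-in for db.py's Table, just to sketch the requested API."""
    def __init__(self, con, name):
        self._con = con
        self.name = name

    @property
    def count(self):
        # Number of rows in the table.
        cur = self._con.execute("SELECT COUNT(*) FROM %s" % self.name)
        return cur.fetchone()[0]

    def __len__(self):
        return self.count

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE writer (name TEXT)")
con.execute("INSERT INTO writer VALUES ('Brecht')")
t = Table(con, "writer")
```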
Cheers,
Damien
Hi,
Thanks for the very cool implementation.
When connected to an Amazon Redshift database, my attempt to TRUNCATE a table
from db import DB
db = DB()
db.query("TRUNCATE table_name;")
raised an error:
TypeError: 'NoneType' object is not iterable
Does db.py support the TRUNCATE command?
Many analysts work in enterprises where there are some hard and fast compliance rules against storing passwords in cleartext (yes, base64 == cleartext). Adding an option to let folks use a master "encryption key" (yes, one that they would have to type in) to encode/decode the credentials would add a bit of security to this feature and prevent an audit nightmare.
It seems that at start-up this module runs a high number of queries against the information schema, making init time exceedingly high.
A possible solution might be temporarily disabling the calculation of index statistics using the following:
SET innodb_stats_on_metadata=0;
Thanks
In a dataframe resulting from e.g. db.tables.table.all(), utf-8 values from postgresql were double-encoded (encoded as utf-8 twice).
When I later had to save my dataframe to an Excel sheet or a .csv file, after some troubleshooting I found I had to call .decode('utf-8') on all values in the dataframe for the export to work.
When trying to load a saved profile, I'm getting this AttributeError:
AttributeError Traceback (most recent call last)
<ipython-input-1-3981467c7daa> in <module>()
2 import pandas as pd
3
----> 4 NARDO = DB(profile='NARDO')
5 #NARDO = DB(hostname='[REDACTED]', dbtype='mssql')
c:\Anaconda\lib\site-packages\db\db.pyc in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit, keys_per_column, driver)
833 if HAS_ODBC:
834 base_con = "Driver={driver};Server={server};Database={database};".format(
--> 835 driver=self.driver or "SQL Server",
836 server=self.hostname or "localhost",
837 database=self.dbname or ''
AttributeError: 'DB' object has no attribute 'driver'
The odd part is that I can connect to the database no problem if I specify the hostname as if it's a new connection: NARDO = DB(hostname='[REDACTED]', dbtype='mssql'). Not sure if this is also a problem with other DB types; I have only tested it on MSSQL.
Calls to query do not pass arguments the right way and duplicate the _apply_handlebars function.
The db.to_redshift feature is great, but it does not follow DB's pattern for database credentials. It would be great if you could put in the S3 credentials once, save them, and load them on the fly. Example of what I'd expect from the package:
from db import S3
s3 = S3(credentials)
s3.save_credentials('s3_for_redshift')
s3 = S3(profile='s3_for_redshift')
...
db.to_redshift(df, s3_cred = s3)
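The wished-for persistence could be sketched roughly like this. The function names, JSON layout, and explicit path argument are all assumptions; db.py stores database profiles differently and has no S3 profile support today.

```python
import json
import os

def save_credentials(profile, credentials, path):
    # Merge this profile into whatever profiles already exist on disk.
    profiles = {}
    if os.path.exists(path):
        with open(path) as f:
            profiles = json.load(f)
    profiles[profile] = credentials  # e.g. {"access_key": ..., "secret_key": ...}
    with open(path, "w") as f:
        json.dump(profiles, f)

def load_credentials(profile, path):
    with open(path) as f:
        return json.load(f)[profile]
```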
Hi,
Suppose, for demo or test purposes, I have an in-memory SQLite database created with SQLAlchemy, such as:
engine = create_engine('sqlite://')
Can I read it from db.DB? I tested:
from db import DB
db = DB(filename="sqlite://", dbtype='sqlite')
without success. Is there a plan to support it?
Thanks!
Cheers,
Damien G.
This likely happens on any SET queries that do not return a result.
Rather than type in all the tables I'd like to exclude from loading in DB(), would it be possible to just select the few tables I'd like to load the schema for? I feel like this would be a ton faster for me.
http://blog.yhathq.com/posts/dbpy-0.4-handlebars-meets-sql.html
(Trying on Python 3.4, Windows.)
D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64\lib\site-packages\db\db.py in _apply_handlebars(self, q, data, union)
1034
1035 def _apply_handlebars(self, q, data, union=True):
-> 1036 q = unicode(q)
1037 template = self.handlebars.compile(q)
1038 if isinstance(data, list):
NameError: name 'unicode' is not defined
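A common compatibility shim for this NameError (a sketch, not db.py's actual patch): alias a text type once instead of calling unicode directly.

```python
try:
    text_type = unicode   # Python 2: real unicode type
except NameError:
    text_type = str       # Python 3: str is already unicode

q = text_type("select sqlite_version();")
```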
Raises an exception that db type can't be found.
I downloaded the package today. I would appreciate it if anyone can help with this error.
Python 2.7.3 (default, Jun 22 2015, 19:33:41)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
from db import DemoDB
db = DemoDB()
Indexing schema. This will take a second...finished!
db.tables
Refreshing schema. Please wait...Traceback (most recent call last):
File "", line 1, in
File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 959, in tables
self.refresh_schema(self._exclude_system_tables, self._use_cache)
File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 1510, in refresh_schema
tables = self._gen_tables_from_col_tuples(col_meta)
File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 1586, in _gen_tables_from_col_tuples
for (table_schema, table_name, column_name, data_type) in cols:
ValueError: need more than 3 values to unpack
You mentioned Oracle in the list of supported databases, but I can't quite figure it out. I tried dbtype="oracle" and "Oracle". I do have the Oracle package cx_oracle installed.
/Users/gwulfs/anaconda/lib/python2.7/site-packages/db/db.pyc in init(self, username, password, hostname, port, filename, dbname, dbtype, profile, exclude_system_tables, limit)
670
671 if self.dbtype=="postgres" or self.dbtype=="redshift":
--> 672 self.con = pg.connect(user=self.username, password=self.password,
673 host=self.hostname, port=self.port, dbname=self.dbname)
674 self.cur = self.con.cursor()
NameError: global name 'pg' is not defined
Example.
This is OK:
from db import DB
db = DB(dbtype="sqlite", filename=":memory:")
db.query("select sqlite_version() as sqlite_version;")
This is not OK:
db.query("DROP TABLE IF EXISTS writer;")
Also not OK:
db.query("CREATE TABLE writer (first_name, last_name, year_of_death);")
Also not OK:
db.query("INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);")
At line 17 of the db.py file, shouldn't we also import ColumnSet? I am getting the error I pasted in the title section.
The schema of the database I'm trying to connect to includes a table with column name 'count'. When I try to connect to it, db.py creates a Table object for that table and calls setattr(self, attr, col) for each column in the table. Since Table has a property called 'count', when it gets to that column, it raises an AttributeError. Obviously a similar problem exists with the other method and field names in Table. Other people have mentioned this problem in issue #33
I'm not sure of the best way to handle this. Table.__init__ already checks to make sure we aren't creating a column attribute with name 'name' or 'con'. If so, it puts an underscore in front of the attribute name (this also seems to be a bug, as _con is the name of the field). This behavior isn't necessarily bad, but it does make it hard to find the column names, since underscore attributes are kind of hidden in Python. For example, in IPython if I have a Table named Foo with a column attribute named _bar, then typing Foo. does not give me _bar as a suggestion. Apart from looking at attribute names on the class, I don't see any way to determine the columns in a table other than db.find_table("Foo").
Perhaps a combination of putting underscores in front of conflicting column name attributes and adding a 'columns' property that would return a list of column names? That still isn't quite ideal, but it would be better than the current situation which completely prevents me from being able to use db.py, which is a shame because it seems pretty cool.
I'd be happy to submit a pull request, but as I said, I'm not sure of the best resolution.
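The clash can be reproduced minimally (the class below is illustrative, not db.py's Table): setattr on an instance fails when the class defines a read-only property of the same name.

```python
class Table(object):
    @property
    def count(self):
        return 0   # read-only property, like Table.count in db.py

t = Table()
try:
    setattr(t, "count", object())   # simulating a column named 'count'
    conflicted = False
except AttributeError:               # "can't set attribute"
    conflicted = True
```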
Need some docs...
It has a problem reading properties of a SpatialIndex (VirtualSpatialIndex) virtual table, even though I can run the same SQL in the CLI.
In ipython:
In [1]: from db import DB
In [2]: db = DB(username=None, password=None, hostname='localhost', filename='/home/bernardo/Dropbox/workspace/orache/db.sqlite3', dbtype='sqlite')
Indexing schema. This will take a second...---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
<ipython-input-2-a38e0de61324> in <module>()
----> 1 db = DB(username=None, password=None, hostname='localhost', filename='/home/bernardo/Dropbox/workspace/orache/db.sqlite3', dbtype='sqlite')
/home/bernardo/envs/devpy/local/lib/python2.7/site-packages/db.py-0.3.1-py2.7.egg/db/db.py in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit)
757 self.con = sqlite.connect(self.filename)
758 self.cur = self.con.cursor()
--> 759 self._create_sqlite_metatable()
760 elif self.dbtype=="mysql":
761 if not HAS_MYSQL:
/home/bernardo/envs/devpy/local/lib/python2.7/site-packages/db.py-0.3.1-py2.7.egg/db/db.py in _create_sqlite_metatable(self)
1170 tables = [row[0] for row in self.cur.execute("select name from sqlite_master where type='table';")]
1171 for table in tables:
-> 1172 for row in self.cur.execute("pragma table_info(%s)" % table):
1173 rows_to_insert.append((table, row[1], row[2]))
1174 # find for table and column names
OperationalError: no such module: VirtualSpatialIndex