Comments (19)

pykello commented on July 17, 2024

Thanks for reporting. We will look into this.

from cstore_fdw.

pykello commented on July 17, 2024

We discussed this a bit internally. We are not sure if we should delete the file or not.

As @jasonmp85 suggested, if we want to delete the file on DROP TABLE, then maybe users shouldn't have to specify a filename at all. Maybe they specify a directory path in the SERVER configuration, and the extension creates a file within that directory for each table they create, i.e. the table -> file mapping is completely encapsulated within the extension.

If users specify filenames directly, we'd feel a little hesitant to delete files they've specified. What if they copied a file over from another box and expect file_fdw-like semantics on DROP TABLE?

Here are some of the options I can see:

  1. Delete the file unconditionally on DROP TABLE,
  2. Don't delete the file, but also raise a warning that the file didn't get deleted,
  3. Make filename optional. For filename NULL, automatically manage cstore files inside a sub-directory of the postgres data directory. For automatically managed tables, delete the file on DROP TABLE, but for tables with explicit filenames, don't delete the file.
  4. Same as the 3rd option, but manage files inside a data directory specified in the SERVER configuration.

I think as the first step we can go with the 2nd option, and later implement either 3rd or 4th option. What do you think?

btubbs commented on July 17, 2024

+1 to the third option, part of which I ticketed separately at issue 16 before reading this. Lots of users would benefit from not having to think about which directories are writable by the Postgres user.

jberkus commented on July 17, 2024

So, I just ran into this issue myself. I'll tell you that coming from the world of PostgreSQL, intuitively I expect the file to get deleted when the table gets deleted, for the simple reason that the FDW created the file, so it should delete the file. Some additional thoughts:

  1. Users should be able to create a cstore without specifying a file name. In that case, the cstore should be written to the data/base directory for the current database, and should be named schema_tablename.cstore. This also has the advantage of making the user take extra steps to mount the same cstore in two different databases at the same time, making that less likely to happen by accident.

  2. Again, when creating without a user-specified filename, cstore_fdw should accept a "tablespace" directive.

  3. If a user does CREATE FOREIGN TABLE and it links an existing cstore file, either because they specified a duplicate filename, or because they intentionally dropped a prebuilt file into place, they should get a "WARNING: cstore file {filename} already exists. Attaching that file as table {tablename}." We don't want to block users from using existing files, since there are a bunch of advantages to that, but we also don't want them attaching old files to new tables by mistake.
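A minimal sketch of the check behind item 3, assuming a hypothetical helper named `attach_warning` (not part of cstore_fdw). It only builds the proposed message when the target file already exists, and it never blocks the attach:

```python
import os
from typing import Optional


def attach_warning(filename: str, tablename: str) -> Optional[str]:
    """Return the proposed warning text if `filename` already exists.

    Hypothetical helper for illustration only. Returns None when the
    file does not exist yet, i.e. the table would start from a new file.
    """
    if os.path.exists(filename):
        return (
            f"WARNING: cstore file {filename} already exists. "
            f"Attaching that file as table {tablename}."
        )
    return None
```

The caller would emit the returned text and then proceed with the attach, which keeps prebuilt files usable while still flagging accidental reuse.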

So, yes, I like option #3. That is, for tables whose location was automatically managed, delete them when the table is dropped. For ones where the user manually specified the filename, don't delete them automatically, but do emit a WARNING. This makes sense because manually specified tables are more likely to be ones where the user wants to do something with them outside Postgres. As a refinement to that, manually specified files should be deleted if the user does DROP FOREIGN TABLE {tablename} CASCADE.
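The drop policy described in this comment can be sketched as a small decision function. The names `DropAction` and `action_on_drop` are hypothetical and only illustrate the proposal; they are not cstore_fdw code:

```python
from enum import Enum


class DropAction(Enum):
    DELETE = "delete"                # remove the file along with the table
    KEEP_AND_WARN = "keep_and_warn"  # leave the file in place, emit a WARNING


def action_on_drop(user_specified_filename: bool, cascade: bool = False) -> DropAction:
    """Decide what happens to the cstore file when its table is dropped."""
    if not user_specified_filename:
        # Automatically managed location: the extension created the file,
        # so the extension deletes it.
        return DropAction.DELETE
    if cascade:
        # Proposed refinement: DROP FOREIGN TABLE ... CASCADE also removes
        # manually specified files.
        return DropAction.DELETE
    return DropAction.KEEP_AND_WARN
```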

jberkus commented on July 17, 2024

Hmmm. Actually, I'm going to take back my wholehearted endorsement of option 3, the reason being that implied automatic action based on non-atomic fields is bad. Therefore:

We should have a new foreign table option for cstore_fdw, called "delete_file" or something similar. This should default to True or False according to Option 3.

pykello commented on July 17, 2024

@jberkus Thanks a lot for looking into this.

As an alternative to schema_tablename.cstore, I was thinking of naming the automatically managed tables using the relation's file node number, similar to how files for regular tables are named. That is, they would be stored as PGDATA/base/$db_oid/$relfilenode. To be consistent with other file names in the $db_oid directory, I was thinking of using the "_footer" suffix instead of ".footer" for the footer file.

I thought this option would make the implementation a bit simpler, because we can reuse some of PostgreSQL's internal functions for getting table file paths. Using these functions can also make supporting tablespaces easier.

Another option which @ozgune suggested is to create a cstore_fdw specific directory like PGDATA/base/cstore_fdw.

What are your thoughts about these two options?

jberkus commented on July 17, 2024

Yeah, using relfilenode makes sense. However, if you're going to do that, then a cstore_fdw directory is essential. And that cstore_fdw directory should be in the database_oid directory.

ozgune commented on July 17, 2024

@jberkus I personally like option 3 as well -- as it feels more intuitive to me.

On "implied automatic action based on non-atomic fields is bad", could you clarify a bit (the part about non-atomic fields)?

jberkus commented on July 17, 2024

So the problem is that the two ideas, (1) filename and (2) auto-delete, are orthogonal. While it may not make sense to support automatically-managed files which do not auto-delete as an option, at the very least we should somehow have that information explicitly available to users in the form of an "auto_delete" FT option which tells them whether files will be deleted or not.

For that matter, I can imagine a user wanting to have manually specified table locations which DO auto-delete.

jberkus commented on July 17, 2024

Regarding the automatic filename:

Per pgsql-hackers, there is no interest in having core Postgres manage fdw files at this time, and a strong desire to have FDW authors put such files outside the core database files, so that it will be clear to users that they are not part of replication/backup/management/etc.

As such, I am changing my recommendation on the file path for automatically managed files. They should be:

$PGDATA/cstore_fdw/{database-oid}/{relfilenode}

This puts all cstore tables below a single folder, also making it easier for users to deal with them for backup etc. purposes.
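As an illustration of that layout, here is a minimal sketch that builds the proposed path from its parts. The function name `managed_cstore_path` and the example OIDs are assumptions made for this sketch:

```python
from pathlib import PurePosixPath


def managed_cstore_path(pgdata: str, database_oid: int, relfilenode: int) -> PurePosixPath:
    """Build $PGDATA/cstore_fdw/{database-oid}/{relfilenode}.

    Hypothetical helper: everything lives under one cstore_fdw folder,
    outside the core database files in $PGDATA/base.
    """
    return PurePosixPath(pgdata) / "cstore_fdw" / str(database_oid) / str(relfilenode)


# Example with made-up OIDs:
print(managed_cstore_path("/var/lib/postgresql/data", 16384, 24576))
# prints /var/lib/postgresql/data/cstore_fdw/16384/24576
```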

pykello commented on July 17, 2024

  1. $PGDATA/cstore_fdw/{database-oid}/{relfilenode} sounds good to me.
  2. 'auto_delete' foreign table option also sounds good.

What should we choose as the default for auto_delete? Should this depend on whether the file is in PGDATA or not? I am still unsure whether this should be always "true", or "true" for PGDATA tables and "false" for external tables. @jberkus What do you think?

jberkus commented on July 17, 2024

I'm thinking it should default to "true" if not specified by the user. It's the behavior which users intuitively expect; while there are reasons to not want auto-delete to happen, those are fairly specific (and advanced) use cases.
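A rough sketch of resolving the option with that default. The function `resolve_auto_delete` and the accepted spellings ("true"/"false", case-insensitive) are assumptions for illustration, not the extension's actual option parser:

```python
from typing import Mapping


def resolve_auto_delete(options: Mapping[str, str]) -> bool:
    """Return the effective auto_delete setting for a foreign table.

    `options` stands in for the table's option list. When the user did
    not set auto_delete, the default proposed here is True, regardless
    of whether a filename was given.
    """
    raw = options.get("auto_delete")
    if raw is None:
        return True  # default suggested in this comment
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"invalid value for auto_delete: {raw!r}")
```

Keeping the setting independent of `filename` is what makes the two ideas orthogonal: a manually located table can still opt in to deletion, and the value is always visible to the user.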

samay-sharma commented on July 17, 2024

As an update on the status of this issue, we have started implementing automatically determined file names and plan to support it in the next release.

We will continue to support the filename parameter but will make it optional. If the filename is not specified, cstore_fdw will use the path $PGDATA/cstore_fdw/{database-oid}/{relfilenode} for storing its files (as per the above discussion).

ozgune commented on July 17, 2024

One decision we haven't yet made relates to who gets to own the cstore_fdw's files. With mongo_fdw and json_fdw, this was easy. The data file was external to the foreign table, so you could create multiple "views" on the data, but nobody really owned it.

With cstore_fdw, the foreign table also creates and therefore owns the data file. If we support two foreign tables on the same file, can either of them delete the underlying file?

I'm inclined to start simple here and follow an approach where we automatically delete the file on table drop. As we understand different use-cases more, we can then add options / dependency tracking to provide more advanced functionality.

jberkus commented on July 17, 2024

Well, this is why I wanted auto_delete to be a user-settable option. If the user knows that they want multiple "views", they set it to false.

jberkus commented on July 17, 2024

Actually, better to call it "auto_manage".

ozgune commented on July 17, 2024

Hey Josh, at a higher level, we're currently unsure about the ownership model due to the following.

When the user creates a cstore table, we will by default put the table's data in {relfilenode}. That feels pretty similar to regular PostgreSQL tables. If the user then wants to create a view on this data, this view will need to have the same schema as the table (the user will manually point their file path to the other table's relfilenode).

In a sense, once we start using relfilenodes / schemas, our cstore tables start feeling like PostgreSQL tables. (Once we have ALTER TABLE, [how] do we propagate schema changes from the owner table to the other ones? Is it safer to use PostgreSQL's dependency machinery for drops?)

I'm guessing we'll very likely add options to give more flexibility to the user soon, but we'd like to wait a bit to see how specific use-cases are going to evolve.

pykello commented on July 17, 2024

This issue is addressed in #29 which just got checked into the develop branch. I hope to merge the branch into the master branch in a month.

samay-sharma commented on July 17, 2024

This issue has been addressed with the cstore_fdw v1.1 release. DROP FOREIGN TABLE now automatically deletes the table and footer files.
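For illustration, the cleanup on drop amounts to removing two files. This is a minimal sketch, not the extension's code: `remove_cstore_files` is a hypothetical name, and the footer suffix is a parameter because the thread mentions both ".footer" and "_footer":

```python
import os
from typing import List


def remove_cstore_files(data_path: str, footer_suffix: str = ".footer") -> List[str]:
    """Delete a table's data file and its footer file.

    Returns the paths that were actually removed. Files that are already
    missing are skipped, so calling this twice is harmless.
    """
    removed = []
    for path in (data_path, data_path + footer_suffix):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # nothing to clean up for this path
    return removed
```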
