Comments (5)
Hey, thanks for the report.
First time I'm seeing this. Do you possibly have the JSON value that is causing this?
from octosql.
It's the same 10GB file as before https://files.pushshift.io/reddit/submissions/RS_2022-08.zst.
Not sure how to get the JSON value. In the code you should be able to add some logging in the stacktrace to print it but maybe with stuff like memory errors that is more difficult?
when I try to generate a trace it only creates a 16 byte file:
unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:02:41 profile: trace enabled, trace.out
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]
goroutine 7 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
/home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
/home/xk/octosql/datasources/json/workers.go:44 +0x45
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]
goroutine 8 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
/home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
/home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
/home/xk/octosql/datasources/json/workers.go:44 +0x45
▓▒▓░ lb:/ (main|●2?1) [141/SIGPIPE|2]🍪 ls trace.out
Permissions Size User Date Modified Git Name
.rw-r--r--@ 16 xk 15 Oct 13:02 -N trace.out
████ lb:/ (main|●2?1) 🌍 cat trace.out
go 1.11 trace⏎
from octosql.
Interesting... when I only do the first line then I get a more clear error message:
░▒░█ lb:/ (main|●2?1) 🍞 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1 | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:04:57 profile: trace enabled, trace.out
panic: invalid value type to print in CSV: List
goroutine 1 [running]:
github.com/cube2222/octosql/outputs/formats.FormatCSVValue(0xc0003ac480, {0x7, 0x0, 0x0, 0x0, {0x0, 0x0}, {0x0, 0x0, 0x0}, ...})
/home/xk/octosql/outputs/formats/csv_format.go:65 +0x785
github.com/cube2222/octosql/outputs/formats.(*CSVFormatter).Write(0xc00008b6a0, {0xc0009cc000, 0x5e, 0x0?})
/home/xk/octosql/outputs/formats/csv_format.go:42 +0x12c
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run.func1({{0xcdd960?, 0x0?}}, {{0xc0009cc000, 0x5e, 0x5e}, 0x0, {0x0, 0x0, 0x0}})
/home/xk/octosql/outputs/eager/eager.go:42 +0x54
github.com/cube2222/octosql/datasources/json.(*DatasourceExecuting).Run(0xc000497f50, {{0xf63870?, 0xc0003bd3b0?}, 0x0?}, 0xc0001abc50, 0x0?)
/home/xk/octosql/datasources/json/execution.go:127 +0xc86
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run(0xc0002091c0, {{0xf63870?, 0xc0003bd3b0?}, 0x0?})
/home/xk/octosql/outputs/eager/eager.go:39 +0x1e4
github.com/cube2222/octosql/cmd.glob..func4(0x157bd60, {0xc00009eb90, 0x1, 0x5?})
/home/xk/octosql/cmd/root.go:458 +0x3b54
github.com/spf13/cobra.(*Command).execute(0x157bd60, {0xc000032250, 0x5, 0x5})
/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x157bd60)
/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cube2222/octosql/cmd.Execute({0xf637c8?, 0xc000208dc0?})
/home/xk/octosql/cmd/root.go:471 +0x53
main.main()
/home/xk/octosql/main.go:24 +0xe8
█░█▒ lb:/ (main|●2?1) [141/SIGPIPE|0|2]🌎 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1
{"all_awardings":[],"allow_live_comments":false,"archived":false,"author":"Kresley","author_created_utc":1390870929,"author_flair_background_color":null,"author_flair_css_class":null,"author_flair_richtext":[],"author_flair_template_id":null,"author_flair_text":null,"author_flair_text_color":null,"author_flair_type":"text","author_fullname":"t2_f0c4k","author_patreon_flair":false,"author_premium":false,"awarders":[],"banned_by":null,"can_gild":true,"can_mod_post":false,"category":null,"content_categories":null,"contest_mode":false,"created_utc":1659312011,"discussion_type":null,"distinguished":"moderator","domain":"self.podcasts","edited":false,"event_end":1659326400.0,"event_is_live":false,"event_start":1659312000.0,"gilded":0,"gildings":{},"hidden":false,"hide_score":false,"id":"wcciq1","is_created_from_ads_ui":false,"is_crosspostable":true,"is_meta":false,"is_original_content":false,"is_reddit_media_domain":false,"is_robot_indexable":true,"is_self":true,"is_video":false,"link_flair_background_color":"#7e53c1","link_flair_css_class":"announce","link_flair_richtext":[],"link_flair_template_id":"efc6fb7e-75ae-11ea-baed-0e1ba7588035","link_flair_text":"Announcement","link_flair_text_color":"light","link_flair_type":"text","locked":false,"media":null,"media_embed":{},"media_only":false,"name":"t3_wcciq1","no_follow":false,"num_comments":8,"num_crossposts":1,"over_18":false,"parent_whitelist_status":"all_ads","permalink":"\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","pinned":false,"post_hint":"self","preview":{"enabled":false,"images":[{"id":"Qalc2eReon-KD6FX7LpDmEuzwjap9_yobQbqDYd2RoQ","resolutions":[{"height":108,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=108&crop=smart&auto=webp&s=01924a24a46b6e764caf58f310678ac69a6c6c8b","width":108},{"height":216,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=216&crop=smart&auto=webp&s=ae46ea72110dc7c060d102b496cfd457ecd6b950","width":216},{"height":320,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=320&crop=smart&auto=webp&s=f6b853bbf5f384084a45c8dde00d5d7b97356d69","width":320}],"source":{"height":500,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?auto=webp&s=dc05866f0c0b48980e43549ceb0955176f30e3c6","width":500},"variants":{}}]},"pwls":6,"quarantine":false,"removed_by":null,"removed_by_category":null,"retrieved_on":1660025808,"score":16,"secure_media":null,"secure_media_embed":{},"selftext":"We're lucky to have an attorney who is *also* a veteran podcaster kind enough to join here for a live Reddit Talk about recent US Supreme Court decisions, Justices and perhaps some other federal law decisions you may (or may not have) heard about in the news lately. He may also have some friends and guests join. \n\n\nCharles Star is an attorney in Brooklyn, the host of [Hostile Witness](https:\/\/www.soundcloud.com\/hwpod), a panelist on [ALAB Series](https:\/\/www.alabseries.com\/) and the guy who did most of the talking on the late [Mic Dicta](https:\/\/soundcloud.com\/micdicta). He tweets at [@ugarles](https:\/\/twitter.com\/Ugarles) (and has a reddit account he almost never uses). He writes about legal issues from time to time and has bylines in Vice, Slate and a bunch of places that no longer exist. (He'd probably appear on your podcast.) \n\n\n**What is Reddit Talk?**\n\nTalk allows us to host interactive sessions with podcasters and users of the community can join in and interact with the speakers in real time from either new.reddit on desktop or though (an official) Reddit app. A Talk can handle up to 30 different speakers and up to 100k listeners in the audience can tune in at the same time.\n\nYou can learn more about Reddit Talk from the user guide [here](https:\/\/reddithelp.com\/hc\/en-us\/articles\/4404349195284-What-is-Reddit-Talk-) .\n\n**Ask Away!**\n\nIf you can't make it live at that time, you may leave any questions you might have as a comment on this announcement post and we'll try to pick some to read during the Talk. A recording of the Talk should continue to be available here after it concludes.\n\nYou can tap on **Follow** on this post if you'd like to receive a reminder notification when the Talk will begin, if you'd like to come ask a question live.","send_replies":true,"spoiler":false,"stickied":false,"subreddit":"podcasts","subreddit_id":"t5_2qh1d","subreddit_name_prefixed":"r\/podcasts","subreddit_subscribers":2322456,"subreddit_type":"public","suggested_sort":null,"thumbnail":"self","thumbnail_height":null,"thumbnail_width":null,"title":"Have questions about recent Supreme Court decisions? So do we! Join us Sunday @ 8:00 p.m. EST with a lawyer\/podcaster who can help answer how we got here.","top_awarded_type":null,"total_awards_received":0,"treatment_tags":[],"upvote_ratio":0.71,"url":"https:\/\/www.reddit.com\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","view_count":null,"whitelist_status":"all_ads","wls":6}
from octosql.
That makes sense to me. Maybe instead of panic
it should just be Error
:
Error: typecheck error: invalid type for CSV: List
It works with no problems when specifying all columns except the list and object type columns. 38 million rows in 500 seconds.
unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT author_flair_css_class,allow_live_co
mments,archived,author,author_patreon_flair,author_created_utc,author_flair_background_color,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullnam
e,author_premium,banned_by,can_gild,can_mod_post,category,content_categories,contest_mode,created_utc,discussion_type,distinguished,domain,edited,event_end,event_is_live,event_start,gilded,hid
den,hide_score,is_reddit_media_domain,id,is_created_from_ads_ui,is_crosspostable,is_meta,is_original_content,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_clas
s,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media,media_only,name,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinn
ed,post_hint,pwls,quarantine,removed_by,removed_by_category,retrieved_on,score,secure_media,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_name_prefixed,subreddit_subscribers,s
ubreddit_type,suggested_sort,thumbnail_height,thumbnail,thumbnail_width,title,top_awarded_type,total_awards_received,upvote_ratio,url,view_count,whitelist_status,wls FROM stdin.json" -o csv >
RS_2022-08.csv
Maybe the only thing to fix here is to bubble up the error somehow so that it does not show up as "invalid memory address or nil pointer dereference"
from octosql.
Thanks for the writeup!
I'll admit I have no idea why you got the nil dereference instead of the List panic there, but I'll look into it (and probably, as you suggested, turn it into a normal error, to avoid further similar issues).
Cheers!
from octosql.
Related Issues (20)
- Add REPL HOT 1
- Add Statistical Window Functions HOT 1
- Refactor to support Apache Arrow HOT 3
- queries using filenames that start with non-alpha characters fail HOT 4
- CSV parameter to secify separator
- Support HAVING of GROUP BY HOT 1
- Offset clause in csv query is ignored HOT 1
- File line by line HOT 4
- result of StreamJoin or OuterJoin is not equal with database HOT 4
- Four arithmetic operations between aggregation functions are not supported HOT 12
- GroupBy dimension not work, need replace with distinct HOT 3
- unsupported expression CaseExpr HOT 2
- Add charindex(String, String) function HOT 3
- Add substring_index(String, String, Int) function HOT 2
- Panic in parquet query
- Can this project be used as a Go dependency library?
- Decoding Error when Reading Parquet File - "RLE: Decoded Run-Length Block"
- Add Int64 type
- 't' and 'f' strings in csv are considered as boolean
- Complex CTE usage
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from octosql.