Comments (4)
I also tested the diagnose_category function on Linux with R 3.6 and got the same issue.
from dlookr.
Thank you. Alain
The diagnose_category function returns a tbl_df object. Unlike a data.frame, a tbl_df prints only the first 10 rows of a long result by default.
You can query the results of all categorical variables in several ways:
> library(dlookr)
> library(nycflights13)
>
> str(flights)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 336776 obs. of 19 variables:
$ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
$ month : int 1 1 1 1 1 1 1 1 1 1 ...
$ day : int 1 1 1 1 1 1 1 1 1 1 ...
$ dep_time : int 517 533 542 544 554 554 555 557 557 558 ...
$ sched_dep_time: int 515 529 540 545 600 558 600 600 600 600 ...
$ dep_delay : num 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
$ arr_time : int 830 850 923 1004 812 740 913 709 838 753 ...
$ sched_arr_time: int 819 830 850 1022 837 728 854 723 846 745 ...
$ arr_delay : num 11 20 33 -18 -25 12 19 -14 -8 8 ...
$ carrier : chr "UA" "UA" "AA" "B6" ...
$ flight : int 1545 1714 1141 725 461 1696 507 5708 79 301 ...
$ tailnum : chr "N14228" "N24211" "N619AA" "N804JB" ...
$ origin : chr "EWR" "LGA" "JFK" "JFK" ...
$ dest : chr "IAH" "IAH" "MIA" "BQN" ...
$ air_time : num 227 227 160 183 116 150 158 53 140 138 ...
$ distance : num 1400 1416 1089 1576 762 ...
$ hour : num 5 5 5 5 6 5 6 6 6 6 ...
$ minute : num 15 29 40 45 0 58 0 0 0 0 ...
$ time_hour : POSIXct, format: "2013-01-01 05:00:00" "2013-01-01 05:00:00" ...
>
> # default printing: only the first 10 rows (the first variable)
> diagnose_category(flights)
# A tibble: 33 x 6
variables levels N freq ratio rank
1 carrier UA 336776 58665 17.4 1
2 carrier B6 336776 54635 16.2 2
3 carrier EV 336776 54173 16.1 3
4 carrier DL 336776 48110 14.3 4
5 carrier AA 336776 32729 9.72 5
6 carrier MQ 336776 26397 7.84 6
7 carrier US 336776 20536 6.10 7
8 carrier 9E 336776 18460 5.48 8
9 carrier WN 336776 12275 3.64 9
10 carrier VX 336776 5162 1.53 10
# … with 23 more rows
>
> # all rows - all variables, as a tbl_df
> diagnose_category(flights) %>%
+ print(n = 40)
# A tibble: 33 x 6
variables levels N freq ratio rank
1 carrier UA 336776 58665 17.4 1
2 carrier B6 336776 54635 16.2 2
3 carrier EV 336776 54173 16.1 3
4 carrier DL 336776 48110 14.3 4
5 carrier AA 336776 32729 9.72 5
6 carrier MQ 336776 26397 7.84 6
7 carrier US 336776 20536 6.10 7
8 carrier 9E 336776 18460 5.48 8
9 carrier WN 336776 12275 3.64 9
10 carrier VX 336776 5162 1.53 10
11 tailnum NA 336776 2512 0.746 1
12 tailnum N725MQ 336776 575 0.171 2
13 tailnum N722MQ 336776 513 0.152 3
14 tailnum N723MQ 336776 507 0.151 4
15 tailnum N711MQ 336776 486 0.144 5
16 tailnum N713MQ 336776 483 0.143 6
17 tailnum N258JB 336776 427 0.127 7
18 tailnum N298JB 336776 407 0.121 8
19 tailnum N353JB 336776 404 0.120 9
20 tailnum N351JB 336776 402 0.119 10
21 origin EWR 336776 120835 35.9 1
22 origin JFK 336776 111279 33.0 2
23 origin LGA 336776 104662 31.1 3
24 dest ORD 336776 17283 5.13 1
25 dest ATL 336776 17215 5.11 2
26 dest LAX 336776 16174 4.80 3
27 dest BOS 336776 15508 4.60 4
28 dest MCO 336776 14082 4.18 5
29 dest CLT 336776 14064 4.18 6
30 dest SFO 336776 13331 3.96 7
31 dest FLL 336776 12055 3.58 8
32 dest MIA 336776 11728 3.48 9
33 dest DCA 336776 9705 2.88 10
>
> # all rows - all variables, as a data.frame
> diagnose_category(flights) %>%
+ data.frame()
variables levels N freq ratio rank
1 carrier UA 336776 58665 17.4195905 1
2 carrier B6 336776 54635 16.2229494 2
3 carrier EV 336776 54173 16.0857662 3
4 carrier DL 336776 48110 14.2854598 4
5 carrier AA 336776 32729 9.7183291 5
6 carrier MQ 336776 26397 7.8381476 6
7 carrier US 336776 20536 6.0978217 7
8 carrier 9E 336776 18460 5.4813882 8
9 carrier WN 336776 12275 3.6448559 9
10 carrier VX 336776 5162 1.5327696 10
11 tailnum <NA> 336776 2512 0.7458964 1
12 tailnum N725MQ 336776 575 0.1707366 2
13 tailnum N722MQ 336776 513 0.1523268 3
14 tailnum N723MQ 336776 507 0.1505452 4
15 tailnum N711MQ 336776 486 0.1443096 5
16 tailnum N713MQ 336776 483 0.1434188 6
17 tailnum N258JB 336776 427 0.1267905 7
18 tailnum N298JB 336776 407 0.1208518 8
19 tailnum N353JB 336776 404 0.1199610 9
20 tailnum N351JB 336776 402 0.1193672 10
21 origin EWR 336776 120835 35.8799321 1
22 origin JFK 336776 111279 33.0424377 2
23 origin LGA 336776 104662 31.0776302 3
24 dest ORD 336776 17283 5.1318978 1
25 dest ATL 336776 17215 5.1117063 2
26 dest LAX 336776 16174 4.8025988 3
27 dest BOS 336776 15508 4.6048412 4
28 dest MCO 336776 14082 4.1814144 5
29 dest CLT 336776 14064 4.1760696 6
30 dest SFO 336776 13331 3.9584175 7
31 dest FLL 336776 12055 3.5795306 8
32 dest MIA 336776 11728 3.4824334 9
33 dest DCA 336776 9705 2.8817374 10
>
> # top 3 levels for each categorical variable
> diagnose_category(flights, top = 3)
# A tibble: 12 x 6
variables levels N freq ratio rank
1 carrier UA 336776 58665 17.4 1
2 carrier B6 336776 54635 16.2 2
3 carrier EV 336776 54173 16.1 3
4 tailnum NA 336776 2512 0.746 1
5 tailnum N725MQ 336776 575 0.171 2
6 tailnum N722MQ 336776 513 0.152 3
7 origin EWR 336776 120835 35.9 1
8 origin JFK 336776 111279 33.0 2
9 origin LGA 336776 104662 31.1 3
10 dest ORD 336776 17283 5.13 1
11 dest ATL 336776 17215 5.11 2
12 dest LAX 336776 16174 4.80 3
>
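When the number of rows is not known in advance, passing `n = Inf` to the tibble print method avoids guessing a value such as 40. The sketch below is only an illustration: it assumes the tibble package is installed, and `res` is a made-up stand-in for the 33-row result above so that it runs without dlookr or nycflights13.

```r
library(tibble)

# Hypothetical stand-in for the 33-row result of diagnose_category(flights);
# the values are invented for illustration only.
res <- tibble(
  variables = rep("carrier", 33),
  levels    = sprintf("L%02d", 1:33),
  freq      = 33:1
)

# n = Inf asks the tibble print method to show every row
print(res, n = Inf)

# Converting to a plain data.frame drops the tibble classes,
# so base R printing (all rows) applies from then on
df <- as.data.frame(res)
class(df)
```

The same `print(n = Inf)` call should work at the end of a `%>%` pipeline in place of `print(n = 40)`.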
Perfectly understood! I had expected the result of diagnose_category to be summarized like that of diagnose_numeric, but the way it works now clearly makes more sense.
Categorical and numeric data are aggregated differently.
I will consider what additional information I should provide.
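To make the difference concrete, here is a rough base R sketch. It is not dlookr's implementation: the toy data are made up, the categorical columns only mirror the names in the output above, and the numeric summary columns are chosen for illustration. The point is the shape: a categorical variable yields one row per level, while a numeric variable yields a single row.

```r
# Toy data, invented for illustration
df <- data.frame(
  carrier   = c("UA", "UA", "B6", "AA", "UA", "B6"),
  dep_delay = c(2, 4, 2, -1, -6, -4),
  stringsAsFactors = FALSE
)

# Categorical: count each level, most frequent first (one row per level)
freq <- sort(table(df$carrier), decreasing = TRUE)
cat_summary <- data.frame(
  variables = "carrier",
  levels    = names(freq),
  N         = nrow(df),
  freq      = as.integer(freq),
  ratio     = as.integer(freq) / nrow(df) * 100,
  rank      = rank(-as.integer(freq), ties.method = "min")
)

# Numeric: collapse the whole column into one row of statistics
num_summary <- data.frame(
  variables = "dep_delay",
  min       = min(df$dep_delay),
  mean      = mean(df$dep_delay),
  median    = median(df$dep_delay),
  max       = max(df$dep_delay)
)

print(cat_summary)
print(num_summary)
```

With many categorical variables the per-level rows add up quickly, which is why the tibble output gets truncated on screen.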