
lc-spreadsheets's People

Contributors

annajiat, avolkov, doujoudc, duchessanne, elichad, elk2klein, emcaulay, erikamias, erinbecker, fmichonneau, froggleston, jcoliver, jcszamosi, jezcope, jt14den, kenlacey, morskyjezek, mpfl, ndporter, niamhwallace, philreeddata, scottcpeterson, serahkiburu, shlake, tobyhodges, tpatwood, yvonnemery, zkamvar

lc-spreadsheets's Issues

Episode 0 - Intro Title Change

@shlake has made these changes on her branch. When issue is agreed to be "fixed", then will ask for PR.

====================================

This lesson isn't about data "wrangling". It is really about data "organization" and best practices. Propose Title change from
Using spreadsheet programs for data wrangling
TO
Using spreadsheet programs for data organization

Episode 4: Input Message / Error Alert images switched

At least on a Mac - The "Error Alert" tab contains the text that is shown when an incorrect value is entered.

The "Input Message" is a popup message that appears in the cell that has "Data Validation" set - notifying what the correct values for entry are.

Dates as data - text is outdated

"If Excel was to be believed, this person had been collecting bugs IN THE FUTURE. Now, we have no doubt this person is highly capable, but I believe time travel was beyond even his grasp."
--The most recent example in the table is 2017. Include a new example table, maybe with a date of Jul-31 - the "How it was interpreted" column will be 1-Jul-31 so it'll be a few years before this is outdated again. :)

Episode 6: Data Format

Things to consider:

1 - keep as "episode" and update episode page that this is "Reference"
2 - Move out to Discussion or the Reference section and include brief paragraph on Episode 5

Link to wrong data file

In the Caveats of popular data and file formats part of the lesson (https://librarycarpentry.org/lc-spreadsheets/06-data-formats-caveats/index.html) under the heading Dealing with commas as part of data values in .csv files there is a link to an example data file (https://ndownloader.figshare.com/files/3299483). This linked file is different from the excerpt described in this section.

I would suggest removing the link. The lesson could just work with the data given in text form in this section.
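
For instructors who want to demonstrate the comma problem without the linked file, here is a minimal sketch using Python's standard `csv` module. The row contents are hypothetical and are not taken from the lesson's data file. The sketch shows that a field containing a comma is wrapped in quotes on output, and that a quote-aware reader recovers the original fields where a naive split does not.

```python
import csv
import io

# Hypothetical row: the second field contains a comma as part of its value.
row = ["2017-03-01", "Data Carpentry, Library edition", "25"]

# csv.writer quotes any field that contains the delimiter (QUOTE_MINIMAL is the default).
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# A naive split on commas cuts the quoted value in two, giving four fields.
naive_fields = line.strip().split(",")

# csv.reader honours the quotes and recovers the original three fields.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(line.strip())
print(len(naive_fields), len(parsed_fields))
```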

"Let’s try with a simple challenge."

I noticed that Dates as data includes this line: "Let’s try with a simple challenge."

Can we eliminate the word "simple"?

Screen shots for Windows

Submitted by email for Instructor Training checkout:

In reviewing some of the Library Carpentry lessons as part of the check-out process for the instructor training I noticed an issue.

In the following lesson:
https://librarycarpentry.org/lc-spreadsheets/05-exporting-data/index.html

and this may be the case in others, it would be useful to include screen snapshots of Windows version of software. I think it would be especially useful in these lessons where Excel is being used, the Mac and Windows interface can be different (depending on what you're doing). Also, my experience is that a good number of people taking Library Carpentry are developing their computer skills and it is a stretch for some to figure out the difference between the Mac and Windows interfaces.

Understanding that the page is already loaded with information, I'm not sure where this would fit on the page. It may not be necessary for all screen snapshots, just for the ones where the Windows and Mac versions differ.

Another alternative is that when teaching the lessons live, suggest that the instructors have access to and are able to display different software versions (Mac v. Windows).

I hope this makes sense. It's based off of my experience working with people who are novices.

Sorting Question: Puzzled about one of the questions

Hi, we're planning to teach this on Tuesday, but there's an exercise I'm not sure if I understand myself under Sorting :O

The question is "When you do this sort, do you notice anything strange?", but when I sort the date column, it seems to sort perfectly, I don't see anything strange. Is it perhaps just supposed to be strange that it sorts perfectly without showing the year..?

Or could it be that the exercise does not match the current version of the spreadsheet? The screenshot does not match the current spreadsheet at all.

Quality Control exercises a bit unclear

The Quality Control section contains a couple of points of confusion. Under the Sorting exercise, it says to sort the date from smallest to largest, but the image shows a different variable (wgt). Additionally, for the Conditional Formatting section, it would be helpful to say which tab Format is under and to clarify the rule for the conditional formatting. However, this may be down to differences in MS Excel on macOS, which is what I was using for the tutorial.

Caveats of popular data and file formats - broken links

In the "Caveats of popular data and file formats" episode there are 3 broken links:

  1. Under "Commas as part of data values in *.csv files"
    In the previous lesson we discussed...
    The link for previous lesson is broken.
  2. Under "Dealing with commas as part of data values in .csv files"
    This example data file applies this rule...
    The link for example data file is broken.
  3. Under "Dealing with commas as part of data values in *.csv files"
    Open the species.csv file in Excel (or Calc in Libre Office).
    The link for species.csv is broken.

Conflicting recommendations about text documentation

The recommendations pertaining to text documentation about the data are inconsistent.

About "Keeping track of your analyses," module 2, Formatting data tables in Spreadsheets states:

[K]eep track of the steps you took in your clean up or analysis. You should track these steps as a scientist would each step in an experiment. You can do this in another text file, or a good option is to create a new tab in your spreadsheet with your notes. This way the notes and data stay together.

About "Inclusion of metadata," module 3, Formatting problems states:

Unlike a table in a paper or a supplemental file, metadata (in the form of legends) should not be included in a data file since this information is not data, and including it can disrupt how computer programs interpret your data file. Rather, metadata should be stored as a separate file in the same directory as your data file, preferably in plain text format with a name that clearly associates it with your data file.

While not referring to exactly the same documentation, both pieces of advice do refer to text documentation about the dataset rather than the data themselves. It seems like best practice would be to store documentation about the dataset in a single location.

Dead links in 03-dates-as-data

There are broken links in 'Preferred date format' section under "Notes" subsection:

  • 1900 date system
  • dates must be checked for accuracy when exporting data from Excel

I will create pull request fixing this.

Training attendance data includes RANDBETWEEN formula

In the second sheet, "2017", column J includes a formula which looks to be used to generate data. It fills the value of the cell with a random number between 15 and the number of registrants (column I). I couldn't find a place in the lesson where this formula is used, and in copy/pasting, it seemed to just cause confusion in the workshop. Perhaps the spreadsheet could be updated so numerical values, rather than a formula, are present in cells I4:I12 of sheet "2017"?

Broken links in Contributing.md

Here is a set of broken links on this page:

  1. 'code of conduct' - under Contributor Agreement section
  2. 'creating an issue' - under How to contribute section
  3. 'https://github.com/swcarpentry/FIXME' - both the links referencing this website in the first numbered point under the Where to Contribute section
  4. 'bug reports' - under What to Contribute section
  5. 'list of issue for this repository' - under What to Contribute section
  6. 'master repository' - Both the references under the Using Github section
  7. 'discussion mailing list' - Not sure if it is broken, but it goes to a blank page with 'No route defined for this request...' in an empty box, and clicking the Back home box didn't reroute.

Change Formatting Color on Data File

The formatting color is "red", which may not be visible to some people. I suggest changing the color to "grey". I noticed this is what the Ecology data file now has as a formatting color.

Dates as data - adding ISO standard

In the "storing dates as a single string" section, I would suggest modifying the last sentence as follows:

"You could also use the 'YYYYMMDD' format. YYYYMMDD is an international standard (ISO 8601) and can eliminate ambiguity when sharing spreadsheets. Such strings will be correctly sorted in ascending or descending order, and by knowing the format they can then be correctly processed by the receiving software."

The word ascending is also currently misspelled (missing the "i").
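
The sorting claim can be illustrated with a short Python sketch. The three dates are hypothetical and written day-first, as they might be typed into a spreadsheet. Sorted as text they come out in the wrong order, while their YYYYMMDD equivalents sort chronologically.

```python
from datetime import datetime

# Hypothetical dates written day-first (DD/MM/YYYY).
day_first = ["02/01/2018", "15/03/2017", "01/12/2017"]

# Sorting the raw strings compares the day digits first, so the result is not chronological.
text_sorted = sorted(day_first)

# Rewrite each date as an eight-digit YYYYMMDD string.
iso_basic = [datetime.strptime(d, "%d/%m/%Y").strftime("%Y%m%d") for d in day_first]

# Plain text sorting of YYYYMMDD strings matches chronological order.
iso_sorted = sorted(iso_basic)

print(text_sorted)
print(iso_sorted)
```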

Change Delimiter on PGR/PDRA/Other Column

When trying to use the Text-to-Columns feature to separate the PGR/PDRA/Other column into three new columns, Excel gets confused and thinks the "/" is part of a date (and we know that dates are neither easy nor fully understood in Excel).
Suggesting that a new delimiter, possibly "|", be used.
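
As a rough illustration of what the split should produce, here is a minimal Python sketch. The function name `split_attendees` and the sample cell values are hypothetical. It assumes each cell holds exactly three whole numbers in the order PGR, PDRA, Other.

```python
def split_attendees(cell, delimiter="|"):
    """Split a combined PGR/PDRA/Other cell into three integer counts."""
    parts = cell.split(delimiter)
    if len(parts) != 3:
        raise ValueError(f"expected 3 values, got {len(parts)}: {cell!r}")
    pgr, pdra, other = (int(part) for part in parts)
    return {"PGR": pgr, "PDRA": pdra, "Other": other}


# The same function handles either delimiter; only the spreadsheet cares about "/".
print(split_attendees("3|2|5"))
print(split_attendees("3/2/5", delimiter="/"))
```

Raising an error on a cell without exactly three parts is a design choice here, so that malformed rows surface instead of being silently misaligned.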

June 2019 Lesson Release checklist

If your Maintainer team has decided not to participate in the June 2019 lesson release, please close this issue.

To have this lesson included in the 18 June 2019 release, please confirm that the following items are true:

  • Example code chunks run as expected
  • Challenges / exercises run as expected
  • Challenge / exercise solutions are correct
  • Call out boxes (exercises, discussions, tips, etc) render correctly
  • A schedule appears on the lesson homepage (e.g. not “00:00”)
  • Each episode includes learning objectives
  • Each episode includes questions
  • Each episode includes key points
  • Setup instructions are up-to-date, correct, clear, and complete
  • File structure is clean (e.g. delete deprecated files, ensure filenames are consistent)
  • Some Instructor notes are provided
  • Lesson links work as expected

When all checkboxes above are completed, this lesson will be added to the 18 June lesson release. Please leave a comment on carpentries/lesson-infrastructure#26 or contact Erin Becker with questions ([email protected]).

New maintainers needed!

As we look to move Library Carpentry to becoming a formal part of the Carpentries community, we need to make sure that each of our lessons are properly supported, including 3 maintainers for each one. It will also be obvious to anyone paying attention that I haven't had much time to work on this lesson recently, and that will be the case for a while, so I'm looking to step down from that responsibility for now.

I'm particularly pinging the following people who have contributed issues and pull requests to this repository but anyone with an understanding of the content and the library world would be welcome.

@duchessanne @cmacdonell @niamhwallace @shlake @yvonnemery @libcce @fmichonneau @stephlopezq @StephanJanosch @fionajones @jcoliver

Why not use KNIME for data manipulation

Hi,

did you ever consider using KNIME as a tool for data manipulation? I think the approach via KNIME is way more reproducible and also visually more appealing.

Just my 2c,
Stephan

Suggested new title for "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets"

Hi there! Thank you for all the work you do to make this resource available to everyone!
This is my first issue submission as part of the Carpentries trainer check-out process.

For the episode "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets", I have a few suggestions. (Edit: At the guidance of the Carpentries onboarding team I'm breaking this up into separate issues 1/2/2020.)

Episode title:

  • My first suggestion would be to simplify the title. Compared to the others, this one runs long. Perhaps eliminate "basic" for the same reason we avoid "just", "simple," etc. Data manipulation doesn't add much, as the lesson itself is divided by the two sections, QA and QC. I think reinforcing the language in the title and headers helps learners.
  • I would suggest: "Ensuring Quality Assurance and Quality Control."

Suggestions for Conditional Formatting section of "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets"

For the episode "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets", I would suggest these edits to the Conditional Formatting section:

  • This subsection opens with, "Use with caution!" but does not elaborate why. I would suggest adding context on which situations this is an appropriate tool, and which it is not.
  • My own experience teaching with Excel/Calc/Google Sheets is that if learners know this tool, they use it a lot. They would benefit from the "why": it's a great tool for in-process diagnosis, but it isn't a long-term solution, because certain colors cannot be seen by everyone, etc.
  • Some of the "why" is already in the exercise section, so perhaps this could be moved up into the opening paragraph: "It is nice to do be able to do these scans in spreadsheets, but we also can do these checks in a programming language like Python or R, or in OpenRefine or SQL."

Thanks for your consideration!
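
To make the "checks in a programming language" point concrete, here is a minimal Python sketch of the kind of scan that conditional formatting performs visually. The function name `flag_out_of_range`, the sample values and the 0 to 100 range are all hypothetical. It reports which rows hold values that are missing, non-numeric, or outside the expected range.

```python
def flag_out_of_range(values, low, high):
    """Return (row_index, value) pairs that are missing, non-numeric, or outside [low, high]."""
    flagged = []
    for index, value in enumerate(values):
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Empty cells and text such as "n/a" cannot be read as numbers.
            flagged.append((index, value))
            continue
        if not low <= number <= high:
            flagged.append((index, value))
    return flagged


# Hypothetical attendance column read from a spreadsheet as text.
attendance = ["12", "", "300", "n/a", "45"]
print(flag_out_of_range(attendance, 0, 100))
```

Unlike a cell colour, the returned list can be saved alongside the data as a record of what was checked.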

Sorting screenshot does not match text in Quality Control episode

In the sorting section, the exercise is:

Let’s try this with the Date tab in our messy spreadsheet. Go to that tab. Select Data then select Sort

Sort by date in the order Smallest to Largest

The screenshot though, shows sorting on Column "wgt". Ideally the screenshot would be updated with the "Date" column.

Dates as Data - confusing exercise

_This activity is confusing - and maybe distracting from the main learning objective of formatting dates correctly with month, day, and year in separate columns, especially because Excel functions aren't addressed in the earlier lessons.

I would put this text above the exercise:_
Preferred date format
Instead it is much safer to store dates with MONTH, DAY and YEAR in separate columns or as YEAR and DAY-OF-YEAR in separate columns.

And maybe add a faded example, or slightly more detailed example syntax:


Current time and date are best retrieved using the functions NOW(), which returns the current date and time, and TODAY(), which returns the current date. The results will be formatted according to your computer’s settings.

  • In an empty cell, try to extract the year, month and day from the current date and time string returned by the NOW() function.
    --=YEAR(NOW())
    --=MONTH( ____ ())
    --= ____ ( ____ ())

  • Calculate the current time using NOW()-TODAY(). Note that the default output will be a decimal, not in the hh:mm:ss format.

  • Try to extract the hour, minute and second from the current time using functions HOUR(), MINUTE() and SECOND().

  • Press fn + F9 to force the spreadsheet to recalculate the NOW() function, and check that it has been updated.


And then keep the note about Excel and pre-1900 dates where it is.
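
For instructors who want to check the expected results of this exercise outside a spreadsheet, here is a minimal Python sketch of the same steps. The function names are hypothetical, and a fixed timestamp stands in for NOW() so that the output is reproducible. It assumes a timezone-naive `datetime`.

```python
from datetime import datetime, time


def date_parts(moment):
    """Analogue of YEAR(), MONTH(), DAY(), HOUR(), MINUTE() and SECOND() applied to one timestamp."""
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "minute": moment.minute,
        "second": moment.second,
    }


def fraction_of_day(moment):
    """Analogue of NOW()-TODAY(): the elapsed part of the day as a decimal between 0 and 1."""
    midnight = datetime.combine(moment.date(), time.min)
    return (moment - midnight).total_seconds() / 86400


# Fixed stand-in for NOW(); replace with datetime.now() to use the current time.
moment = datetime(2020, 1, 2, 18, 0, 0)
print(date_parts(moment))
print(fraction_of_day(moment))
```

The decimal returned by `fraction_of_day` corresponds to the unformatted output the exercise warns about: 18:00 is three quarters of the way through the day.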

Episode 2: More Than One piece of Info

Used the example of type of workshop attendees in place of multiple instructors.

Will now say this:
Example:
One table recorded attendance by the different types of attendees. This table recorded number of attendees of different types: post-graduate researcher (PGR), post-doctoral research associate (PDRA), and other.

Solution:
Never include more than one piece of information in a cell. Design your data sheet to include a column for each type of attendee, if this information is important to collect, rather than just a total number.

Episode 1: Added additional key points

The DC ecology lesson (that this LC lesson is based on) has the following key points, which should be added to this LC version.

  • Never modify your raw data. Always make a copy before making any changes.

  • Keep all of the steps you take to clean your data in a plain text file.

Add Libre office link to discussion on freezing rows

Since the tutorial mentions LibreOffice as one of the tools to use in the workshop, I added a link to the LibreOffice documentation on how to freeze rows, so that anyone using LibreOffice or OpenOffice will be able to keep up.

I'll open a pull request for this

Episode 4: New Title

Propose to not include the last part of the current title:

From: Basic quality assurance and control, and data manipulation in spreadsheets
To: Basic quality assurance and control

transforming serial number to dates in Excel

Submission for instructor training checkout:

Every time I talk to a student or a class about dates in Excel and explain how dates are stored as a number, it is inevitable that someone will ask, "What do you do if you want to get something that looks like a date [mm/dd/yyyy] and no matter what you do with formatting you still have a serial number?"
Librarians often see spreadsheets that have suffered many transformations whether it is from a person’s formatting and manipulation or when data is downloaded from older database programs or data sources.
In the “Dates as data” section, I would suggest adding a section on formatting dates in the other direction.
After the section “Dates stored as integers”, I would add a small section on “Transforming serial numbers to dates”.
Options to transform serial numbers to dates:

  1. Always try to reformat first:
    a. Windows Excel: Home -> Format -> Format cells -> Date OR Home -> Number (on the ribbon) and choose a different format
    b. LibreOffice: Format->Number format->Date
    From here try different formats
  2. Text function: This function will convert a value to text in a number format:

image
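
When reformatting inside the spreadsheet does not help, the conversion can also be checked outside it. Below is a minimal Python sketch, assuming the workbook uses the 1900 date system (not the 1904 system). The function name `serial_to_date` is hypothetical. The epoch of 1899-12-30 compensates for Excel treating 1900 as a leap year, which is why the sketch refuses serial numbers below 61.

```python
from datetime import date, timedelta

# Day zero for the 1900 date system once the fictitious 1900-02-29 is accounted for.
EXCEL_1900_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert a spreadsheet serial number (1900 date system) to a calendar date."""
    if serial < 61:
        # Serials 1 to 60 fall before 1 March 1900, where Excel's leap-year quirk applies.
        raise ValueError("serial numbers below 61 are not handled by this sketch")
    # int() drops any time-of-day fraction stored after the decimal point.
    return EXCEL_1900_EPOCH + timedelta(days=int(serial))


converted = serial_to_date(43831)
print(converted.isoformat())
print(converted.strftime("%m/%d/%Y"))
```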

Formatting Problems Lesson Needs New Screenshots

This lesson uses screen shots from the Data Carpentry sample file.

For this Library Carpentry lesson, the sample file is "training_attendance", so this lesson needs equivalent screen shots from the training_attendance file.

Suggestions for Quality Control section of "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets"

For the episode "Basic Quality Assurance and Control, and Data Manipulation in Spreadsheets", I would suggest these edits to the Quality Control section:

  • The tip about version control and the readme paragraph below are related, but it's not clear whether these are theoretical concepts to be explained to learners, or whether they are part of the hands-on process.
  • If time allows, I would suggest incorporating hands-on steps because then learners have practice with the idea presented. At the very least, an example in the narrative would help illustrate each of these steps, maybe broken into a numbered list because while brief, these paragraphs pack a lot of information.
  • Stylistically, I would suggest making these the same text format; one as a tip and one as a free-floating paragraph is a little disjointed.

Thank you for your consideration!
