Current maintainers of this lesson are:
- Chris Endemann
- Ann Hanlon
- Karl Holten
- Mariah A. Knowles
- Jennifer Patino
A list of contributors to the lesson can be found in AUTHORS
To cite this lesson, please consult CITATION
Text Analysis with Python
Home Page: https://carpentries-incubator.github.io/python-text-analysis/
License: Other
The libraries episode currently uses the Python math library as an example - is there a better text-based library to use for this lesson?
Should we? and where?
Maybe using a dictionary/loop example to write out a file?
We might be able to steal some of this from here
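As a starting point for discussion, the dictionary/loop idea for writing out a file might look something like the sketch below. The word counts, the file name, and the tab-separated layout are all made up for illustration, and the file goes into a temporary directory so the example does not touch the lesson's data.

```python
import tempfile
from pathlib import Path

# Hypothetical word counts; these values are invented for the example.
word_counts = {"whale": 3, "sea": 5, "ship": 2}

# Write one "word<TAB>count" line per dictionary entry.
output_path = Path(tempfile.mkdtemp()) / "word_counts.txt"
with open(output_path, "w", encoding="utf-8") as outfile:
    for word, count in word_counts.items():
        outfile.write(f"{word}\t{count}\n")

# Read the file back to confirm what was written.
written_lines = output_path.read_text(encoding="utf-8").splitlines()
print(written_lines)
```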
Maybe move up Lists/loops/conditionals earlier and move libraries later.
Hi,
If possible, I would like to help in developing the lesson. How may I proceed?
Best
Annajiat
As I'm creating code examples for the various functions in my episode, it occurred to me that we may want to align the themes of the examples we are using throughout (either in process or at the end as we are doing the final edit).
I like to use silly examples (e.g. favoritePets = ['cat', 'dog', 'bird']) when I teach, but my impression is that Carpentries lessons tend to use more serious examples. In fact, everyone might end up using different themes as they write their episodes, and the result could be a bit of a crazy quilt.
Should I just proceed with my silly examples, and we can modify them later to bring them into alignment with the rest of the episodes?
Move the episodes 1-13 from SWC Gapminder to this lesson.
Will need editing after move
I could use some opinions on this. I was thinking we should maybe add a section on using dictionaries, since they are such a useful data structure. My concern is that we may not need them for the final part of the lesson that uses NLTK, so maybe they aren't worth teaching?
Thoughts from others?
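For reference, one text-flavored use of a dictionary that such a section could cover is counting word frequencies. This is only a sketch with an invented sentence, not a proposal for the episode's final wording.

```python
# Hypothetical sentence used only for illustration.
sentence = "the cat saw the dog and the bird"

# Count how often each word appears, using a dictionary keyed by word.
frequencies = {}
for word in sentence.split():
    # .get() returns 0 when the word has not been seen yet.
    frequencies[word] = frequencies.get(word, 0) + 1

print(frequencies["the"])
```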
Add prerequisite language to the lesson homepage
We should identify how the introductory Python lessons will culminate in the NLTK lesson - what aspects of Python are core to NLTK?
Which episode should introduce the format function?
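To make the question concrete, here is a minimal sketch of what a `format` example could look like, alongside the equivalent f-string. The title and token count are placeholder values, not taken from the lesson.

```python
# Placeholder values, invented for illustration.
title = "Moby Dick"
token_count = 1200

# str.format fills each {} placeholder in order.
summary = "{} contains {} tokens".format(title, token_count)

# An f-string produces the same result with the names written inline.
same_summary = f"{title} contains {token_count} tokens"

print(summary)
```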
I suggest we use " " for strings consistently throughout the lesson.
Any objections? @Karl-Holten @caseyschacher @claremichaud @annhanlon @maijxyooj @maxgray20
Also we should maybe purposely run into a situation where you would need raw quotes.
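Assuming "raw quotes" here refers to Python raw strings (the `r"..."` prefix), one situation learners could run into on purpose is a regular expression containing `\b`. The sample text and patterns below are invented for illustration.

```python
import re

# Hypothetical sample text.
text = "The cat sat near the catalog."

# In a normal string, "\b" is a backspace character, so this pattern
# looks for backspace + "cat" + backspace and finds nothing.
plain_pattern = "\bcat\b"
plain_matches = re.findall(plain_pattern, text)

# In a raw string, "\b" reaches the regex engine as a word boundary.
raw_pattern = r"\bcat\b"
raw_matches = re.findall(raw_pattern, text)

print(plain_matches, raw_matches)
```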
Develop a small portion of the Python lesson that reads the contents of a plain text file and iterates over its lines.
Learning objectives: gain experience with file system I/O, introduce the concept of loops by looping over the lines in the file.
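A rough sketch of what this portion could teach is below. The file name and its two lines of text are assumptions made so the example is self-contained; the real episode would presumably read one of the lesson's own text files.

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example can run anywhere.
sample_path = Path(tempfile.mkdtemp()) / "sample.txt"
sample_path.write_text("Call me Ishmael.\nSome years ago\n", encoding="utf-8")

# Open the file and loop over it one line at a time.
lines = []
with open(sample_path, encoding="utf-8") as infile:
    for line in infile:
        # Each line still ends with "\n", so strip it off.
        lines.append(line.strip())

print(lines)
```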
Add an introduction to the lesson home page
Hello maintainers,
Your repository currently has the "hacktoberfest" tag which means that people will be able to make contributions to it to count towards their challenge (see the Hacktoberfest website for more info).
While the event is great for encouraging people to contribute to open-source projects, in recent years we have seen that there is sometimes an influx of low-quality pull requests on such repositories as people try to get their 4 pull requests with trivial changes. You will be able to mark these pull requests with the label "spam" or "invalid" so these PRs won't count.
Alternatively, you can use an "opt-in" model where you don't need to have the "hacktoberfest" topic on your repository, and you can tag pull requests that are high-quality with the "hacktoberfest-accepted" label.
Let me know if you have any questions.
Add episode on reading in a text file for it to be cleaned for better analysis
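As one possible shape for the cleaning step, the sketch below lowercases text, removes punctuation, and splits on whitespace. It works on an in-memory string rather than a file to stay short, and the choice of cleaning steps is an assumption, not a decision the maintainers have made.

```python
import string

# Hypothetical raw text; in the episode this would come from a file.
raw_text = "Call me Ishmael. Some years ago, never mind how long precisely!"

# Lowercase everything so "Call" and "call" count as the same word.
lowered = raw_text.lower()

# Remove ASCII punctuation using a translation table.
no_punctuation = lowered.translate(str.maketrans("", "", string.punctuation))

# Split on whitespace to get a list of tokens.
tokens = no_punctuation.split()

print(tokens)
```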
Topic Modeling Theory and Practice - Day 4 (practice)
- Start building the code and importing the corpus
- Hyperparameters on a model, what they do (theoretical)
- Have them run through and create the model
- Make the visualization
- Pass in unknown document
There are several episodes that contain exercises/information about using numbers and floats. Given that we need to trim the fat as much as possible, is it reasonable to remove all of those bits?
Examples:
- Episode 3 has a lot of material on working with numbers. It might still be useful to teach how to convert a number to the string type.
- Episode 4 uses mathematics in its examples. Perhaps all examples should use the string data type instead?
- Episode 6 introduces the math library but probably shouldn't. We should introduce NLTK instead. This entire episode needs to be rewritten.
- Episode 11: the "pressures" and "primes" examples should be converted to something string-based.
- Episode 12: many examples use numbers. Slight alterations to make them string-based would be good.
- Episode 13 is almost entirely integer-based. Also, it introduces Pandas (is this what we need to do at this point?)
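To illustrate the kind of rewrite being suggested, here is a small sketch showing a number-to-string conversion and a string-based loop that could stand in for a numeric one. The variable names and word list are hypothetical and are not taken from the existing episodes.

```python
# Converting a number to a string so it can be joined to other text.
page_count = 42
label = "Pages: " + str(page_count)

# A string-based loop with a conditional: find the longest word.
# The word list is invented for illustration.
words = ["sea", "whale", "ship"]
longest = ""
for word in words:
    if len(word) > len(longest):
        longest = word

print(label, longest)
```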