This is a simple, straightforward implementation of the H-1B statistics challenge, written in native Python without any additional modules, frameworks, or libraries. The approach:
- Read the input file's header line and locate the desired columns.
- Read the input file line by line, counting the relevant data with a dictionary used as a counter.
- Write the results in the required output format.
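The steps above can be sketched roughly as follows. This is a minimal illustration, not the submitted code: the sample data and column names (`CASE_STATUS`, `SOC_NAME`, `WORKSITE_STATE`) are assumptions, and a semicolon delimiter is assumed for the input format.

```python
import io

# Hypothetical sample standing in for the real input file;
# the header names and delimiter are assumptions for illustration.
sample = io.StringIO(
    "CASE_STATUS;SOC_NAME;WORKSITE_STATE\n"
    "CERTIFIED;SOFTWARE DEVELOPERS;CA\n"
    "CERTIFIED;SOFTWARE DEVELOPERS;TX\n"
    "DENIED;ACCOUNTANTS;NY\n"
)

# Step 1: read the header and locate the desired columns.
header = sample.readline().rstrip("\n").split(";")
status_i = header.index("CASE_STATUS")
occ_i = header.index("SOC_NAME")
state_i = header.index("WORKSITE_STATE")

# Step 2: go line by line, counting with plain dictionaries.
occ_counts, state_counts = {}, {}
for line in sample:
    fields = line.rstrip("\n").split(";")
    if fields[status_i] != "CERTIFIED":
        continue
    occ_counts[fields[occ_i]] = occ_counts.get(fields[occ_i], 0) + 1
    state_counts[fields[state_i]] = state_counts.get(fields[state_i], 0) + 1

# Step 3: emit results (the real challenge sorts and formats these).
print(occ_counts)    # {'SOFTWARE DEVELOPERS': 2}
print(state_counts)  # {'CA': 1, 'TX': 1}
```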
- I use a dictionary to count data. This works well empirically on the given examples. For the states data this is easy to justify, since there are only 50 states. The number of occupations is larger, but still modest (fewer than 1,000 occupations in the largest data set). A simple alternative would have been the python `collections` module, in particular `collections.Counter`, but there was no need for it. `Counter` might have made the code clearer, but given your instructions (use basic data structures), I decided not to use it.
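For reference, the plain-dict counting pattern and the `collections.Counter` alternative mentioned above produce identical counts; the sample data here is made up:

```python
from collections import Counter

data = ["CA", "TX", "CA", "NY", "CA"]  # hypothetical sample values

# Plain-dict counter, the basic data structure used in the submission.
counts = {}
for state in data:
    counts[state] = counts.get(state, 0) + 1

# The Counter equivalent that was deliberately avoided.
assert counts == dict(Counter(data))
print(counts)  # {'CA': 3, 'TX': 1, 'NY': 1}
```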
- I read the entire file into memory. I could instead have streamed it, processing one line at a time in constant memory. Since my approach worked well empirically, even on large input files, I saw no need for anything more sophisticated.
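The two approaches are interchangeable for this workload. A small sketch, using a throwaway temporary file rather than the real input:

```python
import os
import tempfile

# Create a small throwaway file purely for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a\nb\nc\n")
    path = f.name

# Whole-file approach (as in the submission): simplest, loads everything at once.
with open(path) as f:
    lines = f.readlines()

# Streaming alternative: iterate the file object, one line in memory at a time.
streamed = []
with open(path) as f:
    for line in f:
        streamed.append(line)

assert lines == streamed  # identical results, different memory profiles
os.remove(path)
```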
- I filter for CERTIFIED status and ignore all other statuses, including CERTIFIED-WITHDRAWN.
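The distinction above matters because an exact string comparison is required; a substring check would wrongly admit CERTIFIED-WITHDRAWN. A minimal sketch with made-up rows:

```python
# Hypothetical (status, state) rows for illustration.
rows = [
    ("CERTIFIED", "CA"),
    ("CERTIFIED-WITHDRAWN", "CA"),  # must be excluded
    ("DENIED", "TX"),
]

# Exact equality keeps only CERTIFIED; note that the substring test
# "CERTIFIED" in status would incorrectly match CERTIFIED-WITHDRAWN too.
kept = [row for row in rows if row[0] == "CERTIFIED"]
print(kept)  # [('CERTIFIED', 'CA')]
```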