The system creates 1D Parsons puzzles with the help of ChatGPT. The language model serves as a database of programming exercises and distractors (incorrect fragments). The user is asked to rearrange the fragments into the correct order and to trash the distractors.
Example:
Public web interface with Python programming puzzles.
- DONE: Python puzzles. Reorder fragments to form a solution for the given exercise. Trash wrong code lines.
- TBD: Historical puzzles. Reorder fragments to form the correct time sequence for the puzzle topic. Trash wrong facts.
- TBD: Chain of reasoning. Distinct premises and conclusions. Trash statements that do not follow.
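The mechanics of a Python puzzle can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Fragment` class and the `build_puzzle` and `is_solved` functions are hypothetical names, and the sketch assumes a puzzle counts as solved when the kept fragments match the solution order and only distractors are trashed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One draggable code line (hypothetical model, not the project's schema)."""
    text: str
    is_distractor: bool = False


def build_puzzle(solution_lines, distractor_lines, seed=None):
    """Mix solution fragments with distractors and return them in random order."""
    fragments = [Fragment(line) for line in solution_lines]
    fragments += [Fragment(line, is_distractor=True) for line in distractor_lines]
    random.Random(seed).shuffle(fragments)
    return fragments


def is_solved(arranged, trashed, solution_lines):
    """True when kept fragments match the solution order and only distractors are trashed."""
    kept_ok = [f.text for f in arranged] == list(solution_lines)
    trash_ok = all(f.is_distractor for f in trashed)
    return kept_ok and trash_ok
```

The same check would apply to the planned historical and reasoning puzzles, since it only compares fragment order and never inspects the fragment contents.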
LLM output has to be validated. The instructor UI is hosted at the /config route and requires an API key to perform operations that involve the AI service. The generated content is validated manually and with the help of tools:
- DONE: Compilation of generated code, running unit tests that were also generated
- TBD: Validate generated links to Wikipedia and compare contents to confirm the generated facts
- TBD: Validate symbolic representation of premises and conclusions with the help of the z3 solver.
Validated and approved exercises example:
- Compute statistics of LLM performance on the exercise generation task (generation of errors, complexity of the code, duplication). Currently, duplicates are removed via the "avoid" part of the prompt.
- Register fragment move events in the DB for behavior analysis.
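The "avoid" part of the prompt mentioned above could be assembled along these lines. The prompt wording and the `build_prompt` function are hypothetical and shown only to illustrate the idea: titles of already stored exercises are deduplicated and appended to the request so the model is steered away from repeating them.

```python
def build_prompt(topic: str, existing_titles: list[str]) -> str:
    """Build a generation prompt with an 'avoid' section (wording is illustrative)."""
    prompt = f"Generate one Python programming exercise about {topic}."
    # Deduplicate and drop blank titles so the avoid list stays short.
    unique = sorted({title.strip() for title in existing_titles if title.strip()})
    if unique:
        avoid = "\n".join(f"- {title}" for title in unique)
        prompt += "\nAvoid exercises similar to the following:\n" + avoid
    return prompt
```

This approach relies on the model honoring the instruction, so duplicates can still slip through and are worth counting in the statistics.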
- Frontend: static website (hosted in Azure Blob Storage)
- Backend: Azure Functions (Python)
- Database: Cosmos DB NoSQL
- AI: Azure ML Prompt Flow, deployed as App Service.
Generating one exercise with the OpenAI API takes 10 to 40 seconds. Generation therefore runs asynchronously and is initiated by the instructor. Another reason for this decision is the cost of calling the ChatGPT endpoint.