I would like to conduct a quantitative content analysis of Instagram posts using AI/ML techniques. So far, the researchers have created a codebook and hired labelers to detect various features in the images manually. I'm interested in exploring comparable pre-existing AI methods that can detect these features as well as human coders do and reduce the time required for analysis. The objective is to find GitHub repositories, related papers, and API services that can assist in identifying stylistic and compositional features of images for critical visual discourse analysis.
Below are the desired features and some resources:
- Camera angle: high angle/regular angle/low angle
- Presence of government/police/law enforcement (or occupation detection):
- a. The number of uniformed police/law enforcement/security personnel shown
- b. No police
- Number of people:
- a. Individuals: 1-3 identifiable human subjects shown
- b. Group: 2-9 identifiable human subjects shown in focus
- c. Crowd: 10+ identifiable human subjects shown in focus
- https://github.com/pjreddie/darknet
- source paper: https://arxiv.org/pdf/1506.02640.pdf
- Model name: YOLO
- Annotation method: bounding box annotation
- Use case: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html
- Main developer of YOLO: Joseph Redmon
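The YOLO resources above can be wired together for the people-counting feature through OpenCV's DNN module. A minimal sketch, assuming the standard `yolov3.cfg` / `yolov3.weights` files from the darknet repo have been downloaded locally (these file paths are assumptions, not part of any of the projects above). Note the codebook's "individuals" (1-3) and "group" (2-9) ranges overlap at 2-3; this sketch resolves the overlap by treating counts up to 3 as individuals, which is an assumption to confirm with the codebook authors.

```python
# Sketch: counting people with a pre-trained YOLO model via OpenCV's DNN module,
# then mapping the raw count onto the codebook's person-count categories.

def bucket_people_count(n: int) -> str:
    """Map a raw person count onto the codebook categories.
    Assumption: the overlapping 2-3 range is coded as 'individuals'."""
    if n == 0:
        return "no people"
    if n <= 3:
        return "individuals"  # codebook a: 1-3 identifiable subjects
    if n <= 9:
        return "group"        # codebook b: up to 9 subjects in focus
    return "crowd"            # codebook c: 10+ subjects in focus

def count_people(image_path: str, conf_threshold: float = 0.5) -> int:
    import cv2          # imported lazily so the bucketing logic above
    import numpy as np  # stays usable without OpenCV installed
    # Assumed local paths to the darknet config and weights files:
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    img = cv2.imread(image_path)
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    count = 0
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            # COCO class 0 is "person"; a production version should also
            # apply non-maximum suppression to avoid double-counting.
            if class_id == 0 and scores[class_id] > conf_threshold:
                count += 1
    return count
```

In practice `count_people` should be validated against a sample of the human-labeled images before trusting the bucketed output.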
- Presence of eye contact:
- https://github.com/vita-epfl/looking
- source paper: https://arxiv.org/abs/2112.04212
- The model architecture itself is derivative of others; e.g., it can use AlexNet or ResNet as the base model
- Annotation method: keypoint annotation
- Each detected object is annotated with its key points, resulting in a "stick figure" like representation of each object rather than a box.
- Authors created a domain-specific dataset: LOOK
- Also uses publicly available datasets: JAAD and PIE
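To make the keypoint annotation format concrete: pose pipelines of this kind commonly build on the COCO convention of 17 body keypoints, each stored as an (x, y, visibility) triplet in one flat list per person. A small parsing sketch (the helper names are illustrative, not from the repo above):

```python
# Minimal sketch of reading COCO-style keypoint annotations: the "stick
# figure" representation is 17 body keypoints per person, each an
# (x, y, visibility) triplet, where visibility is 0 (not labeled),
# 1 (labeled but occluded), or 2 (visible).

COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_keypoints(flat):
    """Turn a flat [x1, y1, v1, x2, y2, v2, ...] list into a
    name -> (x, y, visibility) dict."""
    triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return dict(zip(COCO_KEYPOINT_NAMES, triplets))

def head_is_annotated(keypoints) -> bool:
    """Eye-contact estimation needs the head region: check that the nose
    or at least one eye keypoint is labeled."""
    return any(keypoints[name][2] > 0
               for name in ("nose", "left_eye", "right_eye"))
```

This kind of pre-filter could be used to route only images with usable head keypoints into an eye-contact model.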
- Gender:
- a. Female
- b. Male
- c. Mix of male and female
- d. Indeterminate (face is not shown)
- Resources:
- paper: https://arxiv.org/abs/2004.10934
- code: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
- Model: YOLO with custom object detection
- Age:
- a. Child: <18 years of age
- b. Young adult: 18-34 years of age
- c. Mid adult: 35-50 years of age
- d. Mix
- e. Indeterminate
- Resources:
- https://github.com/mowshon/age-and-gender (object detection: age, gender)
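Whatever age estimator is used, its numeric output still has to be mapped onto the codebook's categories. A sketch of that mapping; note the codebook has no category for ages above 50, so this sketch codes them as indeterminate, which is an assumption worth revisiting with the codebook authors:

```python
# Sketch: mapping numeric age estimates (e.g. from the age-and-gender repo
# above) onto the codebook's age categories, and aggregating per-person
# estimates into a single per-image code.

def age_bucket(age: float) -> str:
    if age < 18:
        return "child"        # codebook a: <18
    if age <= 34:
        return "young adult"  # codebook b: 18-34
    if age <= 50:
        return "mid adult"    # codebook c: 35-50
    return "indeterminate"    # codebook gap: no 50+ category (assumption)

def frame_age_code(ages) -> str:
    """Aggregate per-person age estimates into one per-image code."""
    buckets = {age_bucket(a) for a in ages}
    if not buckets:
        return "indeterminate"  # no faces detected
    if len(buckets) > 1:
        return "mix"            # codebook d
    return buckets.pop()
```

The same bucket-then-aggregate pattern would apply to the gender feature (female/male/mix/indeterminate).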
- Distance (camera shot type):
- a. Closeup: face and shoulders
- b. Mid-range: waist up or occupying almost full frame
- c. Long-range: persons fill half the picture frame or less
- Resources:
- https://rsomani95.github.io/ai-film-1.html
- The project is no longer open source; using it now requires permission from the creator.
- https://anyirao.com/projects/ShotType.html
- This project doesn't provide a model to use, but does provide a dataset of short video clips and a corresponding annotation file in JSON format.
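Since one shot-type project is closed-source and the other ships only a dataset, a fallback is a simple heuristic: classify distance from the relative size of the largest detected face. The 0.30 / 0.10 thresholds below are placeholder assumptions that would need calibrating against the human-labeled codebook data:

```python
# Heuristic sketch: shot type from the ratio of face height to frame height.
# Thresholds are assumptions to be tuned against human-coded examples.

def shot_type(face_height_px: int, frame_height_px: int) -> str:
    ratio = face_height_px / frame_height_px
    if ratio >= 0.30:    # face dominates the frame: face and shoulders
        return "closeup"
    if ratio >= 0.10:    # subject roughly waist-up
        return "mid-range"
    return "long-range"  # subject fills half the frame or less

def largest_face_height(image_path: str) -> int:
    import cv2  # lazy import; uses OpenCV's bundled Haar cascade detector
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    return max((h for (_x, _y, _w, h) in faces), default=0)
```

Haar cascades miss profile and occluded faces, so this heuristic would systematically undercount long-range shots of turned-away subjects; a keypoint-based detector would be more robust.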
- Facial expression:
- Smile/anger/disappointed/...
- Resources:
- https://github.com/richmondu/libfaceid (object classification; detection performance unclear: facial expression, age, gender)
- https://github.com/juan-csv/Face_info (object detection: emotion, race, gender, age)
- Race or skin color or ethnicity:
- a. White
- b. Black
- c. Asian
- Resources:
- https://github.com/wondonghyeon/face-classification (object detection: race, gender)
- Object detection: Further research is needed to find specific resources for this feature. Resources:
- General purpose object recognition:
- Amazon Rekognition: https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/what-is.html
- Uses AWS API and streamlined web-browser console to create custom models for object detection tasks.
- Use Cases: Basically any object recognition task, at the expense of manually labeling and preparing training dataset.
- OpenCV: https://opencv.org/
- One of the most popular libraries for computer vision tasks. A rich library with tools for developing your own ML algorithms or using pre-trained models.
- Use cases: Detecting number of people - OpenCV YOLO algorithm: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html
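For the Rekognition route, the managed label-detection API can be called through boto3 and its response filtered for the codebook-relevant labels (e.g. "Police", "Person"). A sketch, assuming AWS credentials are already configured; the specific label names to watch for are assumptions to verify against Rekognition's actual label taxonomy:

```python
# Sketch: Amazon Rekognition general label detection via boto3, plus a
# helper that filters the response for codebook-relevant labels.

def detect_labels(image_bytes: bytes, min_confidence: float = 80.0) -> dict:
    import boto3  # lazy import: only needed when actually calling AWS
    client = boto3.client("rekognition")
    return client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=50,
        MinConfidence=min_confidence,
    )

def filter_labels(response: dict, wanted: set,
                  min_confidence: float = 80.0) -> list:
    """Keep only the label names we care about, above a confidence cutoff."""
    return sorted(
        label["Name"]
        for label in response.get("Labels", [])
        if label["Name"] in wanted and label["Confidence"] >= min_confidence
    )
```

Custom Labels (linked above) would be the path for concepts Rekognition's built-in taxonomy does not cover, at the cost of preparing a labeled training set.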
- The size of the object/subject: Further research is needed to find specific resources for this feature.
- Color contrast/diversity: Further research is needed to find specific resources for this feature.
- color analysis methods:
- Colorimetrics:
- Color histograms
- Edge Detection Algorithms
- Color space transformation
- Machine Learning methods:
- Classification tasks where the target is a color property, such as color contrast, and the labels are discrete values of that property.
- E.g. target = color contrast, label1 = 'little contrast', label2 = 'moderate contrast', label3 = 'high contrast'
- K-means clustering to cluster images with similar color attributes.
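As a starting point on the colorimetric side, one simple contrast measure is the standard deviation of per-pixel luminance, mapped onto the three example labels above. The 0.10 / 0.25 thresholds are placeholder assumptions to calibrate against human-coded images:

```python
# Sketch: RMS luminance contrast for an image, mapped onto the discrete
# contrast labels from the example above. Thresholds are assumptions.
import numpy as np

def luminance_contrast(rgb: np.ndarray) -> float:
    """RMS contrast in [0, 1] from an HxWx3 array of 0-255 RGB values."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return float(gray.std() / 255.0)

def contrast_label(contrast: float) -> str:
    if contrast < 0.10:
        return "little contrast"
    if contrast < 0.25:
        return "moderate contrast"
    return "high contrast"
```

For the diversity side, k-means over pixel colors (e.g. scikit-learn's `KMeans` on a reshaped H*W x 3 array) would give dominant-color palettes to cluster or count.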
Please note that while some features have existing GitHub repositories or papers, others require additional exploration. The GitHub projects mentioned should also be checked for licensing to confirm we can use their source code for our own purposes. I recommend reviewing the mentioned sources and continuing the search for additional resources to ensure comprehensive coverage of all desired visual features for critical visual discourse analysis.
Relevant Research:
- Jain, S., Pulaparthi, K., & Fulara, C. (2015). Content based image retrieval. Int. J. Adv. Eng. Glob. Technol, 3, 1251-1258.