In this project we built an end-to-end system for analysing video files: detecting objects in each frame, analysing faces in the video, and transcribing the audio. REST APIs were created for each analysis feature, and a React web app was built to showcase the results.
Object detection and image classification were used to identify objects in each frame and to analyse facial expressions (sad, happy, etc.). Audio, if present in the video, was transcribed to text, on which we then applied NLP with a Word2Vec model to extract context and sentiment.
OpenPose was used to detect multiple keypoints across the body in order to determine poses from video frames. We trained on the COCO dataset of human poses to obtain the keypoints. The model was quantised to reduce its size so that it could be deployed as a Core ML model in an iOS application. It achieved 30 FPS.
The detected keypoints were used to analyse video frames for the angles between body joints in order to assess recovery after surgery.
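The joint-angle measurement described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each keypoint is a 2D (x, y) pair in image coordinates, and the function name `joint_angle` and the sample hip/knee/ankle coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b, formed by keypoints a-b-c.

    Each keypoint is assumed to be an (x, y) pair (hypothetical format).
    """
    # Vectors pointing from the joint towards its two neighbouring keypoints.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints coincide, angle is undefined")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Illustrative coordinates only: a knee bent at a right angle.
hip, knee, ankle = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
knee_angle = joint_angle(hip, knee, ankle)
```

Tracking such an angle frame by frame gives a range-of-motion curve for the joint, which is one plausible way to compare sessions over the course of recovery.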
We performed OCR on KYC documents (PAN, Aadhaar, passport, etc.) to extract user information. We also extracted the face image from the ID cards and built feature attributes from the face and OCR data. The face data was used for one-shot learning to perform video KYC of the user: face embeddings from video frames were matched against the embeddings already stored from the ID documents.
FaceNet was used for face matching. To create our own OCR pipeline we used CNN-based text localisation with Tesseract for text recognition. Tesseract uses a BiLSTM-based model to recognise text. A pre-processing pipeline was created to classify the ID card type and detect its rotation so that the card could be aligned correctly for accurate OCR.
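The embedding-matching step of the video KYC flow can be illustrated with a small sketch. It assumes the embeddings have already been produced by a face model and are plain lists of floats. The use of cosine similarity, the 0.7 threshold, and the names `cosine_similarity` and `is_same_person` are assumptions made for illustration; the project may equally have used Euclidean distance on normalised embeddings with a different threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_u * norm_v)


def is_same_person(id_embedding, frame_embeddings, threshold=0.7):
    """Compare the stored ID-card embedding with embeddings from video frames.

    Returns (matched, best_score). The threshold is a hypothetical value
    that would need tuning on real data.
    """
    best = max(cosine_similarity(id_embedding, e) for e in frame_embeddings)
    return best >= threshold, best


# Toy 3-dimensional embeddings; real face embeddings are much longer.
stored = [1.0, 0.0, 0.0]
frames = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
matched, score = is_same_person(stored, frames)
```

Taking the best score over several frames makes the check less sensitive to a single blurred or badly lit frame.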
A deep learning model was created to detect objects in images across various fashion commerce categories. A multi-output, multi-loss CNN was trained for this, using the Fashionet dataset. Attributes such as colour and texture were also included in the labels used for training and inference. We had to outsource the data annotation for this project. Given an image, the model output multiple labels.
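The multi-loss idea can be sketched without any deep learning framework: each output head gets its own cross-entropy loss, and the training objective is a weighted sum of them. The head names (`category`, `colour`, `texture`), the loss weights, and the function names below are hypothetical and only illustrate the structure, not the actual training code.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over logits against one target class."""
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def multi_head_loss(head_logits, head_targets, head_weights):
    """Weighted sum of per-head losses for a multi-output classifier."""
    total = 0.0
    for name, logits in head_logits.items():
        total += head_weights[name] * softmax_cross_entropy(
            logits, head_targets[name]
        )
    return total


# One image, three heads; the numbers are made up for illustration.
logits = {"category": [0.0, 0.0], "colour": [0.0, 0.0, 0.0], "texture": [0.0, 0.0]}
targets = {"category": 0, "colour": 0, "texture": 0}
weights = {"category": 1.0, "colour": 0.5, "texture": 0.5}
loss = multi_head_loss(logits, targets, weights)
```

In a framework such as Keras or PyTorch the same structure appears as a shared CNN backbone with one output layer per attribute, with the per-head losses combined before backpropagation.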
We also extracted tags from product descriptions using a Word2Vec-based model. Hashtags were generated from the entities extracted from each product description.
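The final hashtag-formatting step might look like the sketch below. It assumes the entities have already been extracted as plain strings (the Word2Vec-based extraction itself is not shown), and the function names and the sample entities are hypothetical.

```python
import re


def to_hashtag(entity):
    """Turn an entity phrase into a CamelCase hashtag, or None if empty."""
    words = re.findall(r"[A-Za-z0-9]+", entity)
    if not words:
        return None
    return "#" + "".join(word.capitalize() for word in words)


def hashtags_from_entities(entities):
    """Build a de-duplicated (case-insensitive), order-preserving tag list."""
    seen = set()
    tags = []
    for entity in entities:
        tag = to_hashtag(entity)
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return tags


# Made-up entities standing in for the output of the extraction step.
tags = hashtags_from_entities(
    ["denim jacket", "slim fit", "Denim Jacket", "100% cotton"]
)
```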