Deep learning watch, live article.
Technical monitoring of the latest or most popular architectures, pipelines, tools and datasets related to deep learning. It will be updated continuously.
Don’t hesitate to share your own references in the comments; I may add them. ;)
Summary
- Object recognition
- Audio synthesis
- Image generation
- Natural language processing
- Pose estimation
- Outliers
- AI solutions
Object recognition
Recognises objects in a scene. This covers 2D classification, 2D detection (bounding boxes), segmentation (predicting, pixel by pixel, which class an object belongs to), 3D detection (3D bounding boxes with orientation in space), etc.
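As a small illustration of how two of these outputs relate, here is a minimal sketch that derives a 2D bounding box from a pixel-wise segmentation mask. The mask values and the helper name `mask_to_bbox` are hypothetical and chosen only for this example; real models return richer structures (scores, several instances, and so on).

```python
from typing import List, Optional, Tuple

# Hypothetical 4x5 segmentation output: one class id per pixel
# (0 = background, 1 = the object class of interest).
MASK: List[List[int]] = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def mask_to_bbox(
    mask: List[List[int]], class_id: int
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) enclosing every pixel of class_id.

    Returns None when the class does not appear in the mask.
    """
    # Column indices of every pixel labelled with class_id.
    xs = [x for row in mask for x, value in enumerate(row) if value == class_id]
    # Row indices of every row containing at least one such pixel.
    ys = [y for y, row in enumerate(mask) if class_id in row]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


print(mask_to_bbox(MASK, 1))  # (1, 1, 3, 2)
```

The segmentation mask carries strictly more information than the box: the box can be computed from the mask, but not the other way around.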
Models:
Object classification (from RGB image)
- SimCLR (Paper) — 2020
Object detection (from RGB image)
- Faster-RCNN (Article, paper) — 2015
- SSD (Article, paper) — 2015
- YOLOv3 (Website) — 2018
- ATSS (Paper, code) — 2019
- CBNet (Paper, code) — 2019
Unique object detection (from RGB image)
- [Palm] BlazePalm by Google (Article) — 2019
- [Face] BlazeFace by Google (Article, Paper) — 2019
- [Solar panel] SolarNet (Paper) — 2019
Object segmentation (from RGB image)
- [Human] Bodypix (Article) — 2019
Tools:
Object detection and segmentation
- Detectron (Article, papers, code) — 2018
Models: Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN
Data augmentation
- Learning in the Frequency Domain (Paper) — 2020
Datasets:
Audio synthesis
The idea is to generate sound, such as voice or music.
Models:
Mel spectrogram
- Tacotron 2, Nvidia implementation (Paper, GitHub repo, article) — 2017
- FastSpeech (Paper) — 2019
- FastSpeech 2 (Paper) — 2020
Vocal
- WaveGlow by Nvidia (Made from Glow and WaveNet. Article) — 2018
- Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (Made from Tacotron 2. Paper, article) — 2018
Audio
- WaveRNN (Paper) — 2018
- GANSynth (Paper) — 2019
- [Music] DDSP (Article) — 2019
- [Music + Vocal] Jukebox by OpenAI (Article, code, paper) — 2020
Datasets:
Tools:
“Deep neural network that can generate 4-minute musical compositions with 10 different instruments”
- MuseNet by OpenAI (Article) — 2019
Image generation
Same idea as the previous section, but here the output is 2D visual content, in image or video format.
Models:
2D
- StyleGAN2 T. Karras et al. (Code, paper, video) — 2019
- SinGAN (Article, paper) — 2019
- Semantic Image Synthesis with Spatially-Adaptive Normalization (Code, paper, video, demo GauGAN) — 2019
- StyleGAN2 Y. Viazovetskyi et al. (Code, paper) — 2020
Depth map
- Depth Map Estimation of Dynamic Scenes Using Prior Depth Information (Paper) — 2020
Super resolution
Image enhancing
Pipelines:
From black and white choppy video to 4K 60 FPS (Article).
- [Resolution] ESRGAN, resolution upscaling (Paper) — 2018
- [Colorisation] DeOldify (Article) — 2018
- [FPS] Depth-Aware Video Frame Interpolation (Article) — 2019
Tools:
Replace face in a video (Deep fake).
- DeepFaceLab (Repo Github) — 2018
Models: Quick96, SAEHD, FANSeg, XSeg.
Natural language processing
NLP allows a model to mimic human comprehension of words, or at least to analyse the text structure and the position of words in a sentence to extract meaning. It serves purposes such as translation, relation extraction, summarisation and named entity extraction.
Models:
- Universal Sentence Encoder (Paper) — 2018
- BERT by Google (Blog, paper) — 2018
- RoBERTa by Facebook (Article, code, paper) — 2019
- CamemBERT (Paper) — 2019
- GPT-2 by OpenAI (Article, code, paper) — 2019
- XLNet (Paper) — 2019
- ALBERT (Paper) — 2019
- T5 by Google (Article, code, paper) — 2019
- ELECTRA by Google (Article, code, paper) — 2020
- Reformer, a more efficient version of the Transformer, with attention complexity reduced from O(L²) to O(L log L) for a sequence of length L (Paper) — 2020
- [Chatbot] Meena (Paper) — 2020
- Ensemble of BERT models, centralised by Google (Code) — 2020
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (Article, paper) — 2020
- GPT-3 by OpenAI (API, paper) — 2020
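To give a feel for the complexity figures quoted for Reformer in the list above, the sketch below compares how L² and L log L grow with the sequence length L. It only counts abstract operations: constant factors are ignored, the base-2 logarithm is an assumption, and the function names are made up for this example, so the ratios are indicative rather than measured speed-ups.

```python
import math


def quadratic_cost(length: int) -> int:
    """Operation count growing as L^2 (full self-attention)."""
    return length * length


def log_linear_cost(length: int) -> float:
    """Operation count growing as L * log2(L)."""
    return length * math.log2(length)


for length in (1024, 16384, 65536):
    ratio = quadratic_cost(length) / log_linear_cost(length)
    print(f"L={length}: L^2 / (L log2 L) = {ratio:.1f}")
```

The gap widens as the sequence gets longer, which is why this kind of reduction matters mostly for long inputs.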
Datasets:
- Wikipedia articles (Article)
- C4, Common Crawl’s web crawl corpus on TensorFlow (Article)
- CORD-19, COVID-19 Open Research Dataset (Article)
- Taskmaster-2 (Article)
- The Big Bad NLP Database (Website)
Tools:
References:
Pose estimation
The image above illustrates the idea of pose estimation well: produce a “skeleton” of a person, a hand, or something else. Each junction (the end of a segment) represents a joint.
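A minimal sketch of how such a skeleton could be represented in code: joints as named 2D points and segments as pairs of joint names. The joint names, coordinates and the helper `segment_lengths` are hypothetical and chosen only for illustration; each model listed below defines its own set of keypoints.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical 2D joints (pixel coordinates) for one arm.
JOINTS: Dict[str, Point] = {
    "shoulder": (0.0, 0.0),
    "elbow": (3.0, 4.0),
    "wrist": (6.0, 8.0),
}

# Each segment links two joints by name.
SEGMENTS: List[Tuple[str, str]] = [
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
]


def segment_lengths(
    joints: Dict[str, Point], segments: List[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Return the Euclidean length of every segment of the skeleton."""
    lengths = {}
    for start, end in segments:
        (x1, y1), (x2, y2) = joints[start], joints[end]
        lengths[(start, end)] = math.hypot(x2 - x1, y2 - y1)
    return lengths


print(segment_lengths(JOINTS, SEGMENTS))
```

A 3D pose uses the same structure with an extra coordinate per joint.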
Models:
3D Hand pose estimation (from RGB image)
- Using a single RGB frame for real time 3D hand pose estimation in the wild (Paper) — 2017
- REGNet (Website) — 2018
- 3D Hand Shape and Pose Estimation from a Single RGB Image (Website) — 2019
- Hand Landmark by Google (Article) — 2019
2D Hand pose estimation (from RGB image)
- Attention! (Paper) — 2020
2D Human pose estimation (from RGB image)
- Towards Accurate Multi-person Pose Estimation in the Wild (Paper) — 2017
3D Human pose estimation (from RGB image)
3D Face mesh generation (from RGB image)
Datasets:
- GANerated (Website)
Outliers
A collection of articles that do not fit into a category yet.
Tools:
Motion transfer
Predict molecules to create new antibiotics
- Chemprop (Article) — 2020
Model type: Ensemble of GNNs
Convert brain waves to sentences
- Machine translation of cortical activity to text with an encoder-decoder framework (Code, Paper BioRxiv, Paper Nature) — 2020
Discovering machine learning algorithms from scratch
- AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Paper) — 2020
Data augmentation for 3D point clouds
- Improving 3D Object Detection through Progressive Population Based Augmentation (Article, paper) — 2020
Anomaly detection
Visual object tracking
- RANet: Ranking Attention Network for Fast Video Object Segmentation (Paper) — 2019
Datasets:
AI solutions
- Doc.ai — A solution for tracking your health
- PacketAI.co — “ROI driven ITOps solution”
- Blazar.ai — “Immunotherapies using the immune system to fight cancer”
- UpStride.io — “Train with up to 10x less data”
- Flowlity.com — “Your Supply Chain, Simplified. Synchronized. Reinvented.”
- iRhythm.com — Mobile cardiac telemetry
- Synthesia.io — “… a powerful tool to create engaging video content without the need for actors, film crews and studios.”