Deep learning watch, live article.

May 29, 2020

A technical watch of the latest and most popular architectures, pipelines, tools and datasets related to deep learning. It will be updated continuously.

Feel free to share your own references in the comments; I may add them. ;)

Summary

Object recognition
Audio synthesis
Image generation
Natural language processing
Pose estimation
Outliers
AI solutions

Object recognition

Object recognition identifies the objects in a scene. It covers 2D classification, 2D detection (bounding boxes), segmentation (predicting, pixel by pixel, which class each pixel belongs to), 3D detection (3D bounding boxes with orientation in space), and more.
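To make the difference between these tasks concrete, here is a minimal sketch of what each one returns. The class names, field names and the `mask_from_boxes` helper are hypothetical and only serve as illustration; real models each define their own output formats.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Classification:
    label: str    # a single label for the whole image
    score: float


@dataclass
class Detection2D:
    label: str
    score: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection3D:
    label: str
    score: float
    center: Tuple[float, float, float]  # (x, y, z) position in space
    size: Tuple[float, float, float]    # (length, width, height)
    yaw: float                          # orientation around the vertical axis


# Segmentation: one class index per pixel, same height and width as the image.
SegmentationMask = List[List[int]]


def mask_from_boxes(detections: List[Detection2D], height: int, width: int,
                    class_ids: Dict[str, int]) -> SegmentationMask:
    """Rasterise 2D boxes into a coarse per-pixel class map (0 = background)."""
    mask = [[0] * width for _ in range(height)]
    for det in detections:
        x0, y0, x1, y1 = det.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = class_ids[det.label]
    return mask
```

A box only gives a rectangle around the object, while a real segmentation model follows the object's outline. The helper above just shows that a mask has one value per pixel.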

Models:

Object classification (from RGB image)

SimCLR (Paper) — 2020

Object detection (from RGB image)

Faster-RCNN (Article, paper) — 2015
SSD (Article, paper) — 2015
YOLOv3 (Website) — 2019
ATSS (Paper, code) — 2019
CBNet (Paper, code) — 2019

Unique object detection (from RGB image)

[Palm] BlazePalm by Google (Article) — 2019
[Face] BlazeFace by Google (Article, Paper) — 2019
[Solar panel] SolarNet (Paper) — 2019

Object segmentation (from RGB image)

[Human] Bodypix (Article) — 2019

Tools:

Object detection and segmentation

Detectron (Article, papers, code) — 2018
Models: Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN

Data augmentation

Learning in the Frequency Domain (Paper) — 2020

Datasets:

[Face] LFW (Website)
ImageNet (Website)

Audio synthesis

The idea is to generate sound, which can be voice or music, for example.

Source : https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html

Models:

Mel spectrogram

Tacotron 2 by Google, with an implementation by Nvidia (Paper, GitHub repo, article) — 2017
FastSpeech (Paper) — 2019
FastSpeech 2 (Paper) — 2020

Vocal

WaveGlow by Nvidia (Made from Glow and WaveNet. Article) — 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (Made from Tacotron 2. Paper, article) — 2018

Audio

WaveRNN (Paper) — 2018
GANSynth (Paper) — 2019
[Music] DDSP (Article) — 2019
[Music + Vocal] Jukebox by OpenAI (Article, code, paper) — 2020

Datasets:

NSynth (Article)
LJSpeech (Article)

Tools:

“Deep neural network that can generate 4-minute musical compositions with 10 different instruments”

MuseNet by OpenAI (Article) — 2019

Image generation

Same idea as the previous section, but here the generated content is visual and 2D, in image or video format.

Source : https://deepdreamgenerator.com/

Models:

StyleGAN2 T. Karras et al. (Code, paper, video) — 2019
SinGAN (Article, paper) — 2019
Semantic Image Synthesis with Spatially-Adaptive Normalization (Code, paper, video, demo GauGAN) — 2019
StyleGAN2 Y. Viazovetskyi et al. (Code, paper) — 2020

Depth map

Depth Map Estimation of Dynamic Scenes Using Prior Depth Information (Paper) — 2020

Super resolution

[Face] PULSE (Article, paper) — 2020

Image enhancing

Learning to See Through Obstructions (Article, paper, code) — 2020

Pipelines:

From black and white choppy video to 4K 60 FPS (Article).

[Resolution] ESRGAN, resolution upscaling (Paper) — 2018
[Colorisation] DeOldify (Article) — 2018
[FPS] Depth-Aware Video Frame Interpolation (Article) — 2019

Tools:

Replace face in a video (Deep fake).

DeepFaceLab (GitHub repo) — 2018
Models: Quick96, SAEHD, FANSeg, XSeg.

Natural language processing

NLP allows a model to mimic human comprehension of words, or at least to analyse the structure of a text and the positions of words in a sentence in order to extract meaning. It serves purposes such as translation, relation extraction, summarisation, and named entity extraction.

Source : http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

Models:

Universal Sentence Encoder (Paper) — 2018
BERT by Google (Blog, paper) — 2018
RoBERTa by Facebook (Article, code, paper) — 2019
CamemBERT (Paper) — 2019
GPT-2 by OpenAI (Article, code, paper) — 2019
XLNet (Paper) — 2019
ALBERT (Paper) — 2019
T5 by Google (Article, code, paper) — 2019
ELECTRA by Google (Article, code, paper) — 2020
Reformer, a more efficient version of the Transformer (attention complexity reduced from O(L^2) to O(L log L), where L is the sequence length) (Paper) — 2020
[Chatbot] Meena (Paper) — 2020
A collection of BERT models, gathered by Google (Code) — 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (Article, paper) — 2020
GPT-3 by OpenAI (API, paper) — 2020
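As a rough illustration of the Reformer entry in the list above, the gap between a quadratic cost and an L log L cost can be estimated for a few sequence lengths. This is only back-of-the-envelope arithmetic: constant factors are ignored, the base-2 logarithm is an assumption, and `attention_cost_ratio` is a hypothetical helper, not something taken from the paper.

```python
import math


def attention_cost_ratio(seq_len: int) -> float:
    """Ratio between L^2 and L * log2(L), ignoring constant factors."""
    quadratic = seq_len ** 2
    log_linear = seq_len * math.log2(seq_len)
    return quadratic / log_linear


for seq_len in (512, 4096, 65536):
    # Prints 56.9, 341.3 and 4096.0 respectively.
    print(seq_len, round(attention_cost_ratio(seq_len), 1))
```

The ratio grows with the sequence length, which is why the difference matters most for very long inputs.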

Datasets:

Wikipedia articles (Article)
C4, Common Crawl’s web crawl corpus on TensorFlow (Article)
CORD-19, COVID-19 Open Research Dataset (Article)
Taskmaster-2 (Article)
The Big Bad NLP Database (Website)

Tools:

DeepPavlov (Website)
Hugging Face (Website)

References:

https://github.com/sebastianruder/NLP-progress

Pose estimation

Pose estimation provides a “skeleton” of a person, a hand, or something else. Each junction (the extremity of each segment) represents a joint.

Source : https://www.youtube.com/watch?v=mxKlUO_tjcg
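The skeleton idea can be sketched as a small data structure: a set of named joints with 2D coordinates, plus a list of segments linking pairs of joints. The joint names, the `SKELETON_EDGES` list and the `segment_lengths` helper below are hypothetical; each model listed in this section defines its own set of keypoints.

```python
import math
from typing import Dict, List, Tuple

Keypoint = Tuple[float, float]  # (x, y) position in pixels

# Hypothetical skeleton: each pair links two joints with a segment.
SKELETON_EDGES: List[Tuple[str, str]] = [
    ("shoulder", "elbow"),
    ("elbow", "wrist"),
]


def segment_lengths(keypoints: Dict[str, Keypoint],
                    edges: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Return the pixel length of every segment whose two joints were detected."""
    lengths = {}
    for a, b in edges:
        if a in keypoints and b in keypoints:
            (xa, ya), (xb, yb) = keypoints[a], keypoints[b]
            lengths[(a, b)] = math.hypot(xb - xa, yb - ya)
    return lengths


# Example pose, as a model could output it for one arm.
pose = {"shoulder": (0.0, 0.0), "elbow": (3.0, 4.0), "wrist": (3.0, 10.0)}
print(segment_lengths(pose, SKELETON_EDGES))
```

For a 3D model, the same structure applies with an extra z coordinate per joint.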

Models:

3D Hand pose estimation (from RGB image)

Using a single RGB frame for real time 3D hand pose estimation in the wild (Paper) — 2017
REGNet (Website) — 2018
3D Hand Shape and Pose Estimation from a Single RGB Image (Website) — 2019
Hand Landmark by Google (Article) — 2019

2D Hand pose estimation (from RGB image)

Attention! (Paper) — 2020

2D Human pose estimation (from RGB image)

Towards Accurate Multi-person Pose Estimation in the Wild (Paper) — 2017

3D Human pose estimation (from RGB image)

DensePose (Article, code, paper) — 2018

3D Face mesh generation (from RGB image)

FaceMesh (Article, paper) — 2019

Datasets:

GANerated (Website)

Outliers

A collection of articles that do not fit an existing category yet.

Tools:

Motion transfer

Everybody Dance Now (Video, paper) — 2018

Predict molecules to create new antibiotics

Chemprop (Article) — 2020
Model type: ensemble of GNNs

Convert brain waves to sentences

Machine translation of cortical activity to text with an encoder-decoder framework (Code, Paper BioRxiv, Paper Nature) — 2020

Discovering machine learning algorithms from scratch

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Paper) — 2020

Data augmentation for 3D point clouds

Improving 3D Object Detection through Progressive Population Based Augmentation (Article, paper) — 2020

Anomaly detection

G2D: Generate to Detect Anomalies (Article, paper) — 2020

Visual object tracking

RANet: Ranking Attention Network for Fast Video Object Segmentation (Paper) — 2019

Datasets:

Waymo open dataset (lidar/ radar data) (Website) — 2019
Russian Open Speech To Text (STT/ASR) Dataset (Code)

AI Solutions

Doc.ai — A solution for tracking your health
PacketAI.co — “ROI driven ITOps solution”
Blazar.ai — “Immunotherapies using the immune system to fight cancer”
UpStride.io — “Train with up to 10x less data”
Flowlity.com — “Your Supply Chain, Simplified. Synchronized. Reinvented.”
iRhythm.com — Mobile cardiac telemetry
Synthesia.io — “… a powerful tool to create engaging video content without the need for actors, film crews and studios.”

Written by Steve Gabet