Deep learning watch: a live article

Steve Gabet
5 min read · May 29, 2020

Technical monitoring of the latest or most popular architectures, pipelines, tools and datasets related to deep learning. It will be updated continuously.

Don’t hesitate to share your own references in the comments; I may add them. ;)

Summary

  1. Object recognition
  2. Audio synthesis
  3. Image generation
  4. Natural language processing
  5. Pose estimation
  6. Outliers
  7. AI solutions

Object recognition

Object recognition identifies the objects present in a scene. It covers several tasks: 2D classification, 2D detection (bounding boxes), segmentation (predicting, pixel by pixel, which class each pixel belongs to), 3D detection (3D bounding boxes with an orientation in space), and more.
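As a minimal sketch of the segmentation step, assume a network has already produced one score per class for every pixel (the `scores` values below are made up). The mask is then obtained by keeping, for each pixel, the index of its highest-scoring class. The `segment` helper is hypothetical and written in plain Python for readability; in practice this is typically a single argmax over the class axis of a tensor.

```python
from typing import List

# Scores indexed as scores[class_index][row][col] (hypothetical network output).
Scores = List[List[List[float]]]


def segment(scores: Scores) -> List[List[int]]:
    """Return an H x W mask where each pixel holds its highest-scoring class index."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    mask: List[List[int]] = []
    for r in range(height):
        row: List[int] = []
        for c in range(width):
            # Per-pixel argmax over the class axis.
            best = max(range(num_classes), key=lambda k: scores[k][r][c])
            row.append(best)
        mask.append(row)
    return mask


# Made-up scores for 3 classes on a 2 x 3 image.
scores: Scores = [
    [[0.90, 0.10, 0.20], [0.80, 0.10, 0.10]],  # class 0 (e.g. background)
    [[0.05, 0.70, 0.10], [0.10, 0.80, 0.20]],  # class 1
    [[0.05, 0.20, 0.70], [0.10, 0.10, 0.70]],  # class 2
]
print(segment(scores))  # [[0, 1, 2], [0, 1, 2]]
```

Each pixel ends up with exactly one class label, which is what distinguishes segmentation from detection, where the output is a box around the whole object.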

Models:

Object classification (from RGB image)

Object detection (from RGB image)

Unique object detection (from RGB image)

  • [Palm] BlazePalm by Google (Article) — 2019
  • [Face] BlazeFace by Google (Article, Paper) — 2019
  • [Solar panel] SolarNet (Paper) — 2019

Object segmentation (from RGB image)

Tools:

Object detection and segmentation

  • Detectron (Article, papers, code) — 2018
    Models: Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN

Data augmentation

  • Learning in the Frequency Domain (Paper) — 2020

Datasets:

Audio synthesis

The idea is to generate sound, for example voice or music.

Source: https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html

Models:

Mel spectrogram

Vocal

  • WaveGlow by NVIDIA (based on Glow and WaveNet; Article) — 2018
  • Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (based on Tacotron 2; Paper, article) — 2018

Audio

Datasets:

Tools:

“Deep neural network that can generate 4-minute musical compositions with 10 different instruments”

  • MuseNet by OpenAI (Article) — 2019

Image generation

Same idea as the previous section, but here the output is visual 2D content, in image or video format.

Source: https://deepdreamgenerator.com/

Models:

2D

Depth map

  • Depth Map Estimation of Dynamic Scenes Using Prior Depth Information (Paper) — 2020

Super resolution

Image enhancing

Pipelines:

From choppy black-and-white video to 4K at 60 FPS (Article).

  • [Resolution] ESRGAN, resolution enhancement (Paper) — 2018
  • [Colorisation] DeOldify (Article) — 2018
  • [FPS] Depth-Aware Video Frame Interpolation (Article) — 2019
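The FPS stage of this pipeline relies on Depth-Aware Video Frame Interpolation, which synthesises intermediate frames using depth and motion information. The sketch below swaps in a much simpler technique, plain linear blending of neighbouring frames, only to illustrate how inserting one synthetic frame between each pair of frames roughly doubles the frame rate (for example from 30 to about 60 FPS). Frames are assumed to be grayscale grids of pixel intensities, and `blend` and `double_frame_rate` are hypothetical helpers, not part of any of the tools listed.

```python
from typing import List

# A grayscale frame: rows of pixel intensities (assumed representation).
Frame = List[List[float]]


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Pixel-wise linear blend (1 - t) * a + t * b (naive, not depth-aware)."""
    return [[(1 - t) * pa + t * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert one blended frame between each pair of consecutive frames."""
    out: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        out.append(blend(prev, nxt))
    if frames:
        out.append(frames[-1])  # keep the last original frame
    return out


clip = [[[0.0, 0.0]], [[1.0, 0.5]]]  # two tiny 1 x 2 frames
print(double_frame_rate(clip))  # [[[0.0, 0.0]], [[0.5, 0.25]], [[1.0, 0.5]]]
```

A clip of n frames becomes 2n − 1 frames. Blending produces ghosting on moving objects, which is precisely the artefact that depth- and flow-aware interpolation methods aim to avoid.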

Tools:

Replace faces in a video (deepfakes).

  • DeepFaceLab (GitHub repo) — 2018
    Models: Quick96, SAEHD, FANSeg, XSeg.

Natural language processing

NLP allows a model to mimic human comprehension of words, or at least to analyse the structure of a text and the positions of words in a sentence in order to extract meaning. It serves purposes such as translation, relation extraction, summarisation, named entity extraction, and so on.

Source: http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

Models:

  • Universal Sentence Encoder (Paper) — 2018
  • BERT by Google (Blog, paper) — 2018
  • RoBERTa by Facebook (Article, code, paper) — 2019
  • CamemBERT (Paper) — 2019
  • GPT-2 by OpenAI (Article, code, paper) — 2019
  • XLNet (Paper) — 2019
  • ALBERT (Paper) — 2019
  • T5 by Google (Article, code, paper) — 2019
  • ELECTRA by Google (Article, code, paper) — 2020
  • Reformer, a more efficient version of the Transformer (attention cost reduced from O(L²) to O(L log L); Paper) — 2020
  • [Chatbot] Meena (Paper) — 2020
  • Collection of BERT models, centralised by Google (Code) — 2020
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (Article, paper) — 2020
  • GPT-3 by OpenAI (API, paper) — 2020
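To give an idea of what the Reformer's complexity change means in practice, the ratio between the two cost terms is L² / (L log L) = L / log L, where L is the sequence length. The sketch below evaluates that ratio for a few sequence lengths, ignoring constant factors and assuming a base-2 logarithm; it is a back-of-the-envelope comparison, not a measurement of actual runtime or memory.

```python
import math


def attention_cost_ratio(seq_len: int) -> float:
    """Ratio of the L^2 term (full attention) to the L*log2(L) term.

    Constant factors are ignored, so this only shows how the gap grows with L.
    """
    return seq_len ** 2 / (seq_len * math.log2(seq_len))


for L in (512, 4096, 65536):
    print(L, round(attention_cost_ratio(L)))
# 512 57
# 4096 341
# 65536 4096
```

The gap widens almost linearly with L, which is why this kind of reduction matters mostly for very long sequences.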

Datasets:

  • Wikipedia articles (Article)
  • C4, Common Crawl’s web crawl corpus on TensorFlow (Article)
  • CORD-19, COVID-19 Open Research Dataset (Article)
  • Taskmaster-2 (Article)
  • The Big Bad NLP Database (Website)

Tools:

References:

Pose estimation

The image above illustrates the idea of pose estimation well: the model provides a “skeleton” of a person, a hand, or another object, where each junction (the end of a segment) represents a joint.

Source: https://www.youtube.com/watch?v=mxKlUO_tjcg
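A skeleton of this kind can be represented as a set of named joints plus the segments (bones) connecting them. The sketch below assumes 2D pixel coordinates and uses made-up joint names and positions; it is not the output format of any of the models listed in this section. The hypothetical `bone_lengths` method measures each segment, the kind of quantity one might derive from estimated keypoints (for example to check that a limb keeps a plausible length across frames). It needs Python 3.8+ for `math.dist`.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels (assumed convention)


@dataclass
class Skeleton:
    joints: Dict[str, Point]       # joint name -> position
    bones: List[Tuple[str, str]]   # each bone joins two named joints

    def bone_lengths(self) -> Dict[Tuple[str, str], float]:
        """Euclidean length of every bone."""
        return {(a, b): math.dist(self.joints[a], self.joints[b]) for a, b in self.bones}


# Made-up 2D keypoints for an arm: three joints, two segments.
arm = Skeleton(
    joints={"shoulder": (0.0, 0.0), "elbow": (3.0, 4.0), "wrist": (3.0, 10.0)},
    bones=[("shoulder", "elbow"), ("elbow", "wrist")],
)
print(arm.bone_lengths())  # {('shoulder', 'elbow'): 5.0, ('elbow', 'wrist'): 6.0}
```

Keeping joints and bones separate means the same structure works for a hand, a body, or a face mesh: only the joint names and the bone list change.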

Models:

3D Hand pose estimation (from RGB image)

  • Using a single RGB frame for real time 3D hand pose estimation in the wild (Paper) — 2017
  • REGNet (Website) — 2018
  • 3D Hand Shape and Pose Estimation from a Single RGB Image (Website) — 2019
  • Hand Landmark by Google (Article) — 2019

2D Hand pose estimation (from RGB image)

  • Attention! (Paper) — 2020

2D Human pose estimation (from RGB image)

  • Towards Accurate Multi-person Pose Estimation in the Wild (Paper) — 2017

3D Human pose estimation (from RGB image)

3D Face mesh generation (from RGB image)

Datasets:

Outliers

A set of articles that do not fit any category yet, kept here until they become part of one.

Tools:

Motion transfer

Predict molecules to create new antibiotics

  • Chemprop (Article) — 2020
    Model type: ensemble of GNNs

Convert brain waves to sentences

Discovering machine learning algorithms from scratch

  • AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Paper) — 2020

Data augmentation for 3D point clouds

  • Improving 3D Object Detection through Progressive Population Based Augmentation (Article, paper) — 2020

Anomaly detection

Visual object tracking

  • RANet: Ranking Attention Network for Fast Video Object Segmentation (Paper) — 2019

Datasets:

  • Waymo Open Dataset (lidar and camera data) (Website) — 2019
  • Russian Open Speech To Text (STT/ASR) Dataset (Code)

AI Solutions

  • Doc.ai — a solution for tracking your health
  • PacketAI.co — “ROI driven ITOps solution”
  • Blazar.ai — “Immunotherapies using the immune system to fight cancer”
  • UpStride.io — “Train with up to 10x less data”
  • Flowlity.com — “Your Supply Chain, Simplified. Synchronized. Reinvented.”
  • iRhythm.com — Mobile cardiac telemetry
  • Synthesia.io — “… a powerful tool to create engaging video content without the need for actors, film crews and studios.”


Steve Gabet

Data scientist working for Data League in Paris.