Deep learning watch, live article.

Summary

  1. Object recognition
  2. Audio synthesis
  3. Image generation
  4. Natural language processing
  5. Pose estimation
  6. Outliers
  7. AI solutions

Object recognition

Object recognition identifies the objects present in a scene. It covers 2D classification, 2D detection (bounding boxes), segmentation (predicting, pixel by pixel, which class each pixel belongs to), 3D detection (3D bounding boxes with orientation in space), and more.
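To make the difference between these tasks concrete, here is a minimal sketch of what each one could return for a single image. The class names (`Box2D`, `Box3D`), labels, and values are illustrative assumptions and do not come from any of the models listed below.

```python
from dataclasses import dataclass


@dataclass
class Box2D:
    """Axis-aligned 2D bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str

    def area(self) -> float:
        # Clamp at zero so a degenerate box never has a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class Box3D:
    """3D bounding box: centre, size, and yaw (hypothetical type)."""
    center: tuple  # (x, y, z)
    size: tuple    # (length, width, height)
    yaw: float     # rotation around the vertical axis, in radians
    label: str


# 2D classification: a single label for the whole image.
classification = "cat"

# 2D detection: one bounding box per detected object.
detections = [Box2D(10, 20, 110, 220, "cat")]

# Segmentation: one class index per pixel (here a 4x4 image, 0 = background, 1 = cat).
segmentation = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# 3D detection: boxes that also carry a position and an orientation in space.
detections_3d = [Box3D((1.0, 2.0, 0.5), (4.2, 1.8, 1.5), 0.3, "car")]

cat_pixels = sum(row.count(1) for row in segmentation)
print(classification, detections[0].area(), cat_pixels, detections_3d[0].yaw)
```

The point of the sketch is only the shape of the output: one label, a list of boxes, a per-pixel grid, or boxes with an orientation.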

Models:

Object classification (from RGB image)

  • [Palm] BlazePalm by Google (Article) — 2019
  • [Face] BlazeFace by Google (Article, Paper) — 2019
  • [Solar panel] SolarNet (Paper) — 2019

Tools:

Object detection and segmentation

  • Detectron (Article, papers, code) — 2018
    Models: Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN
  • Learning in the Frequency Domain (Paper) — 2020

Datasets:

Audio synthesis

The idea is to generate sound, such as voice or music.

Source: https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html

Models:

Mel spectrogram

  • WaveGlow by NVIDIA (Made from Glow and WaveNet. Article) — 2018
  • Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (Made from Tacotron 2. Paper, article) — 2018

Datasets:

Tools:

“Deep neural network that can generate 4-minute musical compositions with 10 different instruments”

  • MuseNet by OpenAI (Article) — 2019

Image generation

Same idea as in the previous section, but here the output is 2D visual content, as an image or a video.

Source: https://deepdreamgenerator.com/

Models:

2D

  • Depth Map Estimation of Dynamic Scenes Using Prior Depth Information (Paper) — 2020

Pipelines:

From black and white choppy video to 4K 60 FPS (Article).

  • [Resolution] ESRGAN, resolution upscaling (Paper) — 2018
  • [Colorisation] DeOldify (Article) — 2018
  • [FPS] Depth-Aware Video Frame Interpolation (Article) — 2019

Tools:

Replace a face in a video (deepfake).

  • DeepFaceLab (GitHub repo) — 2018
    Models: Quick96, SAEHD, FANSeg, XSeg.

Natural language processing

NLP lets a model mimic human comprehension of words, or at least analyse the structure of a text and the positions of words in a sentence to extract meaning. It serves purposes such as translation, relation extraction, summarisation, and named entity extraction.

Source: http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

Models:

  • Universal Sentence Encoder (Paper) — 2018
  • BERT by Google (Blog, paper) — 2018
  • RoBERTa by Facebook (Article, code, paper) — 2019
  • CamemBERT (Paper) — 2019
  • GPT-2 by OpenAI (Article, code, paper) — 2019
  • XLNet (Paper) — 2019
  • ALBERT (Paper) — 2019
  • T5 by Google (Article, code, paper) — 2019
  • ELECTRA by Google (Article, code, paper) — 2020
  • Reformer, a more efficient version of the Transformer (attention cost reduced from O(L²) to O(L log L), where L is the sequence length) (Paper) — 2020
  • [Chatbot] Meena (Paper) — 2020
  • Collection of BERT models, centralised by Google (Code) — 2020
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers (Article, paper) — 2020
  • GPT-3 by OpenAI (API, paper) — 2020
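The complexity figures quoted for Reformer in the list above are easier to appreciate with numbers. The sketch below compares the two growth rates for a few sequence lengths. It deliberately ignores constant factors and assumes a base-2 logarithm, so it illustrates how the gap widens with L and says nothing about measured speed; the function names are hypothetical.

```python
import math


def quadratic_cost(length: int) -> int:
    """Number of steps for an O(L^2) operation, constants ignored."""
    return length * length


def log_linear_cost(length: int) -> float:
    """Number of steps for an O(L log L) operation, base-2 log assumed."""
    return length * math.log2(length)


for length in (512, 4096, 65536):
    ratio = quadratic_cost(length) / log_linear_cost(length)
    print(f"L={length:>6}: L^2 / (L log2 L) = {ratio:,.1f}")
```

The ratio simplifies to L / log2(L), so the advantage keeps growing as sequences get longer, which is why this matters mostly for long inputs.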

Datasets:

  • Wikipedia articles (Article)
  • C4, Common Crawl’s web crawl corpus on TensorFlow (Article)
  • CORD-19, COVID-19 Open Research Dataset (Article)
  • Taskmaster-2 (Article)
  • The Big Bad NLP Database (Website)

Tools:

References:

Pose estimation

The image above illustrates the idea of pose estimation: producing a “skeleton” of a person, a hand, or another articulated object. Each junction (the end of a segment) represents a joint.

Source: https://www.youtube.com/watch?v=mxKlUO_tjcg
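As a rough illustration of the skeleton described above, a pose can be stored as a set of named joints plus the list of segments connecting them. The joint names and 2D coordinates below are made up for the example and do not follow the output format of any model listed in this section.

```python
import math

# Each joint is a named 2D point in pixel coordinates (hypothetical values).
joints = {
    "wrist": (0.0, 0.0),
    "index_base": (3.0, 4.0),
    "index_tip": (3.0, 9.0),
}

# Each segment links two joints by name; together they form the skeleton.
segments = [
    ("wrist", "index_base"),
    ("index_base", "index_tip"),
]


def segment_length(start: str, end: str) -> float:
    """Euclidean distance between two named joints."""
    (x_start, y_start) = joints[start]
    (x_end, y_end) = joints[end]
    return math.hypot(x_end - x_start, y_end - y_start)


lengths = {segment: segment_length(*segment) for segment in segments}
for (start, end), length in lengths.items():
    print(f"{start} -> {end}: {length:.1f} px")
```

A 3D pose would use the same structure with (x, y, z) coordinates per joint.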

Models:

3D Hand pose estimation (from RGB image)

  • Using a single RGB frame for real time 3D hand pose estimation in the wild (Paper) — 2017
  • REGNet (Website) — 2018
  • 3D Hand Shape and Pose Estimation from a Single RGB Image (Website) — 2019
  • Hand Landmark by Google (Article) — 2019
  • Attention! (Paper) — 2020
  • Towards Accurate Multi-person Pose Estimation in the Wild (Paper) — 2017

Datasets:

Outliers

A collection of articles that do not fit any category yet, kept here until they do.

Tools:

Motion transfer

  • Chemprop (Article) — 2020
    Model type: ensemble of GNNs
  • AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Paper) — 2020
  • Improving 3D Object Detection through Progressive Population Based Augmentation (Article, paper) — 2020
  • RANet: Ranking Attention Network for Fast Video Object Segmentation (Paper) — 2019

Datasets:

  • Waymo Open Dataset (lidar/radar data) (Website) — 2019
  • Russian Open Speech To Text (STT/ASR) Dataset (Code)

AI solutions

  • Doc.ai — A solution for tracking your health
  • PacketAI.co — “ROI driven ITOps solution”
  • Blazar.ai — “Immunotherapies using the immune system to fight cancer”
  • UpStride.io — “Train with up to 10x less data”
  • Flowlity.com — “Your Supply Chain, Simplified. Synchronized. Reinvented.”
  • iRhythm.com — Mobile cardiac telemetry
  • Synthesia.io — “… a powerful tool to create engaging video content without the need for actors, film crews and studios.”
