2016 -
Research Scientist at OpenAI
Deep Learning, Generative Models, Reinforcement Learning
Summer 2015
DeepMind Internship
Deep Reinforcement Learning group
Summer 2013
Google Research Internship
Large-Scale Supervised Deep Learning for Videos.
2011 - 2015
Stanford Computer Science Ph.D. student
Deep Learning, Computer Vision, Natural Language Processing. My adviser was Fei-Fei Li.
Summer 2011
Google Research Internship
Large-Scale Unsupervised Deep Learning for Videos.
2009-2011
University of British Columbia: Master's Degree
I worked with Michiel van de Panne on learning Compositional Controllers for Physically-simulated Articulated Figures.
2005-2009
University of Toronto: Bachelor's Degree
Double major in Computer Science and Physics.
Bio. As of Summer 2016 I am a Research Scientist at OpenAI working on Deep Learning, Generative Models and Reinforcement Learning. Previously I was a Computer Science PhD student at Stanford, working with Fei-Fei Li. My research centered around Deep Learning and its applications in Computer Vision, Natural Language Processing and their intersection. In particular, I was interested in fully end-to-end learning with Convolutional/Recurrent Neural Network architectures and recent advances in Deep Reinforcement Learning. Over the course of my PhD I squeezed in two internships at Google, where I worked on large-scale feature learning over YouTube videos, and last summer I interned at DeepMind, where I worked on Deep Reinforcement Learning and Generative Models. Together with Fei-Fei, I designed and taught a new Stanford undergraduate-level class on Convolutional Neural Networks for Visual Recognition (CS231n). The class was the first Deep Learning course offering at Stanford and has grown from 150 enrolled students last year to 330 students this year.

On the side, for fun, I blog, tweet, and maintain several Deep Learning libraries written in Javascript (e.g. ConvNetJS, RecurrentJS, REINFORCEjs, t-sneJS). I am also sometimes jokingly referred to as the reference human for ImageNet (post :)), and I create those nice-looking conference proceedings LDA visualization pages each year (NIPS 2015 example). I also recently expanded on this with arxiv-sanity.com, which lets you search and sort through 20,000+ Arxiv papers on Machine Learning from the last 3 years in the same pretty format.

Publications

DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Efficiently identify and caption all the things in an image with a single forward pass of a network. Our model is fully differentiable and trained end-to-end without any pipelines. The model is also very efficient (processes a 720x600 image in only 240ms), and evaluation on a large-scale dataset of 94,000 images and 4,100,000 region captions shows that it outperforms baselines based on previous approaches.
Justin Johnson*, Andrej Karpathy*, Li Fei-Fei
CVPR 2016 (Oral)
Visualizing and Understanding Recurrent Networks
We study both qualitatively and quantitatively the performance improvements of Recurrent Networks in Language Modeling tasks compared to finite-horizon models. Our analysis sheds light on the source of the improvements and identifies areas for further potential gains. Among some fun results we find LSTM cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
Andrej Karpathy*, Justin Johnson*, Li Fei-Fei
ICLR 2016 Workshop
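To make the flavor of those visualizations concrete, here is a rough numpy sketch of the basic procedure: step an LSTM over text one character at a time and record tanh(c_t) for a single cell. The weight matrix W, hidden size H and char_to_ix vocabulary are stand-ins for a trained model (biases are omitted); this is an illustration, not the code used in the paper.

    import numpy as np

    def lstm_step(x, h, c, W):
        # One LSTM step; W maps the concatenated [x, h] to the 4 gate pre-activations
        # (a hypothetical trained weight matrix; biases omitted for brevity).
        z = W @ np.concatenate([x, h])
        H = h.size
        i, f, o = (1.0 / (1.0 + np.exp(-z[k*H:(k+1)*H])) for k in range(3))  # input/forget/output gates
        g = np.tanh(z[3*H:4*H])                                              # candidate cell update
        c = f * c + i * g
        return o * np.tanh(c), c

    def cell_trace(text, char_to_ix, W, H, cell=0):
        # Record tanh(c_t)[cell] for every character -- roughly the quantity
        # visualized when coloring text by the activation of one cell.
        V = len(char_to_ix)
        h, c, trace = np.zeros(H), np.zeros(H), []
        for ch in text:
            x = np.zeros(V); x[char_to_ix[ch]] = 1.0
            h, c = lstm_step(x, h, c, W)
            trace.append(np.tanh(c[cell]))
        return trace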
Deep Visual-Semantic Alignments for Generating Image Descriptions
We present a model that generates natural language descriptions of full images and their regions. For generating sentences about a given image region we describe a Multimodal Recurrent Neural Network architecture. For inferring the latent alignments between segments of sentences and regions of images we describe a model based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. This work was also featured in a recent New York Times article.
Andrej Karpathy, Li Fei-Fei
CVPR 2015 (Oral)
ImageNet Large Scale Visual Recognition Challenge
Everything you wanted to know about ILSVRC: data collection, results, trends, current computer vision accuracy, even a stab at computer vision vs. human vision accuracy -- all here! My own contribution to this work was the human accuracy evaluation experiments.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei
IJCV 2015
Deep Fragment Embeddings for Bidirectional Image-Sentence Mapping
We train a multi-modal embedding to associate fragments of images (objects) and sentences (noun and verb phrases) with a structured, max-margin objective. Our model enables efficient and interpretable retrieval of images from sentence descriptions (and vice versa).
Andrej Karpathy, Armand Joulin, Li Fei-Fei
NIPS 2014
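For flavor, here is a minimal numpy sketch of a bidirectional max-margin ranking objective of the kind described above; it operates on whole-image/whole-sentence embeddings for simplicity and is not the paper's fragment-level formulation.

    import numpy as np

    def ranking_loss(img, sen, margin=1.0):
        # img, sen: (N, D) embeddings; row i of each is a matched image/sentence pair.
        S = img @ sen.T                    # S[i, j] = score of image i with sentence j
        pos = np.diag(S)                   # scores of the matched pairs
        cost_s = np.maximum(0.0, margin - pos[:, None] + S)   # rank sentences for each image
        cost_i = np.maximum(0.0, margin - pos[None, :] + S)   # rank images for each sentence
        np.fill_diagonal(cost_s, 0.0)
        np.fill_diagonal(cost_i, 0.0)
        return (cost_s.sum() + cost_i.sum()) / len(img)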
Large-Scale Video Classification with Convolutional Neural Networks
We introduce Sports-1M: a dataset of 1.1 million YouTube videos with 487 classes of Sport. This dataset allowed us to train large Convolutional Neural Networks that learn spatio-temporal features from video rather than single, static images.
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
CVPR 2014 (Oral)
Grounded Compositional Semantics for Finding and Describing Images with Sentences
Our model learns to associate images and sentences in a common embedding space. We use a Recursive Neural Network to compute representations for sentences and a Convolutional Neural Network for images. We then learn a model that associates images and sentences through a structured, max-margin objective.
Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng
TACL 2013
Emergence of Object-Selective Features in Unsupervised Feature Learning
We introduce an unsupervised feature learning algorithm that is trained explicitly with k-means for simple cells and a form of agglomerative clustering for complex cells. When trained on a large dataset of YouTube frames, the algorithm automatically discovers semantic concepts, such as faces.
Adam Coates, Andrej Karpathy, Andrew Ng
NIPS 2012
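A rough sketch of the "simple cells via k-means" half of such a pipeline, assuming grayscale frames as input (not the paper's exact preprocessing): sample random patches, contrast-normalize them, and let the k-means centroids play the role of filters.

    import numpy as np
    from sklearn.cluster import KMeans

    def learn_simple_cells(frames, k=256, patch=8, n_patches=100000, seed=0):
        # frames: (num_frames, H, W) grayscale images -- a hypothetical input format.
        rng = np.random.default_rng(seed)
        H, W = frames.shape[1:3]
        fs = rng.integers(0, len(frames), n_patches)
        ys = rng.integers(0, H - patch, n_patches)
        xs = rng.integers(0, W - patch, n_patches)
        P = np.stack([frames[f, y:y+patch, x:x+patch].ravel() for f, y, x in zip(fs, ys, xs)])
        P = (P - P.mean(1, keepdims=True)) / (P.std(1, keepdims=True) + 1e-8)  # contrast-normalize
        return KMeans(n_clusters=k, n_init=4, random_state=seed).fit(P).cluster_centers_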
Locomotion Skills for Simulated Quadrupeds
We develop an integrated set of gaits and skills for a physics-based simulation of a quadruped. The controllers use a representation based on gait graphs, a dual leg frame model, a flexible spine model, and the extensive use of internal virtual forces applied via the Jacobian transpose.
Stelian Coros, Andrej Karpathy, Benjamin Jones, Lionel Reveret, Michiel van de Panne
SIGGRAPH 2011
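The "internal virtual forces via the Jacobian transpose" idea is compact in code: to emulate a desired force f at an end effector, apply joint torques tau = J^T f. A tiny illustrative sketch for a planar 2-link leg (link lengths and angles are made up, not from the paper):

    import numpy as np

    def virtual_force_torques(q1, q2, f, l1=1.0, l2=1.0):
        # Jacobian of the foot position (x, y) with respect to the joint angles (q1, q2).
        s1, c1 = np.sin(q1), np.cos(q1)
        s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
        J = np.array([[-l1*s1 - l2*s12, -l2*s12],
                      [ l1*c1 + l2*c12,  l2*c12]])
        return J.T @ np.asarray(f)    # joint torques (tau1, tau2) emulating force f at the foot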
Object Discovery in 3D scenes via Shape Analysis
Wouldn't it be great if our robots could drive around our environments and autonomously discover and learn about objects? In this work we introduce a simple object discovery method that takes as input a scene mesh and outputs a ranked set of segments of the mesh that are likely to constitute objects.
Andrej Karpathy, Stephen Miller, Li Fei-Fei
ICRA 2013
Curriculum Learning for Motor Skills
My UBC Master's thesis project. My work was on curriculum learning for motor skills. In particular, I was working with a heavily underactuated (single joint) footed acrobot. The acrobot used a devised curriculum to learn a large variety of parameterized motor skill policies, skill connectivities, and also hierarchical skills that depended on previously acquired skills, almost all of it from scratch. The project was heavily influenced by intuitions about human development and learning (i.e. trial and error learning, the idea of gradually building skill competencies). The ideas in this work were good, but at the time I wasn't savvy enough to formulate them in a mathematically elaborate way. The video is a fun watch!
Andrej Karpathy, Michiel van de Panne
AI 2012

Teaching

Winter 2015/2016: I was an instructor for CS231n: Convolutional Neural Networks for Visual Recognition.

Have a look at the class notes, the lecture slides on the course syllabus page, and on reddit r/cs231n. The lecture videos were available, but had to be taken down (we're working to bring them back).

Talks

CVPR 2016 Deep Learning Workshop

CVPR 2015 Oral

RE•WORK Deep Learning Summit 2016

Pet Projects

Arxiv Sanity Preserver
There are way too many Arxiv papers. This project is an attempt to make them searchable and sortable in a pretty interface. The sort-by-tfidf-similarity feature works very well and can be quite useful. My aim is to expand on this project over time, e.g. add a social layer, or create custom paper classifiers / notifications, etc.
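The sort-by-tfidf-similarity idea is easy to sketch with scikit-learn; the function below is an illustration with hypothetical inputs, not the actual arxiv-sanity code.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import linear_kernel

    def most_similar(abstracts, query, topk=10):
        # abstracts: list of paper abstracts (strings); query: index of the paper of interest.
        X = TfidfVectorizer(ngram_range=(1, 2), max_features=5000, stop_words='english').fit_transform(abstracts)
        sims = linear_kernel(X[query], X).ravel()   # rows are L2-normalized, so this is cosine similarity
        return sims.argsort()[::-1][1:topk+1]       # indices of the most similar papers, skipping the query itself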
ConvNetJS
ConvNetJS is a Deep Learning / Neural Networks library written entirely in Javascript. This enables nice web-based demos that train Convolutional Neural Networks (or ordinary ones) entirely in the browser. Many web demos included. I did an interview with Data Science Weekly about the library and some of its back story here.
REINFORCEjs
REINFORCEjs is a Reinforcement Learning library that implements several common RL algorithms supported with fun web demos. The library includes DP, TD, and DQN algorithms and sketches of stochastic/deterministic Policy Gradients.
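To give a taste of the TD family it covers, here is a minimal tabular Q-learning loop in Python; the tiny env interface (reset()/step()) is an assumption of this sketch, which mirrors the idea rather than the library's Javascript code.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
                s2, r, done = env.step(a)   # assumed toy-environment interface
                target = r + (0.0 if done else gamma * Q[s2].max())
                Q[s, a] += alpha * (target - Q[s, a])   # TD update toward the greedy target
                s = s2
        return Q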
ulogme
ulogme tracks your active windows / keystroke frequencies / notes throughout the entire day and visualizes the results in beautiful d3js timelines. Check out my blog post introducing the project to learn more.
Pretty Accepted Papers
I was dissatisfied with the format that conferences use to announce the list of accepted papers (e.g. NIPS 2012 here). This led me to process the page into a much nicer and more functional form, with LDA topic analysis etc. The page became quite popular, so I continued to make it for NIPS 2013, CVPR 2014, NIPS 2014, NIPS 2015, CVPR 2015. Others have picked up the Github code and adapted it to ICML 2013 and CVPR 2013.
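The LDA topic analysis step amounts to something like this scikit-learn sketch (the abstracts input is hypothetical; the actual pages were generated with different code):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def lda_topics(abstracts, n_topics=20, n_top_words=10):
        # abstracts: list of accepted-paper abstracts (strings).
        vect = CountVectorizer(max_features=10000, stop_words='english')
        counts = vect.fit_transform(abstracts)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
        words = vect.get_feature_names_out()
        # top words per topic, useful for labeling each topic on the page
        return [[words[i] for i in comp.argsort()[::-1][:n_top_words]] for comp in lda.components_]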
Research Lei
Research Lei is an Academic Papers Management and Discovery System. It helps researchers build, maintain, and explore academic literature more efficiently, in the browser. (deprecated since Microsoft Academic Search API was shut down :( )
ScholarOctopus
ScholarOctopus takes ~7000 papers from 34 ML/CV conferences (CVPR / NIPS / ICML / ICCV / ECCV / ICLR / BMVC) between 2006 and 2014 and visualizes them with t-SNE based on bigram tfidf vectors. In general, it should be much easier than it currently is to explore the academic literature, find related papers, etc. This hack is a small step in that direction at least for my bubble of related research.
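The recipe behind it is short: bigram tfidf vectors fed into t-SNE to get 2-D coordinates for each paper. A rough scikit-learn sketch under those assumptions (the texts input is hypothetical; tsnejs below does the embedding step in the browser instead):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.manifold import TSNE

    def paper_map(texts):
        # texts: one title+abstract string per paper (hypothetical input).
        X = TfidfVectorizer(ngram_range=(1, 2), max_features=5000).fit_transform(texts)
        return TSNE(n_components=2, perplexity=30, init='random').fit_transform(X.toarray())  # (n_papers, 2) coordinates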
tsnejs
tsnejs is a t-SNE visualization algorithm implementation in Javascript. I also computed an embedding for ImageNet validation images here. Pretty! You can also use tsnejs to embed (almost) arbitrary CSV data in this web interface.
iOS apps
I've written an iOS app that helps people access and remember Rubik's Cube algorithms. I later also ported it to Android. There's also my little humble 2-4 player iPad game called Loud Snakes :)
Glass Winners
This page was a fun hack. Google was inviting people to become Glass explorers through Twitter (#ifihadclass) and I set out to document the winners of the mysterious process for fun. I didn't expect that it would go on to explode on the internet and get me mentions in TechCrunch, The Verge, and many other places.
Tetris AI
I think I enjoy writing AIs for games more than I enjoy playing games myself. Over the years I wrote several, for World of Warcraft, Farmville, Chess, and Tetris. On a somewhat related note, I also wrote a super-fun Multiplayer Co-op Tetris.
even more
Even more various crappy projects I worked on a long time ago.

Misc

My (mostly) Academic Blog. I wish all researchers had one.
Hacker's Guide to Neural Networks is my attempt at explaining Neural Nets from a "hacker's perspective", relying more on code and physical intuitions than on mathematics. I wrote this because I felt there were many people (e.g. some software engineers) who were interested in Deep Nets but who lacked the mathematical background to learn the basics through the usual channels.
I helped create the Programming Assignments for Andrew Ng's CS229A (Machine Learning Online Class) - this was the precursor to Coursera. At UBC I also TA'd CPSC540 (Graduate Probabilistic Machine Learning) and, three times, UBC's CPSC 121 (Discrete Mathematics), where I taught tutorials.
I like to go through classes on Coursera and Udacity. I usually look for courses that are taught by a very good instructor on topics I know relatively little about. Last year I decided to also finish Genetics and Evolution (statement of accomplishment) and Epigenetics (statement, + my rough notes).
Find me on Twitter, Github, Google+, Goodreads.
A long time ago I was really into Rubik's Cubes. I learned to solve them in about 17 seconds and then, frustrated by lack of learning resources, created YouTube videos explaining the Speedcubing methods. These went on to become quite popular. There's also my cubing page badmephisto.com. Oh, and a video of me at a Rubik's cube competition :)
Advice for doing well in undergrad classes, for younglings.
In media:
- NVIDIA donating a DGX-1 to OpenAI. Official post, and one more.
- The New York Times article on using deep networks for automatically captioning images with sentences.
- Wired article on my efforts to evaluate human accuracy on ImageNet
- The Verge articles on NeuralTalk, first here and then here. Several inaccuracies, but by now I'm quite used to it.
Still more unsorted misc
- an efficient, batched Python/numpy implementation of an LSTM forward and backward pass.
- a minimal character-level Recurrent Neural Network language model, written in Python/numpy. About 100 lines suffice! :)
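To give a flavor of what those ~100 lines contain, here is a rough sketch of just the sampling loop of a vanilla character-level RNN in numpy (hypothetical weight names; not the actual gist):

    import numpy as np

    def sample(Wxh, Whh, Why, bh, by, seed_ix, n):
        # Sample n character indices from a (hypothetically trained) vanilla char-RNN.
        vocab_size, hidden_size = Why.shape[0], Whh.shape[0]
        h = np.zeros((hidden_size, 1))
        x = np.zeros((vocab_size, 1)); x[seed_ix] = 1
        out = []
        for _ in range(n):
            h = np.tanh(Wxh @ x + Whh @ h + bh)    # recurrent hidden state update
            y = Why @ h + by                        # unnormalized log-probabilities over characters
            p = np.exp(y) / np.sum(np.exp(y))       # softmax
            ix = np.random.choice(vocab_size, p=p.ravel())
            x = np.zeros((vocab_size, 1)); x[ix] = 1
            out.append(ix)
        return out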