2016-08-20

Our Technical Chair, Nick Vasiloglou, recently interviewed Alex Korbonits, Data Scientist at Remitly, about his thoughts on artificial intelligence as it relates to the arts.

NV) What kind of features do you need to use when you train a model that generates art?

AK) Art is perceived. Since AlexNet in 2012, deep neural networks have come to be synonymous with the state of the art in computer vision and a range of other perception tasks, pushing the boundaries of what machine learning models can achieve far beyond what came before. Hand-crafted features are out and distributed feature representations are in. In other words, you don’t create specific features; instead, you create the model architecture. Stephen Merity made an excellent point about this recently in a blog post: “In deep learning, architecture engineering is the new feature engineering”.

In that sense, the question then becomes: what kinds of architectures do you need to use when you train a model that generates art? First of all, if you want to generate art, you’d better use a generative model :). After that, your choice of architecture should follow the data it’s modeling. For visual art you would probably want convolutions. For music you’d want recurrence. For film you’d want both.
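To make that concrete, here is a minimal sketch (using PyTorch, purely illustrative, with made-up layer sizes) of what “architecture follows the medium” looks like in practice: a convolutional generator for images next to a recurrent next-token model for sequential media like text or music.

```python
# Illustrative only: a convolutional generator for images and a recurrent
# next-token model for sequences. Sizes are arbitrary, not tuned for any dataset.
import torch
import torch.nn as nn

class ImageGenerator(nn.Module):
    """Maps a noise vector to a small image via transposed convolutions."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0),  # 1x1 -> 4x4
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),          # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),            # 8x8 -> 16x16
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class SequenceGenerator(nn.Module):
    """Predicts the next token in a sequence (text, encoded music) with an LSTM."""
    def __init__(self, vocab_size=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state

print(ImageGenerator()(torch.randn(2, 64)).shape)                     # [2, 3, 16, 16]
print(SequenceGenerator()(torch.randint(0, 128, (2, 32)))[0].shape)   # [2, 32, 128]
```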

NV) Give us an overview of generative models that have been successful in generating art.

AK) There have been so many exciting and interesting projects in this space within the last year alone that I sadly have to limit myself to just the ones I know.

First of all, Google’s “Deep Dream” took the world by storm last summer. Google took their well-understood discriminative classifier, the Inception network built for the 2014 ImageNet competition (also known as GoogLeNet, in homage to Yann LeCun’s LeNet), and decided to go deeper, as it were, by building a generative visualization tool to give intuition for the kinds of features/concepts the classifier was learning at different layers of the network. They wrote up a blockbuster Google Research blog post about it, along with releasing a GitHub repo and an IPython notebook demonstrating Deep Dream. Entire startups and homespun projects were created around Deep Dream to take any image and effectively create a version of it reminiscent of a scene from Fear and Loathing in Las Vegas. The horror, the horror.
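The core trick is simple: instead of updating the network’s weights, you run gradient ascent on the input image so that activations at a chosen layer grow, amplifying whatever patterns that layer has learned to detect. Here is a minimal sketch of that idea (not Google’s code; I use a torchvision VGG16 slice as a stand-in for the Inception layers):

```python
# Deep-Dream-style gradient ascent on the input image (illustrative sketch).
import torch
import torchvision.models as models

model = models.vgg16(pretrained=True).features[:16].eval()  # a mid-level conv block
for p in model.parameters():
    p.requires_grad_(False)

def deep_dream(image, steps=20, lr=0.05):
    """image: float tensor of shape (1, 3, H, W), roughly ImageNet-scaled."""
    image = image.clone().requires_grad_(True)
    for _ in range(steps):
        activations = model(image)
        loss = activations.norm()          # "make this layer fire harder"
        loss.backward()
        with torch.no_grad():
            image += lr * image.grad / (image.grad.abs().mean() + 1e-8)
            image.grad.zero_()
    return image.detach()

dreamed = deep_dream(torch.randn(1, 3, 224, 224))
```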

Second, Andrej Karpathy released his own code in a GitHub repo for a character-level recurrent neural network implementation (called char-rnn) in Torch, whose creative properties he demonstrated in a very popular blog post. Among other things, he trained a character-level LSTM (long short-term memory network) on different corpora, including War and Peace, the complete works of Shakespeare, an open-source set of LaTeX files of papers in algebraic geometry, and the source code for Linux. This kicked off a series of humorous web applications, such as an ironic clickbait-headline generator reminiscent of BuzzFeed, as well as myriad Twitter bots generating sophisticated fake tweets for any well-known persona who is easy to mock ;). Traditionally, a lot of the generative models used for this kind of thing were simple Markov chains, but LSTMs learn long-term dependencies that are comparatively impressive. Even the output of an LSTM trained on James Joyce’s Ulysses doesn’t look too far off from the real thing.
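For a sense of how little machinery this takes, here is a small self-contained sketch of the char-rnn idea in PyTorch (not Karpathy’s Torch code): train an LSTM to predict the next character, then sample from it one character at a time. The toy corpus and tiny training run mean the samples will be gibberish, but the mechanics are the same as training on War and Peace.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "to be, or not to be, that is the question. " * 50   # toy stand-in corpus
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

model = CharLSTM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
seq_len = 64
for step in range(200):                          # a real run needs far more steps
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)         # input characters
    y = data[i + 1:i + seq_len + 1]              # targets: the next character
    logits, _ = model(x)
    loss = F.cross_entropy(logits.squeeze(0), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: feed the model its own output, one character at a time.
idx = torch.tensor([[stoi["t"]]])
state, out = None, "t"
for _ in range(100):
    logits, state = model(idx, state)
    probs = F.softmax(logits[0, -1] / 0.8, dim=-1)   # 0.8 = sampling temperature
    idx = torch.multinomial(probs, 1).view(1, 1)
    out += vocab[idx.item()]
print(out)
```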

Third, an interesting area for generative models in art is music. There’s a lot of work to be done here in terms of richness and training on audio recordings, but some low-hanging fruit has already been found. A couple of implementations I know of thus far: (1) using an LSTM (even Karpathy’s char-rnn) on music in a text-like encoding (such as MIDI files) to generate new music from training data (which I had some fun playing with a year ago, training on MIDI files of Beethoven piano sonatas, but didn’t post); and (2) an awesome post on Daniel Johnson’s blog, hexahedria, wherein he describes an implementation of what he calls a “Biaxial Recurrent Neural Network for Music Composition” that seems to generate results far superior to mine.
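As a rough illustration of approach (1), here is one way (not necessarily what I did) to flatten MIDI note events into a text-like token stream that a char-rnn-style model can consume. It uses the mido library, and the file name is just a placeholder:

```python
# Flatten note-on events into simple "pitch_delta" tokens (illustrative only).
import mido

def midi_to_tokens(path):
    tokens = []
    for msg in mido.MidiFile(path):
        if msg.type == "note_on" and msg.velocity > 0:
            # When iterating a MidiFile, msg.time is seconds since the previous message.
            tokens.append(f"n{msg.note}_d{round(msg.time, 2)}")
    return " ".join(tokens)

# e.g. tokens = midi_to_tokens("some_beethoven_sonata.mid")  # placeholder path
# The resulting string ("n60_d0.25 n64_d0.0 ...") can be fed to the same kind of
# character- or token-level LSTM sketched above.
```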

Fourth, I would be remiss not to discuss the well-known “A Neural Algorithm of Artistic Style”. This has now spawned an app called Prisma that, while not quite as white-hot popular as Pokémon Go, is nonetheless very much in vogue as you are reading this. A generative model is trained on a base image and a style image, and its output is a composition that resembles the base image rendered in the “style” of the style image. Now you can even transfer style while keeping the original colors, which was not possible in the original implementation. Functionally this means that, for example, I can take an ordinary photograph of the Space Needle in Seattle and generate a version of it inspired by a favorite cubist painting by Braque.
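Under the hood, the original paper boils down to two losses: a content loss that compares feature maps of the generated image and the base image directly, and a style loss that compares Gram matrices of feature maps of the generated image and the style image. Here is a minimal sketch of just those two pieces; toy tensors stand in for CNN activations, and in the real method the losses are summed over several layers of a pretrained network and minimized by gradient descent on the generated image’s pixels.

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    """features: (channels, H, W) feature map from some CNN layer."""
    c, h, w = features.shape
    flat = features.view(c, h * w)
    return flat @ flat.t() / (c * h * w)

def content_loss(gen_feat, content_feat):
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feat, style_feat):
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))

# Toy feature maps standing in for activations of the generated, content,
# and style images at one layer:
g, c, s = (torch.randn(64, 32, 32) for _ in range(3))
total = content_loss(g, c) + 1e3 * style_loss(g, s)   # weights are illustrative
```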

Last, and perhaps most recently, we saw on Ars Technica that a group of artists released a short film called Sunspring, whose script was composed entirely by “an AI” (which I have a hunch was trained with a derivative of Karpathy’s char-rnn), trained on a large corpus of science-fiction scripts and presumably given different plot-related prompts/primers to generate the short sequences that made up the final script. We now see real artists using state-of-the-art machine learning models to assist in the creative process. How cool is that?

NV) Can deep learning embeddings provide dimensions that are associated with art measures, that humans pick up?

AK) Yes.

A recent paper, “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, explores this idea. First, the authors determine semantically meaningful directions within the embeddings to locate and expose bias. They then exploit the existence of these directions to combat bias. Super cool.
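The mechanics are worth spelling out. Roughly (this is my sketch, not the authors’ code): estimate a direction from a few normalized word-vector differences, then project other words onto it to see how far they lean. The toy random vectors below just show the arithmetic; with real word2vec or GloVe vectors the projections become telling.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder vectors; in practice these would be looked up from a trained embedding.
vec = {w: rng.standard_normal(50) for w in
       ["he", "she", "man", "woman", "programmer", "homemaker"]}

def direction(pairs):
    """Average of difference vectors, e.g. [("she", "he"), ("woman", "man")]."""
    d = np.mean([vec[a] - vec[b] for a, b in pairs], axis=0)
    return d / np.linalg.norm(d)

def projection(word, d):
    v = vec[word]
    return float(np.dot(v, d) / np.linalg.norm(v))

gender_dir = direction([("she", "he"), ("woman", "man")])
for w in ["programmer", "homemaker"]:
    print(w, projection(w, gender_dir))   # meaningless with random vectors, telling with real ones
```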

In this sense, I think it is totally possible to create or discover dimensions/directions within an embedding that are associated with human-interpretable art measures. You could stay within the medium you’ve embedded to explore and interpret the space — e.g., paintings — or you could marry up your embedding of paintings with some word embeddings (perhaps via descriptions of your paintings or painting metadata) to help you understand how you’re moving around in it. Chris Moody from Stitch Fix gave a great talk at Data Day Seattle this summer combining word embeddings and images of clothing items to make recommendations to customers. There’s no reason you couldn’t do this with art. For example, you could explain the artistic/aesthetic differences in style between two similar paintings by different artists (such as cubist portraits by Picasso and Braque) by comparing color palette, brushstrokes, shading, or other aspects you want to inspect along the relevant directions in your embedding.

NV) Is there a word2vec embedding for art so that you can transform a piece of music by just adding a vector to it?

AK) Mozart + Metallica – Beatles = Beethoven? As long as you can take a piece of music and properly model it, there’s no reason in principle why you couldn’t embed a specific piece of music into some kind of metric space (either as a single point or perhaps more intuitively as a sequence of points) that you could then manipulate within that space to transform along specific directions. E.g., you could modulate the key of a piece or a specific passage from major to minor, or perhaps along a direction that changes the instrument playing from clarinet to oboe.

I haven’t come across music2vec yet but I eagerly await its arrival. I imagine that this would come about via sequence-to-sequence models such as LSTMs since music, like text, is inherently sequential.
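Until then, here is a purely hypothetical sketch of what such a transform could look like mechanically: if a passage were embedded as a sequence of points, “modulating” it could amount to adding a learned offset vector to every point and then decoding the result. Everything below is a placeholder; no trained model or decoder exists here.

```python
import numpy as np

rng = np.random.default_rng(2)
passage = rng.standard_normal((16, 64))        # 16 time steps embedded in R^64 (made up)
major_to_minor = rng.standard_normal(64)       # hypothetical learned direction
major_to_minor /= np.linalg.norm(major_to_minor)

transformed = passage + 0.8 * major_to_minor   # shift every step along the direction
# A decoder (the missing half of the hypothetical model) would map
# `transformed` back into notes.
```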

For visual art embeddings you could use convolutional neural networks, and I think autoencoders could be useful here too.
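For instance, here is a minimal PyTorch sketch of a convolutional autoencoder whose bottleneck could serve as an embedding space for images of paintings. Sizes are illustrative, and a real version would be trained on an actual art corpus before its codes were compared or traversed.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)          # the "embedding" of the artwork
        return self.decoder(code), code

recon, code = ConvAutoencoder()(torch.rand(1, 3, 64, 64))
print(code.shape)  # torch.Size([1, 64])
```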

NV) If we can computerize art by massively generating it automatically, don’t we create inflation? What is the value of artificial art? Isn’t art supposed to be rare and unique?

AK) This question was posed 80 years ago in a very famous essay, “The Work of Art in the Age of Mechanical Reproduction” by Walter Benjamin, which examines the art-historical properties of prints of famous works of art. At issue was the value of art in an age where it was — all of a sudden — possible to print a poster of Picasso’s Les Demoiselles d’Avignon that you could sell to anyone looking for something to put up on the walls of their bedrooms, dorm rooms, apartments, homes, offices, restaurants… you name it. The mechanical reproduction of art doesn’t devalue art itself simply by virtue of increasing access to and awareness of rare and unique art. However, it does shift the typical experience of art from active engagement to passive consumption.

Another famous take on this question is Clement Greenberg’s essay “Avant-Garde and Kitsch”, wherein two diametrically opposed categories of art — avant-garde art and kitsch — are described and contrasted to highlight the purposes and properties of each.

Greenberg paraphrases Aristotle in suggesting that if all art and literature are imitation (of reality), then avant-garde art is the imitation of imitation: it is art concerned with the process of creating art for its own sake, independent of external meaning. He then contrasts this with kitsch by saying that kitsch is concerned not with the process of art but with the effect of art (on a consumer). At first glance, it would seem as though generating art with artificial intelligence is sort of both avant-garde and kitsch. However, this process is very mechanistic and literally formulaic. We’re not at a point where ML models are all of a sudden generating art that is as original or as emblematic of artistic genius as, e.g., the first Pollock. Sure, generative machine learning models trained on art are mimetic w.r.t. the process of creating art (e.g., DRAW: A Recurrent Neural Network For Image Generation), but these models create art that has the effect of looking like art we already know about. I.e., it’s definitely kitschy to generate a Van Gogh-styled version of a landscape photo someone took on their cell phone. Or a Warhol or Lichtenstein selfie. At this point, it’s kitschy to generate a work of art end-to-end from a model.

Let me be clear: kitschy art generated by a model is not artificial art. It’s art. Just because it is possible to massively generate art in an automatic way does not mean that all other art is thereby devalued. If anything, I would argue that creating huge quantities of kitschy art helps highlight the uniqueness, rarity, and value of art that is not automatically generated.

We’re beginning to see the power of artists using artificial intelligence as part of the overall creative/artistic process. As we saw with Sunspring, it’s possible to use AI as a tool in this process, not to replace and automate the process itself. You can use AI as part of the process of creating art and still be avant-garde. AI is adding value to the creation of new art and is highlighting the importance of art in a society that otherwise seems increasingly and singularly obsessed with STEM.

This is an exciting time where new tools are being developed and artists are trying them out. Can’t wait to see what comes next.



Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.

