Abstract
In this project we implemented a music-generating neural network based on the Variational Autoencoder (VAE) architecture. The network was trained on a dataset of 500 MIDI jazz files, represented as piano rolls.
Model
Data Representation
Each MIDI file is converted to a piano roll: a matrix representation of music with time and pitch axes.
The piano roll is then divided into fixed-size slices, which are paired into input and expected-output pairs: the expected output (label) is the input piano roll shifted forward by a few notes.
This way, the network learns to continue the input while also learning the structure and temporal features of each piano roll. A sketch of this preprocessing step is shown below.
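As a concrete illustration, the conversion and slicing could look roughly like the following. The sketch assumes the pretty_midi library; the slice length, shift amount, and sampling rate are hypothetical placeholders rather than the values used in the project.

import numpy as np
import pretty_midi

def midi_to_pairs(path, slice_len=96, shift=8, fs=16):
    """Convert one MIDI file into (input, label) piano-roll pairs."""
    midi = pretty_midi.PrettyMIDI(path)
    # get_piano_roll returns a (128 pitches, T time steps) velocity matrix;
    # binarize it into note-on/off values.
    roll = (midi.get_piano_roll(fs=fs) > 0).astype(np.float32)
    pairs = []
    T = roll.shape[1]
    for start in range(0, T - slice_len - shift, slice_len):
        x = roll[:, start : start + slice_len]                   # input slice
        y = roll[:, start + shift : start + shift + slice_len]   # shifted label
        pairs.append((x, y))
    return pairs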
The Network
The VAE is a neural network architecture that learns the probability distribution of the data. During training, the network encodes each input into mean and variance vectors that parameterize a latent Gaussian distribution, draws a sample from that distribution, and reconstructs the input from the sample. The training objective combines a reconstruction loss with a KL-divergence term that keeps the latent distribution close to a standard normal.
After training, the network is capable of generating its own data, in our case original jazz songs, by sampling from the latent space and decoding.
The network also contains 2D convolutional layers for learning patterns in the piano rolls.
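A minimal sketch of such a network, assuming a PyTorch implementation and 128x96 piano-roll slices (the layer sizes, latent dimension, and activations are illustrative, not the project's exact configuration):

import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder: 2D convolutions over the (pitch, time) plane.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        flat = 32 * 32 * 24  # feature map size for a 1x128x96 input
        self.fc_mu = nn.Linear(flat, latent_dim)      # mean layer
        self.fc_logvar = nn.Linear(flat, latent_dim)  # (log-)variance layer
        # Decoder: mirror of the encoder, ending in note-on probabilities.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, flat), nn.ReLU(),
            nn.Unflatten(1, (32, 32, 24)),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: a differentiable sample from N(mu, sigma^2).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

After training, a new piano roll can be generated by decoding a sample from the prior, e.g. model.decoder(torch.randn(1, 64)), and thresholding the resulting probabilities back into notes.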
Results
So far, the network has composed 20 songs. An example of the sheet music for one of the songs:
Link to another song: https://www.youtube.com/watch?v=nlk7_X97vS0

