
Which is the better feature extractor?

By setting include_top=False, you can get a 256-dim (MusicTaggerCNN) or 32-dim (MusicTaggerCRNN) feature representation. To reduce the size, change the number of feature maps in the convolution layers. In general, I would use MusicTaggerCRNN after downsizing it to around 0.2M parameters (then the training time would be similar to MusicTaggerCNN's). If you want to train the model yourself, that's up to you; if you just want to use the pre-trained weights, use MusicTaggerCNN. With MusicTaggerCNN, you will see the performance decrease if you reduce the number of parameters. MusicTaggerCRNN still works quite well in that case - i.e., the current setting is a little bit rich (or redundant). You can actually even decrease the number of feature maps.
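As a side note on shrinking these feature vectors further, the PCA reconstruction-error metric discussed in this README, mean(abs(recovered - original) / original) for a 32->24 reduction, can be sketched with plain NumPy. The random "features" below are stand-ins for real MusicTaggerCRNN outputs, so the resulting number is only illustrative:

```python
# Sketch (not from the repo): estimate how much a PCA reduction from
# 32 to 24 dims loses, using the relative reconstruction error
# mean(abs(recovered - original) / original) as in the README.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in features; the offset keeps |original| away from zero so the
# relative-error denominator is well-behaved.
feats = rng.normal(size=(1000, 32)) + 5.0

# PCA via SVD on mean-centered data.
mean = feats.mean(axis=0)
centered = feats - mean
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

k = 24                                  # target dimension (32 -> 24)
reduced = centered @ Vt[:k].T           # (1000, 24) compressed features
recovered = reduced @ Vt[:k] + mean     # projected back to 32 dims

rel_err = np.mean(np.abs(recovered - feats) / np.abs(feats))
print(f"mean relative reconstruction error: {rel_err:.3f}")
```

On real features the components are far from isotropic, so the error for the same 32->24 reduction can be much smaller or larger than on this random data; the README's measured value of about 0.05 is the figure that matters.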
I haven't looked into the 256-dim features, only the 32-dim ones. Probably the 256-dim features are redundant (in which case you can reduce them down effectively with PCA), or they just include more information than the 32-dim ones (e.g., features at different hierarchical levels). If the dimension size doesn't matter, it's worth choosing the 256-dim ones. I thought of using PCA to reduce the dimension further, but ended up not applying it because mean(abs(recovered - original) / original) was about 0.05 (dim: 32->24), which doesn't seem good enough. In general, I would recommend MusicTaggerCRNN and the 32-dim features: for predicting 50 tags, 256 features actually sounds a bit too large.

Which is the better predictor?

- Training: MusicTaggerCNN is faster than MusicTaggerCRNN (wall-clock time).
- Prediction: they are more or less the same.
- Memory usage: MusicTaggerCRNN has a smaller number of trainable parameters.

UPDATE: For the most efficient computation, use compact_cnn.

The tags are:

['rock', 'pop', 'alternative', 'indie', 'electronic', 'female vocalists',
 'dance', '00s', 'alternative rock', 'jazz', 'beautiful', 'metal',
 'chillout', 'male vocalists', 'classic rock', 'soul', 'indie rock',
 'punk', 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 'experimental',
 'female vocalist', 'guitar', 'Hip-Hop', '70s', 'party', 'country', 'easy listening',
 'sexy', 'catchy', 'funk', 'electro', 'heavy metal', 'Progressive rock',
 '60s', 'rnb', 'indie pop', 'sad', 'House', 'happy']

Audio files: find someone around you who happened to have the preview clips; I would recommend asking your colleagues.

There is a repo with the split setting for an identical experiment setup to the two papers. It includes some results that are not in the papers. Also, please take a look at the slides from ISMIR 2016.

Convnet: "Automatic Tagging using Deep Convolutional Neural Networks", Keunwoo Choi, George Fazekas, Mark Sandler, 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA, 2016

ConvRNN: "Convolutional Recurrent Neural Networks for Music Classification", Keunwoo Choi, George Fazekas, Mark Sandler, Kyunghyun Cho, arXiv:1609.
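As a usage sketch for the predictor, assuming a model that outputs one probability per tag (the names and numbers below are made up; a real vector would come from the model's prediction on a mel-spectrogram), picking the top-N tags looks like:

```python
# Hypothetical example: turn a per-tag probability vector into the
# top-N predicted tags. `probs` is a made-up output; `tags` is a
# shortened stand-in for the full tag list above.
tags = ['rock', 'pop', 'alternative', 'indie', 'electronic',
        'female vocalists', 'sad', 'House', 'happy']

probs = [0.91, 0.80, 0.15, 0.40, 0.75, 0.05, 0.10, 0.02, 0.33]

top_n = 3
# Pair each tag with its probability and sort by probability, descending.
ranked = sorted(zip(tags, probs), key=lambda tp: tp[1], reverse=True)
top_tags = [tag for tag, p in ranked[:top_n]]
print(top_tags)  # ['rock', 'pop', 'electronic']
```

Since the task is multi-label tagging, an alternative to a fixed top-N is thresholding the probabilities (e.g., keeping every tag above 0.5).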