emoji-learn

What is emoji-learn? πŸ€”

emoji-learn is a simple library for creating neural networks - in Emojicode!
How can this not be awesome??

Motivation πŸ’ͺ🏼

Screenshot 5

Look and Feel πŸ‘€

So…what does machine learning in Emojicode actually look like? Here are some screenshots to save you an extra click to my GitHub repo:

Screenshot 1

Screenshot 2

Yeah, it really is something.

Prerequisites πŸ“

Emojicode version 0.9+ is required. Go here to find out how to install it.
If you want to find out how to use Emojicode, go through the excellent language reference or package index.

After cloning the repository, you first have to build the numlol package. Go to packages/numlol and run the command

emojicodec -p numlol numlol.emojic

This will generate the numlol package, which emoji-learn depends on. Afterwards, go to packages/emoji-learn and run

emojicodec -p emoji-learn -S /path/to/GitRepo/emoji-learn/packages emoji-learn.emojic

Documentation πŸ€“

numlol πŸ’―

Machine learning, and especially neural nets, relies heavily on linear algebra (LinAlg) - we need to add, subtract and multiply matrices and vectors. Unfortunately, the only data structure available in the standard Emojicode package that comes close to what we need is the list.

πŸ’­ A list of integers
🍨 1 2 3 4 5πŸ†

Because of this, emoji-learn ships with a dedicated LinAlg library called numlol (not to be confused with the imposters from numpy).

Creating a new numlol array is as simple as one-two-three: just hand an embedded list over to its constructor.

πŸ’­ Create a 3*2 matrix and store it in the variable 'matrix'
  πŸ†•πŸŽπŸ†• 🍨🍨 1.0 2.0 πŸ†
            🍨 3.0 4.0 πŸ†
            🍨 5.0 6.0 πŸ†πŸ† β—βž‘οΈ matrix
            
πŸ’­ Create a 3*1 vector and store it in the variable 'vector'
  πŸ†•πŸŽπŸ†• 🍨🍨 1.0 πŸ†
            🍨 3.0 πŸ†
            🍨 5.0 πŸ†πŸ† β—βž‘οΈ vector

The numlol package implements all kinds of matrix and vector operations. (Note: most methods are static, so the method name itself is followed by πŸ‡πŸŽ.)

Want to add two vectors? No problem!

πŸ’­ Elementwise addition of two arrays with the same shape
πŸ‹πŸ‡πŸŽ array01 array02β—βž‘οΈ sum_array

You suddenly need to transpose a matrix? Here you go!

πŸ’­ Transpose array
πŸ”πŸ‡πŸŽ arrayβ—βž‘οΈ transposed_array

Need to perform a matrix-matrix or matrix-vector multiplication? Sure!

πŸ’­ Do matrix-matrix/matrix-vector multiplication
πŸ₯πŸ‡πŸŽ array01 array02β—βž‘οΈ multiplied_array

It also contains some more complicated functions we will need for machine learning later, e.g. the mean squared error function (πŸ‘¬) or the logistic activation function (πŸ“Έ).
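For intuition, here is what those two functions compute, sketched in plain Python (a conceptual equivalent, not the numlol API):

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def logistic(x):
    """Logistic (sigmoid) activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))
```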

It is also easy to print arrays to the console.

πŸ’­ Print array to console
πŸ“  array❗

This will produce output similar to the following:

Screenshot 4

emoji-learn πŸ“š

Ah, machine learning! The stuff all the cool kids do. So let’s dive right into it.

First of all, we need data. A lot of it. At the moment the only way to get data into our net is via a .csv file, so your data should look like this:

Screenshot 3

The file's content can then be read into a numlol array with a single method call:

πŸ’­ Read .csv file into numlol array
πŸ¦‹πŸ‡πŸ•Έ πŸ”€datasets/iris.csvπŸ”€β— ➑️ data

This method does not just read the data; it also normalizes the features and performs a one-hot encoding on the labels. You don't have any datasets to try out? No problem, we've got you covered! Some toy datasets are already included.
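A rough Python sketch of that preprocessing (illustrative only - min-max scaling is assumed here as the normalization; the exact scheme numlol uses may differ):

```python
def min_max_normalize(column):
    """Scale one feature column into the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(labels):
    """Replace each class label with a binary indicator vector."""
    classes = sorted(set(labels))
    return [[1.0 if label == c else 0.0 for c in classes] for label in labels]
```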

πŸ’­ Read the iris dataset into a numlol array
πŸŒΊπŸ‡πŸ•Έ ❗ ➑️ data

πŸ’­ Read the Pima Indians diabetes dataset into a numlol array
πŸ›ΆπŸ‡πŸ•Έ ❗ ➑️ data

πŸ’­ Read the sonar dataset into a numlol array
πŸš’πŸ‡πŸ•Έ ❗ ➑️ data

Unfortunately, an Emojicode method cannot return multiple values. Therefore 'data' is a value type that holds both the features and the labels, and there are two methods to access them.

πŸ’­ Access features and labels from the 'data' value type
🐬 data❗ ➑️ X
🦈 data❗ ➑️ y

The next step is to split the data into a train and a test set. There is also a handy method for this.

πŸ’­ Split in train and test data
πŸ’­ 0.333 means that (about) a third of all the data will be in the test data
πŸ…πŸ‡πŸ•Έ X y 0.333❗ ➑️ train_test_data

Afterwards we can create our net. The constructor for the net expects just one parameter: a list of integers, where each integer defines the number of neurons in the corresponding layer.

πŸ’­ Create a neural net. This one has:
πŸ’­ - 60 neurons in the input layer
πŸ’­ - 12 neurons in the hidden layer
πŸ’­ - 2 neurons in the output layer
πŸ’­ The number of hidden layers is NOT limited to one.
πŸ’­ It can be arbitrarily large.
πŸ†•πŸ•ΈπŸ†• 🍨 60 12 2πŸ† β—βž‘οΈ neural_net
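The layer list fully determines the shapes of the weight matrices: each pair of adjacent layers is connected by one matrix. A small Python illustration (not numlol code):

```python
def weight_shapes(layers):
    """One (rows, cols) weight matrix per pair of adjacent layers."""
    return [(layers[i], layers[i + 1]) for i in range(len(layers) - 1)]
```

For the 🍨 60 12 2πŸ† net above, this yields a 60x12 and a 12x2 matrix.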

And now the magic takes place, we can train the net!

πŸ’­ Train the net with the given data over 100 epochs with a learning rate of 0.1
πŸ¦„ neural_net X_train y_train 100 0.1❗
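Under the hood, training amounts to repeatedly nudging each weight against its error gradient. Here is a minimal gradient-descent loop for a single weight fitting y β‰ˆ w * x (a toy Python sketch - πŸ¦„ presumably runs a similar loop per weight, with gradients coming from backpropagation):

```python
def train(w, xs, ys, epochs, learning_rate):
    """Gradient descent on the squared error of the model y = w * x."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= learning_rate * grad
    return w
```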

To evaluate the performance, we can finally calculate and print the accuracy.

πŸ’­ Calculate the accuracy on the test data and print it
🦍 neural_net X_test y_test❗ ➑️ accuracy_test
πŸ˜€πŸͺ πŸ”€Accuracy test data: πŸ”€ πŸ”‘accuracy_testβœ–οΈ100 2β—πŸ”€%πŸ”€ πŸͺ❗

The trained net can then be used to make predictions.

πŸ’­ Get the 3rd sample from the array X, make a prediction and print the result
πŸ“  πŸ¦• neural_net πŸ₯«πŸ‡πŸŽ X 3❗❗❗

Performance πŸ“ˆ

So, how does an Emojicode net perform on different datasets? I ran a 10-fold cross-validation, training each net over 100 epochs, and here are the results (accuracies):

        Iris dataset     Sonar dataset      Pima Indians diabetes dataset
Train   0.966 +/- 0.011  0.9986 +/- 0.0030  0.833 +/- 0.015
Test    0.980 +/- 0.023  0.784 +/- 0.029    0.746 +/- 0.026

Wow, not too bad!
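For reference, 10-fold cross-validation just partitions the sample indices into ten folds, trains ten times (each fold serving once as the test set), and reports mean +/- standard deviation of the accuracies. Sketched in Python (the numbers above were not produced with this exact code):

```python
import statistics

def k_folds(n_samples, k):
    """Partition indices 0..n_samples-1 into k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def summarize(accuracies):
    """Mean and standard deviation over the folds, as reported in the table."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```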