emoji-learn
What is emoji-learn? 🤔
emoji-learn is a simple library to create neural networks - in Emojicode!
How can this not be awesome??
Motivation 💪🏼
Look and Feel 👀
So… what does machine learning in Emojicode actually look like? Here are some screenshots to save you an extra click to my GitHub repo:
Yeah, it really is something.
Prerequisites 📋
Emojicode version 0.9+ is required. Go here to find out how to install it.
If you want to find out how to use Emojicode, go through the excellent language reference or package index.
After cloning the repository, you first have to build the numlol package. Go to packages/numlol and run the command
emojicodec -p numlol numlol.emojic
This will generate the numlol package, which the emoji-learn package depends on. So afterwards go to packages/emoji-learn and run
emojicodec -p emoji-learn -S /path/to/GitRepo/emoji-learn/packages emoji-learn.emojic
Documentation 🤓
numlol 💯
Machine learning, and especially neural nets, relies heavily on linear algebra (LinAlg) - we need to add, subtract and multiply matrices and vectors. Unfortunately, the only data structure available in the standard Emojicode package that comes close to what we need is the list.
💭 A list of integers
🍨 1 2 3 4 5🍆
Because of this, emoji-learn ships with a dedicated LinAlg library called numlol (not to be confused with the impostors from numpy).
Creating a new numlol array is as simple as one-two-three: just hand over an embedded list to its constructor.
💭 Create a 3x2 matrix and store it in the variable 'matrix'
🆕ππ 🍨🍨 1.0 2.0 🍆
🍨 3.0 4.0 🍆
🍨 5.0 6.0 🍆🍆 ❗➡️ matrix
💭 Create a 3x1 vector and store it in the variable 'vector'
🆕ππ 🍨🍨 1.0 🍆
🍨 3.0 🍆
🍨 5.0 🍆🍆 ❗➡️ vector
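If numpy is more familiar territory, here is the same construction in Python - purely a comparison sketch, numlol itself has no Python API:

```python
import numpy as np

# 3x2 matrix, one inner list per row - mirroring the embedded 🍨 lists above
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# 3x1 column vector
vector = np.array([[1.0],
                   [3.0],
                   [5.0]])
```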
In the numlol package, all kinds of matrix and vector operations are implemented. (Note: most methods are implemented as static methods, so the method name itself is followed by ππ.)
Want to add two vectors? No problem!
💭 Elementwise addition of two arrays with the same shape
πππ array01 array02❗➡️ sum_array
You suddenly need to transpose a matrix? Here you go!
💭 Transpose array
πππ array❗➡️ transposed_array
Need to perform a matrix-matrix or matrix-vector multiplication? Sure!
💭 Do matrix-matrix/matrix-vector multiplication
π₯ππ array01 array02❗➡️ multiplied_array
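As a crib sheet, here are the same three operations in numpy terms (again just an analogy, not emoji-learn API):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
B = np.ones((3, 2))
v = np.array([[1.0], [3.0], [5.0]])

sum_array = A + B            # elementwise addition, shapes must match
transposed_array = A.T       # transpose: shape (3, 2) becomes (2, 3)
multiplied_array = A.T @ v   # matrix-vector product: (2, 3) @ (3, 1) -> (2, 1)
```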
It also contains some more complicated functions we will need for machine learning later, e.g. the mean squared error function (π¬) or the logistic activation function (πΈ).
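The README does not spell those two out, so for clarity here are the textbook definitions in Python (whether numlol uses exactly this scaling, e.g. an extra 1/2 factor in the MSE, is an assumption):

```python
import numpy as np

def mse(y_true, y_pred):
    # mean squared error: average squared difference between targets and predictions
    return np.mean((y_true - y_pred) ** 2)

def sigmoid(x):
    # logistic activation: squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))
```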
It is also easy to print arrays to the console.
💭 Print array to console
😀 array❗
This will produce a nicely formatted representation of the array on the console.
emoji-learn 🚀
Ah, machine learning! The stuff all the cool kids do. So let's dive right into it.
First of all we need data. A lot of it. At the moment the only way to get data into our net is via a .csv file, so your data should look something like this:
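For illustration, here are a few rows in that shape, taken from the classic iris dataset - feature columns first, class label last (whether numlol expects a header row is not documented, so none is shown):

```
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
```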
Then the file content can be transferred into a numlol array with just one method call:
💭 Read .csv file into numlol array
π¦ππΈ 🔤datasets/iris.csv🔤❗ ➡️ data
This method does not just read the data, it also normalizes the features and performs a one-hot encoding on the labels (a sketch of what this preprocessing does follows the snippets below). You don't have any datasets you can try out? No problem, we got you! There are already some toy datasets included.
💭 Read the iris dataset into a numlol array
πΊππΈ ❗ ➡️ data
💭 Read the Pima Indians diabetes dataset into a numlol array
πΆππΈ ❗ ➡️ data
💭 Read the sonar dataset into a numlol array
π’ππΈ ❗ ➡️ data
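To make the preprocessing concrete, here is what normalization and one-hot encoding boil down to, sketched in Python (min-max scaling is an assumption - numlol may well normalize differently):

```python
import numpy as np

def min_max_normalize(X):
    # rescale every feature column into the range [0, 1]
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def one_hot(labels):
    # turn each class label into a binary indicator vector,
    # e.g. 'Iris-setosa' -> [1, 0, 0] when there are three classes
    classes = sorted(set(labels))
    encoded = np.zeros((len(labels), len(classes)))
    for row, label in enumerate(labels):
        encoded[row, classes.index(label)] = 1.0
    return encoded
```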
Unfortunately, an Emojicode method cannot return multiple values. Therefore “data” is a value type that holds both the features and the labels. There are two methods to access them.
💭 Access features and labels from the 'data' value type
π¬ data❗ ➡️ X
π¦ data❗ ➡️ y
The next step is to split the data into a train and a test set. There is a handy method for this, too.
💭 Split in train and test data
💭 0.333 means that (about) a third of all the data will be in the test data
ππΈ X y 0.333❗ ➡️ train_test_data
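Conceptually the split does something like the following Python sketch (that emoji-learn shuffles before splitting is an assumption; the "about a third" wording suggests the test size is rounded):

```python
import numpy as np

def train_test_split(X, y, test_ratio, seed=0):
    # shuffle the sample indices, then carve off a test_ratio share as test data
    indices = np.random.default_rng(seed).permutation(len(X))
    n_test = round(len(X) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```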
Afterwards we can create our net. The constructor expects just one parameter: a list of integers, where each integer defines the number of neurons in the corresponding layer.
💭 Create a neural net. This one has:
💭 - 60 neurons in the input layer
💭 - 12 neurons in the hidden layer
💭 - 2 neurons in the output layer
💭 The number of hidden layers is NOT limited to one.
💭 It can be arbitrarily big.
🆕πΈπ 🍨 60 12 2🍆 ❗➡️ neural_net
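In a plain multi-layer perceptron, such a layer list translates into one weight matrix and one bias vector per transition between layers - for 🍨 60 12 2🍆 the weight shapes are (12, 60) and (2, 12). A Python sketch (the random initialization is illustrative; emoji-learn's actual scheme is not documented here):

```python
import numpy as np

layer_sizes = [60, 12, 2]   # input, hidden, output - same list as above
rng = np.random.default_rng(0)

# one (n_out, n_in) weight matrix and one (n_out, 1) bias vector per layer transition
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in layer_sizes[1:]]
```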
And now the magic happens: we can train the net!
💭 Train the net with the given data over 100 epochs with a learning rate of 0.1
π¦ neural_net X_train y_train 100 0.1❗
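What happens under the hood is not shown in this README, so as a reference point, here is one epoch of the textbook recipe in Python - per-sample gradient descent with backpropagation, sigmoid activations and a squared-error loss (all of which are assumptions about emoji-learn's internals):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch(weights, biases, X, y, lr):
    # one pass over the data: forward pass, backward pass, parameter update
    for sample, target in zip(X, y):
        # forward pass, remembering every layer's activation
        a = sample.reshape(-1, 1)
        activations = [a]
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)
            activations.append(a)
        # gradient of the squared error at the output (constant factors folded into lr)
        delta = (a - target.reshape(-1, 1)) * a * (1.0 - a)
        # backward pass: compute each layer's gradient, then update it
        for layer in reversed(range(len(weights))):
            a_prev = activations[layer]
            grad_W = delta @ a_prev.T
            grad_b = delta
            if layer > 0:
                delta = (weights[layer].T @ delta) * a_prev * (1.0 - a_prev)
            weights[layer] -= lr * grad_W
            biases[layer] -= lr * grad_b
```

Repeating train_epoch a hundred times with lr = 0.1 corresponds to the 100 epochs and learning rate in the call above.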
To evaluate the performance we can finally calculate and print the accuracy.
💭 Calculate the accuracy on the test data and print it
π¦ neural_net X_test y_test❗ ➡️ accuracy_test
😀🍪 🔤Accuracy test data: 🔤 🔡accuracy_test✖️100 2❗🔤%🔤 🍪❗
The trained net can then be used to make predictions.
💭 Get the 3rd sample from the array X, make a prediction and print the result
😀 π¦ neural_net 🥫ππ X 3❗❗❗
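For completeness, prediction and accuracy in the same sketch - taking the output neuron with the largest activation as the predicted class is an assumption, but it is the usual convention with one-hot labels:

```python
import numpy as np

def predict(weights, biases, sample):
    # forward pass; the predicted class is the index of the strongest output neuron
    a = sample.reshape(-1, 1)
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return int(np.argmax(a))

def accuracy(weights, biases, X, y):
    # share of samples whose predicted class matches the one-hot encoded label
    hits = sum(predict(weights, biases, s) == int(np.argmax(t)) for s, t in zip(X, y))
    return hits / len(X)
```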
Performance 📈
So, how does an Emojicode net perform on different datasets? I ran a 10-fold cross-validation (sketched below the table), training each net over 100 epochs, and here are the resulting accuracies:
| Split | Iris dataset | Sonar dataset | Pima Indian diabetes dataset |
|---|---|---|---|
| Train | 0.966 ± 0.011 | 0.9986 ± 0.0030 | 0.833 ± 0.015 |
| Test | 0.980 ± 0.023 | 0.784 ± 0.029 | 0.746 ± 0.026 |
Wow, not too bad!
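For reference, here is the evaluation scheme behind those numbers as a standard k-fold sketch in Python (how emoji-learn assigns the folds is an assumption):

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    # shuffle all sample indices and split them into k roughly equal folds;
    # each fold serves once as the test set while the others form the training set
    indices = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(indices, k)
```

The mean and spread of the k train and test accuracies are what the ± figures in the table report.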