Image classification is one of the oldest problems in Computer Vision, with a lineage of networks running from AlexNet all the way to EfficientNetV2. Today, with all the state-of-the-art models available a click away, it becomes a herculean task to test every model and then choose the best one. We shall cover everything from the model-selection dilemma to the fine-tuning frenzy one finds oneself in.
While there are Transformer-based models available, convolutional networks are still in demand: they are computationally simpler than Transformers, and their long history in the image domain has put some strong practices in place…
Everybody today finds themselves at the crossroads of data science, where Python becomes as necessary as air itself. So what is the best way to learn Python?
I learned Python 8 years ago and I have loved (and lived) it ever since. But today I found the best way to learn Python. For seasoned Pythonistas, it will refresh our memories as to why we came to love it in the first place.
Without further ado, I present it to you…
We all love PyTorch for many obvious reasons (e.g. the ease of implementing our ideas). But sometimes you run into an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 15.17 GiB already allocated; 15.88 MiB free; 15.18 GiB reserved in total by PyTorch)
And then the pain of trying to resolve it begins. The first thing we do is reduce our
batch_size . And yes, that should work, yet you get the same error again. Why?
Because the last time you tried to run the code, the model got loaded into GPU memory…
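The usual way out is to drop every Python reference to the stale model and tensors, then ask PyTorch to release its cached blocks. A minimal sketch (the `Linear` model here is a made-up stand-in; the guards make it safe to run on a CPU-only machine):

```python
import gc
import torch

# a hypothetical model that was left sitting in GPU memory from a previous run
model = torch.nn.Linear(1024, 1024)
if torch.cuda.is_available():
    model = model.cuda()

# step 1: drop every Python reference to the object holding GPU memory
del model
# step 2: force garbage collection so the tensors are actually freed
gc.collect()
# step 3: return PyTorch's cached (but unused) blocks to the GPU
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```

Note that `empty_cache()` only releases memory PyTorch has cached but is no longer using; memory held by live tensors must be freed via `del` first.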
Reddit is a goldmine of free-form text. Recently, a trending dataset on Kaggle has been Reddit Vaccine Myths. It includes the titles of the various myths propagated in this subreddit along with the number of comments on each. Worthy data to run some analysis on!
You can follow step-by-step by using this Kaggle notebook I created.
First, let’s read our data.
import pandas as pd
data = pd.read_csv('../input/reddit-vaccine-myths/reddit_vm.csv')
Now, let’s explore the various columns it has so we can run an analysis on it.
Index(['title', 'score', 'id', 'url', 'comms_num', 'created', 'body', 'timestamp'], dtype='object')
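To keep things self-contained here, a sketch with a toy frame that mirrors those columns (the rows are made up, not real subreddit data) and one obvious first question, which myths draw the most discussion:

```python
import pandas as pd

# toy stand-in for reddit_vm.csv with the same columns (values are invented)
data = pd.DataFrame({
    "title": ["Myth A", "Myth B"],
    "score": [10, 3],
    "id": ["a1", "b2"],
    "url": ["https://reddit.com/a1", "https://reddit.com/b2"],
    "comms_num": [5, 0],
    "created": [1.6e9, 1.6e9],
    "body": ["", "some text"],
    "timestamp": ["2021-01-01", "2021-01-02"],
})

# the most-discussed myths float to the top
top = data.sort_values("comms_num", ascending=False)
```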
We all love and use Pandas. It is our daily companion while we, data explorers, go on our journey deep into the mysteries of the data. Usually, we work with huge amounts of data and would love it if our
apply functions ran faster.
Well, wait no more: I present to you Pandarallel.
We’ll see how we can go from this:
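The core pattern is tiny: initialize Pandarallel once, then swap `apply` for `parallel_apply`. A hedged sketch (the `slow_square` function and the fallback guard are mine, so the snippet still runs on a machine without pandarallel installed):

```python
import pandas as pd

try:
    from pandarallel import pandarallel
    pandarallel.initialize(progress_bar=False)  # spawns worker processes once
    HAS_PANDARALLEL = True
except ImportError:
    HAS_PANDARALLEL = False

df = pd.DataFrame({"x": range(10)})

def slow_square(v):
    # stand-in for an expensive per-row function
    return v ** 2

if HAS_PANDARALLEL:
    # same semantics as .apply, but the work is split across CPU cores
    result = df["x"].parallel_apply(slow_square)
else:
    result = df["x"].apply(slow_square)
```

The speedup only shows on genuinely expensive per-row functions; for trivial ones, process overhead can make `parallel_apply` slower.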
PyTorch is highly appreciated by researchers for its flexibility and has found its way into mainstream industries that want to stay abreast of the latest groundbreaking research.
In short, if you are a deep learning practitioner, you are going to be face to face with PyTorch sooner or later.
Today, I am going to cover some tricks that will greatly reduce the training time for your PyTorch models.
To load data for our models, we use
torch.utils.data.DataLoader, which creates a Python iterable over your dataset.
Let’s take a look at its signature:
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None…
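Two of the usual speed knobs in that signature are `num_workers` (overlap data loading with training) and `pin_memory` (faster host-to-GPU transfers). A minimal sketch with a toy tensor dataset (sizes and values are made up; `num_workers` is kept at 0 so the snippet runs anywhere, but in real training you would raise it):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 100 samples of 8 features with binary labels
dataset = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))

# in real training: num_workers > 0 loads batches in background processes,
# and pin_memory=True speeds up .cuda() copies; both left off here so the
# sketch runs in any environment
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=0, pin_memory=False)

for features, labels in loader:
    pass  # each batch is (32, 8), except possibly the last one
```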
One would hope that NLP libraries, at least, would keep their usage instructions up to date. While work on AllenNLP is ongoing (its progenitor, the Allen Institute, is known for the Longformer, among other things), it would do the project good to update the documentation as well.
Information gain is what drives decision trees behind the scenes. Every decision tree uses information gain as the deciding factor when branching on an attribute. Without information gain, all decision trees would end up very deep but USELESS!
That begs the question: what is information gain? But before we even jump into that, there is a prerequisite for understanding information gain, i.e. entropy. Simply defined, entropy is the measure of randomness inside a system. To read further on entropy, please read this.
Now that you have the background on entropy, let’s dive deep…
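Both quantities fit in a few lines. Entropy is H = -Σ p·log₂(p) over the class proportions, and information gain is the parent's entropy minus the weighted average entropy of the children after a split. A small sketch (function names and the toy labels are mine):

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels: -sum(p * log2(p))."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * log2(k / n) for k in counts.values())

def information_gain(parent, splits):
    """Parent entropy minus the size-weighted average entropy of the child splits."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

# a perfectly balanced node carries 1 bit of uncertainty
parent = ["yes"] * 4 + ["no"] * 4
print(entropy(parent))  # 1.0

# a split that separates the classes perfectly recovers that full bit
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))  # 1.0
```

A decision tree simply picks, at each node, the attribute whose split yields the highest information gain.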
Machine learning is the bag of techniques you can use to teach a machine to do some task, such as telling whether an image shows a dog or a cat (classification) or predicting tomorrow’s temperature (regression). Deep learning is machine learning that sticks specifically to brain-like structures, i.e. neural networks. At the core of it all lies our inability to analyze the big data we generate on a daily basis; in short, we need the machines to do our work. We are good at finding patterns in things; we just want the same from machines.
There are four types…
Pandas is a library that we all stumble upon at some point in our day as data scientists. A quick refresher definitely feels like sipping a Virgin Mojito on a hot summer day, or a cuppa joe when it snows. Let’s get started.
import pandas as pd
Who came up with pd, nobody knows. But this is the face of pandas you see in every codebase.
Dataframes are the instruments with which you can frame your data as you like, to derive value from it in the form of models or of the analyses and visualizations you build.
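The idea above in its smallest form: build a frame from a plain dict, derive a column, and filter on it (the names and numbers here are invented for illustration):

```python
import pandas as pd

# a dict's keys become columns, its lists become the rows
df = pd.DataFrame({"name": ["Ada", "Grace"], "score": [91, 88]})

# derive a new column from an existing one...
df["passed"] = df["score"] >= 90

# ...and frame the data down to just the rows you care about
high = df[df["passed"]]
```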
I am a Data Scientist. I like to write about concepts related to deep learning.