We all love PyTorch for many reasons (e.g., the ease of implementing our ideas). But sometimes you run into an error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.90 GiB total capacity; 15.17 GiB already allocated; 15.88 MiB free; 15.18 GiB reserved in total by PyTorch)
And then the pain of trying to resolve it begins. The first thing we do is reduce our
batch_size. And yes, that should work, yet you get the same error again. Why?
Because the last time you ran the code, the model stayed loaded in GPU memory…
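A minimal sketch of the usual cleanup recipe, assuming PyTorch is installed (the import is guarded so the sketch runs even without it; `model` and `optimizer` stand for whatever objects your failed run created):

```python
import gc

def release_cuda_cache():
    """Run Python's GC, then ask PyTorch to release its cached CUDA blocks.

    Call this after dropping your own references, e.g. `del model, optimizer`,
    in the scope where the failed run created them.
    """
    gc.collect()
    try:
        import torch  # guarded so the sketch works even without PyTorch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached blocks to the driver
    except ImportError:
        pass
```

Note that `torch.cuda.empty_cache()` only frees memory PyTorch has cached but is no longer using; tensors still referenced somewhere in your program stay allocated, which is why the `del` step matters.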
Reddit is a goldmine for free-form text. Recently on Kaggle, a trending dataset has been Reddit Vaccine Myths. It includes the titles of the various myths propagated in this subreddit along with the number of comments on them. A worthy dataset to run some analysis on!
You can follow step-by-step by using this Kaggle notebook I created.
First, let’s read our data.
import pandas as pd
data = pd.read_csv('../input/reddit-vaccine-myths/reddit_vm.csv')
Now let’s explore the various columns it has so we can run an analysis on it.
Index(['title', 'score', 'id', 'url', 'comms_num', 'created', 'body', 'timestamp'], dtype='object')
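As a first pass over those columns, we can ask which myths drew the most discussion. Here is a small sketch using a toy frame with a few of the same columns (the rows are hypothetical stand-ins for reddit_vm.csv):

```python
import pandas as pd

# Hypothetical rows standing in for reddit_vm.csv (same column names)
data = pd.DataFrame({
    "title": ["Myth A", "Myth B", "Myth C"],
    "score": [10, 42, 7],
    "comms_num": [3, 25, 1],
})

# Which titles drew the most comments?
top = data.sort_values("comms_num", ascending=False)
print(top[["title", "comms_num"]].head())
```

The same `sort_values` call works unchanged on the real dataset once it is loaded with `read_csv`.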
We all love and use Pandas. It is our daily companion while we, data explorers, journey deep into the mysteries of the data. Usually, we work with a huge amount of data and would love it if our apply functions ran faster.
Well, wait no more: I present to you Pandarallel.
We’ll see how we can go from this:
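Roughly, the transition looks like this. The single-core version below runs as-is; the Pandarallel version (a third-party package, `pip install pandarallel`) is shown in comments because it needs the extra install, and the change amounts to one initialize call plus swapping apply for parallel_apply:

```python
import pandas as pd

df = pd.DataFrame({"text": ["hello world", "free form text"] * 3})

def n_words(s):
    """Toy per-row function standing in for your expensive transformation."""
    return len(s.split())

# The plain, single-core way:
df["words"] = df["text"].apply(n_words)

# With Pandarallel, the same work is spread across CPU cores:
# from pandarallel import pandarallel
# pandarallel.initialize()
# df["words"] = df["text"].parallel_apply(n_words)
```

For a trivial function like this the parallel overhead isn't worth it; the speedup shows up when the per-row work is heavy and the frame is large.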
PyTorch is highly appreciated by researchers for its flexibility and has found its way into mainstream industries that want to stay abreast of the latest groundbreaking research.
In short, if you are a deep learning practitioner, you are going to be face to face with PyTorch sooner or later.
Today, I am going to cover some tricks that will greatly reduce the training time for your PyTorch models.
To load data for our models, we use
torch.utils.data.DataLoader, which creates a Python iterable over your dataset.
Let’s take a look at its signature:
DataLoader(dataset, batch_size=1, shuffle=False, sampler=None…
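To see what "a Python iterable over your dataset" means, here is a pure-Python sketch of the default batching behaviour — this is an illustration of the idea, not PyTorch's actual implementation:

```python
import random

def simple_loader(dataset, batch_size=1, shuffle=False, seed=0):
    """Mimic DataLoader's core loop: (optionally) shuffle indices, yield batches."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

batches = list(simple_loader(list(range(10)), batch_size=4))
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The real DataLoader adds the knobs that matter for training speed, notably num_workers (load batches in parallel worker processes) and pin_memory (page-locked host memory for faster host-to-GPU copies).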
One would hope that NLP libraries, at least, would keep their usage instructions up to date. While work on AllenNLP continues regularly (its progenitor, the Allen Institute, is known for Longformer), it would do well to update its documentation too.
Information gain is what runs decision trees in the background. Decision trees use information gain as the deciding factor when branching on an attribute. Without information gain, decision trees would end up very deep but USELESS!
That begs the question: what is information gain? But before we even jump into that, there is a prerequisite for understanding information gain, namely entropy. Simply defined, entropy is the measure of randomness inside a system. To read further on entropy, please read this.
Now that you have the background on entropy, let’s dive deep…
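A tiny worked example makes both ideas concrete. Entropy is H = -Σ p·log₂(p) over the class frequencies at a node, and information gain is the parent's entropy minus the weighted entropy of the children after a split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) over class frequencies."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return sum(-p * log2(p) for p in probs)

# A 50/50 node is maximally random, a pure node has zero entropy:
print(entropy(["yes", "no", "yes", "no"]))  # 1.0
print(entropy(["yes", "yes", "yes"]))       # 0.0

# Information gain = parent entropy - weighted child entropy after a split.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]
gain = entropy(parent) - (len(left) / 4 * entropy(left)
                          + len(right) / 4 * entropy(right))
print(gain)  # 1.0 for this perfect split
```

A split that separates the classes completely, as above, recovers the full 1 bit of parent entropy; a split that leaves both children 50/50 would have a gain of 0, and the tree would gain nothing by branching there.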
Machine learning is the bag of techniques you can use to teach a machine to do some task, like telling whether an image shows a dog or a cat (classification) or predicting tomorrow’s temperature (regression). Deep learning is machine learning that sticks specifically to brain-like structures, i.e. neural networks. At the core of it lies our inability to analyze the big data we generate on a daily basis. In short, we need the machines to do our work. We are good at finding patterns in things; we just want the same from machines.
There are four types…
Pandas is a library that we all stumble upon at some point in our day as data scientists. A quick refresher on it definitely feels like sipping a Virgin Mojito on a hot summer day, or a cuppa joe when it snows. Let’s get started.
import pandas as pd
Nobody knows who came up with pd, but this is the face of pandas you see in all our code.
Dataframes are the instruments with which you can frame your data as you like, to derive value from it in the form of models or analyses and visualizations.
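Here is the simplest way to get a DataFrame in your hands — building one from a dict of columns, then filtering it (the names and scores are hypothetical toy data):

```python
import pandas as pd

# A DataFrame frames rows and named columns into one queryable table
# (hypothetical toy data).
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Linus"],
    "score": [95, 88, 72],
})

# Boolean indexing: keep only the rows where score >= 80
passed = df[df["score"] >= 80]
print(passed["name"].tolist())  # ['Ada', 'Grace']
```

The same pattern — construct, filter, derive — is the backbone of most day-to-day pandas work.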
We ourselves feel that what we are doing is just a drop in the ocean. But the ocean would be less because of that missing drop.
— Mother Teresa
The hidden saints amongst us work with meager means and are mostly overworked. Still, sporting a smile, they go about helping anybody and everybody. Such is the magnanimity, and the strength of heartfelt care for the woes of this world, in the people I have recently come to work with.
I recently had the chance to work on an Omdena project with International Social Service (ISS). The goal was to…
To data scientists, Python is as essential as air. We walk, talk, and think Python all day. But Python is a vast ocean, and we are bound to get lost sometimes. This article aims to be the lighthouse that refreshes your memory of where you are in that ocean. Here’s a Python refresher for all the data scientists out there.
I am a Data Scientist. I like to write about concepts related to deep learning.