Hai. I'm Aalok, an undergraduate student at the University of Richmond ('21), originally from Pune, India. This is my homepage on the web.
$ whoami | more
I am interested in computer science, math, linguistics, and cognitive science, both individually and, increasingly, at their intersection. As a related parent field, I have also begun exploring 'complex systems', i.e., questions such as how complex behavior emerges in systems made of relatively simple units. Beyond these topics, I also find philosophy, music, and journalism interesting, and occasionally take a course or two in them.
In my free time I find myself occupied with Hindustani classical music, football (soccer), bicycling, hiking, table tennis, clicking pictures, typesetting documents in LaTeX, and writing FOSS.
Code-mixed and Multilingual NLI
libcolgraph
A speedy and memory-efficient C++-based graph library, wrapped to provide a Python interface as well as a web frontend, for constructing, analyzing, and visualizing graphs of vertex colorings ('coloring graphs') for spectral graph theory research. New OEIS sequences were added with the help of this library: A307334, A309315, A309379, A309380. Presented at the Shenandoah Undergraduate Mathematics and Statistics conference (SUMS) 2019.
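For readers unfamiliar with coloring graphs, here is a minimal pure-Python sketch of the idea. It is not libcolgraph's API: the function name `coloring_graph` and its arguments are made up for illustration. It assumes the usual definition, in which each vertex of the coloring graph is a proper k-coloring of a base graph, and two colorings are adjacent when they differ at exactly one base vertex.

```python
from itertools import product


def coloring_graph(n_vertices, edges, k):
    """Build the k-coloring graph of a base graph by brute force.

    Hypothetical helper for illustration only (not part of libcolgraph).
    Returns the list of proper colorings and an adjacency dict over their
    indices. Brute force is exponential, so keep the base graph tiny.
    """
    # Keep only proper colorings: endpoints of every edge get different colors.
    colorings = [
        c for c in product(range(k), repeat=n_vertices)
        if all(c[u] != c[v] for u, v in edges)
    ]
    index = {c: i for i, c in enumerate(colorings)}
    adjacency = {i: set() for i in range(len(colorings))}
    for c, i in index.items():
        for v in range(n_vertices):
            for color in range(k):
                if color == c[v]:
                    continue
                # Recolor a single vertex; keep the result only if it is proper.
                neighbor = c[:v] + (color,) + c[v + 1:]
                j = index.get(neighbor)
                if j is not None:
                    adjacency[i].add(j)
    return colorings, adjacency


# Base graph: a single edge between vertices 0 and 1, with 3 colors.
colorings, adjacency = coloring_graph(2, [(0, 1)], 3)
n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
print(len(colorings), n_edges)
```

For a single edge with three colors this gives six colorings joined in a 6-cycle, whereas a triangle with three colors gives six isolated colorings, since no single vertex can be recolored.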
WikiFactCheck-en: Automated Fact-Checking of Claims from Wikipedia
We create a first-of-its-kind fact-checking and natural language inference dataset using claims, context, and cited evidence from Wikipedia. We demonstrate that existing popular methods and models, such as those trained on the SNLI (Stanford NLI) dataset, do not perform well on the WikiNLI dataset, highlighting the need for further exploration of methods of training NLI models. Full paper at LREC 2020, Marseilles, France, May 2020.
PURSUE Other Race Effect (ORE)
I've been involved with the PURSUE project ever since I joined this lab. I help out by maintaining and debugging existing code, writing code for new experiments, running experiments on human participants (including capping and gelling them), and analyzing behavioral and physiological data. This work was part of a poster at the 2018 Annual APS Convention, San Francisco, CA, that presented partial results on the other race effect: a claimed effect, demonstrable through behavioral data, by which we are more familiar with faces of our own race and may have biases towards faces of other races.
PURSUE Visual Perception of Words
Primary student researcher on this project, which deals with understanding how we visually process words. Whereas existing studies have relied solely on the lexical decision task, this study attempts to collect behavioral as well as ERP data for phonological decision and semantic decision one-back tasks.
Automatic Laughter Identification
An extension of the existing Distant Viewing toolkit (dvt) that adds machinery for analyzing the audio space of digital media. It includes a handy library for rapid and modular training of models that identify specific audio events from annotated data, and various algorithms for smoothing over raw predictions to get reliable predictions for digital humanities and cultural analytics. As an example use case, we demonstrate identifying laughter in the sitcom Friends using methods such as CNN-based deep learning, transfer learning using Google's VGGish, and logistic regression. We see a peak accuracy of about 90% using a transfer learning model.
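To give a flavor of what smoothing over raw predictions can mean, here is a small sketch of one simple option: a sliding-window majority vote over binary per-frame labels. This is an illustrative assumption, not necessarily one of the algorithms in the project, and `smooth_predictions` is a made-up name.

```python
def smooth_predictions(labels, window=5):
    """Majority-vote smoothing of a binary (0/1) label sequence.

    Illustrative sketch only. Each output label is the majority of the raw
    labels in a centered window (truncated at the sequence edges); ties
    resolve to 0.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(segment) > len(segment) else 0)
    return smoothed


# Raw per-frame predictions with an isolated false positive and a brief dropout.
raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
print(smooth_predictions(raw, window=3))
# prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

The isolated positive at index 2 is removed and the one-frame gap at index 7 is filled, leaving a single contiguous event.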
pyMediaAnnotator
A Python and VLC-based GTK application to hand-annotate media with events precisely to the millisecond for machine learning tasks. License: GPL 3+.
Generating and recognizing faces
Playing around with the Chicago Face Database to train neural networks to identify various attributes of faces, such as gender, race, and emotion. Peak accuracy was .96 for gender, .89 for race, and .76 for emotion. GANs were trained on the faces to reconstruct faces selectively for certain attributes. This helps to visualize hotspots for features that correlate with those attributes. For instance, one of the most telling features for gender was hairstyle. The longer the GAN is trained, the more clearly it reveals such features.
cfd-reader
A tiny Python package to help sort through the Chicago Face Database given downloaded data, and to efficiently supply it as numpy arrays to your machine learning setup.
SanskritIPA
A rule-based system for transcribing Sanskrit from Devanagari (the script Sanskrit is written in by default) to IPA (the International Phonetic Alphabet). This new method is more or less lossless, unlike many other prevailing methods and applications on the internet. It preserves the phonological and prosodic features that depend on a sound's neighborhood. The algorithm uses an adaptation of the existing 'WWG' algorithm originally developed to syllabify Sinhalese. A FOSS Python program (GPL v3+) accompanies the system as a demonstrative implementation. Presented at LREC-W26-CCURL 2018, Miyazaki, Japan.
Authorship detection using Markov model with varying context
In this project I explored authorship detection using word N-grams, and looked at how changing the value of N (i.e., the context used to generate transition probabilities in the model for each word) affected the classification accuracy. Additionally, I modified a 'fall-back' to improve out-of-vocabulary prediction behavior in case a word was not present in the model based on the training data. In general, both too little and too much context are harmful: the former leaves the model underspecified, whereas the latter makes it overfit. The optimal amount of context depends on the application domain.
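The general idea can be sketched as below. This is a simplified illustration under my own assumptions, not the project's code: the class `NGramAuthorModel`, the crude fall-back to shorter contexts (with no discounting, so the scores are not properly normalized probabilities), and the floor value for unseen words are all made up for the example.

```python
import math
from collections import Counter, defaultdict


class NGramAuthorModel:
    """Word N-gram model for one author, with a crude fall-back (illustrative)."""

    def __init__(self, n):
        self.n = n
        # counts[k][context][word], where context is the k preceding words.
        self.counts = [defaultdict(Counter) for _ in range(n)]
        self.vocab = set()

    def train(self, tokens):
        self.vocab.update(tokens)
        for i, word in enumerate(tokens):
            for k in range(min(self.n - 1, i) + 1):
                self.counts[k][tuple(tokens[i - k:i])][word] += 1

    def _prob(self, tokens, i):
        word = tokens[i]
        # Try the longest available context first, then fall back to shorter ones.
        for k in range(min(self.n - 1, i), -1, -1):
            seen = self.counts[k].get(tuple(tokens[i - k:i]))
            if seen and word in seen:
                return seen[word] / sum(seen.values())
        # Word never seen in training: assumed small floor instead of zero.
        total = sum(self.counts[0][()].values())
        return 1.0 / (total + len(self.vocab) + 1)

    def log_prob(self, tokens):
        return sum(math.log(self._prob(tokens, i)) for i in range(len(tokens)))


def classify(models, tokens):
    """Return the author whose model scores the tokens highest."""
    return max(models, key=lambda author: models[author].log_prob(tokens))


models = {"A": NGramAuthorModel(2), "B": NGramAuthorModel(2)}
models["A"].train("the cat sat on the mat".split())
models["B"].train("a dog ran in a park".split())
print(classify(models, "the cat sat".split()))  # prints A
```

Changing the argument passed to `NGramAuthorModel` varies N, which is the knob whose effect on accuracy the project examined.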
Pet projects and misc.
Terminal connect 4
A Python-based, highly customizable Connect4 game with a graphical UI within the terminal, constructed using fixed-width spacing and the POSIX terminal color specifications. You can play in 2-player mode, with a board of custom height and width. You can change the winning condition to make it Connect-X, for X-in-a-row. If you're feeling adventurous, you can use the betting mode to bet on your move. Built for SpiderHacks 2019.
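As an illustration of a Connect-X winning condition on a board of arbitrary size, here is a minimal sketch. It is not taken from the game's source: the function `has_winner` and the list-of-lists board representation are assumptions made for the example.

```python
def has_winner(board, player, x=4):
    """Return True if `player` has x pieces in a row on a rectangular board.

    Illustrative sketch: `board` is a list of equal-length rows, and a cell
    holds a player's marker or any other value when empty.
    """
    rows, cols = len(board), len(board[0])
    # Right, down, down-right, and down-left cover all four line orientations.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in directions:
                end_r, end_c = r + (x - 1) * dr, c + (x - 1) * dc
                # Skip lines that would run off the board.
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue
                if all(board[r + i * dr][c + i * dc] == player for i in range(x)):
                    return True
    return False


board = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]
print(has_winner(board, player=1, x=3))  # prints True
print(has_winner(board, player=1, x=4))  # prints False
```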
'epgen' random episode picker
This program uses the TVDB API to pick a random episode from a TV show or any other series of your choosing. It is particularly helpful when you want to rewatch episodes of a series at random, but your own selection isn't so random and you end up rewatching the same episodes over and over again.
Introductory Python workshop using Binder
This workshop was taught at SpiderHacks 2019 and is in the form of an IPython notebook that attendees could readily load into Binder and execute remotely, without the need for any installation.
Spring 2020 (University of Edinburgh) (current)
I'm involved with the ACM chapter at UR. As the ACM chapter, we conduct events such as the annual hackathon SpiderHacks, tech talks, alumni-in-tech visits, and workshops for beginners.
I enjoy coding solutions to competitive programming puzzles and problems. I was on a team with two others to compete at the ACM-ICPC 2018 regionals; we finished first at the testing site (Christopher Newport University), and 20th overall in the region (Mid-Atlantic USA). I highly recommend taking a shot at the ICPC because it is really fun and will keep your algorithms skills on their toes (and that helps almost everywhere).
After having participated in the Panini Linguistics Olympiad (PLO) and the International Linguistics Olympiad (IOL), I have been contributing to the PLO as a jury and problem design committee member, as well as to the new Asia-Pacific Linguistics Olympiad (APLO), also as a jury and problem design committee member. Being involved with PLO also means I help out with mentoring the selected team for the IOL.
Board game nights
The UR Math/CS department holds weekly board game nights on Tuesdays at 5:30pm, at Jepson 212C. We play strategy and logic-based board games rather than overly randomness-based board games such as Monopoly. Here are some of my personal favorites:
- Ricochet Robots
- Liar's Dice
- Resistance: Avalon
- 3D Connect 4
I like to bike around. Fortunately, both Pune and Richmond are relatively bike-friendly towns (although Pune is getting worse on this metric). You can find some of my favorite/frequented routes on Strava.
A friend and I started a student organization called UR Pickup Soccer, which helps organize weekly pick-up soccer games every Friday evening, coordinate when people are available to play, make field reservations, and procure equipment. It's going slowly amid all the other work and academic commitments, but it's something and it's helping.
A blog with some friends is accessible here.
Get in touch!
If you prefer to directly email me, please use email@example.com
Alternatively, you can securely message me on Keybase chat. You can simply message me there; you do not need an account! If you prefer, you can also use Keybase to encrypt your message to me and then use the form above to email it to me.
Please address it to
UR 2171, 410 Westhampton Way
University of Richmond, VA 23173, USA
Please request using one of the above-mentioned methods of reaching me, or append '/cv' to the URL above ('aalok-sathe.gitlab.io') to display it in a browser page. Alternatively, please visit the Gitlab repository 'cv' under my username for a compiled pdf.
Want a webpage like this for yourself?
Clicking on the hexagonal 'source' icon at the bottom of this page will lead
you to a repository called
simple-personal-website, which is a
stripped-down, bare-bones version of this webpage. Feel free to fork off of
it and build your own custom site with little effort; the code for rendering
logic and deployment is already built.
Thanks to @shardulc for the inspiration behind this design.