
$ whoami

Hai. I'm Aalok, an undergraduate student at the University of Richmond ('21), originally from Pune, India. This is my homepage on the web.

$ whoami | more

I am interested in computer science, math, linguistics, and cognitive science, both individually and, increasingly, at the intersection of all of these. As a related parent field, I have also begun exploring 'complex systems', i.e., looking at questions such as how complex behavior emerges in systems made of relatively simple units. Beyond these topics, I also find philosophy, music, and journalism interesting, and occasionally take a course or two in them.

In my free time I find myself occupied with Hindustani classical music, football (soccer), bicycling, hiking, table tennis, clicking pictures, typesetting documents in LaTeX, and writing FOSS.

Some of the current and past projects and research I've been involved with:
NLP and AI group
May 2020—present
Microsoft Research
  • Code-mixed and Multilingual NLI

Coloring Graphs group
Apr 2019—present
University of Richmond
  • libcolgraph
    A speedy and memory-efficient C++-based graph library, wrapped to provide a Python interface as well as a web frontend, to construct, analyze, and visualize graphs of vertex colorings or 'coloring graphs' for spectral graph theory research. New OEIS sequences were added with the help of this library: A307334, A309315, A309379, A309380. Presented at the Shenandoah Undergraduate Mathematics and Statistics conference (SUMS) 2019.
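To make the term 'coloring graph' concrete, here is a minimal pure-Python sketch. It is not libcolgraph's API: the function names `proper_colorings` and `coloring_graph` are hypothetical, and the brute-force enumeration is only practical for tiny base graphs. It assumes the usual definition, in which the vertices of the k-coloring graph are the proper k-colorings of a base graph, and two colorings are adjacent when they differ on exactly one vertex.

```python
from itertools import product


def proper_colorings(edges, n, k):
    """Return all proper k-colorings of a base graph on vertices 0..n-1.

    A coloring is a tuple c where c[v] is the color of vertex v; it is
    proper when the two endpoints of every edge get different colors.
    """
    return [c for c in product(range(k), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]


def coloring_graph(edges, n, k):
    """Build the k-coloring graph as a dict mapping coloring -> set of neighbors.

    Two proper colorings are adjacent when they differ on exactly one vertex.
    """
    colorings = proper_colorings(edges, n, k)
    adjacency = {c: set() for c in colorings}
    for i, a in enumerate(colorings):
        for b in colorings[i + 1:]:
            if sum(x != y for x, y in zip(a, b)) == 1:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


# Hypothetical example: the path 0 - 1 - 2 colored with 3 colors.
path_edges = [(0, 1), (1, 2)]
cg = coloring_graph(path_edges, n=3, k=3)
print(len(cg))  # 12 proper colorings: 3 choices for the middle, 2 for each end
```

The pairwise comparison is quadratic in the number of colorings, which itself grows exponentially with the size of the base graph, so a sketch like this mainly illustrates why a speedy and memory-efficient implementation matters.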
Argument Mining group
Dec 2018—present
University of Richmond
  • WikiFactCheck-en; Automated Fact-Checking of Claims from Wikipedia.
    We create a first-of-its-kind fact-checking and natural language inference dataset using claims, context, and cited evidence from Wikipedia. We demonstrate that existing popular methods and models, such as those trained on the SNLI (Stanford NLI) dataset, do not perform well on the WikiNLI dataset, highlighting the need for further exploration of methods of training NLI models. Full paper at LREC 2020, Marseilles, France, May 2020.
'Beyond Categories' Cognitive Neuroscience lab
Sep 2017—present
University of Richmond, Claremont McKenna College, Hampshire College
  • PURSUE Other Race Effect (ORE)
    I've been involved with the PURSUE project ever since I joined this lab. I help out by maintaining and debugging existing code, writing code for new experiments, running experiments on human participants (including capping and gelling them), and analyzing behavioral and physiological data. This work was part of a poster at the 2018 Annual APS Convention, San Francisco, CA, that presented partial results on the other-race effect: a claimed effect by which we are more familiar with faces of our own race and may have biases towards faces of other races, as demonstrated through behavioral data.
  • PURSUE Visual Perception of Words
    Primary student researcher on this project. This project deals with understanding how we visually process words. Whereas existing studies have relied solely on the lexical decision task, this study attempts to collect behavioral as well as ERP data for phonological decision and semantic decision one-back tasks.
Distant Viewing lab
Aug 2018—Jul 2019
University of Richmond
  • Automatic Laughter Identification
    An extension of the existing Distant Viewing toolkit (dvt) to develop machinery for analyzing the audio space of digital media. Development of a handy library for rapid and modular model training to identify specific audio events based on annotated data, along with various algorithms for smoothing over raw predictions to get reliable predictions for digital humanities and cultural analytics. As an example use case, we demonstrate identifying laughter in the sitcom Friends using methods such as CNN-based deep learning, transfer learning using Google's VGGish, and logistic regression. We see a peak accuracy of about 90% using a transfer learning model.
  • pyMediaAnnotator
    A Python and VLC-based GTK application to hand-annotate media with events precisely to the millisecond for machine learning tasks. License: GPL 3+.
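As an illustration of what 'smoothing over raw predictions' in the laughter-identification item can look like, here is a minimal sketch in plain Python. It is not the toolkit's actual algorithm: the function name `smooth_predictions`, the centered moving-average window, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def smooth_predictions(probs, window=5, threshold=0.5):
    """Smooth per-frame event probabilities, then threshold them.

    Each frame's probability is replaced by the mean over a centered
    window (truncated at the edges of the sequence), and the frame is
    labeled True when that mean reaches the threshold. This suppresses
    isolated spikes and fills short gaps in the raw predictions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    labels = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        mean = sum(probs[lo:hi]) / (hi - lo)
        labels.append(mean >= threshold)
    return labels


# Hypothetical raw per-frame laughter probabilities from a classifier.
raw = [0.9, 0.9, 0.1, 0.9, 0.9]
print(smooth_predictions(raw, window=3))
```

A wider window gives steadier labels at the cost of blurring the exact start and end of an event, so the window size would need tuning against annotated data.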
Chicago Faces
May 2018—Jul 2018
  • Generating and recognizing faces
    Playing around with the Chicago Face Database to train neural networks to identify various attributes of faces, such as gender, race, and emotion. Peak accuracy was .96 for gender, .89 for race, and .76 for emotion. GANs were trained on the faces to reconstruct faces selectively for certain attributes. This helps to visualize hotspots for features that correlate with those attributes. For instance, one of the most telling features for gender was hairstyle. The longer the GAN is trained, the more clearly it reveals such features.
  • cfd-reader
    A tiny Python package to help sort through the Chicago Face Database given downloaded data, and to efficiently supply it as numpy arrays to your machine learning setup.
Sanskrit orthography
Nov 2017—May 2018
  • SanskritIPA
    A rule-based system for transcribing Sanskrit from Devanagari (the script in which Sanskrit is usually written) to IPA (the International Phonetic Alphabet). This new method is largely lossless, unlike many other prevailing methods and applications on the internet. It preserves the phonological and prosodic features that depend on a sound's neighborhood. The algorithm uses an adaptation of the existing 'WWG' algorithm originally developed to syllabify Sinhalese. A FOSS Python program (GPL v3+) accompanies the system as a demonstrative implementation. Presented at LREC-W26-CCURL 2018, Miyazaki, Japan.
Authorship detection
May 2016—Aug 2016
National Chemical Laboratory, Pune
  • Authorship detection using Markov model with varying context
    In this project I explored authorship detection using word N-grams, and looked at how changing the value of N (i.e., the context used to generate transition probabilities in the model for each word) affected the classification accuracy. Additionally, I modified a 'fall-back' to improve out-of-vocabulary prediction behavior in case a word was not present in the model based on the training data. In general, both too little and too much context are harmful: too little leaves the model underspecified, whereas too much makes it overfit. The optimal amount of context depends on the application domain.
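The idea of a word N-gram model with a fall-back can be sketched as follows. This is a minimal illustration and not the code from the project: the class `BackoffNgramModel`, the function `classify`, the simple back-off to shorter contexts, and the add-one smoothing at the unigram level are all assumptions. The scores are unnormalized log-scores, not a true probability distribution.

```python
import math
from collections import Counter


class BackoffNgramModel:
    """Word n-gram model for one author, backing off to shorter contexts."""

    def __init__(self, n):
        self.n = n
        self.ngrams = Counter()    # counts of word tuples of length 1..n
        self.contexts = Counter()  # counts of contexts that precede a word
        self.vocab = set()
        self.total = 0

    def train(self, words):
        self.vocab.update(words)
        self.total += len(words)
        for order in range(1, self.n + 1):
            for i in range(len(words) - order + 1):
                gram = tuple(words[i:i + order])
                self.ngrams[gram] += 1
                if order > 1:
                    self.contexts[gram[:-1]] += 1

    def log_score(self, context, word):
        context = tuple(context)
        # Try the longest available context first, then fall back.
        for size in range(min(self.n - 1, len(context)), 0, -1):
            ctx = context[len(context) - size:]
            gram = ctx + (word,)
            if self.ngrams[gram] > 0:
                return math.log(self.ngrams[gram] / self.contexts[ctx])
        # Unigram fall-back with add-one smoothing, so that words never
        # seen in training still get a finite score.
        seen = self.ngrams[(word,)]
        return math.log((seen + 1) / (self.total + len(self.vocab) + 1))

    def score(self, words):
        return sum(self.log_score(words[max(0, i - self.n + 1):i], w)
                   for i, w in enumerate(words))


def classify(models, words):
    """Return the author whose model gives the text the highest score."""
    return max(models, key=lambda author: models[author].score(words))


# Hypothetical toy corpora for two authors.
models = {"A": BackoffNgramModel(2), "B": BackoffNgramModel(2)}
models["A"].train("the cat sat on the mat the cat sat".split())
models["B"].train("a dog ran in a park a dog ran".split())
print(classify(models, "the cat sat".split()))
```

Changing the `n` passed to the constructor changes how much context each prediction uses, which is the quantity whose effect on accuracy the project examined.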

Pet projects and misc.

A list of the courses I've taken, along with some thoughts or specialities about them:

Spring 2020 (University of Edinburgh) (current)

INFR 11157 Natural Language Understanding, Generation, and Machine Translation
in prog.
INFR 10059 Theoretical Computer Science
in prog.
LASC 10018 Simulating Language Evolution
in prog.
INFR 11022 Distributed Systems
in prog.
LASC Historical Linguistics (audit)
in prog.

Fall 2019

CS 326 Simulation (Hons.)
coming soon
Math 320 Real Analysis
coming soon
Math 395 Number Theory
coming soon
Psyc 333 Cognitive Science
coming soon
Ling 340 Prosody and Syntax in Vedic Sanskrit
coming soon
Music Ensemble 196 Schola Cantorum (choir)
coming soon

Spring 2019

CS 395 Advanced Algorithms (Hons.)
A course with a mixture of topics from typical graduate-level courses on randomized algorithms, approximation algorithms, and extended complexity theory
CS 301 Computer Organization
Typical course in computer architecture and design of processors
Phil 272 Modern Western Philosophy
An introduction to the Western philosophical tradition exploring a variety of modern topics including the philosophy of mind, the philosophy of science, free will, etc., from Aristotle to Hume, Descartes, Elisabeth, Bacon, Kant and beyond
Music Ensemble 196 Schola Cantorum (choir)
I was a second bass singer in a mixed auditioned choir of about 32-40 students. We toured and sang a repertoire in Slovenia, Croatia, and Italy, and performed the world premiere of Reena Esmail's composition 'she will transform you'.
CS 340 Speech and Audio Processing Research (independent)
I worked with Dr Taylor Arnold on the Distant Viewing project to research and develop a toolkit for audio event detection
CS 340 Fact Checking and Argument Mining Research (independent)
I worked with Dr Jon Park on the WikiFactCheck-en project. Published in LREC 2020, Marseilles, France.

Fall 2018

CS 315 Algorithms
A thorough introduction to basic algorithms with a taste of complexity theory and intractability
Math 245 Linear Algebra
A typical linear algebra course
Psyc 200 Methods and Analyses
Statistical methods for the social sciences
Classics 105 Introduction to Syntax
Syntax in natural languages; syntax trees; X-bar theory
CS 340 Machine Learning for NLP (independent)
Working through part 2 of Jurafsky and Martin (3rd ed.); a first introduction to Neural Networks

Spring 2018

CS 222 Discrete Structures
Proofwriting + instructor-chosen topics such as RSA and graph theory
Math 300 Fundamentals of Abstract Math
A first course in mathematical proofwriting
FYS 100 'Groupology' Group Dynamics
Group dynamics and social psychology. We studied how groups of people behave as a system
Ling 350 Introductory Linguistics
An introductory course in linguistics giving an overview of the salient fields, culminating with a research paper
CS 240 Software Systems Development
A thorough introduction to software development through C++, UNIX, testing, and using tools such as git, gdb, valgrind
CS 340 Introductory NLP (independent)
Worked through the foundational chapters from Jurafsky and Martin (3rd ed.)

Fall 2017

CS 221 Data Structures
This course is generally called CS2, since it comes after the introductory CS class, 'CS1'. The course was a good overview of data structures and algorithms. It helped me pay attention to the details of implementation, and really think about the costs of various operations involving data structures when designing and coding algorithms for specific tasks. It introduced me to big-oh much more rigorously, rather than my self-taught general idea, and taught me to think about ease of implementation and performance trade-offs.
Math 235 Multivariate Calculus
A good, basic extension of high school calculus (calc1 and calc2). The course made me more confident of dealing with multiple variables and dealing with integrals and derivatives over them.
Psyc 100 Introduction to Psychological Science
This was my largest course at UR by class size, by far. It was a good general-purpose tour of the scope of modern psychology and its recent developments. The two co-instructors did a great job of covering most important areas. Naturally, they focused on their own specialities more than other arbitrary areas.
FYS 100 Civic Journalism and Social Justice
I took this course as part of a first-year general education requirement, but ended up quite enjoying it. It got me into the habit of keeping up to date with the happenings in the world, and taught me the basics of news writing, which now jump out at me when I read news articles. The course had an interesting selection of readings, including books about the social issues of America with far-reaching historical roots.
MSAP 160 Voice
My first ever Western voice class; or any Western music class at all. Switching notations from Hindustani to Western was a challenge, but it was rewarding to get introduced to a whole new kind of classical music. I wonder what other kinds of classical music traditions there are, if any.

ACM Chapter

I'm involved with the ACM chapter at UR. We conduct events such as the annual hackathon SpiderHacks, tech talks, alumni-in-tech visits, and workshops for beginners.

Competitive programming

I enjoy coding solutions to competitive programming puzzles and problems. I was on a team with two others to compete at the ACM-ICPC 2018 regionals; we finished first at the testing site (Christopher Newport University), and 20th overall in the region (Mid-Atlantic USA). I highly recommend taking a shot at the ICPC because it is really fun and will keep your algorithms skills on their toes (and that helps almost everywhere).

Linguistics olympiad

After having participated in the Panini Linguistics Olympiad (PLO) and the International Linguistics Olympiad (IOL), I have been contributing to the PLO as a jury and problem design committee member, as well as to the new Asia-Pacific Linguistics Olympiad (APLO), also as a jury and problem design committee member. Being involved with PLO also means I help out with mentoring the selected team for the IOL.

Board game nights

The UR Math/CS department holds weekly board game nights on Tuesdays at 5:30pm, at Jepson 212C. We play strategy and logic-based board games rather than overly randomness-based board games such as Monopoly. Here are some of my personal favorites:


I like to bike around. Fortunately, both Pune and Richmond are relatively bike-friendly towns (although Pune is getting worse by this metric). You can find some of my favorite/frequented routes on Strava.

Football (Soccer)

A friend and I started a student organization called UR Pickup Soccer, which helps organize weekly pick-up soccer games every Friday evening, coordinate when people are available to play, make field reservations, and procure equipment. It's going slowly amidst all the other work and academic commitments, but it's something and it's helping.


A blog with some friends is accessible here.

All images are (C) all rights reserved.

Get in touch!


If you prefer to directly email me, please use


Alternatively, you can securely message me on Keybase chat. You can simply message me there; you do not need an account! You can also use Keybase to encrypt your message to me and then use the form above to email it to me if you prefer.


Please address it to

Aalok Sathe

UR 2171, 410 Westhampton Way

University of Richmond, VA 23173, USA

Resume, CV

Please request using one of the above-mentioned methods of reaching me, or append '/cv' to the URL above ('') to display it in a browser page. Alternatively, please visit the GitLab repository 'cv' under my username for a compiled PDF.

Want a webpage like this for yourself?

Clicking on the hexagonal 'source' icon at the bottom of this page will lead you to a repository called simple-personal-website, which is a stripped-down, bare-bones version of this webpage. Feel free to fork it and build your own custom site with little effort; the code for rendering logic and deployment is already built. Thanks to @shardulc for the inspiration behind this design.