Posts

Volunteering and mentoring

Over the past few months, I've been spending a lot of my time volunteering and giving back to the community. The highlight? Mentoring high school students and having career conversations with local school kids. A huge shoutout to:
• Mountain House High School for hosting Mountain Hacks 2024
• CSBase organizers for putting together a Climate Hackathon

Research review - 'Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam'

Review of the research paper 'Clues in Tweets: Twitter-Guided Discovery and Analysis of SMS Spam' (2022). As is often the case with spam, collecting a dataset is very challenging. This paper proposes a novel way to collect and keep updating a spam dataset: leverage social media chatter to track SMS spam trends. The authors observed that some Twitter users voluntarily post the spam messages they receive in order to warn others. They built an automated pipeline called SpamHunter that compiles such tweets and extracts the spam message content. Their dataset is published at https://sites.google.com/corp/view/twitterspamsms . With this dataset, the authors try to categorize the types of spam. This section was a good read. The authors also examine whether the spammy URLs in their dataset could be used to stop spam in real time, which seems possible. I found their evaluation study on anti-spam services faulty. The evaluation framework and test set ...

Research review - 'Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp'

Review of the research paper 'Jettisoning Junk Messaging in the Era of End-to-End Encryption: A Case Study of WhatsApp'

Overview

This paper presents a case study of spam identification in WhatsApp. WhatsApp is a particularly interesting case because it offers end-to-end encryption, which means the message content is not accessible to WhatsApp servers. At a high level, the paper describes the spam data the authors collected, makes some observations on the data, then goes on to propose techniques for spam identification.

Data

I was curious to see how the authors obtained data for the study. The paper claims that the data was obtained from 'public' WhatsApp groups. It defines 'public' WhatsApp groups as openly accessible groups, frequently publicised on well-known websites, and typically themed around particular topics such as politics, football, or music. The WhatsApp FAQs do not explain it using the exact terminology, but I found the steps to create one described here h...