Type in a word, select a language (only spanish and english supported), and see what words are most commonly associated with it in Twitter.
When you submit a search, a Python serverless function is triggered that uses the Twitter Search API to collect the ~500 most recent tweets (maximum history available is ~7 days) that meet the criteria and returns a clean list of all the individual words (or tokens) used in the tweets—removing punctuation, common stopwords and the search term provided. Then, I plot a bubble chart with the counts of each word using the nivo data viz library for React—which I hadn’t used before and really liked!
I made it as an exercise to learn more about AWS Lambda serverless functions—taking advantage of the twitter search python code I had written and covered previously in this spanish post. This was my first time making/deploying serverless functions (outside of following along a tutorial), and was very keen to see how I could use them to integrate Python code—which I’m most familiar with and would like to continue using for data science stuff—with frontend development frameworks like React and the JAMstack in general. The process was made pretty smooth thanks to the python-lambda package, and my only struggle was figuring out the CORS issues (short story: handle OPTIONS pre-flight request as part of the function’s logic). You can see the code for the lambda function in this repo.
I then deployed this with the AWS API Gateway, so that I’d had a URL endpoint I could send HTTP requests to run the code, and turned that into a GraphQL node with Apollo server as show in my other previous post (also in spanish) so that I could use the Apollo client to query the data while also “protecting” the API endpoint. I’d like to make a follow up post to this one with the “making off’, so stay tuned!
I think there’s a lot of cool stuff you can do with this approach, and I’m interested to hear your opinions in the comments. I’d like to train a text classifier with spaCy, something like giving users the ability to search for a term and have the model classify the results into buckets like positive-neutral-negative, politics-sports-tech-other, etc. I don’t think it would take much more than what I did here, besides spending a couple hours hand labeling tweets 🤯.
Tell me what you think!