William Spoth

Computer Scientist with a passion for learning, building, and everything Raspberry Pi

About Me

I am currently a PhD student at the University at Buffalo, where my primary research area is databases. My focus is schema resolution and discovery for semi-structured data, as well as data fusion. This allows me to explore my second hobby, machine learning, and to research various niche algorithms and their applications. When I'm away from my computer, I enjoy cooking, exercising, and watching basketball.

Education

Doctor of Philosophy - Computer Science

University at Buffalo Expected May 2021

Master of Science - Computer Science

University at Buffalo December 2018

GPA: 3.95/4.0
Coursework: Applied Cryptography, Computer Security, Differential Privacy, Languages and Runtime for Big Data, Data-Oriented Computing, Algorithms for Modern Computing Systems, Software Engineering Concepts, Pattern Recognition, Computer Architecture, Data Intensive Computing.

Bachelor of Science - Computer Science

University at Buffalo May 2016

GPA: 3.64/4.0
Coursework: Database Concepts, Machine Learning, Operating Systems, Real-Time Embedded Systems, Theory of Computation, Programming Languages, Algorithms, Data Structures.

Bachelor of Arts - Psychology

University at Buffalo May 2016

GPA: 3.64/4.0

Awards/Accomplishments

Teaching Assistant of the Year (2018): Helped project teams implement and understand core database concepts, and created an autograder that assesses correctness and performance and reports meaningful feedback.
Undergraduate Research Assistant (2013-2016): Provided research ideas and helped with testing.
Magna Cum Laude
Eagle Scout Scholarship Award

Projects

Schema Drill

Querying JSON data is especially difficult due to its lack of a global schema, multiple record versions, heterogeneous records, and optional fields. JSON data pipelines often require extensive cleaning and manipulation that resembles Python more than it does SQL, which shouldn't be the case. Schema Drill takes in a set of JSON records and, instead of outputting a single schema that poorly fits the dataset, outputs a small number of schemas that better match the relationships in the data and partitions dissimilar records. These tighter schemas avoid many of the "IF JSON contains PATH" expressions that domain experts often need during preprocessing.
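As a rough illustration of the partitioning idea only, and not Schema Drill's actual algorithm, the sketch below groups records that share exactly the same set of field paths. The helper names key_paths and partition_by_shape and the sample records are hypothetical; a real system would also need to merge similar (rather than only identical) shapes and handle arrays and types.

```python
from collections import defaultdict


def key_paths(value, prefix=""):
    """Return the set of dotted paths to every leaf of a parsed JSON value.

    Simplifying assumption: lists and empty objects are treated as leaves.
    """
    if isinstance(value, dict) and value:
        paths = set()
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= key_paths(child, child_prefix)
        return paths
    return {prefix}


def partition_by_shape(records):
    """Group records by their exact set of field paths (their 'shape')."""
    groups = defaultdict(list)
    for record in records:
        groups[frozenset(key_paths(record))].append(record)
    return dict(groups)


# Hypothetical sample data: two record shapes mixed in one collection.
records = [
    {"id": 1, "user": {"name": "a"}},
    {"id": 2, "user": {"name": "b"}},
    {"id": 3, "tags": ["x"]},
]

groups = partition_by_shape(records)
for shape, members in groups.items():
    print(sorted(shape), len(members))
```

Each resulting group can be described by one tight schema with no optional fields, which is the property that removes the need for per-record path checks.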

GitHub Paper

json-schema-scala

A Scala-native JSON Schema draft-07 parser built with FastParse. This tool shreds JSON Schemas into easy-to-manipulate Scala objects that can be quickly serialized and deserialized. It additionally supports JSON validation and, for a subset of the specification, calculates the number of possible accepted schemas.

GitHub

MESS

Full database deployments are overkill for most users' needs, as exhibited by Microsoft Excel's per-user market dominance. However, simple tasks such as importing, joining, and programmatically manipulating multiple CSV files often require more expertise than the average Excel user has. MESS aims to allow non-experts to easily manipulate, clean, and combine both CSV and JSON data without the learning curve of databases.

Adaptive Databases

Relational databases are some of the most notoriously fickle pieces of user-facing software on the market. Beyond the common gripes of the uninitiated, performing innocuous tasks such as fusing relational and JSON data becomes seemingly impossible, often requiring inline IF statements, sanity/null checks, and a mishmash of type conversions. Adaptive databases attempt to fuse the structured and performant world of relational databases with the unstructured and user-friendly world of NoSQL. By removing strict typing, predicting joins, and handling index maintenance behind the scenes, we bridge both the performance and usability gaps between these two worlds.

Paper