The Impact of Programming Language’s Type on Probabilistic Machine Learning Models

Publication Year: 2021

Citations: 0
Usage: 334
Captures: 0
Mentions: 0
Social Media: 0


Metrics Details

Usage: 334
- Downloads: 248
- Abstract Views: 86

Thesis / Dissertation Description

Software development is an expensive and difficult process. Mistakes are easily made, and without an extensive review process they can reach production code and have unintended, disastrous consequences.

This is why various automated code review services have arisen in recent years, from AWS’s CodeGuru and Microsoft’s Code Analysis to more integrated code assistants such as IntelliCode and auto-completion tools. All of them are designed to assist developers with their work and help catch overlooked bugs.

Thanks to recent advances in machine learning, these services have grown tremendously in sophistication, to the point where they can catch bugs that often go unnoticed even in traditional code reviews.

This project investigates the use of code2vec [1], a probabilistic machine learning model on source code, in correctly labeling methods from different programming language families. We extend this model to work with more languages, train the resulting models, and compare the performance of static and dynamic languages.

As a by-product, we create new datasets from the top-starred open source GitHub projects in various languages. Different approaches are applied for static and dynamic languages, along with improvement techniques such as transfer learning. Finally, different parsers are used to see their effect on the model’s performance.

Bibliographic Details

DOI: 10.31979/etd.ferw-3a7j

REPOSITORY URL: https://scholarworks.sjsu.edu/etd_projects/1050

URL ID: https://scholarworks.sjsu.edu/etd_projects/1050; http://dx.doi.org/10.31979/etd.ferw-3a7j; https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=2051&context=etd_projects; https://dx.doi.org/10.31979/etd.ferw-3a7j; https://scholarworks.sjsu.edu/etd_projects/1050/

AUTHOR(S)

Sherif Elsaid

PUBLISHER(S)

San Jose State University Library
