Description

Guide: Dr. Tushar Sharma (Assistant Professor, Faculty of Computer Science, Dalhousie University)

I worked at the SMART Lab, led by Dr. Tushar Sharma, for more than a year. My time there gave me immersive research experience and strengthened my grasp of Machine Learning theory and fundamentals.

I spent this time working on two projects:

  • Machine Learning Techniques Applied to Source Code  
    🔗 (https://www.sciencedirect.com/science/article/pii/S0164121223003291) [published in the Journal of Systems and Software (Open Science)]  
     

    (1) I surveyed approximately 200 papers in the second phase of this study; in total, the paper identifies 479 primary studies across both phases for further analysis. I worked on the categories of Program Comprehension, Program Synthesis, Quality Assessment, Refactoring, Testing, and Vulnerability Analysis.

    (2) Summarized the findings, highlighting the steps of dataset collection and preparation, feature extraction, and model training and tuning.

    (3) Sorted these studies by category and sub-category, as well as by the ML techniques used.  
     
    Some of my key findings and learnings:

    Being the only undergraduate researcher on this project pushed me to sharpen my skills and deliver high-quality work. Reading and dissecting so many different papers changed my view of how literature should be read and analyzed, and showed me which methods work best for me. When I read a paper with the intent to extract targeted, useful information, I understand the subject matter in greater depth. The work also showed me how Machine Learning is applied in Software Engineering and how source code is assessed.

     
     

  • Trend analysis of popular Java repositories to study Code Smell Lifetime  
    🔗 GitHub Repository  
     
    (1) Wrote a script to download 994 Java repositories from GitHub onto a remote server.

    These repositories met the following selection criteria:

    at least 500 commits, at least 500 issues, at least 100 stars, last commit after 01/01/2021, not a fork, has open issues, has a license

    (2) Wrote a script that creates a folder structure with one folder per repository and, inside it, one subfolder per commit of that repository.

    (3) Wrote a script that runs DesigniteJava on every commit of every repository and stores the results in the designated folders, and extended DesigniteJava to include Test and Testability smells.
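
The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not the actual scripts from the project: the `RepoInfo` record, its field names, the `owner__name` folder naming, and all helper functions are hypothetical. The `-i`/`-o` flags in `designite_command` are my assumption about the DesigniteJava command line and should be checked against the tool's own documentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class RepoInfo:
    """Hypothetical metadata record for one GitHub repository."""
    name: str            # "owner/repo"
    commits: int
    issues: int
    stars: int
    last_commit: date
    is_fork: bool
    open_issues: int
    has_license: bool


def meets_criteria(repo: RepoInfo) -> bool:
    """Apply the selection criteria listed in step (1)."""
    return (
        repo.commits >= 500
        and repo.issues >= 500
        and repo.stars >= 100
        and repo.last_commit > date(2021, 1, 1)
        and not repo.is_fork
        and repo.open_issues > 0
        and repo.has_license
    )


def commit_dir(root: Path, repo_name: str, commit_sha: str) -> Path:
    """Return <root>/<repository>/<commit> (naming scheme is an assumption)."""
    return root / repo_name.replace("/", "__") / commit_sha


def create_layout(root: Path, repo_name: str, commit_shas: list[str]) -> list[Path]:
    """Create one folder per repository with one subfolder per commit, as in step (2)."""
    dirs = []
    for sha in commit_shas:
        target = commit_dir(root, repo_name, sha)
        target.mkdir(parents=True, exist_ok=True)
        dirs.append(target)
    return dirs


def designite_command(jar: Path, source_dir: Path, out_dir: Path) -> list[str]:
    """Build the command for one DesigniteJava run, as in step (3).

    Assumption: the tool takes -i <input folder> and -o <output folder>.
    """
    return ["java", "-jar", str(jar), "-i", str(source_dir), "-o", str(out_dir)]
```

In a full pipeline, each commit would be checked out (for example with `git checkout <sha>`) before the command from `designite_command` is passed to `subprocess.run`, with the matching commit folder as the output directory.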

     
     
    Some of my key findings and learnings:

    This project taught me a lot about handling large amounts of data and the best ways to process it. It also helped me develop inquisitiveness, formulate ideas of my own, and work out how to implement them. I also became more proficient with Git and Linux commands through this project.