Online Training: Parallel programming in Julia

16 Sep, 2020

Bioinformatics is evolving rapidly into a data-oriented field, especially with the rise of omics and multi-omics studies, where a single experiment can produce a few gigabytes of data. A second field that relies heavily on data is artificial intelligence (AI), specifically machine learning and deep learning (DL), where huge datasets are usually needed to train models properly. Over the last couple of years, we and others in the community have been developing and applying AI and DL tools to analyze big biological datasets, with extremely promising results.

Nevertheless, at a deeper level, the workhorse of big-data analysis and modeling is high-performance computing (HPC). Currently, two languages dominate the field of data analysis and modeling, namely Python and R, each with a rich ecosystem of analysis libraries. However, a major drawback of these languages has been performance: as both are interpreted languages, they are not usually regarded as fast.

Different solutions have been proposed to mitigate this problem. One is the development of a new programming language, and hence Julia was born. Julia is a modern programming language, first released in 2012, that aims to combine the readability and rapid prototyping of Python with the speed of C. It uses just-in-time compilation via the LLVM compiler framework and supports different parallel-computing techniques, such as multithreading, multiprocessing, and distributed computing. Even more interestingly, it can also use the rich ecosystems of other languages, such as C, Fortran, R, and Python, through its interfaces.
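
As a small illustration of the multithreading support mentioned above, here is a minimal sketch, not material from the training itself. It assumes Julia was started with more than one thread (for example `julia --threads 4`), although it also runs correctly with a single thread. The function name `threaded_sum_of_squares` is a hypothetical helper introduced only for this example.

```julia
using Base.Threads

# Hypothetical helper: sum of squares computed in independent chunks.
# Each chunk writes to its own slot in `partials`, so no two tasks
# ever update the same memory location.
function threaded_sum_of_squares(xs::Vector{Float64})
    nchunks = nthreads()
    partials = zeros(Float64, nchunks)
    chunk_len = cld(length(xs), nchunks)   # ceiling division
    @threads for c in 1:nchunks
        first_idx = (c - 1) * chunk_len + 1
        last_idx = min(c * chunk_len, length(xs))
        acc = 0.0
        for i in first_idx:last_idx        # empty range if the chunk is past the end
            acc += xs[i]^2
        end
        partials[c] = acc
    end
    return sum(partials)                   # combine the per-chunk results serially
end

xs = collect(1.0:1000.0)
println(threaded_sum_of_squares(xs))
```

Splitting the work into per-chunk accumulators is one common way to avoid a data race on a shared sum; for small inputs like this one, the threading overhead may well outweigh any gain.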

In this training, which was offered online by the Julia-academy, I got to know Julia better and to understand its strengths and limitations with regard to parallel computing. I also learned how to select the most suitable parallelism technique for the problem at hand, and how to combine Julia's multiple-dispatch mechanism with the right data structures to achieve high performance. I truly believe that these skills will help me build more performant and maintainable code, especially as I am integrating more omics layers into my research to better understand the role of HLA-II in inflammatory bowel disease.

Hesham El Abd