Parsing Git logs with regular expressions
A very common task in data science is parsing the logs an application produces. In this recipe, we will write a simple snippet that analyzes how much each committer has contributed to a Git repository.
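To make the core idea concrete before the setup steps, here is a minimal sketch, not the recipe's own code. It uses only Julia's built-in Regex, eachmatch and Dict, without DataFrames. It assumes the default git log header layout, in which every commit has exactly one line of the form "Author: Name <email>". The names AUTHOR_RE and count_commits are hypothetical helpers introduced for illustration. The first commit in the sample is taken from the git log output shown later in this recipe; the second commit is made up.

```julia
# Minimal sketch (assumption: default `git log` header layout, one
# "Author: Name <email>" line per commit). Uses only Base Julia.

# Hypothetical pattern: captures the author name and e-mail address.
# The `m` flag makes ^ and $ match at the start/end of every line.
const AUTHOR_RE = r"^Author:\s+(.+?)\s+<([^>]*)>\s*$"m

"""
    count_commits(log) -> Dict{String,Int}

Hypothetical helper: count how many commits each author name appears on.
"""
function count_commits(log::AbstractString)
    counts = Dict{String,Int}()
    for m in eachmatch(AUTHOR_RE, log)
        name = String(m.captures[1])          # first capture group = name
        counts[name] = get(counts, name, 0) + 1
    end
    return counts
end

# Sample input: the first header is copied from this recipe's example
# output; the second commit is hypothetical.
sample = """
commit 14f30ad448a5d38be38c7b0e7274f0f7b0a951ee (HEAD -> master, upstream/master)
Author: Bogumił Kamiński <bkamins@sgh.waw.pl>
Date:   Mon Jun 18 21:40:46 2018 +0200

    Allow aggregate to use column number for aggregation (#1426)

commit 0000000000000000000000000000000000000000
Author: Jane Doe <jane@example.com>
Date:   Sun Jun 17 10:00:00 2018 +0200

    Hypothetical commit
"""

# Print authors sorted by commit count, highest first.
for (name, n) in sort(collect(count_commits(sample)); by = last, rev = true)
    println(rpad(name, 20), n)
end
```

A plain Dict is enough for counting. Moving the matches into a DataFrame, as this recipe does with DataFrames.jl, becomes useful once you also want to aggregate the per-file statistics that --stat prints.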
Getting ready
To run this recipe, you need to have the DataFrames.jl and DataFramesMeta.jl packages installed. If they are missing, run the following commands to add them:
julia> using Pkg
julia> Pkg.add("DataFrames")
julia> Pkg.add("DataFramesMeta")
Also, you need to have Git installed. You can get it from https://git-scm.com/.
When you run the git log --stat command on a repository, it prints output similar to this:
$ git log --stat
commit 14f30ad448a5d38be38c7b0e7274f0f7b0a951ee (HEAD -> master, upstream/master)
Author: Bogumił Kamiński <bkamins@sgh.waw.pl>
Date: Mon Jun 18 21:40:46 2018 +0200
Allow aggregate to use column number for aggregation (#1426)
src/groupeddataframe/grouping...