For all details and references of the project one can visit darcs-GSoC-2014-History-Reordering-Performance-and-Features.

martes, 19 de agosto de 2014

Last Few Weeks

In the last three weeks I was working in a couple of things, unfortunately I couldn't complete any to 100% :

- Finishing the command minimize-context.
- Implement the "show dependencies" for darcsden.
- Solve the issue2405.

Finish the command minimize-context

I can start telling that the command itself is implemented, but making tests I find out a case when making minimize-context fails to update correctly the files of repository. I comment the problem in here, however here goes a little summary.

Some "preconditions" to take into account:
1. If exist a tag, I search for dependencies in the patchs after the tag.
2. If happends that not exist dependencies after the tag, the only patch in the context of the bundle to send is the tag. This helps the command darcs apply to merge the patches and seems good to have at least a "tag patch" dependency.
3. (In relation with 1) If not exist a tag, the search is made in all the repository. One problem here is that with repositories that have huge amount of patches (darcs for example with 11000~ patches) the command maybe not finish of calculate the dependencies. Hopefully, having so many patches without tags seems a little odd. Nevertheless, I suppose that even the search in 700~ patches of 10~ patches to send should have a decent performance.

Well, passing to comment the problem. In some older post I mention the way we calculate the dependencies, so with that in mind, suppose we have to repositories $r_1$, $r_2$ with the following patches, where $m_x$ and $a_x$ represents the $x$-th modification and adding respectively.

$r_1$,$r_2$ = $m_2$ $tag$ $m_1$ $a_1$

suppose now that $r_2$ makes a third modification $m_3$ that adds lines without touch any existing line in the only file adding for $a_1$ and sequentially modified for $m_x$. With the idea of send a bundle to $r_1$. Now if we compute the dependencies to find out what is the context to send. For (1) we need to try to commute $m_2$ with $m_3$, which success but of course the end result is a $m'_3$ and a $m'_2$ because the lines which modifies every one now are different. But beyond that for our definition of dependency we can throw $m_2$, so the final context of the bundle with minimize context is the patch $tag$.

The problem now is that if we make darcs apply of the minimized bundle the merge is made not considering $m_2$ and the final state of the file isn't right. For example, if we are talking about that $m_3$ adds a function in some empty place, this could end in $m_3$ adding the function in the middle of an existing function.

My, more or less, mature thought is something like have Direct Dependency and  Indirect Dependency (the names could be differents $\smile$). The Direct Dependency is a dependency as already we know it and the Indirect Dependency is that if we have,

$p_1$ and $p_2$ patches, and $p'_1$ and $p'_2$  are the results of commute the patches, then if $p_1 \ne p'_1$ we conclude that $p_1$ indirectly depends of $p_2$.

I have almost implemented this idea, but I change task and still missing a couple of details. In particular, I'm not so sure of a couple of things,

- How to correctly compare the patches ($p_1 \ne p'_1$)
- In my unclean implementation I have to carry on $MyEq$ for all over the code and it's very ugly I think.

Finishing with this topic, here is a script (improved by Guillaume) that shows the problem:
Link: http://hub.darcs.net/alegadea/ExamplesRepos
SH: getDepsDoesntAlwaysWork.sh

Implement the "show dependencies" for darcsden

An idea that quickly comes up to my mind is that, considering the last part of the previous topic, since we have two types of dependencies it would be nice generate the graph of dependencies with two types of arrows to differentiate the type of dependency between patches. Talking about the implementation itself, I advanced in the drawing of the graph and some different presentations, but very little about the integration with darcsden, which I think is more or less direct having implemented the $js$ that draw the graph. Would have been nice have more time for finish this but I swap to the task of the following topic.

I hope be able to work in the last two topics in the next few months outside of the GSoC.

Solve the issue2405

I would like to start saying, "what a shame...". I can't solve the issue, I improve a little bit the memory but nothing too much significant. Maybe the good news is that, more o less, I found the "place" in the code where the memory triggers "to the sky" :)

Now, the reality is that I tried so many things that I'm a little bit confused about everything. So, I gonna comment the last feelings and tests that I make and, any question(or idea) for the future person (maybe me) that would try to solve the issue, it would well received.

So things that I tried to change:

- Strict version of Map, Monad, etc.
- More strict version of mapM_, gets and modify.
- Some partials changes in the data type PatchIndex.
- Using darcs as library; differents uses of the functions...

The "place" that I suspect the code has the memory glitch is in $applyPatchMods \rightarrow go (pid, PTouch fn)$I think that my lack of experience dealing with this kind of problem comes to light :) the
bright side is that I learn many things on the way of understand the problem and try different solutions.

Closing, maybe this could be my last post in a long time. I would like to thanks Guillaume and the darcs community in general. It was great to contribute to the development of darcs!

Again, I hope be able to work a little outside the GSoC, in particular in the "dependency problems and tasks".

Thanks!

martes, 29 de julio de 2014

Other Week (21-26 July)

Good news!

I almost finish to implement the option minimize-context for the command darcs send, I say almost because making some examples I find out that somethings could go wrong. But before entering in that, I will comment a little the main the test that without minimize-context doesn't work and with the option passes(when the implementation is ready the most likely is I will upload the example and some more in ExamplesRepos), and some "tentatives options" for darcs send that I think could be useful.

So, the minimal example:

Suppose we have repositories $r_1$, $r_2$ and $r_3$; $r_2$ a direct clone of $r_1$ where
originally $r_1$ has only one patch, 

$r_1$,$r_2$ $=$ $p_0$

now, we add a patch with a new file in $r_1$ and make a clone, $r_3$ which leave the repositories like this,

$r_1$,$r_3$ $=$ $p_0$ $p_1$

Suppose that we make a modification in the file that add $p_0$ and make a patch $p'_0$ in $r_1$ and we send a bundle (with darcs send) to $r_2$ and $r_3$. What ends up happening is what one imagine,
the bundle can be applied to $r_3$ but not to $r_2$, this because the bundle has de following "shape":

Some message

patch [Some hash]
Author: ...
Date: ...
  * [The modification to $p_0$]

New patches:

$p'_0$

Context:

$p_1$
$p_0$

Patch bundle hash: [Some hash]

and $r_2$ doesn't have the patch $p_1$ despite the fact that $p_1$ nothing has to do with $p'_0$. Using the option minimize-context the bundle is the same but the "Context" is:

Context:

$p_0$

and now there is no problem.

Now, I find out that reducing the context is not always the "best option" for send a bundle. Here I will expose something that could go wrong. Take for example the darcs-screened repository, in total has 11000~ patches, suppose we add a file and make a bundle (using minimize-context) with only the patch that make the add, the context then is empty. So, if we try to apply this bundle to, say the remote repository, this could never end...
What go wrong?; because the context is empty, making darcs apply [TheBundle] in some point try to merge the entire repository versus the "set" of patches to apply, this is very costly if we have 11000~ patches.

I write "best option" before because with my current solution is not always is better reduce the context, but the final idea is that always be the best option. So, I think is necessary to have some care for example in the last implementation if exist a tag, this tag is added to the context. This simple solution solves the problem and seems correct, on the other hand having more than 600~ or 1000~ patches without a tag seem a little extreme :)

Ending, I'm making some tests of two interesting options that came to mind:

. --last-deps=NUMBER
. --deps-from-tag=REGEXP

The first searches dependencies in the first N patches, and the second search since a given tag.

In conclusion, I'm still doubtful about how solve that problem. More if I have consider the two options. For the end of the week I hope have all this solved.

jueves, 24 de julio de 2014

Some Week (14-19 July)

Hi all!

I was finishing understanding and implementing the command darcs send --minimize-context using "optimize reorder" when I begin to suspect that doesn't solve the problem described in here. The thing is, despite the fact that the context in the bundle to send is reduced if before we send we make "optimize reorder", this doesn't solve the problem of dependencies. Guillaume finished of evacuate my doubts, and so after read:

[darcs-users] darcs cannot apply some patch bundles
irclog
issue1514 (which is the issue which "replace" issue2044 darcs send should do optimize --reorder)

I convince myself of what needs to be done, and it's calculate the "exact" dependencies of the patches to send so such dependencies be the context in the bundle to send. "Exact" because for big repositories can be very costly and calculate till certain tag seem appropriate.

Now, one concern is the cost of doing the search of dependencies. About this I can first comment some of the things I was doing during the week and later show, what I think are, encouraging examples. So first, maybe the most relevant thing of the week it's the implementation of the command darcs show dependencies with the following "description":

Usage: darcs show dependencies [OPTION]... 
Generate the graph of dependencies.


Options:
              --from-match=PATTERN  select changes starting with a patch matching PATTERN
              --from-patch=REGEXP   select changes starting with a patch matching REGEXP
              --from-tag=REGEXP     select changes starting with a tag matching REGEXP
              --last=NUMBER         select the last NUMBER patches
              --matches=PATTERN     select patches matching PATTERN
  -p REGEXP   --patches=REGEXP      select patches matching REGEXP
  -t REGEXP   --tags=REGEXP         select tags matching REGEXP
              --disable             disable this command
  -h          --help                shows brief description of command and its arguments

till the moment the command returns a graph described in dot language, this can eventually change. But with the current returned value one can do:

$\$$ darcs show dep | dot -Tpdf -o deps.pdf

to draw the graph in a pdf. Finally, in summary to calculate the dependencies I use more or less the idea which describes Ganesh in here.

Moving to the examples is interesting, thinking in the performance of the implementation of darcs send --minimize-context using this approach, to see the followings results:

1. Show the dependencies after the tag 2.9.9 (75 patches)
$ time darcs show dep --from-tag=2.9.9
real 0m0.397s
user 0m0.373s
sys 0m0.026s
darcsDesps299.pdf

2. Show the dependencies after the tag 2.9.8 (133 patches)
$ time darcs show dep --from-tag=2.9.8
real 0m2.951s
user 0m2.865s
sys 0m0.082s
darcsDesps298.pdf

3. Show the dependencies after the tag 2.9.7 (288 patches)
$ time darcs show dep --from-tag=2.9.7
real 0m26.654s
user 0m26.003s
sys 0m0.511s
darcsDesps297.pdf

4. Show the dependencies after the tag 2.9.6 (358 patches)
$ time darcs show dep --from-tag=2.9.6
real 0m39.019s
user 0m38.302s
sys 0m0.666s
darcsDesps296.pdf

5. Show the dependencies after the tag 2.9.5 (533 patches)
$ time darcs show dep --from-tag=2.9.5
real 1m53.730s
user 1m51.343s
sys 0m1.939s
darcsDesps295.pdf

Rushed conclusion, seems the performance is quite good even more if we think that for compute the graph dependencies we calculate the dependencies of "all the selected patches against all the selected patches" and in the case of the option for send would only require to calculate "patches to send against all the selected patches".

lunes, 14 de julio de 2014

Month of June

Here goes a little summary of what I been doing between late june (9~21) and early july (1~11).

First and easy, I have been documenting Darcs.misplacedPatches (old name chooseOrder), D.P.W.Ordered and D.P.W.Sealed. Something to comment is that the semantics of misplacedPatches, not always can clean a tag doing darcs optimize reorder. For example; Suppose we have a repository, $r_1$ with the following patches;

$r_1$ $=$ $t_{1,0}$ $p_{1,0}$ $t_{1,1}$

here all tags are clean, but if we make another repository, say $r_2$, and we pull from $r_1$ of the
following way

$\$$ darcs pull -a -p $p_{1,0}$ $r_1$ (we want to pull the patch $p_{1,0}$, we assume that the name of the patch is $p_{1,0}$ for the matching with -p option)
$\$$ darcs pull -a $r_1$

so now we have,

$r_2$ $=$ $p_{1,0}$ $t_{1,0}$ $t_{1,1}$

and we see that $t_{1,0}$ is dirty. Doing darcs optimize reorder not reorder nothing. What is going on is that to know what reorder, misplacePatches takes the first tag, in our case $t_{1,1}$, and
"search" for what patches he don't tag. But $p_{1,0}$ and $t_{1,0}$ are tagged by $t_{1,1}$ so there is nothing to reorder despite $t_{1,0}$ is dirty. Therefore there is no way of clean $t_{1,0}$ because misplacePatches always takes the first tag, so if a tag is tagging one or more dirty tags, this tags never be available to get clean.

"Second", using the implementation of "reorder" one can get almost for free the option --reorder for the commands pull, apply and rebase pull. The behavior for the case of pull (for the others commands is the same basic idea) is that our local patches remain on top after a pull from a remote repository, e.i. suppose we have the followings $l$(ocal) and $r$(emote) repositories,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $lp_{n+1}$ $\ldots$ $lp_m$

$r$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $rp_{n+1}$ $\ldots$ $rp_k$

where $lp$ are the local patches that don't belong to $r$, and vice versa for $rp$. Make darcs pull, leaves $l$ as follow,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $lp_{n+1}$ $\ldots$ $lp_m$ $rp_{n+1}$ $\ldots$ $rp_k$

meanwhile make darcs pull --reorder, leaves $l$,

$l$ $=$ $p_1$ $p_2$ $\ldots$ $p_n$ $rp_{n+1}$ $\ldots$ $rp_k$ $lp_{n+1}$ $\ldots$ $lp_m$

making more easy to send the $lp$ patches later.

"Third", beginning a new task, implement option minimize-context for command darcs send. Still no much to comment, I have almost finished implementing the option but with some doubts, I hope that for the end of the week have a more "prettier" implementation as well as a better understanding.

jueves, 12 de junio de 2014

Third Week (02-06 june)

Well, well... Now with the solution already implemented here are a couple of time tests that show the improvement.

For the repository of the issue2361:

"let it run for 2 hours and it did not finish"

real    0m5.929s
user    0m5.683s
sys     0m0.260s

For the repository generated by forever.sh, that in summarize has 12600~ patches, a bundle unrevert and doing reorden implies move 1100~ patches forward passing by 11500~ patches.

(Interrupted!)
real    73m9.894s
user    71m28.256s
sys     1m11.439s

real    2m23.405s
user    2m17.347s
sys     0m6.030s

The repository generated by bigRepo.sh has 600~ patches, with only one tag and a very small bundle unrevert.

real        0m34.049s
user        0m33.386s
sys         0m0.665s

real        0m1.053s
user        0m0.960s
sys         0m0.152s

One last repository generated by bigUnrevert.sh, has 13 patches and a really big bundle unrevert (~10MB).

real    0m1.304s
user    0m0.499s
sys     0m0.090s

real    0m0.075s
user    0m0.016s
sys     0m0.011s

The repository with more examples is in here: ExamplesRepos.

martes, 3 de junio de 2014

Second Week (26-30 may)

Luckily, this week with Guillaume we found a "solution" for the issue 2361. But before of entering in details, let's review how the command darcs optimize --reorder does for reorder the patches.

So, suppose we have the following repositories than, reading it from left to right we have the first patch till the last patch, besides with $p_{i,j}$ we denote the $i$-th patch who belongs to the $j$-th repository, and when we want to specify that a patch $p_{i,j}$ is a tag we write $t_{i,j}$.

$r_1$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{n+1,1}$ $\ldots$ $p_{m,1}$

$r_2$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{1,2}$ $\ldots$ $p_{k,2}$ $t_{1,2}$ $p_{k+1,2}$ $\ldots$ $p_{l,2}$

where the red part represent when $r_2$ was cloned from $r_1$, and the rest is how each repository was evolved. Now, suppose we make a merge of $r_1$ and $r_2$ in $r_1$ making a bundle of the patches of $r_2$ and appling it in $r_1$. Thus, after the merge we have that

$r_1$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{n+1,1}$ $\ldots$ $p_{m,1}$ $p_{1,2}$ $\ldots$ $p_{k,2}$ $t_{1,2}$ $p_{k+1,2}$ $\ldots$ $p_{l,2}$

and we found the situation where the tag $t_{1,2}$ is dirty because the green part is in the middle. And now we are in conditions of finding out how darcs does the reorder of patches.
So, the first task is to select the first tag seeing $r_1$ in the reverse way, suppose $t_{1,2}$ is the first (ie, $p_{k+1,2}$ $\ldots$ $p_{l,2}$ are not tags), and split the set of patches (the repository) in

$ps_{t_{1,2}}$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{1,2}$ $\ldots$ $p_{k,2}$ $t_{1,2}$

and the rest of the patch set,

$rest$ $=$ $p_{n+1,1}$ $\ldots$ $p_{m,1}$ $p_{k+1,2}$ $\ldots$ $p_{l,2}$

this is done by splitOnTag, which I don't totally understand yet, so for the moment... simply do the above :) Then, the part that interest us now is $rest$, we want to delete all the patches of $rest$ that exist in $r_1$ and then add them again, causing that they show up to the right. This job is done by tentativelyReplacePatches, which first calls tentativelyRemovePatches and then calls tentativelyAddPatches.

So, tentativelyRemovePatches of $r_1$ and $rest$ makes,

$r_{1}'$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{1,2}$ $\ldots$ $p_{k,2}$ $t_{1,2}$

and, tentativelyAddPatches of $r_{1}'$ and $rest$,

$r_{1}''$ $=$ $p_{1,1}$ $p_{2,1}$ $\ldots$ $p_{n,1}$ $p_{1,2}$ $\ldots$ $p_{k,2}$ $t_{1,2}$ $p_{n+1,1}$ $\ldots$ $p_{m,1}$  $p_{k+1,2}$ $\ldots$ $p_{l,2}$

leaving $t_{1,2}$ clean.

Well, all of this was for understanding the "solution" for the issue, we are almost there but before let's look at the function tentativelyRemovePatches. It attempts to remove patches with one special care: when one does darcs revert, a special file is generated, called unrevert in _darcs/patches, which is used for darcs unrevert in case that one makes a mistake with darcs revert. One important difference with unrevert is that unlike all the other files in _darcs/patches, unrevert in not a patch but a bundle, that contains a patch and a context. This context allows to know if the patch is applicable. So when one removes a patch (running for example oblitarete, unrecord or amend) that patch has to be removed from the bundle-revert (bundle of the file _darcs/patches/unrevert). It's now always possible to adjust the unrevert bundle, in which case, the operation continues only if the user agrees to delete the unrevert bundle.

But now a question emerge. Is it necessary to accommodate the bundle-revert in the case of reorder?; the answer is no, and it's because we don't delete any patch of $r_1$ so we still can apply the bundle-revert in $r_{1}''$.

So, finally! we find out that for reorder we need a special case of removing, which doesn't try to update the unrevert bundle. And this ends up being the "solution" for the issue, since the reorder blocks in that function. But! beyond this solves the issue something weird is happening, that is the reason of the double quotes for solution :)

This is more o less the step forward for now. The tasks ahead are, documenting the code in various parts and make the special case for the function tentativelyRemovePatches. On the way I will probably understand more about some of the functions that I mention before so probably I will add more info and rectify whatever is needed.

lunes, 26 de mayo de 2014

First Week (19-23 may)

Sadly, a first slow week, I lost the monday with problems with my notebook for which I have to reinstall ghc, cabal, all the libraries, etc.. but! in the end this helped :)

The list of taks of the week include:

1. Compile and run darcs with profiling flags
2. Write scripts to generate dirty-tagged big repositories
3. Check memory usage with hp2any for the command optimize --reorder for the
generated repositories and repo-issue2361
4. Check performance difference with and without patch-index
5. Document reorder implementation on wiki
6. Actually debug/optimize reorder of issue2361 (Stretch goal)

1. Compile and run darcs with prolfiling flags

This seems pretty easy at first, but turned somewhat annoying because one have to install all the libraries with the option profiling. So a mini-step-by-step of the my installation of darcs with profiling
flags is (i'm using ubuntu 14.04, ghc-7.6.3 and cabal-install-1.20.0.2) :

- Install ghc-prof package, in my case with sudo apt-get install ghc-prof
- Install depencencies of darcs with enable-library-profiling, doing:
    - $ cabal install LIB --enable-library-profiling ( for each library :) )
    - or setting in ~/.cabal/config, library-profiling: True
- Finaly install darcs with enable-library-profiling and enable-executable-profiling

2. Write scripts to generate dirty-tagged big repositories

About this no much to say, I did some libraries to make the scripts that generates the repositories more straightforward. And I wrote some examples, but still in search of interesting examples. A long the week probably I will add examples, hopefully interesting.

3, 4 and 5 all together and mixed

Now, when finally start to generate the examples repositories and play with hp2ps to check differents things, I started to think about others things and I ended up studing the implementation of the command optimize --reorder, in particular I start to write a version which print some info during the ordering of patches, but for now is very dirty implementation.