Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I'd like to collaborate #29

Closed
mildsunrise opened this issue Jul 31, 2012 · 30 comments
Closed

I'd like to collaborate #29

mildsunrise opened this issue Jul 31, 2012 · 30 comments

Comments

@mildsunrise
Copy link
Contributor

Let me know if I'm doing anything that's already in progress!
I'll be updating this when I complete tasks.
Fix the following tasks in TODO:

  • Provide a GitObject class which the other object types (Commit, Tree, Blob, etc.) will subclass.
  • Work on memory and caching (same pointer, same JS object)
  • Let objects keep a reference to their owning repo, to avoid segfaults.
  • Make sure all create/get stuff in repo is returning local copies of handles.
  • Improve spelling, grammar and some confusing explanations at the docs.

As well as add support for the following features:

  • A generic Repository.get(...) method that returns the object wrapped with the appropiate type.
    We're on a dynamically-typed language!

  • Support for peeling objects.

  • Support for revparse (see Need some kind of high level "rev-parse" functionality. #8)

  • Support for ODB managing and streaming (see Offer a ReadableStream/WritableStream interface to work getRawObject blobs... #4)

  • High-level interface to branches: a Branch type which subclasses Reference. Add the following methods:

    Repository.createBranch
    Repository.getBranch
    Repository.listBranches
    
    Branch.branchName
    Branch.move (in contrast to Reference.rename)
    Branch.getTracking
    
    BRANCH_LOCAL
    BRANCH_REMOTE
    
    Reference.hasLog
    Reference.isLocalBranch
    Reference.isPacked
    Reference == Reference (comparison)
    

And some other utilities:

  • Support for getting libgit2 version as well as node-gitteh version.

  • Determining the merge base of two or more commits.

  • Convenience methods for working with the HEAD and refs:

    Repository.getHead
    RevisionWalker.pushHead
    RevisionWalker.pushRef
    RevisionWalker.hideHead
    RevisionWalker.hideRef
    
  • Support for prettifying a Message.

That's all for now ;)
Anyway, it should be great to have a wiki page that people
can edit to add and assign himself tasks, instead of a TODO.md.

@samcday
Copy link
Contributor

samcday commented Aug 1, 2012

@jmendeth

Holy shit, you're pretty on the ball here, which is fantastic to see! I just moved house so I've fallen a bit behind on stuff, I'm definitely interested in catching up and working with you to get this stuff done on Gitteh.

From the documentation perspective, hold up on that because I have some uncommitted stuff with a new docco system to document the JS (well, coffee-script) bridge I've written to the native bindings, so I won't need to maintain a separate fake-JS file for documentation generation.

I'm at work right now, but I'll definitely get back to you ASAP!

@mildsunrise
Copy link
Contributor Author

OK, one thing less. ;)

@TooTallNate
Copy link

+1 I would love to see this project be revived. The current "reading from a git repo" situation in Node.js is sad right now, and this was my favorite project from the past related to git+node.

Let me know if you guys need any help converting from node-waf to node-gyp for the build!

@mildsunrise
Copy link
Contributor Author

@TooTallNate thanks for the 👍, I'll [try to] get on it.

Before starting working, I suggest we do a cleanup:

Done  Bundle LibGit2 (I mean, bundling, keeping a local copy along with the submodule)
In every computer I want to build / install Gitteh, I get untermined errors of this, which is really frustrating.

Done  Move lib to doc, set up directories in package.json.

Done  Update LibGit2 to latest version.

Done  Use only Node-GYP to build (to support latest versions)

Done  Finish refactoring the code (if it isn't done already), then remove the old* folders.

Done  Stick up to the coding convention used by Google (V8) and Joyent (Node) in their code.
That is, two spaces indentation, braces on the same line, etc.

Done  Node.JS encourages to use classes (we're on an object-oriented language!)
to wrap our types. Classes with static methods.

Done  I suggest to use the V8U utilities to simplify wrapping (from now on).
They make things a lot easier (see link).

@mildsunrise
Copy link
Contributor Author

@samcday If you agree,
I can make everything of the above, except the refactoring.
If you want me to refactor, please explain me what and how has to be done,
coz I'm very confused.

@samcday
Copy link
Contributor

samcday commented Aug 17, 2012

Hey guys, still very interested in getting a new release of this out with the heavy refactoring + Node 0.8/libgit2 0.17.0 compat, most of the work is there, just needs stitching up.

I moved house 3 weeks ago, and my ISP messed up pretty bad, so won't have internet until next week, so I;ll be getting back into the swing of things then ;)

@TooTallNate I have some basic gyp support already, using the pull requests you sent to a couple of other projects as inspiration. It's pretty crap right now though, was trying to make use of pkgconfig on Linux, but not sure how to locate stuff in OSX, any advice/help here would be much appreciated!

@jmendeth I'll finish off the refactoring, regarding most of your other points there, I think I was already moving in that direction (see the Coffeescript bridge I wrote in src/gitteh.coffee). Bundling libgit2 might make sense, though I haven't figured out the best way to do that, since libgit2 has a hard dependency on cmake, which will suck a bit for deployments to Heroku/and other PaaS systems.

@mildsunrise
Copy link
Contributor Author

Bundling libgit2 might make sense, though I haven't figured out the best way to do that, since libgit2 has a hard dependency on CMake.

My approach on other projects has been: convert the CMake files into Gyp ones, then use them to depend on libgit2.
I'll try to do that.

@mildsunrise
Copy link
Contributor Author

@TooTallNate Do I have to include node-gyp as a dependency on package.json?

@TooTallNate
Copy link

@jmendeth Nope, npm has it bundled so when you execute npm install in a repo with a binding.gyp file then it'll automatically be used (you shouldn't have a "preinstall" or "install" phase in the package.json either).

@mildsunrise
Copy link
Contributor Author

What about older versions of NPM (such as the one bundled with Node v0.6 or even v0.4)?

This was referenced Aug 17, 2012
@TooTallNate
Copy link

What about older versions of NPM (such as the one bundled with Node v0.6 or even v0.4)?

The npm with v0.6.4 and up (somewhere around there) started bundling node-gyp, so v0.6 users should pretty much be fine.

Nobody cares about v0.4 users (in fact it's a disservice to do so), they need encouragement to update, or they can use an old version of node-gitteh.

P.S. I'm still very excited about this project getting life again! I'm using node-ffi to bind to libgit2 currently and while it works fine, I'd like for this dedicated solution to come back so I can asyncify things again (the mutexes uses in gitteh fix threading issues that async node-ffi exhibits): https://github.com/TooTallNate/n8.io/blob/1cc5ebdf32d3f1c7e08659a103691ff6934460e3/lib/git.js Cheers!

@mildsunrise
Copy link
Contributor Author

OK. One last question:
how do I statically link an already build library (libgit2 in this case) to my .node module (node-gitteh)?

@TooTallNate
Copy link

@jmendeth You need a "libraries" section in the binding.gyp file. See: https://github.com/rbranson/node-ffi/blob/47b6d750e98739cc12285ecf950d395a12180c7b/binding.gyp#L26-28

@mildsunrise
Copy link
Contributor Author

Yes, that's what I was looking for. Thanks!

PS: I'm busy with Robotskirt (Sundown for Node.JS),
but I'll try to have a minimal working gitteh for soon.

@mildsunrise
Copy link
Contributor Author

I know, my "soon" isn't what you'd expect.
But hey, neither is @vmg's "by the end of the year" (don't tell that to him ;).

The fact is, I'm now enough documented about
libuv, libgit2 and V8 to start serious development.

I expect to be pushing to my fork quite often,
and I'll alert you when minimal features are ready to be tested. 🚀

@mildsunrise
Copy link
Contributor Author

Changes in phylosophy

@samcday These changes are related to speed, comformity with other
modules and the end-user API. If you disagree with them, please tell me
and we'll discuss them. Here they are:

  1. There is a notion of OIDs. In fact, they have its own type.
    Why? Primarily, because _it is much faster than having to format/parse/validate the hex-strings each time_.

    Secondly, because libgit2 has its own git_oid structure, and a bunch of methods
    that work with it. So, it makes sense to have a JS type associated with this structure.

    Finally, because in 90% of the apps there aren't hardcoded OIDs, nor the user inputs them (unless you're building a CLI): the OIDs come from libgit2 instead. Usually, reference names are hardcoded instead, and libgit2 resolves them... you guessed it, to OID structures.

    So, why to take the pain of formatting the structures into hex-strings, if they'll likely be passed untouched to another libgit2 function (after being parsed again to get the original OID)?
    Why not passing the OID structure directly?
     

  2. Your practice is to generate all the fields of the objects and
    set() them, just assigning properties once.

    However, if you look at the Node docs and the tutorials on C++ addons,
    they use Getters/Setters. By only having a getter only, writes are
    ignored, and properties cannot be changed.

    Let's compare both approaches:

    Comparison of the two methods

    I'm taking an hybrid approach: I use getters (so I "call into C code")
    but I cache the values' handles so they're only generated the first time.
    IMHO, this has the best of both worlds.
     

  3. Docs are being manually written.
    No comments.
    Just markdown files under docs/.

    This is Node and many other projects' approach,
    and after trying many other tools, I advocate for it also.

    A tool bundled in Node then parses this markdown and
    generates beautiful HTML pages and JSON for IDEs to parse.

    Why? Because JS is so flexible, so abstract, so dynamic that
    it is better to have a Markdown file (following some conventions)
    than a strict comment parser.

    It allows you to express the API's concepts in your own way.

@samcday
Copy link
Contributor

samcday commented Jan 8, 2013

Thanks for your thoughts @jmendeth, I'll do my best to address them now.

To begin with 1, I definitely agree that wrapping git_oid would be a little more performant IF API users were only ever dealing with oids coming back from libgit2, but I don't think it's worth wrapping them for 2 reasons. First, I don't want to assume that API users only ever deal with refs, there is plenty of valid use cases where end users would be working with raw OIDs. Secondly, the cost of converting char* to git_old is as cheap as a memcpy(), so the additional complexity of creating a whole new JS wrapping type for git_oid doesn't seem worth the very small improvement in speed to me.

#2, the reason I opted to not use V8 getter/setter callbacks in C code at all is because I actually did to begin with, and then realized that all git objects are immutable. Think about it, if you modify a git object in any way, its oid will change and thus you'll have a new object. For this reason I decided that managing the state of a git object in user space (read: the CoffeeScript bridge) is the better approach because it results in a single entry into the C code when it's time to save the object.

Just a little more background on my reasoning behind #1 and #2, if you look at the oldsrc dir (soon to be deleted), you'll notice there used to be rather a lot more C code. When I picked up gitteh after a hiatus mid last year I realized that writing a lot of C code is just making the project arbitrarily complex when I could delegate most of the boilerplate to the JS, and just call into the C for the "good bits" (read: the actual libgit2 code). I've taken this to the extreme with the new CoffeeScript bridge code, by being VERY aggressive in not leaking any native code calls, or pointers to native objects, this way I can do all the painful things like validation, type checking etc in JS rather than C, which is less code and a whole lot less headaches :)

#3 is a great idea, writing escaped coffeescript comments to pass through to the compiled .js file to then be parsed by a JS documentation generator written in Java was a dumb idea, and I feel stupid for not thinking of Markdown earlier. That said, I definitely do not want to use something like docco because whole "comments next to code" thing is not a good fit for a bindings library like gitteh. Do you have experience with something that might suit us better?

@mildsunrise
Copy link
Contributor Author

I appreciate your feedback, thanks @samcday!

Regarding 1.

But I repeat, you probably never hardcore or input raw OIDs.
What you do, 99% of the time (on git, GitHub, ...), is to specify commit IDs.
This is different, because it doesn't have to be an OID, but can also be a reference, branch or tag name.
What you want to do in this cases is to call revparse which will give you an OID structure ready-to-use.

Think about it: when do you see OIDs at github, other than when they are commits (a tree, a blob, a tag, etc.)?


You said:

[...] Secondly, the cost of converting char* to git_old is as cheap as a memcpy(), [...]

Actually, not. The OIDs are stored in memory as raw data, as it should
be (20 bytes). [see here] Thus, every time you wanna convert them to HEX, libgit2
represents them in characters.

So, it's not a memcpy, because currently you have to:

  1. Get the char* by using v8::String:Utf8Value.
  2. Allocate the structure: new git_oid
  3. Parse and populate it with git_oid_fromstr[n](*utf_value, utf_value.length())

And you have the structure ready (&oid). Don't forget to delete it later!
With an OID type, it's as simple as:

  1. Call Oid::Unwrap(...) with the handle, and you get the object.

And you have the structure ready! Do &object->oid to get a pointer to it.
And, as long as you keep a reference to the handle (via Persisted),
the OID will be so.

No new/delete. No parse/format. The same structure is passed, which
is why it was made. To be reused.

The API

A git_oid struct needs to be populated with raw data.
That's why the constructor of the OID type accepts only raw data (20 bytes minimum):

> data = new Buffer(20)
<Buffer 43 a9 37 ...>
> oid = new git.Oid(data)
<Oid 43a937f8f6>

If for some reason a user wants to parse a HEX string:

> oid = git.Oid.parse('93fe0')
<Oid 93fe000000>

The rest of the OID API is intuitive as well, and won't impact the end-user experience:

> oid.toString()
'93fe000000000000000000000000000000000000'
> oid.empty
false
> git.Oid.zero
<Oid 0000000000>
> git.Oid.zero.empty
true

@mildsunrise
Copy link
Contributor Author

Regarding 2.

Yeah, I know Git objects are immutable, that's why I thought of getters as a good idea;
they help create the effect that the values are also immutable (assignments are ignored),
in contrast to normal properties where they can be changed.

But I read your coffee bridge and you've convinced me:
your method is a better approach!

So, it's now official: I'll use normal properties on my rewrite, just as before.
However, because the JS API will change quite a lot, the coffeescript bridge
will also need to be rewritten. I'll try to do it myself, but I've never had a cup of Coffee!
(literally too, I'm sixteen :).

Regarding 3.

Sorry, I don't get what you're asking for.
If you want a tool that's able to generate HTML from the Markdown files then
there you have it, it comes bundled with Node. I think it's the best method
that suits, especially on native addons.

Other changes

  • As I said, the API's been changed to make it more structurated, conformant and intuitive.
    For example, instead of Repository.createBranch(...), we have the static method Branch.create(repository, ...).
    This is, among other reasons, because by convention functions that create/return instances of a type (factories) should be static, and should be prefixed create. Also, it's more intuitive IMHO.
  • No little hamsters were nor will be harmed during development :P

@samcday
Copy link
Contributor

samcday commented Jan 8, 2013

@jmendeth I was unaware you were pushing for a rewrite of the codebase, if that's the case then I'll probably be recommending you just maintain your own fork going forward. I'm sure we can come to some middle ground to ensure our efforts are combined, but just be aware that the current state of the C codebase and the CoffeeScript bridging code are something I arrived at after months of work, and I'm not too keen on dropping it for a "rewrite" :)

Regarding #1, something you might be missing is the fact that V8 switching from JS to native C code is infinitely more expensive than anything else. It is far more expensive than a couple of memcpys, heap allocations, and conversions from v8::string to char* :) So when you have an oid object that has a constructor calling into C code, the mere switch to C code alone is going to waste a lot more CPU cycles than necessary.

The approach I've been going for is that obviously calling into the C code is inevitable, but when I do it I try to make sure I get everything done in one jump. For example if I was being 100% faithful to the libgit2 API and just making the gitteh codebase a very thin wrapper (which, if I did, I might as well delete my code and recommend people use node-ffi anyway), then I would require API users to first obtain a handle to a git_oid, then a handle to a git_commit, then a handle to the git_commit parent git_oid. If the JS bridge were low level this would be 3 different JS -> C calls. With my current approach there is only one call into the C code and we do as much as we can right there and then, make sure to keep that in mind going forward!

Regarding your approach with static factory methods, I'm fine if we have both construction methods available: myRepository.createCommit() and gitteh.Commit.create(myRepository)

As for the markdown, I was still hoping for something a little higher level than straight markdown authoring for documentation generation, but I'll see what it feels like to just author plain markdown for gitteh.

@mildsunrise
Copy link
Contributor Author

Sorry for the delay. In respect to the rewrite,
something made me think I already told you,
but in either case it wasn't intentional, really.

However, ---as you can see--- I'm doing it heavily
based on your practices. And I'll reuse some parts
of your bridge, and add some other parts like events
and streaming for the new features.

There are some reasons why I think Gitteh needs a rewrite:

  • CCV8 (aka. v8-juice) is a really heavy framework for
    exposing native libs to JS in V8 (not only Node). In the
    rewrite I'm using V8U, which is more like a micro-framework,
    a bunch of shorthands and macros that expand to the
    code you meant
    . That makes it compatible with other
    utility libs, and highly flexible and fast.
  • The build system. The package is supposed to contain
    all the required source and native deps; it is not correct
    to clone the entire libgit2 repository and it always yields
    problems (the branch has changed, the version is not
    the correct, etc.)
  • Some other little details, like API and consitency, etc.

@mildsunrise
Copy link
Contributor Author

@samcday

Regarding #1

Most OIDs come from the library itself.
The rest of OIDs:

  • Are hardcoded. In this case, just a single call to Oid.parseArray(...)
    will suffice to have them all parsed at the start of the program.
    See mildsunrise/node-gitteh@937be27 for details.

  • Come from the user. In this case, it's likely that they aren't only OIDs,
    they are full revision IDs. Then a call to revparse is obligatory even
    with your current source.

    GitHub only accepts revision IDs, not OIDs.
    That's why you can browse mildsunrise/v8u@master
    and master is not an OID. Since the call into C is unevitable, there's
    nothing to lose here.

Bonus! No validation / conversion / parsing stuff is required.

@mildsunrise
Copy link
Contributor Author

@samcday Just reading your last comment, I have to say:

Ideas can't be dropped; only reused.

@iamwilhelm
Copy link
Contributor

@jmendeth and @samcday What's the status on each of your forks? I keep seeing that the codebase is in the middle of a refactor, for libgit2 v0.17 on both the READMEs, so it makes me hesitate to leap in and contribute on some of the little problems, like people not being able to compile it on mac OSX. Esp if things are going to be changed around.

Could either one of you give us an update on the current status? it'd be much appreciated. Thanks!

@mildsunrise
Copy link
Contributor Author

@iamwilhelm Thanks for your interest. First of all, you
should be aware that I'm rewriting gitteh, using the latest
LibGit2 but basing on the current source.

So it'll take long before I finally have a decent chunk of the API
exposed. I'm really busy these days, but hopefully will get on it
this weekend.

I should make it more clear that I'm rewriting. I'll update my README.

@iamwilhelm
Copy link
Contributor

@jmendeth Thanks for the answers. Are you doing the rewrite here in libgit2/node-gitteh? or are you doing it in your own fork?

And if it's in your own fork, are you keeping the method signatures the same, or are you changing them around completely?

@mildsunrise
Copy link
Contributor Author

@iamwilhelm In my own fork, of course.
I don't have write access to libgit2 and even if I did,
I'd prefer to keep things apart until they're ready.

With respect to the signatures (JS API) there will be
a big change (see the point 1) that will unevitably drop compatibility,
which is the presence of an Oid type.
However, it should be easy to migrate.

I'll be trying to keep the previous method names and signatures where possible.

@iamwilhelm
Copy link
Contributor

Ahh, I see. Is there a plan to merge the forks in the future between @jmendeth / gitteh and libgit2/gitteh? Or is that still up in the air, and remains to be seen?

@mildsunrise
Copy link
Contributor Author

That is still up in the air, and remains to be seen.
At least, that's what I understand. @samcday
perspective may differ though.

Think of my rewrite as an experiment --an experiment
I strongly believe will succeed, but not sure.

2013/3/6 Wil Chung [email protected]

Ahh, I see. Is there a plan to merge the forks in the future between
@jmendeth https://github.com/jmendeth / gitteh and libgit2/gitteh? Or
is that still up in the air, and remains to be seen?


Reply to this email directly or view it on GitHubhttps://github.com//issues/29#issuecomment-14525607
.

@mildsunrise
Copy link
Contributor Author

Closed for clarity and inactivity. (read: my fault)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants