Implement an IPFS compatible CID function in Elixir using Multihash SHA256 #11
Comments
I have been looking into CIDs and IPFS. See here for all thoughts captured. CIDs are made up of codecs and multihashes. Multihashes are self-describing hashes (i.e. they contain information about the hashing algorithm that was used to hash the data). See this comment for an example of a multihash in Elixir. In order to create our own CIDs, it appears we need to be able to create a multihash (something that we have been able to do with ex_multihash) and a codec. I am looking into codecs now to get a better understanding of what exactly they are. They appear to have a similar role in a CID as the hash_type in a multihash. |
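To make "self-describing" concrete, here is a minimal sketch of the multihash layout built by hand, without ex_multihash. The module name `MultihashSketch` is hypothetical. It assumes the sha2-256 code (0x12) and the digest length (32) each fit in a single byte, which holds for this hash but skips the general varint encoding that a real library handles.

```elixir
defmodule MultihashSketch do
  # 0x12 is the code registered for sha2-256 in the multiformats table.
  @sha2_256_code 0x12

  # Prefix the raw digest with <hash code><digest length>.
  def encode_sha2_256(data) when is_binary(data) do
    digest = :crypto.hash(:sha256, data)
    <<@sha2_256_code, byte_size(digest), digest::binary>>
  end

  # Split a multihash back into its three parts.
  def describe(<<code, len, digest::binary-size(len)>>) do
    %{code: code, length: len, digest: digest}
  end
end

multihash = MultihashSketch.encode_sha2_256("Hello World\n")
IO.inspect(Base.encode16(multihash, case: :lower))
IO.inspect(MultihashSketch.describe(multihash))
```

The first two bytes (`12 20` in hex) are what tell a reader that the remaining 32 bytes are a sha2-256 digest.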
I have been able to recreate the steps listed in this article. This shows that I can at least create the same hashes at I do not fully understand what additional data that IPFS is adding to the data that you store on it...
As you can see above, I have just added However, I do not think that we need to fully understand exactly what extra data IPFS is adding right now. We should (hopefully) be able to use a library that will handle this step for us (otherwise we are just reimplementing the IPFS logic ourselves, which doesn't seem logical). The code above only relates to CIDv0.
Next step: try to recreate the steps to create a |
At the moment I want to create the same hash as the one from the command line. e.g. the result of... :crypto.hash(:sha256, "Hello World") should be the same as
This is not the same hash that IPFS creates, but if we can match a simpler sha256 hash that would be a good first step. |
file = File.read!("hello.txt")
:crypto.hash(:sha256, file)
|> Base.encode16(case: :lower)
|> IO.inspect()
"d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26" This is the same and the |
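One detail worth checking when comparing against the command line is the trailing newline: the file read in this thread contains "Hello World\n", not "Hello World". The sketch below (the module name `HashCheck` is hypothetical) hashes both strings so the difference is visible.

```elixir
defmodule HashCheck do
  # sha256 of the given binary as a lowercase hex string.
  def sha256_hex(data) do
    :sha256 |> :crypto.hash(data) |> Base.encode16(case: :lower)
  end
end

with_newline = HashCheck.sha256_hex("Hello World\n")
without_newline = HashCheck.sha256_hex("Hello World")

IO.puts(with_newline)
IO.puts(without_newline)
# A single trailing byte is enough to change the whole digest.
IO.puts(with_newline == without_newline)
```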
@RobStallion CID v0 is irrelevant to us at this stage. We will not use it. |
I thought I would focus on
Also, I have installed IPFS locally and the hash string that I am getting back at the moment is still I do want to get Have I misunderstood this @nelsonic? |
@RobStallion provided the |
iex(1)> codec = "dag-pb" # the codec needed to hash CIDv0
"dag-pb"
iex(2)> file = File.read!("hello.txt") # reading the file with the text of Hello World (same file that was uploaded to IPFS)
"Hello World\n"
iex(3)> digest = :crypto.hash(:sha256, file) # hash the file string with sha256
<<210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205, 139, 226, 199, 74, 221,
90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41, 218, 128, 74, 38>>
iex(4)> {:ok, multihash} = Multihash.encode(:sha2_256, digest) # create multihash from hash (see #8 for more info on this step)
{:ok,
<<18, 32, 210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205, 139, 226, 199,
74, 221, 90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41, 218, 128, 74,
38>>}
iex(5)> cid = CID.cid!(multihash, codec, 0) # creates a CID struct (just simple struct creation. Something everyone has done 1000 times)
%CID{
codec: "dag-pb",
multihash: <<18, 32, 210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205,
139, 226, 199, 74, 221, 90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41,
218, 128, 74, 38>>,
version: 0
}
iex(6)> CID.encode!(cid) # turns the CID struct into a base58 string (this is where the magic is happening)
"QmcWyBPyedDzHFytTX6CAjjpvqQAyhzURziwiBKDKgqx6R"
Using this online tool I have converted the
As you can see, this matches my digest from IPFS. This means that the above functions are working as expected.
Next steps: most of the above is pretty straightforward to understand. I need to look into the CID.encode function to get a better understanding of what is happening here and how it works. |
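As a rough sketch of what the encode step does for a version 0 CID: the string is the multihash bytes encoded as Base58BTC, with no extra prefix. The module below is a hand-rolled illustration, not the code of the CID library used above; `CidV0Sketch` and its function names are hypothetical.

```elixir
defmodule CidV0Sketch do
  @alphabet "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

  # Base58BTC: treat the bytes as one big integer and write it in base 58.
  # Each leading zero byte is represented by a leading "1".
  def base58(<<>>), do: ""

  def base58(bin) when is_binary(bin) do
    zeros = count_leading_zeros(bin, 0)
    digits = bin |> :binary.decode_unsigned() |> to_digits([])
    String.duplicate("1", zeros) <> IO.iodata_to_binary(digits)
  end

  # CIDv0 string: base58btc(<<0x12, 0x20>> <> sha256(data)).
  def cid_v0(data) do
    digest = :crypto.hash(:sha256, data)
    base58(<<0x12, byte_size(digest)>> <> digest)
  end

  defp count_leading_zeros(<<0, rest::binary>>, n), do: count_leading_zeros(rest, n + 1)
  defp count_leading_zeros(_, n), do: n

  defp to_digits(0, acc), do: acc
  defp to_digits(n, acc), do: to_digits(div(n, 58), [:binary.at(@alphabet, rem(n, 58)) | acc])
end

IO.puts(CidV0Sketch.cid_v0("Hello World\n"))
```

Note that this only encodes a hash of the raw bytes passed in; it does not add any of the extra data that IPFS wraps around a file before hashing.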
@RobStallion
|
This line was a mistake. It is not the same as the one from IPFS. It is the same as the hash of the file that was created in the terminal here and in iex here. What this means (to me at least) is that all the That is literally it!!! This can be done with the following lines of code...
defmodule CidTester do
def read_file(str), do: File.read!(str)
def hash(file), do: :crypto.hash(:sha256, file)
def multihash(digest), do: Multihash.encode(:sha2_256, digest)
def encode({:ok, multihash}), do: Base.encode16(multihash, case: :lower)
def run(filename) do
filename
|> read_file()
|> hash()
|> multihash()
|> encode()
end
end
Then run
iex(1)> CidTester.run("hello.txt")
"1220d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26"
As you can see, this is the same as calling the
Same as "1220d2a84f4b8b650937ec8f73cd8be2c74add5a911ba64df27458ed8229da804a26" when converted to base16. The This does not mean that the This really confused me, as both modules are producing the same string (which is just a base58 string of a multihash) and that string is not the same as the one from IPFS. This seems to be because these modules are not adding the data that IPFS adds when a file is added to it. For example, this is the hello text file on my local machine...
Next, I'll add this file to IPFS...
Now if we run the same
I think that the difference in the CIDs is coming from this extra data that IPFS is adding to the file. This means that as it currently stands, the CIDs that these modules are creating can not be used to get data from IPFS as they will not be the correct CID for the data that is on IPFS.
The CIDs that these modules make can be used in our projects and will always produce the same CID for the same data that is passed in. We just cannot integrate them into IPFS right now, as they will not be able to retrieve that same data from IPFS (if my understanding is correct).
After speaking with @SimonLab about this problem, he came across this: https://github.com/ipfs/ipfs#protocol-implementations. This seems to be the missing step. I haven't had much of a chance to look into this yet, but on a brief look it says to raise an issue if you want to implement this in a specific language. I looked at the issues and the only one I saw with a mention of Elixir is issue83. This issue has a link to the following repo: https://github.com/tensor-programming/Elixir-Ipfs-Api. I will begin looking into this 'missing step' in more detail.
@nelsonic @SimonLab do either of you have any thoughts on this? (Sorry for the SUPER long comment. Hopefully it makes sense.) @nelsonic I believe that in order for us to be able to complete this issue (Implement an IPFS compatible CID function in Elixir) we will need to include this step. |
@RobStallion this comment makes sense. 👍 (thanks for adding this detail) |
ipfs/go-cid#77. Someone has had this issue in I have confirmed that I can get a matching CID using
Now in iex
iex(1)> file = File.read!("hello.txt")
"Hello World\n"
iex(2)> digest = :crypto.hash(:sha256, file)
<<210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205, 139, 226, 199, 74, 221,
90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41, 218, 128, 74, 38>>
iex(3)> {:ok, multihash} = Multihash.encode(:sha2_256, digest)
{:ok,
<<18, 32, 210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205, 139, 226, 199,
74, 221, 90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41, 218, 128, 74,
38>>}
iex(4)> cid = CID.cid!(multihash, "raw", 1)
%CID{
codec: "raw",
multihash: <<18, 32, 210, 168, 79, 75, 139, 101, 9, 55, 236, 143, 115, 205,
139, 226, 199, 74, 221, 90, 145, 27, 166, 77, 242, 116, 88, 237, 130, 41,
218, 128, 74, 38>>,
version: 1
}
iex(5)> CID.encode cid
{:ok, "zb2rhkpbfTBtUV1ESqSScrUre8Hh77fhCKDLmX21rCo5xp8J9"}
As you can see, the two CIDs created match (for sure this time 🙄🤦♀️)
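To see what changes between the version 0 and version 1 forms, it can help to look at the raw bytes rather than the Base58 string. The sketch below lays out a CIDv1 as `<version><codec><multihash>` and prints it in base16 using the multibase prefix "f". It assumes the multicodec code for "raw" is 0x55 and that every prefix value fits in one byte; `CidV1Layout` is a hypothetical name, not part of the CID library.

```elixir
defmodule CidV1Layout do
  @version 1
  # Assumed multicodec code for the "raw" codec.
  @raw_codec 0x55
  # Multihash code for sha2-256.
  @sha2_256_code 0x12

  # Binary CIDv1: version byte, codec byte, then the multihash.
  def to_binary(data) when is_binary(data) do
    digest = :crypto.hash(:sha256, data)
    multihash = <<@sha2_256_code, byte_size(digest)>> <> digest
    <<@version, @raw_codec>> <> multihash
  end

  # Multibase base16 (lowercase) uses the single-character prefix "f".
  def to_base16(data), do: "f" <> Base.encode16(to_binary(data), case: :lower)
end

IO.puts(CidV1Layout.to_base16("Hello World\n"))
```

Reading the output left to right: `f` (base16), `01` (version 1), `55` (raw), `12 20` (sha2-256, 32 bytes), followed by the same digest seen in the earlier sessions.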
This IS a step in the right direction but is not a solution. This will not work for all files. It will only work for files that are smaller than a certain size (256kb). Let's repeat the steps above with a larger file...
As you can see, this file is 1.11MiB. When we repeat the steps with
iex(1)> file = File.read!("elm-slides.pdf")
<<37, 80, 68, 70, 45, 49, 46, 55, 13, 10, 37, 161, 179, 197, 215, 13, 10, 49,
32, 48, 32, 111, 98, 106, 13, 10, 60, 60, 47, 80, 97, 103, 101, 115, 32, 50,
32, 48, 32, 82, 32, 47, 84, 121, 112, 101, 47, 67, 97, 116, ...>>
iex(2)> digest = :crypto.hash(:sha256, file)
<<80, 53, 122, 165, 21, 149, 132, 189, 86, 141, 57, 245, 185, 240, 119, 254,
217, 210, 49, 37, 225, 87, 43, 153, 79, 135, 166, 115, 82, 144, 54, 51>>
iex(3)> {:ok, multihash} = Multihash.encode(:sha2_256, digest)
{:ok,
<<18, 32, 80, 53, 122, 165, 21, 149, 132, 189, 86, 141, 57, 245, 185, 240, 119,
254, 217, 210, 49, 37, 225, 87, 43, 153, 79, 135, 166, 115, 82, 144, 54,
51>>}
iex(4)> cid = CID.cid!(multihash, "raw", 1)
%CID{
codec: "raw",
multihash: <<18, 32, 80, 53, 122, 165, 21, 149, 132, 189, 86, 141, 57, 245,
185, 240, 119, 254, 217, 210, 49, 37, 225, 87, 43, 153, 79, 135, 166, 115,
82, 144, 54, 51>>,
version: 1
}
iex(5)> CID.encode cid
{:ok, "zb2rhc3P77eryPttouAgYrzwuByVmkDSrLRt1UciwUmWmUzCS"}
You can see that the 2 CIDs do not match...
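A small guard can make the size limitation explicit before trusting a locally computed CID. This is only a sketch: it assumes the limit is 262,144 bytes (the 256kb mentioned above) and that anything larger is split into fixed-size chunks by IPFS. `ChunkCheck` and both function names are hypothetical.

```elixir
defmodule ChunkCheck do
  # Assumed chunk size: 256 KiB, matching the limit described above.
  @chunk_size 262_144

  # True when the data would fit in one block, i.e. when hashing the
  # whole binary directly can be expected to match.
  def single_block?(data) when is_binary(data), do: byte_size(data) <= @chunk_size

  # Number of fixed-size chunks the data would be split into.
  def chunk_count(data) when is_binary(data) do
    div(byte_size(data) + @chunk_size - 1, @chunk_size)
  end
end

small = "Hello World\n"
large = :binary.copy(<<0>>, 262_145)

IO.inspect(ChunkCheck.single_block?(small))
IO.inspect(ChunkCheck.single_block?(large))
IO.inspect(ChunkCheck.chunk_count(large))
```

For data that spans more than one chunk, the CID describes a structure linking the chunks, so a hash over the whole file is not expected to match.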
|
@RobStallion it's good that you are being thorough with your investigation, We can |
It seems that when we upload a small file to IPFS in version 1 with the "raw" codec, it doesn't manipulate the data. This can be seen with the following...
As you can see, when we retrieve the file from IPFS and log the data, it hasn't added anything new to it like it did when we did this with |
This is now the same as IPFS
The only difference from the first elixir implementation is that first elixir attempt
second attempt
We should easily be able to fix this by just appending a new line to the end of a JSON object in Elixir. |
Elixir implementation adding a new line to the end of the JSON...
iex(1)> map = %{a: "a"}
%{a: "a"}
iex(2)> json = Jason.encode!(map)
"{\"a\":\"a\"}"
iex(3)> json = json <> "\n"
"{\"a\":\"a\"}\n"
iex(4)> digest = :crypto.hash(:sha256, json)
<<72, 240, 49, 34, 62, 109, 19, 11, 162, 226, 162, 167, 139, 145, 12, 84, 241,
135, 103, 97, 197, 136, 212, 17, 101, 6, 242, 208, 82, 81, 176, 200>>
iex(5)> {:ok, multihash} = Multihash.encode(:sha2_256, digest)
{:ok,
<<18, 32, 72, 240, 49, 34, 62, 109, 19, 11, 162, 226, 162, 167, 139, 145, 12,
84, 241, 135, 103, 97, 197, 136, 212, 17, 101, 6, 242, 208, 82, 81, 176,
200>>}
iex(6)> cid = CID.cid!(multihash, "raw", 1)
%CID{
codec: "raw",
multihash: <<18, 32, 72, 240, 49, 34, 62, 109, 19, 11, 162, 226, 162, 167,
139, 145, 12, 84, 241, 135, 103, 97, 197, 136, 212, 17, 101, 6, 242, 208,
82, 81, 176, 200>>,
version: 1
}
iex(7)> CID.encode(cid)
{:ok, "zb2rhbYzyUJP6euwn89vAstfgG2Au9BSwkFGUJkbujWztZWjZ"}
Same as IPFS again. I would say that this is working reliably now. |
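The steps from the session above can be collapsed into one function. The sketch below is a dependency-free approximation rather than the library code: it builds the version 1 bytes by hand (assuming 0x55 for "raw" and 0x12 for sha2-256), encodes them as Base58BTC and adds the multibase prefix "z". The module name `CidV1Sketch` is hypothetical, and the JSON string is written out literally instead of being produced by Jason so the example has no dependencies.

```elixir
defmodule CidV1Sketch do
  @alphabet "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

  # "z" is the multibase prefix for Base58BTC.
  def cid(data) when is_binary(data) do
    digest = :crypto.hash(:sha256, data)
    # <version 1><raw codec><sha2-256 code><digest length><digest>
    bytes = <<1, 0x55, 0x12, byte_size(digest)>> <> digest
    "z" <> base58(bytes)
  end

  # The first byte is always 1 here, so leading zero bytes
  # (which Base58BTC writes as "1") never occur and are not handled.
  defp base58(bin) do
    bin |> :binary.decode_unsigned() |> to_digits([]) |> IO.iodata_to_binary()
  end

  defp to_digits(0, acc), do: acc
  defp to_digits(n, acc), do: to_digits(div(n, 58), [:binary.at(@alphabet, rem(n, 58)) | acc])
end

# Same JSON as in the session above, with and without the trailing newline.
json = "{\"a\":\"a\"}"
IO.puts(CidV1Sketch.cid(json <> "\n"))
IO.puts(CidV1Sketch.cid(json))
```

Only the first string should be compared with the value from IPFS, since the trailing newline is part of the hashed data.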
Do you think that the following points have been covered...
If so can you check them off in the acceptance criteria please? |
Going to work on the following points from the acceptance criteria...
estimate t25m. |
@RobStallion doctests are good. ✅ |
reorder readme. Updates how section for adding package to app #11
This issue/epic is dedicated exclusively to How i.e. implementation
Todo
Read the JavaScript Implementation of CID: https://github.com/multiformats/js-cid to understand how it works. If you have questions, please ask them in: https://github.com/dwyl/learn-ipfs/issues (i.e. this is for everyone to learn, not just the one person!)
localhost and try to see if re-ordering elements in an Object or nested Array produces a different CID.
Read the "not working" Elixir version: https://github.com/nocursor/ex-cid
Implement an "offline" version of CID in Elixir that produces the exact same CID as the JS version.
cid of a String should always be the same for a given string.
cid of a Map should work regardless of the order of content.
MVP CID v1: for our MVP we only need a sha2-256 hash in Base58BTC, which is URL-safe. For this we can use the code from https://github.com/multiformats/ex_multihash (which is maintained) and https://github.com/nocursor/b58 (which is unresponsive) respectively.
doctests that demonstrate that the code works as expected.
Relevant Reading
Help Wanted
We really need help on getting this package built, documented and shipped so we can move forward with our "stack" dwyl/technology-stack#67 and "roadmap" https://github.com/dwyl/product-roadmap
If you have the curiosity, energy and time to help, please comment below! (Thanks!)