Updated the 'data/data-representation' session to be in line with the OpenEdu methodology. #13

Open
wants to merge 1 commit into
base: main
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# ASCII Art

My friend Tommy is really passionate about art. Lately, he got into something called ASCII art and e-mailed me the following attachment.
I am too afraid to tell him that I don't know how to open it. Can you help?
Comment on lines +3 to +4
Member

Suggested change
- My friend Tommy is really passionate about art. Lately, he got into something called ASCII art and e-mailed me the following attachment.
- I am too afraid to tell him that I don't know how to open it. Can you help?
+ My friend Tommy is really passionate about art.
+ Lately, he got into something called ASCII art and e-mailed me the following attachment.
+ I am too afraid to tell him that I don't know how to open it.
+ Can you help?

Use one sentence per line.

Member

Convert this file to Linux line endings.

@@ -0,0 +1,24 @@
import base64

Check failure on line 1 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings

Check failure on line 1 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1

Check failure on line 2 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings
text = flag = open("tommys_art_project.txt","r").read()

Check failure on line 3 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings
text = (base64.b64decode(text.encode('ascii'))).decode('ascii')

Check failure on line 4 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings

Check failure on line 5 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings
ascii_list = []

Check failure on line 6 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings
a = 0

Check failure on line 7 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings

Check failure on line 8 in chapters/data/data-representation/drills/tasks/ascii-art/solution/solution.py

GitHub Actions / Checkpatch

ERROR:DOS_LINE_ENDINGS: DOS line endings
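# Walk the decoded art one character at a time and collect the numbers
# hidden between the spaces and '#' characters.
for line in text:
    for x in line:
        if x.isdigit():
            a = a * 10 + int(x)
        if x == ' ' or x == '#':
            if a != 0:
                ascii_list.append(a)
            a = 0

# Each collected number is an ASCII code; turn the codes back into characters.
solution = []

for x in ascii_list:
    solution.append(chr(x))

for x in solution:
    print(x, end="")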
@@ -0,0 +1 @@
ICA4MyM4MyAgICM4MyMjICAgIzEyMyMgICAgOTcjICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgMTE1ICAgCiAjICAgICAjICMgICAgICMgIyAgICAgIyAgIyAgICAjICAgICMgICM5OSMgMTA1IzEwNSAgICAgICAgIyAgICAjIDk1IzQ5IyA1MyM5NSAgNTMjNDgjICAgICMgIAogIyAgICAgICAjICAgICAgICMgICAgICAgICMgICAgOTUgICAjICMgICAgIyAgICMgICAgICAgICAgICMgICAgIyAjICAgICAgIyAgICAjICMgICAgICAgICAjICAKICA5OSM0OCAgIDQ4IzQ4ICAgNDgjNDggIDQ4ICAgICMgIyAgIyAjICAgICMgICAjICAgICAgICAgICA0OCM0OCMgNDgjNDggICMgICAgIyAjMTA4IyAgICAgIyMgCiAgICAgICAjICAgICAgICMgICAgICAgIyAgIyAgICAjICAjICMgIyAgICAjICAgIyAgICAgICAgICAgIyAgICAjICMgICAgICAjMTI1IyAgIyAgICAgICAgICMgIAogIyAgICAgIyAjICAgICAjICMgICAgICMgICMgICAgIyAgICMjICMgICAgIyAgICMgICAgICAgICAgICMgICAgIyAjICAgICAgIyAgICMgICMgICAgICAgICAjICAKICAjIyMjIyAgICMjIyMjICAgIyMjIyMgICAgIyMjICMgICAgIyAgIyMjIyAgICAjICAgICAgICAgICAjICAgICMgIyMjIyMjICMgICAgIyAjIyMjIyMgIyMjICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICMjIyMjIyMgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIA==
@@ -0,0 +1,4 @@
# Encoding Train

I wanted to learn about encodings, so I changed my e-mail password and then encoded it 20 times.
Each time I picked one of base16, base32, base64 and base85 at random.
Now I don't know how to get it back and I really need access to my e-mail.
Can you take a look?
@@ -0,0 +1,35 @@
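import base64

# Read the encoded flag as raw bytes.
with open("my_encodings.txt", "rb") as f:
    flag = f.read()


# Each check relies on the same idea: a valid encoding survives a
# decode/encode round trip unchanged.
def isBase64(s):
    try:
        return base64.b64encode(base64.b64decode(s)) == s
    except Exception:
        return False


def isBase32(s):
    try:
        return base64.b32encode(base64.b32decode(s)) == s
    except Exception:
        return False


def isBase16(s):
    try:
        return base64.b16encode(base64.b16decode(s)) == s
    except Exception:
        return False


def isBase85(s):
    try:
        return base64.b85encode(base64.b85decode(s)) == s
    except Exception:
        return False


# Peel off one encoding layer per iteration until the flag marker appears.
# The order matters: a Base16 string can also pass the Base64 check,
# so the stricter alphabets are tried first.
while b'SSS{' not in flag:
    if isBase16(flag):
        flag = base64.b16decode(flag)
    elif isBase32(flag):
        flag = base64.b32decode(flag)
    elif isBase64(flag):
        flag = base64.b64decode(flag)
    elif isBase85(flag):
        flag = base64.b85decode(flag)
    else:
        # No decoder matched: stop instead of looping forever.
        break
print(flag)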

Large diffs are not rendered by default.

@@ -0,0 +1,4 @@
# Froggified

My human friend got turned into a frog and he's been trying to base his communication with me through a computer.
He couldn't really find his way around Linux, though, and lost the message in all those 0s and 1s.
Now he jumps around angrily on my desk.
Can you help me find his message?
@@ -0,0 +1,7 @@
# Froggified: Solution

First, we observe that the text is `binary`, so we decode it and get the first fake flag.
Next, the text looks like a list of `ASCII` codes, so we decode it as `decimal` and get the next fake flag.
After that, we notice that no digit is higher than 7, so we think of `octal`, which gets us to the next fake flag.
Then we see a lot of `0x` prefixes, so we decode from `hexadecimal` and reach the last fake flag.
In the end, we observe that the text is encoded in `Base64`, so we decode it to get the real flag.
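
The decoding chain above can be sketched in Python.
This is only a minimal sketch, assuming each layer stores one character code per whitespace-separated token.
The real file layout is not shown here, so the helpers `decode_codes` and `encode_codes` and the sample flag are hypothetical.

```python
import base64


def decode_codes(text, base):
    """Turn whitespace-separated character codes written in `base` into text."""
    # int() accepts a prefix such as 0x when it matches the requested base.
    return "".join(chr(int(token, base)) for token in text.split())


def encode_codes(text, fmt):
    """Inverse helper, used here only to build a sample input (hypothetical)."""
    return " ".join(format(ord(ch), fmt) for ch in text)


# Build a small sample: Base64 first, then hex, octal, decimal and binary on top.
sample = base64.b64encode(b"SSS{example}").decode("ascii")
for fmt in ("#x", "o", "d", "08b"):
    sample = encode_codes(sample, fmt)

# Undo the layers in the order described above: binary, decimal, octal, hex.
for base in (2, 10, 8, 16):
    sample = decode_codes(sample, base)

# The last layer is Base64.
result = base64.b64decode(sample).decode("ascii")
print(result)
```

In the actual task each layer also reveals a fake flag, so the intermediate text may need some trimming before the next decoding step.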

Large diffs are not rendered by default.

@@ -0,0 +1,4 @@
# Infinity Hashes

Thanos had 6 Infinity Stones, but I have something better: 6 Infinity Hashes, that store the truth of the whole universe.
Wrap it with `SSS{}` and `_` instead of space, and spread it with everyone!
Comment on lines +3 to +4
Member

Suggested change
- Thanos had 6 Infinity Stones, but I have something better: 6 Infinity Hashes, that store the truth of the whole universe.
- Wrap it with `SSS{}` and `_` instead of space, and spread it with everyone!
+ Thanos had 6 Infinity Stones, but I have something better: 6 Infinity Hashes, that store the truth of the whole universe.
+ Wrap it with `SSS{}`, use `_` instead of space, and spread it with everyone!

@@ -0,0 +1,3 @@
# Infinity Hashes: Solution

Use any tool (like [Crackstation](https://crackstation.net/)) to crack the hashes.
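
Before pasting hashes into a cracking tool, it can help to guess the algorithm from the digest length.
The sketch below is only a heuristic: the length table lists common candidates (other algorithms share these lengths), and `guess_algorithm`, `dictionary_attack` and the word list are hypothetical names used for illustration.

```python
import hashlib

# Hex digest length -> a common candidate algorithm (not the only possibility).
CANDIDATES = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}


def guess_algorithm(hex_digest):
    """Return a likely hashlib algorithm name based on digest length, or None."""
    return CANDIDATES.get(len(hex_digest))


def dictionary_attack(hex_digest, words):
    """Hash each candidate word and return the one that matches, if any."""
    algorithm = guess_algorithm(hex_digest)
    if algorithm is None:
        return None
    wanted = hex_digest.lower()
    for word in words:
        digest = hashlib.new(algorithm, word.encode("utf-8")).hexdigest()
        if digest == wanted:
            return word
    return None


# Toy example: the target is built locally, so the answer is known in advance.
target = hashlib.sha1(b"universe").hexdigest()
print(guess_algorithm(target))
print(dictionary_attack(target, ["truth", "universe", "stones"]))
```

Online tools such as Crackstation do the same lookup against very large precomputed word lists.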
@@ -0,0 +1 @@
9dee45a24efffc78483a02cfcfd83433 7d7764888dc13da6436666cd97043a5ead014596cd9aef6d12f95a5602d15993beebd44b92014732c340239e7cd1f07c5ed4035d1a02151916e94d43ffa55cfa 54c37762351ac25b7e86477d8a078faf347fcfabdf044c4376d0728ce6cdcfe4801e0e1bcdbc0e2d71042fbf20779b2dd580975ac6c59d157250d0ce398f7c69 2b167392676cab961cb5d2ed5633ae05 db3d405b10675998c030223177d42e71b4e7a312 9ed1515819dec61fd361d5fdabb57f41ecce1a5fe1fe263b98c0d6943b9b232e
31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/binary_meme.svg
Member

Honestly there was no need for this image to be .svg, since it's a meme, not a constructed image, but it's fine

31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/blender_hashing.svg
53 changes: 53 additions & 0 deletions chapters/data/data-representation/media/data_signals.svg
31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/numeral_systems1.svg
31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/numeral_systems2.svg
31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/unix_file_permissions1.svg
31 changes: 31 additions & 0 deletions chapters/data/data-representation/media/unix_file_permissions2.svg
40 changes: 0 additions & 40 deletions chapters/data/data-representation/reading/README.md

This file was deleted.

198 changes: 198 additions & 0 deletions chapters/data/data-representation/reading/character-encoding.md
@@ -0,0 +1,198 @@
# Character Encoding

## ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding.
Its codes go from 0 to 127.

```text
DEC HEX ASCII DEC HEX ASCII DEC HEX ASCII DEC HEX ASCII DEC HEX ASCII
0 00 NUL 26 1A SUB 52 34 4 78 4E N 104 68 h
1 01 SOH 27 1B ESC 53 35 5 79 4F O 105 69 i
2 02 STX 28 1C FS 54 36 6 80 50 P 106 6A j
3 03 ETX 29 1D GS 55 37 7 81 51 Q 107 6B k
4 04 EOT 30 1E RS 56 38 8 82 52 R 108 6C l
5 05 ENQ 31 1F US 57 39 9 83 53 S 109 6D m
6 06 ACK 32 20 SPACE 58 3A : 84 54 T 110 6E n
7 07 BEL 33 21 ! 59 3B ; 85 55 U 111 6F o
8 08 BS 34 22 " 60 3C < 86 56 V 112 70 p
9 09 HT 35 23 # 61 3D = 87 57 W 113 71 q
10 0A LF 36 24 $ 62 3E > 88 58 X 114 72 r
11 0B VT 37 25 % 63 3F ? 89 59 Y 115 73 s
12 0C FF 38 26 & 64 40 @ 90 5A Z 116 74 t
13 0D CR 39 27 ' 65 41 A 91 5B [ 117 75 u
14 0E SO 40 28 ( 66 42 B 92 5C \ 118 76 v
15 0F SI 41 29 ) 67 43 C 93 5D ] 119 77 w
16 10 DLE 42 2A * 68 44 D 94 5E ^ 120 78 x
17 11 DC1 43 2B + 69 45 E 95 5F _ 121 79 y
18 12 DC2 44 2C , 70 46 F 96 60 ` 122 7A z
19 13 DC3 45 2D - 71 47 G 97 61 a 123 7B {
20 14 DC4 46 2E . 72 48 H 98 62 b 124 7C |
21 15 NAK 47 2F / 73 49 I 99 63 c 125 7D }
22 16 SYN 48 30 0 74 4A J 100 64 d 126 7E ~
23 17 ETB 49 31 1 75 4B K 101 65 e 127 7F DEL
24 18 CAN 50 32 2 76 4C L 102 66 f
25 19 EM 51 33 3 77 4D M 103 67 g
```

We can see that adding 32 to the code of an uppercase letter gives the code of the same letter in lowercase:
e.g. `E` (69) `+ 32 =` `e` (101)

Below, you can see the built-in Python functions `ord` and `chr`, which tell us what ASCII code a character has and what character corresponds to a certain ASCII code.

```py
>>> ord('E')
69
>>> chr(69)
'E'
>>> chr(ord('E') + 32)
'e'
>>> chr(ord('e') - 32)
'E'
>>>
```
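
The offset of 32 is a single bit (`0x20`), so the case of an ASCII letter can also be flipped with XOR.
A small sketch, where `toggle_case` is a hypothetical helper and everything that is not an ASCII letter is left untouched:

```python
def toggle_case(ch):
    """Flip the case of a single ASCII letter by toggling bit 5 (value 32)."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch  # digits, punctuation and non-ASCII characters are unchanged


toggled = "".join(toggle_case(c) for c in "Hello, World 42!")
print(toggled)
```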

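In terms of storage efficiency, the best Unicode encoding depends on the text.
`UTF-8` is more compact for ASCII text (English and other Western languages), because each ASCII character takes a single byte.
`UTF-16` is more compact for non-ASCII text such as Chinese and other Asian languages, where most characters take 2 bytes in `UTF-16` but 3 bytes in `UTF-8`.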
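
A quick way to check this is to compare encoded lengths.
The sketch below uses the `utf-16-le` codec so that the 2-byte byte-order mark that Python's plain `utf-16` codec prepends does not skew the counts; the sample strings are arbitrary.

```python
samples = ["boss", "老板"]

for text in samples:
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))
    print(f"{text!r}: UTF-8 = {utf8_size} bytes, UTF-16 = {utf16_size} bytes")
```

For the ASCII string, UTF-8 needs half the bytes of UTF-16, while for the Chinese string UTF-16 is the smaller one (4 bytes versus 6).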

Let's say we have a string in Chinese.
With Python, we can get the bytes of the string, printed as hex escapes, using the built-in method `str.encode()`:

```py
Str = "老板"
print(Str)

Str = "老板".encode("utf-8")  # bytes object holding the UTF-8 encoding
print(Str)
```

The output will be:

```text
老板
b'\xe8\x80\x81\xe6\x9d\xbf'
```

If we want to get back to the original string, we will execute:

```py
print(Str.decode())  # decode() uses UTF-8 by default
```

And we will get:

```text
老板
```

## Base64

Base64 is a way of representing binary data as text.
The input is split into groups of 24 bits (3 bytes), and each group is written as 4 Base64 digits of 6 bits each.

Base64 Encoding Table:

```text
Index Char Index Char Index Char Index Char
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 49 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F 21 V 37 l 53 1
6 G 22 W 38 m 54 2
7 H 23 X 39 n 55 3
8 I 24 Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /
```
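
To see how the table is used, the following sketch encodes a single 3-byte group by hand and compares it with the library result.
The helper `encode_group` and the constant `ALPHABET` are names made up for this illustration; padding is deliberately not handled here.

```python
import base64

# The 64 digits in table order: index 0 is 'A', index 63 is '/'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(three_bytes):
    """Encode exactly 3 bytes (24 bits) as 4 Base64 digits of 6 bits each."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    bits = int.from_bytes(three_bytes, "big")  # one 24-bit integer
    # Take the 6-bit slices from most to least significant and look them up.
    return "".join(ALPHABET[(bits >> shift) & 0b111111] for shift in (18, 12, 6, 0))


manual = encode_group(b"Man")
library = base64.b64encode(b"Man").decode("ascii")
print(manual, library)
```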

If we want to convert to and from Base64 in Python, we can use the `base64` module:

```py
import base64

message = "Some random message"
message_bytes = message.encode('ascii')  # str -> bytes: b'Some random message'
base64_bytes = base64.b64encode(message_bytes)  # bytes -> Base64-encoded bytes
base64_message = base64_bytes.decode('ascii')  # bytes -> str, dropping the b'' wrapper

print(base64_message)
```

This gives us the Base64 string:

```text
U29tZSByYW5kb20gbWVzc2FnZQ==
```

To decode it, we simply reverse the steps:

```py
import base64

base64_message = 'U29tZSByYW5kb20gbWVzc2FnZQ=='
base64_bytes = base64_message.encode('ascii')
message_bytes = base64.b64decode(base64_bytes)
message = message_bytes.decode('ascii')

print(message)
```

This gets us back to:

```text
Some random message
```

Some Base64 strings end in "=", some end in "==" and others have no padding at all.
Since Base64 processes the input in groups of 3 bytes, the encoder must also handle inputs whose length is not divisible by 3.
It does so by padding the output, depending on the input length in bytes:

```text
length % 3 = 1 => "=="
length % 3 = 2 => "="
length % 3 = 0 => no padding
```

The following can be used to better understand output padding:

```py
import base64

text1 = "SecuritySummmerSch"
text2 = "SecuritySummmerScho"
text3 = "SecuritySummmerSchoo"
text4 = "SecuritySummmerSchool"

b64_text1 = (base64.b64encode(text1.encode('ascii'))).decode('ascii')
b64_text2 = (base64.b64encode(text2.encode('ascii'))).decode('ascii')
b64_text3 = (base64.b64encode(text3.encode('ascii'))).decode('ascii')
b64_text4 = (base64.b64encode(text4.encode('ascii'))).decode('ascii')

print(f"Plain text\t\tPT length\tBase64 text\t\t\tB64 length")
print(f"{text1}\t{len(text1)}\t\t{b64_text1}\t{len(b64_text1)}")
print(f"{text2}\t{len(text2)}\t\t{b64_text2}\t{len(b64_text2)}")
print(f"{text3}\t{len(text3)}\t\t{b64_text3}\t{len(b64_text3)}")
print(f"{text4}\t{len(text4)}\t\t{b64_text4}\t{len(b64_text4)}")
```

We get:

```text
Plain text              PT length   Base64 text                     B64 length
SecuritySummmerSch      18          U2VjdXJpdHlTdW1tbWVyU2No        24
SecuritySummmerScho     19          U2VjdXJpdHlTdW1tbWVyU2Nobw==    28
SecuritySummmerSchoo    20          U2VjdXJpdHlTdW1tbWVyU2Nob28=    28
SecuritySummmerSchool   21          U2VjdXJpdHlTdW1tbWVyU2Nob29s    28
```
```
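
The lengths in this table follow directly from the 3-bytes-to-4-digits rule.
As a sketch, the hypothetical helper `expected_base64_shape` below predicts the encoded length and the number of `=` characters, and the loop compares the prediction with what the `base64` module produces:

```python
import base64


def expected_base64_shape(n_bytes):
    """Return (encoded length, number of '=' characters) for an n-byte input."""
    groups = -(-n_bytes // 3)        # ceiling division: number of 3-byte groups
    padding = (3 - n_bytes % 3) % 3  # bytes missing from the last group: 0, 1 or 2
    return 4 * groups, padding


for n in range(18, 22):
    encoded = base64.b64encode(b"A" * n)
    print(n, len(encoded), encoded.count(b"="), expected_base64_shape(n))
```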

Try decoding it yourself!

```text
SGVsbG8gZnJvbSB0aGUgRWFydGgtNjQgIQ==
```