Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chady Kamar #9

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
159 changes: 124 additions & 35 deletions main.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import (
"path/filepath"
"regexp"
"strings"
"sync"
"time"
)

Expand Down Expand Up @@ -43,12 +44,20 @@ func compute() *result {
if err != nil {
fmt.Println(err)
}

var regions int
var linesAdded int
var linesDeleted int
var files []string
functionCalls := make(map[string]int)

// Reader wg
var rwg sync.WaitGroup

// lines receives the lines of the diff files from their respective goroutines
lines := make(chan string, 50)

// Line reader, one goroutine spawned per file
filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
Expand All @@ -57,46 +66,126 @@ func compute() *result {
if info.IsDir() {
return nil
}
rwg.Add(1)
go func() {
defer rwg.Done()
file, err := os.Open(path)
if err != nil {
return
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines <- scanner.Text()
}
}()

file, err := os.Open(path)
if err != nil {
return err
}
defer file.Close()

scanner := bufio.NewScanner(file)
for scanner.Scan() {
line := scanner.Text()

if strings.HasPrefix(line, "@@") {
regions++
} else if strings.HasPrefix(line, "+++") {
// If the file has been renamed or copied we keep the newer name and get rid
// of the prefix "+++ b/"
files = append(files, line[6:])
} else if strings.HasPrefix(line, "+") {
linesAdded++
} else if strings.HasPrefix(line, "-") && !strings.HasPrefix(line, "---") {
linesDeleted++
} else {
matches := re.FindAllString(line, -1)
if matches == nil {
continue
}
for _, match := range matches {
// We'll keep only the function name i.e. remove the parentheses and params
functionCall := match[:len(match)-1]

if _, ok := functionCalls[functionCall]; ok {
functionCalls[functionCall]++
} else {
functionCalls[functionCall] = 1
return nil
})

// Clean up readers
go func() {
rwg.Wait()
close(lines)
}()

// "Processor" wg
var pwg sync.WaitGroup

regionsChan := make(chan int)
linesAddedChan := make(chan int)
linesDeletedChan := make(chan int)
filesChan := make(chan string)
functionCallsChan := make(chan string)

// Receive lines and process them, then send the result to the appropriate channel defined above.
pwg.Add(1)
go func() {
defer pwg.Done()
for line := range lines {

pwg.Add(1)
go func(line string) {
defer pwg.Done()
if strings.HasPrefix(line, "@@") {
regionsChan <- 1
} else if strings.HasPrefix(line, "+++") {
// If the file has been renamed or copied we keep the newer name and get rid
// of the prefix "+++ b/"
filesChan <- line[6:]
} else if strings.HasPrefix(line, "+") {
linesAddedChan <- 1
} else if strings.HasPrefix(line, "-") && !strings.HasPrefix(line, "---") {
linesDeletedChan <- 1
} else if !strings.HasPrefix(line, "-") && !strings.HasPrefix(line[1:], "#") &&
!strings.HasPrefix(line[1:], "//") && !strings.HasPrefix(line[1:], "/*") {

matches := re.FindAllString(line, -1)
for _, match := range matches {
// We'll keep only the function name i.e. remove the parentheses and params
functionCall := match[:len(match)-1]
functionCallsChan <- functionCall
}
}
}(line)
}
}()

// CLose the processing channels
go func() {
pwg.Wait()
close(regionsChan)
close(linesAddedChan)
close(linesDeletedChan)
close(filesChan)
close(functionCallsChan)
}()

// Workers for each type
var wwg sync.WaitGroup

wwg.Add(5)

go func() {
defer wwg.Done()
for range regionsChan {
regions++

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is a bit strange optimization, starting a goroutine to increment a variable seems counterproductive as the cost of switching context between the different tasks will likely be higher than doing the operation locally (assuming you have more goroutines than cores).

Also, the channels you use are unbuffered, this means that goroutines who want to write concurrently on the channel will have to wait for the input of another to be consumed by this routine before it can continue executing.

All in all, while parallellizing the computations on different files at once is likely a cause for improvement in speed, I feel that the other routines are wasted resources as they likely do too little to bring real improvements in performance.

I may however be wrong, if you are able to provide benchmarks to justify these small routines, I'd be happy to take a look at them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do get better results this way. Just to clarify, are you referring to the 5 goroutines which do the incrementing/appending or the fact that I spawn a goroutine to process every line? My basic reasoning for spawning a routine for every line was that appending to a slice and inserting into a map can be expensive so I would rather not wait for the result to be computed to continue processing the incoming lines. The 5 routines and their respective channels are just an (ever so slightly faster) alternative to mutexes in this case.

}
}()

go func() {
defer wwg.Done()
for range linesAddedChan {
linesAdded++
}
}()

go func() {
defer wwg.Done()
for range linesDeletedChan {
linesDeleted++
}
}()

go func() {
defer wwg.Done()
for file := range filesChan {
files = append(files, file)
}
}()

go func() {
defer wwg.Done()
for functionCall := range functionCallsChan {
if _, ok := functionCalls[functionCall]; ok {
functionCalls[functionCall]++
} else {
functionCalls[functionCall] = 1
}
}
return nil
})
}()

wwg.Wait()

return &result{files, regions, linesAdded, linesDeleted, functionCalls}

Expand Down