Create a valid Neoverse N1 target. #623
Open
everton1984 wants to merge 1 commit into flame:master from everton1984:add-n1
@@ -0,0 +1,77 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

void bli_cntx_init_neoversen1( cntx_t* cntx )
{
    blksz_t blkszs[ BLIS_NUM_BLKSZS ];

    // Set default kernel blocksizes and functions.
    bli_cntx_init_neoversen1_ref( cntx );

    // -------------------------------------------------------------------------

    // Update the context with optimized native gemm micro-kernels and
    // their storage preferences.
    bli_cntx_set_l3_nat_ukrs
    (
      2,
      BLIS_GEMM_UKR, BLIS_FLOAT,  bli_sgemm_armv8a_asm_8x12, FALSE,
      BLIS_GEMM_UKR, BLIS_DOUBLE, bli_dgemm_armv8a_asm_6x8,  FALSE,
      cntx
    );

    // Initialize level-3 blocksize objects with architecture-specific values.
    //                                           s     d     c     z
    bli_blksz_init_easy( &blkszs[ BLIS_MR ],     8,    6,   -1,   -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NR ],    12,    8,   -1,   -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_MC ],   120,  120,   -1,   -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_KC ],   640,  240,   -1,   -1 );
    bli_blksz_init_easy( &blkszs[ BLIS_NC ],  3072, 3072,   -1,   -1 );

    // Update the context with the current architecture's register and cache
    // blocksizes (and multiples) for native execution.
    bli_cntx_set_blkszs
    (
      BLIS_NAT, 5,
      BLIS_NC, &blkszs[ BLIS_NC ], BLIS_NR,
      BLIS_KC, &blkszs[ BLIS_KC ], BLIS_KR,
      BLIS_MC, &blkszs[ BLIS_MC ], BLIS_MR,
      BLIS_NR, &blkszs[ BLIS_NR ], BLIS_NR,
      BLIS_MR, &blkszs[ BLIS_MR ], BLIS_MR,
      cntx
    );
}
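The blocksize values registered above can be sanity-checked without building BLIS. The following is a minimal standalone sketch, not part of the PR: it copies the `s` and `d` values from the `bli_blksz_init_easy()` calls and assumes that the third column of the `bli_cntx_set_blkszs()` call names the blocksize each entry should be a multiple of (NC of NR, MC of MR). KC is not checked because the value of KR is not shown in this diff. The `blocksizes_t` struct and all `sketch_` names are hypothetical helpers, not BLIS types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Blocksizes for one datatype, copied from the diff.
   Hypothetical struct for illustration; BLIS stores these in blksz_t. */
typedef struct
{
    const char* name;       /* datatype label, used only for printing */
    size_t      elem_size;  /* bytes per element */
    size_t      mr, nr;     /* register blocksizes */
    size_t      mc, kc, nc; /* cache blocksizes */
} blocksizes_t;

const blocksizes_t sketch_s = { "float",  sizeof( float ),  8, 12, 120, 640, 3072 };
const blocksizes_t sketch_d = { "double", sizeof( double ), 6,  8, 120, 240, 3072 };

/* Returns 1 if MC is a multiple of MR and NC is a multiple of NR. */
int sketch_blocksizes_are_consistent( const blocksizes_t* b )
{
    return ( b->mc % b->mr == 0 ) && ( b->nc % b->nr == 0 );
}

/* Bytes in one MR x KC micropanel of A plus one KC x NR micropanel of B,
   i.e. the operands streamed by a single micro-kernel call. */
size_t sketch_micropanel_bytes( const blocksizes_t* b )
{
    return ( b->mr + b->nr ) * b->kc * b->elem_size;
}

/* Bytes in the packed MC x KC block of A. */
size_t sketch_a_block_bytes( const blocksizes_t* b )
{
    return b->mc * b->kc * b->elem_size;
}

/* Bytes in the packed KC x NC panel of B. */
size_t sketch_b_panel_bytes( const blocksizes_t* b )
{
    return b->kc * b->nc * b->elem_size;
}

/* Print the footprints so they can be compared against the cache sizes
   of the machine at hand. */
void sketch_print_report( const blocksizes_t* b )
{
    assert( b != NULL );
    printf( "%-6s consistent=%d micropanels=%zu B  A block=%zu B  B panel=%zu B\n",
            b->name,
            sketch_blocksizes_are_consistent( b ),
            sketch_micropanel_bytes( b ),
            sketch_a_block_bytes( b ),
            sketch_b_panel_bytes( b ) );
}
```

Calling `sketch_print_report( &sketch_s )` and `sketch_print_report( &sketch_d )` from a small `main()` prints the three footprints per datatype. Whether those footprints suit a particular Neoverse N1 part depends on its cache configuration, which this sketch deliberately does not assume.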
@@ -0,0 +1,42 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

//#ifndef BLIS_FAMILY_H
//#define BLIS_FAMILY_H


// -- MEMORY ALLOCATION --------------------------------------------------------

#define BLIS_SIMD_ALIGN_SIZE 16
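The value 16 for `BLIS_SIMD_ALIGN_SIZE` equals the size in bytes of a 128-bit Armv8-A Advanced SIMD (NEON) vector register. How BLIS consumes the macro internally is not shown in this diff, so the following is only a minimal sketch of what a 16-byte alignment requirement means for a buffer, using C11 `aligned_alloc()`. The `SKETCH_SIMD_ALIGN_SIZE` macro and the `sketch_` functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for BLIS_SIMD_ALIGN_SIZE from the diff; renamed so the sketch
   does not clash with the real macro if blis.h is also included. */
#define SKETCH_SIMD_ALIGN_SIZE 16

/* Round nbytes up to the next multiple of the alignment. C11 requires the
   size passed to aligned_alloc() to be a multiple of the alignment. */
size_t sketch_round_up( size_t nbytes )
{
    const size_t a = SKETCH_SIMD_ALIGN_SIZE;
    return ( nbytes + a - 1 ) / a * a;
}

/* Returns 1 if the address p is a multiple of the alignment. */
int sketch_is_simd_aligned( const void* p )
{
    return ( ( uintptr_t )p % SKETCH_SIMD_ALIGN_SIZE ) == 0;
}

/* Allocate at least nbytes with the start address aligned to
   SKETCH_SIMD_ALIGN_SIZE. Returns NULL on failure; release with free(). */
void* sketch_alloc_simd_aligned( size_t nbytes )
{
    void* p = aligned_alloc( SKETCH_SIMD_ALIGN_SIZE, sketch_round_up( nbytes ) );
    assert( p == NULL || sketch_is_simd_aligned( p ) );
    return p;
}
```

A caller would request a buffer with `sketch_alloc_simd_aligned( n * sizeof( double ) )`, check the result for `NULL`, and release it with `free()`.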
@@ -0,0 +1,90 @@
#
#
#  BLIS
#  An object-based framework for developing high-performance BLAS-like
#  libraries.
#
#  Copyright (C) 2014, The University of Texas at Austin
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are
#  met:
#   - Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#   - Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   - Neither the name(s) of the copyright holder(s) nor the names of its
#     contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
#  HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#


# Declare the name of the current configuration and add it to the
# running list of configurations included by common.mk.
THIS_CONFIG := neoversen1
#CONFIGS_INCL += $(THIS_CONFIG)

#
# --- Determine the C compiler and related flags ---
#

# NOTE: The build system will append these variables with various
# general-purpose/configuration-agnostic flags in common.mk. You
# may specify additional flags here as needed.
CPPROCFLAGS := -D_GNU_SOURCE
CMISCFLAGS :=
CPICFLAGS :=
CWARNFLAGS :=

ifneq ($(DEBUG_TYPE),off)
CDBGFLAGS := -g
endif

ifeq ($(DEBUG_TYPE),noopt)
COPTFLAGS := -O0
else
COPTFLAGS := -O2 -mcpu=neoverse-n1
endif

# Flags specific to optimized kernels.
CKOPTFLAGS := $(COPTFLAGS) -O3 -ftree-vectorize
ifeq ($(CC_VENDOR),gcc)
CKVECFLAGS := -mcpu=neoverse-n1
else
ifeq ($(CC_VENDOR),clang)
CKVECFLAGS := -mcpu=neoverse-n1
else
$(error gcc or clang is required for this configuration.)
endif
endif

# Flags specific to reference kernels.
CROPTFLAGS := $(CKOPTFLAGS)
ifeq ($(CC_VENDOR),gcc)
CRVECFLAGS := $(CKVECFLAGS) -funsafe-math-optimizations -ffp-contract=fast
else
ifeq ($(CC_VENDOR),clang)
CRVECFLAGS := $(CKVECFLAGS) -funsafe-math-optimizations -ffp-contract=fast
else
CRVECFLAGS := $(CKVECFLAGS)
endif
endif

# Store all of the variables here to new variables containing the
# configuration name.
$(eval $(call store-make-defs,$(THIS_CONFIG)))
Hi, it's great to see Neoverse N1 tuning. May I ask how you came up with these blocksize values?
Hi! To be honest, I just wanted the compiler to generate tuned neoverse-n1 code with this patch, so the blocksize values were taken from thunderx2. If BLIS has a standard procedure for generating those values, I am all for it; please just let me know.
I value what you did; however, I don't have the answer for this.
@devinamatthews, any pointers you could share?
@egaudry Do you think fine-tuning is essential before this can be merged?
Having a clear interface and arch detection does make sense. However, without proper tuning, mergers/reviewers might not see this as a priority.
Just guessing.
Jeff Diamond has better tuning parameters for N1.
@jeffhammond Thanks for commenting. Could you please point me to Jeff Diamond so I can ask whether he is able to share his parameters?