Performance Characterization of Mobile GP-GPUs
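As of 2019/04/24, on macOS Mojave (10.14.4), after installing Octave 5.1.0 with brew install octave, the pause() function never resumes after a key is pressed: every key except Ctrl + C is ignored. Normally, pressing any key should let execution continue with the code that follows.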
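The cause is that the Octave 5.1.0 currently installed by brew is built against glibc 2.28 or later, and some glibc behavior changed in 2.28. For details, see the discussion in bug #55029: pause() with no arguments does not return like kbhit() with glibc 2.28. In essence, starting with glibc 2.28, once an application has read end-of-file (EOF) from a stream, it must itself call clearerr (stdin); otherwise it will not receive any further key presses. The bug is fixed in Octave 5.2, but it is not yet known when that version will be released.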
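For now, the workaround is to have brew build and install Octave from the latest development source (HEAD) instead of installing the released version: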
```shell
$ brew uninstall --ignore-dependencies octave

# Install the build dependency
$ brew install texinfo

$ wget https://raw.githubusercontent.com/Homebrew/homebrew-core/master/Formula/octave.rb

$ sed -i "" "s/\"--enable-shared\"/\"--enable-shared\",\"--disable-docs\"/g" octave.rb

$ brew install --build-from-source --HEAD -v octave.rb
```
The sed command modifies the downloaded formula to turn off the documentation build (which currently fails), i.e., it adds the --disable-docs configure flag.
The adjusted formula looks like this:
```ruby
class Octave < Formula
  desc "High-level interpreted language for numerical computing"
  homepage "https://www.gnu.org/software/octave/index.html"
  url "https://ftp.gnu.org/gnu/octave/octave-5.1.0.tar.xz"
  mirror "https://ftpmirror.gnu.org/octave/octave-5.1.0.tar.xz"
  sha256 "87b4df6dfa28b1f8028f69659f7a1cabd50adfb81e1e02212ff22c863a29454e"
  revision 2

  bottle do
    sha256 "6bb8497839d6f7872efcd6acad0216f443420e097a9b7fad44835823e1c0e735" => :mojave
    sha256 "d1de53a30f002d8b7ec3a6065994c46d8cbd4830aa7e199f572baff48723c6e6" => :high_sierra
    sha256 "7a648cff129ec85a5ee9417a0339a3b804756f7958585b707c015d322d220b15" => :sierra
  end

  head do
    url "https://hg.savannah.gnu.org/hgweb/octave", :branch => "default", :using => :hg

    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "bison" => :build
    depends_on "icoutils" => :build
    depends_on "librsvg" => :build
  end

  # Complete list of dependencies at https://wiki.octave.org/Building
  depends_on "gnu-sed" => :build # https://lists.gnu.org/archive/html/octave-maintainers/2016-09/msg00193.html
  depends_on :java => ["1.6+", :build]
  depends_on "pkg-config" => :build
  depends_on "arpack"
  depends_on "epstool"
  depends_on "fftw"
  depends_on "fig2dev"
  depends_on "fltk"
  depends_on "fontconfig"
  depends_on "freetype"
  depends_on "gcc" # for gfortran
  depends_on "ghostscript"
  depends_on "gl2ps"
  depends_on "glpk"
  depends_on "gnuplot"
  depends_on "graphicsmagick"
  depends_on "hdf5"
  depends_on "libsndfile"
  depends_on "libtool"
  depends_on "pcre"
  depends_on "portaudio"
  depends_on "pstoedit"
  depends_on "qhull"
  depends_on "qrupdate"
  depends_on "qt"
  depends_on "readline"
  depends_on "suite-sparse"
  depends_on "sundials"
  depends_on "texinfo"
  depends_on "veclibfort"

  # Dependencies use Fortran, leading to spurious messages about GCC
  cxxstdlib_check :skip

  def install
    # Default configuration passes all linker flags to mkoctfile, to be
    # inserted into every oct/mex build. This is unnecessary and can
    # cause linking problems.
    inreplace "src/mkoctfile.in.cc",
              /%OCTAVE_CONF_OCT(AVE)?_LINK_(DEPS|OPTS)%/,
              '""'

    # Qt 5.12 compatibility
    # https://savannah.gnu.org/bugs/?55187
    ENV["QCOLLECTIONGENERATOR"] = "qhelpgenerator"
    # These "shouldn't" be necessary, but the build breaks without them.
    # https://savannah.gnu.org/bugs/?55883
    ENV["QT_CPPFLAGS"]="-I#{Formula["qt"].opt_include}"
    ENV.append "CPPFLAGS", "-I#{Formula["qt"].opt_include}"
    ENV["QT_LDFLAGS"]="-F#{Formula["qt"].opt_lib}"
    ENV.append "LDFLAGS", "-F#{Formula["qt"].opt_lib}"

    system "./bootstrap" if build.head?
    system "./configure", "--prefix=#{prefix}",
                          "--disable-dependency-tracking",
                          "--disable-silent-rules",
                          "--enable-link-all-dependencies",
                          "--enable-shared","--disable-docs",
                          "--disable-static",
                          "--with-hdf5-includedir=#{Formula["hdf5"].opt_include}",
                          "--with-hdf5-libdir=#{Formula["hdf5"].opt_lib}",
                          "--with-x=no",
                          "--with-blas=-L#{Formula["veclibfort"].opt_lib} -lvecLibFort",
                          "--with-portaudio",
                          "--with-sndfile"
    system "make", "all"

    # Avoid revision bumps whenever fftw's or gcc's Cellar paths change
    inreplace "src/mkoctfile.cc" do |s|
      s.gsub! Formula["fftw"].prefix.realpath, Formula["fftw"].opt_prefix
      s.gsub! Formula["gcc"].prefix.realpath, Formula["gcc"].opt_prefix
    end

    # Make sure that Octave uses the modern texinfo at run time
    rcfile = buildpath/"scripts/startup/site-rcfile"
    rcfile.append_lines "makeinfo_program(\"#{Formula["texinfo"].opt_bin}/makeinfo\");"

    system "make", "install"
  end

  test do
    system bin/"octave", "--eval", "(22/7 - pi)/pi"
    # This is supposed to crash octave if there is a problem with veclibfort
    system bin/"octave", "--eval", "single ([1+i 2+i 3+i]) * single ([ 4+i ; 5+i ; 6+i])"
  end
end
```
This is the sequel to the SSE-optimized single-precision sin, cos, log and exp functions I wrote some time ago, adapted to the NEON FPU of my PandaBoard. Precision and range are exactly the same as in the SSE version, so I won't repeat them.
command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm
```text
exp([ -1000, -100, 100, 1000]) = [ 0, 0, 2.4061436e+38, 2.4061436e+38]
exp([ -nan, inf, -inf, nan]) = [ nan, 2.4061436e+38, 0, nan]
log([ 0, -10, 1e+30, 1.0005271e-42]) = [ -nan, -nan, 69.077553, -nan]
log([ -nan, inf, -inf, nan]) = [ 89.128304, 88.722839, -nan, 89.128304]
sin([ -nan, inf, -inf, nan]) = [ nan, nan, -nan, nan]
cos([ -nan, inf, -inf, nan]) = [ nan, nan, nan, nan]
sin([ -1e+30, -100000, 1e+30, 100000]) = [ inf, -0.035749275, -inf, 0.035749275]
cos([ -1e+30, -100000, 1e+30, 100000]) = [ nan, -0.9993608, nan, -0.9993608]
benching sinf .. -> 2.0 millions of vector evaluations/second -> 121 cycles/value on a 1000MHz computer
benching cosf .. -> 1.8 millions of vector evaluations/second -> 132 cycles/value on a 1000MHz computer
benching expf .. -> 1.1 millions of vector evaluations/second -> 221 cycles/value on a 1000MHz computer
benching logf .. -> 1.7 millions of vector evaluations/second -> 141 cycles/value on a 1000MHz computer
benching cephes_sinf .. -> 2.4 millions of vector evaluations/second -> 103 cycles/value on a 1000MHz computer
benching cephes_cosf .. -> 2.0 millions of vector evaluations/second -> 123 cycles/value on a 1000MHz computer
benching cephes_expf .. -> 1.6 millions of vector evaluations/second -> 153 cycles/value on a 1000MHz computer
benching cephes_logf .. -> 1.5 millions of vector evaluations/second -> 156 cycles/value on a 1000MHz computer
benching sin_ps .. -> 5.8 millions of vector evaluations/second -> 43 cycles/value on a 1000MHz computer
benching cos_ps .. -> 5.9 millions of vector evaluations/second -> 42 cycles/value on a 1000MHz computer
benching sincos_ps .. -> 6.0 millions of vector evaluations/second -> 41 cycles/value on a 1000MHz computer
benching exp_ps .. -> 5.6 millions of vector evaluations/second -> 44 cycles/value on a 1000MHz computer
benching log_ps .. -> 5.3 millions of vector evaluations/second -> 47 cycles/value on a 1000MHz computer
```
So performance is not stellar. I recommend gcc 4.6.1 or newer, as it generates much better code than earlier versions (gcc 4.5): almost 20% faster here. I believe rewriting these functions in assembly could improve performance by about 30%, and it should not be very hard, since ARM and NEON assembly is quite pleasant to write; maybe I'll do it. Computing two SIMD vectors at once would also improve performance considerably, since NEON has enough registers, and it would reduce the dependencies between NEON instructions.
Note also that I have no idea how this performs on a Cortex-A8; it may be very poor, I don't know.
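The "two SIMD vectors at once" idea can be sketched portably. This is a minimal sketch, assuming GCC/Clang vector extensions (the v4sf type) as a stand-in for NEON's float32x4_t and a toy cubic polynomial in place of the real sin/exp kernels; splat, poly_1x and poly_2x are hypothetical names, not functions from neon_mathfun.

```c
/* Assumption: GCC/Clang vector extensions; v4sf models one 128-bit
 * NEON (float32x4_t) or SSE register holding 4 floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Broadcast a scalar into all 4 lanes. */
static inline v4sf splat(float s)
{
    return (v4sf){s, s, s, s};
}

/* One vector: p(x) = c[0] + c[1]x + c[2]x^2 + c[3]x^3 by Horner's rule.
 * Each multiply-add needs the previous result, so the steps run serially. */
static v4sf poly_1x(v4sf x, const float c[4])
{
    v4sf y = splat(c[3]);
    y = y * x + splat(c[2]);
    y = y * x + splat(c[1]);
    y = y * x + splat(c[0]);
    return y;
}

/* Two vectors at once: the a-chain and the b-chain are independent, so the
 * CPU can issue a b-step while the previous a-step is still in flight.
 * This keeps twice as many values live, which NEON's register file allows. */
static void poly_2x(v4sf xa, v4sf xb, const float c[4], v4sf *ya, v4sf *yb)
{
    v4sf a = splat(c[3]);
    v4sf b = splat(c[3]);
    a = a * xa + splat(c[2]);  b = b * xb + splat(c[2]);
    a = a * xa + splat(c[1]);  b = b * xb + splat(c[1]);
    a = a * xa + splat(c[0]);  b = b * xb + splat(c[0]);
    *ya = a;
    *yb = b;
}
```

poly_2x produces exactly the same results as two calls to poly_1x; only the instruction interleaving differs. Whether it actually runs faster on a Cortex-A9 depends on how the compiler schedules the two chains, so treat it as a direction to try rather than a measured result.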
command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is MSVC 2010)
```text
benching sinf .. -> 1.3 millions of vector evaluations/second -> 303 cycles/value on a 1600MHz computer
benching cosf .. -> 1.3 millions of vector evaluations/second -> 305 cycles/value on a 1600MHz computer
benching sincos (x87) .. -> 1.2 millions of vector evaluations/second -> 314 cycles/value on a 1600MHz computer
benching expf .. -> 1.6 millions of vector evaluations/second -> 244 cycles/value on a 1600MHz computer
benching logf .. -> 1.4 millions of vector evaluations/second -> 276 cycles/value on a 1600MHz computer
benching cephes_sinf .. -> 1.4 millions of vector evaluations/second -> 280 cycles/value on a 1600MHz computer
benching cephes_cosf .. -> 1.5 millions of vector evaluations/second -> 265 cycles/value on a 1600MHz computer
benching cephes_expf .. -> 0.7 millions of vector evaluations/second -> 548 cycles/value on a 1600MHz computer
benching cephes_logf .. -> 0.8 millions of vector evaluations/second -> 489 cycles/value on a 1600MHz computer
benching sin_ps .. -> 9.2 millions of vector evaluations/second -> 43 cycles/value on a 1600MHz computer
benching cos_ps .. -> 9.5 millions of vector evaluations/second -> 42 cycles/value on a 1600MHz computer
benching sincos_ps .. -> 8.8 millions of vector evaluations/second -> 45 cycles/value on a 1600MHz computer
benching exp_ps .. -> 9.8 millions of vector evaluations/second -> 41 cycles/value on a 1600MHz computer
benching log_ps .. -> 8.6 millions of vector evaluations/second -> 46 cycles/value on a 1600MHz computer
```
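The cycles/value column in the benchmark logs appears to follow from the other two figures: clock rate divided by (vector evaluations per second × 4 floats per vector). This is an inference from the numbers, not taken from the benchmark source; for example, sin_ps on NEON gives 1000 MHz / (5.8 M × 4) ≈ 43, and on SSE 1600 MHz / (9.2 M × 4) ≈ 43. The helper below is hypothetical.

```c
/* Hypothetical helper: converts "vector evaluations per second" into
 * cycles per scalar value, assuming each vector evaluation yields `lanes`
 * results (4 single-precision floats per 128-bit SSE/NEON vector). */
double cycles_per_value(double clock_hz, double vector_evals_per_sec, int lanes)
{
    double values_per_sec = vector_evals_per_sec * lanes;
    return clock_hz / values_per_sec;
}
```

Because the evaluation rate is printed rounded to one decimal, recomputed values can differ a little from the printed ones (sinf on NEON: 1e9 / (2.0e6 × 4) = 125 versus the printed 121).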
Sometimes you need to debug a library written in C from MATLAB and inspect its behavior inside MATLAB.
A complete reference example:
```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <mex.h>

// Check whether a MATLAB string argument equals the given command
// (surrounding whitespace ignored, case-insensitive)
static bool commandIs(const mxArray* mxCommand, const char* command)
{
    double result;
    mxArray* plhs1[1];
    mxArray* prhs1[1];
    mxArray* plhs2[1];
    mxArray* prhs2[2];

    if (mxCommand == NULL) {
        mexErrMsgTxt("'mxCommand' is null");
        return false;
    }
    if (command == NULL) {
        mexErrMsgTxt("'command' is null");
        return false;
    }
    if (!mxIsChar(mxCommand)) {
        mexErrMsgTxt("'mxCommand' is not a string");
        return false;
    }

    // First trim
    prhs1[0] = (mxArray*)mxCommand;
    mexCallMATLAB(1, plhs1, 1, prhs1, "strtrim");

    // Then compare
    prhs2[0] = mxCreateString(command);
    prhs2[1] = plhs1[0];
    mexCallMATLAB(1, plhs2, 2, prhs2, "strcmpi");

    // Read the comparison result, then free the temporary arrays
    result = mxGetScalar(plhs2[0]);
    mxDestroyArray(prhs2[0]);
    mxDestroyArray(plhs1[0]);
    mxDestroyArray(plhs2[0]);

    return (result != 0.0);
}

static void processHelpMessageCommand(void)
{
    mexPrintf("DspMgr('init') allocates memory and returns a handle; call DspMgr('release',handle) to free it\n");
    mexPrintf("DspMgr('release',handle) frees the memory behind the handle\n");
}

static void processInitCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char* example_buffer = malloc(512);
    if (example_buffer == NULL) {
        mexErrMsgTxt("init: out of memory");
        return;
    }
    // Hide the C pointer inside a 1x1 uint64 MATLAB array
    plhs[0] = mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL);
    uint64_t *ip = (uint64_t *) mxGetData(plhs[0]);
    *ip = (uint64_t)(uintptr_t)example_buffer;
}

static void processReleaseCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2) {
        mexErrMsgTxt("release needs 1 parameter (the handle)");
        return;
    }
    if (!mxIsUint64(prhs[1])) {
        mexErrMsgTxt("release handle must be UINT64 format");
        return;
    }
    size_t M = mxGetM(prhs[1]); // number of rows
    size_t N = mxGetN(prhs[1]); // number of columns
    if ((1 != M) || (1 != N)) {  // the handle must be exactly 1x1
        mexErrMsgTxt("release handle must be a 1*1 array");
        return;
    }
    // Read the raw 64-bit value directly instead of going through a
    // double with mxGetScalar()
    uint64_t ip = *(const uint64_t *) mxGetData(prhs[1]);
    char* example_buffer = (char*)(uintptr_t)ip;
    free(example_buffer);

    // Return 1 (true) so that the caller gets an output value
    plhs[0] = mxCreateNumericMatrix(1, 1, mxINT8_CLASS, mxREAL);
    char* mx_data = (char *) mxGetData(plhs[0]);
    mx_data[0] = 1;
}

// Mex entry point
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Arguments parsing
    if (nrhs < 1) {
        mexErrMsgTxt("Not enough input arguments. Use 'DspMgr help' for the help message.");
        return;
    }
    if (!mxIsChar(prhs[0])) {
        mexErrMsgTxt("First parameter must be a string.");
        return;
    }

    // Command selection
    if (commandIs(prhs[0], "HELP")) {
        processHelpMessageCommand();
    } else if (commandIs(prhs[0], "init")) {
        processInitCommand(nlhs, plhs, nrhs, prhs);
    } else if (commandIs(prhs[0], "release")) {
        processReleaseCommand(nlhs, plhs, nrhs, prhs);
    } else {
        mexErrMsgTxt("Unknown command or command not implemented yet.");
    }
}
```
Note in particular how the example above hides a pointer allocated in C inside a 1×1 uint64 array and passes it to MATLAB as a handle.
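The pointer-hiding trick boils down to a round trip between a pointer and a 64-bit integer. This is a minimal sketch without the MATLAB API so it compiles on its own; handle_from_ptr and ptr_from_handle are hypothetical names, and in the MEX file the integer is what gets written through mxGetData into the uint64 array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: pack a C pointer into a 64-bit integer handle.
 * Going through uintptr_t keeps the conversion well-defined. */
uint64_t handle_from_ptr(void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical helper: recover the original pointer from the handle. */
void *ptr_from_handle(uint64_t handle)
{
    return (void *)(uintptr_t)handle;
}
```

MATLAB only ever sees an opaque number, so it cannot tell whether a handle is still valid: calling 'release' twice on the same handle frees the same memory twice.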
Example of building and calling it from MATLAB:
```matlab
mex -output DspMgr 'CFLAGS="\$CFLAGS -std=c99"' '*.c'

v = DspMgr('init')

DspMgr('release',v)
```
Taylor's formula approximates a function f(x) that has derivatives up to order n at x = x0 by a polynomial of degree n in (x - x0).
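If f(x) has derivatives up to order n on a closed interval [a, b] containing x0, and a derivative of order (n+1) on the open interval (a, b), then for every point x in [a, b]:

$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n + R_n(x)$$

where $f^{(n)}(x)$ denotes the n-th derivative of f(x). The polynomial after the equals sign is called the Taylor expansion of f(x) at x0, and the remainder $R_n(x)$ is an infinitesimal of higher order than $(x-x_0)^n$.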
Note that, by convention, 0! = 1.
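As a concrete instance, take f(x) = e^x and x0 = 0: every derivative of e^x is e^x, so each $f^{(k)}(0) = 1$ and the Taylor polynomial is the sum of x^k / k! for k = 0..n. The sketch below (the function name taylor_exp is hypothetical) evaluates it; with n = 10 and x = 1 it matches e to within about 3×10^-8, consistent with the remainder shrinking as n grows.

```c
/* Degree-n Taylor polynomial of e^x at x0 = 0: sum_{k=0}^{n} x^k / k!.
 * The k = 0 term is x^0 / 0! = 1, which relies on the convention 0! = 1. */
double taylor_exp(double x, int n)
{
    double term = 1.0;   /* x^0 / 0! */
    double sum = term;
    for (int k = 1; k <= n; k++) {
        term *= x / k;   /* x^k / k! computed from x^(k-1) / (k-1)! */
        sum += term;
    }
    return sum;
}
```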
Original Kalman filter paper: A New Approach to Linear Filtering and Prediction Problems
Gaussian function (download as a Word document)
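Greek alphabet

| No. | Uppercase | Lowercase | English name | Pronunciation (approx. IPA) | Chinese reading | Meaning |
|---|---|---|---|---|---|---|
| 1 | Α | α | alpha | a:lf | 阿尔法 | angle; coefficient |
| 2 | Β | β | beta | bet | 贝塔 | magnetic flux coefficient; angle; coefficient |
| 3 | Γ | γ | gamma | ga:m | 伽马 | conductivity (lowercase) |
| 4 | Δ | δ | delta | delt | 德尔塔 | change; density; diopter |
| 5 | Ε | ε | epsilon | epsilon | 艾普西龙 | base of logarithms |
| 6 | Ζ | ζ | zeta | zat | 截塔 | coefficient; azimuth; impedance; relative viscosity; atomic number |
| 7 | Η | η | eta | eit | 艾塔 | hysteresis coefficient; efficiency (lowercase) |
| 8 | Θ | θ | theta | θit | 西塔 | temperature; phase angle |
| 9 | Ι | ι | iota | aiot | 约塔 | tiny, a little bit |
| 10 | Κ | κ | kappa | kap | 卡帕 | dielectric constant |
| 11 | Λ | λ | lambda | lambd | 兰布达 | wavelength (lowercase); volume |
| 12 | Μ | μ | mu | mju | 缪 | permeability; micro- (one millionth); amplification factor (lowercase) |
| 13 | Ν | ν | nu | nju | 纽 | reluctance coefficient |
| 14 | Ξ | ξ | xi | ksi | 克西 | random variable (mathematics) |
| 15 | Ο | ο | omicron | omikron | 奥密克戎 | |
| 16 | Π | π | pi | pai | 派 | pi = circumference ÷ diameter ≈ 3.14159 26535 89793 |
| 17 | Ρ | ρ | rho | rou | 肉 | resistivity (lowercase) |
| 18 | Σ | σ | sigma | sigma | 西格马 | summation (uppercase); surface density; transconductance (lowercase) |
| 19 | Τ | τ | tau | tau | 套 | time constant |
| 20 | Υ | υ | upsilon | jupsilon | 伊普西龙 | displacement |
| 21 | Φ | φ | phi | fai | 佛爱 | magnetic flux; angle |
| 22 | Χ | χ | chi | kai | 西 | |
| 23 | Ψ | ψ | psi | psai | 普西 | angular velocity; dielectric flux (electrostatic lines of force); angle |
| 24 | Ω | ω | omega | o`miga | 欧米伽 | ohm (uppercase); angular velocity (lowercase); angle |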