深海游弋的鱼 – Page 2

Performance Characterization of Mobile GP-GPUs

Matrix Multiplication with OpenCL

Octave 5.1.0 on macOS Mojave (10.14.4): pause() does not respond to key presses

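As of 2019/04/24, on macOS Mojave (10.14.4), after installing Octave 5.1.0 with brew install octave, pause() does not resume execution when a key is pressed; no key other than Ctrl + C gets any response. Normally, pressing any key should let the code after pause() continue.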

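The cause is discussed in bug #55029: pause() with no arguments does not return like kbhit() with glibc 2.28. From glibc 2.28 on, the end-of-file (EOF) indicator on a stream stays set, so an application that has seen EOF on stdin must call clearerr (stdin); itself, otherwise it never receives later key presses. The Octave 5.1.0 that brew installs shows the same symptom, although macOS uses its own C library rather than glibc. The bug is fixed in Octave 5.2, but that release has no scheduled date yet.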
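The rule from the bug report can be sketched in plain C. This is a minimal illustration and not Octave's actual patch; the helper name read_key is hypothetical. It clears a stale EOF or error indicator before trying to read the next character.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one character from the stream, first
   clearing a stale end-of-file or error indicator so that input
   arriving later can still be delivered. */
int read_key(FILE *stream)
{
    assert(stream != NULL);
    if (feof(stream) || ferror(stream)) {
        clearerr(stream);   /* reset the sticky EOF/error indicators */
    }
    return getc(stream);    /* returns EOF again if nothing is available */
}
```

Calling read_key(stdin) in place of a bare getc(stdin) is the intended use. Without the clearerr call, a stream that has hit EOF once keeps reporting EOF on C libraries with sticky EOF behaviour.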

The current workaround is to have brew build and install from the latest source (HEAD) instead of installing the released version, as follows:

$ brew uninstall --ignore-dependencies octave

# Install build dependencies
$ brew install texinfo

$ wget https://raw.githubusercontent.com/Homebrew/homebrew-core/master/Formula/octave.rb

$ sed -i "" "s/\"--enable-shared\"/\"--enable-shared\",\"--disable-docs\"/g" octave.rb

$ brew install --build-from-source --HEAD -v octave.rb

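The sed command above edits the downloaded formula to add the --disable-docs configure flag, which turns off the documentation build (it currently fails).

The adjusted formula looks like this: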

class Octave < Formula
  desc "High-level interpreted language for numerical computing"
  homepage "https://www.gnu.org/software/octave/index.html"
  url "https://ftp.gnu.org/gnu/octave/octave-5.1.0.tar.xz"
  mirror "https://ftpmirror.gnu.org/octave/octave-5.1.0.tar.xz"
  sha256 "87b4df6dfa28b1f8028f69659f7a1cabd50adfb81e1e02212ff22c863a29454e"
  revision 2

  bottle do
    sha256 "6bb8497839d6f7872efcd6acad0216f443420e097a9b7fad44835823e1c0e735" => :mojave
    sha256 "d1de53a30f002d8b7ec3a6065994c46d8cbd4830aa7e199f572baff48723c6e6" => :high_sierra
    sha256 "7a648cff129ec85a5ee9417a0339a3b804756f7958585b707c015d322d220b15" => :sierra
  end

  head do
    url "https://hg.savannah.gnu.org/hgweb/octave", :branch => "default", :using => :hg

    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "bison" => :build
    depends_on "icoutils" => :build
    depends_on "librsvg" => :build
  end

  # Complete list of dependencies at https://wiki.octave.org/Building
  depends_on "gnu-sed" => :build # https://lists.gnu.org/archive/html/octave-maintainers/2016-09/msg00193.html
  depends_on :java => ["1.6+", :build]
  depends_on "pkg-config" => :build
  depends_on "arpack"
  depends_on "epstool"
  depends_on "fftw"
  depends_on "fig2dev"
  depends_on "fltk"
  depends_on "fontconfig"
  depends_on "freetype"
  depends_on "gcc" # for gfortran
  depends_on "ghostscript"
  depends_on "gl2ps"
  depends_on "glpk"
  depends_on "gnuplot"
  depends_on "graphicsmagick"
  depends_on "hdf5"
  depends_on "libsndfile"
  depends_on "libtool"
  depends_on "pcre"
  depends_on "portaudio"
  depends_on "pstoedit"
  depends_on "qhull"
  depends_on "qrupdate"
  depends_on "qt"
  depends_on "readline"
  depends_on "suite-sparse"
  depends_on "sundials"
  depends_on "texinfo"
  depends_on "veclibfort"

  # Dependencies use Fortran, leading to spurious messages about GCC
  cxxstdlib_check :skip

  def install
    # Default configuration passes all linker flags to mkoctfile, to be
    # inserted into every oct/mex build. This is unnecessary and can cause
    # linking problems.
    inreplace "src/mkoctfile.in.cc",
              /%OCTAVE_CONF_OCT(AVE)?_LINK_(DEPS|OPTS)%/,
              '""'

    # Qt 5.12 compatibility
    # https://savannah.gnu.org/bugs/?55187
    ENV["QCOLLECTIONGENERATOR"] = "qhelpgenerator"
    # These "shouldn't" be necessary, but the build breaks without them.
    # https://savannah.gnu.org/bugs/?55883
    ENV["QT_CPPFLAGS"]="-I#{Formula["qt"].opt_include}"
    ENV.append "CPPFLAGS", "-I#{Formula["qt"].opt_include}"
    ENV["QT_LDFLAGS"]="-F#{Formula["qt"].opt_lib}"
    ENV.append "LDFLAGS", "-F#{Formula["qt"].opt_lib}"

    system "./bootstrap" if build.head?
    system "./configure", "--prefix=#{prefix}",
                          "--disable-dependency-tracking",
                          "--disable-silent-rules",
                          "--enable-link-all-dependencies",
                          "--enable-shared","--disable-docs",
                          "--disable-static",
                          "--with-hdf5-includedir=#{Formula["hdf5"].opt_include}",
                          "--with-hdf5-libdir=#{Formula["hdf5"].opt_lib}",
                          "--with-x=no",
                          "--with-blas=-L#{Formula["veclibfort"].opt_lib} -lvecLibFort",
                          "--with-portaudio",
                          "--with-sndfile"
    system "make", "all"

    # Avoid revision bumps whenever fftw's or gcc's Cellar paths change
    inreplace "src/mkoctfile.cc" do |s|
      s.gsub! Formula["fftw"].prefix.realpath, Formula["fftw"].opt_prefix
      s.gsub! Formula["gcc"].prefix.realpath, Formula["gcc"].opt_prefix
    end

    # Make sure that Octave uses the modern texinfo at run time
    rcfile = buildpath/"scripts/startup/site-rcfile"
    rcfile.append_lines "makeinfo_program(\"#{Formula["texinfo"].opt_bin}/makeinfo\");"

    system "make", "install"
  end

  test do
    system bin/"octave", "--eval", "(22/7 - pi)/pi"
    # This is supposed to crash octave if there is a problem with veclibfort
    system bin/"octave", "--eval", "single ([1+i 2+i 3+i]) * single ([ 4+i ; 5+i ; 6+i])"
  end
end

Simple ARM NEON optimized sin, cos, log and exp

This is the sequel to the single-precision SSE-optimized sin, cos, log and exp that I wrote some time ago, adapted to the NEON FPU of my pandaboard. Precision and range are exactly the same as in the SSE version, so I won't repeat them.

The code

The functions below are licensed under the zlib license, so you can do basically what you want with them.

neon_mathfun.h: source code for sin_ps, cos_ps, sincos_ps, exp_ps, log_ps, as straight C.
neon_mathfun_test.c: validation and benchmark program for those functions. Do not forget to run it once.

Performance

Results on a pandaboard with a 1GHz dual-core ARM Cortex A9 (OMAP4), using gcc 4.6.1

command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm

exp([        -1000,          -100,           100,          1000]) = [            0,             0, 2.4061436e+38, 2.4061436e+38]
exp([         -nan,           inf,          -inf,           nan]) = [          nan, 2.4061436e+38,             0,           nan]
log([            0,           -10,         1e+30, 1.0005271e-42]) = [         -nan,          -nan,     69.077553,          -nan]
log([         -nan,           inf,          -inf,           nan]) = [    89.128304,     88.722839,          -nan,     89.128304]
sin([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,          -nan,           nan]
cos([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,           nan,           nan]
sin([       -1e+30,       -100000,         1e+30,        100000]) = [          inf,  -0.035749275,          -inf,   0.035749275]
cos([       -1e+30,       -100000,         1e+30,        100000]) = [          nan,    -0.9993608,           nan,    -0.9993608]
benching                 sinf .. ->    2.0 millions of vector evaluations/second -> 121 cycles/value on a 1000MHz computer
benching                 cosf .. ->    1.8 millions of vector evaluations/second -> 132 cycles/value on a 1000MHz computer
benching                 expf .. ->    1.1 millions of vector evaluations/second -> 221 cycles/value on a 1000MHz computer
benching                 logf .. ->    1.7 millions of vector evaluations/second -> 141 cycles/value on a 1000MHz computer
benching          cephes_sinf .. ->    2.4 millions of vector evaluations/second -> 103 cycles/value on a 1000MHz computer
benching          cephes_cosf .. ->    2.0 millions of vector evaluations/second -> 123 cycles/value on a 1000MHz computer
benching          cephes_expf .. ->    1.6 millions of vector evaluations/second -> 153 cycles/value on a 1000MHz computer
benching          cephes_logf .. ->    1.5 millions of vector evaluations/second -> 156 cycles/value on a 1000MHz computer
benching               sin_ps .. ->    5.8 millions of vector evaluations/second ->  43 cycles/value on a 1000MHz computer
benching               cos_ps .. ->    5.9 millions of vector evaluations/second ->  42 cycles/value on a 1000MHz computer
benching            sincos_ps .. ->    6.0 millions of vector evaluations/second ->  41 cycles/value on a 1000MHz computer
benching               exp_ps .. ->    5.6 millions of vector evaluations/second ->  44 cycles/value on a 1000MHz computer
benching               log_ps .. ->    5.3 millions of vector evaluations/second ->  47 cycles/value on a 1000MHz computer

So performance is not stellar. I recommend using gcc 4.6.1 or newer, as it generates much better code than previous (gcc 4.5) versions: almost 20% faster here. I believe rewriting these functions in assembly would improve performance by 30%, and it should not be very hard since ARM and NEON assembly is quite nice and easy to write; maybe I'll do it. Computing two SIMD vectors at once would also improve performance considerably, as NEON has enough registers, and it would reduce the dependencies between NEON instructions.

Note also that I have no idea of the performance on a Cortex A8 -- it may be extremely bad, I don't know.

Comparison with an Intel Atom

For comparison purposes, here is the performance of the SSE version on a single core Intel Atom N270 running at 1.66GHz

command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is msvc 2010)

benching                 sinf .. ->    1.3 millions of vector evaluations/second -> 303 cycles/value on a 1600MHz computer
benching                 cosf .. ->    1.3 millions of vector evaluations/second -> 305 cycles/value on a 1600MHz computer
benching         sincos (x87) .. ->    1.2 millions of vector evaluations/second -> 314 cycles/value on a 1600MHz computer
benching                 expf .. ->    1.6 millions of vector evaluations/second -> 244 cycles/value on a 1600MHz computer
benching                 logf .. ->    1.4 millions of vector evaluations/second -> 276 cycles/value on a 1600MHz computer
benching          cephes_sinf .. ->    1.4 millions of vector evaluations/second -> 280 cycles/value on a 1600MHz computer
benching          cephes_cosf .. ->    1.5 millions of vector evaluations/second -> 265 cycles/value on a 1600MHz computer
benching          cephes_expf .. ->    0.7 millions of vector evaluations/second -> 548 cycles/value on a 1600MHz computer
benching          cephes_logf .. ->    0.8 millions of vector evaluations/second -> 489 cycles/value on a 1600MHz computer
benching               sin_ps .. ->    9.2 millions of vector evaluations/second ->  43 cycles/value on a 1600MHz computer
benching               cos_ps .. ->    9.5 millions of vector evaluations/second ->  42 cycles/value on a 1600MHz computer
benching            sincos_ps .. ->    8.8 millions of vector evaluations/second ->  45 cycles/value on a 1600MHz computer
benching               exp_ps .. ->    9.8 millions of vector evaluations/second ->  41 cycles/value on a 1600MHz computer
benching               log_ps .. ->    8.6 millions of vector evaluations/second ->  46 cycles/value on a 1600MHz computer

The number of cycles is quite similar, but the Atom has a higher clock.
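The cycles/value column in both tables can be approximately reproduced from the other two figures. The sketch below assumes that each vector evaluation processes four single-precision values, which is an assumption based on the four-element vectors printed by the test program; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles spent per scalar value:
   clock [MHz] / (vector evaluations [millions/s] * values per vector). */
double cycles_per_value(double clock_mhz, double mevals_per_sec, int lanes)
{
    assert(mevals_per_sec > 0.0 && lanes > 0);
    return clock_mhz / (mevals_per_sec * lanes);
}

/* Scalar values computed per second, in millions. */
double mvalues_per_sec(double mevals_per_sec, int lanes)
{
    assert(lanes > 0);
    return mevals_per_sec * lanes;
}
```

For sin_ps this gives about 43.1 cycles on the Cortex A9 (1000 MHz, 5.8 million evaluations/second) and about 43.5 on the Atom (1600 MHz, 9.2 million), in line with the reported 43. The equal cycle counts at different clocks translate into roughly 23.2 versus 36.8 million values per second.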

Last modified: 2011/05/29

References

Simple ARM NEON optimized sin, cos, log and exp

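Calling C code from Matlab

Sometimes a library written in C needs to be debugged from Matlab, so that its results can be inspected inside Matlab.

A complete reference example follows: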

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <mex.h>

// Check whether mxCommand matches the given command (trimmed, case-insensitive)
static bool commandIs(const mxArray* mxCommand, const char* command)
{
    double result;
    mxArray* plhs1[1];
    mxArray* prhs1[1];
    mxArray* plhs2[1];
    mxArray* prhs2[2];

    if (mxCommand == NULL) { mexErrMsgTxt("'mxCommand' is null"); return false; }
    if (command == NULL) { mexErrMsgTxt("'command' is null"); return false; }
    if (!mxIsChar(mxCommand)) { mexErrMsgTxt("'mxCommand' is not a string"); return false; }

    // First trim
    prhs1[0] = (mxArray*)mxCommand;
    mexCallMATLAB(1, plhs1, 1, prhs1, "strtrim");

    // Then compare
    prhs2[0] = mxCreateString(command);
    prhs2[1] = plhs1[0];
    mexCallMATLAB(1, plhs2, 2, prhs2, "strcmpi");

    // Return comparison result
    result = mxGetScalar(plhs2[0]);
    return (result != 0.0);
}

static void processHelpMessageCommand(void)
{
    mexPrintf("DspMgr('init') returns a handle; free it with 'release'\n");
    mexPrintf("DspMgr('release',handle) frees the memory behind the handle\n");
}

static void processInitCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char* example_buffer = malloc(512);
    if (example_buffer == NULL) { mexErrMsgTxt("init: out of memory"); return; }

    // Hide the C pointer inside a 1x1 uint64 matrix that is returned to Matlab
    plhs[0] = mxCreateNumericMatrix(1,1,mxUINT64_CLASS,mxREAL);
    uint64_t *ip = (uint64_t *) mxGetData(plhs[0]);
    *ip = (uint64_t)(uintptr_t)example_buffer;
}

static void processReleaseCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if(nrhs != 2) {
        mexErrMsgTxt("release needs 1 parameter: the handle");
        return;
    }
    if(!mxIsUint64(prhs[1])) {
        mexErrMsgTxt("release handle must be UINT64 format");
        return;
    }

    size_t M=mxGetM(prhs[1]); // number of rows
    size_t N=mxGetN(prhs[1]); // number of columns
    if((1 != M) || (1 != N)) {
        mexErrMsgTxt("release handle must be 1*1 array format");
        return;
    }

    // Read the raw uint64 value; mxGetScalar would convert it to double
    // and could drop low bits of a 64-bit pointer
    uint64_t ip = *(const uint64_t *) mxGetData(prhs[1]);
    char* example_buffer = (char*)(uintptr_t)ip;
    free(example_buffer);

    // Return true to avoid a warning
    plhs[0] = mxCreateNumericMatrix(1,1,mxINT8_CLASS,mxREAL);
    char* mx_data = (char *) mxGetData(plhs[0]);
    mx_data[0] = 1;
}

// Mex entry point
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Arguments parsing
    if (nrhs < 1) { mexErrMsgTxt("Not enough input arguments. use 'DspMgr help' for help message."); return; }
    if (!mxIsChar(prhs[0])) { mexErrMsgTxt("First parameter must be a string."); return; }

    // Command selection (the comparison is case-insensitive)
    if (commandIs(prhs[0], "help")) { processHelpMessageCommand(); }
    else if (commandIs(prhs[0], "init")) { processInitCommand(nlhs, plhs, nrhs, prhs); }
    else if (commandIs(prhs[0], "release")) { processReleaseCommand(nlhs, plhs, nrhs, prhs); }
    else { mexErrMsgTxt("Unknown command or command not implemented yet."); }
}

Note in particular how the example above hides a pointer allocated in C and passes it to Matlab.
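The pointer-hiding idea can be shown without the MEX API. The sketch below is only an illustration; the names handle_from_pointer, pointer_from_handle and survives_double are hypothetical. It packs a pointer into a 64-bit integer and also checks whether such a value would survive a trip through a double, which is what a handle goes through if it is read back as a double instead of as raw uint64 data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a C pointer into a 64-bit integer, suitable for storing in a
   1x1 uint64 array. */
uint64_t handle_from_pointer(void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Recover the pointer from a handle produced by handle_from_pointer. */
void *pointer_from_handle(uint64_t handle)
{
    assert(handle <= UINTPTR_MAX);
    return (void *)(uintptr_t)handle;
}

/* True if the handle is unchanged after a round trip through double.
   A double has a 53-bit significand, so larger values may be rounded. */
bool survives_double(uint64_t handle)
{
    double as_double = (double)handle;
    if (as_double >= 18446744073709551616.0) {  /* 2^64: not convertible back */
        return false;
    }
    return (uint64_t)as_double == handle;
}
```

Typical user-space addresses fit in 53 bits, so a double round trip often happens to work. Copying the raw 64-bit value avoids relying on that.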

The Matlab side is used as follows:

mex -output DspMgr 'CFLAGS="\$CFLAGS -std=c99"' '*.c'

v = DspMgr('init')

DspMgr('release',v)

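Taylor's formula

Taylor's formula approximates a function f(x) that has derivatives up to order n at x = x0 by a polynomial of degree n in (x-x0).

If f(x) has derivatives up to order n on a closed interval [a,b] containing x0, and a derivative of order (n+1) on the open interval (a,b), then for every point x in [a,b] the following holds:

$$f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x-x_0) + \frac{f''(x_0)}{2!}(x-x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n + R_n(x)$$

Here $f^{(n)}(x)$ denotes the n-th derivative of f(x), the polynomial after the equals sign is called the Taylor expansion of f(x) at x₀, and the remaining R_n(x) is the remainder term of Taylor's formula, which is a higher-order infinitesimal of (x-x₀)ⁿ.

Note that, by convention, the factorial of 0 is 0! = 1.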
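As a worked illustration, the sketch below evaluates the n-th order Taylor polynomial of exp(x) expanded at x0 = 0, where every derivative equals 1 at the expansion point, so the polynomial reduces to the sum of x^k / k! for k = 0..n. The function name exp_taylor is hypothetical, and the remainder term R_n(x) is simply dropped.

```c
#include <assert.h>

/* n-th order Taylor polynomial of exp(x) at x0 = 0:
   sum over k = 0..n of x^k / k!  (the remainder R_n(x) is ignored). */
double exp_taylor(double x, int n)
{
    assert(n >= 0);
    double term = 1.0;   /* k = 0 term: x^0 / 0! = 1, using 0! = 1 */
    double sum = term;
    for (int k = 1; k <= n; ++k) {
        term *= x / k;   /* builds x^k / k! from x^(k-1) / (k-1)! */
        sum += term;
    }
    return sum;
}
```

With x = 1 the orders 0, 1 and 2 give 1, 2 and 2.5, and order 10 already agrees with e to about seven decimal places.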

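Greek alphabet
No.	Uppercase	Lowercase	English name	Phonetic transcription	Chinese reading	Meaning
1	Α	α	alpha	a:lf	阿尔法	angle; coefficient
2	Β	β	beta	bet	贝塔	flux coefficient; angle; coefficient
3	Γ	γ	gamma	ga:m	伽马	conductivity coefficient (lowercase)
4	Δ	δ	delta	delt	德尔塔	variation; density; diopter
5	Ε	ε	epsilon	ep`silon`	艾普西龙	base of logarithms
6	Ζ	ζ	zeta	zat	截塔	coefficient; azimuth; impedance; relative viscosity; atomic number
7	Η	η	eta	eit	艾塔	hysteresis coefficient; efficiency (lowercase)
8	Θ	θ	theta	θit	西塔	temperature; phase angle
9	Ι	ι	iota	aiot	约塔	tiny, a little bit
10	Κ	κ	kappa	kap	卡帕	dielectric constant
11	Λ	λ	lambda	lambd	兰布达	wavelength (lowercase); volume
12	Μ	μ	mu	mju	缪	permeability; micro (one millionth); amplification factor (lowercase)
13	Ν	ν	nu	nju	纽	reluctivity
14	Ξ	ξ	xi	ksi	克西	random variable (in mathematics)
15	Ο	ο	omicron	omikron	奥密克戎
16	Π	π	pi	pai	派	ratio of circumference to diameter = 3.14159 26535 89793
17	Ρ	ρ	rho	rou	肉	resistivity (lowercase)
18	Σ	σ	sigma	`sigma`	西格马	summation (uppercase); surface density; transconductance (lowercase)
19	Τ	τ	tau	tau	套	time constant
20	Υ	υ	upsilon	jupsilon	伊普西龙	displacement
21	Φ	φ	phi	fai	佛爱	magnetic flux; angle
22	Χ	χ	chi	phai	西
23	Ψ	ψ	psi	psai	普西	angular velocity; dielectric flux (electrostatic lines of force); angle
24	Ω	ω	omega	o`miga	欧米伽	ohm (uppercase); angular velocity (lowercase); angle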

Category: Math
