Uncategorized – Page 3

DIP Image Segmentation

Thresholding

Origin:

Basic Global Thresholding:

Mean:

Median:

Optimum Global Thresholding Using Otsu’s Method:

Kapur:

Kittler–Illingworth:

自适应中值滤波 adaptive median filter

四图的顺序是原图，0.25椒盐噪声污染后图，7×7中值滤波，Smax=7自适应中值滤波。

效果太好

直方图均衡化 Histogram equalization

课程作业直方图均衡化——维基百科

原图 http://imgur.com/IekvLAJ

均衡化处理前

均衡化处理后

灰度，均衡化处理前

灰度，均衡化处理后

还是挺有用的

国内NTP！

203.195.247.192 / ntp-gz.chl.la

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+time-a.nist.gov .ACTS.           1 u  117  128  375  219.880   -3.162   9.807
+time-b.nist.gov .ACTS.           1 u   63  128  377  212.481    4.291  10.382
+time-c.nist.gov .ACTS.           1 u   72  128  373  221.437   -4.065  15.729
+time-d.nist.gov .ACTS.           1 u  123  128  377  224.083   -4.272  13.544
+24.56.178.140   .ACTS.           1 u   80  128  377  190.612   -7.805   4.921
x128.138.141.172 .ACTS.           1 u  114  128  377  316.636   48.359   8.748
+131.107.13.100  .ACTS.           1 u  251  128  376  164.240  -11.402   8.598
+223.255.185.2   .MRS.            1 u  855  128  100   12.286    5.916  11.465
+ntp-a2.nict.go. .NICT.           1 u   14  128  173   54.171    4.101   7.690
#ntp0.nc.u-tokyo .GPS.            1 u  500  128  370  115.024  -35.263   8.081
+ntp-sop.inria.f .GPS.            1 u   14  128  253  222.405   -2.242  23.575
+ntp-p1.obspm.fr .TS-3.           1 u  114  128  377  317.704   -1.992  14.829
*ts0.itsc.cuhk.e .GPS.            1 u   43  128  377    6.798    4.658   6.391
+ntp.ix.ru       .PPS.            1 u  124  128  373  424.717   -9.539  10.470
+ntp.neu.edu.cn  95.222.122.210   2 u    8  128  377   52.216    1.041   5.245

remote refid st t when poll reach delay offset jitter

==============================================================================

+time-a.nist.gov .ACTS. 1 u 117 128 375 219.880 -3.162 9.807

+time-b.nist.gov .ACTS. 1 u 63 128 377 212.481 4.291 10.382

+time-c.nist.gov .ACTS. 1 u 72 128 373 221.437 -4.065 15.729

+time-d.nist.gov .ACTS. 1 u 123 128 377 224.083 -4.272 13.544

+24.56.178.140 .ACTS. 1 u 80 128 377 190.612 -7.805 4.921

x128.138.141.172 .ACTS. 1 u 114 128 377 316.636 48.359 8.748

+131.107.13.100 .ACTS. 1 u 251 128 376 164.240 -11.402 8.598

+223.255.185.2 .MRS. 1 u 855 128 100 12.286 5.916 11.465

+ntp-a2.nict.go. .NICT. 1 u 14 128 173 54.171 4.101 7.690

#ntp0.nc.u-tokyo .GPS. 1 u 500 128 370 115.024 -35.263 8.081

+ntp-sop.inria.f .GPS. 1 u 14 128 253 222.405 -2.242 23.575

+ntp-p1.obspm.fr .TS-3. 1 u 114 128 377 317.704 -1.992 14.829

*ts0.itsc.cuhk.e .GPS. 1 u 43 128 377 6.798 4.658 6.391

+ntp.ix.ru .PPS. 1 u 124 128 373 424.717 -9.539 10.470

+ntp.neu.edu.cn 95.222.122.210 2 u 8 128 377 52.216 1.041 5.245

120.24.166.46 / ntp-sz.chl.la

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        .LOCL.          10 l  46h   64    0    0.000    0.000   0.000
#time-a.nist.gov .ACTS.           1 u   15  256  377  378.818  -83.364  17.463
#time-b.nist.gov .ACTS.           1 u   70  256  277  476.310  -41.924  35.146
#time-c.nist.gov .ACTS.           1 u    7  256  377  395.963  -43.723  34.185
#time-d.nist.gov .ACTS.           1 u   12  256  377  483.643  -34.337  30.085
#24.56.178.140   .ACTS.           1 u  221  256  371  291.978   50.514  46.840
-131.107.13.100  .ACTS.           1 u  136  256  377  177.669   -6.028 139.785
*118.143.17.82   .MRS.            1 u  188  256  357   13.079    0.081   0.350
+ntp-b3.nict.go. .NICT.           1 u  142  256  377   63.631   -0.085   0.274
+ntp0.nc.u-tokyo .GPS.            1 u  161  256  377   63.430   -0.350  24.636
-ntp-sop.inria.f .GPS.            1 u   20  256  377  243.417    2.893   1.610
-ntp-p1.obspm.fr .TS-3.           1 u  113  256  377  302.075   -4.333  37.451
#ts0.itsc.cuhk.e .GPS.            1 u  151  256  365   75.393   32.617  54.784
-ntp.ix.ru       .PPS.            1 u  211  256  377  414.779    5.133   9.858
-ntp.neu.edu.cn  .PPS.            1 u  150  256  377   65.779    0.803   7.601
#ptbtime1.ptb.de .PTB.            1 u   63  256  377  354.657   20.788  16.343
-ptbtime2.ptb.de .PTB.            1 u   13  256  377  333.716    8.016   1.129
-ptbtime3.ptb.de .PTB.            1 u   59  256  377  348.999    1.459  19.869

remote refid st t when poll reach delay offset jitter

==============================================================================

LOCAL(0) .LOCL. 10 l 46h 64 0 0.000 0.000 0.000

#time-a.nist.gov .ACTS. 1 u 15 256 377 378.818 -83.364 17.463

#time-b.nist.gov .ACTS. 1 u 70 256 277 476.310 -41.924 35.146

#time-c.nist.gov .ACTS. 1 u 7 256 377 395.963 -43.723 34.185

#time-d.nist.gov .ACTS. 1 u 12 256 377 483.643 -34.337 30.085

#24.56.178.140 .ACTS. 1 u 221 256 371 291.978 50.514 46.840

-131.107.13.100 .ACTS. 1 u 136 256 377 177.669 -6.028 139.785

*118.143.17.82 .MRS. 1 u 188 256 357 13.079 0.081 0.350

+ntp-b3.nict.go. .NICT. 1 u 142 256 377 63.631 -0.085 0.274

+ntp0.nc.u-tokyo .GPS. 1 u 161 256 377 63.430 -0.350 24.636

-ntp-sop.inria.f .GPS. 1 u 20 256 377 243.417 2.893 1.610

-ntp-p1.obspm.fr .TS-3. 1 u 113 256 377 302.075 -4.333 37.451

#ts0.itsc.cuhk.e .GPS. 1 u 151 256 365 75.393 32.617 54.784

-ntp.ix.ru .PPS. 1 u 211 256 377 414.779 5.133 9.858

-ntp.neu.edu.cn .PPS. 1 u 150 256 377 65.779 0.803 7.601

#ptbtime1.ptb.de .PTB. 1 u 63 256 377 354.657 20.788 16.343

-ptbtime2.ptb.de .PTB. 1 u 13 256 377 333.716 8.016 1.129

-ptbtime3.ptb.de .PTB. 1 u 59 256 377 348.999 1.459 19.869

TheB

用Scrapy及Gensim对哔哩哔哩弹幕网的标签进行Word2Vec语义分析

本来是课程作业, 但也还是放出来吧.

Scrapy是一个爬虫，Gensim则是一个语义分析软件。Word2Vec是一个“深度学习”的，将一个个单词变为一个个向量的算法。

实际操作过程很简单, Scrapy抓Bilibili, Gensim对结果做Word2Vec分析, 然后用Tkinter写UI界面.

import codecs
import scrapy
from b.items import CHL
import re

class MySpider(scrapy.Spider):
    name = 'bili'
    allowed_domains = ['www.bilibili.com']
    amy = ['http://www.bilibili.com/']
    for i in range(7,2878725):
        amy.append("http://www.bilibili.com/video/av"+str(i))
    start_urls = amy
    
    def parse(self, response):
        item = CHL();
        try:
            item['tag'] = response.xpath('//meta[@name=\'keywords\']/@content').extract()[0]
            thestr = item['tag'].replace(',',' ') + '\n'
            with codecs.open('2878725', 'a', 'utf-8') as f:
                f.write(thestr)
        except IndexError:
            pass
        return item

import codecs

import scrapy

from b.items import CHL

import re

class MySpider(scrapy.Spider):

name = 'bili'

allowed_domains = ['www.bilibili.com']

amy = ['http://www.bilibili.com/']

for i in range(7,2878725):

amy.append("http://www.bilibili.com/video/av"+str(i))

start_urls = amy

def parse(self, response):

item = CHL();

try:

item['tag'] = response.xpath('//meta[@name=\'keywords\']/@content').extract()[0]

thestr = item['tag'].replace(',',' ') + '\n'

with codecs.open('2878725', 'a', 'utf-8') as f:

f.write(thestr)

except IndexError:

pass

return item

因为前几天上Bili发现编号只到2878725，所以就到2878725了。作业的版本是用CrawlSpider的，但其实B站视频编号连续，顺序爬就可以。顺序爬的速度比用爬虫爬快好多，而且占用资源也少。之前爬虫爬到结果大概不足100M，Linode那VPS就已经内存不足了，现在完整结果有316M。

由于有很多投稿失效或者是”只有会员知道的世界”,爬完后的结果里面,会有很多B站的默认Tag.为避免影响结果,要删去.

grep -v 'B站 弹幕 字幕 AMV MAD MTV ANIME 动漫 动漫音乐 游戏 游戏解说 ACG galgame 动画 番组 新番 初音 洛天依 vocaloid' 2878725 > 2878725fl

1	grep -v 'B站弹幕字幕 AMV MAD MTV ANIME 动漫动漫音乐游戏游戏解说 ACG galgame 动画番组新番初音洛天依 vocaloid' 2878725 > 2878725fl

2878725fl大约比2878725小一半…….

下一步之前还要把2878725fl改个名叫input

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import gensim, logging, codecs
class MySentences(object):
    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for line in codecs.open(self.dirname, 'r', 'utf-8'):
            yield line.split()

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences = MySentences('input')
model = gensim.models.Word2Vec(sentences, size=400, workers=24, min_count=3)
model.save('w2v')
model.init_sims(replace=True)
model.save('w2v.trim')

#!/usr/bin/env python

# -*- coding: utf-8 -*-

import gensim, logging, codecs

class MySentences(object):

def __init__(self, dirname):

self.dirname = dirname

def __iter__(self):

for line in codecs.open(self.dirname, 'r', 'utf-8'):

yield line.split()

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

sentences = MySentences('input')

model = gensim.models.Word2Vec(sentences, size=400, workers=24, min_count=3)

model.save('w2v')

model.init_sims(replace=True)

model.save('w2v.trim')

由于init_sims能极大减少内存占用，但却会令模型不能继续训练，所以就开启前后分别保存了一份，以备不时之需。这里的Size取了400，400就是向量的维度。

8192维的耗时

training on 19973050 raw words took 1883.2s, 10036 trained words/s

1	training on 19973050 raw words took 1883.2s, 10036 trained words/s

400维的耗时

training on 19973050 raw words took 153.5s, 123083 trained words/s

1	training on 19973050 raw words took 153.5s, 123083 trained words/s

到准备重现UI时，发现32位的Python吃不了那么大的数据（1024维），去搞64bit的，所有包都要重新装….

http://www.lfd.uci.edu/~gohlke/pythonlibs/
上面这个网址有编译好的windows包可以直接用

举个栗子，下载numpy, 先把
numpy-1.9.2+mkl-cp27-none-win_amd64.whl下下来
然后

C:\Users\CHL\Downloads>pip install "numpy-1.9.2+mkl-cp27-none-win_amd64.whl"
Unpacking c:\users\chl\downloads\numpy-1.9.2+mkl-cp27-none-win_amd64.whl
Installing collected packages: numpy
Successfully installed numpy
Cleaning up...

C:\Users\CHL\Downloads>

C:\Users\CHL\Downloads>pip install "numpy-1.9.2+mkl-cp27-none-win_amd64.whl"

Unpacking c:\users\chl\downloads\numpy-1.9.2+mkl-cp27-none-win_amd64.whl

Installing collected packages: numpy

Successfully installed numpy

Cleaning up...

C:\Users\CHL\Downloads>

搞定，然后就Scipy和Gensim，同理。

今天中大东校区IPv6废了，网速直线下降（我一直挂着v6跑）

下面就是TheB跑出来的结果截图

相似查询概念很简单，就是找出向量距离近（相关度高）的结果

类比查询则比较有意思，一个简单的例子就是，queen对于girl相当于king对于boy。运算: boy=girl-queen+king。

TheB类比查询截图6 — TheB类比查询，MC和敖厂长的关系，相当于GTA5和老E等人的关系…

TheB类比查询截图7 — TheB类比查询，MC和敖厂长的关系，相当于元首和张全蛋等人的关系….

TheB类比查询截图8 — TheB类比查询，老E和敖厂长的关系，相当于吃素的狮子和伊丽莎白鼠的关系….

TheB就是这样了

我的PDF其实是用word生成的,为了在Word中加入高光,可以用以下这个网址
https://tohtml.com/python/
贴过来后将字体换成Consolas, 完美.

pdf

MFC会在窗口显示前调用OnSize

窗口未显示，控件未生成。所以如果在OnSize中使用了GetDlgItem, 要注意返回是否为空指针。似乎在64位Win7上，程序会吞下这个异常，而在32位Win7，程序将会在启动时崩溃。

在Ubuntu 14.04 64bit 上装 ns-3.23

首先准备环境

sudo apt-get install gcc g++ python python-dev qt4-dev-tools mercurial bzr cmake libc6-dev libc6-dev-i386 g++-multilib gdb valgrind gsl-bin libgsl0-dev libgsl0ldbl flex bison libfl-dev tcpdump sqlite sqlite3 libsqlite3-dev libxml2 libxml2-dev libgtk2.0-0 libgtk2.0-dev vtun lxc libboost-signals-dev libboost-filesystem-dev

sudo apt-get install gcc g++ python python-dev qt4-dev-tools mercurial bzr cmake libc6-dev libc6-dev-i386 g++-multilib gdb valgrind gsl-bin libgsl0-dev libgsl0ldbl flex bison libfl-dev tcpdump sqlite sqlite3 libsqlite3-dev libxml2 libxml2-dev libgtk2.0-0 libgtk2.0-dev vtun lxc libboost-signals-dev libboost-filesystem-dev

然后Build

./build.py --enable-examples --enable-tests

1	./build.py --enable-examples --enable-tests

然后（可选）waf设置参数

./waf clean

1	./waf clean

上面这句话会清空前面的生成

./waf --build-profile=optimized --enable-examples --enable-tests configure

1	./waf --build-profile=optimized --enable-examples --enable-tests configure

配置生成，以生成optimized的，带有examples和tests的ns-3

./waf

./waf

生成。

比urandom快到不知道那里去的覆盖磁盘用随机数生成器

openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" < /dev/zero > clean

1	openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null \| base64)" < /dev/zero > clean

Source

去掉了-nosalt

Category: Uncategorized