Born Too Late

Ruby で BinaryTree (二分木) を実装してみた

2011-02-06 23:11:29

この記事は個人的なメモです.

を読んでいて, 構造体の章に出てきた BinaryTree がよくわからないので, とりあえず Ruby で実装してみた.
単語をキーに, その回数を数えるだけの単純なプログラム.

BinaryTree::Node クラス

ツリー構造のノードを表すクラス.
単語, 回数, そして左/右のノードをプロパティとして持ちます.

RSpec

せっかくなので BDD (振る舞い駆動開発) の練習.

実行するとこのようになります.

$ rspec binarytree_spec.rb --format doc
BinaryTree::Node
  with word "foo" and no children
    it should behave like Node with word "foo"
      word
        should == "foo"
    it should behave like Node with no children
      left
        should be nil
      right
        should be nil
    size
      should == 1
    count
      should == 1
    words
      should == ["foo"]
    count_all
      should == 1
  with word "foo" and inserted a node with word "bar"
    it should behave like Node with word "foo"
      word
        should == "foo"
    it should behave like Node with one child with no children
      size
        should == 2
    right
      should be nil
    words
      should == ["bar", "foo"]
    count_all
      should == 2
    its child on the left
      it should behave like Node with no children
        left
          should be nil
        right
          should be nil
      word
        should == "bar"
      size
        should == 1
      count
        should == 1
      count_all
        should == 1
  with word "foo" and inserted a node with word "hoge"
    it should behave like Node with word "foo"
      word
        should == "foo"
    it should behave like Node with one child with no children
      size
        should == 2
    left
      should be nil
    words
      should == ["foo", "hoge"]
    count_all
      should == 2
    its child on the right
      it should behave like Node with no children
        left
          should be nil
        right
          should be nil
      word
        should == "hoge"
      size
        should == 1
      count
        should == 1
      count_all
        should == 1
  with word "foo" and inserted a node with word "foo"
    it should behave like Node with word "foo"
      word
        should == "foo"
    it should behave like Node with no children
      left
        should be nil
      right
        should be nil
    count
      should == 2
    words
      should == ["foo"]
    count_all
      should == 2
  with word "foo" and inserted a node with word "bar" 3 times
    it should behave like Node with word "foo"
      word
        should == "foo"
    right
      should be nil
    count
      should == 1
    size
      should == 2
    words
      should == ["bar", "foo"]
    count_all
      should == 4
    its child on the left
      word
        should == "bar"
      size
        should == 1
      count
        should == 3
      count_all
        should == 3
  with word "foo" and inserted a node with word "hoge" 3 times
    it should behave like Node with word "foo"
      word
        should == "foo"
    left
      should be nil
    count
      should == 1
    size
      should == 2
    words
      should == ["foo", "hoge"]
    count_all
      should == 4
    its child on the right
      word
        should == "hoge"
      size
        should == 1
      count
        should == 3
      count_all
        should == 3
  with word "a" and inserted nodes with word "b" ~ "j"
    size
      should == 10
    words
      should == ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

Finished in 0.03146 seconds
57 examples, 0 failures

いやー, RSpec って楽しいですね.
PHP にもこういうの欲しい.

実行してみる

"to be or not to be" というフレーズ中に含まれる単語と, その回数を計算します.
結果は以下のとおり.

$ ruby example.rb
be (2)
not (1)
or (1)
to (2)

4 words.
6 nodes.

今度は C/C++ でも実装してみよう.
そして BTree や B+Tree についても勉強してみよう.

というわけで今日はここまで.

PHP でオブジェクトの集合を扱うなら Array より SplObjectStorage

2011-02-06 22:42:24

1 年半ほど前の記事ですが.

A Set of Objects in PHP: Arrays vs. SplObjectStorage

この記事を簡単にまとめると, 以下のような感じです.

PHP でオブジェクトのユニークな集合を扱う方法には, 普通の Array を使う方法と, SplObjectStorage を使う方法がある.
Array を使う場合は, spl_object_hash() でオブジェクトのハッシュを取り, それをキーに, オブジェクトを値にしたハッシュ変数として実装できる.
SplObjectStorage は, PHP 5.2 以降に搭載された便利なコンテナクラスである.
SplObjectStorage は, オブジェクトをユニークに格納することを保証する.
SplObjectStorege は, C により実装されているので高速である. (ベンチマークでもそういう結果が出た)
メモリの使用量においても, SplObjectStorage の方が少なくて済む.

というわけで, 手製のベンチマーキングフレームワークを使って, 自分でも計測してみました.

コンテナへの挿入

コンテナに 10,000 のユニークなオブジェクトを挿入する, というのを 100 回ずつ繰り返し, 計測してみます.

結果は以下のとおり.

array():          2.8187828sec
SplObjectStorage: 2.5617448sec

割と近い数値ですが, SplObjectStorage の方が速いようです.

オブジェクトの存在確認

10,000 個のオブジェクトを格納したコンテナを生成し, 新たに作ったオブジェクトの存在確認を 10,000 回行う.
つまり, 「存在しない値」なので, 全体に対してチェックが走ることになります.
とはいっても, 何らかのデータ構造を利用するはずなので 10,000 回のチェックが走るわけでは無いと思いますが.

そして実行結果.

array():          0.0226268sec
SplObjectStorage: 0.0183608sec

こちらも SplObjectStorage に軍配.

コンテナのイテレーション

コンテナに 10,000 個のオブジェクトを格納し, それらを foreach でイテレーションする, というのを 100 回繰り返す.

結果は以下のとおり.

array():          0.6820778sec
SplObjectStorage: 1.8508288sec

これについては Array が圧倒的な速さを見せました !

まとめ

PHP でオブジェクトのユニークな集合を扱うコンテナは, SplObjectStorage が便利.
SplObjectStorage は格納/存在確認ともに速い.
ただし, イテレーションは Array の方が速い.
最終的にどっちを使うのが速いかはアプリケーションに依存する. (と思います)

個人的には, Observer Pattern の実装をするときに, Observer を格納するコンテナとしてよく利用します.
API もシンプルなので, まだ使ってない人も, 試してみてはいかがでしょう.

PHP 用ベンチマーキングフレームワーク Joshimane というのを作った

2011-02-06 20:02:33

あくまでもプロトタイプですが.

gist: 813246 - Joshimane: A Tiny PHP Benchmarking Framework... But it's just a prototype yet.- GitHub

プロトタイプなので, API は大幅に変わると思います.

使用イメージ

以下のようにして使用します.

これを実行すると, 以下のような結果が得られます.

$ php example.php
Job A: 0.52975106
Job B: 0.26065207

一応補足すると, Job A ではループ時に毎回 (つまり 100 回) count($range) が実行されるのに対して, Job B ではループの開始前に一度だけしか count($range) が実行されないので, これだけの差が出ます.

名前について

女子マネです.

なんかこう... 陸上部の女子マネージャが測ってくれているんだ ! っていう気にさせる名前にしたいけど, それを即思いつくだけの語彙を持たないので辛い.less than a minute ago via TweetDeckYuya Takeyama
yuya_takeyama

@yuya_takeyama そのまま Joshimane とかｗless than a minute ago via モバツイheavenshell
heavenshell

作成動機

PEAR の Benchmark はいい加減レガシー臭が...
$obj =& new Foo; とかやめたいですよね.
テスト対象のジョブ間の依存関係を無くしたい.
無名関数を使うことで, その中で生成した変数はローカル変数となり, Job A のせいで Job B がおかしなことに ! ということはありません. (多分)
10,000 回実行してその平均時間を取得... とかいうのはフレームワークにやらせたい.
ある程度繰り返し実行しないと, ベンチマーク結果として信頼できないでしょう.
これも無名関数で実現しています.

問題点

コンテキスト依存のベンチマークが取り辛い.
例えば既にバッチ処理を行うスクリプトがあって, その中でどこに時間がかかっているのか, というのを計測するのには向きません.
そういうときは PEAR の Benchmark をスクリプト中のあちこちに埋め込む方が簡単かもしれません.
しかし, Joshimane としては, 単体テスト性を意識してこうなっているので, これは気にしない方向です.

課題

ベンチマーク結果をより詳細なものにする.
今回は N 回実行時の総経過時間のみとなっていますが, 平均時間や, メモリ使用量なんかも取れるといいのではと思っています.
フォーマッタの実装.
ベンチマーク結果を見やすく整形してくれるフォーマッタがあると便利でしょう.
プラガブルな感じで, CSV や XML の出力にも対応するのが理想.
TestSuite 方式の実装.
要はテスト対象のジョブのグループ化です.
そのためには, ジョブごとに ID なり表題なりをつけて, 識別するための仕組みが必要でしょう.
ただし, これについては実装するか迷い中.
より実装をモダンに.
とりあえずは名前空間を使用して階層化しようと思っています.
普段の仕事が PHP 5.2 ということもあり, 構文等を思い出すのに時間がかかるので, プロトタイプの段階では使用しませんでした.
命名.
このままだと Openpear に登録する上ではさすがにアレだと思うので, もっと一般的な名前にする方向で.
Joshimane はあくまでもコードネームということで...
(良い名前をつけることの重要性はでも, 森田創さんとまつもとゆきひろさんに言及されていますね.)

気が向いたときにでも少しずつ開発を進めようと思います.