nunojob:~ dscape/08$ echo The Black Sheep

Posts tagged ‘Database’

IBM DB2 Express-C for Mac

DB2 for Mac

It's official. The FREE version of DB2 is available for download for Mac.

No more excuses like "I don't want yet another virtual machine just to run that, so I won't even try it."

I know I'm biased, since I'm part of the DB2 team. The analysis I make here is heavily influenced by my day-to-day work, but what I write here is my personal opinion.

IBM doesn't build DB2 for people like us, who run some nice sites with a few thousand daily hits (with luck). They build it to sustain giant-scale solutions, some with heavy XML standards, for government, financial and health-care organizations, etc., that transact enormous amounts of information every day. These companies not only have to mine the data but also query it very intensively. I'm talking about the largest American companies, and I'm not just repeating what I've read. I heard it straight from DBAs at Merrill Lynch, Barclays, the UN, Morgan Stanley, etc. What do they have in common? They all use DB2 and are interested in using the product's XML features.

By the way, nobody believes XML can perform well, right? Well, IBM has smart people (like me, lol) working on making that possible. I leave this link to whet your appetite. Of course the performance won't be the same as SQL, but compared to the XML parsers you've been using… hehe. Try it. :P

Having described DB2's typical customer, it's easy to see that it isn't made to be sold to José or Joaquim. Not even to Josefina's small business. That is exactly why the Express-C version is free for everyone. The limits are a maximum of 16 GB of RAM and 4 processors on the machine.

If this sounds reasonable:

DB2 for Mac Download

Afterwards, tell me how it went, and if you need some tips you can always get in touch.

Footnote: for those interested, if you are developing something with an obscure XML standard, chances are good that IBM supports it, and you can check here.

Mondrian Multidimensional K-Anonymity in Ruby

Article: Mondrian Multidimensional K-Anonymity

Lame Ruby Implementation:

# ==================================================================================
# anonymization: group.rb
# ==================================================================================
ENVIRONMENT = 'release' # set to 'debug' to enable tracing

require 'set'
require 'rubygems'
require 'ruby-debug' if ENVIRONMENT == 'debug'

# ==================================================================================
# class group
#
# usage:
#   require 'group'
#
#   g = Group.new quasi_ids, filename
#   g.anonymize k
#
# example:
#
#   lefevre.db
#
#      0            2       <-- quasi_ids
#
#   |age|  sex  | zipc | disease      |
#---+---+-------+------+--------------+--
# 0 | 25  Male    53711  Flu          |
# 1 | 25  Female  53712  Hepatitis    |
# 2 | 26  Male    53711  Bronchitis   |
# 3 | 27  Male    53710  Broken_Arm   |
# 4 | 27  Female  53712  AIDS         |
# 5 | 28  Male    53711  Hang_Nail    |
#---+---+-------+------+--------------+--
#
# irb
# >> require 'group'
# >> g = Group.new [0,2], 'lefevre.db'
# >> g.anonymize 2, 'degen'
# ==================================================================================
class Group
  # create a setter method for @tuples, @filename
  # so that g.tuples = x works
  attr_writer :tuples, :filename

  @@debug = { 'best_attribute' => ENVIRONMENT == 'debug',
              'intersection'   => ENVIRONMENT == 'debug',
              'split'          => ENVIRONMENT == 'debug',
              'ordering'       => ENVIRONMENT == 'debug',
              'vars'           => ENVIRONMENT == 'debug',
              'args'           => ENVIRONMENT == 'debug'
            }
  # ================================================================================
  # to create a new group with Group.new
  # ================================================================================
  # needs to remove the full_ids from the read.
  def initialize(quasi_ids, filename, depth=0, available_ids=nil)
    # if no valid attributes are given quasi are used
    # (cloned, so that deleting from @available_ids leaves @quasi_ids intact)
    available_ids = quasi_ids.clone if available_ids.nil?

    # initialize the instance vars
    @tuples = []
    @quasi_ids = quasi_ids
    @available_ids = available_ids
    @depth = depth

    # '*wc' serves as wildcard so that no file is read on recursion
    @filename = (filename == '*wc') ? nil : filename

    if @@debug['args'] and @depth == 0
      debug_puts "args : file => #{@filename}"
      debug_puts "args : quasi_ids => #{@quasi_ids.to_s}"
    end

    # run the read and backup procedures
    read
  end

  # ================================================================================
  # anonymization
  # ================================================================================
  def anonymize(k, heuristic='degen', partial_order=[])

    if @@debug['vars']
      #debug_puts "dvars : @tuples #{@tuples}"
      debug_puts "dvars : @available_ids #{@available_ids},"
      debug_puts "dvars : @depth #{@depth}"
    end

    # stop case
    if isnt_splittable? k
      debug_puts "dsplit: no split available for k-level #{k} with size" +
        " #{@tuples.size}" if @@debug['split']

      # sort and generalize remaining attributes
      @available_ids.each do |attribute|
        sort attribute
        generalize attribute
      end

      # exit
      return
    end

    # where and in what attribute should we split
    # these functions have a heavy effect on the usefulness of the information
    # for the k-anonymity table
    split_attribute = find_split_attribute @available_ids, heuristic, partial_order
    split_pos = find_split_position split_attribute

    # create the groups for the
    # recursion
    group1 = Group.new @quasi_ids, '*wc', @depth + 1, @available_ids.clone
    group2 = Group.new @quasi_ids, '*wc', @depth + 1, @available_ids.clone

    # split at the given position
    split split_pos, group1, group2

    if split_groups_satisfy_k_anonymity?(k,group1,group2)

      debug_puts "dsplit: no more split available with attribute" +
        " #{split_attribute} (g1: #{group1.size}, g2: #{group2.size})" if @@debug['split']

      # generalize by split_attribute and then remove it from the available
      # attributes array
      generalize split_attribute
      @available_ids.delete split_attribute

      # anonymize remaining available attributes
      anonymize k, heuristic, partial_order

    else # splitting successful
      debug_puts "dsplit: splitting on attribute #{split_attribute} at" +
        " position #{split_pos} of #{@tuples.size}" if @@debug['split']

      # assign the two groups to this instance
      @group1 = group1
      @group2 = group2

      group1.anonymize k, heuristic, partial_order
      group2.anonymize k, heuristic, partial_order

      #@tuples = []
    end
  end

  # ================================================================================
  # io and backup related
  # ================================================================================
  # read @tuples from @filename
  def read
    unless @filename.nil?
      f = File.open @filename
      f.each_line do |line|
        @tuples << line.rstrip.split("\t\t")
      end
      f.close
    end
  end

  # reset the class to reuse
  def reset
    # start again from the full list of quasi identifiers
    @available_ids = @quasi_ids.clone
    @tuples = []
    read
  end

  # ================================================================================
  # overrides
  # ================================================================================
  # number of tuples
  def size
    @tuples.size
  end

  # ================================================================================
  # aux
  # ================================================================================
  # to_s
  def to_s
    str = ""
    unless @tuples.empty?
      @tuples.each do |line|
        @tuples[0].size.times { |i| str << line[i].to_s + "\t\t" }
        str << "\n"
      end
    end
    str
  end

  # shows a yaml representation of internal object
  def to_y
    require 'yaml'
    puts to_yaml
  end

  private
  def debug_puts(message)
    ident = ''
    @depth.times { |i| ident += " " }
    puts ident + message
  end

  # ================================================================================
  # aux for anonymization
  # ================================================================================
  # finds the attribute with the largest range.
  # According to LeFevre this is a good
  # heuristic to find the attribute on which to split
  def find_split_attribute(attributes_list, heuristic, partial_order)
    debug_puts "dorder: choosing from" +
      " #{attributes_list.to_s}" if @@debug['ordering']

    best_attrib = -1
    best_attrib_count = 0.0

    attributes_list = find_minimal_elements partial_order, attributes_list

    debug_puts "dorder: minimal list is" +
      " #{attributes_list.to_s}" if @@debug['ordering']

    attributes_list.each do |attribute|
      values = @tuples.map { |t| t[attribute] }.to_set

      # degen heuristic: split on the attribute that had more degeneracy
      # (tuples per distinct value), so ratios are compared with ratios
      if heuristic == 'degen'
        degeneracy = @tuples.size.to_f / values.size.to_f
        if degeneracy > best_attrib_count or best_attrib == -1
          best_attrib = attribute
          best_attrib_count = degeneracy
        end
      # single heuristic: split on the attribute with the fewest distinct values
      elsif heuristic == 'single'
        if values.size < best_attrib_count or best_attrib == -1
          best_attrib = attribute
          best_attrib_count = values.size
        end
      else # default: split on the attribute with the most distinct values
        if values.size > best_attrib_count
          best_attrib = attribute
          best_attrib_count = values.size
        end
      end
    end

    debug_puts "dbest : best attribute is #{best_attrib} with" +
      " count #{best_attrib_count}" if @@debug['best_attribute']

    return best_attrib
  end

  # returns the position of the leftmost or rightmost median element.
  # used to split in lhs and rhs
  def find_split_position(attribute_id)
    sort attribute_id

    median_pos = @tuples.size / 2
    median = @tuples[median_pos][attribute_id]

    split_pos_high = median_pos
    split_pos_low = median_pos

    # split point corresponds to highest index that has median value
    split_pos_high += 1 while (@tuples.size >= split_pos_high + 2) and
      (@tuples[split_pos_high + 1][attribute_id] == median)

    high_smaller_group_size =
      [split_pos_high + 1, @tuples.size - split_pos_high - 1].min

    # split point corresponds to lowest index that has median value
    split_pos_low -= 1 while (split_pos_low > 1) and
      (@tuples[split_pos_low - 1][attribute_id] == median)

    low_smaller_group_size =
      [split_pos_low, @tuples.size - split_pos_low].min

    # choose the one with the largest group
    if high_smaller_group_size > low_smaller_group_size
      split_pos = split_pos_high
    else
      split_pos = split_pos_low - 1
    end

    return split_pos
  end

  # finds minimal elements from the list of the given attribute list according to
  # partial order specified in partial_order. partial_order contains all complete chains.
  def find_minimal_elements(partial_order, possible_elements)

    if partial_order.empty?
      debug_puts "dorder: no ordering specified" if @@debug['ordering']

      return possible_elements
    end

    # choose all possible_elements that aren't in partial_order
    # those are minimal
    minimal_list = possible_elements.select { |element| !partial_order.flatten.member?(element) }

    # haskell goodies ^^
    # restrict partial_order to values in possible_elements
    restricted_partial_order = partial_order.map { |l| l.select { |element| possible_elements.member?(element) } }

    if @@debug['ordering']
      debug_puts "dorder: possible_elements list is" +
        " #{possible_elements.to_s}"
      debug_puts "dorder: partial_order list is" +
        " #{partial_order.to_s}"
      debug_puts "dorder: restricted_partial_order is" +
        " #{restricted_partial_order.to_s}"
    end

    non_zero_chains = restricted_partial_order.select { |chain| not chain.empty? }

    non_zero_chains.each do |c|
      candidate = c[0]

      # minimal if no chain contains the candidate anywhere but at its head
      minimal = restricted_partial_order.none? do |chain|
        chain.member?(candidate) && chain[0] != candidate
      end

      if minimal and not minimal_list.member?(candidate)
        minimal_list << candidate
      end
    end

    return minimal_list
  end

  # replaces attribute value with generalization that covers all tuples.
  # Expects tuples to be sorted by attribute.
  def generalize(attribute)
    min_val = @tuples[0][attribute]
    max_val = @tuples[-1][attribute]

    unless min_val == max_val
      @tuples.each do |t|
        t[attribute] = [min_val, max_val]
      end
    end
  end

  def split(split_pos, group1, group2)
    group1.tuples = @tuples[0..split_pos]
    group2.tuples = @tuples[split_pos+1..@tuples.size]
  end

  def sort(attribute)
    @tuples = @tuples.sort_by { |t| t[attribute] }
  end

  # ================================================================================
  # verbose conditions
  # ================================================================================
  def isnt_splittable?(k)
    k < 2 or group_cant_be_split_for_level?(k) or no_split_attributes_are_available?
  end

  def group_cant_be_split_for_level?(k)
    @tuples.size < 2*k
  end

  def no_split_attributes_are_available?
    @available_ids.empty?
  end

  # despite the name, this is true when either half would hold fewer than k
  # tuples, i.e. when the proposed split has to be rejected
  def split_groups_satisfy_k_anonymity?(k,group1,group2)
    group1.size < k or group2.size < k
  end
end

# hack on array to display lists correctly
class Array
  def to_s
    "[" + self.join(',') + "]"
  end
end
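
For readers who just want the core idea without the file handling and debug plumbing, here is a stripped-down sketch of the same recursion: pick an attribute, cut at the median, recurse while both halves keep at least k rows, then replace each quasi-identifier with the [min, max] range of its partition. It is not the Group class above. The names mondrian and generalize_partition are made up for this illustration, the cut is a strict "less than the median" split, and the attribute choice is simply "most distinct values first" (the default branch of find_split_attribute). The data is the lefevre.db example from the header comment, with numeric columns kept as integers.

```ruby
# Simplified Mondrian-style partitioning (illustrative sketch only).
# `mondrian` and `generalize_partition` are hypothetical helper names.

# Recursively cuts `rows` at the median of one quasi-identifier, as long as
# both sides of the cut keep at least k rows. Returns a list of partitions.
def mondrian(rows, quasi_ids, k)
  # try the attribute with the most distinct values first (ties: lower index)
  candidates = quasi_ids.sort_by { |a| [-rows.map { |r| r[a] }.uniq.size, a] }

  candidates.each do |attr|
    sorted = rows.sort_by { |r| r[attr] }
    median = sorted[sorted.size / 2][attr]
    lhs, rhs = sorted.partition { |r| r[attr] < median }

    # a cut is only allowed if both halves stay k-anonymous
    next if lhs.size < k || rhs.size < k

    return mondrian(lhs, quasi_ids, k) + mondrian(rhs, quasi_ids, k)
  end

  [rows] # no allowable cut on any attribute: this partition is final
end

# Replaces every quasi-identifier by the [min, max] range of its partition
# (or leaves the value alone when all rows already agree on it).
def generalize_partition(rows, quasi_ids)
  rows.map do |row|
    out = row.dup
    quasi_ids.each do |a|
      min, max = rows.map { |r| r[a] }.minmax
      out[a] = min == max ? min : [min, max]
    end
    out
  end
end

table = [
  [25, 'Male',   53711, 'Flu'],
  [25, 'Female', 53712, 'Hepatitis'],
  [26, 'Male',   53711, 'Bronchitis'],
  [27, 'Male',   53710, 'Broken_Arm'],
  [27, 'Female', 53712, 'AIDS'],
  [28, 'Male',   53711, 'Hang_Nail']
]

quasi_ids  = [0, 2] # age and zip code, as in the header example
partitions = mondrian(table, quasi_ids, 2)
anonymized = partitions.flat_map { |p| generalize_partition(p, quasi_ids) }

anonymized.each { |row| puts row.inspect }
```

With k = 2 the table ends up in two partitions of three rows each, so every combination of generalized age and zip code is shared by at least two people. The Group class makes different choices on ties around the median and supports the degen and single heuristics, so its output can differ from this sketch.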

Ruby on Rails vs Java

Software Engineer at Critical Software?

Last Monday I attended a job interview at Critical Software. I had some trouble finding the Tecmaia facilities and got there 30 minutes late! (I know, I need a GPS.)

They started the interview with some general questions about my background and the work I did at Mobicomp. Then came a more technical part, where I had to answer (as if it were an exam) questions about object-oriented design, databases, threading, Linux, C++, XML, UML and MySQL. The final part of the interview measured my psychological strengths, and I was able to speak for myself and tell them what I like to do.

This is the second interview I've had since the summer break. The first was at Edigma.com, but that one did not go so well. It's a shame, as I feel I would be a very good addition to that team. I know the responsibility was not entirely mine, as the interview was very badly run: it had no structure, they didn't take a single note about what I said, and they didn't have a script to follow. I could continue this list, as I feel strongly disappointed with them. Don't get me wrong, what they do is cool, but the recruiting process is not.