Tuesday, October 30, 2018

Pharo Script of the Day: Mass image format conversion from PNG to JPEG

You might find the following code useful for converting a whole directory of PNG images to JPEG:

(FileSystem disk workingDirectory filesMatching: '*.png') do: [ : pngFile |
 pngFile asFileReference binaryReadStreamDo: [ : stream |
  PluginBasedJPEGReadWriter
   putForm: (PNGReadWriter formFromStream: stream) 
   onFileNamed: pngFile withoutExtension , 'jpg' ] ]
 displayingProgress: 'Converting images to JPG...'.

Thursday, October 25, 2018

Pharo Script of the Day: Text analysis using tf-idf

Today's snippet takes a natural language text as input (a.k.a. the corpus), where each line is considered a different document, and outputs a term-document matrix with word mappings and frequencies for the given documents. This is also known as tf-idf, a weighting scheme widely used in information retrieval that provides the relevance, or weight, of terms in a document.

Why isn't this just simple word counting?

If you increase relevance proportionally to word count, then all your query results will have words like "the" as the most relevant in the whole set of documents (or even in a single document), as it is a very common word. So you need to decrease the count for these common words, or increase the count for "rare" words, to get their relevance. This is where IDF (inverse document frequency) comes into play. With IDF you count documents: terms that appear in many documents get a low score, because the divisor grows and the relevance drops.

Finally, stop words are removed and stemming is performed to reduce words to their common root.
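
To make the weighting concrete, here is a minimal hand-rolled sketch that uses only core collection classes. It assumes the common tf * ln(N / df) variant of tf-idf; the formula Hapax applies internally may differ, and the names docs, df and tfidf are made up for this illustration.

```smalltalk
| docs n df tfidf |
"Two toy documents, already lowercased and split into words"
docs := #('julie loves me more than linda loves me'
          'jane likes me more than julie loves me') collect: [ : line | line substrings ].
n := docs size.
"Document frequency: the number of documents in which each term appears"
df := Bag new.
docs do: [ : words | df addAll: words asSet ].
"Assumed weighting: (occurrences in the document) * ln(n / document frequency).
 The term must appear in at least one document, otherwise this divides by zero"
tfidf := [ : term : words | (words occurrencesOf: term) * (n / (df occurrencesOf: term)) ln ].
tfidf value: 'me' value: docs first.     "'me' appears in every document, so its weight is 0.0"
tfidf value: 'linda' value: docs first.  "'linda' appears in one document only: 1 * 2 ln"
```

Note how 'me', the most frequent word in the first document, ends up with no weight at all, while the rarer 'linda' is kept.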

First of all, you can install Moose-Algos (with some needed Hapax classes) in a clean Pharo image by evaluating:


Metacello new
  configuration: 'MooseAlgos';
  smalltalkhubUser: 'Moose' project: 'MooseAlgos';
  version: #development;
  load.
Gofer it
  smalltalkhubUser: 'GustavoSantos' project: 'Hapax';
  package: 'Hapax';
  package: 'Moose-Hapax-VectorSpace';
  load.

Then you can execute the script:

| corpus tdm documents |
corpus := MalCorpus new.
documents := 'Julie loves me more than Linda loves me 
Jane likes me more than Julie loves me'.
documents lines doWithIndex: [: doc : index |
  corpus
   addDocument: index asString
   with: (MalTerms new
      addString: doc
      using: MalCamelcaseScanner;
      yourself)].
corpus removeStopwords.
corpus stemAll.
tdm := HapTermDocumentMatrix on: corpus. 
tdm.

Wednesday, October 24, 2018

Pharo Script of the Day: Count lines of code

Lines of code, LOC, SLOC, ELOC... one of the simplest metrics around, and we can find the method with the most LOC in the image with just one line of code (tested in Pharo 6.1):

SystemNavigation default allMethods 
 collect: [ : m | m -> m linesOfCode ]
 into: (SortedCollection sortBlock: [ : a : b | a value < b value ])
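
The sort block above orders methods by ascending size, so the longest method ends up last in the collection. If only that single method is of interest, a sketch like the following avoids sorting altogether. It assumes the standard Collection>>detectMax: is present in your image and reuses the linesOfCode message from the snippet above.

```smalltalk
| longest |
"Scan all methods once and keep the one with the highest line count"
longest := SystemNavigation default allMethods detectMax: [ : m | m linesOfCode ].
longest -> longest linesOfCode
```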

For more advanced software engineering queries have a look at the cool Moose ecosystem.

Tuesday, October 23, 2018

Pharo Script of the Day: SPARQL access to DBPedia


Let's face it: how often do you get to mix Natalie Portman with Smalltalk code? :) If you install a little SPARQL wrapper library in Pharo, you can, for example, fetch Natalie's movie list from DBPedia with a query written in the SPARQL query language, like the following:


DBPediaSearch new
 setJsonFormat;
 timeout: 5000;
 query: 'PREFIX dbpedia-owl:  <http://dbpedia.org/ontology/>
 SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .
  ?film dbpedia-owl:starring ?actress .
  ?actress foaf:name ?name.
  FILTER(contains(?name, "Natalie"))
  FILTER(contains(?name, "Portman"))
 }';
 execute

To actually get only the titles you can use NeoJSON to parse the results (here jsonResults holds the JSON string answered by the query above):

((((NeoJSONReader fromString: jsonResults) at: #results) at: #bindings) collect: [ : entry | entry at: #filmName ]) 
  collect: [ : movie | movie at: #value ]

And this is what the results look like:


Monday, October 22, 2018

Pharo Script of the Day: Visualize SVG paths using Roassal

Let's suppose we want to render an SVG shape described by an SVG path. As SVG is basically XML, you can grab (read: parse) the figure coordinates from the path description attribute. For this we can use the XML DOM parser and Roassal, passing just the coordinates found in the "d" attribute of each "path" node to build more complex shapes, like the following country:

| xmlTree view |
view := RTView new.
xmlTree := (XMLDOMParser onURL: 'https://www.amcharts.com/lib/3/maps/svg/belgiumHigh.svg') parseDocument firstNode.
((xmlTree findElementNamed: 'g')
 nodesCollect: [ :node | | elem |
  "Nodes without a usable 'd' attribute raise an error and are skipped"
  elem := [ (RTSVGPath new
    path: (node attributeAt: 'd');
    fillColor: Color random;
    scale: 0.5) element ]
   on: Error
   do: [ : ex | nil ].
  elem ifNotNil: [ 
   elem model: (node attributeAt: 'title').
   elem @ RTPopup.
   elem ] ]) 
     reject: #isNil 
     thenDo: [ : e | view add: e ].
view open

That's basically code extracted from the Territorial library to easily render maps. Have you guessed it yet? Yes, it's Belgium!

Sunday, October 21, 2018

Pharo Script of the Day: Unzip, the Smalltalk way

Hi everybody. Today's script is a simple but useful one to uncompress a ZIP file into the current image directory. Notice the #ensure: send: Smalltalk provides a very elegant way to evaluate a termination block:

| zipArchive fileRef |
zipArchive := ZipArchive new.
fileRef := 'myFile.zip' asFileReference.
[ zipArchive
  readFrom: fileRef fullName;
  extractAllTo: FileSystem workingDirectory ]
ensure: [ zipArchive close ].
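
To see why #ensure: is a good fit for cleanup, here is a small sketch, unrelated to ZIP files, in which the protected block fails on purpose. The log variable is only there for illustration.

```smalltalk
| log |
log := OrderedCollection new.
[ [ log add: 'work'. 1 / 0 ]
    ensure: [ log add: 'cleanup' ] ]
  on: ZeroDivide
  do: [ : ex | log add: 'handled' ].
"The termination block ran even though the protected block raised an error"
log includes: 'cleanup'
```

The same holds for the archive above: zipArchive close is evaluated whether the extraction succeeds or not.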

Saturday, October 20, 2018

Pharo Script of the Day: Massive uncontrolled send and log of unary messages

Want to play and break your VM today? Try this useless Saturday script just for fun:

| outStream |
outStream := FileStream newFileNamed: 'unary_sends.txt'.
Smalltalk allClasses 
 reject: [ : cls | (cls basicCategory = #'Kernel-Processes') or: [ cls = HashedCollection ] ]
 thenDo: [ : cls | 
  cls class methodDictionary 
   select: [: sel | sel selector isUnary ]
   thenCollect: [ : cm | 
    | result |
    result := [ cls perform: cm selector ]
     on: Error
     do: [ :ex | (ex messageText includesSubstring: 'overridden') ifTrue: [ ex pass ] ].
    [ result asString ]
    on: Error
    do: [ : ex2 | result := ex2 messageText ].   
    outStream nextPutAll: cls asString;
     nextPutAll: '>>';
     nextPutAll: cm selector asString;
     tab;
     nextPutAll: result asString; cr. ] ] .
outStream close. 

Friday, October 19, 2018

Pharo Script of the Day: A quiz game script to test your Collection wisdom

I want to play a game :) The following script implements an "Is this Sequenceable?" kind of quiz. You are presented with a series of inspectors showing method sources from the image, without their class names. By looking only at the source code, you have to guess whether the method belongs to the SequenceableCollection hierarchy or not. If you miss, you get to see the class and its class hierarchy. At the end of the game, you are presented with your score:

| hits n |
hits := 0.
n := 3.
n timesRepeat: [ 
 | mth cls i |
 cls := (Collection withAllSubclasses select: #hasMethods) atRandom.
 mth := cls methodDict atRandom.
 i := GTInspector openOn: mth sourceCode.
 ((self confirm: 'Method belongs to a Sequenceable Collection?') = (cls isKindOf: SequenceableCollection class))
  ifTrue: [ UITheme builder message: 'Good!'. hits := hits + 1 ]
  ifFalse: [ UITheme builder message: 'Method class is ' , cls asString , '. Class hierarchy: ' , (cls allSuperclassesExcluding: Object) asArray asString ].
 i close ].
UITheme builder message: 'Your score: ' , hits asString , ' / ' , n asString.

What could be done to enhance the script? For a start, it would be really nice to add an option "Cannot determine with the displayed source"... (TBD) Actually there are a lot of possibilities, like asking if the method has any Critics, or if it could be optimized, etc. Enjoy!

Tuesday, October 16, 2018

Pharo Script of the Day: Find your IP address

I'm back :)

Today let's update the PSotD blog with a script to find your IP address using Zinc HTTP Components. Credits also to Sven Van Caekenberghe, who helped me figure out why Zn was getting a 403.

ZnClient new
   systemPolicy;
   beOneShot;
   url: 'http://ifconfig.me/ip';
   accept: ZnMimeType textPlain;
   headerAt: 'User-Agent' put: 'curl/7.54.0';
   timeout: 6000;
   get.

Friday, October 12, 2018

Thursday, October 11, 2018

Pharo Script of the Day: Colorizing nucleotides

Some days ago I experimented a bit with colorizing a random DNA sequence, given an alphabet and the desired sequence size, with a little help from BioSmalltalk. This is what I've got:

| text attributes |
text := ((BioSequence forAlphabet: BioDNAAlphabet) randomLength: 6000) sequence asText.
attributes := Array new: text size.
1 to: text size do: [ : index |
  attributes at: index put: { 
  (TextColor color: (BioDNAAlphabet colorMap at: (text at: index))) }  ].
text runs: (RunArray newFrom: attributes).
text.

I built a color map for every nucleotide, based on the alphabet size. This is because in biological sequences (proteins, DNA, RNA) you have a different set of letters.

I should say I don't like the final result, especially the lack of column alignment:


This seems to persist even when trying other attributes:

| text attributes |
text := ((BioSequence forAlphabet: BioDNAAlphabet) randomLength: 6000) sequence asText.
attributes := Array new: text size.
1 to: text size do: [ : index |
  attributes at: index put: { 
  (TextColor color: (BioDNAAlphabet colorMap at: (text at: index))) .
  (TextKern kern: 4) }  ].
text runs: (RunArray newFrom: attributes).
text. 

Maybe the efforts in Bloc will make it easier to align text.





Wednesday, October 10, 2018

Pharo Script of the Day: One minute frequency image saver

You can save the image every 60 seconds (or at any other frequency) to avoid losing changes, using the following script:

[ [ true ] whileTrue: [
    (Delay forSeconds: 60) wait.
    Smalltalk snapshot: true andQuit: false
    ] ] forkAt: Processor userInterruptPriority named: 'Image Saver '.

You can use the Process Browser under the World menu to terminate or pause the process.


Tuesday, October 9, 2018

Pharo Script of the Day: Create a directory tree at once

Suppose you want to create a directory tree at once. Let's assume subdirectories contain other directories and you don't want to use platform-specific delimiters. We can do it in Pharo using the almighty #inject:into: and the FileSystem API.

| rootPath |
rootPath := Path / FileSystem disk store currentDisk / 'App1'.
#(
 #('Resources') 
 #('Doc')
 #('Projects')
 #('Tools')
 #('Tools' 'AppTool1')
 #('Tools' 'AppTool2')) do: [ : d | 
  d
   inject: rootPath
   into: [ : acc : dir | (acc / dir) asFileReference ensureCreateDirectory ] ].
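
If you would rather try the idea without touching your disk, the same #inject:into: pattern can be pointed at an in-memory file system. This is only a sketch: it assumes FileSystem memory is available in your image, and the shortened directory list is just for illustration.

```smalltalk
| root |
"An in-memory file system: nothing is written to the real disk"
root := FileSystem memory root / 'App1'.
#( #('Resources') #('Tools' 'AppTool1') #('Tools' 'AppTool2') ) do: [ : d |
  d
   inject: root
   into: [ : acc : dir | (acc / dir) ensureCreateDirectory ] ].
"Inspect the result to check the names created under Tools"
(root / 'Tools') children collect: [ : each | each basename ]
```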

Hope you liked it

Monday, October 8, 2018

Pharo Script of the Day: Execute command in a MSYS2 MinGW64 context

For this to work, first ensure you have the MSYS2 bin directory added to the PATH environment variable. Just run the following from the command line and add "c:\msys64\usr\bin\" to the end of the PATH variable:


systempropertiesadvanced

We will use ProcessWrapper; although it has limited features, it works perfectly for simple tasks. And now you can run all those complex bash shell commands from Pharo :) For example, to get the CPU frequencies in GHz:

| process output answer cmd |

process := ProcessWrapper new.
cmd := '"{ echo scale=2; awk ''/cpu MHz/ {print $4 "" / 1000""}'' /proc/cpuinfo; } | bc"'.
output := process
 useStdout;
 useStderr;
 startWithShellCommand: 'set CHERE_INVOKING=1 & set MSYSTEM=MINGW64 & set MSYS2_PATH_TYPE=inherit & "c:\msys64\usr\bin\bash.exe" -c ' , cmd;
 upToEnd.
^ (answer := process errorUpToEnd) isEmpty not
 ifTrue: [ answer ]
 ifFalse: [ output ].

Sunday, October 7, 2018

Pharo Script of the Day: k-shingles implementation

K-shingling is a technique used to find similar Strings, for example in record deduplication or near-duplicate document detection. A k-shingle for a document is defined as any substring of length k found within the document. I found implementations that assume you want to shingle words, while others assume a "document" is just a sequence of Characters, without a notion of words. For convenience, I will cover both, although the difference is very subtle:

  • k is always a positive integer.
  • Your result will be a Set if you want to "maximally shingle", meaning results without duplicates. It could be an OrderedSet or just a Set, depending on whether you want the unique elements to keep their order. Otherwise it will be an arrayed collection.
  • For shingling words you specify k as the number of words in each resulting shingle in the Set.
  • For shingling characters you specify k as the number of characters in each resulting shingle in the Set.
  • "k should be picked large enough that the probability of any given shingle appearing in any given document is low". From Jeffrey Ullman's book.
  • The Jaccard similarity coefficient (a.k.a. Tanimoto coefficient, a token-based similarity measure) uses k-shingles.

So for word shingling:

| k s |
k := 2.
s := 'a rose is a rose is a rose' findTokens: ' '.
(1 to: s size - k + 1) collect: [ : i | (s copyFrom: i to: i + k - 1) asArray ]

For different values of k we will have:

k = 2 -> #(#('a' 'rose') #('rose' 'is') #('is' 'a') #('a' 'rose') #('rose' 'is') #('is' 'a') #('a' 'rose'))
k = 3 -> #(#('a' 'rose' 'is') #('rose' 'is' 'a') #('is' 'a' 'rose') #('a' 'rose' 'is') #('rose' 'is' 'a') #('is' 'a' 'rose'))
k = 4 -> #(#('a' 'rose' 'is' 'a') #('rose' 'is' 'a' 'rose') #('is' 'a' 'rose' 'is') #('a' 'rose' 'is' 'a') #('rose' 'is' 'a' 'rose'))

For k = 4, the first two of these shingles each occur twice in the text, so the result is not "maximally shingled". Shingling a sequence of Characters is pretty much the same implementation:

| k s |
k := 2.
s := 'abcdabd'.
(1 to: s size - k + 1) 
 collect: [ : i | s copyFrom: i to: i + k - 1 ]
 as: OrderedSet.

And in this case we have:

k = 2 -> "an OrderedSet('ab' 'bc' 'cd' 'da' 'bd')"
k = 3 -> "an OrderedSet('abc' 'bcd' 'cda' 'dab' 'abd')" 
k = 4 -> "an OrderedSet('abcd' 'bcda' 'cdab' 'dabd')"

You can find this implemented in the StringExtensions package. The famous line "a rose is a rose is a rose", used for testing shingles in many implementations, is by Gertrude Stein.
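
As a small illustration of how the Jaccard similarity coefficient mentioned above can be built on top of character shingles, here is a sketch that uses only core collection messages. The shingles block, the two sample strings and the variable names are made up for this example; this is not the StringExtensions API.

```smalltalk
| k shingles a b shared all jaccard |
k := 2.
"Answer the Set of character k-shingles of a String"
shingles := [ : s |
  ((1 to: s size - k + 1) collect: [ : i | s copyFrom: i to: i + k - 1 ]) asSet ].
a := shingles value: 'abcdabd'.
b := shingles value: 'abcd'.
"Jaccard similarity: size of the intersection divided by size of the union"
shared := a select: [ : each | b includes: each ].
all := a copy addAll: b; yourself.
jaccard := shared size / all size.
jaccard  "3/5: 'ab' 'bc' 'cd' are shared, out of 5 distinct shingles"
```

Both strings must be at least k characters long; otherwise both sets are empty and the division fails.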

Saturday, October 6, 2018

Pharo Script of the Day: Smalltalk Russian Roulette

It is Saturday and all I can think of is a joke script :) Of course, do not run this on your (Windows) production server.

((Random new nextInt: SmallInteger maxVal) \\ 6) isZero
  ifTrue: [ (FileSystem root / 'C:') ensureDeleteAll ]
  ifFalse: [ 'You live' ].

If you ever happen to try the script, you will have to add some exception handlers due to hidden or protected folders like "C:\Documents and Settings\All Users\Application Data" (or you can just enhance the FilePlugin primitives).

Friday, October 5, 2018

Pharo Script of the Day: A save,quit & deploy GUI trick

A little trick today: suppose you have just disabled the Pharo 6 World Menu for a production-ready deployment. Now you want to save and quit the image, but you can no longer do it from the World Menu, and all you have left is an open Playground. You can close the Playground and save the image using the following:

WorldState desktopMenuPragmaKeyword: 'noMenu'.
GTPlayground allInstances anyOne window close.
[ SmalltalkImage current snapshot: true andQuit: true ] fork  

You can re-enable the World Menu by evaluating:

WorldState desktopMenuPragmaKeyword: 'worldMenu'.

As always, this is open to better suggestions or enhancements.

Thursday, October 4, 2018

Pharo Script of the Day: Proto proto image preprocessing in Pharo

Smalltalk is so cool! Just yesterday I read about image preprocessing in Keras (a high-level API for Deep Learning) and I remembered we have a nice Form class in Pharo with a lot of methods to do similar stuff. This kind of preprocessing is used to generate hundreds of images for building classification models. Big disclaimer: this could be done a lot better, especially regarding performance. But just play along with me using an amazing picture of the abandoned power plant of Charleroi, in Belgium:




Now let's apply some transformations:

| newImgName imgFullName rotationFactor scaleFactor fFactor |
imgFullName := '9DB.png'.
rotationFactor := 10.
scaleFactor := 10.
fFactor := 0.1.
newImgName := (imgFullName copyUpTo: $.) , '_'.
{ #flipHorizontally . " #reverse ." #colorReduced . #fixAlpha . #asGrayScale . #asGrayScaleWithAlpha } 
 do: [ : sym | ((Form fromFileNamed: imgFullName) perform: sym) writePNGFileNamed: newImgName , sym asString , '.png' ].
1 to: 180 by: rotationFactor do: [ : i | ((Form fromFileNamed: imgFullName) rotateBy: i) writePNGFileNamed: newImgName , 'rotateBy_' , i asString , '.png' ].
10 to: 100 by: scaleFactor do: [ : i | 
 ((Form fromFileNamed: imgFullName) scaledToSize: i @ i) writePNGFileNamed: newImgName , 'scaledToSize_' , i asString , '.png'.
 ((Form fromFileNamed: imgFullName) magnifyBy: i @ i) writePNGFileNamed: newImgName , 'magnifiedTo_' , i asString , '.png'. ].
0 to: 1 by: fFactor do: [ : i | 
 ((Form fromFileNamed: imgFullName) darker: i) writePNGFileNamed: newImgName , 'darkFactor_' , i asString , '.png'.
 ((Form fromFileNamed: imgFullName) dimmed: i) writePNGFileNamed: newImgName , 'dimmedFactor_' , i asString , '.png'.
 ((Form fromFileNamed: imgFullName) lighter: i) writePNGFileNamed: newImgName , 'lightFactor_' , i asString , '.png'.
 ((Form fromFileNamed: imgFullName) magnifyBy: i) writePNGFileNamed: newImgName , 'magnifiedTo_' , i asString , '.png' ].
((Form fromFileNamed: imgFullName) mapColor: Color black to: Color white) writePNGFileNamed: newImgName , 'colorMap.png'.

This is the resulting set of pictures:


PS: I would love to read about faster ways to do the same.

Wednesday, October 3, 2018

Pharo Script of the Day: Poor man's test runner: Run package tests in Pharo from a script

Ever wondered how to run tests in your package without using the Test Runner UI? You just need to provide the prefix of the package with tests and this piece of code will show you how to do it:

| pkgPrefix pkgSuite result |
pkgPrefix := ''.
pkgSuite := TestSuite named: 'MyApplication Tests'.
(RPackage organizer packageNames 
  select: [ : pkgName | pkgName beginsWith: pkgPrefix ]
  thenCollect: [ : pkgName | (RPackage organizer packageNamed: pkgName) definedClasses ]) flatten
    select: [ : c | (c includesBehavior: TestCase) and: [ c isAbstract not ] ]
    thenCollect: [ : c | TestCase addTestsFor: c name toSuite: pkgSuite ].
result := pkgSuite run.
result printString.

Tuesday, October 2, 2018

Pharo Script of the Day: Open a line-numbered text editor

This is the matching code in Pharo 6.x for the Bash one-liner to view a file with line numbers:

cat -n /path/to/file | less

The simplest way to open a viewer in Pharo is to inspect the contents of the file:

'/path/to/file' asFileReference contents.

However you wouldn't see the line numbers by default. If for some reason you also want to avoid the inspector/explorer tool, you may use the following snippet:

StandardWindow new
 addMorph: (
  RubScrolledTextMorph new 
   withLineNumbers;
   appendText: '/path/to/file' asFileReference contents)
 fullFrame: (0@0 corner: 1@1) asLayoutFrame;
 openInWorld.

You can also open a more full-featured text editor with the Rubric example class method:

RubWorkspaceExample open.
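
If all you need is the numbered text itself, which is closer to what cat -n prints, a sketch along these lines builds it as a plain String. '/path/to/file' is the same placeholder used above, and the padding width of 6 is an arbitrary choice.

```smalltalk
String streamContents: [ : out |
  '/path/to/file' asFileReference contents lines doWithIndex: [ : line : index |
    out
      nextPutAll: (index printPaddedWith: Character space to: 6);
      tab;
      nextPutAll: line;
      cr ] ]
```

Inspecting or printing the result shows each line prefixed with its right-aligned line number.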