Difference between revisions of "Silme:Tutorial:Concepts"

From Braniecki's Wiki
Revision as of 16:17, 27 July 2008

Silme is built around a few abstract concepts that let the library support any localization format, from DTD, GetText or XLIFF to MySQL and SQLite, and any storage, from JAR archives and plain directories to SVN, CVS or any other revision control system. In this section I will explain the basic concepts you need to understand the architecture of the library. This is just an introduction: Silme allows you to extend each class with your own data, but here we will focus on the simplest cases.

Objects

Entity

Silme's most basic, atomic unit is Entity. Entity is a class that stores a single ID<-->VALUE pair in an abstract model. It represents a DTD entity:

<!ENTITY ID "VALUE">

a Gettext message:

msgid "ID"
msgstr "VALUE"

a row with ID and VALUE columns in a MySQL L10n table, and so on.

It's very important to understand that any localization list can be expressed as Entities, as long as you can generate an ID that is unique within the list and assign a value to it.

EntityList

A group of Entity objects is stored as an EntityList object. EntityList is a list (in fact, a dict structure in Python) that stores Entities and nothing more. The easiest way to imagine it is as a localization SQL table with two columns, ID and VALUE: a single row is an Entity, the whole table is an EntityList.
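To make the row/table picture concrete, here is a minimal sketch. The Entity and EntityList classes below are hypothetical stand-ins written for this tutorial, not Silme's actual implementation, and the add_entity method name is my assumption.

```python
class Entity:
    """Hypothetical stand-in for Silme's Entity: one ID <-> VALUE pair."""

    def __init__(self, id, value):
        self.id = id
        self.value = value

    def __repr__(self):
        return f"Entity(id={self.id!r}, value={self.value!r})"


class EntityList(dict):
    """Hypothetical stand-in for EntityList: a dict keyed by entity ID."""

    def add_entity(self, entity):
        # IDs must be unique within one list, so refuse duplicates.
        if entity.id in self:
            raise KeyError(f"duplicate entity ID: {entity.id}")
        self[entity.id] = entity


entities = EntityList()
entities.add_entity(Entity("notify.msg", "Please, click OK to continue"))
entities.add_entity(Entity("notify.btn", "OK"))
```

Keying the dict by ID is what makes the "unique ID across one list" requirement matter: a second entity with the same ID would otherwise silently replace the first.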

L10nObject

Above that, in an abstract sense, is the L10nObject class. L10nObject extends EntityList and represents any L10n file. Besides the list of Entity objects, it also contains Comment objects and the plain Strings between them. It's easiest to imagine it as a full representation of a simple DTD file. For example, this file:


<!ENTITY myapp.title "MyApp Title">
<!--
Not used anymore
<!ENTITY title.old "Some Title">
-->
<!ENTITY notify.msg "Please, click OK to continue">
<!ENTITY notify.btn "OK">

will look like this:

String('\n')
Entity(id:'myapp.title',value:'MyApp Title')
String('\n')
Comment(
  String('\nNot used anymore\n')
  Entity(id:'title.old', value:'Some Title')
)
String('\n')
Entity(id:'notify.msg',value:'Please, click OK to continue')
String('\n')
Entity(id:'notify.btn',value:'OK')
String('\n\n')

L10nObject is more like a file, EntityList more like an SQL table. You can get an EntityList out of an L10nObject, or you can get an EntityList directly from a file if you don't need the other elements of the structure.

L10nObject stores the whole content of a file and should always represent the full file, which means that dumping the structure back to the same format produces a file identical to the source. In between, you can move, remove and add strings, comments and entities.
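The round-trip property can be sketched in a few lines. Everything below is a hypothetical illustration, not Silme's code: plain Python str values stand in for String objects, the dump and entity_list method names are assumptions, and the comment keeps the newline before its closing marker so that the output matches the source byte for byte.

```python
class Entity:
    def __init__(self, id, value):
        self.id, self.value = id, value

    def dump(self):
        return f'<!ENTITY {self.id} "{self.value}">'


class Comment:
    """A comment holding its own ordered mix of strings and entities."""

    def __init__(self, elements):
        self.elements = elements

    def dump(self):
        inner = "".join(e if isinstance(e, str) else e.dump() for e in self.elements)
        return f"<!--{inner}-->"


class L10nObject:
    def __init__(self, elements):
        self.elements = elements

    def entity_list(self):
        # Only top-level entities count; commented-out ones are skipped.
        return {e.id: e for e in self.elements if isinstance(e, Entity)}

    def dump(self):
        return "".join(e if isinstance(e, str) else e.dump() for e in self.elements)


source = (
    '\n<!ENTITY myapp.title "MyApp Title">\n'
    '<!--\nNot used anymore\n<!ENTITY title.old "Some Title">\n-->\n'
    '<!ENTITY notify.msg "Please, click OK to continue">\n'
    '<!ENTITY notify.btn "OK">\n\n'
)

obj = L10nObject([
    "\n",
    Entity("myapp.title", "MyApp Title"),
    "\n",
    Comment(["\nNot used anymore\n", Entity("title.old", "Some Title"), "\n"]),
    "\n",
    Entity("notify.msg", "Please, click OK to continue"),
    "\n",
    Entity("notify.btn", "OK"),
    "\n\n",
])

round_trip = obj.dump()
```

Because every piece of whitespace is kept as its own element, concatenating the elements in order reproduces the file, while entity_list() gives the "SQL table" view of the same data.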

Object

Besides L10nObject, there is a similar structure called Object. Object is used to store files that we cannot parse. If, for example, your application is prepared to parse DTD/PO/Properties and gets an HTML file or a JPEG, it will store it as an Object. Object has ID and source properties. That is not very useful by itself, but it allows us to build a full structure on top of it: the L10nPackage.

L10nPackage

L10nPackage represents a list of L10nObjects, Objects and EntityLists, and potentially other L10nPackages. In the file system world, the closest analogue is a directory: it can store DTD files, JPEG files and other directories. Another similar structure is a MySQL database, which stores tables (EntityLists in our case).
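The directory analogy can be sketched as a small tree. The classes below are hypothetical stand-ins, and the add_object, add_package and walk names are assumptions made for this illustration only.

```python
class Object:
    """Hypothetical stand-in for an unparsed file: just an ID and raw source."""

    def __init__(self, id, source):
        self.id, self.source = id, source


class L10nPackage:
    """Hypothetical stand-in for a package: objects plus nested packages."""

    def __init__(self, id):
        self.id = id
        self.objects = {}   # files or tables, keyed by ID
        self.packages = {}  # nested packages, keyed by ID

    def add_object(self, obj):
        self.objects[obj.id] = obj

    def add_package(self, package):
        self.packages[package.id] = package

    def walk(self, prefix=""):
        """Yield a slash-separated path for every object in the tree."""
        base = f"{prefix}{self.id}/"
        for object_id in self.objects:
            yield base + object_id
        for package in self.packages.values():
            yield from package.walk(base)


root = L10nPackage("en-US")
root.add_object(Object("logo.jpg", b"\xff\xd8"))

browser = L10nPackage("browser")
browser.add_object(Object("brand.dtd", '<!ENTITY brand "MyApp">'))
root.add_package(browser)

paths = list(root.walk())
```

Here paths lists the package's own objects first and then descends into nested packages, much like walking a directory tree.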

Summary

That's all. Currently, the scope of the library is to represent all potential localization structures using these classes and to build an API for operating on them easily.

Diff module

Each of these objects (Entity, EntityList, L10nObject, Object, L10nPackage) has its mirror class in Diff land. As a result we have EntityDiff, EntityListDiff, L10nObjectDiff, ObjectDiff and L10nPackageDiff. The Diff module allows you to store the difference between two objects of the same type and apply it later. It's like the diff tool in Linux, except that it is aware of the syntax of the files/structures and stores the diff in an appropriate way. For example, if the only difference between two EntityLists is the value of one entity, it is stored as an EntityDiff with the ID of that entity and an (oldvalue,newvalue) tuple.

In terms of the API, it usually comes down to:

# Compute the difference between two packages...
l10nPackageDiff = l10nPackage1.diffTo(l10nPackage2)

# ...and apply it to a third one.
l10nPackage3.applyDiff(l10nPackageDiff)

but of course you will be able to manually operate on all structures by adding/removing/modifying the content of each object.

Note: Currently, L10nObjectDiff is experimental, as it's a pretty complex structure. It lives in the playground module, but you can simply use EntityListDiff on L10nObjects if you want to store differences between entities and don't need a diff of the file's structure.
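The idea of an (oldvalue,newvalue) tuple per changed entity can be sketched as follows. This is a simplified illustration and not the Diff module itself: plain dicts of ID to value stand in for EntityLists, the function names are hypothetical, and None is assumed to mark an entity that is missing on one side.

```python
def diff_entity_lists(old, new):
    """Return {id: (old_value, new_value)}; None marks a missing side."""
    diff = {}
    for key in old.keys() | new.keys():
        before, after = old.get(key), new.get(key)
        if before != after:
            diff[key] = (before, after)
    return diff


def apply_diff(target, diff):
    """Return a copy of target with the diff applied."""
    result = dict(target)
    for key, (_, after) in diff.items():
        if after is None:
            result.pop(key, None)   # entity was removed
        else:
            result[key] = after     # entity was added or changed
    return result


old = {"notify.msg": "Please, click OK to continue", "notify.btn": "OK"}
new = {"notify.msg": "Click OK to continue", "notify.btn": "OK"}

entity_list_diff = diff_entity_lists(old, new)
patched = apply_diff(old, entity_list_diff)
```

Only notify.msg ends up in the diff, since notify.btn is identical on both sides, and applying the diff to the old list yields the new one.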

I/O

Because we want to support multiple methods of accessing entity lists, we need to abstract the Input-Output layer. In Silme, IOManager is a class that manages all the IO classes, called IOClients. An example with three IOClients:

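# Local file: the full file structure is available as an L10nObject.
ioClient = IOManager.get('file')
l10nObject = ioClient.getL10nObject(path='./test/example.dtd')

# Same call, but the file is fetched from an SVN repository.
ioClient = IOManager.get('svn')
l10nObject = ioClient.getL10nObject(path='svn://svn.server.net/project/trunk/example.dtd')

# A database table has no file structure, so we ask for an EntityList.
ioClient = IOManager.get('mysql')
entityList = ioClient.getEntityList(path='mysql://localhost:8908/l10ndb', table='l10nList')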

Of course you can also use getEntityList on file and SVN, but you cannot get L10nObject from MySQL.
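The manager/client split is essentially a registry keyed by a short name. Below is a rough sketch of that pattern under my own assumptions: the register method, the FileClient class and its snake_case method names are hypothetical and do not reflect how Silme implements IOManager.

```python
import os
import tempfile


class FileClient:
    """Hypothetical IOClient that reads and writes local files."""

    @staticmethod
    def get_source(path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    @staticmethod
    def write_to_file(source, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)


class IOManager:
    """Maps a short name such as 'file' to the client that handles it."""

    _clients = {}

    @classmethod
    def register(cls, name, client):
        cls._clients[name] = client

    @classmethod
    def get(cls, name):
        try:
            return cls._clients[name]
        except KeyError:
            raise KeyError(f"no IOClient registered for {name!r}") from None


IOManager.register("file", FileClient)

io_client = IOManager.get("file")
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "example.dtd")
    io_client.write_to_file('<!ENTITY notify.btn "OK">\n', path)
    source = io_client.get_source(path)
```

The calling code only ever talks to the manager, so supporting another backend means registering one more client rather than changing the callers.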

You can also go the "raw" way by asking the ioClient for the source of the file and then creating an L10nObject out of it manually. To parse a plain string into an L10nObject or EntityList we use FormatParsers.

FormatParsers

FormatParser is a class that can parse a string into an L10nObject or dump an L10nObject to a string. Example FormatParsers are DTD/GetText/Properties/XLIFF/L20n. They're managed by FPManager:

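# Read the raw source of the file and parse it as DTD.
ioClient = IOManager.get('file')
fp = FPManager.get('dtd')
string = ioClient.getSource(path='./test/example.dtd')
l10nObject = fp.buildL10nObject(string)

# Add a new entity right after the one with ID 'test.id'.
l10nObject.addEntity(Entity('id','value'), pos=('after','test.id'))

# Serialize back to DTD and save under a new name.
string = fp.dumpL10nObject(l10nObject)
ioClient.writeToFile(string, path='./test/example2.dtd')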
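To show what a parser of this kind has to do, here is a deliberately small sketch of a DTD parser. It is not Silme's DTD FormatParser: the DTDParser class, its build and dump names and the regular expression are assumptions, it ignores comments, and the round trip is only exact when each declaration uses single spaces as in the sample.

```python
import re

# Matches <!ENTITY some.id "some value"> and captures the ID and the value.
ENTITY_RE = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+"([^"]*)">')


class DTDParser:
    """Hypothetical parser: text <-> ordered list of strings and (id, value) pairs."""

    @staticmethod
    def build(source):
        elements, position = [], 0
        for match in ENTITY_RE.finditer(source):
            if match.start() > position:
                # Keep the text between entities so nothing is lost.
                elements.append(source[position:match.start()])
            elements.append((match.group(1), match.group(2)))
            position = match.end()
        if position < len(source):
            elements.append(source[position:])
        return elements

    @staticmethod
    def dump(elements):
        parts = []
        for element in elements:
            if isinstance(element, str):
                parts.append(element)
            else:
                parts.append('<!ENTITY {} "{}">'.format(*element))
        return "".join(parts)


class FPManager:
    """Hypothetical registry mapping a format name to its parser."""

    _parsers = {"dtd": DTDParser}

    @classmethod
    def get(cls, name):
        return cls._parsers[name]


source = '<!ENTITY test.id "Test">\n<!ENTITY notify.btn "OK">\n'
fp = FPManager.get("dtd")
elements = fp.build(source)
round_trip = fp.dump(elements)

# Insert a new entity (and a newline) right after 'test.id'.
index = elements.index(("test.id", "Test")) + 1
elements[index:index] = ["\n", ("id", "value")]
modified = fp.dump(elements)
```

Keeping the in-between text as separate elements is what lets an unmodified structure dump back to the original source.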

Object and Diff parsers

FormatParsers come in two types: for objects and for diffs. This means that when you load an fp from FPManager, you can choose whether you want a FormatParser for a Diff class or for an Object class. For example, you can dump an L10nObject to DTD, Properties, GetText or simply Text, or you can dump an L10nObjectDiff to XML, Text or CSV.
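As an illustration of the diff side, the sketch below dumps a diff to CSV and to plain text. It assumes a diff is a dict of ID to (oldvalue,newvalue) tuples; the function names, the DIFF_DUMPERS table and the column layout are hypothetical and say nothing about the formats Silme actually writes.

```python
import csv
import io


def dump_diff_csv(diff):
    """Serialize {id: (old, new)} as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "old", "new"])
    for entity_id in sorted(diff):
        old, new = diff[entity_id]
        writer.writerow([entity_id, old, new])
    return buffer.getvalue()


def dump_diff_text(diff):
    """Serialize {id: (old, new)} as one human-readable line per entity."""
    lines = []
    for entity_id in sorted(diff):
        old, new = diff[entity_id]
        lines.append(f"{entity_id}: {old!r} -> {new!r}")
    return "\n".join(lines) + "\n"


# One dumper per output format, selected by name.
DIFF_DUMPERS = {"csv": dump_diff_csv, "text": dump_diff_text}

diff = {"notify.msg": ("Please, click OK to continue", "Click OK to continue")}
csv_output = DIFF_DUMPERS["csv"](diff)
text_output = DIFF_DUMPERS["text"](diff)
```

The same diff structure feeds every dumper, which is the point of separating the diff classes from the parsers that serialize them.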

Summary

That's all for now. This article explained the basic concepts behind the library and I hope you'll find the library useful enough to experiment with writing apps on top of it and/or working with the library itself.

Now, how to set up an environment.