Archived CMSimple Support Forum

The Old CMSimple User Community

This archived CMSimple Support Forum will be locked primo June 2008. Users with a commercial licence are advised to register and use the new Official Support Forum at CMSimple.com instead. A community driven forum with free registration is found at cmsimpleforum.com.

All times are UTC




Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 12 posts ] 
Author Message
 Post subject: FIX: On printing human readable URLs on different charsets
PostPosted: Thu Jan 19, 2006 3:13 pm 
Offline

Joined: Thu Jan 19, 2006 2:45 pm
Posts: 2
Hello all...
I recently downloaded cmsimple for a site and today decided to take a look and do something with it. I'm from Greece and wanted to create pages in Greek.
BTW, cmsimple looks cool and does what I was looking for. The ability to move already created menu items (up and down) would be nice, but I suppose it is either already implemented and I haven't found out how to do it, or it is planned and maybe already implemented in a newer development version.

I started using cmsimple and found 2 important problems when using Greek.
- First of all, all Greek characters entered in the editor were escaped to their numeric escape sequences. 'View source' on every page was problematic, since I wanted the pages to contain plain Greek (not escaped entities). To fix this, the only thing I had to do was edit the language setting in config.php and set it to "gr" (Greek). After this change, new content was no longer escaped in the page source and the interface was in Greek. I wanted to use the English interface without escaping the Greek characters entered in the editor, but that is ok, it is my mother language after all, and now the page source displays correct Greek. Nice that a Greek translation already exists.
- The second one was that the URLs created by cmsimple were unreadable. Greek text entered in headings that become separate web pages was URL-escaped, which resulted in practically unreadable URLs (not friendly to users at all). I didn't want this, so I prepared a fix, and I'm sharing it with you in case you have the same problem. It can be customized for other languages, too.
I've changed the source code of cms.php as follows:
Code:
function uenc($s){$s = dirify($s); return str_replace('+','_',urlencode($s));}

function dirify($s) {
     #$s = convert_high_ascii($s);  ## convert high-ASCII chars to 7bit.
     $s = remove_accents($s);       ## transliterate Greek chars to Latin.
     $s = strtolower($s);           ## lower-case.
     $s = strip_tags($s);           ## remove HTML tags.
     $s = preg_replace('!&[^;\s]+;!','',$s);         ## remove HTML entities.
     $s = preg_replace('![^\w\s]!','',$s);           ## remove non-word/space chars.
     $s = preg_replace('!\s+!','_',$s);              ## change space chars to underscores.
     return $s;
}

function remove_accents($str)
{
   ## strtr() with three arguments maps byte by byte, so this file and the page
   ## content must use a single-byte Greek charset such as ISO-8859-7.
   ## With UTF-8, use the array form of strtr() instead.
   return strtr($str, "αβγδεζηθικλμνξοπρσςτυφχψωάέήίϊΐόύϋΰώΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΆΈΉΊΪΌΎΫΏ", "abgdezhqiklmnjoprsstyfxcwaehiiiouuuwABGDEZHQIKLMNJOPRSTYFXCWAEHIIOUUW");
}


What it does is simply transliterate Greek chars to Latin. So (a Greek word follows) "Καλημέρα" is converted to "kalhmera".

In the function remove_accents() I've set up a transliteration of Greek to Latin characters. If you are using another language (e.g. Russian), you just have to change this mapping to fit your needs. The first long string in the strtr() call in remove_accents() lists the Greek characters that might be found, and the second long string lists their Latin replacements (a one-to-one relation).

I've tested this and it seems fine. Everything seems to be working as I wanted.
A question: do you think I have to change anything else besides the above? It is my first contact with cmsimple and the code is not exactly developer friendly, so I don't know if I'm missing something, and I don't want to break anything (maybe a call to dirify() must be included somewhere else, too?).

Regards,
bender


 Post subject:
PostPosted: Fri Jan 20, 2006 9:45 am 
Offline
Site Admin

Joined: Mon May 12, 2003 12:36 pm
Posts: 3091
Location: Rutsker, Bornholm, Denmark
Really cool!

You could simply modify function uenc($s)

Some of your preg_replace calls are not needed, e.g. changing the space chars should already be done by urlencode. Also, I don't understand why you use strtolower? Is remove_accents needed?

I am sure that a Russian remove_accents will be more than welcome among many Russian users.

Maybe I should add the possibility to insert a $tx['urlchars']['before'] and a $tx['urlchars']['after'] in the language file, and then insert the strtr function in uenc ...


 Post subject:
PostPosted: Fri Jan 20, 2006 11:43 am 
Offline

Joined: Thu Jan 19, 2006 2:45 pm
Posts: 2
:)

- The strtolower() call is there because personally I prefer lowercase URLs. This way you can add uppercase headers (e.g. "DISCLAIMER") but the URL that is generated will be lowercase (http://...../...disclaimer). I think all-uppercase URLs are not nice and not common. This could be added as an option in config.php.
- The remove_accents() function is where the main job is done: non-Latin characters are converted to Latin but kept readable. One can combine dirify() and remove_accents() in one function, of course. It would be nice to add the mapping to the language file as options, as you say. I would call them ['urlchars']['native'] and ['urlchars']['latin'] to better describe their purpose.
- I didn't look at what urlencode() does. Some things in dirify() might be redundant, but they are not harmful until a final version from you is available.
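
For reference, a minimal sketch of what the stock uenc() (the one-liner from cms.php quoted in this thread) does with a Latin and a Greek heading. It assumes the headings are UTF-8 encoded; the sample headings are only for illustration.

```php
<?php
// Stock uenc() as quoted from cms.php in this thread.
function uenc($s){return str_replace('+','_',urlencode($s));}

// urlencode() turns each space into '+', which uenc() then turns into '_'.
echo uenc('Future development'), "\n";   // Future_development

// Non-ASCII bytes are percent-encoded, which is what makes Greek URLs unreadable.
echo uenc('Καλημέρα'), "\n";
```

So the whitespace rule in dirify() mainly matters for tabs, line breaks and runs of spaces, which urlencode() would otherwise encode one by one.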

Regards...


 Post subject:
PostPosted: Mon Jan 23, 2006 10:13 pm 
Offline
Site Admin

Joined: Mon May 12, 2003 12:36 pm
Posts: 3091
Location: Rutsker, Bornholm, Denmark
OK - I've added this to http://www.cmsimple.dk/?Downloads:Future_development - but maybe it will have lower priority than version 3


 Post subject:
PostPosted: Tue Jan 24, 2006 8:38 pm 
Offline

Joined: Tue Jul 19, 2005 7:59 pm
Posts: 625
Location: Behind your ...
see here http://www.cmsimple.dk/forum/viewtopic. ... 5849#15849


 Post subject:
PostPosted: Tue Jan 24, 2006 8:49 pm 
Offline
Site Admin

Joined: Mon May 12, 2003 12:36 pm
Posts: 3091
Location: Rutsker, Bornholm, Denmark
Thanks a lot - I had overlooked that posting.


 Post subject:
PostPosted: Thu Jan 26, 2006 5:55 pm 
Offline

Joined: Fri Aug 05, 2005 10:18 am
Posts: 103
Location: Old Europe
Hi!

The described solution only substitutes ONE char with ONE other char. This may work for Greek. But in German there is a common convention of substituting special chars with TWO regular chars:

Ä = AE
Ö = OE
Ü = UE
ß = SS

"Schöne Grüße" (Sch%26ouml%3Bne_Gr%26uuml%3B%26szlig%3Be) should become "Schoene Gruesse" and not "Schone Gruse".

...just a consideration to get a general solution for all languages :-)
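
A minimal sketch of such a one-to-many substitution, assuming UTF-8 text and using the array form of PHP's strtr(), which maps whole substrings rather than single bytes. The function name transliterate_german() is made up for illustration, and the lower-case pairs are an assumption modelled on the upper-case list above.

```php
<?php
// Hypothetical helper: replace German special chars with their two-letter spellings.
function transliterate_german($s)
{
    $map = array(
        'Ä' => 'AE', 'Ö' => 'OE', 'Ü' => 'UE',   // pairs from the list above
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',   // assumed lower-case equivalents
        'ß' => 'ss',                             // lower case, to match the expected "Gruesse"
    );
    // With an array, strtr() replaces each key by its value, whatever their lengths.
    return strtr($s, $map);
}

echo transliterate_german('Schöne Grüße'), "\n";   // Schoene Gruesse
```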

Greets Flo


 Post subject:
PostPosted: Thu Jan 26, 2006 11:49 pm 
Offline

Joined: Tue Jul 19, 2005 7:59 pm
Posts: 625
Location: Behind your ...
Example answer: you can modify the function to convert 1 char to 2 chars, but in fact it is better to use the plain single-byte spelling: type oe directly, not ö

And look here http://www.cs.tut.fi/~jkorpela/chars.ht ... onal-ascii for ideas.


 Post subject:
PostPosted: Fri Jan 27, 2006 9:52 am 
Offline

Joined: Fri Aug 05, 2005 10:18 am
Posts: 103
Location: Old Europe
It's very bad style to write "oe" instead of "ö" within a text/heading. What do you think about the following way to get both human readable URLs and correct orthography? You can also add any other conversion rules ("/" -> "-", ":-)" -> " " ...). I think it's a smart solution that handles both 1-byte and 2-byte (or more) conversions :-)

1) add these two lines to the language file:

(for example de.php):

Code:
$tx['urlchars']['before']="ß,ä,ö,ü,Ä,Ö,Ü,/";
$tx['urlchars']['after']="ss,ae,oe,ue,Ae,Oe,Ue,-";


(or for example gr.php):

Code:
$tx['urlchars']['before']="α,β,γ,δ,ε,ζ,η,θ,ι,κ,λ,μ,ν,ξ,ο,π,ρ,σ,ς,τ,υ,φ,χ,ψ,ω,ά,έ,ή,ί,ϊ,ΐ,ό,ύ,ϋ,ΰ,ώ,Α,Β,Γ,Δ,Ε,Ζ,Η,Θ,Ι,Κ,Λ,Μ,Ν,Ξ,Ο,Π,Ρ,Σ,Τ,Υ,Φ,Χ,Ψ,Ω,Ά,Έ,Ή,Ί,Ϊ,Ό,Ύ,Ϋ,Ώ";

$tx['urlchars']['after']="a,b,g,d,e,z,h,q,i,k,l,m,n,j,o,p,r,s,s,t,y,f,x,c,w,a,e,h,i,i,i,o,u,u,u,w,A,B,G,D,E,Z,H,Q,I,K,L,M,N,J,O,P,R,S,T,Y,F,X,C,W,A,E,H,I,I,O,U,U,W";


2) add this function to cms.php:

Code:
function iso2ascii($text)
{
    global $tx;

    $array1 = explode(",",$tx['urlchars']['before']);
    $array2 = explode(",",$tx['urlchars']['after']);

    // Return the original text unchanged if the two lists differ in length
    // (e.g. the language file was edited incorrectly by the user).
    if (count($array1) != count($array2))
       {return $text;}

    // Replace the chars
    $text = str_replace($array1,$array2,$text);

    // Lowercase (optional)
    $text = strtolower($text);

    return $text;
}



3) find and modify function uenc($s) in cms.php
from:
Code:
function uenc($s){return str_replace('+','_',urlencode($s));}


to:
Code:
function uenc($s){return str_replace('+','_',urlencode(iso2ascii($s)));}
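
To see the three steps working together, here is a self-contained sketch. It assumes UTF-8 source files and sets the German $tx['urlchars'] entries inline instead of loading de.php; the sample headings are made up.

```php
<?php
// Step 1: the two mapping lists, normally defined in the language file.
$tx['urlchars']['before'] = "ß,ä,ö,ü,Ä,Ö,Ü,/";
$tx['urlchars']['after']  = "ss,ae,oe,ue,Ae,Oe,Ue,-";

// Step 2: the conversion function, as posted above.
function iso2ascii($text)
{
    global $tx;
    $array1 = explode(",", $tx['urlchars']['before']);
    $array2 = explode(",", $tx['urlchars']['after']);
    // Leave the text untouched if the lists do not line up.
    if (count($array1) != count($array2)) {
        return $text;
    }
    return strtolower(str_replace($array1, $array2, $text));
}

// Step 3: the modified uenc().
function uenc($s){return str_replace('+','_',urlencode(iso2ascii($s)));}

echo uenc('Schöne Grüße'), "\n";    // schoene_gruesse
echo uenc('Preise/Kosten'), "\n";   // preise-kosten
```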



But there is one problem: changing the URLs of an existing web project will break all indexed links (on other websites and in search engines)!!!

Greets Flo


 Post subject:
PostPosted: Fri Jan 27, 2006 10:24 pm 
Offline

Joined: Tue Jul 19, 2005 7:59 pm
Posts: 625
Location: Behind your ...
Every good programmer knows URL rewriting with a 301 8)
Then the old URL points to the new URL, which gets indexed in place of the old one.

In fact I don't know any good programmers who use non-ISO chars in URLs :lol:

:arrow: floho, I encourage you to continue your intelligent work!
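
A minimal sketch of such a 301 redirect in plain PHP, assuming CMSimple-style URLs where the page name is the query string. The $redirects table and the redirect_target() helper are hypothetical; the old URL is the German example quoted earlier in this thread, and the new name is what the modified uenc() posted above would generate for it. The redirect itself is skipped on the command line so the sketch can be run without a web server.

```php
<?php
// Hypothetical table: old (escaped) page name => new readable page name.
$redirects = array(
    'Sch%26ouml%3Bne_Gr%26uuml%3B%26szlig%3Be' => 'schoene_gruesse',
);

// Hypothetical helper: return the new page name for an old one, or null if unknown.
function redirect_target($query, $redirects)
{
    return isset($redirects[$query]) ? $redirects[$query] : null;
}

// QUERY_STRING holds the raw, still-escaped page name of the request.
$query  = isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : '';
$target = redirect_target($query, $redirects);

if ($target !== null && PHP_SAPI !== 'cli') {
    // 301 tells browsers and search engines that the move is permanent.
    header('Location: ./?' . $target, true, 301);
    exit;
}
```

This would have to run before any page output is sent, since header() only works while the response headers are still open.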


 Post subject:
PostPosted: Fri Jan 27, 2006 10:52 pm 
Offline

Joined: Fri Aug 05, 2005 10:18 am
Posts: 103
Location: Old Europe
You're right, a 301 redirect is a solution.

I tested the script and it works fine... do you have any suggestions for improvement?


 Post subject:
PostPosted: Mon Mar 06, 2006 9:12 pm 
Offline
Site Admin

Joined: Mon May 12, 2003 12:36 pm
Posts: 3091
Location: Rutsker, Bornholm, Denmark
Pls see http://www.cmsimple.dk/forum/viewtopic.php?t=3672

