Return-Path: megacz@cs.berkeley.edu
Received: from 216.237.119.187 (GODEL.MEGACZ.COM) by null (org.ibex.mail.protocol.SMTP) with ESMTP for ; Tue, 26 Sep 2006 14:39:44 -0700
Received: from 127.0.0.1 (GODEL.MEGACZ.COM) by null (org.ibex.mail.protocol.SMTP) with SMTP for ; Tue, 26 Sep 2006 14:39:42 -0700
Received: by godel.megacz.com (sSMTP sendmail emulation); Tue, 26 Sep 2006 14:39:42 -0700
To: Simon Hay
Cc: sbp-interest@research.cs.berkeley.edu
Subject: [sbp-interest] Re: Quick reference?
References: <99F37304-CFDF-4FBB-ABA2-25CFC079A685@lincoln.ox.ac.uk>
From: Adam Megacz
Organization: UC Berkeley
X-Home-Page: http://www.megacz.com/
Date: Tue, 26 Sep 2006 14:39:42 -0700
In-Reply-To: <99F37304-CFDF-4FBB-ABA2-25CFC079A685@lincoln.ox.ac.uk> (Simon Hay's message of "Fri, 22 Sep 2006 08:50:17 +0100")
Message-ID:
User-Agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.4 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Envelope-To: sbp-interest@research.cs.berkeley.edu
List-Id: The Scannerless Boolean Parser

Simon Hay writes:
> Is there a cheat-sheet/quick reference guide type thing for SBP's
> metagrammar?

Unfortunately no, although I really should write one.  The metagrammar
(tests/meta.g) is a pretty concise reference of all the operators,
although it doesn't say what they do.

Part of my reason for stalling is that I don't yet consider meta.g to
be official/frozen (whereas the programmatic API is officially
frozen).  However, I don't expect meta.g to change much.  See the end
of this message for further explanation.

> From looking through your examples, it seems like you can use / to
> intersperse things - i.e. /ws means 'with any amount of whitespace
> in between all these previous things',

Yes, although "ws" needs to be defined as a production, i.e.

  ws = [\r\n ]**

> but when I try to actually use it like that it doesn't work.

Hrm, in what way does it not work?

The semantics of the "/" operator are this: it binds more weakly than
almost any other operator except "|", so it acts on the largest
possible subexpression to its left.  It then inserts the "spacer"
production between each pair of adjacent elements of that
subexpression.  So

  a b c /x

means

  a x b x c

However, note that

  (a b c) /x

is just

  (a b c)

... because the "/" operator doesn't do anything to single-element
subexpressions, and parentheses create a single element out of a
sequence of elements.

Also, note that "*/" is a single operator, as are "**/", "++/", and
"+/".  These operators act like the repetition operator, but with a
spacer "factored in" to the recursive production that is generated for
repetition.  So, for example

  (A=a+)

expands to something like

  (A=B B= a | a B)

and

  (A=a+/x)

expands to something like

  (A=B B= a | a x B)

These expansions aren't precisely correct, though, because the
"virtual production" that the repetition operator creates ("B" in the
examples above) has special magical abilities that let it hoist its
output into a single parent with N children, rather than a parse tree
of depth N where every node has only a single left child (which is
what you would get with the "literal" expansion).

The "special magical abilities" mentioned in the preceding paragraph
are, in my opinion, a major "wart" on how the metagrammar works, and
one of the reasons why I'm not yet ready to freeze it the way I have
frozen all APIs in the edu.berkeley.sbp.* root package.

Thanks again for your questions and comments!

  - a

-- 
PGP/GPG: 5C9F F366 C9CF 2145 E770  B1B8 EFB1 462D A146 C380
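
For readers who find it easier to see the "/" and "+/" behaviour
described above as code, here is a toy model in Python.  It is only a
sketch of the semantics as stated in this message, not SBP's
implementation, and it does not call anything in edu.berkeley.sbp.
The function names (intersperse, expand_repetition, hoist) are
hypothetical, grammar symbols are modelled as plain strings, and the
helper production name "B" is just the one used in the examples above.

```python
def intersperse(elements, spacer):
    """Model of `a b c /x`: put the spacer between adjacent elements.

    A single-element sequence (for example one parenthesized group)
    is returned unchanged, since there is nothing to put a spacer between.
    """
    elements = list(elements)
    if len(elements) <= 1:
        return elements
    out = [elements[0]]
    for element in elements[1:]:
        out.extend([spacer, element])
    return out


def expand_repetition(name, element, spacer=None, helper="B"):
    """Model of the "literal" expansion of `A=a+` or `A=a+/x`.

    Returns a dict mapping each production name to its alternatives,
    where every alternative is a list of symbols. `helper` is an
    assumed name for the virtual production.
    """
    recursive = [element]
    if spacer is not None:
        recursive.append(spacer)  # the spacer is "factored in" here
    recursive.append(helper)
    return {name: [[helper]], helper: [[element], recursive]}


def hoist(node):
    """Flatten a nested chain produced by the literal expansion.

    Each node is a tuple whose last item is either a symbol or the
    nested tuple for the recursive helper production. The result is
    the flat list of children that a single parent would hold.
    """
    children = []
    while True:
        *heads, last = node
        if isinstance(last, tuple):
            children.extend(heads)
            node = last
        else:
            children.extend(node)
            return children


if __name__ == "__main__":
    print(intersperse(["a", "b", "c"], "x"))   # ['a', 'x', 'b', 'x', 'c']
    print(intersperse(["(a b c)"], "x"))       # ['(a b c)']
    print(expand_repetition("A", "a", "x"))
    print(hoist(("a", ("a", ("a",)))))         # ['a', 'a', 'a']
```

The hoist function only illustrates the difference between a chain of
depth N and one parent with N children; how SBP actually performs the
hoisting, and what it does with spacer matches, is not covered by this
sketch.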