Question: regex to exclude a sentence which contains a specific word in java

Answers 3
Added at 2017-01-04 06:01
Question

I am reading a file which contains lots of information like shown below:

    type dw_3 from u_dw within w_pg6p0012_01
    boolean visible = false
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

    type dw_3 from u_dw within w_pg6p0012_01
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

I wrote this regex:

    (?i)type dw_\d\s+(.*?)\s+within(.*?)\s+(?!boolean visible = false)(.*)

I want to extract all the strings that do not contain "boolean visible = false", but my regex returns all of them. I also tried the suggestions from many similar posts on Stack Overflow, with the same result. Please suggest a way.

Solution (with Java string escaping):

    (?i)type\\s+dw_(\\d+|\\w+)\\s+from\\s+.*?within\\s+.*?\\s+(string|integer)?\\s+.*\\s+.*\\s+.*\\s+.*?\\s+.*?\\s+.*?\\s*string\\s+dataobject\\s+=\\s+(.*?)\\s+end\\s+type

This works well in a regex checker, but when I tried it in Java it keeps running without producing any output.

Answers

Answer #1, added: 2017-01-04 06:01

You can use this regex:

(\s*boolean visible = false)|(.*)


This defines two capture groups:

  1. The first capture group, (\s*boolean visible = false), matches the line boolean visible = false together with any whitespace before it.

  2. The second capture group, (.*), matches everything that the first group does not match.

When extracting, read only the second group and ignore the first.


Edit

Here's an example for clarification. In this example:

  • The getOriginalFileContents() method returns the file contents shown in the question.
  • Both groups are matched, but the first is ignored and only the second is printed.

The output does not include the line boolean visible = false.

Output

 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type


 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type

Java Implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String[] args) {
        String file = getOriginalFileContents();
        Pattern pattern = Pattern.compile("(\\s*boolean visible = false)|(.*)");
        Matcher matcher = pattern.matcher(file);
        while (matcher.find()) {
            // Group 1 holds the unwanted line and is deliberately ignored.
            String line = matcher.group(2);
            // (.*) also yields an empty match at the end of each line; skip those.
            if (line != null && !line.isEmpty()) System.out.println(line);
        }
    }

    // Returns the file contents as displayed in the question.
    private static String getOriginalFileContents() {
        String s = "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     boolean visible = false\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type\n" +
            "     \n" +
            "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type";

        return s;
    }
}

Answer #2, added: 2017-01-04 06:01

It will be much easier (and more readable) if you make a regex to match "boolean visible = false" and then exclude those lines that contain a match for it.

Pattern pattern = Pattern.compile("boolean visible = false");

Files.lines(filepath)
     .filter(line -> !pattern.matcher(line).find())  // note the "!"
     .forEach(/* do stuff */);

Notes:

  • Because we are using Files#lines(Path), it is not necessary to break apart separate lines in the regex. This is already done for us.
  • The Matcher#find() method returns whether the given character sequence contains a match for the regex anywhere in it. I believe this is what you want.
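
The filtering idea above can be tried without a file on disk. The following is a minimal sketch, assuming the lines are already available as a list; the class name LineFilterSketch and the method keepWanted are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LineFilterSketch {
    // Compiled once and reused for every line.
    private static final Pattern UNWANTED = Pattern.compile("boolean visible = false");

    // Keeps every line that has no match for UNWANTED anywhere in it.
    static List<String> keepWanted(List<String> lines) {
        return lines.stream()
                .filter(line -> !UNWANTED.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "type dw_3 from u_dw within w_pg6p0012_01",
                "boolean visible = false",
                "integer x = 1797",
                "end type");
        // Prints the three lines other than "boolean visible = false".
        keepWanted(lines).forEach(System.out::println);
    }
}
```

With a real file, the list could be replaced by the stream returned from Files.lines, which should be closed after use (for example with try-with-resources).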

EDIT:

Now, if you are just really intent on using a pure regex, then try this:

^((?!boolean visible = false).)+$

This will match an entire (non-empty) line if-and-only-if it does not contain "boolean visible = false" anywhere within it. No fancy backreferences / capture group semantics needed to extract the desired text.

See proof by unit tests here: https://regex101.com/r/dbzdMB/1
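
When this pattern is applied to the whole file content in Java rather than to one line at a time, ^ and $ only match at line boundaries if Pattern.MULTILINE is set. A minimal sketch under that assumption follows; the class name NegativeLookaheadSketch and the method matchingLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadSketch {
    // MULTILINE makes ^ and $ match at every line boundary,
    // not only at the start and end of the whole input.
    private static final Pattern LINE_WITHOUT_TARGET =
            Pattern.compile("^((?!boolean visible = false).)+$", Pattern.MULTILINE);

    // Collects every non-empty line that does not contain the target text.
    static List<String> matchingLines(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LINE_WITHOUT_TARGET.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_pg6p0012_01\n"
                + "boolean visible = false\n"
                + "integer x = 1797\n"
                + "end type";
        matchingLines(text).forEach(System.out::println);
    }
}
```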


EDIT #2:

Alternatively, if all you are trying to do is to get the file text without any "boolean visible = false", then you could simply replace every instance of that target string with the empty string.

Pattern pattern = Pattern.compile("boolean visible = false");
Matcher matcher = pattern.matcher(fileAsCharSequence);  // e.g. StringBuilder
String output = matcher.replaceAll("");
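
Replacing only the target string leaves the line's indentation and line break behind, so the output keeps a blank-looking line where the text used to be. If that matters, one possible variant (a sketch, not part of the original answer) is to match the whole line including its line break. The class name RemoveLineSketch and the method stripTargetLines are hypothetical; \R requires Java 8 or later.

```java
import java.util.regex.Pattern;

class RemoveLineSketch {
    // (?m) lets ^ match at the start of every line.
    // [ \t]* consumes the indentation and any trailing blanks,
    // and \R? consumes the line break if there is one.
    private static final Pattern TARGET_LINE =
            Pattern.compile("(?m)^[ \\t]*boolean visible = false[ \\t]*\\R?");

    // Returns the text with every matching line removed entirely.
    static String stripTargetLines(String text) {
        return TARGET_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_pg6p0012_01\n"
                + "    boolean visible = false\n"
                + "    integer x = 1797\n"
                + "end type";
        System.out.println(stripTargetLines(text));
    }
}
```
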

Answer #3, added: 2017-01-04 07:01

type dw_\d\s+(.*?)\s+within(.*)\n(?!\s*boolean visible = false\s*)[\s\S]*?\s+end type

Try this. See the demo:

https://regex101.com/r/Heex8W/1
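
As a rough illustration of how this pattern could be applied in Java, the sketch below collects every "type ... end type" block that it matches. The class name BlockFilterSketch and the method blocksWithoutTarget are hypothetical. Note that the lookahead only inspects the line directly after the "type ..." header, so a block with "boolean visible = false" further down would still match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockFilterSketch {
    // The regex from this answer, written with Java string escaping.
    private static final Pattern VISIBLE_BLOCK = Pattern.compile(
            "type dw_\\d\\s+(.*?)\\s+within(.*)\\n"
            + "(?!\\s*boolean visible = false\\s*)[\\s\\S]*?\\s+end type");

    // Returns each matched block, from its "type" header through "end type".
    static List<String> blocksWithoutTarget(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = VISIBLE_BLOCK.matcher(text);
        while (matcher.find()) {
            blocks.add(matcher.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_a\n"
                + "  boolean visible = false\n"
                + "  integer x = 1\n"
                + "end type\n"
                + "\n"
                + "type dw_3 from u_dw within w_b\n"
                + "  integer x = 2\n"
                + "end type\n";
        // Only the second block is expected to be printed.
        blocksWithoutTarget(text).forEach(System.out::println);
    }
}
```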
